What is Linear Regression?
This algorithm is used to find the relationship between two continuous variables: one independent variable and one dependent variable. It is a linear model that assumes a linear relationship between the input and output variables. If we have a single input variable we call it simple linear regression; if we have multiple input variables we call it multiple linear regression. It is both a statistical algorithm and a machine learning algorithm.
The Equation is 'Y = M * X + C'
Y = Dependent value (the output we predict)
M = Slope/Weight
X = Independent value (the input)
C = Bias/Intercept
The core idea is to obtain a line that best fits the data. 'Y' is the output variable we want to predict, 'X' is the input variable, and M & C are the coefficients we need to estimate.
To find the M and C values we have methods such as the statistical closed-form solution (ordinary least squares) or gradient descent.
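For example, here is a minimal sketch of fitting a simple linear regression with scikit-learn's ordinary least squares implementation; the toy data below is invented purely for illustration.

```python
# A minimal sketch of fitting simple linear regression with ordinary least
# squares via scikit-learn. The data is made up for illustration only.
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: X is the input (independent) variable, y is the output (dependent) variable
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])

model = LinearRegression()
model.fit(X, y)                        # estimates M (slope) and C (bias) by least squares

print("Slope M:", model.coef_[0])      # roughly 2.0 for this data
print("Bias  C:", model.intercept_)    # roughly 0.0 for this data
print("Prediction for X=6:", model.predict([[6.0]])[0])
```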
How does it Work?
The goal is to find the best-fit line, which minimizes the error (the distance between the line and the data points). The values of M and C must be chosen so that they minimize this error. The algorithm tries multiple M and C values, calculates the error for each, and finally keeps the M and C that give the lowest error.
We can calculate the error in multiple ways, for example by using the mean squared error formula as the loss function. This helps us evaluate the performance of the model (a small worked sketch follows after the formula below).
Mean Squared Error = (1/n) * sum( (yi - ŷi)^2 )
n = total number of samples
yi = actual value
ŷi = predicted value
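As a rough illustration of the loss and of gradient descent mentioned above, here is a small NumPy sketch; the data, learning rate, and iteration count are arbitrary choices made only for this example.

```python
# A minimal sketch of estimating M and C with gradient descent, using mean
# squared error as the loss. Data and hyper-parameters are illustrative only.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])

m, c = 0.0, 0.0          # start with arbitrary coefficients
learning_rate = 0.01
n = len(x)

for _ in range(5000):
    y_pred = m * x + c                  # current predictions
    error = y_pred - y
    mse = np.mean(error ** 2)           # (1/n) * sum((y_pred - y)^2)
    # Gradients of MSE with respect to m and c
    dm = (2.0 / n) * np.sum(error * x)
    dc = (2.0 / n) * np.sum(error)
    m -= learning_rate * dm             # step against the gradient
    c -= learning_rate * dc

print("Slope M:", m)      # roughly 2.0 for this data
print("Bias  C:", c)      # roughly 0.0 for this data
print("Final MSE:", mse)
```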
Advantages
- Linear Regression is simple to implement, and the output coefficients are easy to interpret.
- Performs well when the relationship between the input and output variables is approximately linear.
- Linear Regression is susceptible to over-fitting, but this can be mitigated using dimensionality-reduction techniques.
Disadvantages
- Prone to Underfitting.
- Sensitive to outliers.
- Linear Regression assumes that the observations are independent of each other.