Linear regression is one of the most basic machine learning algorithms. This article introduces its basic concepts, its advantages and disadvantages, a speed evaluation of 8 Python methods, and a comparison with logistic regression.
What is linear regression?
The figure above shows where linear regression sits: machine learning > supervised learning > regression > linear regression.
What is regression?
The purpose of regression is prediction, such as predicting tomorrow's temperature or the trend of a stock.
Regression can make predictions because it uses historical data to learn the "routines" (patterns) in that data, and then applies those routines to predict future results.
What is linear?
A relationship that fits the statement "the more..., the more..." may be linear:
The bigger the "house", the higher the "rent"
The more "burgers" you buy, the more "money" you spend
The more "water" in the cup, the greater the "weight"
But not every "the more..., the more..." relationship is linear. For example, "the longer the charging time, the higher the battery level" is not linear; it follows a nonlinear curve like the one below:
A linear relationship is not limited to 2 variables (a line in a two-dimensional plane). With 3 variables (three-dimensional space), the linear relationship is a plane, and with 4 variables (four-dimensional space), the linear relationship is a volume. And so on...
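To make the higher-dimensional case concrete, here is a minimal sketch that fits a plane to 3 variables with NumPy. The data and the names `x1`, `x2`, and `z` are made up for illustration; the true plane is chosen as z = 2*x1 - 3*x2 + 1 so the result is easy to check.

```python
import numpy as np

# Hypothetical data: z depends linearly on x1 and x2 (z = 2*x1 - 3*x2 + 1).
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, size=50)
x2 = rng.uniform(0, 10, size=50)
z = 2.0 * x1 - 3.0 * x2 + 1.0

# Design matrix: one column per variable plus a column of ones for the intercept.
A = np.column_stack([x1, x2, np.ones_like(x1)])

# Least-squares solution gives the coefficients of the plane.
coef, _, _, _ = np.linalg.lstsq(A, z, rcond=None)
a, b, c = coef
print(f"z = {a:.2f} * x1 + {b:.2f} * x2 + {c:.2f}")
```

Adding more variables only adds more columns to the design matrix; the fitting step stays the same.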
What is linear regression?
Linear regression was originally a concept in statistics and is now often used in machine learning.
If there is a "linear relationship" between 2 or more variables, we can use historical data to find the "routines" that connect those variables and build a model that predicts future values.
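As a rough illustration of this idea, the sketch below learns the "routine" between house size and rent from a handful of made-up historical data points and then predicts the rent for a new house. The numbers and the helper `predict_rent` are assumptions for this example only.

```python
import numpy as np

# Hypothetical historical data: house size (m^2) and monthly rent.
# The rents follow rent = 50 * size + 500 exactly, to keep the example easy to check.
size = np.array([30.0, 45.0, 60.0, 80.0, 100.0])
rent = 50.0 * size + 500.0

# Fit a degree-1 polynomial (a straight line) by least squares.
slope, intercept = np.polyfit(size, rent, deg=1)

def predict_rent(new_size):
    """Predict rent for a new house size using the fitted line."""
    return slope * new_size + intercept

print(predict_rent(70.0))  # approximately 4000 for this made-up data
```

With real data the points would not lie exactly on a line, and the fitted slope and intercept would be the values that minimize the squared prediction error.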
Advantages and disadvantages of linear regression
Advantages:
- Modeling is fast: it does not require complicated calculations, and it still runs quickly when the amount of data is large.
- Each variable can be understood and interpreted through its coefficient.
Disadvantages: Non-linear data cannot be fitted well, so you first need to determine whether the relationship between the variables is linear.
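One rough way to check whether a relationship is linear is to fit a straight line and look at R^2, the share of variance the line explains. The sketch below is only an illustration: the helper `r_squared_of_line` and both data sets are made up, with the saturating curve standing in for the charging example. A residual plot is usually a more informative check than a single number.

```python
import numpy as np

def r_squared_of_line(x, y):
    """Fit a straight line by least squares and return its R^2."""
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

t = np.linspace(0, 120, 50)

# Hypothetical linear data: total cost grows by a fixed amount per burger.
burger_cost = 5.0 * t

# Hypothetical nonlinear data: a charge level that saturates over time.
charge_level = 100.0 * (1.0 - np.exp(-t / 30.0))

r2_linear = r_squared_of_line(t, burger_cost)
r2_curve = r_squared_of_line(t, charge_level)
print(f"linear data R^2: {r2_linear:.3f}")
print(f"curved data R^2: {r2_curve:.3f}")
```

The straight line explains the linear data almost perfectly, while it leaves a clear share of the curved data unexplained.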
Why still use linear regression today, when deep learning dominates everywhere?
On the one hand, linear regression can model far more than straight-line relationships. The "linear" in linear regression refers to linearity in the coefficients. Through nonlinear transformations of the features and the extension to generalized linear models, the functional relationship between the output and the features can be highly nonlinear. On the other hand, and more importantly, the easy interpretability of linear models gives them an irreplaceable position in fields such as physics, economics, and business.
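The point about linearity in the coefficients can be sketched in a few lines. Here a quadratic curve is fitted with ordinary least squares after transforming the feature x into the columns (1, x, x^2). The data is made up, generated from y = 1 + 2x + 3x^2 so the recovered coefficients are easy to verify.

```python
import numpy as np

# Hypothetical data: y is nonlinear in x but linear in the coefficients.
x = np.linspace(-3, 3, 40)
y = 1.0 + 2.0 * x + 3.0 * x ** 2

# Nonlinear feature transformation: x -> (1, x, x^2).
X = np.column_stack([np.ones_like(x), x, x ** 2])

# Ordinary least squares on the transformed features.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)  # coefficients for (1, x, x^2)
```

The model is still a linear regression, because the prediction is a weighted sum of the columns of `X`; only the features are nonlinear.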
Speed evaluation of 8 Python linear regression methods
- scipy.polyfit( ) or numpy.polyfit( )
- stats.linregress( )
- optimize.curve_fit( )
- statsmodels.OLS( )
- Analytic solution: simple matrix multiplication with the inverse of the matrix
- Compute the Moore-Penrose generalized pseudo-inverse of x first, then take the dot product with y
- sklearn.linear_model.LinearRegression( )
The result: surprisingly, the simple matrix-inverse solution is much faster than the widely used scikit-learn linear_model.
For the detailed evaluation, see the original article "Data science with Python: 8 ways to do linear regression and measure their speed".
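The sketch below shows how such a timing comparison could be set up. It is not the benchmark from the original article: it covers only a few NumPy-based approaches (with `np.linalg.lstsq` added as one more NumPy routine), uses made-up data, and times each method once, so the numbers will vary by machine and should not be read as a reproduction of the result above.

```python
import time
import numpy as np

# Hypothetical data: y = 3x + 2 plus noise.
rng = np.random.default_rng(42)
n = 100_000
x = rng.uniform(0, 10, size=n)
y = 3.0 * x + 2.0 + rng.normal(0, 1, size=n)
X = np.column_stack([x, np.ones(n)])  # columns: slope term, intercept term

def fit_inverse(X, y):
    # Analytic solution: (X^T X)^-1 X^T y
    return np.linalg.inv(X.T @ X) @ X.T @ y

def fit_pinv(X, y):
    # Moore-Penrose pseudo-inverse of X, then the product with y
    return np.linalg.pinv(X) @ y

def fit_lstsq(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def fit_polyfit(X, y):
    # polyfit returns [slope, intercept] for deg=1
    return np.polyfit(X[:, 0], y, deg=1)

methods = {
    "inverse": fit_inverse,
    "pinv": fit_pinv,
    "lstsq": fit_lstsq,
    "polyfit": fit_polyfit,
}

results = {}
for name, fit in methods.items():
    start = time.perf_counter()
    coef = fit(X, y)
    elapsed = time.perf_counter() - start
    results[name] = (coef, elapsed)
    print(f"{name:8s} slope={coef[0]:.3f} intercept={coef[1]:.3f} "
          f"time={elapsed * 1e3:.2f} ms")
```

For a fairer comparison, each method would be run many times (for example with the `timeit` module) and over a range of data sizes.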
Linear regression vs logistic regression
Linear regression and logistic regression are 2 classic algorithms that are often compared. Here are some of the differences between the two:
- Linear regression can only be used for regression problems. Logistic regression, although it has "regression" in its name, is mostly used for classification problems. (For the difference between regression and classification, see the article "Understanding supervised learning in one article (basic concepts + 4-step flow + 9 typical algorithms)".)
- Linear regression requires the dependent variable to be a continuous numerical variable, while logistic regression requires the dependent variable to be a discrete variable.
- Linear regression requires a linear relationship between the independent and dependent variables, while logistic regression does not.
- Linear regression expresses the relationship between the independent and dependent variables intuitively, while logistic regression cannot express that relationship as directly.
Independent variable: the variable that is actively manipulated; it can be regarded as the cause of the "dependent variable".
Dependent variable: the variable that changes because the "independent variable" changes; it can be regarded as the result of the "independent variable". It is also the result we want to predict.
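The difference in the dependent variable can be sketched with one independent variable and two targets. Everything here is hypothetical: study hours as the independent variable, an exam score as a continuous dependent variable, and a pass/fail label as a discrete one. The logistic model is fitted with plain gradient descent written out in NumPy, which is a simple stand-in for a library implementation, not a tuned one.

```python
import numpy as np

rng = np.random.default_rng(0)
hours = rng.uniform(0, 10, size=200)  # independent variable (made up)

# Continuous dependent variable -> linear regression.
score = 5.0 * hours + 40.0 + rng.normal(0, 2, size=200)
slope, intercept = np.polyfit(hours, score, deg=1)

# Discrete (0/1) dependent variable -> logistic regression.
passed = (hours > 5.0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Plain gradient descent on the logistic log-loss (a sketch, not tuned).
x_centered = hours - hours.mean()
w, b = 0.0, 0.0
for _ in range(2000):
    p = sigmoid(w * x_centered + b)
    w -= 0.1 * np.mean((p - passed) * x_centered)
    b -= 0.1 * np.mean(p - passed)

# Linear regression outputs a number; logistic regression outputs a class.
predicted_class = (sigmoid(w * x_centered + b) >= 0.5).astype(float)
accuracy = np.mean(predicted_class == passed)
print(f"linear model: score = {slope:.2f} * hours + {intercept:.2f}")
print(f"logistic model accuracy on the training data: {accuracy:.2f}")
```

The linear model predicts a numeric score for any number of hours, while the logistic model predicts a probability that is then turned into one of two classes.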