Understanding linear regression in one article

Linear regression is one of the most basic machine learning algorithms. This article introduces its basic concepts, its advantages and disadvantages, a speed evaluation of 8 implementation methods, and a comparison with logistic regression.


What is linear regression?

Linear regression position

The position of linear regression is shown in the figure above. It sits in the hierarchy: machine learning > supervised learning > regression > linear regression.

Extended reading:

"I understand machine learning in one article! (3 learning methods + 7 practical steps + 15 common algorithms)"

"I understand supervised learning in one article (basic concept + 4 step flow + 9 typical algorithm)"


What is regression?

The purpose of regression is prediction, such as predicting tomorrow's temperature or the trend of a stock.

Regression can predict because it uses historical data to learn the underlying patterns ("routines") and then applies those patterns to predict future results.

The underlying logic of regression


What is linear?

A relationship that fits the statement "the more..., the more..." may be a linear relationship:

The bigger the "house", the higher the "rent"

The more "burgers" you buy, the more "money" you spend

The more "water" in the cup, the greater the "weight"


But not every "the more..., the more..." relationship is linear. For example, "the longer the charging time, the higher the battery level" follows a nonlinear curve like the one below:

Charging time and battery level are nonlinear

A linear relationship is not limited to 2 variables (a two-dimensional plane). With 3 variables (three-dimensional space), the linear relationship is a plane, and with 4 variables (four-dimensional space), the linear relationship is a volume. And so on...

Linear relationships can involve multiple variables


What is linear regression?

Linear regression was originally a concept in statistics and is now often used in machine learning.

If there is a "linear relationship" between 2 or more variables, we can use historical data to find the pattern (the "routine") between the variables and build an effective model to predict future values.
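As a minimal sketch of this idea, the snippet below fits a straight line to a handful of made-up "house size vs. rent" points using the closed-form least-squares formulas, then predicts the rent for a new size. The data, the variable names, and the `fit_line` helper are all hypothetical and only serve as an illustration.

```python
# Hypothetical historical data: house size (square meters) and monthly rent.
sizes = [30, 45, 60, 75, 90]
rents = [1500, 2100, 2700, 3300, 3900]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of co-deviations of x and y / sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Learn the "routine" from historical data ...
slope, intercept = fit_line(sizes, rents)

# ... then use it to predict the rent of a 100 square meter house.
predicted_rent = slope * 100 + intercept
print(slope, intercept, predicted_rent)
```

Because the made-up points lie exactly on one line, the fit recovers that line; real data would scatter around it and the fitted line would only approximate the trend.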

A plain-language explanation of linear regression


Advantages and disadvantages of linear regression

Advantages and disadvantages of linear regression


Advantages:

  1. Modeling is fast: it does not require complicated calculations and still runs quickly when the amount of data is large.
  2. Each variable can be understood and interpreted through its coefficient.

Disadvantages: Nonlinear data cannot be fitted well, so you first need to determine whether the relationship between the variables is linear.

Why still use linear regression today, when deep learning dominates everywhere?

On the one hand, linear regression can model far more than strictly linear relationships. The "linear" in linear regression refers to linearity in the coefficients. Through nonlinear transformations of the features and the extension to generalized linear models, the functional relationship between the output and the features can be highly nonlinear. On the other hand, and more importantly, the easy interpretability of linear models gives them an irreplaceable position in fields such as physics, economics, and business.
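To illustrate "linear in the coefficients", here is a small sketch that fits a quadratic curve with ordinary least squares. It assumes NumPy is available; the data is synthetic and noise-free, and the variable names are made up for this example.

```python
import numpy as np

# Hypothetical noise-free data that follows y = 1 + 2x + 3x^2.
x = np.linspace(-2.0, 2.0, 21)
y = 1.0 + 2.0 * x + 3.0 * x**2

# Nonlinear transformation of the single feature x into columns [1, x, x^2].
# The model y = c0*1 + c1*x + c2*x^2 is nonlinear in x but linear in c0, c1, c2.
X = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary least squares on the transformed features.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)
```

The solver never needs to know the curve is a parabola; it only sees three feature columns and finds the best linear combination of them.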


Speed evaluation of 8 Python linear regression methods

  1. scipy.polyfit() or numpy.polyfit()
  2. scipy.stats.linregress()
  3. scipy.optimize.curve_fit()
  4. numpy.linalg.lstsq()
  5. statsmodels.api.OLS()
  6. Analytic solution: invert the matrix and multiply
  7. Analytic solution: first calculate the Moore-Penrose generalized inverse (pseudo-inverse) of x, then take the dot product with y
  8. sklearn.linear_model.LinearRegression()
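The sketch below runs four of the listed approaches (the ones that need only NumPy) on the same synthetic data to show that they arrive at the same coefficients. It does not measure speed, and the data, seed, and variable names are assumptions made for this illustration.

```python
import numpy as np

# Synthetic data: y = 3x + 5 plus Gaussian noise (hypothetical values).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=200)
y = 3.0 * x + 5.0 + rng.normal(0.0, 1.0, size=200)

# Design matrix with a column of ones so the intercept is estimated too.
# Column order [x, 1] matches the [slope, intercept] order of numpy.polyfit.
X = np.column_stack([x, np.ones_like(x)])

# Method 1: numpy.polyfit with degree 1 returns [slope, intercept].
w_polyfit = np.polyfit(x, y, 1)

# Method 4: numpy.linalg.lstsq solves the least-squares problem directly.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Method 6: analytic solution w = (X^T X)^-1 X^T y using a matrix inverse.
w_inverse = np.linalg.inv(X.T @ X) @ X.T @ y

# Method 7: Moore-Penrose pseudo-inverse of X, then the dot product with y.
w_pinv = np.linalg.pinv(X) @ y

for name, w in [("polyfit", w_polyfit), ("lstsq", w_lstsq),
                ("inverse", w_inverse), ("pinv", w_pinv)]:
    print(name, w)
```

The explicit inverse is short to write but can be numerically fragile when the columns of X are nearly collinear, which is one reason library routines tend to do more work internally.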

8 linear regression method speed evaluation results

The result: surprisingly, the simple matrix-inverse analytic solution is much faster than the widely used scikit-learn linear_model.

For the detailed evaluation, see the original article "Data science with Python: 8 ways to do linear regression and measure their speed".


Linear regression vs logistic regression

Linear regression and logistic regression are two classic algorithms that are often compared. Here are some of the differences between the two:

The difference between linear regression and logistic regression

  1. Linear regression can only be used for regression problems. Logistic regression, although it has "regression" in its name, is mostly used for classification problems. (For the difference between regression and classification, see the article "I understand supervised learning in one article (basic concept + 4 step flow + 9 typical algorithm)".)
  2. Linear regression requires that the dependent variable is a continuous numerical variable, while logistic regression requires that the dependent variable be a discrete variable
  3. Linear regression requires a linear relationship between the independent and dependent variables, while logistic regression does not require a linear relationship between the independent variables and the dependent variable itself.
  4. Linear regression expresses the relationship between the independent and dependent variables intuitively, while logistic regression does not express this relationship as directly.
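The difference in output type can be sketched in a few lines. The snippet below assumes the standard sigmoid form of logistic regression; the parameter values `w` and `b` and the function names are hypothetical, chosen only to show that the same linear combination yields a continuous number in one model and a class label in the other.

```python
import math


def linear_predict(x, w, b):
    """Linear regression: the prediction is the continuous value w*x + b."""
    return w * x + b


def logistic_predict_proba(x, w, b):
    """Logistic regression: squash w*x + b into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))


def logistic_predict_class(x, w, b, threshold=0.5):
    """Turn the probability into a discrete label (0 or 1)."""
    return 1 if logistic_predict_proba(x, w, b) >= threshold else 0


# Hypothetical parameters, used for both models purely for comparison.
w, b = 2.0, -4.0

for x in (1.0, 3.0):
    print(x,
          linear_predict(x, w, b),
          round(logistic_predict_proba(x, w, b), 4),
          logistic_predict_class(x, w, b))
```

Both models start from the same expression w*x + b; logistic regression passes it through the sigmoid and a threshold, which is why its dependent variable is discrete.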


Independent variable: A variable that is actively manipulated and can be regarded as the cause of the "dependent variable".

Dependent variable: A variable that changes in response to the "independent variable" and can be seen as its result. It is also the result we want to predict.


Baidu Encyclopedia + Wikipedia

Baidu Encyclopedia version

Linear regression is a statistical analysis method that uses regression analysis from mathematical statistics to determine the quantitative relationship between two or more variables. It is very widely used. Its expression is y = w'x + e, where the error e follows a normal distribution with a mean of 0.

If a regression analysis includes only one independent variable and one dependent variable, and the relationship between the two can be approximated by a straight line, it is called simple linear regression analysis. If the regression analysis includes two or more independent variables, and the relationship between the dependent variable and the independent variables is linear, it is called multiple linear regression analysis.
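As a rough illustration of the expression y = w'x + e with several independent variables, the sketch below generates synthetic data from a known weight vector with zero-mean normal noise and then estimates the weights by least squares. It assumes NumPy; the weights, noise level, sample size, and seed are all made-up values, and there is no intercept term because the expression above has none.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical "true" weights for three independent variables.
true_w = np.array([2.0, -1.0, 0.5])
n_samples = 1000

# Independent variables x, and an error e drawn from a normal distribution
# with mean 0 (standard deviation 0.1 is an arbitrary choice).
X = rng.normal(size=(n_samples, 3))
e = rng.normal(0.0, 0.1, size=n_samples)

# Dependent variable generated according to y = w'x + e.
y = X @ true_w + e

# Multiple linear regression: estimate w from the observed X and y.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_hat)
```

With enough samples and small noise, the estimated weights land close to the ones used to generate the data.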


Wikipedia version

In statistics, linear regression is a linear approach to modeling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables). The case of a single explanatory variable is called simple linear regression. For more than one explanatory variable, the process is called multiple linear regression. This term is distinct from multivariate linear regression, in which multiple correlated dependent variables are predicted rather than a single scalar variable.

In linear regression, relationships are modeled using linear prediction functions whose unknown model parameters are estimated from the data. Such models are called linear models. Most commonly, the conditional mean of the response given the values of the explanatory variables (or predictors) is assumed to be an affine function of those values; less commonly, the conditional median or some other quantile is used. Like all forms of regression analysis, linear regression focuses on the conditional probability distribution of the response given the values of the predictors, rather than on the joint probability distribution of all of these variables, which is the domain of multivariate analysis.



Linear regression using Python's Scikit-Learn library

Machine learning algorithm_linear regression

Popular understanding of linear regression (1)