ComputersSoftware

Regression in Excel: equation, examples. Linear regression

Regression analysis is a statistical research method that allows one to show the dependence of a parameter on one or several independent variables. In the precomputer era, its use was rather difficult, especially if it was a large amount of data. Today, after learning how to build a regression in Excel, you can solve complex statistical problems in just a few minutes. Below are specific examples from the field of economics.

Types of regression

The very concept was introduced into mathematics by Francis Galton in 1886. Regression happens:

  • Linear;
  • Parabolic;
  • Power-law;
  • Exponential;
  • Hyperbolic;
  • Indicative;
  • Logarithmic.

Example 1

Let's consider the problem of determining the dependence of the number of the retired members of the collective on the average salary at 6 industrial enterprises.

A task. Six enterprises analyzed the average monthly salary and the number of employees who resigned at their own volition. In tabular form we have:

A

B

C

1

X

Number of resigned

Salary

2

Y

30000 rubles

3

1

60

35,000 rubles

4

2

35

40000 rubles

5

3

20

45,000 rubles

6th

4

20

50,000 rubles

7th

5

15

55,000 rubles

8

6th

15

60000 rubles

For the task of determining the dependence of the number of employees who left to the average salary at 6 enterprises, the regression model has the form of the equation Y = a 0 + a 1 x 1 + ... + a k x k , where x i are the influencing variables, i i are the regression coefficients, Number of factors.

For this task, Y is the indicator of the retired employees, and the influencing factor is the salary, which is denoted by X.

Using Excel table processor capabilities

The analysis of regression in Excel should be preceded by the application of built-in functions to the available tabular data. However, for these purposes it is better to use the very useful add-on "Analysis package". To activate it, you need:

  • From the "File" tab go to the "Settings" section;
  • In the opened window select the line "Add-ins";
  • Click on the "Go" button, located below, to the right of the "Management" line;
  • Put a check mark next to the name "Analysis package" and confirm your actions by clicking "Ok".

If done correctly, the right button appears on the right side of the Data tab, located above the Excel worksheet.

Linear Regression in Excel

Now that we have all the necessary virtual instruments at hand for econometric calculations, we can begin to solve our problem. For this:

  • Click on the button "Data analysis";
  • In the opened window click on the button "Regression";
  • In the appeared tab enter the range of values for Y (the number of dismissed workers) and for X (their salary);
  • We confirm our actions by clicking the "Ok" button.

As a result, the program will automatically fill out a new sheet of the table processor with regression analysis data. Note! In Excel, there is an opportunity to independently set the place that you prefer for this purpose. For example, it could be the same sheet as the Y and X values, or even a new book specifically designed for storing such data.

Analysis of the regression results for the R-square

In Excel, the data obtained during the processing of the data of the example in question is:

First of all, pay attention to the value of the R-square. It is the coefficient of determination. In this example, the R-square = 0.755 (75.5%), ie, the calculated parameters of the model explain the relationship between the parameters considered by 75.5%. The higher the value of the coefficient of determination, the selected model is considered more applicable for a particular task. It is believed that it correctly describes the actual situation with an R-square value above 0.8. If the R-square is <0.5, then such an analysis of the regression in Excel can not be considered reasonable.

The analysis of the coefficients

The number 64.1428 shows what the value of Y will be if all the variables xi in the model under consideration are reset. In other words, it can be argued that the value of the analyzed parameter is influenced by other factors that are not described in a particular model.

The following coefficient -0,16285, located in cell B18, shows the weighting effect of the variable X on Y. This means that the average monthly salary of employees within the model in question affects the number of those who left with a weight of -0,16285, that is, the degree of its influence at all Small. The "-" sign indicates that the coefficient has a negative value. This is obvious, since everyone knows that the more the salary at the enterprise, the less people express the desire to terminate the employment contract or leave.

Multiple Regression

By this term we mean the equation of connection with several independent variables of the form:

Y = f (x 1 + x 2 + ... x m ) + ε, where y is the resultant variable (dependent variable), and x 1 , x 2 , ... x m are attribute factors (independent variables).

Evaluation of parameters

For multiple regression (MR), it is performed using the least squares (OLS) method. For linear equations of the form Y = a + b 1 x 1 + ... + b m x m + ε we construct a system of normal equations (see below)

To understand the principle of the method, we consider the two-factor case. Then we have a situation described by formula

Hence we obtain:

Where σ is the variance of the corresponding characteristic reflected in the index.

The OLS is applicable to the MP equation on a standardized scale. In this case we obtain the equation:

In which t y , t x 1, ... t xm - Standardized variables for which the averages are 0; Β i are the standardized regression coefficients, and the standard deviation is 1.

Note that all β i In this case are given as normalized and centralized, so their comparison among themselves is considered correct and admissible. In addition, it is customary to screen out factors, discarding those that have the lowest values of βi.

The problem using the linear regression equation

Suppose there is a table of the price dynamics of a particular commodity N during the last 8 months. It is necessary to take a decision on the expediency of purchasing its batch at a price of 1,850 rubles per ton.

A

B

C

1

Month number

Name of the month

Commodity price N

2

1

January

1750 rubles per ton

3

2

February

1755 rubles per ton

4

3

March

1767 rubles per ton

5

4

April

1760 rubles per ton

6th

5

May

1770 rubles per ton

7th

6th

June

1790 rubles per ton

8

7th

July

1810 rubles per ton

9

8

August

1840 rubles per ton

To solve this problem in the Excel table processor, you need to use the already known tool "Data analysis". Then select the "Regression" section and set the parameters. It must be remembered that in the "Input interval Y" field, the range of values for the dependent variable (in this case the prices for the goods in specific months of the year) should be entered, and in the "Entry interval X" - for the independent (month number). Confirm the action by clicking "Ok". On the new sheet (if so stated), we get the data for the regression.

We build on them a linear equation of the form y = ax + b, where the parameters of a and b are the coefficients of the row with the name of the month number and the coefficients and the lines "Y-intersection" from the sheet with the results of the regression analysis. Thus, the linear regression equation (VR) for problem 3 is written as:

The price of the commodity N = 11,714 * the number of the month + 1727,54.

Or in algebraic notation

Y = 11.714 x + 1727.54

Analysis of results

To determine whether the resulting linear regression equation is adequate, the coefficients of multiple correlation (KMC) and determination are used, as well as the Fisher criterion and the Student's test. In the Excel table with regression results they appear under the names of multiple R, R-square, F-statistics and t-statistics respectively.

KMK R makes it possible to evaluate the tightness of the probabilistic relationship between the independent and dependent variables. Its high value indicates a fairly strong relationship between the variables "Number of the month" and "Price of goods N in rubles per 1 ton." However, the nature of this relationship remains unknown.

The square of the determination coefficient R 2 (RI) is a numerical characteristic of the fraction of the total spread and shows the scatter of which part of the experimental data, i.e. The values of the dependent variable correspond to the linear regression equation. In the problem under consideration, this value is 84.8%, that is, the statistical data are described with a high degree of accuracy by the received SD.

F-statistic, also called the Fisher criterion, is used to assess the significance of linear dependence, refuting or confirming the hypothesis of its existence.

The value of t-statistics (Student's test) helps to evaluate the significance of the coefficient for an unknown or free term of linear dependence. If the value of the t-test is> t cr , then the hypothesis of the insignificance of the free term of the linear equation is rejected.

In the problem under consideration, for the free term using the Excel tools it was found that t = 169.20903, and p = 2.89E-12, that is, we have the zero probability that the correct hypothesis about the insignificance of the free term will be rejected. For the coefficient of unknown t = 5.79405, and p = 0.001158. In other words, the probability that the correct hypothesis about the insignificance of the coefficient for the unknown will be rejected is 0.12%.

Thus, it can be argued that the resulting linear regression equation is adequate.

The problem of the expediency of buying a block of shares

Multiple regression in Excel is performed using the same "Data Analysis" tool. Let us consider a specific applied problem.

Management company "NNN" should make a decision on the expediency of buying a 20% stake in JSC "MMM". The cost of the package (JV) is 70 million US dollars. Specialists of "NNN" collected data on similar transactions. It was decided to assess the value of the stake in such parameters, expressed in millions of US dollars, as:

  • Accounts payable (VK);
  • Volume of annual turnover (VO);
  • Accounts receivable (VD);
  • Value of fixed assets (SOF).

In addition, the enterprise wage arrears parameter (V3 P) is used in thousands of US dollars.

Solution using the Excel spreadsheet

First of all, you need to create a table of input data. It has the following appearance:

Further:

  • The "Data analysis" window;
  • Select the "Regression" section;
  • In the "Input interval Y" box enter the range of values of the dependent variables from column G;
  • Click on the icon with the red arrow to the right of the "Input interval X" window and select the range of all values from columns B, C, D, F on the sheet.

Mark the item "New worksheet" and click "Ok".

A regression analysis is obtained for this task.

Study results and conclusions

"Collecting" from the rounded data presented above on the sheet of the Excel table processor, the regression equation:

SP = 0,103 * SOF + 0,541 * VO - 0,031 * VK + 0,405 * VD + 0,691 * VZP - 265,844.

In a more familiar mathematical form, it can be written down as:

Y = 0.103 * x1 + 0.541 * x2 - 0.031 * x3 + 0.405 * x4 + 0.691 * x5 - 265.844

The data for MMM are as follows:

SOF, USD

VO, USD

VK, USD

VD, USD

VZP, USD

JV, USD

102.5

535.5

45.2

41.5

21.55

64.72

Substituting them into the equation of regression, get a figure of 64.72 million US dollars. This means that MMM shares should not be purchased, since their value of 70 million US dollars is rather overstated.

As you can see, the use of the Excel table processor and the regression equation made it possible to make an informed decision about the feasibility of a very specific transaction.

Now you know what regression is. Examples in Excel, discussed above, will help you in solving practical problems from the field of econometrics.

Similar articles

 

 

 

 

Trending Now

 

 

 

 

Newest

Copyright © 2018 en.delachieve.com. Theme powered by WordPress.