Unit 9
Regression Analysis
A Case
Mr. Ajit is the General Manager of a tyre manufacturing company. He is very happy that the sales of
tyres are increasing, and he believes that the increase is due to the efforts of the sales
force. However, his secretary, Ms. Anitha, points out that the performance record sent by the
Marketing Manager does not show any change. Mr. Ajit is curious. While talking to his
friend's son, Mr. Suresh, who holds a position in the Motor Vehicle Registration
office, he learns that the registration of vehicles is increasing. Mr. Ajit immediately thinks of
his statistician, Mr. Satish, and consults him. Mr. Satish promises to come back with a
solution to the problem.
9.2. Introduction
The word "regress" refers to the tendency of data to move back towards the normal (average) value.
Regression is defined as “the measure of the average relationship between two or more
variables in terms of the original units of the data.”
Correlation analysis studies the relationship between two variables x and y. Regression
analysis attempts to predict the average y for a given x; it quantifies the
dependence of one variable on the other.
In regression one of the variables is dependent and the others are independent. With two
variables x and y, where y depends on x, the dependence is expressed in the form
of the following equation, where a is the intercept and b is the slope:
Y = a + bx
9.3. Regression Analysis
Regression Analysis is used to:
Estimate the values of the dependent variable from the values of the independent variables
Get a measure of the error involved while using the regression line as a basis for estimation
Obtain the regression coefficients, from which the correlation coefficient and its square can be calculated for the two given variables
It provides a mathematical relationship between two or more variables and is based on a cause and effect relationship.
The line drawn such that the sum of the vertical deviations is zero and the sum of their squares is a minimum
is called the regression line of y on x. It is used to estimate y-values for given x-values.
The line drawn such that the sum of the horizontal deviations is zero and the sum of their squares is a
minimum is called the regression line of x on y. It is used to estimate x-values for given y-values.
The smaller the angle between these lines, the higher the correlation between the variables. If we fit a straight line
to scatter diagram data, some of the points will lie above the line and some below it. The
deviation of each point from the line is called the error.
The regression lines always intersect at $(\bar{x}, \bar{y})$. The regression lines have the equations
$$y - \bar{y} = b_{yx}(x - \bar{x}) \quad \text{(y on x)}$$
$$x - \bar{x} = b_{xy}(y - \bar{y}) \quad \text{(x on y)}$$
Regression equations found under the above conditions are said to be fitted by the method of least squares. ‘byx’ and
‘bxy’ are called the regression coefficients.
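As a reminder of what the least-squares conditions lead to, the two regression coefficients can be written in terms of deviations from the means. These are the standard textbook formulas, stated here as a sketch rather than quoted from the original text:

```latex
b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^{2}},
\qquad
b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^{2}}
```

Both coefficients share the same numerator, so they always carry the same sign.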
The regression model captures the systematic behaviour of the data. The non-systematic behaviour
cannot be captured and is known as error. The errors are due to random components that cannot be
predicted. Assuming that the random errors are normally distributed, we can construct confidence
intervals for them at a chosen confidence level.
9.5. Regression Coefficient
The regression coefficients byx and bxy are used to calculate the correlation coefficient r and its square, the
coefficient of determination, for the two given variables:
$$b_{yx} \cdot b_{xy} = r^{2}, \qquad b_{yx} \cdot b_{xy} \le 1$$
If byx is negative, then bxy is also negative and r is negative.
They can also be expressed as
$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} \quad \text{and} \quad b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$$
The regression coefficient is an absolute measure.
Table 9.1
Correlation Coefficient | Regression Coefficient
rxy = ryx | byx ≠ bxy (in general)
-1 ≤ r ≤ 1 | byx can be greater than one, but then bxy must be less than one, such that byx.bxy ≤ 1
It is not based on cause and effect relationship | It is based on cause and effect relationship
(Cont. from topic ‘A Case’)
Mr. Satish collects data on the number of vehicles registered and the number of tyres sold, as
follows:
Table 9.2
Number of vehicles registered per week (X) | 23 | 29 | 29 | 35 | 42 | 46 | 50 | 54 | 64 | 66 | 76 | 78
Number of tyres sold per week (Y) | 69 | 96 | 102 | 118 | 125 | 126 | 138 | 178 | 156 | 184 | 176 | 225
He worked out the regression equation of sales on number of vehicles registered as follows:-
Table 9.3
X | Y | X² | XY | Estimated Y | (Y − Estimated Y)²
23 | 69 | 529 | 1587 | 82.432 | 180.4305
29 | 96 | 841 | 2784 | 95.7959 | 0.0416
29 | 102 | 841 | 2958 | 95.7959 | 38.4904
35 | 118 | 1225 | 4130 | 109.1594 | 78.1557
42 | 125 | 1764 | 5250 | 124.7502 | 0.0624
46 | 126 | 2116 | 5796 | 133.6592 | 58.6629
50 | 138 | 2500 | 6900 | 142.5682 | 20.8681
54 | 178 | 2916 | 9612 | 151.4772 | 703.4609
64 | 156 | 4096 | 9984 | 173.7497 | 315.0502
66 | 184 | 4356 | 12144 | 178.2042 | 33.5918
76 | 176 | 5776 | 13376 | 200.4766 | 599.1060
78 | 225 | 6084 | 17550 | 204.9311 | 402.7592
Total 592 | 1693 | 33044 | 92071 | | 2430.68
$$b_{yx} = \frac{712.472}{319.889} = 2.2272$$
$$\bar{X} = \frac{592}{12} = 49.33, \qquad \bar{Y} = \frac{1693}{12} = 141.083$$
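The calculation behind these figures can be reproduced with a short script. This is a minimal sketch using the Table 9.2 data; the function name `fit_line` and the variable names are illustrative, not part of the original text.

```python
# Weekly data from Table 9.2
x = [23, 29, 29, 35, 42, 46, 50, 54, 64, 66, 76, 78]             # vehicles registered
y = [69, 96, 102, 118, 125, 126, 138, 178, 156, 184, 176, 225]   # tyres sold


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products and squares of deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(x, y)
fitted = [intercept + slope * xi for xi in x]
# Sum of squared deviations of observed sales from the fitted line
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
print(f"byx = {slope:.4f}, intercept = {intercept:.3f}, SSE = {sse:.2f}")
```

The slope corresponds to byx and the sum of squared deviations to the total of the last column of Table 9.3.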
He concludes that there is a good relationship between the variables: the
increase in the number of registrations has increased the sales. He further supports this by
calculating the correlation coefficient. The calculation through MS-Excel is shown
later. This information will help Mr. Ajit to plan his future production.
The standard error of estimate S_YX is computed from the observed values Y, the estimated values and the squared deviations as follows:
Table 9.4
Y | Estimated Y | (Y − Estimated Y)²
17 | 16.6555 | 0.1187
17 | 17.1765 | 0.0311
18 | 17.6975 | 0.0915
18 | 18.2185 | 0.0477
19 | 18.7395 | 0.0678
19 | 19.2605 | 0.0679
19 | 19.7815 | 0.6107
20 | 20.3025 | 0.0915
21 | 20.8235 | 0.0312
22 | 21.3445 | 0.4297
Total | | 1.5878
$$S_{YX} = \sqrt{\frac{1.5878}{10}} = \sqrt{0.15878} = 0.398$$
9.7. Examples
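Example 9.1:
$$\bar{X} = \frac{225}{10} = 22.5, \qquad \bar{Y} = \frac{190}{10} = 19$$
Regression equation of Y on X is:
$$Y - \bar{Y} = b_{yx}(X - \bar{X})$$
$$b_{yx} = \frac{10 \times 43 - (5)(0)}{10 \times 85 - (5)^{2}} = \frac{430}{825} = 0.521$$
$$Y - 19 = 0.521(X - 22.5)$$
$$Y = 0.521X + 7.2775$$
Regression equation of X on Y is:
$$b_{xy} = \frac{10 \times 43 - (5)(0)}{10 \times 24 - (0)^{2}} = \frac{430}{240} = 1.792$$
$$X - 22.5 = 1.792(Y - 19)$$
$$X = 1.792Y - 11.548$$
$$r = \sqrt{0.521 \times 1.792} = 0.966$$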
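The arithmetic of Example 9.1 can be checked with a few lines of code. This sketch assumes that the figures 5, 0, 43, 85 and 24 in the working are, respectively, the sums of the deviations of X and Y, of their products, and of their squares; the variable names are illustrative.

```python
import math

# Sums appearing in the working of Example 9.1 (assumed meaning, see lead-in)
n = 10
sum_dx, sum_dy = 5, 0        # sums of deviations of X and Y
sum_dxdy = 43                # sum of products of deviations
sum_dx2, sum_dy2 = 85, 24    # sums of squared deviations
mean_x, mean_y = 22.5, 19.0

numerator = n * sum_dxdy - sum_dx * sum_dy
b_yx = numerator / (n * sum_dx2 - sum_dx ** 2)   # slope of Y on X
b_xy = numerator / (n * sum_dy2 - sum_dy ** 2)   # slope of X on Y

# r is the square root of the product and takes the sign of the coefficients
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

print(f"byx = {b_yx:.3f}, bxy = {b_xy:.3f}, r = {r:.3f}")
print(f"Y = {b_yx:.3f} X + {mean_y - b_yx * mean_x:.4f}")
print(f"X = {b_xy:.3f} Y + {mean_x - b_xy * mean_y:.4f}")
```

The intercepts printed here use unrounded slopes, so they differ slightly in the third decimal from the hand-worked values, which use slopes rounded to three places.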
Regression Statistics
Multiple R | 0.966353136
R Square | 0.933838384
Adjusted R Square | 0.925568182
Standard Error | 0.445516384
Observations | 10
ANOVA
 | df | SS | MS | F | Significance F
Regression | 1 | 22.41212121 | 22.41212121 | 112.9160305 | 5.38409E-06
Residual | 8 | 1.587878788 | 0.198484848
Total | 9 | 24
 | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Residual Output
Observation | Predicted Age of Wife | Residuals
1 | 16.65454545 | 0.345454545
2 | 17.17575758 | -0.175757576
3 | 17.6969697 | 0.303030303
4 | 18.21818182 | -0.218181818
5 | 18.73939394 | 0.260606061
6 | 19.26060606 | -0.260606061
7 | 19.78181818 | -0.781818182
8 | 20.3030303 | -0.303030303
9 | 20.82424242 | 0.175757576
10 | 21.34545455 | 0.654545455
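The "Standard Error" and "R Square" lines of this output can be recovered from the residuals alone. The sketch below assumes the usual convention for this kind of output, namely that the residual sum of squares is divided by n - 2 degrees of freedom (consistent with df = 8 in the ANOVA table); variable names are illustrative.

```python
import math

# Residuals copied from the "Residual Output" table
residuals = [0.345454545, -0.175757576, 0.303030303, -0.218181818, 0.260606061,
             -0.260606061, -0.781818182, -0.303030303, 0.175757576, 0.654545455]
total_ss = 24.0                  # "Total" SS from the ANOVA table
n = len(residuals)

residual_ss = sum(e ** 2 for e in residuals)
standard_error = math.sqrt(residual_ss / (n - 2))   # n - 2 degrees of freedom
r_square = 1 - residual_ss / total_ss

print(f"Residual SS = {residual_ss:.6f}")
print(f"Standard Error = {standard_error:.6f}")
print(f"R Square = {r_square:.6f}")
```

Dividing the residual sum of squares by n instead of n - 2 gives a somewhat smaller standard error, which is why hand calculations that divide by n do not match this output exactly.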
Example 9.2:
Table 9.7
 | Series X | Series Y
Mean | 65 | 67
S.D | 2.5 | 3.5
Correlation coefficient | 0.8
A study of wheat prices at Mumbai and Kanpur yields the following data:
Mumbai Kanpur
The correlation coefficient between the prices of Mumbai and Kanpur is 0.774. Estimate the price at Kanpur,
if the price at Mumbai is Rs.8.
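Solution:
Given (where X = Mumbai and Y = Kanpur):
$\bar{X} = 7.50$, $\bar{Y} = 8.10$, $\sigma_x = 0.326$, $\sigma_y = 0.207$, $r = 0.774$
The regression equation which we need to find is that of Y on X:
$$Y - \bar{Y} = b_{yx}(X - \bar{X}), \quad \text{where } b_{yx} = r\,\frac{\sigma_y}{\sigma_x}$$
$$Y - 8.10 = 0.774 \times \frac{0.207}{0.326}\,(X - 7.50)$$
$$Y = 0.4914X + 4.4145$$
When X = 8:
$$Y = 0.4914 \times 8 + 4.4145 = 8.3457$$
The price at Kanpur is about Rs. 8.35 when the price at Mumbai is Rs. 8.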
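Evaluating the regression line of Y on X from summary statistics is a one-line formula, sketched below. The function name `predict_y` is hypothetical; the numbers are the means, standard deviations and correlation used in the solution.

```python
def predict_y(x, mean_x, mean_y, sd_x, sd_y, r):
    """Estimate y from the regression line of y on x, given summary statistics."""
    b_yx = r * sd_y / sd_x            # regression coefficient of y on x
    return mean_y + b_yx * (x - mean_x)


# X = Mumbai price, Y = Kanpur price
kanpur_price = predict_y(x=8.0, mean_x=7.50, mean_y=8.10,
                         sd_x=0.326, sd_y=0.207, r=0.774)
print(round(kanpur_price, 2))
```

Because the Mumbai price of Rs. 8 is above the Mumbai mean of 7.50 and the correlation is positive, the estimate for Kanpur comes out above the Kanpur mean of 8.10.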
Example
The following table shows the amount spent on advertising and the corresponding sales of the product from
10 companies:
a. Plot a scatter diagram showing the relationship between advertising cost and sales of the
product.
b. Find the equation of the regression line of sales on advertising costs.
c. Use the regression line to forecast sales if advertising costs were Rs. 10 lakh.
Solution:
a. A scatter diagram showing the relationship between advertising cost and sales of the product.
[Figure: scatter diagram of Sales (Rs. in lakh) against Advertising cost (Rs. in lakh)]
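b. The equation of the regression line of sales (Y) on advertising costs (X).
Y | X | X² | XY
25 | 8 | 64 | 200
35 | 12 | 144 | 420
29 | 11 | 121 | 319
24 | 5 | 25 | 120
38 | 14 | 196 | 532
12 | 3 | 9 | 36
18 | 6 | 36 | 108
27 | 8 | 64 | 216
17 | 4 | 16 | 68
30 | 9 | 81 | 270
∑Y = 255 | ∑X = 80 | ∑X² = 756 | ∑XY = 2289
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^{2} - \left(\sum x\right)^{2}} = \frac{10(2289) - (80)(255)}{10(756) - (80)^{2}} = \frac{2490}{1160} = 2.1466$$
$$a = \frac{\sum y}{n} - b\,\frac{\sum x}{n} = \frac{255}{10} - 2.1466 \times \frac{80}{10} = 8.3276$$
Y = 8.33 + 2.15x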
c. To forecast sales if advertising costs were Rs. 10 lakh, we put X = 10 in the equation:
Y = 8.33 + 2.15 x 10
= 29.83
As the original data were given to the nearest integer (whole number), the forecast of sales
= 30 (or Rs. 30 lakh)
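The column totals and the coefficients a and b can be recomputed directly from the ten observations. A minimal sketch, with illustrative variable names:

```python
sales = [25, 35, 29, 24, 38, 12, 18, 27, 17, 30]      # Y, Rs. in lakh
advertising = [8, 12, 11, 5, 14, 3, 6, 8, 4, 9]       # X, Rs. in lakh

n = len(sales)
sum_x, sum_y = sum(advertising), sum(sales)
sum_x2 = sum(x * x for x in advertising)
sum_xy = sum(x * y for x, y in zip(advertising, sales))

# Least-squares slope and intercept of sales on advertising cost
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

forecast = a + b * 10      # advertising cost of Rs. 10 lakh
print(f"sum X = {sum_x}, sum Y = {sum_y}, sum X^2 = {sum_x2}, sum XY = {sum_xy}")
print(f"Y = {a:.4f} + {b:.4f} X, forecast at X = 10: {forecast:.2f}")
```

With unrounded coefficients the forecast is about 29.79 rather than 29.83; either way it rounds to Rs. 30 lakh.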
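“The standard error of estimate is used to ascertain how good and representative the regression
line is as a description of the average relationship between two series.”
$$S_{yx} = \sqrt{\frac{\sum (Y - \hat{Y})^{2}}{n}}$$
where $\hat{Y}$ is the value of Y estimated from the regression line. In terms of the correlation coefficient,
$$S_{yx} = \sigma_y \sqrt{1 - r^{2}}, \qquad S_{xy} = \sigma_x \sqrt{1 - r^{2}}$$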
Example 9.3:
The following results were worked out from scores in Statistics and Mathematics in a
certain examination.
Table 9.8
 | Scores in Statistics (X) | Scores in Mathematics (Y)
Mean | 40 | 48
Standard Deviation | 10 | 15
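Karl Pearson's correlation coefficient between x and y is +0.42. Find the regression lines of
x on y and y on x. Use the regression lines to find the value of y when x = 50 and the value of x
when y = 30.
Solution:
Given the following data: $\bar{x} = 40$, $\bar{y} = 48$, $\sigma_x = 10$, $\sigma_y = 15$, $r = 0.42$
$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = 0.42 \times \frac{10}{15} = 0.28 \quad (1)$$
$$b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = 0.42 \times \frac{15}{10} = 0.63 \quad (2)$$
Therefore;
Regression line of x on y: $x - 40 = 0.28(y - 48)$, i.e. $x = 0.28y + 26.56 \quad (3)$
Regression line of y on x: $y - 48 = 0.63(x - 40)$, i.e. $y = 0.63x + 22.8 \quad (4)$
When y=30; x=34.96 using equation (3)
When x=50; y=54.3 using equation (4)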
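Both regression lines follow from the means, standard deviations and r alone, which the sketch below illustrates for Example 9.3. The helper names `x_given_y` and `y_given_x` are hypothetical.

```python
mean_x, mean_y = 40, 48      # Statistics (X), Mathematics (Y)
sd_x, sd_y = 10, 15
r = 0.42

b_xy = r * sd_x / sd_y       # regression coefficient of x on y
b_yx = r * sd_y / sd_x       # regression coefficient of y on x


def x_given_y(y):
    """Estimate x from the regression line of x on y."""
    return mean_x + b_xy * (y - mean_y)


def y_given_x(x):
    """Estimate y from the regression line of y on x."""
    return mean_y + b_yx * (x - mean_x)


print(f"bxy = {b_xy:.2f}, byx = {b_yx:.2f}")
print(f"x when y = 30: {x_given_y(30):.2f}")
print(f"y when x = 50: {y_given_x(50):.2f}")
```

Note that each line is used only in its own direction: x on y to estimate x, and y on x to estimate y.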
Example 9.4:
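Table 9.10
X | Y | X − X̄ (= X − 12) | Y − Ȳ (= Y − 16) | (X − X̄)² | (Y − Ȳ)² | (X − X̄)(Y − Ȳ)
12 | 18 | 0 | 2 | 0 | 4 | 0
4 | 22 | -8 | 6 | 64 | 36 | -48
20 | 10 | 8 | -6 | 64 | 36 | -48
8 | 16 | -4 | 0 | 16 | 0 | 0
16 | 14 | 4 | -2 | 16 | 4 | -8
Total | | | | 160 | 80 | -104
$$b_{yx} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^{2}} = \frac{-104}{160} = -0.65 \quad \text{and} \quad b_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (Y - \bar{Y})^{2}} = \frac{-104}{80} = -1.3$$
Regression line of X on Y:
X − 12 = −1.3(Y − 16)
Therefore, X = 32.8 − 1.3Y
When Y = 20; X = 32.8 − 1.3 x 20 = 6.8
Regression line of Y on X:
Y − 16 = −0.65(X − 12)
Therefore, Y = 23.8 − 0.65X
When X = 15; Y = 23.8 − 0.65 x 15 = 14.05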
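The deviation method used in Example 9.4 can be sketched in code as follows. The first Y value is taken as 18 here, an assumption that is consistent with the mean of 16 and the deviation of 2 used in the table; variable names are illustrative.

```python
x = [12, 4, 20, 8, 16]
y = [18, 22, 10, 16, 14]   # first value assumed to be 18 (matches mean 16, deviation 2)

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Deviations from the means and their sums of squares and products
dx = [xi - mean_x for xi in x]
dy = [yi - mean_y for yi in y]
sxx = sum(d * d for d in dx)
syy = sum(d * d for d in dy)
sxy = sum(a * b for a, b in zip(dx, dy))

b_yx = sxy / sxx           # regression coefficient of Y on X
b_xy = sxy / syy           # regression coefficient of X on Y

x_when_y20 = mean_x + b_xy * (20 - mean_y)
y_when_x15 = mean_y + b_yx * (15 - mean_x)
print(f"byx = {b_yx}, bxy = {b_xy}")
print(f"X when Y = 20: {x_when_y20:.2f}; Y when X = 15: {y_when_x15:.2f}")
```

Both coefficients are negative, so the correlation between X and Y in this example is negative as well.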
9.9. Application in Finance
9.9.1. Correlation between Two Variables
The results and conclusions for time series data are valid for one company only, but for cross-sectional data they
are valid for a group of companies at industry level.
One can determine the regression equation between advertisement expenses and sales revenue
for different sectors of industry, say manufacturing, IT, chemical, pharmaceutical, etc.
We may take a particular company and study the correlation between the prices of its stock on the BSE and the NSE.
Beta is a measure that reflects the sensitivity of a stock to movements in a stock market
index, such as NSE-Nifty or BSE-Sensex, as a whole. The beta value for the market is always taken as
one.
A stock with a beta of more than one, say 1.10, would rise 10% more than the market index when the index rises,
or fall 10% more than the index when it falls.
The volatility of a stock is measured by its beta value. Beta represents the risk associated with the stock.
An aggressive investor would opt for a stock with a beta value of more than one.
A conservative investor would opt for a stock with a beta value of less than one.
Beta is measured through regression analysis. The percentage daily/weekly/monthly change in the stock price is taken
as the dependent variable and the corresponding change in a market index such as the BSE or NSE index is taken as the
independent variable. Then the regression equation is fitted, which is of the form Y = α + βX.
Thus a stock's “β” measures the relationship between the stock's rate of return (Y) and the average rate of
return for the market as a whole (X).
The coefficient of determination “r²” obtained in the study provides a measure of the volatility in a
stock's price that is explained by the market.
Example 9.5:
The following data relate to the closing BSE Sensex and the stock price of RIL for 10 trading
days during a period. Find “β” and interpret.
Table 9.11
Days BSE Stock price of RIL
1 12342 1150
2 12378 1163
3 12360 1148
4 12461 1150
5 12479 1147
6 12538 1169
7 12730 1192
8 12928 1213
9 12848 1216
10 12885 1208
Solution:
First we calculate the percentage changes in both BSE (X) and RIL (Y) as follows:
$$\text{Percentage change} = \frac{\text{Index for 2nd day} - \text{Index for 1st day}}{\text{Index for 1st day}} \times 100$$
Table 9.12
X Y
+0.2917 1.1304
-0.1454 -1.2898
0.8172 0.1742
0.1445 -0.2609
0.4728 1.9180
1.5313 1.9675
1.5554 1.7617
-0.6188 0.2473
0.2880 -0.6579
Regression Statistics
Multiple R | 0.657986268
R Square | 0.432945929
Adjusted R Square | 0.351938204
Standard Error | 0.961822395
Observations | 9
ANOVA
 | df | SS | MS | F | Significance F
Regression | 1 | 4.9442110 | 4.9442110 | 5.34450178 | 0.05404187
Residual | 7 | 6.4757162 | 0.9251023
Total | 8 | 11.419927
 | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | 0.0291111 | 0.392985 | 0.0740769 | 0.9430215 | -0.9001508 | 0.958373159
X | 1.0903451 | 0.4716397 | 2.3118178 | 0.0540418 | -0.0249055 | 2.205595895
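The beta estimate can be reproduced from the closing values in Table 9.11 without a spreadsheet. This is a minimal sketch in plain Python; the function name `pct_changes` and the variable names are illustrative.

```python
bse = [12342, 12378, 12360, 12461, 12479, 12538, 12730, 12928, 12848, 12885]
ril = [1150, 1163, 1148, 1150, 1147, 1169, 1192, 1213, 1216, 1208]


def pct_changes(series):
    """Day-over-day percentage changes of a price or index series."""
    return [(cur - prev) / prev * 100 for prev, cur in zip(series, series[1:])]


x = pct_changes(bse)     # market changes (independent variable)
y = pct_changes(ril)     # stock changes (dependent variable)
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

beta = sxy / sxx                       # slope of Y on X
alpha = mean_y - beta * mean_x         # intercept
r_square = sxy ** 2 / (sxx * syy)      # coefficient of determination

print(f"beta = {beta:.4f}, alpha = {alpha:.4f}, r^2 = {r_square:.4f}")
```

Ten closing values give nine percentage changes, which is why the output reports 9 observations. A beta slightly above one suggests the stock moved a little more than the index over these days, although the P-value of about 0.054 indicates that the estimate from so few observations is imprecise.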
Example 9.6:
Table 9.13
Incentive increase in % of Base Year Turnover (Rs. in crores)
1 110
2 120
3 132
5 160
8 215
10 260
Solution:
Table 9.14
X = Log x | Y = Log y | X² | XY
0 | 2.04 | 0 | 0
0.3 | 2.08 | 0.09 | 0.63
0.48 | 2.12 | 0.23 | 1.01
0.70 | 2.2 | 0.49 | 1.54
0.90 | 2.33 | 0.81 | 2.11
1.00 | 2.41 | 1.00 | 2.41
∑X = 3.38 | ∑Y = 13.19 | ∑X² = 2.62 | ∑XY = 7.7
Example
Find the second degree regression polynomial y = a + bx + cx² by the least squares method for the data given
below.
X | 0 | 1 | 2 | 3 | 4
Y | 1 | 0 | 3 | 10 | 21
Solution:
We need to fit a second degree regression polynomial of the form y = a + bx + cx². To obtain the values of the
constants a, b and c, the normal equations are:
∑y = Na + b∑x + c∑x²
∑xy = a∑x + b∑x² + c∑x³
∑x²y = a∑x² + b∑x³ + c∑x⁴
Calculation
X | Y | X² | XY | X²Y | X³ | X⁴
0 | 1 | 0 | 0 | 0 | 0 | 0
1 | 0 | 1 | 0 | 0 | 1 | 1
2 | 3 | 4 | 6 | 12 | 8 | 16
3 | 10 | 9 | 30 | 90 | 27 | 81
4 | 21 | 16 | 84 | 336 | 64 | 256
∑X = 10 | ∑Y = 35 | ∑X² = 30 | ∑XY = 120 | ∑X²Y = 438 | ∑X³ = 100 | ∑X⁴ = 354
Substituting the values in the above equations and solving the simultaneous equations we get:
35 = 5a + 10b + 30c
120 = 10a + 30b + 100c
438 = 30a + 100b + 354c
a = 1
b = -3
c = 2
Therefore, the second degree parabola is Y = 1 - 3x + 2x².
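The three normal equations can also be built and solved programmatically. The sketch below uses exact rational arithmetic and a small Gauss-Jordan elimination written for this example; the helper name `power_sum` is hypothetical.

```python
from fractions import Fraction

xs = [0, 1, 2, 3, 4]
ys = [1, 0, 3, 10, 21]


def power_sum(p, with_y=False):
    """Return the sum of x**p, or of x**p * y when with_y is True."""
    return sum((x ** p) * (y if with_y else 1) for x, y in zip(xs, ys))


# Augmented matrix of the normal equations in the unknowns a, b, c:
# row i reads  a*sum(x^i) + b*sum(x^(i+1)) + c*sum(x^(i+2)) = sum(x^i * y)
rows = [[Fraction(power_sum(i + j)) for j in range(3)] + [Fraction(power_sum(i, True))]
        for i in range(3)]

# Gauss-Jordan elimination with exact fractions
for col in range(3):
    pivot = next(r for r in range(col, 3) if rows[r][col] != 0)
    rows[col], rows[pivot] = rows[pivot], rows[col]
    rows[col] = [v / rows[col][col] for v in rows[col]]
    for r in range(3):
        if r != col:
            factor = rows[r][col]
            rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]

a, b, c = (row[3] for row in rows)
print(a, b, c)
```

Using `Fraction` avoids rounding error, which is convenient here because the data happen to lie exactly on the fitted parabola.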
9.11. Logistic Regression
In the linear regression model the variables are assumed to take continuous values over an interval. However, there
are situations in which the dependent variable follows a Binomial distribution, taking only the values 1 (success)
or 0 (failure). In such cases logistic regression is used.
Example 9.7:
Suppose an event is either a success or a failure. These are the values of Y, viz. 1 or 0, taken
by the dependent variable. The corresponding revenue (X) is given for twenty events as follows:
Y X
0 3.45
1 3.36
0 3.12
0 3.15
0 3.14
1 3.48
1 3.42
1 3.32
0 3.31
1 3.29
1 3.46
1 3.34
0 3.25
1 3.41
1 3.48
1 3.21
1 3.25
1 3.16
1 3.28
0 3.22
Note:
There are 4 readings in the interval 3.1-3.2 and only one of them corresponds to Y = 1, so for that interval
P = ¼
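The grouping described in the note can be sketched for every interval, not just 3.1-3.2. The interval width of 0.1 is an assumption taken from the single interval mentioned in the note, and the variable names are illustrative.

```python
from collections import defaultdict

y = [0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0]
x = [3.45, 3.36, 3.12, 3.15, 3.14, 3.48, 3.42, 3.32, 3.31, 3.29,
     3.46, 3.34, 3.25, 3.41, 3.48, 3.21, 3.25, 3.16, 3.28, 3.22]

# interval start in tenths -> [number of successes, number of readings]
counts = defaultdict(lambda: [0, 0])
for xi, yi in zip(x, y):
    tenth = round(xi * 100) // 10      # e.g. 3.12 -> 31, meaning the interval 3.1-3.2
    counts[tenth][0] += yi
    counts[tenth][1] += 1

# Observed proportion of successes P in each interval
for tenth, (successes, readings) in sorted(counts.items()):
    low = tenth / 10
    print(f"{low:.1f}-{low + 0.1:.1f}: {successes}/{readings} = {successes / readings:.2f}")
```

Working with `round(xi * 100)` keeps the binning in whole numbers and avoids floating-point surprises at interval boundaries. These observed proportions are the quantities a logistic curve would then be fitted to.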
9.12. Summary
In this unit we learnt what regression is, how to measure it and how to interpret SPSS output. Further, the
application of regression in the financial field was explained with an example. We also learnt how to calculate the
standard error of the estimate.