
REGRESSION ANALYSIS

Linear regression is the most popular regression model. In this model, we wish to predict the response at $n$ data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ by a regression model given by
$y = a_0 + a_1 x$    (1)
where $a_0$ and $a_1$ are the constants of the regression model.
A measure of goodness of fit, that is, how well $a_0 + a_1 x$ predicts the response variable $y$, is the magnitude of the residual $E_i$ at each of the $n$ data points.
$E_i = y_i - (a_0 + a_1 x_i)$    (2)
Ideally, if all the residuals $E_i$ are zero, one may have found an equation in which all the points lie on the model. Thus, minimization of the residuals is an objective of obtaining regression coefficients.
The most popular method to minimize the residuals is the least squares method, where the estimates of the constants of the model are chosen such that the sum of the squared residuals is minimized, that is, minimize $\sum_{i=1}^{n} E_i^2$.
Why minimize the sum of the square of the residuals? Why not, for instance, minimize
the sum of the residual errors or the sum of the absolute values of the residuals? Alternatively,
constants of the model can be chosen such that the average residual is zero without making
individual residuals small. Will any of these criteria yield unbiased parameters with the smallest
variance? All of these questions will be answered below. Look at the data in Table 1.

Table 1 Data points.


x y
2.0 4.0
3.0 6.0
2.0 6.0
3.0 8.0
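
For readers who want to follow along numerically, the short sketch below (Python and NumPy are not part of the original text; the function name `residuals` is an illustrative choice) stores the Table 1 data and evaluates the residuals of Equation (2) for any candidate line.

```python
import numpy as np

# Data points from Table 1
x = np.array([2.0, 3.0, 2.0, 3.0])
y = np.array([4.0, 6.0, 6.0, 8.0])

def residuals(a0, a1, x, y):
    """Residuals E_i = y_i - (a0 + a1*x_i) of Equation (2)."""
    return y - (a0 + a1 * x)

# Example: residuals of an arbitrary candidate line y = 1 + 1*x
print(residuals(1.0, 1.0, x, y))   # [1. 2. 3. 4.]
```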

To explain this data by a straight-line regression model
$y = a_0 + a_1 x$    (3)
and using the minimization of $\sum_{i=1}^{n} E_i$ as a criterion to find $a_0$ and $a_1$, we find that for (Figure 1)
$y = 4x - 4$    (4)
Figure 1 Regression curve $y = 4x - 4$ for $y$ vs. $x$ data.

the sum of the residuals, $\sum_{i=1}^{4} E_i = 0$, as shown in Table 2.

Table 2 The residuals at each data point for the regression model $y = 4x - 4$.

x     y     y_predicted     E = y - y_predicted
2.0   4.0   4.0              0.0
3.0   6.0   8.0             -2.0
2.0   6.0   4.0              2.0
3.0   8.0   8.0              0.0

$\sum_{i=1}^{4} E_i = 0$

So does this give us the smallest error? It does, as $\sum_{i=1}^{4} E_i = 0$. But it does not give unique values for the parameters of the model. The straight-line model
$y = 6$    (5)
also gives $\sum_{i=1}^{4} E_i = 0$, as shown in Table 3.

Table 3 The residuals at each data point for the regression model $y = 6$.

x     y     y_predicted     E = y - y_predicted
2.0   4.0   6.0             -2.0
3.0   6.0   6.0              0.0
2.0   6.0   6.0              0.0
3.0   8.0   6.0              2.0

$\sum_{i=1}^{4} E_i = 0$
Figure 2 Regression curve $y = 6$ for $y$ vs. $x$ data.
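
A quick computational check of Tables 2 and 3 can be made with the sketch below (a Python/NumPy snippet added only for illustration); it shows that both $y = 4x - 4$ and $y = 6$ give a zero sum of residuals for the Table 1 data.

```python
import numpy as np

# Data points from Table 1
x = np.array([2.0, 3.0, 2.0, 3.0])
y = np.array([4.0, 6.0, 6.0, 8.0])

# Two different candidate lines: y = 4x - 4 and y = 6
for a0, a1, label in [(-4.0, 4.0, "y = 4x - 4"), (6.0, 0.0, "y = 6")]:
    E = y - (a0 + a1 * x)                 # residuals, Equation (2)
    print(label, "residuals:", E, "sum:", E.sum())
# Both lines report a sum of 0.0, matching Tables 2 and 3.
```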

Since this criterion does not give a unique regression model, it cannot be used for finding
the regression coefficients. Let us see why we cannot use this criterion for any general data. We
want to minimize
$\sum_{i=1}^{n} E_i = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)$    (6)

Differentiating Equation (6) with respect to $a_0$ and $a_1$, we get
$\dfrac{\partial \sum_{i=1}^{n} E_i}{\partial a_0} = \sum_{i=1}^{n} (-1) = -n$    (7)
$\dfrac{\partial \sum_{i=1}^{n} E_i}{\partial a_1} = \sum_{i=1}^{n} (-x_i) = -n\bar{x}$    (8)
Setting these equations equal to zero gives $n = 0$, which is not possible. Therefore, unique values of $a_0$ and $a_1$ do not exist.
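
The derivatives in Equations (7) and (8) can also be verified symbolically. The sketch below uses SymPy (an assumption of this illustration, not mentioned in the original text) with $n = 4$ symbolic data points.

```python
import sympy as sp

a0, a1 = sp.symbols('a0 a1')
xs = sp.symbols('x1:5')   # symbolic x1, ..., x4
ys = sp.symbols('y1:5')   # symbolic y1, ..., y4

# Sum of the residuals, Equation (6), written out for n = 4
S = sum(yi - a0 - a1 * xi for xi, yi in zip(xs, ys))

print(sp.diff(S, a0))   # -4, i.e. -n                  (Equation 7)
print(sp.diff(S, a1))   # -x1 - x2 - x3 - x4 = -n*xbar (Equation 8)
```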
You may think that the reason the minimization criterion $\sum_{i=1}^{n} E_i$ does not work is that negative residuals cancel with positive residuals. So is minimizing $\sum_{i=1}^{n} |E_i|$ better? Let us look at the data given in Table 2 for the equation $y = 4x - 4$. It gives $\sum_{i=1}^{4} |E_i| = 4$, as shown in the following table.

Table 4 The absolute residuals at each data point when employing $y = 4x - 4$.

x     y     y_predicted     |E| = |y - y_predicted|
2.0   4.0   4.0              0.0
3.0   6.0   8.0              2.0
2.0   6.0   4.0              2.0
3.0   8.0   8.0              0.0

$\sum_{i=1}^{4} |E_i| = 4$
The value $\sum_{i=1}^{4} |E_i| = 4$ is also obtained for the straight-line model $y = 6$. No other straight-line model for this data has $\sum_{i=1}^{4} |E_i| < 4$. Again, we find the regression coefficients are not unique, and hence this criterion also cannot be used for finding the regression model.
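
A brute-force check, added here only as an illustration (Python/NumPy, with a coarse grid of intercepts and slopes chosen arbitrarily), makes the non-uniqueness visible: many lines attain the same smallest sum of absolute residuals of 4 for this data.

```python
import numpy as np

# Data points from Table 1
x = np.array([2.0, 3.0, 2.0, 3.0])
y = np.array([4.0, 6.0, 6.0, 8.0])

def sae(a0, a1):
    """Sum of absolute residuals for the line y = a0 + a1*x."""
    return np.abs(y - (a0 + a1 * x)).sum()

# Coarse grid of candidate intercepts a0 and slopes a1
grid = [(a0, a1) for a0 in np.arange(-6.0, 8.0, 0.5)
                 for a1 in np.arange(-2.0, 6.0, 0.5)]
values = [sae(a0, a1) for a0, a1 in grid]

print("smallest sum of |E_i| on the grid:", min(values))        # 4.0
ties = [p for p, v in zip(grid, values) if np.isclose(v, 4.0)]
print("grid lines attaining it:", len(ties))                     # many, not one
# Both y = 4x - 4 (a0 = -4, a1 = 4) and y = 6 (a0 = 6, a1 = 0) are among the ties.
```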
Let us use the least squares criterion where we minimize
$S_r = \sum_{i=1}^{n} E_i^2 = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$    (9)

$S_r$ is called the sum of the square of the residuals.


To find $a_0$ and $a_1$, we minimize $S_r$ with respect to $a_0$ and $a_1$.
$\dfrac{\partial S_r}{\partial a_0} = 2\sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)(-1) = 0$    (10)
$\dfrac{\partial S_r}{\partial a_1} = 2\sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)(-x_i) = 0$    (11)
giving
$-\sum_{i=1}^{n} y_i + \sum_{i=1}^{n} a_0 + \sum_{i=1}^{n} a_1 x_i = 0$    (12)
$-\sum_{i=1}^{n} y_i x_i + \sum_{i=1}^{n} a_0 x_i + \sum_{i=1}^{n} a_1 x_i^2 = 0$    (13)
Noting that $\sum_{i=1}^{n} a_0 = a_0 + a_0 + \ldots + a_0 = n a_0$,
$n a_0 + a_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$    (14)
$a_0 \sum_{i=1}^{n} x_i + a_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$    (15)
Figure 3 Linear regression of $y$ vs. $x$ data showing residuals $E_i = y_i - (a_0 + a_1 x_i)$ at a typical point, $x_i$.

Solving the above Equations (14) and (15) gives


$a_1 = \dfrac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$    (16)

$a_0 = \dfrac{\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_i y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$    (17)
Redefining
$S_{xy} = \sum_{i=1}^{n} x_i y_i - n\,\bar{x}\,\bar{y}$    (18)
$S_{xx} = \sum_{i=1}^{n} x_i^2 - n\,\bar{x}^2$    (19)
$\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$    (20)
$\bar{y} = \dfrac{\sum_{i=1}^{n} y_i}{n}$    (21)
we can rewrite
$a_1 = \dfrac{S_{xy}}{S_{xx}}$    (22)
$a_0 = \bar{y} - a_1 \bar{x}$    (23)
