2b Multiple Linear Regression

Multiple Linear Regression
Sasadhar Bera, IIM Ranchi
Multiple Linear Regression Model

Multiple linear regression involves one dependent variable and more than one independent variable. The equation that describes the multiple linear regression model is:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$
Here $y$ is the dependent variable and $x_1, x_2, \ldots, x_k$ are the independent variables, which are used to predict the dependent variable.
$\beta_0, \beta_1, \beta_2, \ldots, \beta_k$ are the $(k+1)$ unknown regression coefficients (also called model parameters). These regression coefficients are estimated from observed sample data.
The term $\varepsilon$ (pronounced epsilon) is the random error.
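To make the roles of the terms concrete, here is a minimal simulation sketch, assuming NumPy is available. The function name `simulate_mlr`, the coefficient values, the uniform range [0, 10) for the regressors, and the normal error distribution are illustrative assumptions, not part of the model definition above.

```python
import numpy as np


def simulate_mlr(beta, n, sigma=1.0, seed=0):
    """Draw n observations from y = b0 + b1*x1 + ... + bk*xk + eps.

    Illustrative assumptions: regressors are drawn uniformly on [0, 10)
    and the random error eps is drawn from Normal(0, sigma^2).
    """
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    k = beta.size - 1                          # number of independent variables
    x = rng.uniform(0.0, 10.0, size=(n, k))    # x[i, j - 1] holds x_ij
    eps = rng.normal(0.0, sigma, size=n)       # random error term
    y = beta[0] + x @ beta[1:] + eps           # b0 + sum_j b_j * x_ij + eps_i
    return x, y


# Hypothetical example: beta_0 = 2, beta_1 = 0.5, beta_2 = -1.5
x, y = simulate_mlr([2.0, 0.5, -1.5], n=50, sigma=0.3)
print(x.shape, y.shape)  # (50, 2) (50,)
```

Setting `sigma=0.0` removes the random error, so every $y$ then lies exactly on the regression plane.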
Data for Multiple Regression

Suppose that n observations are collected on the response variable (y) and on each of the k independent variables present in the regression model.
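$$
\begin{array}{c|c|cccccc}
i & y & x_1 & x_2 & \cdots & x_j & \cdots & x_k \\ \hline
1 & y_1 & x_{11} & x_{12} & \cdots & x_{1j} & \cdots & x_{1k} \\
2 & y_2 & x_{21} & x_{22} & \cdots & x_{2j} & \cdots & x_{2k} \\
\vdots & \vdots & \vdots & \vdots & & \vdots & & \vdots \\
i & y_i & x_{i1} & x_{i2} & \cdots & x_{ij} & \cdots & x_{ik} \\
\vdots & \vdots & \vdots & \vdots & & \vdots & & \vdots \\
n & y_n & x_{n1} & x_{n2} & \cdots & x_{nj} & \cdots & x_{nk}
\end{array}
$$
where $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, k$.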
Scalar Notation: Multiple Linear Regression

The scalar notation of the regression model:
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_j x_{ij} + \cdots + \beta_k x_{ik} + \varepsilon_i, \qquad i = 1, 2, \ldots, n; \quad j = 1, 2, \ldots, k$$
n = total number of observations
k = number of independent variables
The $\beta_j$'s are the model parameters.
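The scalar form can be read as a term-by-term sum. A minimal pure-Python sketch (the function name `mean_part_scalar` and the numbers are hypothetical) that evaluates the deterministic part $\beta_0 + \sum_j \beta_j x_{ij}$ for one observation, leaving out $\varepsilon_i$:

```python
def mean_part_scalar(beta, x_row):
    """Return beta_0 + beta_1*x_i1 + ... + beta_k*x_ik for one observation i.

    beta  = [beta_0, beta_1, ..., beta_k]  (length k + 1)
    x_row = [x_i1, x_i2, ..., x_ik]        (length k)
    """
    if len(beta) != len(x_row) + 1:
        raise ValueError("beta must have exactly one more entry than x_row")
    total = beta[0]                         # intercept beta_0
    for j in range(1, len(beta)):           # j = 1, 2, ..., k
        total += beta[j] * x_row[j - 1]     # add beta_j * x_ij
    return total


print(mean_part_scalar([2.0, 0.5, -1.5], [4.0, 2.0]))  # 1.0
```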
Matrix Notation: Multiple Linear Regression

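$$\mathbf{y}_{n \times 1} = \mathbf{X}_{n \times (k+1)}\, \boldsymbol{\beta}_{(k+1) \times 1} + \boldsymbol{\varepsilon}_{n \times 1}$$
n = total number of observations, k = number of independent variables, $\boldsymbol{\beta}$ = vector of model parameters.
$$
\mathbf{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_i \\ \vdots \\ y_n \end{bmatrix}, \quad
\mathbf{X} = \begin{bmatrix}
1 & x_{11} & \cdots & x_{1j} & \cdots & x_{1k} \\
\vdots & \vdots & & \vdots & & \vdots \\
1 & x_{i1} & \cdots & x_{ij} & \cdots & x_{ik} \\
\vdots & \vdots & & \vdots & & \vdots \\
1 & x_{n1} & \cdots & x_{nj} & \cdots & x_{nk}
\end{bmatrix}, \quad
\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_j \\ \vdots \\ \beta_k \end{bmatrix}, \quad
\boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_i \\ \vdots \\ \varepsilon_n \end{bmatrix}
$$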
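In code, the matrix form amounts to prepending a column of ones to the $n \times k$ table of regressor values. A minimal NumPy sketch (the helper `design_matrix` and the numbers are hypothetical); the product $\mathbf{X}\boldsymbol{\beta}$ gives the deterministic part of every $y_i$ at once, which matches evaluating the scalar form row by row:

```python
import numpy as np


def design_matrix(x):
    """Build the n x (k+1) matrix X: a column of ones (for beta_0), then the regressors."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]                  # a single regressor becomes an n x 1 column
    ones = np.ones(x.shape[0])          # multiplies the intercept beta_0
    return np.column_stack([ones, x])


# Hypothetical data: n = 3 observations on k = 2 regressors (rows are observations)
x = [[4.0, 2.0], [1.0, 3.0], [0.0, 5.0]]
beta = np.array([2.0, 0.5, -1.5])       # [beta_0, beta_1, beta_2]
X = design_matrix(x)
mean_y = X @ beta                       # X beta: one entry per observation (y without epsilon)
print(X.shape)  # (3, 3)
```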
Model Parameter Estimation

The error in a regression model is the difference between the actual and predicted values; it may be positive or negative.
The estimated error is known as the residual. The value predicted by the regression equation is called the fitted value, or fit.
The sum of squared differences between the actual and predicted values is known as the sum of squares of error (SSE). The least squares method minimizes the SSE to find the best-fitting plane (a hyperplane when k > 2).
Note that the regressor variables in the linear regression model are treated as non-random; that is, their values are fixed.
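The least squares criterion can be written as a small function. This is a minimal NumPy sketch; the helper name `sse` and the toy data are hypothetical. It evaluates the sum of squared errors for any candidate coefficient vector, so a poorer candidate visibly gives a larger value than the exact one:

```python
import numpy as np


def sse(beta, X, y):
    """Sum of squared differences between actual y and fitted X @ beta."""
    residuals = y - X @ beta            # actual minus predicted, for each observation
    return float(residuals @ residuals)


# Hypothetical noise-free data on the line y = 1 + 2x, so the true coefficients give SSE = 0
x = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x
print(sse(np.array([1.0, 2.0]), X, y))  # 0.0
print(sse(np.array([0.0, 2.0]), X, y))  # 4.0
```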
Model Parameter Estimation (Contd.)

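In matrix notation, the regression equation is
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$
Using the least squares estimator, we want to estimate the $\boldsymbol{\beta}$ that minimizes
$$L = \sum_{i=1}^{n} \varepsilon_i^2 = \boldsymbol{\varepsilon}^T\boldsymbol{\varepsilon} = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$$
The least squares estimator must satisfy
$$\left.\frac{\partial L}{\partial \boldsymbol{\beta}}\right|_{\hat{\boldsymbol{\beta}}} = -2\mathbf{X}^T\mathbf{y} + 2\mathbf{X}^T\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{0}$$
which gives the estimated model parameters
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$
The fitted regression model: $\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}}$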
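A minimal NumPy sketch of the estimator, assuming NumPy is available (`ols_fit` is a hypothetical helper name). It solves the normal equations $\mathbf{X}^T\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}^T\mathbf{y}$ with `np.linalg.solve` rather than forming $(\mathbf{X}^T\mathbf{X})^{-1}$ explicitly; the two are mathematically equivalent, but solving is usually the more numerically stable choice. Noise-free toy data are used so the known coefficients should be recovered:

```python
import numpy as np


def ols_fit(X, y):
    """Least squares estimate beta_hat = (X^T X)^{-1} X^T y via the normal equations."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.linalg.solve(X.T @ X, X.T @ y)   # solves (X^T X) b = X^T y for b


# Hypothetical noise-free data generated from y = 1 + 2*x1 - 3*x2
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
X = np.column_stack([np.ones_like(x1), x1, x2])
y = 1.0 + 2.0 * x1 - 3.0 * x2
beta_hat = ols_fit(X, y)                       # should be close to [1, 2, -3]
y_fit = X @ beta_hat                           # fitted values y_hat = X beta_hat
```

In practice, `np.linalg.lstsq(X, y, rcond=None)` (SVD based) is often preferred when $\mathbf{X}^T\mathbf{X}$ is close to singular.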

Estimated Residual and Standard Error

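For the $i$th observation, with $\mathbf{x}_i = [1, x_{i1}, \ldots, x_{ik}]^T$ (the $i$th row of $\mathbf{X}$), the predicted value or fit is
$$\hat{y}_i = \mathbf{x}_i^T\hat{\boldsymbol{\beta}}$$
The error in the fit is called the residual:
$$e_i = y_i - \hat{y}_i$$
$$\text{Mean Square Error} = MSE = \frac{\sum_{i=1}^{n} e_i^2}{n - k - 1}$$
where n is the total number of observations and k is the number of regressors.
Standard error (SE) of estimate $= \hat{\sigma} = \sqrt{MSE}$
$$\operatorname{Var}(\hat{\boldsymbol{\beta}}) = \sigma^2(\mathbf{X}^T\mathbf{X})^{-1}, \text{ estimated by } \hat{\sigma}^2(\mathbf{X}^T\mathbf{X})^{-1}$$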
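A minimal NumPy sketch of these quantities (the helper `residual_summary` and the toy data are hypothetical). It forms $(\mathbf{X}^T\mathbf{X})^{-1}$ explicitly because the same matrix is needed for the estimated $\operatorname{Var}(\hat{\boldsymbol{\beta}}) = \hat{\sigma}^2(\mathbf{X}^T\mathbf{X})^{-1}$:

```python
import numpy as np


def residual_summary(X, y):
    """Fit by least squares and return residual-based quantities.

    X is the n x (k+1) design matrix (first column all ones); y has length n.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape                      # p = k + 1 model parameters
    XtX_inv = np.linalg.inv(X.T @ X)    # (X^T X)^{-1}, reused for Var(beta_hat)
    beta_hat = XtX_inv @ X.T @ y
    y_hat = X @ beta_hat                # fitted values
    e = y - y_hat                       # residuals e_i = y_i - y_hat_i
    mse = float(e @ e) / (n - p)        # divide by n - k - 1
    se = np.sqrt(mse)                   # standard error of estimate, sigma_hat
    var_beta = mse * XtX_inv            # estimated Var(beta_hat)
    return beta_hat, e, mse, se, var_beta


# Hypothetical toy data with one regressor (k = 1, n = 4)
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 2.0, 4.0])
beta_hat, e, mse, se, var_beta = residual_summary(X, y)
```

For the toy data the fitted line is $\hat{y} = 1.3 + 0.8x$, the residuals are $(-0.3, 0.9, -0.9, 0.3)$, $SSE = 1.8$ and $MSE = 1.8/(4 - 1 - 1) = 0.9$.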
Testing Significance of Regression Model

The test for significance of regression determines whether there is a linear relationship between the response variable and the regressor variables.
$$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$$
$$H_1: \beta_j \neq 0 \text{ for at least one } j$$
The test procedure involves an analysis of variance (ANOVA) partitioning of the total sum of squares into a sum of squares due to regression and a sum of squares due to error (or residual).
Total number of model parameters = p = number of regression coefficients = (k+1)
Testing Significance of Regression Model (Contd.)

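ANOVA table
$$
\begin{array}{lcccc}
\text{Source of variation} & \text{DF} & \text{SS} & \text{MS} & F_{cal} \\ \hline
\text{Regression} & k & SSR & MSR = SSR/k & MSR/MSE \\
\text{Residual error} & n-k-1 & SSE & MSE = SSE/(n-k-1) & \\
\text{Total} & n-1 & TSS & &
\end{array}
$$
$$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = \hat{\boldsymbol{\beta}}^T\mathbf{X}^T\mathbf{y} - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$$
$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \mathbf{y}^T\mathbf{y} - \hat{\boldsymbol{\beta}}^T\mathbf{X}^T\mathbf{y}$$
$$TSS = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
$$TSS = SSR + SSE$$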
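The table can be computed directly from the matrix formulas above. A minimal NumPy sketch follows (the helper `regression_anova` and the toy data are hypothetical). It returns the quantities in the ANOVA table. The usual decision rule, rejecting $H_0$ at level $\alpha$ when $F_{cal} > F_{\alpha,\,k,\,n-k-1}$, needs an F critical value from a table (or `scipy.stats.f.ppf(1 - alpha, k, n - k - 1)` if SciPy is available), which is left to the caller so the sketch stays NumPy-only:

```python
import numpy as np


def regression_anova(X, y):
    """ANOVA quantities for H0: beta_1 = ... = beta_k = 0.

    X is the n x (k+1) design matrix whose first column is all ones.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    k = p - 1                                   # number of regressors
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    bXty = float(beta_hat @ (X.T @ y))          # beta_hat^T X^T y
    correction = y.sum() ** 2 / n               # (sum of y_i)^2 / n
    ssr = bXty - correction
    sse = float(y @ y) - bXty
    tss = float(y @ y) - correction             # equals sum of (y_i - y_bar)^2
    msr = ssr / k
    mse = sse / (n - k - 1)
    return {
        "df": (k, n - k - 1, n - 1),            # regression, residual, total
        "SSR": ssr, "SSE": sse, "TSS": tss,
        "MSR": msr, "MSE": mse,
        "F_cal": msr / mse,
    }


# Hypothetical toy data with one regressor (k = 1, n = 4)
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 2.0, 4.0])
table = regression_anova(X, y)
```

For this toy data $SSR = 3.2$, $SSE = 1.8$, $TSS = 5.0$ and $F_{cal} = 3.2/0.9 \approx 3.56$ on (1, 2) degrees of freedom.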
Significance Test of Individual Regression Coefficient

Adding an unimportant variable to the model can actually increase the mean square error, thereby decreasing the usefulness of the model.
The hypotheses for testing the significance of any individual regression coefficient, say $\beta_j$, are
$$H_0: \beta_j = 0 \qquad H_1: \beta_j \neq 0$$
Test statistic:
$$T_{cal} = \frac{\hat{\beta}_j}{\sqrt{\hat{\sigma}^2 C_{jj}}} \sim t_{(n-k-1)} \text{ under } H_0$$
where $\hat{\sigma}^2$ is the mean square error (MSE) and $C_{jj}$ is the diagonal element of $(\mathbf{X}^T\mathbf{X})^{-1}$ corresponding to $\hat{\beta}_j$. Reject $H_0$ if $|T_{cal}| > t_{\alpha/2,\,(n-k-1)}$.
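A minimal NumPy sketch of the individual t tests (the helper `coefficient_t_tests` and the toy data are hypothetical). The critical value $t_{\alpha/2,\,(n-k-1)}$ is passed in by the caller, for example from a t table or from `scipy.stats.t.ppf(1 - alpha / 2, n - k - 1)` if SciPy is available:

```python
import numpy as np


def coefficient_t_tests(X, y, t_crit):
    """Two-sided t tests of H0: beta_j = 0 for every column of the design matrix X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape                             # p = k + 1
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    e = y - X @ beta_hat
    sigma2_hat = float(e @ e) / (n - p)        # MSE
    c_jj = np.diag(XtX_inv)                    # diagonal elements C_jj
    t_cal = beta_hat / np.sqrt(sigma2_hat * c_jj)
    reject = np.abs(t_cal) > t_crit            # reject H0 where |T_cal| > t_crit
    return beta_hat, t_cal, reject


# Hypothetical toy data (n = 4, k = 1, so n - k - 1 = 2 degrees of freedom)
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 2.0, 4.0])
beta_hat, t_cal, reject = coefficient_t_tests(X, y, t_crit=4.303)  # tabulated t_{0.025, 2}
```

Here the slope gives $T_{cal} = 0.8/\sqrt{0.9 \times 0.2} \approx 1.89$, which does not exceed 4.303, so $H_0$ is not rejected at $\alpha = 0.05$. With a single regressor, $T_{cal}^2$ for the slope equals $F_{cal}$ from the ANOVA table.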
Confidence Interval of Mean Response

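In matrix notation, the regression equation is
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$
where $\boldsymbol{\varepsilon} \sim \text{Normal}(\mathbf{0}, \sigma^2\mathbf{I})$.
Mean response at a point $\mathbf{x}_0 = [1, x_{01}, x_{02}, \ldots, x_{0j}, \ldots, x_{0k}]^T$:

Mean response $= E(\mathbf{y}) = E(\mathbf{X}\boldsymbol{\beta}) + E(\boldsymbol{\varepsilon}) = \mathbf{X}\boldsymbol{\beta} + \mathbf{0} = \mathbf{X}\boldsymbol{\beta}$
$$\mu_{y|\mathbf{x}_0} = E(y \mid \mathbf{x}_0) = \mathbf{x}_0^T\boldsymbol{\beta}, \qquad \hat{\mu}_{y|\mathbf{x}_0} = \mathbf{x}_0^T\hat{\boldsymbol{\beta}}$$
$$\operatorname{Var}(\hat{\mu}_{y|\mathbf{x}_0}) = \sigma^2\, \mathbf{x}_0^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}_0$$
$100(1-\alpha)\%$ confidence interval of the mean response at point $\mathbf{x}_0$:
$$\hat{\mu}_{y|\mathbf{x}_0} \pm t_{\alpha/2,\,(n-p)}\sqrt{\hat{\sigma}^2\, \mathbf{x}_0^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}_0}$$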
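A minimal NumPy sketch of this interval (the helper `mean_response_ci`, the toy data and the point $\mathbf{x}_0$ are hypothetical). As in the t test, the critical value $t_{\alpha/2,\,(n-p)}$ is supplied by the caller rather than computed:

```python
import numpy as np


def mean_response_ci(X, y, x0, t_crit):
    """Confidence interval for the mean response x0^T beta.

    x0 = [1, x01, ..., x0k]; t_crit = t_{alpha/2, n-p}, supplied by the caller
    (e.g. from a t table or scipy.stats.t.ppf(1 - alpha / 2, n - p)).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    e = y - X @ beta_hat
    sigma2_hat = float(e @ e) / (n - p)                 # MSE
    mu_hat = float(x0 @ beta_hat)                       # estimated mean response
    half_width = t_crit * np.sqrt(sigma2_hat * float(x0 @ XtX_inv @ x0))
    return mu_hat - half_width, mu_hat, mu_hat + half_width


# Hypothetical toy data; 95% interval at x = 1.5 using the tabulated t_{0.025, 2} = 4.303
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 2.0, 4.0])
lower, mu_hat, upper = mean_response_ci(X, y, x0=[1.0, 1.5], t_crit=4.303)
```

Here $\hat{\mu} = 1.3 + 0.8(1.5) = 2.5$ and $\mathbf{x}_0^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}_0 = 0.25$, so the interval is about $2.5 \pm 4.303\sqrt{0.9 \times 0.25} \approx 2.5 \pm 2.04$.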
Coefficient of Multiple Determination

$$\text{Coefficient of multiple determination} = R^2 = \frac{SSR}{TSS} = \frac{TSS - SSE}{TSS} = 1 - \frac{SSE}{TSS}$$
SSR : Sum of squares due to regression
SSE : Sum of squares due to error
TSS : Total sum of squares
The coefficient of multiple determination is the fraction of the variation of the dependent variable explained by the regressor variables.
$R^2$ measures the goodness of the linear fit: the better the linear fit, the closer $R^2$ is to 1.
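A minimal NumPy sketch of $R^2 = 1 - SSE/TSS$ (the helper `r_squared` and the toy numbers are hypothetical). It assumes the fitted values come from a least squares fit that includes an intercept, which is when $SSR/TSS$ and $1 - SSE/TSS$ coincide:

```python
import numpy as np


def r_squared(y, y_hat):
    """R^2 = 1 - SSE/TSS, assuming y_hat comes from a least squares fit with an intercept."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    sse = float(np.sum((y - y_hat) ** 2))       # sum of squares due to error
    tss = float(np.sum((y - y.mean()) ** 2))    # total sum of squares
    return 1.0 - sse / tss


# Toy data: least squares fit y_hat = 1.3 + 0.8x at x = 0, 1, 2, 3
y = [1.0, 3.0, 2.0, 4.0]
y_hat = [1.3, 2.1, 2.9, 3.7]
print(round(r_squared(y, y_hat), 4))  # 0.64
```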
Coefficient of Multiple Determination (Contd.)

The major drawback of the coefficient of multiple determination ($R^2$) is that adding a predictor variable to the model never decreases $R^2$, regardless of whether the additional variable is significant. To avoid this, regression model builders prefer to use the adjusted $R^2$ statistic:
$$R^2_{adj} = 1 - \frac{SSE/(n-p)}{TSS/(n-1)} = 1 - \frac{n-1}{n-p}\,(1 - R^2)$$
In general, the adjusted $R^2$ statistic does not necessarily increase as variables are added to the model; if unnecessary terms are added, it will often decrease.
When $R^2$ and adjusted $R^2$ differ dramatically, there is a good chance that non-significant terms have been included in the model.
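A minimal NumPy sketch comparing the two statistics (the helper `r2_and_adjusted`, the seed, the simulated coefficients and the pure-noise regressor are all hypothetical). It fits a model with one real regressor and the same model plus an irrelevant noise column:

```python
import numpy as np


def r2_and_adjusted(X, y):
    """Return (R^2, adjusted R^2) for a least squares fit; X includes the ones column."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape                                   # p = k + 1 parameters
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta_hat
    sse = float(e @ e)
    tss = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - sse / tss
    r2_adj = 1.0 - (sse / (n - p)) / (tss / (n - 1))
    return r2, r2_adj


# Hypothetical experiment: y depends on x1 only; x_noise is an irrelevant regressor
rng = np.random.default_rng(1)
n = 30
x1 = rng.uniform(0.0, 10.0, n)
y = 3.0 + 2.0 * x1 + rng.normal(0.0, 1.0, n)
x_noise = rng.normal(size=n)
X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, x_noise])
r2_small, adj_small = r2_and_adjusted(X_small, y)
r2_big, adj_big = r2_and_adjusted(X_big, y)
```

Whatever the seed, `r2_big` is at least `r2_small` (up to rounding), whereas `adj_big` may come out smaller or larger than `adj_small` because of the $(n-1)/(n-p)$ penalty.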
