
Multiple Regression Analysis:

Estimation
Multiple Regression Model
y = β0 + β1x1 + β2x2 + ... + βkxk + u

- β0 is still the intercept
- β1 through βk are all called slope parameters
- u is still the error term (or disturbance term)
- Zero mean assumption: E(u) = 0
- Still minimize the sum of squared residuals
Multiple Regression Model:
Example
Demand Estimation:
- Dependent variable: tile sales Q (in 1,000s of cases)
- Right-hand-side variables: tile price per case (P), income per capita I (in $1,000s), and advertising expenditure A (in $1,000s)

Regression: Q = β0 + β1P + β2I + β3A + u

Interpretation:
- β1 measures the effect of the tile price on tile consumption, holding all other factors fixed
- β2 represents the effect of income, holding all other factors fixed
- β3 represents the effect of advertising, holding all other factors fixed
Q = 17.513 − 0.296P + 0.066I + 0.036A

1. What is the impact of a price change on tile sales?
2. What is the impact of a change in income on tile sales?
3. What is the impact of a change in advertising expenditures on tile sales?

Calculation of own-price elasticity?


Calculation of income elasticity?
Calculation of advertising elasticity?
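A point elasticity is the coefficient times the ratio of the variable to the quantity, e.g. own-price elasticity = β1 · (P/Q). A minimal sketch using the estimated equation above; the values chosen for P, I, and A are hypothetical placeholders, since the actual sample means are not given here:

```python
# Point elasticities from the estimated demand equation
# Q = 17.513 - 0.296*P + 0.066*I + 0.036*A.
# The elasticity of Q with respect to x is (dQ/dx) * (x / Q).
b0, bP, bI, bA = 17.513, -0.296, 0.066, 0.036

# Hypothetical evaluation point (e.g. sample means); not from the slides.
P, I, A = 6.0, 20.0, 4.0

Q = b0 + bP * P + bI * I + bA * A
print(f"Q = {Q:.3f}")
print(f"own-price elasticity:   {bP * P / Q:.4f}")
print(f"income elasticity:      {bI * I / Q:.4f}")
print(f"advertising elasticity: {bA * A / Q:.4f}")
```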
Random Sampling
-Collecting sales data of 23 tile stores in 2002 in the
market
-For each observation i: Qi = β0 + β1Pi + β2Ii + β3Ai + ui

-Goal: Estimate β0, β1, β2, β3

Dependent Variable   Price   Income   Advertising
Q1                   P1      I1       A1
Q2                   P2      I2       A2
Q3                   P3      I3       A3
...
Q23                  P23     I23      A23

Using OLS to estimate the coefficients to minimize the sum of squared errors:

min over β0, β1, β2, β3 of Σ_{i=1}^{n} ui² = min over β0, β1, β2, β3 of Σ_{i=1}^{n} (Qi − β0 − β1Pi − β2Ii − β3Ai)²
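The OLS minimization can be sketched in Python. The real 2002 sales data for the 23 stores is not reproduced on these slides, so the sample below is simulated for illustration only:

```python
import numpy as np

# OLS for Q_i = b0 + b1*P_i + b2*I_i + b3*A_i + u_i over n = 23 stores,
# using simulated data (the actual 2002 sample is not available here).
rng = np.random.default_rng(0)
n = 23
P = rng.uniform(4, 8, n)       # price per case
I = rng.uniform(15, 30, n)     # income per capita
A = rng.uniform(1, 5, n)       # advertising expenditure
Q = 17.5 - 0.3 * P + 0.07 * I + 0.04 * A + rng.normal(0, 0.5, n)

X = np.column_stack([np.ones(n), P, I, A])         # n x (k+1) design matrix
beta_hat, *_ = np.linalg.lstsq(X, Q, rcond=None)   # minimizes the sum of squared residuals
print(beta_hat)   # estimates of b0, b1, b2, b3
```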
The Generic Multiple Regression Model

Yi = β0 + β1X1i + β2X2i + ... + βkXki + εi,   i = 1, ..., n

In matrix form, Y = Xβ + ε, where

Y = [Y1, Y2, ..., Yn]ᵀ   (n×1)

X =
[1  X11  X12  ...  X1k]
[1  X21  X22  ...  X2k]
[          ...        ]
[1  Xn1  Xn2  ...  Xnk]   (n×(k+1))

β = [β0, β1, ..., βk]ᵀ   ((k+1)×1),   ε = [ε1, ..., εn]ᵀ   (n×1)
Estimation of regression parameters:
-Least Squares (no knowledge of the distribution of the error or disturbance terms is required).
-The matrix notation also shows how the data are housed in software programs.
Components of the Model

-Endogenous Variables: dependent variables, values of which are determined within the system.
-Exogenous Variables: determined outside the system but influence the system by affecting the values of the endogenous variables.
-Structural Parameters: estimated using statistical techniques and relevant data.
-Lagged Endogenous Variables
-Lagged Exogenous Variables
-Predetermined Variables
The Disturbance (or Error) Term
Stochastic, a random variable.
Statistical distribution often normal.

Captures:
1. Omission of the influence of other
variables.
2. Measurement error.

3. Recognition that any regression model is a parsimonious, stochastic representation of reality: stochastic, not deterministic.
OLS Estimates Associated with the
Multiple Regression Model
β̂ = (xᵀx)⁻¹ xᵀy

x =
[1  x11  x21  ...  xk1]
[1  x12  x22  ...  xk2]
[          ...        ]
[1  x1n  x2n  ...  xkn]   (n×(k+1))

y = [y1, y2, ..., yn]ᵀ   (n×1)

β̂ = [β̂0, β̂1, ..., β̂k]ᵀ   ((k+1)×1)

xᵀ is the transpose of x ((k+1)×n).
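A minimal sketch of the closed-form estimator (xᵀx)⁻¹xᵀy with numpy, on simulated data:

```python
import numpy as np

# Closed-form OLS: beta_hat = (x'x)^{-1} x'y, with simulated data.
rng = np.random.default_rng(1)
n, k = 50, 2
x = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # n x (k+1)
y = x @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.1, size=n)

beta_hat = np.linalg.inv(x.T @ x) @ x.T @ y   # (k+1)-vector of estimates
print(beta_hat)
```

In practice `np.linalg.solve(x.T @ x, x.T @ y)` or `np.linalg.lstsq` is preferred to forming the inverse explicitly, for numerical stability; the explicit inverse is shown only to mirror the formula.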
The Gauss-Markov Theorem
Given the assumptions below, it can be shown that the OLS
estimator is BLUE.
- Best
- Linear
- Unbiased
- Estimator

Assumptions:
- Linear in parameters
- Corr(εi, εj) = 0 for i ≠ j
- Zero mean: E(εi) = 0
- No perfect collinearity among the regressors
- Homoscedasticity: Var(εi) = σ²
Communication and
Aims for the Analyst
Communication
- A technician can run a program and get output.
- An analyst must interpret the findings from examination of this output.
- No bonus points are given for being a terrific hacker but a poor analyst.

Aims
1. Improve your ability in developing models to conduct structural analysis and to
forecast with some accuracy.
2. Enhance your ability in interpreting and communicating the results, so as to
improve your decision-making.

Bottom Line
1. The analyst transforms the economic model/idea to a mathematical/statistical
one.
2. The technician estimates the model and obtains a mathematical/statistical
answer.
3. The analyst transforms the mathematical/statistical answer to an economic one.
Goodness-of-Fit

yi = ŷi + ûi

Definitions:

- Σ(yi − ȳ)² is the total sum of squares (SST)
- Σ(ŷi − ȳ)² is the regression sum of squares (SSR)
- Σûi² is the residual (or error) sum of squares (SSE)

Then SST = SSR + SSE


Goodness-of-Fit (continued . . .)
How well does our sample regression line fit our sample
data?

The R-squared of the regression is the fraction of the total sum of squares (SST) that is explained by the model:

R² = SSR/SST = 1 − SSE/SST
More about R-Squared

R² can never decrease when another explanatory or predetermined variable is added to a regression; usually R² will increase.

Because R² will usually increase (or at least not decrease) with the number of right-hand-side or explanatory variables, it is not necessarily a good way to compare alternative models with the same dependent variable.
R² and Adjusted R²

R² = Explained sample variability / Total sample variability = Σ_{i=1}^{n} (ŷi − ȳ)² / Σ_{i=1}^{n} (yi − ȳ)² = SSR/SST = 1 − SSE/SST

Adjusted R² = 1 − [SSE/(n − k − 1)] / [SST/(n − 1)]

Questions:
(a) Why do we care about the adjusted R²?
(b) Is the adjusted R² always better than R²?
(c) What is the relationship between R² and the adjusted R²?
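Both statistics follow directly from SST, SSR, and SSE; a sketch with simulated data:

```python
import numpy as np

# R-squared and adjusted R-squared from SST, SSR, and SSE,
# using a simulated regression with an intercept and k regressors.
rng = np.random.default_rng(2)
n, k = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, -0.3, 0.0]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

SST = np.sum((y - y.mean()) ** 2)       # total sum of squares
SSR = np.sum((y_hat - y.mean()) ** 2)   # regression sum of squares
SSE = np.sum((y - y_hat) ** 2)          # residual sum of squares

r2 = 1 - SSE / SST                                  # equals SSR/SST when an intercept is included
adj_r2 = 1 - (SSE / (n - k - 1)) / (SST / (n - 1))  # adjusted R-squared
print(r2, adj_r2)
```

Note that the adjusted R² is never larger than R², which is one answer to question (c).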
Model Selection Criteria
MSE = (Σ_{t=1}^{T} e_t²) / T

s² = (Σ_{t=1}^{T} e_t²) / (T − p) = [T/(T − p)] · MSE

T/(T − p) is the "penalty factor"

Akaike Information Criterion (AIC): AIC = e^(2p/T) · MSE

Schwarz Information Criterion (SIC), also called the Bayesian Information Criterion (BIC): SIC (BIC) = T^(p/T) · MSE

p is the number of parameters to be estimated
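These criteria are straightforward to compute from a residual vector; a sketch (the residual values below are made up for illustration):

```python
import numpy as np

# MSE, s^2, AIC, and SIC from a residual vector e with T observations
# and p estimated parameters.
def selection_criteria(e, p):
    T = len(e)
    mse = np.sum(e ** 2) / T
    s2 = T / (T - p) * mse         # MSE times the penalty factor T/(T - p)
    aic = np.exp(2 * p / T) * mse
    sic = T ** (p / T) * mse
    return mse, s2, aic, sic

e = np.array([1.2, -0.8, 0.5, -1.1, 0.9, -0.4])   # illustrative residuals
mse, s2, aic, sic = selection_criteria(e, p=2)
print(mse, s2, aic, sic)
```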
Model Selection Criteria Example

Model 1 Model 2 Model 3

AIC 19.35 15.83 17.15

SIC 19.37 15.86 17.17

Which model to choose?


Estimate of Error Variance

s² = σ̂² = (Σûi²) / (n − k − 1) = SSE/df

-df = n − (k + 1), or df = n − k − 1
-df (i.e. degrees of freedom) is the (number of observations) − (number of estimated parameters)
Variance of OLS Parameter
Estimates

Var(β̂) = s²(xᵀx)⁻¹

Variance-covariance matrix of OLS parameter estimates: this matrix is a function of the residual variance s².
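A sketch of computing s²(xᵀx)⁻¹ and the resulting standard errors with numpy, on simulated data:

```python
import numpy as np

# Variance-covariance matrix of the OLS estimates: s^2 (x'x)^{-1}.
# Standard errors are the square roots of its diagonal elements.
rng = np.random.default_rng(3)
n, k = 60, 2
x = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = x @ np.array([2.0, 1.0, -1.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(x.T @ x, x.T @ y)
s2 = np.sum((y - x @ beta_hat) ** 2) / (n - k - 1)   # SSE / df

vcov = s2 * np.linalg.inv(x.T @ x)   # (k+1) x (k+1) variance-covariance matrix
se = np.sqrt(np.diag(vcov))          # standard errors of the estimates
print(se)
```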
Example: SAS Output of the
Demand Function for Shrimp
[SAS regression output, not reproduced. Variables: quantity sold of shrimp (dependent), price of shrimp, price of finfish, price of other shellfish, advertising for shrimp, advertising for finfish, advertising for other shellfish.]
Model Selection Criteria for the
QSHRIMP Problem

With SSE = 1580.90, T = 97, and p = 7:

MSE = SSE/T = 1580.90/97 = 16.29

s² = [T/(T − p)] · MSE = (97/90)(16.29) = 17.56

AIC = e^(2p/T) · MSE = e^(14/97)(16.29) = 18.82

SIC = T^(p/T) · MSE = 97^(7/97)(16.29) = 22.67
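These figures can be reproduced directly from SSE = 1580.90, T = 97, and p = 7:

```python
import math

# Model selection criteria for the QSHRIMP problem:
# SSE = 1580.90, T = 97 observations, p = 7 estimated parameters.
SSE, T, p = 1580.90, 97, 7

mse = SSE / T
s2 = T / (T - p) * mse
aic = math.exp(2 * p / T) * mse
sic = T ** (p / T) * mse
print(mse, s2, aic, sic)   # matches the slide values up to rounding
```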
