
Multiple Regression Analysis:

Estimation
Multiple Regression Model
y = β0 + β1x1 + β2x2 + ... + βkxk + u

- β0 is still the intercept
- β1 through βk are all called slope parameters
- u is still the error term (or disturbance term)
- Zero mean assumption: E(u) = 0
- Still minimize the sum of squared residuals
Multiple Regression Model:
Example
Demand Estimation:
- Dependent variable: tile sales Q (in 1,000s of cases)
- Right-hand-side variables: tile price per case (P), income per capita I (in $1,000s), and advertising expenditure A (in $1,000s)

Regression: Q = β0 + β1P + β2I + β3A + u

Interpretation:
- β1 measures the effect of the tile price on tile consumption, holding all other factors fixed
- β2 represents the effect of income, holding all other factors fixed
- β3 represents the effect of advertising, holding all other factors fixed
Q = 17.513 − 0.296P + 0.066I + 0.036A

1. What is the impact of a price change on tile sales?
2. What is the impact of a change in income on tile sales?
3. What is the impact of a change in advertising expenditures on tile sales?

Calculation of own-price elasticity?


Calculation of income elasticity?
Calculation of advertising elasticity?
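A point elasticity is the coefficient times the ratio of the variable to the quantity, e.g. own-price elasticity = β1 · (P/Q). A minimal sketch using the estimated equation above; the values chosen for P, I, and A are hypothetical placeholders, since the actual sample means are not given here:

```python
# Point elasticities from the estimated demand equation
# Q = 17.513 - 0.296*P + 0.066*I + 0.036*A.
# The elasticity of Q with respect to x is (dQ/dx) * (x / Q).
b0, bP, bI, bA = 17.513, -0.296, 0.066, 0.036

# Hypothetical evaluation point (e.g. sample means); not from the slides.
P, I, A = 6.0, 20.0, 4.0

Q = b0 + bP * P + bI * I + bA * A
print(f"Q = {Q:.3f}")
print(f"own-price elasticity:   {bP * P / Q:.4f}")
print(f"income elasticity:      {bI * I / Q:.4f}")
print(f"advertising elasticity: {bA * A / Q:.4f}")
```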
Random Sampling
-Collecting sales data of 23 tile stores in 2002 in the
market
-For each observation i: Qi = β0 + β1Pi + β2Ii + β3Ai + ui

-Goal: Estimate β0, β1, β2, β3

Dependent Variable   Price   Income   Advertising
Q1                   P1      I1       A1
Q2                   P2      I2       A2
Q3                   P3      I3       A3
...
Q23                  P23     I23      A23

Using OLS to estimate the coefficients to minimize the sum of squared errors:

min over β0, β1, β2, β3 of Σ_{i=1}^{n} ui² = min over β0, β1, β2, β3 of Σ_{i=1}^{n} (Qi − β0 − β1Pi − β2Ii − β3Ai)²
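The OLS minimization can be sketched in Python. The real 2002 sales data for the 23 stores is not reproduced on these slides, so the sample below is simulated for illustration only:

```python
import numpy as np

# OLS for Q_i = b0 + b1*P_i + b2*I_i + b3*A_i + u_i over n = 23 stores,
# using simulated data (the actual 2002 sample is not available here).
rng = np.random.default_rng(0)
n = 23
P = rng.uniform(4, 8, n)       # price per case
I = rng.uniform(15, 30, n)     # income per capita
A = rng.uniform(1, 5, n)       # advertising expenditure
Q = 17.5 - 0.3 * P + 0.07 * I + 0.04 * A + rng.normal(0, 0.5, n)

X = np.column_stack([np.ones(n), P, I, A])         # n x (k+1) design matrix
beta_hat, *_ = np.linalg.lstsq(X, Q, rcond=None)   # minimizes the sum of squared residuals
print(beta_hat)   # estimates of b0, b1, b2, b3
```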
The Generic Multiple Regression Model

Yi = β0 + β1X1i + β2X2i + ... + βkXki + εi,   i = 1, ..., n

In matrix form, Y = Xβ + ε, where

Y = [Y1, Y2, ..., Yn]ᵀ   (n×1)

X =
[1  X11  X12  ...  X1k]
[1  X21  X22  ...  X2k]
[          ...        ]
[1  Xn1  Xn2  ...  Xnk]   (n×(k+1))

β = [β0, β1, ..., βk]ᵀ   ((k+1)×1),   ε = [ε1, ..., εn]ᵀ   (n×1)
Estimation of regression parameters:
-Least Squares (no knowledge of the distribution of the error or disturbance terms is required).
-The matrix notation also shows how the data are housed in software programs.
Components of the Model

-Endogenous Variables: dependent variables, values of which are determined within the system.
-Exogenous Variables: determined outside the system but influence the system by affecting the values of the endogenous variables.
-Structural Parameters: estimated using statistical techniques and relevant data.
-Lagged Endogenous Variables
-Lagged Exogenous Variables
-Predetermined Variables
The Disturbance (or Error) Term
Stochastic, a random variable.
Statistical distribution often normal.

Captures:
1. Omission of the influence of other
variables.
2. Measurement error.

3. Recognition that any regression model is a parsimonious, stochastic representation of reality: stochastic, not deterministic.
OLS Estimates Associated with the
Multiple Regression Model
β̂ = (xᵀx)⁻¹ xᵀy

x =
[1  x11  x21  ...  xk1]
[1  x12  x22  ...  xk2]
[          ...        ]
[1  x1n  x2n  ...  xkn]   (n×(k+1))

y = [y1, y2, ..., yn]ᵀ   (n×1)

β̂ = [β̂0, β̂1, ..., β̂k]ᵀ   ((k+1)×1)

xᵀ is the transpose of x ((k+1)×n).
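A minimal sketch of the closed-form estimator (xᵀx)⁻¹xᵀy with numpy, on simulated data:

```python
import numpy as np

# Closed-form OLS: beta_hat = (x'x)^{-1} x'y, with simulated data.
rng = np.random.default_rng(1)
n, k = 50, 2
x = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # n x (k+1)
y = x @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.1, size=n)

beta_hat = np.linalg.inv(x.T @ x) @ x.T @ y   # (k+1)-vector of estimates
print(beta_hat)
```

In practice `np.linalg.solve(x.T @ x, x.T @ y)` or `np.linalg.lstsq` is preferred to forming the inverse explicitly, for numerical stability; the explicit inverse is shown only to mirror the formula.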
The Gauss-Markov Theorem
Given the assumptions below, it can be shown that the OLS
estimator is BLUE.
- Best
- Linear
- Unbiased
- Estimator

Assumptions:
- Linear in parameters
- Corr(εi, εj) = 0 for i ≠ j
- Zero mean: E(εi) = 0
- No perfect collinearity among the regressors
- Homoscedasticity: Var(εi) = σ²
Communication and
Aims for the Analyst
Communication
- A technician can run a program and get output.
- An analyst must interpret the findings from examination of this output.
- No bonus points are given for being a terrific hacker but a poor analyst.

Aims
1. Improve your ability in developing models to conduct structural analysis and to
forecast with some accuracy.
2. Enhance your ability in interpreting and communicating the results, so as to
improve your decision-making.

Bottom Line
1. The analyst transforms the economic model/idea to a mathematical/statistical
one.
2. The technician estimates the model and obtains a mathematical/statistical
answer.
3. The analyst transforms the mathematical/statistical answer to an economic one.
Goodness-of-Fit

yi = ŷi + ûi

Definitions:

- Σ(yi − ȳ)² is the total sum of squares (SST)
- Σ(ŷi − ȳ)² is the regression sum of squares (SSR)
- Σûi² is the residual (or error) sum of squares (SSE)

Then SST = SSR + SSE


Goodness-of-Fit (continued . . .)
How well does our sample regression line fit our sample
data?

The R-squared of the regression is the fraction of the total sum of squares (SST) that is explained by the model:

R² = SSR/SST = 1 − SSE/SST
More about R-Squared

R² can never decrease when another explanatory or predetermined variable is added to a regression; usually R² will increase.

Because R² will usually increase (or at least not decrease) with the number of right-hand-side or explanatory variables, it is not necessarily a good way to compare alternative models with the same dependent variable.
R² and Adjusted R²

R² = Explained sample variability / Total sample variability = Σ_{i=1}^{n} (ŷi − ȳ)² / Σ_{i=1}^{n} (yi − ȳ)² = SSR/SST = 1 − SSE/SST

Adjusted R² = 1 − [SSE/(n − k − 1)] / [SST/(n − 1)]

Questions:
(a) Why do we care about the adjusted R²?
(b) Is the adjusted R² always better than R²?
(c) What is the relationship between R² and the adjusted R²?
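Both statistics follow directly from SST, SSR, and SSE; a sketch with simulated data:

```python
import numpy as np

# R-squared and adjusted R-squared from SST, SSR, and SSE,
# using a simulated regression with an intercept and k regressors.
rng = np.random.default_rng(2)
n, k = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, -0.3, 0.0]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

SST = np.sum((y - y.mean()) ** 2)       # total sum of squares
SSR = np.sum((y_hat - y.mean()) ** 2)   # regression sum of squares
SSE = np.sum((y - y_hat) ** 2)          # residual sum of squares

r2 = 1 - SSE / SST                                  # equals SSR/SST when an intercept is included
adj_r2 = 1 - (SSE / (n - k - 1)) / (SST / (n - 1))  # adjusted R-squared
print(r2, adj_r2)
```

Note that the adjusted R² is never larger than R², which is one answer to question (c).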
Model Selection Criteria
MSE = (Σ_{t=1}^{T} e_t²) / T

s² = (Σ_{t=1}^{T} e_t²) / (T − p) = [T/(T − p)] · MSE

T/(T − p) is the "penalty factor"

Akaike Information Criterion (AIC): AIC = e^(2p/T) · MSE

Schwarz Information Criterion (SIC), also called the Bayesian Information Criterion (BIC): SIC (BIC) = T^(p/T) · MSE

p is the number of parameters to be estimated
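These criteria are straightforward to compute from a residual vector; a sketch (the residual values below are made up for illustration):

```python
import numpy as np

# MSE, s^2, AIC, and SIC from a residual vector e with T observations
# and p estimated parameters.
def selection_criteria(e, p):
    T = len(e)
    mse = np.sum(e ** 2) / T
    s2 = T / (T - p) * mse         # MSE times the penalty factor T/(T - p)
    aic = np.exp(2 * p / T) * mse
    sic = T ** (p / T) * mse
    return mse, s2, aic, sic

e = np.array([1.2, -0.8, 0.5, -1.1, 0.9, -0.4])   # illustrative residuals
mse, s2, aic, sic = selection_criteria(e, p=2)
print(mse, s2, aic, sic)
```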
Model Selection Criteria Example

Model 1 Model 2 Model 3

AIC 19.35 15.83 17.15

SIC 19.37 15.86 17.17

Which model to choose?


Estimate of Error Variance

s² = σ̂² = (Σûi²) / (n − k − 1) = SSE/df

-df = n − (k + 1), or df = n − k − 1
-df (i.e. degrees of freedom) is the (number of observations) − (number of estimated parameters)
Variance of OLS Parameter
Estimates

Var(β̂) = s²(xᵀx)⁻¹

Variance-covariance matrix of OLS parameter estimates: this matrix is a function of the residual variance s².
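A sketch of computing s²(xᵀx)⁻¹ and the resulting standard errors with numpy, on simulated data:

```python
import numpy as np

# Variance-covariance matrix of the OLS estimates: s^2 (x'x)^{-1}.
# Standard errors are the square roots of its diagonal elements.
rng = np.random.default_rng(3)
n, k = 60, 2
x = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = x @ np.array([2.0, 1.0, -1.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(x.T @ x, x.T @ y)
s2 = np.sum((y - x @ beta_hat) ** 2) / (n - k - 1)   # SSE / df

vcov = s2 * np.linalg.inv(x.T @ x)   # (k+1) x (k+1) variance-covariance matrix
se = np.sqrt(np.diag(vcov))          # standard errors of the estimates
print(se)
```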
Example: SAS Output of the
Demand Function for Shrimp
[SAS regression output, not reproduced. Variables: quantity sold of shrimp (dependent), price of shrimp, price of finfish, price of other shellfish, advertising for shrimp, advertising for finfish, advertising for other shellfish.]
Model Selection Criteria for the
QSHRIMP Problem

With SSE = 1580.90, T = 97, and p = 7:

MSE = SSE/T = 1580.90/97 = 16.29

s² = [T/(T − p)] · MSE = (97/90)(16.29) = 17.56

AIC = e^(2p/T) · MSE = e^(14/97)(16.29) = 18.82

SIC = T^(p/T) · MSE = 97^(7/97)(16.29) = 22.67
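These figures can be reproduced directly from SSE = 1580.90, T = 97, and p = 7:

```python
import math

# Model selection criteria for the QSHRIMP problem:
# SSE = 1580.90, T = 97 observations, p = 7 estimated parameters.
SSE, T, p = 1580.90, 97, 7

mse = SSE / T
s2 = T / (T - p) * mse
aic = math.exp(2 * p / T) * mse
sic = T ** (p / T) * mse
print(mse, s2, aic, sic)   # matches the slide values up to rounding
```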
