Multiple Regression Analysis (3 + 3 + 3)
Introduction
• A multiple regression model allows us to evaluate the impact of multiple
independent variables on a dependent variable
• Each slope coefficient measures how much the dependent variable Y changes
when the independent variable Xj changes by one unit, holding the other
independent variables constant
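As a minimal sketch of the slope interpretation above, the following fits a regression with two independent variables on made-up data and checks that raising X1 by one unit, with X2 held fixed, moves the prediction by exactly the X1 slope. The helper names (solve, ols, predict) and the data are assumptions for illustration, not part of the notes.

```python
# Illustrative only: a tiny OLS fit via the normal equations (X'X) b = X'y.

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ols(rows, y):
    """Return [b0, b1, ..., bk] for Y = b0 + b1*X1 + ... + bk*Xk."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Made-up data generated exactly from Y = 1 + 2*X1 - 3*X2 (no error term).
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 4)]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in rows]
b0, b1, b2 = ols(rows, y)

def predict(x1, x2):
    return b0 + b1 * x1 + b2 * x2

# Raising X1 by one unit with X2 held fixed changes the prediction by b1.
effect_x1 = predict(4, 3) - predict(3, 3)
```

Because the data contain no error term, the fitted slopes recover the generating coefficients up to floating-point error; with real data the same interpretation holds for the estimated slopes.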
Limitations
• Same as Simple Linear Regression
Assumptions
Violation of assumptions – Heteroskedasticity
Description
• Variance of the error term is non-constant.
• Unconditional: not related to the independent variables; no major problems.
• Conditional: related to the independent variables; a major problem.
Effects
• For the t-test, the standard errors of the regression coefficients are
underestimated, so the t-statistics are overstated and coefficients might
appear significant when they are not (Type I error)
• For the F-test, the MSE becomes a biased estimator of the true population
variance
• Does not affect the consistency of the estimators of the regression parameters
or the estimates of the regression coefficients themselves
Detection
• Scatter diagrams: plot the residuals against each independent variable and
against time
• Breusch-Pagan (BP) test: regress the squared residuals on the independent variables
o Null hypothesis vs alternate hypothesis
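The BP procedure above can be sketched as follows, assuming the commonly used form of the statistic: n times the R-squared of the auxiliary regression, compared with a chi-square critical value with k degrees of freedom (k = number of independent variables). The data are made up so that the error spread grows with x, and the helper names are hypothetical.

```python
# Illustrative sketch of a Breusch-Pagan check with one independent variable.

def simple_ols(x, y):
    """Intercept and slope of y = a + b*x by least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def r_squared(x, y):
    """R-squared of the simple regression of y on x."""
    a, b = simple_ols(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Made-up data: the deviation from the line is proportional to x.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
sign = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]
y = [2 + 0.5 * xi + 0.3 * xi * s for xi, s in zip(x, sign)]

# Step 1: fit the original regression and square its residuals.
a, b = simple_ols(x, y)
resid_sq = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]

# Step 2: regress the squared residuals on the independent variable.
r2_aux = r_squared(x, resid_sq)

# Step 3: BP statistic = n * R^2 of the auxiliary regression (assumed form).
bp_stat = len(x) * r2_aux

# 5% chi-square critical value with 1 degree of freedom.
CHI2_CRIT_5PCT_1DF = 3.841
reject_null = bp_stat > CHI2_CRIT_5PCT_1DF  # True suggests conditional heteroskedasticity
```

With several independent variables the auxiliary regression would include all of them and the degrees of freedom would rise accordingly; the single-variable case is used here only to keep the sketch short.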
Violation of assumptions – Serial correlation
Detection
• Null hypothesis vs alternate hypothesis
o Ho: No serial correlation
o Ha: Serial correlation exists
• DW test statistics
o If regression residuals are positively serially correlated, DW-stat will be
less than 2 (0 when serial correlation = +1)
o If regression residuals are negatively serially correlated, DW-stat will be
greater than 2 (4 when serial correlation = -1)
o If regression residuals are not serially correlated, DW-stat = 2
o Critical DW-stat is not known with certainty, only upper & lower values
• Decision rule
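A minimal sketch of the DW statistic described above: the sum of squared changes in consecutive residuals divided by the sum of squared residuals. The decision function assumes the usual lower/upper-bound rule for testing positive serial correlation; the bounds passed to it below are hypothetical placeholders, not values from a DW table, and the residual series are made up.

```python
# Illustrative sketch of the Durbin-Watson statistic and a bounds-based decision.

def durbin_watson(residuals):
    """DW = sum of (e_t - e_{t-1})^2 divided by sum of e_t^2."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

def dw_decision(dw, d_lower, d_upper):
    """Assumed rule for H0: no positive serial correlation."""
    if dw < d_lower:
        return "reject H0: positive serial correlation"
    if dw > d_upper:
        return "fail to reject H0"
    return "inconclusive"

# Made-up residuals that drift slowly: positive serial correlation, DW well below 2.
trending = [1.0, 0.8, 0.9, 0.5, 0.2, -0.1, -0.4, -0.6, -0.9, -1.0]

# Made-up residuals that flip sign every period: negative serial correlation, DW above 2.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

dw_trending = durbin_watson(trending)
dw_alternating = durbin_watson(alternating)

# Hypothetical bounds for illustration only; real ones depend on n and k.
decision = dw_decision(dw_trending, d_lower=1.0, d_upper=1.5)
```

The inconclusive branch reflects the point above that the critical DW value is not known exactly, only its lower and upper bounds.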
Correction
• Adjust the coefficient standard errors, e.g. using the Hansen method (which
also corrects for conditional heteroskedasticity). After correction, the
regression coefficients and the DW-stat remain the same, but the robust
standard errors are larger
• Modify regression equation to eliminate the serial correlation
Violation of assumptions – Multicollinearity
Description
• Two or more independent variables are mutually correlated, making the
interpretation of the regression output problematic.
Effects
• Does not affect the consistency of the estimators of the regression parameters
but makes the estimates of the regression parameters inaccurate and unreliable
• Overestimated SEE and coefficient standard errors, hence underestimated
t-stats, so the null is rejected less frequently, leading to Type II errors
Detection
• When there are only two independent variables, one indicator is the high
correlation coefficient between them (rule of thumb: > 0.7)
• When dealing with more than two independent variables, low pair correlations
could still lead to multicollinearity
• Conflicting t- and F-tests: a significant F-statistic combined with
insignificant individual t-statistics, although exceptions exist
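The pairwise rule of thumb above can be sketched as a simple screen that flags any pair of independent variables whose absolute correlation exceeds 0.7. The function names and data are assumptions for illustration. As noted above, with more than two independent variables a clean pairwise screen does not rule out multicollinearity.

```python
# Illustrative pairwise-correlation screen for independent variables.
from itertools import combinations
from math import sqrt

def correlation(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def flag_high_pairs(variables, threshold=0.7):
    """Return (name_a, name_b, r) for each pair with |r| above the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(variables.items(), 2):
        r = correlation(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged

# Made-up data: X2 is close to 2 * X1, while X3 is unrelated to both.
data = {
    "X1": [1, 2, 3, 4, 5, 6],
    "X2": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],
    "X3": [5, 1, 4, 2, 6, 3],
}
flagged = flag_high_pairs(data)  # only the X1/X2 pair is flagged
```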
Correction
• Exclude one or more of the independent variables from the regression model
• Advanced: use stepwise regression to remove variables from the regression model
Model Specification Issues
Principles
• The model should be based on economic reasoning. This reduces the risk of
finding relationships by simply mining the data.
• The functional form for the variables should be appropriate given the nature of
the variables. Transforming the data may be necessary.
• The model should be parsimonious, which means accomplishing a lot with a
little. Each variable in the regression should be important.
• Examine the model for violations of regression assumptions before accepting
the results.
• Test the model with out-of-sample data, that is, data outside the
dataset that was used to create the model.
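The out-of-sample principle above can be sketched as fitting on one part of the data and measuring prediction error on the held-out remainder. The split point, the data, and the helper names are assumptions for illustration only.

```python
# Illustrative out-of-sample check for a one-variable regression.
from math import sqrt

def fit_line(x, y):
    """Intercept and slope of y = a + b*x by least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope

def rmse(x, y, intercept, slope):
    """Root mean squared prediction error of the fitted line on (x, y)."""
    sq = [(yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)]
    return sqrt(sum(sq) / len(sq))

# Made-up data: a line with small deviations.
x = list(range(1, 13))
noise = [0.2, -0.1, 0.3, -0.2, 0.1, -0.3, 0.2, -0.1, 0.1, -0.2, 0.3, -0.1]
y = [2.0 + 0.5 * xi + d for xi, d in zip(x, noise)]

# Fit on the first 8 observations, hold out the last 4.
x_in, y_in = x[:8], y[:8]
x_out, y_out = x[8:], y[8:]

intercept, slope = fit_line(x_in, y_in)
rmse_in = rmse(x_in, y_in, intercept, slope)      # in-sample fit
rmse_out = rmse(x_out, y_out, intercept, slope)   # out-of-sample performance
```

An out-of-sample error far above the in-sample error would suggest the model fits the estimation data better than it generalizes.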