Bivariate Regression
Analysis and Forecasting
Section three
Paul Bottomley
Bottomleypa@cardiff.ac.uk
Silver, Ch.7, pp.116-121.
Regression Analysis
Regression is an obvious extension to correlation. It is
useful for building statistical models and making forecasts.
Dependent and Independent Variables
To estimate a regression, we must first decide which variable is independent and which is dependent:
X (independent) causes Y (dependent).
Interpreting Regression Lines
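[Figure, two panels. Left, "Salary Review": Salary (£) against Years of Service, with separate lines for Women and Men. Right, "Annual Total Production Costs": Costs (£) against Quantity / Output. In each panel a is the intercept and b is the rise in Y for a 1-unit increase in X.]
Salary review: a = starting salary; b = annual increment.
Production costs: a = fixed costs; b = marginal costs.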
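Both panels read the same equation, Y = a + bX. A minimal Python sketch of the cost panel, using made-up figures (a fixed cost of £1,000 and a marginal cost of £5 per unit, both hypothetical), shows what the intercept and slope mean in practice:

```python
def total_cost(quantity: float, fixed_cost: float = 1000.0, marginal_cost: float = 5.0) -> float:
    """Y = a + bX with a = fixed costs and b = marginal cost per unit (hypothetical values)."""
    return fixed_cost + marginal_cost * quantity

print(total_cost(0))                        # zero output still incurs the fixed cost a
print(total_cost(101) - total_cost(100))    # one extra unit of output adds the slope b
```

The same reading applies to the salary panel: a is the salary at zero years of service and b the increase per extra year.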
What is the Best Straight Line?
We could draw a line on the scatter plot by hand, but each person would draw a different best-fitting line. A more precise method is Ordinary Least Squares (OLS).
The vertical distances between the data points and the line are called errors (each error is the actual value of Y minus the value predicted by the line). With n data points we have n errors, denoted e1, e2, e3, ..., en.
[Figure: scatter plot of Sales (Y) against Price (X) with a fitted line; the vertical distances ei and ek mark two of the errors.]
A good-fitting line will have small errors.
For ei the line under-predicts sales for this model of TV (the point lies above the line).
For ek the line over-predicts sales for this model of TV (the point lies below the line).
Because errors can be positive or negative, we square each error so that they don't cancel each other out.
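A small sketch of why squaring matters, assuming the usual convention that an error is actual minus predicted (so a positive error means the line under-predicts). The line Ŷ = 10 − 1·X and the three points are hypothetical:

```python
# Hypothetical line Y^ = a + bX with a = 10, b = -1, and three made-up (X, Y) points
a, b = 10, -1
points = [(2, 9), (4, 5), (6, 4)]

# Error = actual Y minus predicted Y^: positive -> under-prediction, negative -> over-prediction
errors = [y - (a + b * x) for x, y in points]
print(errors)                         # [1, -1, 0]
print(sum(errors))                    # 0: the positive and negative errors cancel out
print(sum(e ** 2 for e in errors))    # 2: squared errors cannot cancel
```

A raw sum of zero would wrongly suggest a perfect fit; the sum of squares does not hide the two misses.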
Ordinary Least Squares (OLS) fits the line that minimises:
$$\sum_{i=1}^{n} e_i^2 = e_1^2 + e_2^2 + e_3^2 + \dots + e_n^2$$
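The criterion gives an objective way to choose between hand-drawn lines. A minimal sketch with four made-up data points and two candidate lines (all values hypothetical): the line with the smaller sum of squared errors fits better.

```python
def sse(a: float, b: float, xs: list, ys: list) -> float:
    """Sum of squared errors for the line Y^ = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Four made-up data points (hypothetical)
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

print(sse(0, 2, xs, ys))      # line A: Y^ = 0 + 2X   -> 1
print(sse(1, 1.5, xs, ys))    # line B: Y^ = 1 + 1.5X -> 1.5
```

Line A wins here, but OLS goes further: it finds the single line with the smallest possible sum.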
The Regression Coefficients
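Least squares estimates for the intercept (a) and slope (b):
$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} = \frac{\text{Covariance}_{XY}}{\text{Variance}_X}$$
$$a = \bar{Y} - b\bar{X}, \qquad \bar{Y} = \frac{\sum Y}{n}, \quad \bar{X} = \frac{\sum X}{n}$$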
These formulae give the best-fitting straight line. The calculations use the same table of sums as the correlation coefficient (r).
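The two formulae translate directly into code. A minimal sketch, using four made-up data points (hypothetical) and an illustrative function name `ols_from_sums` (not a library function); it also checks that nudging the OLS line in either direction raises the sum of squared errors:

```python
def ols_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Least squares slope b and intercept a computed from the raw sums."""
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n      # a = Y-bar minus b times X-bar
    return a, b

# Four made-up data points (hypothetical)
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

a, b = ols_from_sums(len(xs), sum(xs), sum(ys),
                     sum(x * y for x, y in zip(xs, ys)),
                     sum(x * x for x in xs))
print(a, b)    # intercept roughly 0, slope roughly 1.9

def sse(a, b):
    """Sum of squared errors of Y^ = a + bX on the data above."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Shifting the intercept or tilting the slope makes the fit worse
print(sse(a, b) < sse(a + 0.1, b), sse(a, b) < sse(a, b - 0.1))    # True True
```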
Demand for Nikkai Televisions
Sales (Y)    Price (X, £)
   213           132
   192           181
   168           200
   160           149
   119           191
    96           163
    79           220
    74           186
    68           260
[Figure: scatter plot of Sales against Price (£).]
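The table can be checked in a few lines of Python. This sketch recomputes, from the nine (sales, price) pairs above, the sums used to calculate r and the regression coefficients (n, ΣX, ΣY, ΣXY, ΣX², ΣY²):

```python
# Nikkai data from the table: unit sales (Y) and price (X)
sales = [213, 192, 168, 160, 119, 96, 79, 74, 68]
price = [132, 181, 200, 149, 191, 163, 220, 186, 260]

n = len(sales)
sum_x = sum(price)
sum_y = sum(sales)
sum_xy = sum(x * y for x, y in zip(price, sales))
sum_x2 = sum(x * x for x in price)
sum_y2 = sum(y * y for y in sales)

print(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2)
# 9 1682 1169 207509 326032 175775
```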
Calculating Regression Coefficients
$$r = \frac{(9 \times 207509) - (1682 \times 1169)}{\sqrt{\left[(9 \times 326032) - 1682^2\right]\left[(9 \times 175775) - 1169^2\right]}} = -0.656$$
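The same sums give the slope and intercept through the least squares formulae. Worked through, the arithmetic is:

$$b = \frac{(9 \times 207509) - (1682 \times 1169)}{(9 \times 326032) - 1682^2} = \frac{1867581 - 1966258}{2934288 - 2829124} = \frac{-98677}{105164} \approx -0.94$$
$$a = \bar{Y} - b\bar{X} = \frac{1169}{9} - b\,\frac{1682}{9} \approx 129.89 + 0.938 \times 186.89 \approx 305.25$$

With the unrounded slope the intercept is about 305.25; substituting the rounded slope −0.94 instead gives about 305.56, essentially the 305.55 quoted for the fitted Nikkai line.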
[Figure: Nikkai unit sales against price (£) with the fitted regression line; slope = −0.94.]
Note: the regression line dips below the horizontal X-axis (predicting negative sales) when price > £325.
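The £325 cut-off follows from setting predicted sales to zero. A quick check, taking the intercept 305.55 and slope −0.94 reported for the fitted Nikkai line:

```python
a, b = 305.55, -0.94    # fitted Nikkai line: Y^ = a + b * Price

# Price at which predicted sales reach zero: a + b * price = 0  ->  price = -a / b
zero_sales_price = -a / b
print(round(zero_sales_price, 2))    # 325.05

# Beyond that price the straight line predicts negative (impossible) sales
print(a + b * 350 < 0)               # True
```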
Interpreting the Regression Line
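Nikkai: Ŷ = 305.55 − 0.94 × Price
Each £1 rise in price is associated with a fall of about 0.94 units in predicted sales. The intercept (305.55) is the predicted sales at a price of zero, far outside the observed price range (£132 to £260), so it has no practical meaning here.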
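A minimal sketch of using the fitted equation to forecast; the prices are illustrative, and `predict_sales` is a hypothetical helper name. Forecasts are only sensible within the observed price range (£132 to £260):

```python
def predict_sales(price: float) -> float:
    """Predicted Nikkai unit sales from the fitted line Y^ = 305.55 - 0.94 * Price."""
    return 305.55 - 0.94 * price

for p in (150, 200, 250):
    print(p, round(predict_sales(p), 2))
# 150 164.55
# 200 117.55
# 250 70.55

# Slope interpretation: each extra £10 of price lowers predicted sales by about 9.4 units
print(round(predict_sales(200) - predict_sales(210), 2))    # 9.4
```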
Nikkai: Actual vs. Predicted Sales
[Figure: actual and predicted unit sales against price (£); two prices, P1 and P2, are marked on the price axis.]
R² = r² = (−0.656)² = 0.43
So 57% of the variance in sales is unexplained: what's missing from our model?
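For a bivariate regression, R² equals r² and also equals the share of the variation in Y explained by the line, 1 − SSE/SST. A sketch that checks both routes on the Nikkai data, using the deviation-from-mean form of the formulae (algebraically equivalent to the sums form) and unrounded coefficients:

```python
sales = [213, 192, 168, 160, 119, 96, 79, 74, 68]
price = [132, 181, 200, 149, 191, 163, 220, 186, 260]
n = len(sales)

mean_x = sum(price) / n
mean_y = sum(sales) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(price, sales))
sxx = sum((x - mean_x) ** 2 for x in price)
syy = sum((y - mean_y) ** 2 for y in sales)    # total variation in sales (SST)

b = sxy / sxx                  # least squares slope
a = mean_y - b * mean_x        # least squares intercept
r = sxy / (sxx * syy) ** 0.5   # correlation coefficient

# Route 1: square the correlation
r_squared = r ** 2
# Route 2: 1 - (unexplained variation / total variation)
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(price, sales))
r_squared_fit = 1 - sse / syy

print(round(r, 3), round(r_squared, 2), round(r_squared_fit, 2))    # -0.656 0.43 0.43
print(round(1 - r_squared, 2))                                      # 0.57 unexplained
```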
Some Thoughts and Reflections
Guidelines: with cross-sectional data an R² of 0.4 is OK; with time-series data R² should approach 0.8 or 0.9.
But don't simply look at the fit (R²) of the model and the impact of the variable(s) it includes...
Ask yourself: has the market researcher included the main drivers? What variables are missing?
Problems and Assumptions Underlying Regression Analysis
Issues affecting interpretation of regression coefficients:
* Omitted variables
* Use of dummy variables
* Non-linear relationships
Try asking an Econometrician!
[Figure: boxplot of unstandardized residuals.]
* Max. = 496.94
* Median = −78.07
* Q1 = −174.15
* Q3 = 196.34
* IQR = 370.49
* Lower fence (outlier cut-off) = Q1 − (1.5 × IQR)
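The quartile statistics can be turned into outlier cut-offs. A sketch of the 1.5 × IQR rule using the values reported for the residual boxplot; the upper fence, Q3 + 1.5 × IQR, is the standard counterpart and is added here for illustration:

```python
# Summary statistics reported for the unstandardized residuals
q1, q3, maximum = -174.15, 196.34, 496.94

iqr = q3 - q1                    # interquartile range
lower_fence = q1 - 1.5 * iqr     # residuals below this would be flagged as outliers
upper_fence = q3 + 1.5 * iqr     # residuals above this would be flagged as outliers

print(round(iqr, 2))                                 # 370.49
print(round(lower_fence, 2), round(upper_fence, 2))
print(maximum > upper_fence)                         # False: the largest residual is inside the fence
```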
Dr Saeed Heravi (An Econometrician)!
* MBA dissertations with a quantitative emphasis
* Time series forecasting
* Other multivariate models
* Cluster analysis for market segmentation
* Perceptual maps for brand positioning
* DEA (data envelopment analysis) for store performance (e.g. Ann Summers)