
Lecture 04

Regression & Correlation

Copyright 2017, A. Marshall

Bivariate Statistics
>So far, we have been dealing with
statistics of individual variables
>Now we will look at statistics that
describe the relationship between
pairs of variables


Interactions
Sometimes two variables appear
related:
>smoking and lung cancers
>height and weight
>years of education and income
>engine size and gas mileage
>GMAT scores and MBA GPA
>house size and price

Interactions
>Some of these variables would appear
to be positively related & others
negatively
>If these were related, we would
expect to be able to derive a linear
relationship:

y = a + bx
>where b is the slope and a is the intercept


Linear Relationships
>We will be deriving linear relationships
from bivariate (two-variable) data
>Our symbols for the true regression
relationship will be:

yᵢ = β₀ + β₁xᵢ + εᵢ   or   ŷᵢ = β₀ + β₁xᵢ
>where β₁ is the slope, β₀ is the intercept, and εᵢ is the error term

Notation
yᵢ : Individual observations of the dependent variable, typically not on the line

ŷᵢ : The predicted values of the dependent variable, always on the line

eᵢ = yᵢ − ŷᵢ : Individual error terms, the difference between the individual observations and the values predicted by the linear equation

Understanding the Two Forms


yᵢ = β₀ + β₁xᵢ + εᵢ   or   ŷᵢ = β₀ + β₁xᵢ
>The first relationship describes the line in
terms of the actual data points, none of which
will be on the line, hence the error term
>The second relationship describes the line
itself, based on the predicted values of y
given x, hence no error term
>When using the equation of the line
empirically, the error term is not actually used

Estimating a Line
>The symbols for the estimated
linear relationship are:

ŷ = b₀ + b₁x
>b₁ is our estimate of the slope, β₁
>b₀ is our estimate of the intercept, β₀


Example
>Consider the following example
comparing the returns of Consolidated
Moose Pasture stock (CMP) and the
TSX 300 Index
>The next slide shows 25 monthly
returns


Example Data


Example
>From the data, it appears that a
positive relationship may exist
Most of the time when the TSX is up, CMP
is up
Likewise, when the TSX is down, CMP is
down most of the time
Sometimes, they move in opposite
directions

>Let's graph this data



Graph Of Data


Example Summary Statistics


> The data do appear to be positively related
> Let's derive some summary statistics about
these data:


Observations
>Both have means of zero and standard
deviations just under 3
The data were deliberately created to have
means of zero for both variables to
demonstrate the deviations on the next
three slides

>However, each data point does not have
simply one deviation from the mean; it
deviates from both means
>Consider Points A, B, C and D on the next
graph

Graph of Data


Implications
>When points in the upper right and
lower left quadrants dominate, then
the sums of the products of the
deviations will be positive
>When points in the lower right and
upper left quadrants dominate, then
the sums of the products of the
deviations will be negative


An Important Observation
>The sums of the products of the
deviations will give us the appropriate
sign of the slope of our relationship


Covariance
Population:

COV(X, Y) = σ_XY = [ Σᵢ (xᵢ − μ_X)(yᵢ − μ_Y) ] / N

Sample:

cov(X, Y) = s_XY = [ Σᵢ (xᵢ − x̄)(yᵢ − ȳ) ] / (n − 1)
           = [ Σᵢ xᵢyᵢ − (Σxᵢ)(Σyᵢ)/n ] / (n − 1)

Note the notation!

Covariance
>Excel:
=COVARIANCE.P(range1,range2)
Population Covariance

=COVARIANCE.S(range1,range2)
Sample Covariance

>Excel 2007 and earlier:


=COVAR(range1,range2)
Returns POPULATION covariance
Need to multiply by n/(n-1) to convert to
sample covariance
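As a cross-check on these Excel functions, both covariance formulas can be sketched in plain Python (function names here are our own, not from any library):

```python
def cov_population(xs, ys):
    """Population covariance: mean product of deviations (divide by N),
    matching =COVARIANCE.P / =COVAR."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def cov_sample(xs, ys):
    """Sample covariance (divide by n - 1), matching =COVARIANCE.S;
    equivalently, the population value times n/(n - 1)."""
    n = len(xs)
    return cov_population(xs, ys) * n / (n - 1)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
print(cov_population(xs, ys))  # 2.5
print(cov_sample(xs, ys))      # 3.333...
```

Note the n/(n − 1) conversion is exactly the adjustment described above for the old =COVAR function.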

Using Covariance
>In the same units as Variance (if both
variables are in the same unit), i.e.
units squared
>Very useful in Finance for measuring
portfolio risk
>Unfortunately, it is hard to interpret
for two reasons:
What does the magnitude/size imply?
The units are confusing

A More Useful Statistic


>We can simultaneously adjust for both
of these shortcomings by dividing the
covariance by the two relevant
standard deviations
>This operation
Removes the impact of size & scale
Eliminates the units


Correlation
>Correlation measures the sensitivity of
one variable to another, but ignoring
magnitude
>Range: -1 to 1
+1: Implies perfect positive co-movement
-1: Implies perfect negative co-movement
0: No linear relationship


Calculating Correlation

Population: ρ_XY = σ_XY / (σ_X σ_Y)
Sample: r_XY = s_XY / (s_X s_Y)
Excel: =CORREL(range1,range2)
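A minimal Python sketch of the sample correlation (the n − 1 factors in s_XY, s_X, and s_Y cancel, so they are omitted; the function name is our own):

```python
def correlation(xs, ys):
    """Sample correlation r = s_xy / (s_x * s_y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Perfectly co-moving data hit the endpoints of the -1 to +1 range
print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))   # 1.0
print(correlation([1, 2, 3, 4], [8, 6, 4, 2]))   # -1.0
```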

Regression Analysis


Regression Analysis
>A statistical technique for determining
the best fit line through a series of
data


Error
>No line can hit all, or even most, of the
points. The amount we miss by is called
ERROR
>As we have learned, error does not
mean mistake! It simply means the
inevitable missing that will happen
when we generalize, or try to describe
things with models
>When we looked at the mean and
variance, we called the errors deviations

What Regression Does


>Regression finds the line that minimizes
the amount of error, or deviation from the
line
The mean is the statistic that has the
minimum total of squared deviations

>Likewise, the regression line is the unique


line that minimizes the total of the
squared errors.
>The Statistical term for what gets
minimized is Sum of Squared Errors or
SSE
We'll come back to this measure later
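A small sketch of this minimization property (helper names are our own): the least-squares line has a strictly lower SSE than any line obtained by nudging either coefficient.

```python
def fit(xs, ys):
    """Least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    return my - b1 * mx, b1

def sse(xs, ys, b0, b1):
    """Sum of squared errors of the line y = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit(xs, ys)
best = sse(xs, ys, b0, b1)

# Perturbing either coefficient can only increase the SSE
assert best < sse(xs, ys, b0 + 0.1, b1)
assert best < sse(xs, ys, b0 - 0.1, b1)
assert best < sse(xs, ys, b0, b1 + 0.1)
assert best < sse(xs, ys, b0, b1 - 0.1)
```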

Example
>Suppose we are examining the sale
prices of compact cars sold by rental
agencies and that we have the
following summary statistics:


Summary Statistics
> Our best estimate of the average price
would be $5,411
> Using the Empirical Rule, a 95%
Confidence Interval would be $5,411 ±
(2)(255), or $5,411 ± 510, or $4,901 to
$5,921


Something Missing?
>Clearly, looking at this data in such a
simplistic way ignores a key factor:
the mileage of the vehicle


Price vs. Mileage


Importance of the Factor


>After looking at the scatter graph, you
would be inclined to revise your
estimate depending on the mileage
25,000 km about $5,700 - $5,900
45,000 km about $5,100 - $5,300


Key Summary Statistics


Estimating the Regression Line

>From Graph

Right-click on a data point


Choose Add Trendline > Linear
Options tab: check Display equation on
chart

>Excel Functions
=INTERCEPT(known_y's,known_x's)
=SLOPE(known_y's,known_x's)

>Regression tool in Data Analysis


>Using Formulas Manually

From Graph


Excel Functions

=INTERCEPT(B3:B102,A3:A102)
=SLOPE(B3:B102,A3:A102)


From Graph & Functions


>While these methods are a quick way
to get the equation, they do not
provide you with any statistics to use
to assess the model
>The Regression tool in Data Analysis
provides a full range of statistics


The Regression Tool


>Office 2007 & later:
Data Analysis will appear at far right on
ribbon bar
Choose Regression from the dialogue box
menu.


Switch to Excel
File CarPrice.xls
Tab Raw Data


Excel Regression Output


Stripped Down Output


Interpretation
>Our estimated relationship is
>Price = $6,533 - 0.031(km)
Every 1000 km reduces the price by an
average of $31
What does the $6,533 mean?
Careful! It is outside the data range!


A Useful Formula
b₁ = cov(x, y) / var(x)
>The estimate of the slope coefficient
is the ratio of the covariance between
the dependent and independent
variables and the variance of the
independent variable


Estimating Slope
b₁ = cov(x, y) / var(x) = −1,356,256 / 43,528,690 = −0.03115…

>We get the same slope as the regression
output using the ratio of the covariance to
the variance of the independent variable
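A quick check of this ratio with the summary numbers above, in plain Python:

```python
cov_xy = -1356256    # cov(x, y) from the summary statistics
var_x = 43528690     # var(x) from the summary statistics

b1 = cov_xy / var_x  # slope estimate
print(round(b1, 6))  # -0.031158
```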


Estimating the Intercept


>We now have the slope, but need the
intercept
>Every estimated regression line goes through
the point (x-bar, y-bar), the joint mean
>Therefore:

b₀ = ȳ − b₁x̄ = 5411.41 − (−0.03115…)(36,009.45) = 6533.383
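The same arithmetic in Python, reusing the slope found on the previous slide:

```python
y_bar = 5411.41           # mean price
x_bar = 36009.45          # mean mileage
b1 = -1356256 / 43528690  # slope, cov(x, y) / var(x)

# The line passes through the joint mean, so solve for the intercept
b0 = y_bar - b1 * x_bar
print(round(b0, 3))       # 6533.383
```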


Manual Calculation
> To calculate regression manually, you will
need the following summary statistics

Σx, Σy, Σx², Σy², Σxy

For Manual Calculation

(Note: Data are both in 1000s to keep squared values small)


Formulae - I

b₁ = [ Σxy − (Σx)(Σy)/n ] / [ Σx² − (Σx)²/n ]

b₀ = ȳ − b₁x̄

Formulae - II
b₁ = cov(x, y) / var(x),   b₀ = ȳ − b₁x̄

cov(x, y) = [ Σxy − (Σx)(Σy)/n ] / (n − 1)

var(x) = [ Σx² − (Σx)²/n ] / (n − 1)


Car Price Example Calculations


b₁ = [ Σxy − (Σx)(Σy)/n ] / [ Σx² − (Σx)²/n ]
   = [ 19,351.92 − (3600.945)(541.141)/100 ] / [ 133,977.3892 − (3600.945)²/100 ]
   = −134.2693 / 4309.3403
   = −0.031158

x̄ = Σx/n = 3600.945/100 = 36.00945
ȳ = Σy/n = 541.141/100 = 5.41141

b₀ = ȳ − b₁x̄ = 5.41141 − (−0.031158)(36.00945) = 6.5334
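The manual calculation, reproduced in Python from the summary sums (data in 1000s, as noted earlier):

```python
n = 100
sum_x = 3600.945        # sum of x
sum_y = 541.141         # sum of y
sum_xy = 19351.92       # sum of x*y
sum_x2 = 133977.3892    # sum of x squared

b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * sum_x / n   # b0 = y-bar - b1 * x-bar

print(round(b1, 6))  # -0.031158
print(round(b0, 4))  # 6.5334
```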

Assessing the Model


Assessing and Testing


>You will learn the statistics for
assessing and testing the quality of
the model in Business Analytics II
>We must ask, "Is the model logical?"
>We will look at the R2, (R-Squared)
AKA Coefficient of Determination


R²: Coefficient of Determination

>The R² (R-squared) tells us the proportion
of the variability in our dependent variable
that is explained by variability in the
independent variable
>It is the square of the correlation
coefficient for simple regression


Car Price Example


Manual Calculations

R² = [cov(x, y)]² / (s_x² s_y²) = (−1.35626)² / [(43.52869)(0.06499)] = 0.65023 = 65.023%

Minor difference due to rounding
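The same check in plain Python (covariance and variances in 1000s units, as on the slide):

```python
cov_xy = -1.35626   # cov(x, y)
var_x = 43.52869    # s_x squared
var_y = 0.06499     # s_y squared

r_squared = cov_xy ** 2 / (var_x * var_y)
print(r_squared)    # about 0.6502, i.e. roughly 65%
```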

Car Price Example: Quality


>Logical: Price falls as mileage
increases, and by a plausible amount.
>The R-squared: 0.65: 65% of the
variation in price is explained by
variation in mileage


CMP-TSX Example


TSX-CMP Regression Output (Abridged)


Interpreting the Output


The TSX-CMP Example


>Cov(TSX,CMP) = 4.875
>Var(TSX) = 7.25
>b1 = 4.875/7.25 = 0.6724


For Next Class


>Do questions from Chapter 4
>Read Chapter 6
>Assignment Posted, Quiz (?) for next
week
>Midterm, Sunday, February 12, covering
Weeks 1-4, Chapters 1-5

YOU LEARN STATISTICS


BY DOING STATISTICS
