Professional Documents
Culture Documents
Bivariate Statistics
>So far, we have been dealing with
statistics of individual variables
>Now we will look at statistics that
describe the relationship between
pairs of variables
Interactions
Sometimes two variables appear
related:
>smoking and lung cancers
>height and weight
>years of education and income
>engine size and gas mileage
>GMAT scores and MBA GPA
>house size and price
Copyright 2017, A. Marshall
Interactions
>Some of these variables would appear
to positively related & others
negatively
>If these were related, we would
expect to be able to derive a linear
relationship:
y = a + bx
>where,
Linear Relationships
>We will be deriving linear relationships
from bivariate (two-variable) data
>Our symbols for the true regression
relationship will be:
yi 0 1 xi or yi 0 1 xi
Slope
Intercept
1
0
Errorterm
Copyright 2017, A. Marshall
Notation
yi
yi
yi yi ei
Estimating a Line
>The symbols for the estimated
linear relationship are:
y b0 b1 x
>b1 is our estimate of the slope, 1
>b0 is our estimate of the intercept, 0
Example
>Consider the following example
comparing the returns of Consolidated
Moose Pasture stock (CMP) and the
TSX 300 Index
>The next slide shows 25 monthly
returns
Example Data
10
Example
>From the data, it appears that a
positive relationship may exist
Most of the time when the TSX is up, CMP
is up
Likewise, when the TSX is down, CMP is
down most of the time
Sometimes, they move in opposite
directions
11
Graph Of Data
12
13
Observations
>Both have means of zero and standard
deviations just under 3
The data was deliberately created to have
means of zero for both variables for the
demonstration of the deviations on next three
slides
14
Graph of Data
15
Implications
>When points in the upper right and
lower left quadrants dominate, then
the sums of the products of the
deviations will be positive
>When points in the lower right and
upper left quadrants dominate, then
the sums of the products of the
deviations will be negative
16
An Important Observation
>The sums of the products of the
deviations will give us the appropriate
sign of the slope of our relationship
17
Covariance
N
COV X, Y XY
x y
i
i1
x x y y
i1
n 1
xy
xy
i
i i
n 1
18
Covariance
>Excel:
=COVARIANCE.P(range1,range2)
Population Covariance
=COVARIANCE.S(range1,range2)
Sample Covariance
19
Using Covariance
>In the same units as Variance (if both
variables are in the same unit), i.e.
units squared
>Very useful in Finance for measuring
portfolio risk
>Unfortunately, it is hard to interpret
for two reasons:
What does the magnitude/size imply?
The units are confusing
Copyright 2017, A. Marshall
20
21
Correlation
>Correlation measures the sensitivity of
one variable to another, but ignoring
magnitude
>Range: -1 to 1
+1: Implies perfect positive co-movement
-1: Implies perfect negative co-movement
0: No relationship
22
Calculating Correlation
XY
XY
X Y
sXY
XY
rXY
sXs Y
=CORREL(range1,ran
ge2)
Copyright 2017, A. Marshall
23
Regression Analysis
24
Regression Analysis
>A statistical technique for determining
the best fit line through a series of
data
25
Error
>No line can hit all, or even most of the
points - The amount we miss by is called
ERROR
>As we have learned, error does not
mean mistake! It simply means the
inevitable missing that will happen
when we generalize, or try to describe
things with models
>When we looked at the mean and
variance, we called the errors deviations
Copyright 2017, A. Marshall
26
27
Example
>Suppose we are examining the sale
prices of compact cars sold by rental
agencies and that we have the
following summary statistics:
28
Summary Statistics
> Our best estimate
of the average price
would be $5,411
> Using the Empirical
Rule, a 95%
Confidence Interval
would be $5,411
(2)(255) or $5,411
(510) or $4,901 to
$5,921
29
Something Missing?
>Clearly, looking at this data in such a
simplistic way ignores a key factor:
the mileage of the vehicle
30
31
32
33
>From Graph
>Excel Functions
=INTERCEPT(known ys,known xs)
=SLOPE(known ys,known xs)
34
From Graph
35
Excel Functions
=INTERCEPT(B3:B102,A3:A102)
=SLOPE(B3:B102,A3:A102)
36
37
38
Switch to Excel
File CarPrice.xls
Tab Raw Data
39
40
41
Interpretation
>Our estimated relationship is
>Price = $6,533 - 0.031(km)
Every 1000 km reduces the price by an
average of $31
What does the $6,533 mean?
Careful! It is outside the data range!
42
A Useful Formula
cov(
x
,
y
)
1 b1
var(x)
>The estimate of the slope coefficient
is the ratio of the covariance between
the dependent and independent
variables and the variance of the
independent variable
43
Estimating Slope
cov(x, y) - 1356256
1 b1
0.03115...
var(x)
43528690
>We get the same slope as the
regression output using the ratio of
the covariance to the variance of the
independent variable
44
b y b x
0
0
1
45
Manual Calculation
> To calculate regression manually, you will
need the following summary statistics
x, y, x , y , xy
2
46
47
Formulae - I
x y
xy
n
b1
2
2
x n
b0 y b1 x
Copyright 2017, A. Marshall
48
Formulae - II
cov x, y
b1
, b0 y b1 x
var x
cov x, y
x y
xy
n 1
var x
Copyright 2017, A. Marshall
n 1
49
xy
19351.92
3600.945 541.141
100
n
2
2
3600.945
2
133977.3892
x n
100
134.2693
0.031158
4309.3403
y
x 541.141
3600.945
b0 y b1 x
b1
0.031158
n
n
100
100
6.5334
b1
50
51
52
R : Coefficient of Determination
2
53
54
Manual Calculations
R
2
cov x, y
s s
2
x
2
y
1.35626
43.52869 0.06499
2
0.65023 65.023%
Minor difference due to
rounding
Copyright
2017, A. Marshall
55
56
CMP-TSX Example
57
58
59
60
61