Outline:
Simple linear regression
Multiple linear regression
[Figure: scatter plot of ANN.RISK (vertical axis, 10 to 60) against AV.CRED (horizontal axis, 20 to 100)]
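Model: $y = \beta_0 + \beta_1 x + \varepsilon$, where $\beta_1$ = slope of line
Line of means: $y = \beta_0 + \beta_1 x$
Random error: $\varepsilon_i = y_i - (\beta_0 + \beta_1 x_i)$, the difference between the observed value $y_i$ and the mean value $\beta_0 + \beta_1 x_i$ at $x_i$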
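Problem: if $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ is the least squares line, how do we find the coefficients $\hat{\beta}_0$ and $\hat{\beta}_1$?
We have explicit formulas:
$$\hat{\beta}_1 = \frac{\operatorname{Cov}(x, y)}{s_x^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$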
              Coefficients  Standard Error  t Stat  P-value  Lower 95%    Upper 95%
Intercept     1             1.870828693     0.5345  0.6875   -22.7710306  24.7710306
X Variable 1  1             1.732050808     0.5774  0.6667   -21.0076979  23.0076979
Hence $\hat{y} = 1 + x$
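The explicit formulas for the coefficients can be tried out in a few lines of Python. This is only a sketch: the slides do not show the three data points behind the Excel output, so the values of `x` and `y` below are hypothetical, chosen so that the fit comes out as $\hat{y} = 1 + x$. The function name `least_squares` is likewise an illustration.

```python
from statistics import mean

def least_squares(x, y):
    """Return (b0, b1) of the least squares line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    # Sample covariance and sample variance, both with divisor n - 1.
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    b1 = cov_xy / var_x          # slope: Cov(x, y) / s_x^2
    b0 = y_bar - b1 * x_bar      # intercept: y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data (assumption, not taken from the slides).
x = [0.5, 1.0, 1.5]
y = [1.0, 3.0, 2.0]

b0, b1 = least_squares(x, y)
print(f"y_hat = {b0:.4f} + {b1:.4f} x")
```

Because the divisor n - 1 appears in both the covariance and the variance, it cancels in the slope.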
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
For X Variable 1, Excel reports t Stat = 0.5774 with P-value = 0.6667, so $H_0$ is not rejected at the 5% level.
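A $100(1-\alpha)\%$ confidence interval for the slope $\beta_1$ is:
$$\left(\hat{\beta}_1 - t_{n-2;\alpha/2}\, s_{\hat{\beta}_1},\ \hat{\beta}_1 + t_{n-2;\alpha/2}\, s_{\hat{\beta}_1}\right)$$
There is no need to memorize the formula, but observe that its structure is similar to that of the confidence intervals for a mean and for a population proportion.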
Excel yields this interval in the columns Lower 95% and Upper 95% of the X Variable 1 row: the 95% confidence interval for $\beta_1$ is (-21.0077, 23.0077).
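The t statistic and the confidence interval for the slope follow directly from the estimate and its standard error. The sketch below assumes the critical value $t_{1;0.025} \approx 12.7062$ is read from a t table (it is not on the slides), and the name `slope_inference` is hypothetical.

```python
def slope_inference(b1, se_b1, t_crit):
    """Return the t statistic for H0: beta_1 = 0 and the confidence interval."""
    t_stat = b1 / se_b1
    half_width = t_crit * se_b1
    return t_stat, (b1 - half_width, b1 + half_width)

# Assumption: critical value t_{n-2; alpha/2} with n - 2 = 1 and alpha = 0.05,
# taken from a t table.
T_CRIT_DF1 = 12.7062

# Slope estimate and standard error as reported in the Excel output.
t_stat, (lower, upper) = slope_inference(1.0, 1.732050808, T_CRIT_DF1)
print(f"t = {t_stat:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

The interval is very wide because only n - 2 = 1 degree of freedom is left for estimating the error variance.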
Coefficient of determination:
$$R^2 = \frac{SS_{yy} - SSE}{SS_{yy}} = \frac{\text{explained variation in } y}{\text{variation in } y}$$
where $SS_{yy} = \sum_i (y_i - \bar{y})^2$ and $SSE = \sum_i (y_i - \hat{y}_i)^2$.
Properties:
a. $0 \le R^2 \le 1$
Explanation of property b:
If $\frac{SS_{yy} - SSE}{SS_{yy}} = 0$, then $SS_{yy} = SSE$, so x contributes no information about y, since the observed points are at the same distance from the line $y = \bar{y}$ as from the least squares line.
If $\frac{SS_{yy} - SSE}{SS_{yy}} = 1$, then $SSE = 0$, hence all observed points lie on the least squares line.
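The definition of $R^2$ can be checked numerically. The sketch below reuses hypothetical three-point data (an assumption, since the slides do not list the observations) and also shows the boundary case $R^2 = 1$ for points that lie exactly on a line. The function name `r_squared` is illustrative.

```python
def r_squared(x, y):
    """R^2 = (SS_yy - SSE) / SS_yy for the least squares line through (x, y)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)                      # variation in y
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))   # unexplained part
    return (ss_yy - sse) / ss_yy

# Hypothetical data (assumption, not taken from the slides).
x = [0.5, 1.0, 1.5]
y = [1.0, 3.0, 2.0]
r2 = r_squared(x, y)

# Points exactly on the line y = 1 + x: SSE = 0, so R^2 = 1.
r2_perfect = r_squared(x, [1.0 + xi for xi in x])

print(r2, r2_perfect)
```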
Excel yields:
SUMMARY OUTPUT
Regression Statistics
Multiple R         0.5
R Square           0.25
Adjusted R Square  -0.5
Standard Error     1.224745
Observations       3
ANOVA
            df  SS   MS   F         Significance F
Regression  1   0.5  0.5  0.333333  0.666667
Residual    1   1.5  1.5
Total       2   2
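The entries of the summary output are linked by simple arithmetic, which the following sketch reproduces from the two sums of squares and the sample size. The function name `anova_summary` is hypothetical, and the adjusted $R^2$ formula used is the standard one, $1 - (1 - R^2)\frac{n-1}{n-k-1}$, which the slides do not state explicitly.

```python
import math

def anova_summary(ss_yy, sse, n, k=1):
    """Derive ANOVA quantities for a regression with k predictors."""
    ssr = ss_yy - sse                 # explained (regression) sum of squares
    df_reg, df_res = k, n - k - 1
    msr, mse = ssr / df_reg, sse / df_res
    r2 = ssr / ss_yy
    return {
        "MSR": msr,
        "MSE": mse,
        "F": msr / mse,
        "standard_error": math.sqrt(mse),
        "R2": r2,
        "adjusted_R2": 1 - (1 - r2) * (n - 1) / df_res,
    }

# Values from the ANOVA table: SS Total = 2, SS Residual = 1.5, n = 3.
summary = anova_summary(ss_yy=2.0, sse=1.5, n=3)
for name, value in summary.items():
    print(f"{name}: {value:.6f}")
```

With only three observations the adjustment factor (n - 1)/(n - 2) equals 2, which is why the adjusted $R^2$ becomes negative here.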
Data: x = area of the home (square feet), y = selling price
x     y
2306  145541
2677  179900
2324  149000
1447  113900
3333  189000
3004  184500
4142  339717
2923  228000
2902  209000
1847  133000
2148  168000
2819  205000
1753  129900
3206  235000
2474  129900
2933  199500
3987  319000
2598  185500
4934  375000
2253  169000
2998  185900
2791  189800
2865  192000
4417  379900
              Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
Intercept     -39001.1      18237.94        -2.13846  0.043834  -76824.4   -1177.92
X Variable 1  84.98698      6.095676        13.94217  2.12E-12  72.3453    97.62865
$\hat{y} = -39001.1 + 84.987 x$
SUMMARY OUTPUT
Regression Statistics
Multiple R         0.947802
R Square           0.898329
Adjusted R Square  0.893708
Standard Error     24383.43
Observations       24
Question:
Predict the selling price of a home with an area of 3000 square feet. Use a 95% prediction interval (PI).
Answer: for x = 3000, the 95% PI is (164326.01, 267593.57).
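The slides give the prediction interval without the formula behind it. A sketch, assuming the usual textbook formula $\hat{y} \pm t_{n-2;\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_{xx}}}$ and a critical value $t_{22;0.025} \approx 2.073873$ taken from a t table (both are assumptions, not stated in the source), is shown below for the 24 homes. The function names are hypothetical.

```python
import math

# Data from the slides: area in square feet and selling price.
area = [2306, 2677, 2324, 1447, 3333, 3004, 4142, 2923, 2902, 1847, 2148, 2819,
        1753, 3206, 2474, 2933, 3987, 2598, 4934, 2253, 2998, 2791, 2865, 4417]
price = [145541, 179900, 149000, 113900, 189000, 184500, 339717, 228000,
         209000, 133000, 168000, 205000, 129900, 235000, 129900, 199500,
         319000, 185500, 375000, 169000, 185900, 189800, 192000, 379900]

def fit_simple(x, y):
    """Least squares fit; returns b0, b1, x_bar, SS_xx and the standard error s."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    return b0, b1, x_bar, ss_xx, s

def prediction_interval(x0, x, y, t_crit):
    """Prediction interval for a single new observation at x0."""
    n = len(x)
    b0, b1, x_bar, ss_xx, s = fit_simple(x, y)
    y_hat = b0 + b1 * x0
    half_width = t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / ss_xx)
    return y_hat - half_width, y_hat + half_width

T_CRIT_DF22 = 2.073873  # assumption: t_{22; 0.025} from a t table

b0, b1, _, _, s = fit_simple(area, price)
pi_low, pi_high = prediction_interval(3000, area, price, T_CRIT_DF22)
print(f"y_hat = {b0:.1f} + {b1:.5f} x, s = {s:.2f}")
print(f"95% PI at 3000 sq ft: ({pi_low:.2f}, {pi_high:.2f})")
```

The term 1 under the square root accounts for the random error of the individual home. Dropping it would give the narrower confidence interval for the mean price of all homes with that area.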
[Figure: scatter plot of the data with the fitted line y = 0.7553x + 0.0026]
Regression Statistics
Multiple R         0.846514
R Square           0.716586
Adjusted R Square  0.692968
Standard Error     0.016815
Observations       14
ANOVA
            df  SS        MS        F         Significance F
Regression  1   0.008579  0.008579  30.34084  0.000134465
Residual    12  0.003393  0.000283
Total       13  0.011971
              Coefficients  Standard Error  t Stat    P-value   Lower 95%     Upper 95%  Lower 90.0%   Upper 90.0%
Intercept     0.002637      0.004621        0.570526  0.578848  -0.007432405  0.0127056  -0.005599936  0.0108731
X Variable 1  0.755344      0.13713         5.508252  0.000134  0.456564681   1.0541242  0.510940035   0.9997488
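Model: $R_{t+1} = c + \phi R_t + \varepsilon_{t+1}$ (1), with $c = \mu(1 - \phi)$
(here $\phi$ denotes the slope coefficient and $\mu$ the long-run mean of the rate)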
Data (longtermrate.xlsx), excerpt:
period  interest rate
1       7.229
2       7.725
3       7.671
4       8.037
5       7.516
6       6.996
7       6.719
8       7.056
9       7.243
Regression data, excerpt:
y = R_{t+1}  x = R_t
7.725        7.229
7.671        7.725
8.037        7.671
7.516        8.037
6.996        7.516
6.719        6.996
7.056        6.719
7.243        7.056
7.109        7.243
Estimates for $R_{t+1} = c + \phi R_t + \varepsilon_{t+1}$: $\hat{\phi} = 0.9075$, $\hat{c} = 0.5922$
SUMMARY OUTPUT
Regression Statistics
Multiple R         0.905166
R Square           0.819326
Adjusted R Square  0.816817
Standard Error     0.501022
Observations       74
              Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 90.0%  Upper 90.0%
Intercept     0.592209      0.338067        1.751747  0.084075  -0.08172   1.266134   0.028889     1.155528
X Variable 1  0.907522      0.050224        18.06955  1.83E-28  0.807403   1.007641   0.823834     0.99121
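Substituting the estimates in $c = \mu(1 - \phi)$ gives:
$0.5922 = \mu(1 - 0.9075)$
Hence $\hat{\mu} = 6.4038$ (computed with the unrounded estimates 0.592209 and 0.907522).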
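The two steps of this example, building the lagged pairs and solving $c = \mu(1 - \phi)$ for $\mu$, can be sketched as follows. Only the ten rates visible in the excerpt are used to illustrate the lagging (the full spreadsheet is not reproduced on the slides), and the long-run mean is computed from the coefficients reported by Excel. The function names are hypothetical.

```python
def lagged_pairs(series):
    """Pair each value R_t (x) with its successor R_{t+1} (y)."""
    return list(zip(series[:-1], series[1:]))

def long_run_mean(c, phi):
    """Solve c = mu * (1 - phi) for mu."""
    return c / (1 - phi)

# Rates visible in the excerpt (periods 1 to 10); the full series is longer.
rates = [7.229, 7.725, 7.671, 8.037, 7.516, 6.996, 6.719, 7.056, 7.243, 7.109]
pairs = lagged_pairs(rates)   # (x, y) = (R_t, R_{t+1})

# Coefficients as reported in the Excel output for the full data set.
mu = long_run_mean(c=0.592209, phi=0.907522)
print(f"{len(pairs)} lagged pairs, first pair {pairs[0]}, mu = {mu:.4f}")
```

Using the rounded values 0.5922 and 0.9075 instead would give about 6.402, so the unrounded coefficients are needed to reproduce 6.4038.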
Average income (1 000 euro):
year  single man  single woman
2000  15.4        13.9
2001  17.2        15.1
2002  17.6        15.8
2003  17.4        16
2004  17.6        16.2
2005  17.8        16.5
2006  18.9        17.1
2007  19.7        17.7
2008  20.2        18
2009  20.1        18.2
2010  20          18.1
Regression data: y = income, x = 1 for man, x = 0 for woman
y     x
15.4  1
17.2  1
17.6  1
17.4  1
17.6  1
17.8  1
18.9  1
19.7  1
20.2  1
20.1  1
20    1
13.9  0
15.1  0
15.8  0
16    0
16.2  0
16.5  0
17.1  0
17.7  0
18    0
18.2  0
18.1  0
SUMMARY OUTPUT
Regression Statistics
Multiple R         0.53318
R Square           0.284281
Adjusted R Square  0.248495
Standard Error     1.459919
Observations       22
ANOVA
            df  SS        MS        F         Significance F
Regression  1   16.93136  16.93136  7.943911  0.010614
Residual    20  42.62727  2.131364
Total       21  59.55864
              Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
Intercept     16.6          0.440182        37.71166  4.67E-20  15.6818    17.5182
X Variable 1  1.754545      0.622512        2.818495  0.010614  0.456009   3.053082
Conclusion: mean income single men > mean income single women
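With a 0/1 variable as the only predictor, the least squares intercept equals the mean of the group coded 0 and the slope equals the difference between the two group means. The sketch below checks this on the income data. The function name `fit_simple` is hypothetical.

```python
from statistics import mean

# Average incomes 2000-2010 in 1 000 euro, from the table in the slides.
men = [15.4, 17.2, 17.6, 17.4, 17.6, 17.8, 18.9, 19.7, 20.2, 20.1, 20.0]
women = [13.9, 15.1, 15.8, 16.0, 16.2, 16.5, 17.1, 17.7, 18.0, 18.2, 18.1]

y = men + women
x = [1] * len(men) + [0] * len(women)   # dummy variable: 1 = man, 0 = woman

def fit_simple(x, y):
    """Return (b0, b1) of the least squares line y_hat = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    return y_bar - b1 * x_bar, b1

b0, b1 = fit_simple(x, y)
print(f"intercept = {b0:.6f}  (mean women = {mean(women):.6f})")
print(f"slope     = {b1:.6f}  (mean men - mean women = {mean(men) - mean(women):.6f})")
```

The test of $H_0: \beta_1 = 0$ in the Excel output is therefore a test of equal mean incomes in the two groups.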
y      x1     x2
10012  50.24  1072
326    1.44   20
13376  64.71  1354
13767  49.14  1199
662    3.61   26
857    2.84   503
1259   7.89   64
18842  82.3   1634
6763   26.8   4239
16681  45.2   5269
7094   35     3383
10021  43.8   3472
5142   28     1621
5104   20.1   2098
7039   37     2006
              Coefficients  Standard Error  t Stat    P-value   Lower 95%     Upper 95%
Intercept     -754.711      895.5027        -0.84278  0.415835  -2705.843455  1196.422214
X Variable 1  217.493       21.96107        9.903567  3.98E-07  169.6438844   265.3420219
X Variable 2  0.713124      0.327711        2.176075  0.050246  -0.000897191  1.427144982
ANOVA
            df  SS        MS        F         Significance F
Regression  2   4.52E+08  2.26E+08  68.24643  2.78519E-07
Residual    12  39727837  3310653
Total       14  4.92E+08
Regression Statistics
Multiple R         0.958743
R Square           0.919188
Adjusted R Square  0.905719
Standard Error     1819.52
Observations       15
$R^2 = 0.9192$ (adjusted $R^2 = 0.9057$)
Hence, 91.92% of the sample variation in y can be explained by the linear model.
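The coefficients, $R^2$ and adjusted $R^2$ of the two-predictor model can be reproduced without Excel. The sketch below solves the normal equations in centered form, which for two predictors reduces to a 2 x 2 system. The function name `fit_two_predictors` is hypothetical, and the adjusted $R^2$ formula $1 - (1 - R^2)\frac{n-1}{n-k-1}$ is the standard one, not quoted from the slides.

```python
y = [10012, 326, 13376, 13767, 662, 857, 1259, 18842,
     6763, 16681, 7094, 10021, 5142, 5104, 7039]
x1 = [50.24, 1.44, 64.71, 49.14, 3.61, 2.84, 7.89, 82.3,
      26.8, 45.2, 35, 43.8, 28, 20.1, 37]
x2 = [1072, 20, 1354, 1199, 26, 503, 64, 1634,
      4239, 5269, 3383, 3472, 1621, 2098, 2006]

def fit_two_predictors(y, x1, x2):
    """Least squares fit of y = b0 + b1*x1 + b2*x2 via centered normal equations."""
    n = len(y)
    m_y, m_1, m_2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m_1) ** 2 for a in x1)
    s22 = sum((b - m_2) ** 2 for b in x2)
    s12 = sum((a - m_1) * (b - m_2) for a, b in zip(x1, x2))
    s1y = sum((a - m_1) * (c - m_y) for a, c in zip(x1, y))
    s2y = sum((b - m_2) * (c - m_y) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = m_y - b1 * m_1 - b2 * m_2
    return b0, b1, b2

b0, b1, b2 = fit_two_predictors(y, x1, x2)

n, k = len(y), 2
y_bar = sum(y) / n
ss_yy = sum((c - y_bar) ** 2 for c in y)
sse = sum((c - (b0 + b1 * a + b2 * b)) ** 2 for a, b, c in zip(x1, x2, y))
r2 = 1 - sse / ss_yy
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(f"y_hat = {b0:.3f} + {b1:.3f} x1 + {b2:.6f} x2")
print(f"R^2 = {r2:.6f}, adjusted R^2 = {adj_r2:.6f}")
```

The adjusted value penalizes the number of predictors, so it is always at most $R^2$ and is the more cautious figure when comparing models with different numbers of variables.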
We use SPSS to compute interval estimates for each observation and for a new observation with x1 = 50, x2 = 1000.
lci, uci: lower and upper limits of the 95% confidence interval for the mean of y; lpi, upi: lower and upper limits of the 95% prediction interval (x1 is displayed rounded).
y      x1  x2    lci       uci       lpi       upi
10012  50  1072  9398.11   12475.08  6684.15   15189.05
326    1   20    -2334.43  1479.91   -4826.54  3972.029
13376  65  1354  12322.22  16247.43  9861.22   18708.42
13767  49  1199  9332.70   12243.15  6564.88   15010.97
662    4   26    -1801.68  1899.64   -4326.10  4424.06
857    3   503   -1532.50  1975.84   -4113.48  4556.82
1259   8   64    -735.34   2749.24   -3323.40  5337.30
18842  82  1634  15688.44  20931.96  13557.30  23063.10
6763   27  4239  6000.59   10193.46  3612.45   12581.61
16681  45  5269  10328.59  15338.24  8144.01   17522.83
7094   35  3383  7799.05   10741.02  5041.54   13498.54
10021  44  3472  9764.18   12730.70  7014.66   15480.23
5142   28  1621  5438.21   7543.91   2389.24   10592.88
5104   20  2098  3870.05   6356.00   958.34    9267.71
7039   37  2006  7684.96   9761.15   4624.99   12821.11
.      50  1000  9272.79   12393.32  6572.67   15093.44
95% PI for x1 = 50, x2 = 1000: (6572.67, 15093.44)
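The last row of the SPSS table can be approximated by hand. The sketch below assumes the standard formulas $\hat{y}_0 \pm t\, s \sqrt{h_0}$ for the mean and $\hat{y}_0 \pm t\, s \sqrt{1 + h_0}$ for a new observation, with $h_0 = \frac{1}{n} + d^\top S^{-1} d$, where $d$ is the new point minus the predictor means and $S$ is the centered cross-product matrix. These formulas, the critical value $t_{12;0.025} \approx 2.178813$ and the function names are assumptions, not taken from the slides. The coefficients and standard error are the rounded values reported by Excel, so the results agree with SPSS only up to rounding.

```python
import math

x1 = [50.24, 1.44, 64.71, 49.14, 3.61, 2.84, 7.89, 82.3,
      26.8, 45.2, 35, 43.8, 28, 20.1, 37]
x2 = [1072, 20, 1354, 1199, 26, 503, 64, 1634,
      4239, 5269, 3383, 3472, 1621, 2098, 2006]

# Reported by Excel (rounded): coefficients and residual standard error.
B0, B1, B2 = -754.711, 217.493, 0.713124
S_ERR = 1819.52
T_CRIT_DF12 = 2.178813   # assumption: t_{12; 0.025} from a t table

def leverage(x1, x2, new1, new2):
    """h0 = 1/n + d' S^{-1} d for a new point (new1, new2)."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    d1, d2 = new1 - m1, new2 - m2
    det = s11 * s22 - s12 ** 2
    # Quadratic form d' S^{-1} d written out for the 2 x 2 case.
    quad = (s22 * d1 ** 2 - 2 * s12 * d1 * d2 + s11 * d2 ** 2) / det
    return 1 / n + quad

def intervals(new1, new2):
    """Return (confidence interval for the mean, prediction interval)."""
    y_hat = B0 + B1 * new1 + B2 * new2
    h0 = leverage(x1, x2, new1, new2)
    ci_half = T_CRIT_DF12 * S_ERR * math.sqrt(h0)
    pi_half = T_CRIT_DF12 * S_ERR * math.sqrt(1 + h0)
    return (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)

ci, pi = intervals(50, 1000)
print(f"95% CI for the mean: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI:              ({pi[0]:.2f}, {pi[1]:.2f})")
```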