STA 302 H1F / 1001 HF Fall 2011

Test
October 25, 2011
LAST NAME: SOLUTIONS FIRST NAME:
STUDENT NUMBER:
INSTRUCTIONS:
Time: 90 minutes
Aids allowed: calculator.
All of the formulae below can be taken as known unless a question indicates otherwise.
SAS output for question 2 is on the last 2 pages (pages 8 and 9).
Total points: 50
Some formulae:
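$b_1 = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \dfrac{\sum (x_i - \bar{x}) y_i}{\sum (x_i - \bar{x})^2} = \dfrac{\sum x_i y_i - n \bar{x} \bar{y}}{\sum x_i^2 - n \bar{x}^2}$

$b_0 = \bar{y} - b_1 \bar{x}$

$\mathrm{Var}(\hat{\beta}_1 \mid X) = \dfrac{\sigma^2}{\sum (x_i - \bar{x})^2}$

$\mathrm{Var}(\hat{\beta}_0 \mid X) = \sigma^2 \left( \dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum (x_i - \bar{x})^2} \right)$

$\mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_1 \mid X) = -\dfrac{\sigma^2 \bar{x}}{\sum (x_i - \bar{x})^2}$

$SST = \sum (y_i - \bar{y})^2$

$RSS = \sum (y_i - \hat{y}_i)^2$

$SSReg = b_1^2 \sum (x_i - \bar{x})^2 = \sum (\hat{y}_i - \bar{y})^2$

$\mathrm{Var}(\hat{y} \mid X = x^*) = \sigma^2 \left( \dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right)$

$\mathrm{Var}(Y - \hat{y} \mid X = x^*) = \sigma^2 \left( 1 + \dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right)$

$r = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$

$SXX = \sum (x_i - \bar{x})^2 = \sum x_i^2 - n \bar{x}^2$

$h_{ij} = \dfrac{1}{n} + \dfrac{(x_i - \bar{x})(x_j - \bar{x})}{SXX}$   (cutoff: $h_{ii} > \dfrac{4}{n}$)

$DFBETAS_{ik} = \dfrac{b_k - b_{k(i)}}{\mathrm{s.e.}(b_{k(i)})}$   (cutoff: $> 1$ or $> \dfrac{2}{\sqrt{n}}$)

$DFFITS_i = \dfrac{\hat{y}_i - \hat{y}_{i(i)}}{\mathrm{s.e.}(\hat{y}_{i(i)})}$   (cutoff: $> 1$ or $> 2\sqrt{\dfrac{2}{n}}$)

$D_i = \dfrac{\sum_j (\hat{y}_{j(i)} - \hat{y}_j)^2}{2 s^2}$   (cutoff: $> \dfrac{4}{n - 2}$)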
1abc 1de 1fg 2abcd 2efghi 2jkl
1
1. Suppose that a simple linear regression model, $Y = \beta_0 + \beta_1 x + e$, has been fit to some data
with $n$ observations, $(x_i, y_i)$, $i = 1, \ldots, n$, and the usual statistics have been calculated. For
this question, assume that the independent variables are not random. The assumptions of
the model are:
(1) The form of the model is appropriate.
(2) The error terms have expectation 0.
(3) The error terms have constant variance.
(4) The error terms are uncorrelated.
(5) The error terms are normally distributed.
(a) (2 marks) In this context, what is the difference between a parameter and an estimate?
A parameter is an unobserved but presumed constant value which is part of the model.
In this case, the parameters are $\beta_0$, $\beta_1$, and $\sigma^2$ (the constant variance of the error term).
An estimate is a numerical value calculated from the observed data to estimate the values
of the parameters.
(b) (3 marks) Which of the model assumptions are also true of the residuals?
The residuals have expectation 0 and are normally distributed (so (2) and (5)).
(c) (6 marks) Suppose we are interested in a particular value of the explanatory variable,
$x^*$. Give the mean and variance of the probability distributions of each of the following
random variables or estimators at $x = x^*$:
i. The model error, $e$
Mean: $0$
Variance: $\sigma^2$
ii. The response variable, $Y$
Mean: $\beta_0 + \beta_1 x^*$
Variance: $\sigma^2$
iii. The estimator of the slope
Mean: $\beta_1$
Variance: $\dfrac{\sigma^2}{SXX}$
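The mean and variance given for the slope estimator in part iii can be checked by simulation. The following is a minimal sketch, assuming hypothetical parameter values (`beta0`, `beta1`, `sigma`) and a small fixed design `x` that are not taken from the test. It repeatedly generates responses from the model, computes $b_1$ each time, and compares the simulated mean and variance of $b_1$ with $\beta_1$ and $\sigma^2 / SXX$.

```python
import random
from statistics import mean, variance

random.seed(302)  # fixed seed so the run is repeatable

# Hypothetical parameter values and design points (assumptions for illustration).
beta0, beta1, sigma = 2.0, 0.5, 1.0
x = [1.0, 2.0, 3.0, 4.0, 5.0]
n = len(x)
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)


def simulate_slope():
    """Generate one data set from the model and return the least squares slope."""
    y = [beta0 + beta1 * xi + random.gauss(0.0, sigma) for xi in x]
    return sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sxx


slopes = [simulate_slope() for _ in range(20000)]

print("simulated mean of b1:    ", mean(slopes), " theory:", beta1)
print("simulated variance of b1:", variance(slopes), " theory:", sigma ** 2 / sxx)
```

The simulated values should be close to, but not exactly equal to, the theoretical ones, since they are themselves estimates based on a finite number of replications.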
2
(Question 1 continued)
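(d) (3 marks) Show that the predicted value of the response variable for the ith observation
is $\sum_{j=1}^{n} h_{ij} y_j$ (where $h_{ij}$ is given in the formula on the first page).
$\hat{y}_i = b_0 + b_1 x_i = \bar{y} + (x_i - \bar{x}) b_1$
$= \dfrac{1}{n} \sum_{j=1}^{n} y_j + (x_i - \bar{x}) \dfrac{\sum_{j=1}^{n} [(x_j - \bar{x}) y_j]}{SXX}$
$= \sum_{j=1}^{n} \left[ \left( \dfrac{1}{n} + \dfrac{(x_i - \bar{x})(x_j - \bar{x})}{SXX} \right) y_j \right]$
$= \sum_{j=1}^{n} [h_{ij} y_j]$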
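The identity in part (d) can also be checked numerically. The following is a minimal sketch in Python using a small made-up data set (the values in `x` and `y` are illustrative assumptions, not data from the test). It computes the fitted values once as $b_0 + b_1 x_i$ and once as $\sum_j h_{ij} y_j$, and prints the two side by side.

```python
# Illustrative data (hypothetical values, not from the test).
x = [1.0, 2.0, 4.0, 5.0, 8.0]
y = [3.0, 4.5, 6.0, 9.0, 12.5]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

# Least squares estimates, using the formulae on the first page.
b1 = sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar


def h(i, j):
    """Hat value h_ij = 1/n + (x_i - xbar)(x_j - xbar) / SXX."""
    return 1.0 / n + (x[i] - xbar) * (x[j] - xbar) / sxx


# Fitted values computed two ways.
fitted_direct = [b0 + b1 * xi for xi in x]
fitted_hat = [sum(h(i, j) * y[j] for j in range(n)) for i in range(n)]

for direct, via_hat in zip(fitted_direct, fitted_hat):
    print(f"{direct:.6f}  {via_hat:.6f}")
```

The two columns agree up to floating-point rounding, which is what the algebra above predicts for any data set with at least two distinct values of x.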
(e) (2 marks) Tests and confidence intervals for the model parameters use the t-distribution.
Explain why this is the appropriate probability distribution.
Since the error terms are normally distributed, the values of the response variable are
observations from normal distributions. The estimators of the parameters are linear
combinations of the response variables, so they are also normally distributed. When the
estimators are standardized to create test statistics or pivots for the confidence intervals,
we divide by the estimate of their standard deviation (their standard error) so the result
follows a t-distribution.
3
(Question 1 continued)
(f) (3 marks) For testing the null hypothesis $H_0: \beta_1 = 0$ versus the alternative $H_a: \beta_1 \neq 0$,
we have both an F and a t test. Show that the test statistic for the F test is the square
of the test statistic for the t test.
F-test statistic $= \dfrac{MSReg}{MSE} = \dfrac{b_1^2 \, SXX}{s^2}$, where $s^2$ is the estimate of the variance of the error terms, $s^2 = MSE$
$= \left( \dfrac{b_1}{\sqrt{s^2 / SXX}} \right)^2$ = square of t test statistic
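As a numerical illustration of part (f), the sketch below computes both test statistics for a small made-up data set (the values in `x` and `y` are assumptions for illustration only) and shows that the F statistic equals the square of the t statistic.

```python
import math

# Illustrative data (hypothetical values, not from the test).
x = [1.0, 2.0, 4.0, 5.0, 8.0]
y = [3.0, 4.5, 6.0, 9.0, 12.5]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

b1 = sum((xi - xbar) * yi for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

# Residual sum of squares and its mean square, s^2 = MSE = RSS / (n - 2).
rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s2 = rss / (n - 2)

# Regression sum of squares has 1 degree of freedom, so MSReg = SSReg.
ssreg = b1 ** 2 * sxx

f_stat = ssreg / s2
t_stat = b1 / math.sqrt(s2 / sxx)

print("F   =", f_stat)
print("t^2 =", t_stat ** 2)
```

The same relationship appears in the MODEL 1 output for question 2: the t value for duration is 39.03, and 39.03 squared is about 1523.34, which agrees with the reported F value of 1523.31 up to rounding of the t value.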
(g) (3 marks) In class we showed that the estimator of the slope is unbiased. Show that the
estimator of the intercept is also unbiased. (You may use the result shown in class if it
is useful.)
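$E(\hat{\beta}_0) = E(\bar{Y} - \hat{\beta}_1 \bar{x})$
$= E(\bar{Y}) - \bar{x} E(\hat{\beta}_1)$
$= \beta_0 + \beta_1 \bar{x} - \bar{x} \beta_1$   ($E(\bar{Y}) = \beta_0 + \beta_1 \bar{x}$ since $E(Y_i) = \beta_0 + \beta_1 x_i$)
$= \beta_0$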
4
2. In this question we will consider the Old Faithful data from assignment one. We are using
duration of an eruption (in minutes) to predict the time interval to the next eruption (in
minutes).
Some output from SAS is given on pages 8 and 9. Some numbers have been replaced by
letters.
Parts (a) through (k) relate to MODEL 1.
(a) (4 marks) What are the values of the missing numbers:
(A) = 261
(B) = 75233 - 64229 = 11004
(C) = 64229/75233 = 0.854
(D) = 33.34745/1.20108 = 27.76
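The arithmetic behind the four missing values can be reproduced directly from the numbers printed in the MODEL 1 output. The following is a small sketch; the variable names are chosen here for readability and are not SAS names.

```python
# Values read from the MODEL 1 SAS output.
df_total = 262
df_model = 1
ss_total = 75233.0
ss_model = 64229.0
intercept_estimate = 33.34745
intercept_std_error = 1.20108

df_error = df_total - df_model                          # (A) error degrees of freedom
ss_error = ss_total - ss_model                          # (B) error sum of squares
r_square = ss_model / ss_total                          # (C) R-Square
t_intercept = intercept_estimate / intercept_std_error  # (D) t value for the intercept

print("(A) =", df_error)
print("(B) =", ss_error)
print("(C) =", round(r_square, 3))
print("(D) =", round(t_intercept, 2))
```

Note that the sums of squares in the output are rounded to whole numbers, so dividing (B) by (A) reproduces the printed mean square error of 42.16368 only approximately.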
(b) (2 marks) Give a practical interpretation of the estimate of the intercept.
Since a duration of 0 is not within the scope of the data (and has no practical meaning),
the intercept has no practical interpretation.
(c) (2 marks) Give a practical interpretation of the estimate of the slope.
For every one-minute increase in the duration of an eruption, the estimated mean of the
distribution of the interval between eruptions increases by 13.3 minutes.
(d) For the test with test statistic = 39.03, state:
i. (1 mark) the null and alternative hypotheses
$H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$
ii. (1 mark) the conclusion
There is very strong evidence that the slope is not 0, so we conclude that there exists
a linear relationship between duration and length of interval between eruptions.
5
(Question 2 continued.)
(e) (1 mark) Without calculating the interval, would the 95% confidence interval for the
slope include 0? How do you know?
Since the p-value for the two-sided test that the slope is 0 is < 0.0001, which is less than
0.05, 0 will not be in a 95% confidence interval for the slope.
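Although the question asks for an argument without calculation, the conclusion can be confirmed from the MODEL 1 output. The sketch below is an approximation under a stated assumption: with 261 error degrees of freedom, the t critical value is replaced by the standard normal 0.975 quantile (about 1.96), which is slightly smaller than the exact t value.

```python
from statistics import NormalDist

# Slope estimate and standard error from the MODEL 1 Parameter Estimates table.
b1 = 13.28540
se_b1 = 0.34039

# Assumption: approximate the t critical value (261 df) by the normal quantile.
crit = NormalDist().inv_cdf(0.975)

lower = b1 - crit * se_b1
upper = b1 + crit * se_b1
contains_zero = lower <= 0.0 <= upper

print(f"approximate 95% CI for the slope: ({lower:.2f}, {upper:.2f})")
print("interval contains 0:", contains_zero)
```

The interval lies far above 0, so the small difference between the normal and t critical values does not affect the conclusion.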
(f) (2 marks) The scatterplot of the data on page 9 includes a 95% interval about the fitted
regression line. Does the plot show prediction intervals or confidence intervals for the
mean of the response? How can you tell?
The plot shows prediction intervals. The standard error in a prediction interval includes
error due to the facts that (1) the mean of the response variable is being estimated by the
regression line and (2) according to the model, the distribution of the response variable
varies about the line. As a result, 95% prediction intervals should capture about 95% of
the points, which is true of the intervals in the plot. (The standard error for a confidence
interval for the mean of the response only includes the error due to (1); such intervals
attempt to capture the mean value of the response, and not individual observations.)
(g) (2 marks) As a park ranger whose responsibilities include telling the tourists when the
next eruption will take place, would you rather have confidence intervals for the mean
of the distributions of interval or prediction intervals? Why?
Prediction intervals, because you are trying to predict the interval to the next eruption for
an individual observation, not the mean value of the distribution of interval to the next
eruption, so you want to include the error that accounts for the variation of observations
about the line.
(h) (2 marks) For a duration of 2 minutes, calculate the standard error for the interval
illustrated in part (f).
$\bar{x} = 3.32668$
$SXX = 3274.46426 - 263(3.32668)^2$ (or $= 262(1.38892)$) $= 363.8959$
$s = 6.49336$
So the required standard error is
$6.49336 \sqrt{1 + \dfrac{1}{263} + \dfrac{(2 - 3.32668)^2}{363.8959}} = 6.52$
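The calculation in part (h) can be reproduced as follows. This is a sketch using the values printed in the MODEL 1 output, with SXX obtained from the sample variance of duration as $(n - 1) \times$ variance. It also computes the standard error for a confidence interval for the mean response at the same duration, to show how much smaller that is.

```python
import math

# Values from the MODEL 1 SAS output.
n = 263
xbar = 3.32668          # mean of duration
var_duration = 1.38892  # sample variance of duration
s = 6.49336             # Root MSE
x_star = 2.0            # duration of interest, in minutes

sxx = (n - 1) * var_duration

# Standard error for a prediction interval (new observation at x_star).
se_prediction = s * math.sqrt(1 + 1 / n + (x_star - xbar) ** 2 / sxx)

# Standard error for a confidence interval for the mean response at x_star.
se_mean = s * math.sqrt(1 / n + (x_star - xbar) ** 2 / sxx)

print("SXX                        =", round(sxx, 4))
print("s.e. for prediction        =", round(se_prediction, 2))
print("s.e. for mean of response  =", round(se_mean, 2))
```

The prediction standard error is dominated by the leading 1 under the square root, which is why it is only slightly larger than s itself.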
(i) (2 marks) One of the observations in the data set has a duration of 4.5 minutes and an
interval of 97 minutes. What is the value of the residual for this observation?
$\hat{y} = 33.34745 + 13.28540(4.5) = 93.13$
So the residual is $97 - 93.13 = 3.87$
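The residual is the observed value minus the fitted value. A short sketch of the calculation, using the MODEL 1 parameter estimates:

```python
# Parameter estimates from the MODEL 1 output.
b0 = 33.34745
b1 = 13.28540

duration = 4.5            # minutes
observed_interval = 97.0  # minutes

fitted = b0 + b1 * duration
residual = observed_interval - fitted

print("fitted value =", round(fitted, 2))
print("residual     =", round(residual, 2))
```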
6
(Question 2 continued.)
(j) (2 marks) Suppose a regression was carried out on the same data, but only using the
observations for which the duration is less than 3 minutes. Would the estimate of the
standard deviation of the slope be larger, smaller, or similar? If you don't have enough
information to answer, indicate why.
$s$, the estimate of the standard deviation of the error terms, would be similar because the
scatter about the fitted line is similar in magnitude.
$SXX$ would be smaller because (1) there are fewer observations and (2) the observed
values of duration ($x$) are more tightly scattered about their mean.
So the estimate of the standard deviation of the slope (which is $s / \sqrt{SXX}$) would be larger.


(k) (4 marks) Measures of influence are given for the last 9 observations. What do you
conclude from Hat Diag H and from DFFITS? Explain fully.
For Hat Diag H = $h_{ii}$, none of the given values are $> \dfrac{4}{263} = 0.015$, so we conclude
that none of the 9 observations for which we have $h_{ii}$ are leverage points. That is, none
of these observations are far from the others in the x-direction.
For DFFITS, the value for the 343rd observation is $0.295 > 2\sqrt{\dfrac{2}{263}} = 0.174$, and this value
of DFFITS is quite a bit larger than all of the other values of DFFITS. We conclude that
the 343rd observation is influential. That is, its corresponding predicted value of interval
until the next eruption changes substantially when this observation is removed from the
data.
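The comparison against the cutoffs can be sketched as below, using the Hat Diag H and DFFITS columns from the MODEL 1 Output Statistics table. One assumption is made here: DFFITS is compared to the cutoff in absolute value, whereas the formula sheet states the cutoff without an absolute value. For these nine observations the conclusion is the same either way.

```python
import math

n = 263  # number of observations used

# (observation, Hat Diag H, DFFITS) from the MODEL 1 Output Statistics table.
rows = [
    (335, 0.0050, -0.1046),
    (336, 0.0052, -0.0577),
    (337, 0.0049, -0.0330),
    (338, 0.0064, -0.0816),
    (339, 0.0072, 0.1417),
    (340, 0.0090, -0.0922),
    (341, 0.0073, -0.0326),
    (342, 0.0114, -0.0413),
    (343, 0.0055, 0.2951),
]

leverage_cutoff = 4 / n
dffits_cutoff = 2 * math.sqrt(2 / n)

# Observations whose hat value exceeds 4/n.
high_leverage = [obs for obs, hat, _ in rows if hat > leverage_cutoff]

# Assumption: use |DFFITS| so that large negative values would also be flagged.
influential = [obs for obs, _, dffits in rows if abs(dffits) > dffits_cutoff]

print("leverage cutoff:", round(leverage_cutoff, 3), "flagged:", high_leverage)
print("DFFITS cutoff:  ", round(dffits_cutoff, 3), "flagged:", influential)
```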
(l) In MODEL 2, the variable short is equal to 1 for observations with short duration (less
than 3 minutes) and is equal to 0 for observations with long duration.
i. (1 mark) What is the estimated interval for a duration of 2 minutes?
$\hat{y} = 90.51948 - 31.30847(1) = 59.21$
ii. (2 marks) Give a practical conclusion for the test with test statistic = -35.90.
The p-value is < 0.0001 so we have strong evidence that there is a difference
between the mean interval between eruptions for observations with short duration and
the mean interval between eruptions for observations with long duration.
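MODEL 2 predicts one of only two values, depending on the indicator variable. The sketch below encodes that rule; the function name `predict_interval` is hypothetical and chosen for illustration, and the coefficients are taken from the MODEL 2 Parameter Estimates table.

```python
def predict_interval(duration_minutes):
    """Estimated interval (minutes) under MODEL 2 for a given eruption duration."""
    intercept = 90.51948    # estimated mean interval for long durations
    coef_short = -31.30847  # estimated difference for short durations
    short = 1 if duration_minutes < 3 else 0  # indicator as defined in part (l)
    return intercept + coef_short * short


print(round(predict_interval(2.0), 2))  # short duration
print(round(predict_interval(4.5), 2))  # long duration
```

Because the only predictor is an indicator, the intercept is the estimated mean interval for long durations and the coefficient of short is the estimated difference in means between the two groups, which is the quantity the t test in part ii addresses.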
7
SAS output for question 2.
MODEL 1
The REG Procedure
Number of Observations Read 343
Number of Observations Used 263
Number of Observations with Missing Values 80
Descriptive Statistics

                                   Uncorrected                Standard
Variable          Sum       Mean            SS    Variance   Deviation
Intercept   263.00000    1.00000     263.00000           0           0
duration    874.91667    3.32668    3274.46416     1.38892     1.17852
interval        20394   77.54373       1656660   287.14980    16.94549

Dependent Variable: interval

Analysis of Variance

                            Sum of        Mean
Source              DF     Squares      Square    F Value    Pr > F
Model                1       64229       64229    1523.31    <.0001
Error              (A)         (B)    42.16368
Corrected Total    262       75233

Root MSE            6.49336    R-Square    (C)
Dependent Mean     77.54373    Adj R-Sq    0.8532
Coeff Var           8.37380

Parameter Estimates

                  Parameter    Standard
Variable     DF    Estimate       Error    t Value    Pr > |t|
Intercept     1    33.34745     1.20108        (D)      <.0001
duration      1    13.28540     0.34039      39.03      <.0001

Output Statistics

                           Hat Diag      Cov              ------DFBETAS-----
Obs   Residual  RStudent          H    Ratio    DFFITS   Intercept  duration
335    -9.4891   -1.4683     0.0050   0.9962   -0.1046      0.0187   -0.0520
336    -5.1533   -0.7952     0.0052   1.0081   -0.0577      0.0121   -0.0302
337    -3.0462   -0.4696     0.0049   1.0110   -0.0330      0.0052   -0.0158
338    -6.5681   -1.0148     0.0064   1.0062   -0.0816     -0.0701    0.0521
339    10.7539    1.6678     0.0072   0.9936    0.1417     -0.0571    0.0971
340    -6.2540   -0.9674     0.0090   1.0096   -0.0922     -0.0861    0.0701
341    -2.4675   -0.3808     0.0073   1.0140   -0.0326      0.0134   -0.0225
342    -2.4898   -0.3850     0.0114   1.0181   -0.0413     -0.0397    0.0337
343    24.9610    3.9618     0.0055   0.9012    0.2951     -0.0734    0.1645
8
More SAS output for question 2
MODEL 2
The REG Procedure
Dependent Variable: interval
Number of Observations Read 343
Number of Observations Used 263
Number of Observations with Missing Values 80
Analysis of Variance

                            Sum of        Mean
Source              DF     Squares      Square    F Value    Pr > F
Model                1       62563       62563    1288.72    <.0001
Error              261       12671    48.54632
Corrected Total    262       75233

Root MSE            6.96752    R-Square    0.8316
Dependent Mean     77.54373    Adj R-Sq    0.8309
Coeff Var           8.98528

Parameter Estimates

                  Parameter    Standard
Variable     DF    Estimate       Error    t Value    Pr > |t|
Intercept     1    90.51948     0.56146     161.22      <.0001
short         1   -31.30847     0.87213     -35.90      <.0001
9