Linear Regression Analysis

CHAPTER 12
Section 12.1
1.
a. Stem and Leaf display of temp:
17 0
17 23 stem = tens
17 445 leaf = ones
17 67
17
18 0000011
18 2222
18 445
18 6
18 8
180 appears to be a typical value for this data. The distribution is reasonably symmetric
in appearance and somewhat bell-shaped. The variation in the data is fairly small since
the range of values (188 170 = 18) is fairly small compared to the typical value of 180.
0 889
1 0000 stem = ones
13 leaf = tenths
1 4444
1 66
1 8889
2 11
2
25
26
2
3 00
For the ratio data, a typical value is around 1.6 and the distribution appears to be
positively skewed. The variation in the data is large since the range of the data (3.08 - .84
= 2.24) is very large compared to the typical value of 1.6. The two largest values could
be outliers.
b. The efficiency ratio is not uniquely determined by temperature since there are several
instances in the data of equal temperatures associated with different efficiency ratios. For
364
Chapter 12: Simple Linear Regression and Correlation
example, the five observations with temperatures of 180 each have different efficiency
ratios.
2
A scatter plot of the data appears below. The points exhibit quite a bit of variation and do not appear to
fall close to any straight line or simple curve.
Ratio:
2
170 180 190

Temp:
2. Scatter plots for the emissions vs age:
7
5
6
4
5
Baseline
Reformul
3 4
3
2
2
1
1
0 0
0 5 10 15 0 5 10 15
Age: Age:
With this data the relationship between the age of the lawn mower and its NOx emissions
seems somewhat dubious. One might have expected to see that as the age of the lawn mower
increased the emissions would also increase. We certainly do not see such a pattern. Age
does not seem to be a particularly useful predictor of NOx emission.
3
A scatter plot of the data appears below. The points fall very close to a straight line with an intercept
of approximately 0 and a slope of about 1. This suggests that the two methods are producing
substantially the same concentration measurements.
220
120
y:
20
50 100 150 200
x:
3.
a.
Box plots of both variables:
BOD mass loading BOD mass removal
0 10 20 30 40 50 60 70 80 90
0 50 100 150
x: y:
On both the BOD mass loading boxplot and the BOD mass removal boxplot there are 2
outliers. Both variables are positively skewed.
4
Scatter plot of the data:
BOD mass loading (x) vs BOD mass removal (y)
90
80
70
60
50
y: 40
30
20
10
0
0 50 100 150
x:
There is a strong linear relationship between BOD mass loading and BOD mass removal.
As the loading increases, so does the removal. The two outliers seen on each of the
boxplots are seen to be correlated here. There is one observation that appears not to
match the liner pattern. This value is (37, 9). One might have expected a larger value for
BOD mass removal.
5.
a. The scatter plot with axes intersecting at (0,0) is shown below.
Temperature (x) vs Elongation (y)
250
200
150
y:
100
50
0 20 40 60 80 100
x:
5
The scatter plot with axes intersecting at (55, 100) is shown below.
Temperature (x) vs Elongation (y)
250
200
y:
150
100
55 65 75 85
x:
b. A parabola appears to provide a good fit to both graphs.
6. There appears to be a linear relationship between racket resonance frequency and sum of
peak-to-peak acceleration. As the resonance frequency increases the sum of peak-to-peak
acceleration tends to decrease. However, there is not a perfect relationship. Variation does
exist. One should also notice that there are two tennis rackets that appear to differ from the
other 21 rackets. Both have very high resonance frequency values. One might investigate if
these rackets differ in other ways as well.
7.
a.
Y 2500 1800 1.3 2500 5050
b. expected change = slope = 1 1.3
expected change = 100 1

c. 130
d. expected change =
100 1 130
6
e.
Y 2000 1800 1.3 2000 4400 , and 350 , so P Y 5000
5000 4400
P Z P Z 1.71 .0436
350
f. Now E(Y) = 5050, so P Y 5000 P Z .14 .5557
g. E (Y2 Y1 ) E (Y2 ) E (Y1 ) 5050 4400 650 , and

V (Y2 Y1 ) V (Y2 ) V (Y1 ) 350 350 245,000 , so the s.d. of
2 2
Y2 Y1 494.97 . Thus
1000 650
P (Y2 Y1 0) P z P Z .71 .2389
494.97
h. The standard deviation of Y2 Y1 494.97 (from c), and

E (Y2 Y1 ) 1800 1.3x 2 1800 1.3 x1 1.3 x 2 x1 . Thus
1.3 x 2 x1
P (Y2 Y1 ) P(Y2 Y1 0) P z .95
494.97 implies that
1.3 x 2 x1
1.645
494.97 , so x 2 x1 626.33 .
8.
a. 1 expected change in flow rate (y) associated with a one inch increase in pressure
drop (x) = .095.
b. We expect flow rate to decrease by 5 1 .475 .
c.
Y 10 .12 .09510 .83, and Y 15 .12 .09515 1.305 .
.835 .830
P Y .835 P Z P Z .20 .4207
d. .025
.840 .830
P Y .840 P Z P Z .40 .3446
.025
e. Let Y1 and Y2 denote pressure drops for flow rates of 10 and 11, respectively. Then
Y 11 .925, so Y1 - Y2 has expected value .830 - .925 = -.095, and s.d.
7
.025 2 .025 2
.035355 . Thus
.095
P (Y1 Y2 ) P(Y1 Y2 0) P z P Z 2.69 .0036
.035355
9. Y has expected value 5000 when x = 100 and 6000 when x = 200, so the two probabilities
500 500
P z .05 P z .10
become and . These two equations are
contradictory, so the answer to the question posed is no.
10.
a. 1 expected change for a one degree increase = -.01, and 10 1 .1 is the
expected change for a 10 degree increase.
b.
Y 200 5.00 .01 200 3 , and Y 250 2.5 .
c. The probability that the first observation is between 2.4 and 2.6 is
2.4 2.5 2.6 2.5
P 2.4 Y 2.6 P Z
.075 .075
P 1.33 Z 1.33 .8164 . The probability that any particular one of the other
four observations is between 2.4 and 2.6 is also .8164, so the probability that all five are
between 2.4 and 2.6 is

.8164 5 .3627 .
d. Let Y1 and Y2 denote the times at the higher and lower temperatures, respectively. Then
Y1 - Y2 has expected value 5.00 .01 x 1 5.00 .01x .01 . The standard
deviation of Y1 - Y2 is
.075 2 .075 2 .10607 . Thus
.01
P (Y1 Y2 0) P z P Z .09 .4641
.10607 .
8
Section 12.2
11.
S xx 39,095
517
2
20,002.929
a. 14 ,
S xy 25,825
517 346 13047.714 1
S xy 13,047.714
.652
14 ;
S xx 20,002.929 ;
y 1x 346 .652 517
0 .626
n 14 , so the equation of the least squares
regression line is y .626 .652 x .
y 35 .626 .652 35 23.456

b. . The residual is
y y 21 23.456 2.456 .
S yy 17,454
346
2
8902.857
c. 14 , so
SSE 8902.857 .65213047.714 395.747 .
SSE 395.747
5.743
n2 12 .
SSE 395.747
SST S yy 8902.857 r 1 SST 1 8902.857 .956
2
d. ; .
e. Without the two upper extreme observations, the new summary values are
n 12, x 272, x 2 8322, y 181, y 2 3729, xy 5320 . The new
S xx 2156.667, S yy 998.917, S xy 1217.333 .56445
. New 1 and
0 2.2891 , which yields the new equation y 2.2891 .56445 x . Removing
the two values changes the position of the line considerably, and the slope slightly. The
311.79
r2 1 .6879
new 998.917 , which is much worse than that of the original set of
observations.
9
x i 200 y i 5 . 37 x 2i 12.000 y 2i 9.3501 xi y i 333

For this data, n = 4, , , , , .
S xx 12,000
200 2000 S 9.3501 5.37 2.140875
2 2
yy
4 , 4 , and
S xy 333
200 5.37 64.5 1 xy S 64 . 5
.03225
4 .
S xx 2000 and
5.37 200
0 .03225 .27000
4 4 .
SSE S yy 1 S xy 2.14085 .03225 64.5 .060750
.
SSE .060750
r2 1 1 .972 2
SST 2.14085 . This is a very high value of r , which confirms the
authors claim that there is a strong linear relationship between the two variables.
12.
x i 4308 y i 40. 09 x 2i 773,790 y 2i 76.8823
a. n = 24, , , , ,
S xx 773,790
4308 2 504.0
xi y i 7,243.65 24
. ,
S yy 76.8823
40.09 2
9.9153
24 , and
S xy 7,243.65
4308 40.09 45.8246 1
S xy 45.8246
.09092
24 . S xx 504 and
40.09 4308
0 .09092 14.6497
24 24 . The equation of the estimated regression
line is y 14 . 6497 .09092 x .

b. When x = 182, y 14.6497 .09092 182 1.8997 . So when the tank

temperature is 182, we would predict an efficiency ratio of 1.8997.
c. The four observations for which temperature is 182 are: (182, .90), (182, 1.81), (182,
.90 1.8997 0.9977 ,
1.94), and (182, 2.68). Their corresponding residuals are:
1.81 1.8997 0.0877 , 1.94 1.8997 0.0423 , 2.68 1.8997 0.7823 .
These residuals do not all have the same sign because in the cases of the first two pairs of
observations, the observed efficiency ratios were smaller than the predicted value of
1.8997. Whereas, in the cases of the last two pairs of observations, the observed
efficiency ratios were larger than the predicted value.
10
SSE S yy 1 S xy 9.9153 .09092 45.8246 5.7489

d. .
SSE 5.7489
r2 1 1 .4202
SST 9.9153 . (42.02% of the observed variation in
efficiency ratio can be attributed to the approximate linear relationship between the
efficiency ratio and the tank temperature.)
13.
a. The following stem and leaf display shows that: a typical value for this data is a number
in the low 40s. There is some positive skew in the data. There are some potential outliers
(79.5 and 80.0), and there is a reasonably large amount of variation in the data (e.g., the
spread 80.0-29.8 = 50.2 is large compared with the typical values in the low 40s).
29
3 33 stem = tens
3 5566677889 leaf = ones
4 1223
4 56689
51
5
62
69
7
79
80
b. No, the strength values are not uniquely determined by the MoE values. For example,
note that the two pairs of observations having strength values of 42.8 have different MoE
values.
c. The least squares line is y 3.2925 .10748 x . For a beam whose modulus of

elasticity is x = 40, the predicted strength would be y 3.2925 .10748 40 7.59

. The value x = 100 is far beyond the range of the x values in the data, so it would be
dangerous (i.e., potentially misleading) to extrapolate the linear relationship that far.
d. From the output, SSE = 18.736, SST = 71.605, and the coefficient of determination is r2 =
.738 (or 73.8%). The r2 value is large, which suggests that the linear relationship is a
useful approximation to the true relationship between these two variables.
11
e.
Rainfall volume (x) vs Runoff volume (y)
100
90
80
70
60
50
y:
40
30
20
10
0
0 50 100
x:
Yes, the scatterplot shows a strong linear relationship between rainfall volume and runoff
volume, thus it supports the use of the simple linear regression model.
S xx 63040
798 2 20,586.4
f. x 53.200 , y 42.867 , 15 ,
S yy 41,999
643 2
14,435.7
15 , and
S xy 51,232
798 643
17,024.4 1
S xy 17,024.4
.82697
15 .
S xx 20,586. 4 and
0 42.867 .82697 53.2 1.1278 .

y50 1.1278 .82697 50 40.2207

g. .
SSE S yy 1 S xy 14,435.7 .82697 17,324.4 357.07

h. .
SSE 357.07
s 5.24
n2 13 .
SSE 357.07
r2 1 1 .9753
i. SST 14,435.7 . So 97.53% of the observed variation in
runoff volume can be attributed to the simple linear regression relationship between
runoff and rainfall.
12
Note: n = 23 in this study.

j. For a one (mg/cm2) increase in dissolved material, one would expect a .144 (g/l) increase
in calcium content. Secondly, 86% of the observed variation in calcium content can be
attributed to the simple linear regression relationship between calcium content and
dissolved material.
y50 3.678 .144 50 10.878

k.
SSE
r 2 .86 1
l. SST , so SSE SST 1 .86 320.398 .14 44.85572 .
SSE 44.85572
s 1.46
Then n2 21
14.
15 987.645 142510.68 404.3250
1 .00736023
15139,037.25 1425
2
a.
54,933.7500
10.68 .007360231425
0 1.41122185
15 , y 1.4112 .007360 x .
b. 1 .00736023
9
y 0 1 x 32
c. With x now denoting temperature in C , 5
9

0 32 1 1 x 1.175695 .0132484 x
5 , so the new 1 is -.0132484 and
1.175695
the new 0 .
d. Using the equation of a, predicted

y 0 1 200 .0608 , but the deflection
factor cannot be negative.
13
x i 3300 y i 5010 x i2 913,750 y 2i 2,207,100 xi y i 1,413,500

N = 14, , , , ,
3, 256 , 000
1 1.71143233
e. 1,902,500 45.55190543 , so we use the equation
, 0
y 45.5519 1.7114 x .
f.
Y 225 45.5519 1.7114 225 339.51
g. Estimated expected change 50 1 85.57
h. No, the value 500 is outside the range of x values for which observations were available
(the danger of extrapolation).
15.
a.
0 .3651 , 1 .9668
b. .8485
c. .1932
d. SST = 1.4533, 71.7% of this variation can be explained by the model. Note:
SSR 1.0427
.717
SST 1.4533 which matches R-squared on output.
16.
a. Yes a scatter plot of the data shows a strong, linear pattern, and r2 = 98.5%.
b. From the output, the estimated regression line is = 321.878 + 156.711x, where x =
absorbance and y = resistance angle. For x = .300, = 321.878 + 156.711(.300) =
368.89.
c. The estimated regression line serves as an estimate both for a single y at a given x-value
and for the true average y at a given x-value. Hence, our estimate for y when x = .300 is
also 368.89.
14
404.325
1 .00736023 1.41122185
d. 54.933.75 , 0 ,
SSE 7.8518 1.41122185 10.68 .00736023 987.645 .049245 ,
.049245
s2 .003788
13 , and s .06155
SST 7.8518
10.68 2 .24764 r2 1
.049245
1 .199 .801
e. 15 so .24764
17.
Using the given yis and the formula y i 45.5519 1.7114 x i ,

a.
SSE 150 125.6 2 ... 670 639.0 2 16,213.64 . The computation formula gives
SSE 2,207,100 45.55190543 5010 1.71143233 1,413,500 16,205.45
SST 2,207,100
5010 2 414,235.71 r2 1
16,205.45
.961
b. 14 so 414,235.71 .
18.
a.
1200
700
y
200
0 50 100
x
According to the scatter plot of the data, a simple linear regression model does appear to
be plausible.
b. The regression equation is y = 138 + 9.31 x
c. The desired value is the coefficient of determination, r 2 99.0% .
d. The new equation is y* = 190 + 7.55 x*. This new equation appears to differ
significantly. If we were to predict a value of y* for x* = 50, the value would be 567.9,
where using the original data, the predicted value for x = 50 would be 603.5.
15
y 1xi
0 i
19. Substitution of n and 1 for bo and b1 on the left hand side of the normal

n y i 1xi
1xi y i
equations yields n from the first equation and

xi y i 1xi
1x i
2

xi y i 1 nx i xi
2 2

n n n
xi y i nxi y i xi y i
xi y i
n n n from the second equation.
x is substituted for x in 0 1 x , y results, so that x, y is on the

20. We show that when
y i 1xi
x x 1 x y 1 x 1 x y
line
y 0 1 :
0 1
n .
We wish to find b1 to minimize f (b1 ) y i b1 x i . Equating f b1 to 0 yields

2

21.
x y
b1 i 2 i
2 y i b1 x i x i 0 so x i y i b1 x i and
2
x i . The least squares estimator of 1
x Y
1 i 2 i
xi
is thus .
22.
x
a. Subtracting x from each i shifts the plot in a rigid fashion x units to the left without
otherwise altering its character. The last squares line for the new plot will thus have the
same slope as the one for the old plot. Since the new line is x units to the left of the old
one, the new y intercept (height at x = 0) is the height of the old line at x = x , which is
0 1 x y (since from exercise 26, x, y is on the old line). Thus the new y
intercept is y .
f
y i b0 b1 xi x
2
b. We wish b0 and b1 to minimize f(b0, b1) = . Equating
b0
f
b1 to 0 yields nb0 b1 xi x y i , b0 xi x b1 xi x
2
to
xi x xi x y i . Since xi x 0 , b0 y , and since
2
16
xi x y i xi x y i y [ because xi x y y xi x ], b1 1 .
Thus 0
* Y and 1* 1 .
23. For data set #1, r2 = .43 and s 4.03 ; whereas these quantities are .99 and 4.03 for #2,
and .99 and 1.90 for #3. In general, one hopes for both large r2 (large % of variation
explained) and small s (indicating that observations dont deviate much from the estimated
line). Simple linear regression would thus seem to be most effective in the third situation.
Section 12.3
24.
V
350 .0175
2
xi x 7,000,000 , so
2 1
a. 7,000,000 and the standard

deviation of 1 is .0175 .1323 .

P 1.0 1 1.5 P
1.0 1.25
Z
1.5 1.25

b. .1323 .1323
P 1.89 Z 1.89 .9412 .
xi x 1,100,000 now, which is smaller

2
c. Although n = 11 here and n = 7 in a,

than in a. Because this appears in the denominator of V 1 , the variance is smaller for
the choice of x values in a.
25.
a. 1 .00736023 , 0 1.41122185 , so
SSE 7.8518 1.41122185 10.68 .00736023 987.645 .04925 ,
s2 .003788
2 .00000103
x xi / n
2
s 2 .003788 , s .06155 .
1 2
3662.25
i ,
s 1 .00000103 .001017 .
1 1
estimated s.d. of
b. .00736 2.160 .001017 .00736 .00220 .00956,.00516
17
Let 1 denote the true average change in runoff for each 1 m3 increase in rainfall. To test the
.82697
t 1 22.64
H o : 1 0 H a : 1 0 s .03652
hypotheses vs. , the calculated t statistic is 1
which (from the printout) has an associated p-value of P = 0.000. Therefore, since the p-value is so
small, Ho is rejected and we conclude that there is a useful linear relationship between runoff and
rainfall.
A confidence interval for 1 is based on n 2 = 15 2 = 13 degrees of freedom.
t.025,13 2.160
, so the interval estimate is
1 t.025,13 s .82697 2.160 .03652 .748,.906
1
. Therefore, we can be
confident that the true average change in runoff, for each 1 m3 increase in rainfall, is
somewhere between .748 m3 and .906 m3.
26.
t.025, 25 2.060
a. Error d.f. = n 2 = 25, , and so the desired confidence interval is
1 t .025, 25 s .10748 2.060 .01280 .081,.134
1
. We are 95%
confident that the true average change in strength associated with a 1 GPa increase in
modulus of elasticity is between .081 MPa and .134 MPa.
H o : 1 .1 vs. H a : 1 .1 . The calculated t statistic is

b. We wish to test
.1 .10748 .1
t 1 .58
s .01280
1 , which yields a p-value of .277 at 25 d.f. Thus, we
fail to reject Ho; i.e., there is not enough evidence to contradict the prior belief.
27.
H o : 1 0 ; H a : 1 0
a.
t t / 2, n 2 t 3.106
RR: or
t 5.29 : Reject Ho. The slope differs significantly from 0, and the model appears to
be useful.
b. At the level 0.01 , reject Ho if the p-value is less than 0.01. In this case, the
reported p-value was 0.000, therefore reject Ho. The conclusion is the same as that of
part a.
H o : 1 1.5 ; H a : 1 1.5
c.
t t , n 2
RR: or t 2.718
18
0.9668 1.5
t 2.92
0.1829 : Reject Ho. The data contradict the prior belief.
28.
1 t .025,15 s
a. We want a 95% CI for (1: 1
. First, we need our point estimate, 1 .
S xx 3056.69
222.1 2 155.019
Using the given summary statistics, 17 ,
S xy 2759.6
222.1193 238.112 1
S xy

238 . 112
1.536
17 S xx 115 .019
, and . We need
0
193 1 . 536 222 . 1 8.715
17 to calculate the SSE:
418.2494
s 5.28
SSE 2975 8.715193 1.536 2759.6 418.2494 . Then 15
5.28
s .424
1
155.019 t 2.131,
and . With .025,15 our CI is
1.536 2.131 .424 = ( .632, 2.440). With 95% confidence, we estimate that the
change in reported nausea percentage for every one-unit change in motion sickness dose
is between .632 and 2.440.
H o : 1 0 vs H a : 1 0 , and the test statistic is

b. We test the hypotheses
1.536
t 3.6226
.424 . With df=15, the two-tailed p-value = 2P( t > 3.6226) = 2( .001)
= .002. With a p-value of .002, we would reject the null hypothesis at most reasonable
significance levels. This suggests that there is a useful linear relationship between
motion sickness dose and reported nausea.
c. No. A regression model is only useful for estimating values of nausea % when using
dosages between 6.0 and 17.6 the range of values sampled.
xi 216.1 ,
d. Removing the point (6.0, 2.50), the new summary stats are: n = 16,
y i 191.5 , x i 3020.69 , y i 2968.75 , xi y i 2744.6 , and then
2 2
1 1.561 , 0 9.118 , SSE = 430.5264, s 5.55 , s 1 .551 , and the new CI

is 1.561 2.145 .551 , or ( .379, 2.743). The interval is a little wider. But removing
the one observation did not change it that much. The observation does not seem to be
exerting undue influence.
19
e. A scatter plot, generated by Minitab, supports the decision to use linear regression
analysis.
1.0
0.9
0.8
mist droplets
0.7
0.6
0.5
0.4
0 100 200 300 400 500 600 700 800 900 1000
fluid flow velocity
f. We are asked for the coefficient of determination, r2. From the Minitab output, r2 = .931
(which is close to the hand calculated value, the difference being accounted for by round-
off error.)
g. Increasing x from 100 to 1000 means an increase of 900. If, as a result, the average y
.6
.0006667
were to increase by .6, the slope would be 900 . We should test the
hypotheses o
H : 1 . 0006667 vs.
H a : 1 . 0006667 . The test statistic is
.00062108 .0006667
t .601
.00007579 , which is not significant. There is not
sufficient evidence that with an increase from 100 to 1000, the true average increase in y
is less than .6.
1 . Using the values from the Minitab

h. We are asked for a confidence interval for

output, we have .00062108 2.776 .00007579 (.00041069,.00083147)
20
i. Let d = the true mean difference in velocity between the two planes. We have 23 pairs of
data that we will use to test H : = 0 v. H : 0. From software, x d = 0.2913 with s =
0 d a d d
0.2913 0
0.1748, and so t = 0.1748 8, which has a two-sided P-value of 0.000 at 22 df.
Hence, we strongly reject the null hypothesis and conclude there is a statistically
significant difference in true average velocity in the two planes. [Note: A normal
probability plot of the differences shows one mild outlier, so we have slight concern
about the results of the t procedure.]
j. Let 1 denote the true slope for the linear relationship between Level velocity and
Level velocity. We wish to test H0: 1 = 1 v. Ha: 1 < 1. Using the relevant numbers
b1 1 0.65393 1

s (b1 ) 0.05947
provided, t = = 5.8, which has a one-sided P-value at 232 = 21 df
of P(t < 5.8) 0. Hence, we strongly reject the null hypothesis and conclude the same as
the authors; i.e., the true slope of this regression relationship is significantly less than 1.
29.
a. From Exercise 23, which also refers to Exercise 19, SSE = 16.205.45, so
36.75
s .0997
s 1350.454 , s 36.75 , and 1
2
368.636 . Thus
1.711
t 17.2 4.318 t .0005,14
.0997 , so p-value < .001. Because the p-value < .01,
H o : 1 0 is rejected at level .01 in favor of the conclusion that the model is useful
1 0 .
b. The C.I. for 1 is 1.711 2.179 .0997 1.711 .217 1.494,1.928 . Thus

the C.I. for 10 1 is 14.94,19.28 .

30. SSE = 124,039.58 (72.958547)(1574.8) (.04103377)(222657.88) = 7.9679, and SST =

39.828
Source df SS MS f
Regr 1 31.860 31.860 18.0
Error 18 7.968 1.77
Total 19 39.828
21
F.001,1,18 15.38 18.0

H o : 1 0 is rejected and the
Lets use ( = .001. Then , so
S 18,921.8295 , so
model is judged useful. s 1.77 1.33041347 , xx
.04103377
t 4.2426
1.33041347 / 18,921.8295 2
, and t 4.2426 18.0 f .
2

E 0

E y i 1xi

31. We use the fact that 1 is unbiased for 1 . n
E y i E Yi

n

E 1 x
n
1 x
0 1 xi
1 x 0 1 x 1 x 0
n
32.
E (Yi ) 0 1 xi and, hence, E (Y ) 0 1 x .
a. Under the regression model,
E (Yi Y ) 1 ( xi x ) , and
Therefore,
( xi x )(Yi Y ) ( xi x ) E[Yi Y ]

E 1 E
( xi x )
2
( xi x ) 2

( xi x ) 1( xi x )
1
( xi x ) 2
1
( xi x ) 2 ( xi x ) 2 .
xi x Yi Y xi x Yi
1 1
2 1
b. With
c x i x , c c (since

1
2
xi x Y Y xi x Y 0 0 ), so V 1 c 2 xi x Var Yi
1 2 2
x x 2
2

x i x xi2 xi / n , as desired.
i
c2 2 2
xi2 xi / n
2

t 1
33. s . The numerator of 1 will be changed by the factor cd (since
xi y i and xi y i appear) while the denominator of 1 will change by the factor
both
d
xi2 and xi appear). Thus 1 will change by the factor c . Because
2
c2 (since both
22
SSE y i y i , SSE will change by the factor d2, so s will change by the factor d.
2
d c
1
Since in t changes by the factor c, t itself will change by c d , or not at all.
4 14
.831
34. The numerator of d is |1 2| = 1, and the denominator is 324.40 , so
1
d 1.20
.831 . The approximate power curve is for n 2 df = 13, and is read from
Table A.17 as approximately .1.
Section 12.4
35.
a. The mean of the x data in Exercise 12.15 is x 45.11 . Since x = 40 is closer to 45.11
than is x = 60, the quantity

40 x 2 must be smaller than 60 x 2 . Therefore, since
s y s y
these quantities are the only ones that are different in the two values, the value for
s y
x = 40 must necessarily be smaller than the for x = 60. Said briefly, the closer x is to
x , the smaller the value of s y .
b. From the printout in Exercise 12.15, the error degrees of freedom is df = 25.
t.025, 25 2.060
, so the interval estimate when x = 40 is : 7.592 2.060 .179
7.592 .369 7.223,7.961 . We estimate, with a high degree of confidence, that
the true average strength for all beams whose MoE is 40 GPa is between 7.223 MPa and
7.961 MPa.
c. From the printout in Exercise 12.15, s = .8657, so the 95% prediction interval is
y t .025, 25 s 2 s y2 7.592 2.060 .8657 .179
2 2
7.592 1.821 5.771,9.413 . Note that the prediction interval is almost 5 times
as wide as the confidence interval.
d. For two 95% intervals, the simultaneous confidence level is at least 100(1 2(.05)) =
90%
36.
23
y125 y125 78.088 t.05,18 1.734

a. We wish to find a 90% CI for : , , and
1 125 140.895
2
s y s .1674
20 18,921.8295 .Putting it together, we get
78.088 1.734 .1674 77.797, 78.378
b. We want a 90% PI: Only the standard error changes:

1 125 140.895
2
s y s 1 .6860
20 18,921.8295 , so the PI is
78.088 1.734 .6860 76.898, 79.277
c. Because the x* of 115 is farther away from x than the previous value, the term
x
x 2
will be larger, making the standard error larger, and thus the width of the
interval is wider.
d. We would be testing to see if the filtration rate were 125 kg-DS/m/h, would the average
moisture content of the compressed pellets be less than 80%. The test statistic is
78.088 80
t 11.42
.1674 , and with 18 df the p-value is P(t<-11.42) 0.00. We
would reject Ho. There is significant evidence to prove that the true average moisture
content when filtration rate is 125 is less than 80%.
37.
a. A 95% CI for
Y 500 : y 500 .311 .00143 500 .40 and
1 500 471.54
2
s y 500 .131 .03775
13 131,519.23 , so the interval is
y 500 t.025,11 S y 500 .40 2.210 .03775 .40 .08 .32,.48
b. The width at x = 400 will be wider than that of x = 500 because x = 400 is farther away
from the mean ( x 471.54 ).
c. A 95% CI for 1 :
1 t.025,11 s .00143 2.201 .0003602 .000637,.002223
1
H 0 : y 400 .25 H 0 : y 400 .25

d. We wish to test vs. . The test statistic is
y 400 .25
t
s y 400 t t .025,11 2.201
, and we reject Ho if .
24
y 400 .311 .00143 400 .2614

and
1 400 471.54
2
s y 400 .131 .0445
13 131,519.23 , so the calculated
.2614 .25
t .2561
.0445 , which is not 2.201 , so we do not reject Ho. This sample
data does not contradict the prior belief.
25
y 40 1.128 .82697 40 31.95 t.025,13 2.160

e. , ; a 95% PI for runoff is
31.95 2.160 5.24 2 1.44 2 31.95 11.74 20.21,43.69
. No, the resulting
interval is very wide, therefore the available information is not very precise.
f. x 798, x 2 63,040 which gives S xx 20,586.4 , which in turn gives
1 50 53.20 2
s y 50 5.24 1.358
15 20,586.4
, so the PI for runoff when x = 50 is
40.22 2.160 5.24 1.358 40.22 11.69 28.53,51.92
2 2
. The simultaneous
prediction level for the two intervals is at least 100 1 2 % 90 % .
38.
a. Yes, the scatter plot shows a reasonably linear relationship between percent of total
suspended solids removed (y) and amount removed (x).
Scatterplot of y vs x
60
50
40
30
y
20
10
0
0 50 100 150 200 250
x
b. Using software, = 52.63 0.2204x. These coefficients can also be obtained from the
summary quantities provided.
2081.43
c. Using software, r = 2969
2.32 = 70.1%.
b1 0 0.2204

s (b1 ) 0.0509
d. We wish to test H0: 1 = 0 v. Ha: 1 0. Using software, t = = 4.33,
which has a 2-sided P-value at 102 = 8 df of P(|t| > 4.33) 0.003 < .05. Hence, we reject
H0 at the = .05 level and conclude that a statistically useful linear relationship exists
between x and y.
e. The hypothesized slope is (2-unit decrease in mean y)/(10-unit increase in x) = 0.2.

Specifically, we wish to test H0: 1 = 0.2 v. Ha: 1 < 0.2. The revised test statistic is t =
b1 (0.2) 0.0204

s (b1 ) 0.0509
= 0.4, and the corresponding P-value at 8 df is P(t < 0.4) = P(t
> 0.4) = .350 from Table A.8. Hence, we fail to reject H0 at = .05: the data do not
26
provide significant evidence that the true average decrease in y associated with a 10kL
increase in amount filtered exceeds 2%.
f. Plug in x = 100kL into software; the resulting interval estimate for y.100 is (22.36, 38.82).
We are 95% confident that the true average % of total suspended solids removed when
100,000 L are filtered is between 22.36% and 38.82%. Since x = 100 is nearer the
average than x = 200, a CI for y.200 will be wider.
g. Again use software; the resulting PI for Y is (4.94, 56.24). We are 95% confident that the
total suspended solids removed for one sample of 100,000 L filtered is between 4.94%
and 56.24%. This interval is very wide, much wider than the CI in part f. However, this
PI is narrower than a PI at x = 200, since x = 200 is farther from the mean than x = 100.
39. 95% CI: (462.1, 597.7); midpoint = 529.9; .025,8

t
;

2.306 529.9 2.306 s 0 1 15 597.7
s 15 29.402
( 0 1 ( 99% CI = 529.9 3.355 29.402 431.3,628.5
40.
a. Use software to find s(b1) = 0.065. A 95% CI for 1 is b1 t.025,11s(b1) = 0.433
2.201(0.065) = (0.576, 0.290). We want the effect of a .1-unit change in x, i.e. 0.11;
the desired CI is just (0.058, 0.029). We are 95% confident that the decrease in mean
Fermi level position associated with a 0.1 increase in Ge concentration is between 0.029
and 0.058.
b. Using software, a 95% CI for y.0.50 is (0.4566, 0.5542). We are 95% confident that the
mean Fermi position level when Ge concentration equals .50 is between 0.4566 and
0.5542.
c. Again using software, a 95% PI for Y when x = 0.50 is (0.3359, 0.6749). We are 95%
confident that the Fermi position level for a single observation to be made at 0.50 Ge
concentration will be between 0.3359 and 0.6749. As always, this prediction interval is
markedly wider than the corresponding CI.
d. To obtain simultaneous confidence of at least 97% for the three intervals, we compute
each one using confidence level 99%. Using software, the intervals are:
for x = .3, (0.3450, 0.8389);
for x = .5, (0.2662, 0.7446);
for x = .7, (0.1807, 0.6570).
41.
a. 0.40 is closer to x.
b.

0 1 0.40 t / 2,n 2 s 0.40
0 1
or 0.8104 2.101 0.0311
0.745,0.876
27
0 1 1.20 t / 2,n 2 s 2 s 2 0 1 1.20

c. or
0.2912 2.101 0.1049 2 0.0352 2 .059,.523
28
H o : 1 0 vs H a : 1 0 . The test statistic

d. We wish to test
10.6026
t 10.62
.9985 leads to a p-value of < .006 ( 2P( t > 4.0 ) from the 7 df row of
table A.8), and Ho is rejected since the p-value is smaller than any reasonable . The
data suggests that this model does specify a useful relationship between chlorine flow and
etch rate.
e.

A 95% confidence interval for 1 : 10.6026 2.365 .9985 8.24,12.96 . We

can be highly confident that when the flow rate is increased by 1 SCCM, the associated
expected change in etch rate will be between 824 and 1296 A/min.
1 9 3.0 2.667
2
38.256 2.365 2.546
9 58.50
f. A 95% CI for Y 3.0 :
38.256 2.365 2.546 .35805 38.256 2.156 36.100,40.412 , or
3610.0 to 4041.2 A/min.
1 9 3.0 2.667
2
38.256 2.365 2.546 1
9 58.50
g. The 95% PI is
38.256 2.365 2.546 1.06 38.256 6.398 31.859,44.655 , or
3185.9 to 4465.5 A/min.
h. The intervals for x* = 2.5 will be narrower than those above because 2.5 is closer to the
mean than is 3.0.
i. No. A value of 6.0 is not in the range of observed x values, therefore predicting at that
point is meaningless.
42. Choice a will be the smallest, with d being largest. a is less than b and c (obviously), and b
and c are both smaller than d. Nothing can be said about the relationship between b and c.
29
a. There is a linear pattern in the scatter plot, although the pot also shows a reasonable
amount of variation about any straight line fit to the data. The simple linear regression
model provides a sensible starting point for a formal analysis.
1.060
t 4.39
H : 1 0 vs H a : 1 0 , Minitab calculates
b. In testing o .241 with a
corresponding P-value of .001. Hence, Ho is rejected. The simple linear regression model
does appear to specify a useful relationship.
c.

A confidence interval for 0 1 75 is requested. The interval is centered at
1 n 75 x 2
s s 14.83
0 1 75 n nxi2 xi 2

0 1 75 435.9 . (using s = 54.803).
Thus a 95% CI is 435.9 2.17914.83 403.6,468.2 .
1 x x xi x Yi
0 1 x Y 1 x 1 x Y x x 1
n Y i
S XX
d Y
i i
43. where
1 x x xi x
di
n S XX
. Thus,

Var 0 1 x d Var Y d
i
2
i
2
i
2
1 ( x x )( xi x ) ( x x ) 2 ( xi x ) 2
2 n 2
2
nS xx

nS 2

XX
1
2 n 2 2

( x x ) ( xi x ) ( x x )

2

( xi x ) 2

n nS xx S 2XX

1 ( x x ) 0 ( x x ) S XX
2 1 (x x) 2
2 2 2
2
n nS xx S XX n S XX
44.
a. Yes: a normal probability plot of yield load (not shown) is quite linear.
b. From software, y = 498.2 and sy = 103.6. Hence, a 95% CI for y is 498.2 t.025,14(103.6)
= 498.2 (2.145)(103.6) = (440.83, 555.57). We are 95% confident that the true average
yield load is between 440.83 N and 555.57 N.
c. Yes: the t-statistic and P-value associated with the hypotheses H0: 1 = 0 v. Ha: 1 0 are t
= 3.88 and P = 0.002, respectively. At any reasonable significance level, we reject H0 and
conclude that a useful linear relationship exists between yield load and torque.
d. Yes: prediction intervals based upon this data will be too wide to be useful. For example,
a PI for Y when x = 2.0 is (345.4, 672.4), using software. This interval estimate includes
the entire range of Y-values in the data.
30
Section 12.5
45. Most people acquire a license as soon as they become eligible. If, for example, the minimum
age for obtaining a license is 16, then the time since acquiring a license, y, is usually related to
age by the equation y x 16 , which is the equation of a straight line. In other words, the
majority of people in a sample will have y values that closely follow the line y x 16 .
46.
a. Summary values: x 44,615 , x 2 170,355,425 , y 3,860 ,

y 2 1,284,450 , xy 14,755,500 , n 12 . Using these values we calculate
S xx 4,480,572.92 S yy 42,816.67 S xy 404,391.67
, , and . So
S xy
r .9233
S xx S yy
.
b. The value of r does not depend on which of the two variables is labeled as the x variable.
Thus, had we let x = RBOT time and y = TOST time, the value of r would have remained
the same.
c. The value of r does no depend on the unit of measure for either variable. Thus, had we
expressed RBOT time in hours instead of minutes, the value of r would have remained
the same.
31
Normal Probability Plot
.999
.99
.95
Probability
.80
.50
.20
.05
.01
.001
2800 3800 4800

TOST:
Av erage: 3717.92 Anderson-Darling Normality Test
StDev : 638.220 A-Squared: 0.197
N: 12 P-Value: 0.856
Normal Probability Plot
.999
.99
.95
Probability
.80
.50
.20
.05
.01
.001
200 300 400

RBOT:
Av erage: 321.667 Anderson-Darling Normality Test
StDev : 62.3893 A-Squared: 0.446
N: 12 P-Value: 0.232
Both TOST time and ROBT time appear to have come from normally distributed
populations.
r n2
t
d. H 0 : 0 vs H a : 0 . 1 r 2 ; Reject Ho at level .05 if either
t t.025,10 2.228
or t 2.228 . r = .923, t = 7.58, so Ho should be rejected. The
model is useful.
32
S xx 251,970
1950
2
40,720 S yy 130.6074
47.92
2
3.033711
e. 18 , 18 ,
S xy 5530.92
1950 47.92
339.586667
and 18 , so
339.586667
r .9662
40,720 3.033711 . There is a very strong positive correlation
between the two variables.
f. Because the association between the variables is positive, the specimen with the larger
shear force will tend to have a larger percent dry fiber weight.
g. Changing the units of measurement on either (or both) variables will have no effect on
the calculated value of r, because any change in units will affect both the numerator and
denominator of r by exactly the same multiplicative constant.
r 2 .966 .933
2
h.
r n2
t
i. H 0 : 0 vs H a : 0 . 1 r 2 ; Reject Ho at level .01 if t t.01,16 2.583 .
.966 16
t 14.94 2.583
1 .966 2 , so Ho should be rejected . The data indicates a
positive linear relationship between the two variables.
r n2
t
47. H 0 : 0 vs H a : 0 . 1 r 2 ; Reject Ho at level .01 if either
t t.005, 22 2.819
or t 2.819 . r = .5778, t = 3.32, so Ho should be rejected. There
appears to be a non-zero correlation in the population.
48.
a. We are testing H 0 : 0 vs H a : 0 .
7377.704 .7482 12
r .7482 t 3.9066
36.9839 2,628,930.359 , and 1 .7482 2 . We
t 3.9066 t .05,12 1.782
reject Ho since . There is evidence that a positive
correlation exists between maximum lactate level and muscular endurance.
b. We are looking for r2, the coefficient of determination. r2 = (.7482)2 = .5598. It is the
same no matter which variable is the predictor.
33
34
H 0 : 0 vs H a : 0 , Reject H if; Reject H at level .05 if either

c. o o
t
r n2

.449 12 1.74
t t.025,12 2.179 1 r2 1 .449
2
or t 2.179 . . Hence
we fail to reject Ho; the data does not suggest that the population correlation coefficient
differs significantly from 0.
d.
.449 2 .20 , so 20 percent of the observed variation in gas porosity can be accounted
for by its linear relationship to hydrogen content.
xi 111 .71, xi2 2,724.7643, y i 2.9, y i2 1.6572 , and

49. n = 6,
xi y i 63.915 .
r
6 63.915 111 .71 2.9 .7729
6 2,724.7943 111 .73 2 61.6572 2.9 2 . H 0 : 0 vs
t
.7729 4 2.436
H a : 0 ; Reject H at level .05 if t t .025, 4 2.776 . 1 .7729
2
o .
Fail to reject Ho. The data does not indicate that the population correlation coefficient differs
from 0. This result may seem surprising due to the relatively large size of r (.77), however, it
can be attributed to a small sample size (6).
757.6423
r .5730
50.
3756.96 465.34
v .5 ln
.427 1.645 .995,.309
.652 .652
a. 1.573 , so (12.11) is 23 , and
the desired interval for is .7595,.2995 .
b. z .652 .549 23 .49 , so Ho cannot be rejected at any reasonable level.
c. r 2 .328
d. Again, r 2 .328
35
e. Although the normal probability plot of the xs appears somewhat curved, such a pattern
is not terribly unusual when n is small; the test of normality presented in section 14.2 (p.
651) does not reject the hypothesis of population normality. The normal probability plot
of the ys is much straighter.
H 0 : 0 will be rejected in favor of H a : 0 at level .01 if t t .005,8 3.355

f. .
xi 864, xi2 78,142, y i 138.0, y i2 1959.1 and xi y i 12,322.4 , so
r
3992
.913 .913 2.8284
t 6.33 3.355
186.8796 23.3880 and .4080 , so reject Ho.
There does appear to be a linear relationship.
51.
a. We used Minitab to calculate the ris: r1 = 0.192, r2 = 0.382, and r3 = 0.183. It appears
that the lag 2 correlation is best, but all of them are weak, based on the definitions given
in the text.
2
.2
100 r .2
b. . We reject Ho if i . For all lags, ri does not fall in the rejection region,
so we cannot reject Ho. There is not evidence of theoretical autocorrelation at the first 3
lags.
c. If we want an approximate .05 significance level for the simultaneous hypotheses, we

would have to use smaller individual significance level. If the individual confidence
levels were .95, then the simultaneous confidence levels would be approximately (.95)
(.95)(.95) = .857.
52.
a. Because p-value = .00032 < ( = .001, Ho should be rejected at this significance level.
b. Not necessarily. For this n, the test statistic t has approximately a standard normal
H o : 1 0 is true, and a p-value of .00032 corresponds to
distribution when
r 498
3.60
z 3.60 (or 3.60). Solving 1 r 2 for r yields r = .159. This r suggests only
a weak linear relationship between x and y, one that would typically have little practical
importance .
.022 9998
t 2.20 t.025,9998 1.96
c. 1 .022 2 , so Ho is rejected in favor of Ha. The value t =
2.20 is statistically significant it cannot be attributed just to sampling variability in the
case 0 . But with this n, r = .022 implies .022 , which in turn shows an
extremely weak linear relationship.
36
Supplementary Exercises
53.
a. Clearly not: for example, the two observations with truss height = 14 ft have different
sale prices (37.82 and 36.90).
b. The scatter plot below suggests a strong linear relationship.
Scatterplot of SalePrice vs Truss

60
55
50
SalePrice
45
40
35
10 15 20 25 30 35
Truss
c. Using software, the least squares line is = 23.8 + 0.987x, where x = truss height and y =
sale price.
d. At x = 27, = 23.8 + 0.987(27) = $50.43 per square foot. The observed value at x = 27 is
y = 48.07, and the corresponding residual is y = 48.07 50.43 = 2.36.
e. Using software, r2 = SSR/SST = 890.36/924.44 = 96.3%.
54. Use available software for all calculations.

a. We want a confidence interval for 1. From software, b1 = 0.987 and s(b1) = 0.047, so the
corresponding 95% CI is 0.987 t.025,17(0.047) = 0.987 2.110(0.047) = (0.888, 1.086).
We are 95% confident that the true average change in sale price associated with a one-
foot increase in truss height is between $0.89 per square foot and $1.09 per square foot.
b. Using software, a 95% CI for y.25 is (47.730, 49.172). We are 95% confident that the true
average sale price for all warehouses with 25-foot truss height is between $47.73/ft2 and
$49.17/ft2.
c. Again using software, a 95% PI for Y when x = 25 is (45.378, 51.524). We are 95%
confident that the sale price for a single warehouse with 25-foot truss height will be
between $45.38/ft2 and $51.52/ft2.
d. Since x = 25 is nearer the mean than x = 30, a PI at x = 30 would be wider.
e. From software, r2 = SSR/SST = 890.36/924.44 = .963. Hence, r = .963 = .981.
37
55. First, use software to get the least squares relation = 0.5817 + 0.049727x, where x = age and

y = %DDA. From this, 1 = 0.049727. Also from software, s = 0.244, x = 29.73, and Sxx =
5390.2. Finally, if y* = 2.01, then 2.01 = 0.5817 + 0.049727 x , whence x = 28.723.
Therefore, an approximate 95% CI for X when y = y* = 2.01 is
1/ 2
0.244 1 28.723 29.73 2
28.723 t .025,9 1
0.049727 11 5390.2
=
28.723 2.262(5.125) = (17.130, 40.316).
Since this interval straddles 22, we cannot say with confidence whether the individual was
older or younger than 22 ages both above and below 22 are plausible based on the data.
56.
1 1
t
s t t.025,11 2.201
a. The test statistic value is 1 , and Ho will be rejected if either
xi 243, xi2 5965, y i 241, y i2 5731 and
or t 2.201 . With
xi y i 5805 , 1 .913819 , 0 1.457072 , SSE 75.126 , s 2.613 , and
.9138 1
s .0693 t 1.24
1
, .0693 . Because 1.24 is neither 2.201 nor
2.201 , Ho cannot be rejected. It is plausible that 1 1 .
16,902
r .970
b. 136128.15
57.
a. df(SSE) = 6 = n-2, so sample size = 8.
b. y 326.976038 8.403964 x . When x = 35.5, y 28.64 .
c. The P-value for the model utility test is 0.0002 according to the output. The model utility
test is statistically significant at the level .01.
d. r ( signofslope) r 2 0.9134 0.9557
e. First check to see if the value x = 40 falls within the range of x values used to generate
the least-squares regression equation. If it does not, this equation should not be used.
Furthermore, for this particular model an x value of 40 yields a y value of 9.18mcM/L,
which is an impossible value for y.
38
58.
a. r 2 .5073
b. r r 2 .5073 .7122 (positive because 1 is positive.)
c. We test H 0 : 1 0 v. H a : 1 0 .. The test statistic t = 3.93 gives p-value = .0013,

which is < .01, the given level of significance, therefore we reject Ho and conclude that
the model is useful.
d. We use a 95% CI for

Y 50 . y 50 .787218 .007570 50 1.165718 ,
t.025,15 2.131
, s = Root MSE = .020308, so
17 50 42.33
2
1
s y 50 .20308 .051422
17 17 41,575 719.60 2
. The interval is, then,
1.165718 2.131 .051422 1.165718 .109581 1.056137,1.275299 .
y 30 .787218 .007570 30 1.0143.

e. The residual is
y y .80 1.0143 .2143 .
59.
a.
Regression Plot
30
20
Noy:
10
0 100 200 300 400 500 600 700
CO:
The above analysis was created in Minitab. A simple linear regression model seems to fit
the data well. The least squares regression equation is y .220 .0436 x . The

model utility test obtained from Minitab produces a t test statistic equal to 12.72. The
corresponding p-value is extremely small. So we have sufficient evidence to claim that
CO is a good predictor of NO y .
39
b. y .220 .0436 400 17.228 . A 95% prediction interval produced by Minitab

NO y
is (11.953, 22.503). Since this interval is so wide, it does not appear that is
accurately predicted.
c. While the large CO value appears to be near the least squares regression line, the
value has extremely high leverage. The least squares line that is obtained when
excluding the value is y 1.00 .0346 x . The r2 value with the value included is 96%

and is reduced to 75% when the value is excluded. The value of s with the value
included is 2.024, and with the value excluded is 1.96. So the large CO value does
appear to affect our analysis in a substantial way.
60.
xi 228, xi2 5958, y i 93.76, y i2 982.2932 and
a. n = 9,
243.93 .148919
xi y i 2348.15 , giving 1 1638 14.190392 , and
, 0

the equation y 14.19 .1489 x .

b. 1 is the expected increase in load associated with a one-day age increase (so a negative
value of 1 corresponds to a decrease). We wish to test H 0 : 1 .10 v.
H a : 1 .10 (the alternative contradicts prior belief). H will be rejected at level .05 if
o
1 .10
t t.05.7 1.895
s
1 . With SSE = 1.4862, s = .4608, and
.4608 .1489 .1
s .0342 t 1.43
1
182 . Thus .0342 . Because 1.43 is not
1.895 , do not reject Ho. The data do not contradict this assertion.
306 2
xi 306, x 7946, so 2
i
xi x 2 7946 12
143
c. here, as
x 's
contrasted with 182 for the given 9 i . Even though the sample size for the proposed
x values is larger, the original set of values is preferable.
1 9 28 25.33
2
t .025, 7 s 2.365 .4608 .3877 .42
d. 9 1638 , and
0 1 28 10.02, so the 95% CI is 10.02 .42 9.60,10.44.
40
61.
3.5979
1 .0805 1.6939
a. 44.713 , 0

, y 1.69 .0805 x .
3.5979
1 12.2254 20.4046
b. .2943 , 0

, y 20.40 12.2254 x .
c. r = .992, so r2 = .984 for either regression.
62.
a. The plot suggests a strong linear relationship between x and y.
xi 1797, xi2 4334.41, y i 7.28, y i2 7.4028 and

b. n = 9,
299.931
xi y i 178.683 , so 1 6717.6 .04464854 , 0 .08259353 , and the
equation of the estimated line is y .08259 .044649 x .

c. SSE 7.4028 601281 7.977935 .026146,
SST 7.4028
7.28 2 1.5141 r2 1
SSE
.983
9 , and SST , so 93.8% of the
observed variation can be explained by the model relationship.
d. y 4 .08259 .04464919.1 .7702 , and

y 4 y 4 .68 .7702 .0902 .
.06112
s .002237
1
746 . 4 H : 1 0
e. s = .06112, and , so the value of t for testing 0
.044649
t 19.96
H : 1 0 is .002237 t 5.408
vs 0 . From Table A.5, .0005,7 , so the 2-
sided P-value is less than 2(.005) = .001. There is strong evidence for a useful
relationship.
f. A 95% CI for 1 is .044649 2.365 .002237 .044649 .005291

.0394,.0499 .
g. A 95% CI for
0 1 20 is .810 2.365 .002237 .3333356
.810 .048 .762,.858
41
1 nx 2
0 t / 2,n 2 s
n nxi2 xi 2
63. Substituting x* = 0 gives the CI . From Example
12.8,
0 3.621 , SSE = .262453, n = 14, xi 890, x 63.5714, xi2 67,182, so
1 56,578.52
3.621 2.179 .1479
t.025,12 2.179 12 148,448
with s = .1479, , the CI is
3.621 2.179 .1479 .6815 3.62 .22 3.40,3.84 .

y 1x

SSE y 0 y 1xy . Substituting 0
2
64. n , SSE becomes
SSE y 2

y y 1x
1xy y 2
y 1xy xy
2
1
n n n
y
2
xy
y 1 xy
S yy 1 S xy
2
n n
, as desired.
65. The value of the sample correlation coefficient using the squared y values would not
necessarily be approximately 1. If the y values are greater than 1, then the squared y values
would differ from each other by more than the y values differ from one another. Hence, the
relationship between x and y2 would be less like a straight line, and the resulting value of the
correlation coefficient would decrease.
66.
sy s yy

s x x y y
2 2
i s s s
a. With xx , yy i
, note that x xx
( since the
factor n-1 appears in both the numerator and denominator, so cancels). Thus
s xy s yy s xy
y 0 1 x y 1 x x y x x y x x
s xx s xx s xx s yy
sy
y r x x
sx , as desired.
b. By .573 s.d.s above, (above, since r < 0) or (since sy = 4.3143) an amount 2.4721 above.
42
s xy s xy
r 1
s x x ), and
(where e.g. xx i
2
s xy s xx s yy s xx . Also,
With given in the text,
SSE
s
n 2 and SSE y i 0 y i 1xi y i s yy 1 s xy . Thus the t statistic for
2
1 s xy / s xx s xx
t
H o : 1 0 is s / xi x
2
s yy s xy2 / s xx / n 2
s xy n 2

s xy / s xx s yy n2

r n2
s xx s yy s 2
xy 1 s / s xx s yy
2
xy 1 r2
as desired.
SST s yy SSE s yy 1 s xy
67. Using the notation of the exercise above, and
s xy2
s yy
s xy2 SSE s xx s xy2
s yy 1 1 r2
s xx , so SST s yy s xx s yy
, as desired.
68.
a. A Scatter Plot suggests the linear model is appropriate.
99.0
removal%
98.5
98.0
5 10 15
temp
43
Minitab Output:
The regression equation is

removal% = 97.5 + 0.0757 temp
Predictor Coef StDev T P

Constant 97.4986 0.0889 1096.17 0.000
temp 0.075691 0.007046 10.74 0.000
S = 0.1552 R-Sq = 79.4% R-Sq(adj) = 78.7%
Analysis of Variance
Source DF SS MS F P
Regression 1 2.7786 2.7786 115.40 0.000
Residual Error 30 0.7224 0.0241
Total 31 3.5010
Minitab will output all the residual information if the option is chosen, from which you
y10.5 98.2933 , the observed value y = 98.41, so
can find the point prediction value
the residual = .0294.
b. Roughly s = .1552
c. R2 = 79.4
t.025,30 2.042
d. A 95% CI for (1, using :
.075691 2.042 .007046 .061303,.090079
e. The slope of the regression line is steeper. The value of s is almost doubled (to 0.291),
and the value of R2 drops to 61.6%.
44
Using Minitab, we create a scatterplot to see if a linear regression model is appropriate.
blood glucose level

6
0 10 20 30 40 50 60
time
A linear model is reasonable; although it appears that the variance in y gets larger as x
increases. The Minitab output follows:
blood glucose level = 3.70 + 0.0379 time

Constant 3.6965 0.2159 17.12 0.000
time 0.037895 0.006137 6.17 0.000
S = 0.5525 R-Sq = 63.4% R-Sq(adj) = 61.7%
Source DF SS MS F P
Regression 1 11.638 11.638 38.12 0.000
Total 23 18.353
The coefficient of determination of 63.4% indicates that only a moderate percentage of the
variation in y can be explained by the change in x. A test of model utility indicates that time
is a significant predictor of blood glucose level. (t = 6.17, p = 0.0). A point estimate for blood
glucose level when time = 30 minutes is 4.833%. We would expect the average blood glucose
level at 30 minutes to be between 4.599 and 5.067, with 95% confidence.
69.
a. Using the techniques from a previous chapter, we can do a t test for the difference of two
means based on paired data. Minitabs paired t test for equality of means gives t = 3.54,
with a p value of .002, which suggests that the average bf% reading for the two methods
is not the same.
45
Using linear regression to predict HW from BOD POD seems reasonable after looking at the
scatterplot, below.
20
15
HW
10
2 7 12 17
BOD
The least squares linear regression equation, as well as the test statistic and p value for a
model utility test, can be found in the Minitab output below. We see that we do have
significance, and the coefficient of determination shows that about 75% of the variation
in HW can be explained by the variation in BOD.

HW = 4.79 + 0.743 BOD

Constant 4.788 1.215 3.94 0.001
BOD 0.7432 0.1003 7.41 0.000
S = 2.146 R-Sq = 75.3% R-Sq(adj) = 73.9%
Source DF SS MS F P
Regression 1 252.98 252.98 54.94 0.000
Total 19 335.87
n 6 , xi 125 , y i 472.0 , xi 3625 ,

2
70. For the second boiler,
y i2 37,140.82 , and xi y i 9749.5 , giving 1 estimated slope
503
.0821224 80.377551
6125 , 0 , SSE 2 3.26827 , SSx 2 1020.833 .
.1333 SSE 8.733
For boiler #1, n = 8, 1 , 1 SSx 1442.875
, and 1 . Thus
.1333 .0821
8.733 3.286 t
2 1.2, 1.095 14421.875 10201.833
10 1.095 , and
.0512
1.14 t 2.228
.0448 . .025,10 and 1.14 is neither 2.228 nor 2.228 , so
Ho is not rejected. It is plausible that 1 1 .
46
47

Linear Regression Analysis

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Linear Regression Analysis

Uploaded by

Copyright:

Available Formats

CHAPTER 12

170 180 190

2. Scatter plots for the emissions vs age:

BOD mass loading BOD mass removal

Scatter plot of the data:

BOD mass loading (x) vs BOD mass removal (y)

Temperature (x) vs Elongation (y)

Temperature (x) vs Elongation (y)

b. A parabola appears to provide a good fit to both graphs.

b. expected change = slope = 1 1.3

expected change = 100 1

f. Now E(Y) = 5050, so P Y 5000 P Z .14 .5557

g. E (Y2 Y1 ) E (Y2 ) E (Y1 ) 5050 4400 650 , and

h. The standard deviation of Y2 Y1 494.97 (from c), and

b. We expect flow rate to decrease by 5 1 .475 .

between 2.4 and 2.6 is

y 35 .626 .652 35 23.456

x i 200 y i 5 . 37 x 2i 12.000 y 2i 9.3501 xi y i 333

SSE S yy 1 S xy 9.9153 .09092 45.8246 5.7489

y50 1.1278 .82697 50 40.2207

SSE S yy 1 S xy 14,435.7 .82697 17,324.4 357.07

Note: n = 23 in this study.

y50 3.678 .144 50 10.878

d. Using the equation of a, predicted

x i 3300 y i 5010 x i2 913,750 y 2i 2,207,100 xi y i 1,413,500

g. Estimated expected change 50 1 85.57

b. The regression equation is y = 138 + 9.31 x

c. The desired value is the coefficient of determination, r 2 99.0% .

x is substituted for x in 0 1 x , y results, so that x, y is on the

We wish to find b1 to minimize f (b1 ) y i b1 x i . Equating f b1 to 0 yields

xi x 1,100,000 now, which is smaller

b. .00736 2.160 .001017 .00736 .00220 .00956,.00516

H o : 1 .1 vs. H a : 1 .1 . The calculated t statistic is

H o : 1 0 vs H a : 1 0 , and the test statistic is

1 1.561 , 0 9.118 , SSE = 430.5264, s 5.55 , s 1 .551 , and the new CI

1 . Using the values from the Minitab

30. SSE = 124,039.58 (72.958547)(1574.8) (.04103377)(222657.88) = 7.9679, and SST =

F.001,1,18 15.38 18.0

than is x = 60, the quantity

y125 y125 78.088 t.05,18 1.734

b. We want a 90% PI: Only the standard error changes:

H 0 : y 400 .25 H 0 : y 400 .25

y 400 .311 .00143 400 .2614

y 40 1.128 .82697 40 31.95 t.025,13 2.160

f. x 798, x 2 63,040 which gives S xx 20,586.4 , which in turn gives

e. The hypothesized slope is (2-unit decrease in mean y)/(10-unit increase in x) = 0.2.

39. 95% CI: (462.1, 597.7); midpoint = 529.9; .025,8

0 1 1.20 t / 2,n 2 s 2 s 2 0 1 1.20

H o : 1 0 vs H a : 1 0 . The test statistic

a. Summary values: x 44,615 , x 2 170,355,425 , y 3,860 ,

Normal Probability Plot

2800 3800 4800

Normal Probability Plot

200 300 400

H 0 : 0 vs H a : 0 , Reject H if; Reject H at level .05 if either

xi 111 .71, xi2 2,724.7643, y i 2.9, y i2 1.6572 , and

b. z .652 .549 23 .49 , so Ho cannot be rejected at any reasonable level.

H 0 : 0 will be rejected in favor of H a : 0 at level .01 if t t .005,8 3.355

c. If we want an approximate .05 significance level for the simultaneous hypotheses, we

b. The scatter plot below suggests a strong linear relationship.

Scatterplot of SalePrice vs Truss

e. Using software, r2 = SSR/SST = 890.36/924.44 = 96.3%.

54. Use available software for all calculations.