
Stat 331 - Tutorial 1

Suppose we would like to investigate the relationship between the maintenance cost (in thousands of dollars) of Boeing 787 aircraft and the number of flight hours. The dataset airline_slr.txt is available on Learn and contains data for 100 aircraft.

The model:

\[ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i , \qquad \varepsilon_i \sim N(0, \sigma^2) , \qquad i = 1, 2, \ldots, 100 , \]

where y is the maintenance cost (in thousands of dollars) of a randomly selected Boeing 787 aircraft and x is the corresponding number of flight hours.

You are provided with the following output from R:

dataset = read.table("airline_slr.txt", header = T)
x = dataset$hours
y = dataset$cost
mean(x)
## [1] 148.8
mean(y)
## [1] 1822
sum((x - mean(x))^2)
## [1] 731093
sum((y - mean(y))^2)
## [1] 47854282
sum((x - mean(x)) * (y - mean(y)))
## [1] 4686071
slr_model = lm(cost ~ hours, data = dataset)
summary(slr_model)
##
## Call:
## lm(formula = cost ~ hours, data = dataset)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -815.4 -346.5   50.2  340.3  658.8
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  868.487     85.588    10.2   <2e-16 ***
## hours          6.410      0.499    12.8   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 426 on 98 degrees of freedom
## Multiple R-squared:  0.628, Adjusted R-squared:  0.624
## F-statistic:  165 on 1 and 98 DF,  p-value: <2e-16
##
sum(slr_model$residuals^2)
## [1] 17818085

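1. Verify the least squares estimates of \(\beta_0\) and \(\beta_1\) from the model summary (using the formulas derived in class).

Solution:

\[ \hat\beta_1 = \frac{s_{xy}}{s_{xx}} = \frac{4686071}{731093} = 6.4097 \]

\[ \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x} = 1822 - 6.4097\,(148.8) = 868.4874 \]

The value 868.4874 matches the intercept in the model summary and is obtained with the unrounded sample means. Substituting the rounded means printed in the output (148.8 and 1822) gives about 868.24.

2. Verify the estimate of \(\sigma\) from the model summary.

Solution:

\[ \hat\sigma = \sqrt{\frac{\sum_{i=1}^{n} e_i^2}{n-2}} = \sqrt{\frac{17818085}{98}} = 426.4003 \]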
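The hand calculations in questions 1 and 2 can also be checked numerically. The following is a minimal R sketch that assumes only the summary values printed in the R output above, so it does not need airline_slr.txt. The variable names (sxx, sxy, sse and so on) are illustrative and are not part of the original handout.

```r
# Summary values copied from the R output in the tutorial.
n    <- 100
xbar <- 148.8      # mean(x) as printed, i.e. rounded
ybar <- 1822       # mean(y) as printed, i.e. rounded
sxx  <- 731093     # sum((x - mean(x))^2)
sxy  <- 4686071    # sum((x - mean(x)) * (y - mean(y)))
sse  <- 17818085   # sum of squared residuals

# Least squares slope: s_xy / s_xx (about 6.4097).
beta1_hat <- sxy / sxx

# Least squares intercept: ybar - beta1_hat * xbar.
# The printed means are rounded, so this gives about 868.24
# rather than the 868.487 reported by lm().
beta0_hat <- ybar - beta1_hat * xbar

# Residual standard error: sqrt(SSE / (n - 2)) (about 426.40).
sigma_hat <- sqrt(sse / (n - 2))

print(c(beta1 = beta1_hat, beta0 = beta0_hat, sigma = sigma_hat))
```

The small gap in the intercept comes only from rounding in the printed means; the slope and the residual standard error do not depend on the means and agree with the model summary.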

3. Use the \(R^2\) value to comment on the goodness of fit of the model.

Solution: An \(R^2\) of 0.628 means that the model explains about 62.8% of the variation in maintenance cost, which implies that the model provides a fair fit to the data.

4. Calculate the value of the sample correlation coefficient.

For the remaining parts, you may require the following information:

\(t_{0.05}(99) = 1.6604\), \(t_{0.025}(99) = 1.9842\), \(t_{0.05}(98) = 1.6606\), \(t_{0.025}(98) = 1.9845\)
\(F_{0.05}(1, 99) = 3.9371\), \(F_{0.025}(1, 99) = 5.1802\), \(F_{0.05}(1, 98) = 3.9381\), \(F_{0.025}(1, 98) = 5.1818\)

5. Test \(H_0: \beta_1 = 0\) vs \(H_1: \beta_1 \neq 0\) at a 5% significance level using the t-test. Confirm your conclusion using the ANOVA F-test.

Solution: Assuming \(H_0: \beta_1 = 0\), the test statistic is

\[ t = \frac{\hat\beta_1}{\mathrm{s.e.}(\hat\beta_1)} = 12.853 \]

Since \(|t| > t_{0.025}(98) = 1.9845\), we reject \(H_0\) in favour of \(H_1\) at a 5% level. Alternatively, the p-value provided in the model summary is less than \(2 \times 10^{-16}\), and so we reject \(H_0\) at a 5% level.

F-test: From the model summary, the F statistic is \(F = 165 > F_{0.05}(1, 98) = 3.9381\), and so we also reject \(H_0\) at a 5% level.

6. Test (at a 5% significance level) the hypothesis that on average, for a 1 hour increase in flight time, the maintenance cost increases by more than $6000. (Remember that in the dataset, the maintenance cost is given in thousands of dollars.)

Solution: Test \(H_0: \beta_1 \le 6\) vs \(H_1: \beta_1 > 6\). Assuming \(H_0\), the test statistic is

\[ t = \frac{\hat\beta_1 - 6}{\mathrm{s.e.}(\hat\beta_1)} = 0.8215 \]

Since \(t < t_{0.05}(98) = 1.6606\), we do not reject \(H_0\) at a 5% level.

7. Construct a 95% confidence interval to support the conclusion in the previous part.

Solution: 95% C.I. for \(\beta_1\):

\[ \hat\beta_1 \pm t_{0.025}(98)\, \mathrm{s.e.}(\hat\beta_1) = (5.42,\ 7.3993) \]

To be more concrete, note that the null hypothesis in question 6 would also not be rejected at a 2.5% level. Informally, the confidence interval tells us that, given the sample, it is plausible that either \(\beta_1 \le 6\) or \(\beta_1 > 6\), because the interval contains 6. As such, in the previous part, we could not reject the null hypothesis in favour of the alternative. Had the entire confidence interval been above 6, we would have rejected \(H_0\) in favour of \(H_1\) at a 2.5% level.

8. For a certain Boeing 787 aircraft, it is estimated that the number of flight hours in the next month would be 120. Predict the maintenance cost for this aircraft and construct a 95% prediction interval. Recall that

\[ V(\hat{y}_p - y_p) = \sigma^2 \left[ 1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{S_{xx}} \right] \]

Solution: Predicted value:

\[ \hat{y}_p = \hat\beta_0 + \hat\beta_1 x_p = \hat\beta_0 + \hat\beta_1 (120) = 1637.6485 \]

95% P.I.:

\[ \hat{y}_p \pm t_{0.025}(98)\, \mathrm{s.e.}(\hat{y}_p - y_p) = (786.7728,\ 2488.5242) \]

9. A month has passed and for an aircraft that has flown for 120 hours, the maintenance cost was 2,600. Assuming the model assumptions are satisfied, explain briefly why you do, or do not, believe that this is too costly and that other external factors have changed significantly since the data was collected.

Solution: Based on the 95% prediction interval, we were fairly confident that the cost would lie between roughly 787 and 2,489. A cost of 2,600 falls outside this interval, which could imply that external factors have changed. However, since the residual standard error is very large, a prediction interval with a higher prediction level (say 99%) may be more appropriate. A 99% prediction interval is (511.3051, 2763.9918), and 2,600 lies inside this interval.
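The test statistics and intervals in questions 3 to 9 can be reproduced from the printed summary values alone. The following is a minimal R sketch under that assumption; it does not read airline_slr.txt, the variable names are illustrative, and the intercept is taken from the model summary rather than recomputed. The handout gives no worked solution for question 4, so the correlation line is one possible way to compute it from the printed sums of squares.

```r
# Summary values copied from the R output in the tutorial.
n    <- 100
df   <- n - 2
xbar <- 148.8            # mean(x) as printed, i.e. rounded
sxx  <- 731093
syy  <- 47854282
sxy  <- 4686071
sse  <- 17818085

beta0_hat <- 868.487     # intercept from summary(slr_model)
beta1_hat <- sxy / sxx
sigma_hat <- sqrt(sse / df)
se_beta1  <- sigma_hat / sqrt(sxx)

# Questions 3 and 4: R-squared and the sample correlation.
r_squared <- 1 - sse / syy
r         <- sxy / sqrt(sxx * syy)

# Question 5: two-sided t-test of beta1 = 0; F equals t^2 in simple regression.
t_q5 <- beta1_hat / se_beta1
p_q5 <- 2 * pt(abs(t_q5), df, lower.tail = FALSE)
f_q5 <- t_q5^2

# Question 6: one-sided test of beta1 <= 6 against beta1 > 6.
t_q6 <- (beta1_hat - 6) / se_beta1
p_q6 <- pt(t_q6, df, lower.tail = FALSE)

# Question 7: 95% confidence interval for beta1.
ci95 <- beta1_hat + c(-1, 1) * qt(0.975, df) * se_beta1

# Questions 8 and 9: prediction at 120 flight hours with 95% and 99% intervals.
xp      <- 120
y_pred  <- beta0_hat + beta1_hat * xp
se_pred <- sigma_hat * sqrt(1 + 1 / n + (xp - xbar)^2 / sxx)
pi95    <- y_pred + c(-1, 1) * qt(0.975, df) * se_pred
pi99    <- y_pred + c(-1, 1) * qt(0.995, df) * se_pred

print(c(r_squared = r_squared, r = r, t_q5 = t_q5, f_q5 = f_q5, t_q6 = t_q6))
print(rbind(ci95 = ci95, pi95 = pi95, pi99 = pi99))
```

With the data file available, confint(slr_model) and predict(slr_model, newdata = data.frame(hours = 120), interval = "prediction") should give the same intervals directly from the fitted model; small differences from the sketch would be due to the rounded mean of x.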
