Project # 7
Michael Luketich
April 30, 2015
Access ID: EY0350
Final project
Regression
1. Use a scatterplot in order to explain the relationship between X and Y.
[Figure: scatterplot with MOE on the vertical axis and NOB on the horizontal axis]
Coefficients
Term      Coef    SE Coef  T-Value  P-Value  VIF
Constant  142.2   11.3     12.63    0.000
MOE       -12.48  1.53     -8.16    0.000    1.00

Model Summary
S = 18.2978   R-sq = 86.93%   R-sq(adj) = 85.62%   R-sq(pred) = 76.20%
Analysis of Variance
Source      DF  Adj SS  Adj MS   F-Value  P-Value
Regression  1   22269   22268.8  66.51    0.000
  MOE       1   22269   22268.8  66.51    0.000
Error       10  3348    334.8
Total       11  25617
NOB     Fit     Resid  Std Resid
175.00  129.72  45.28  2.95
B. Interpret the estimate of B0 in the words of the problem. Do the same for B1.
B0 = 142.2. The regression line intersects the y-axis at 142.2. This means that the expected
number of bacteria in the tin can is 142.2 before the heat treatment (MOE = 0).
The slope B1 = -12.48 means that the average number of bacteria decreases by 12.48
for each additional minute of exposure.
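The fitted line can be evaluated directly to check both interpretations. The following is a minimal sketch, assuming the rounded coefficients from the Minitab output; the helper name `predict_nob` is hypothetical. The reported fit of 129.72 for the flagged observation is consistent with evaluating the line at MOE = 1, which is an inference from the numbers rather than something the output states.

```python
# Coefficients as printed (rounded) in the Minitab coefficients table.
B0 = 142.2    # intercept: expected NOB at MOE = 0
B1 = -12.48   # slope: change in expected NOB per additional minute of MOE


def predict_nob(moe):
    """Return the fitted NOB for a given MOE (hypothetical helper)."""
    return B0 + B1 * moe


# At zero exposure the fit equals the intercept.
fit_at_0 = predict_nob(0)

# One additional minute lowers the fit by |B1|.
fit_at_1 = predict_nob(1)

# Residual = observed - fitted, using the flagged observation NOB = 175.00.
resid = 175.00 - fit_at_1

print(fit_at_0, round(fit_at_1, 2), round(resid, 2))
```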
C. Conduct a test that the true slope of the model differs from 0. Explain how to use the output
of the regression for the test.
H0: B1 = 0
Ha: B1 ≠ 0
The p-value in the MOE row of the coefficients table tests whether the true slope is 0. The p-value of
0.000 is less than alpha = 0.05, so we reject H0 and conclude that there is a linear relationship
between x and y in this model.
Term      Coef    SE Coef  T-Value  P-Value  VIF
Constant  142.2   11.3     12.63    0.000
MOE       -12.48  1.53     -8.16    0.000    1.00
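The T-Value in the MOE row is the slope estimate divided by its standard error, so the test can be reproduced by hand. This is a small sketch using the rounded table values; the critical value 2.228 is an assumption taken from a standard t-table (two-sided, alpha = 0.05, 10 error degrees of freedom), not from the Minitab output.

```python
# Values read from the MOE row of the coefficients table (rounded as printed).
coef_moe = -12.48
se_moe = 1.53

# Assumption: two-sided 5% critical value of t with 10 df, from a t-table.
T_CRIT_10 = 2.228

# Test statistic for H0: B1 = 0.
t_stat = coef_moe / se_moe

# Reject H0 when the statistic falls beyond the critical value in either tail.
reject_h0 = abs(t_stat) > T_CRIT_10

# In simple linear regression the ANOVA F-Value equals the slope's t squared.
f_from_t = t_stat ** 2

print(round(t_stat, 2), reject_h0, round(f_from_t, 1))
```

The squared statistic lands close to the F-Value of 66.51 in the ANOVA table; the small gap comes from the rounding of Coef and SE Coef.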
D. Relate the conclusion of your test to the scatterplot that you generated in (1).
As the scatterplot suggested, Minitab shows that the slope is negative (B1 = -12.48)
and that the linear relationship is strong: R-sq = 86.93%, which corresponds to r of about -0.93.
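R-sq and r are related but not the same number, so it can help to compute both from the ANOVA table. A minimal sketch, assuming the Adj SS values printed in the regression output; in simple linear regression r takes the sign of the slope.

```python
import math

ss_regression = 22269   # Adj SS for Regression
ss_total = 25617        # Adj SS for Total
slope = -12.48          # B1 from the coefficients table

# R-sq is the share of total variation explained by the regression.
r_squared = ss_regression / ss_total

# r is the square root of R-sq, carrying the sign of the slope.
r = math.copysign(math.sqrt(r_squared), slope)

print(round(100 * r_squared, 2), round(r, 3))
```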
E. Use the generated plots of the regression model in order to check the normality assumption
of your model. While there is no need for a statistical test, you must compare the shape to a
normal distribution pdf.
Looking at the histogram, it does not seem to follow a normal curve, or at least
there is not enough information to say definitively that it does.
[Figure: plot with MOE on the vertical axis (0.0 to 10.0) and NOB on the horizontal axis (60 to 180)]
4. Generate a prediction interval for y* and a confidence interval for E(y) when x* = xbar.
Term      SE Coef  95% CI                T-Value  P-Value  VIF
Constant  0.654    (9.297, 12.213)       16.44    0.000
NOB       0.00854  (-0.08869, -0.05063)  -8.16    0.000    1.00
Model Summary
S=1.36711 R-sq=86.93% R-sq(adj)=85.62%
Analysis of Variance
Source      DF  Seq SS  Contribution  Adj SS  Seq MS   F-Value  P-Value
Regression  1   124.31  86.93%        124.31  124.310  66.51    0.000
  NOB       1   124.31  86.93%        124.31  124.310  66.51    0.000
Error       10  18.69   13.07%        18.69   1.869
Total       11  143.00  100.00%
Fit     SE Fit  95% CI           Resid  Std Resid  Del Resid  HI        Cook's D
-1.436  1.050   (-3.775, 0.904)  2.436  2.78       5.55       0.589913  5.57
Fit      SE Fit    95% CI              95% PI
6.50581  0.394652  (5.62647, 7.38514)  (3.33531, 9.67630)

Variable  Setting
NOB       61 (x*)
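Both intervals can be rebuilt from the Fit, the SE Fit, and S from the Model Summary, which shows why the prediction interval is wider. The sketch below assumes the standard formulas for simple regression; the t critical value for 10 error degrees of freedom is taken from a t-table and is not part of the Minitab output.

```python
import math

fit = 6.50581       # fitted MOE at NOB = 61
se_fit = 0.394652   # standard error of the fit
s = 1.36711         # S from the Model Summary

# Assumption: 97.5th percentile of t with 10 df, from a standard t-table.
T_CRIT_10 = 2.228139

# Confidence interval for the mean response: only the fit's own uncertainty.
ci_half = T_CRIT_10 * se_fit

# Prediction interval for a new observation: adds the error variance s^2.
pi_half = T_CRIT_10 * math.sqrt(s ** 2 + se_fit ** 2)

ci = (fit - ci_half, fit + ci_half)
pi = (fit - pi_half, fit + pi_half)

print([round(v, 4) for v in ci], [round(v, 4) for v in pi])
```

The extra s^2 term under the square root is the only difference between the two half-widths, and it accounts for the scatter of an individual observation around the line.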
[Figure: plot of Data (vertical axis, 0 to 80) for circuit1, circuit2, circuit3, and circuit4]
A. Are there differences between the means of the responses in at least two of the levels?
Yes. It can be seen that the mean of the first circuit is very different from the rest. The 3rd
circuit is also different from the 2nd and 4th circuits.
B. Do you think this is the result of within-group variation or between-group variation?
I believe the difference between circuits 1 and 3 and the others is between-group
variation.
4. State clearly the hypothesis that we are testing in this problem.
5. Run ANOVA on the data and generate the output with plots
Analysis of Variance
Source   DF  Adj SS  Adj MS  F-Value  P-Value
CIRCUIT  3   12042   4014.0  21.78    0.000
Error    16  2949    184.3
Total    19  14991
Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
13.5757  80.33%  76.64%     69.26%

Means
CIRCUIT  N  Mean   StDev  95% CI
C1       5  19.20  7.79   (6.33, 32.07)
C2       5  70.00  11.02  (57.13, 82.87)
C3       5  36.60  11.59  (23.73, 49.47)
C4       5  79.80  20.51  (66.93, 92.67)
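All four intervals in the Means table have the same half-width, which suggests they are built from the pooled standard deviation S rather than from each group's own StDev. The sketch below checks that reading; the t critical value for 16 error degrees of freedom is an assumption taken from a t-table.

```python
import math

pooled_s = 13.5757   # S from the Model Summary
n_per_group = 5      # observations per circuit

# Assumption: 97.5th percentile of t with 16 df, from a standard t-table.
T_CRIT_16 = 2.1199

means = {"C1": 19.20, "C2": 70.00, "C3": 36.60, "C4": 79.80}

# Every group shares one half-width because S and N are the same for all.
half_width = T_CRIT_16 * pooled_s / math.sqrt(n_per_group)

intervals = {
    name: (round(mean - half_width, 2), round(mean + half_width, 2))
    for name, mean in means.items()
}

for name, (low, high) in intervals.items():
    print(name, low, high)
```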
A. Comment on the degree of freedom values for each source of variation. How do you calculate
them?
DF for the groups (between groups): number of groups - 1 = 4 - 1 = 3
DF for error (within groups): 4(5 - 1) = 16
DF total: number of observations - 1 = 20 - 1 = 19, which equals 3 + 16
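The degrees of freedom feed directly into the mean squares and the F-Value, so the whole ANOVA table can be checked from the two sums of squares. A minimal sketch, assuming the Adj SS values from the four-circuit output and a balanced design with 5 observations per circuit:

```python
k = 4                # number of circuits (groups)
n = 5                # observations per circuit
ss_between = 12042   # Adj SS for the circuit factor
ss_within = 2949     # Adj SS for Error

# Degrees of freedom for each source of variation.
df_between = k - 1
df_within = k * (n - 1)
df_total = k * n - 1

# Mean squares are sums of squares divided by their degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# The F-Value compares between-group to within-group variation.
f_value = ms_between / ms_within

print(df_between, df_within, df_total, round(f_value, 2))
```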
B. Do we reject the hypothesis that we are testing? Why or why not?
Yes, we reject the hypothesis that the means of the four groups are equal to one another. The
p-value is 0.000, which is less than alpha = 0.05, so we have evidence that
at least one circuit is different.
We reject H0 if p < alpha. Since 0.000 < 0.05, we reject H0.
We also reject if the calculated F-value > F-critical. With 3 and 16 degrees of freedom at alpha = 0.05,
F-critical is about 3.24; since 21.78 > 3.24, we reject H0.
C. If you reject, can you tell which level(s) is probably the one(s) that has the different mean?
Circuits 1 and 3 have different means. I am very suspicious of the first circuit but less
suspicious of the 3rd, so I will rerun the ANOVA without circuit 1 to check my suspicions
about both circuits 1 and 3.
Analysis of Variance
Source   DF  Adj SS  Adj MS  F-Value  P-Value
CIRCUIT  2   5130    2564.9  11.37    0.002
Error    12  2706    225.5
Total    14  7836
Model Summary
S        R-sq    R-sq(adj)  R-sq(pred)
15.0167  65.47%  59.71%     46.04%

Means
CIRCUIT  N  Mean   StDev  95% CI
C2       5  70.00  11.02  (55.37, 84.63)
C3       5  36.60  11.59  (21.97, 51.23)
C4       5  79.80  20.51  (65.17, 94.43)
Since the p-value (0.002) is less than alpha (0.05), we reject the null hypothesis that circuits 2, 3,
and 4 have equal means. Even without circuit 1, at least one circuit mean differs from the others.
D. Use the histogram of residuals to comment on the normality of the error term in this ANOVA
model.
The histogram looks like it could be a normal distribution, but there is not enough information
to say with confidence that the error term is normally distributed.