STAT3010: Lecture 5

Notation and Examples (Section 9.2, Page 413)


To decide whether to reject or not reject the null hypothesis,
we organize the calculations in an ANOVA table. The table is
built from the following formulas:
Analysis of Variance Table

Source of   Sums of Squares (SS)                                    Degrees of     Mean Squares (MS)               F
Variation                                                           Freedom (df)

Between     SS_b = \sum_j n_j (\bar{X}_{.j} - \bar{X}_{..})^2       k - 1          s_b^2 = MS_b = SS_b / (k - 1)   F = MS_b / MS_w
Within      SS_w = \sum_j \sum_i (X_{ij} - \bar{X}_{.j})^2          N - k          s_w^2 = MS_w = SS_w / (N - k)
Total       SS_{total} = \sum_j \sum_i (X_{ij} - \bar{X}_{..})^2    N - 1
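
A useful check when filling in the table by hand: the Between and Within rows add up to the Total row, for both the sums of squares and the degrees of freedom. This is the standard partition identity for one-way ANOVA, written here in the same notation as the table above.

```latex
\begin{aligned}
SS_{total} &= SS_b + SS_w \\
N - 1 &= (k - 1) + (N - k)
\end{aligned}
```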

Example 9.3: Testing Difference in Mean Time to Pain Relief Among 3 Treatments

An investigator wishes to compare the average time to relief of
headache pain under three distinct medications, call them
Drugs A, B and C. Fifteen patients who suffer from chronic
headaches are randomly selected for the investigation, and
five subjects are randomly assigned to each treatment. The
following data reflect times to relief (in minutes) after taking the
assigned drug:
Drug A    Drug B    Drug C
30        25        15
35        20        20
40        30        25
25        25        20
35        30        20

Summary Statistics by Treatment

             Drug A               Drug B               Drug C
Mean         \bar{x}_1 = 33       \bar{x}_2 = 26       \bar{x}_3 = 20
Variance     s_1^2 = 32.5         s_2^2 = 17.5         s_3^2 = 12.5
Std. dev.    s_1 = 5.70           s_2 = 4.18           s_3 = 3.54

To test whether the true mean times to relief under the three
different drugs are equal, we use a five-step procedure:
1. Set up the hypothesis.

2. Select the appropriate test statistic.

3. Compute the test statistic.

Analysis of Variance Table

Source of    Sums of Squares    Degrees of      Mean Squares
Variation    (SS)               Freedom (df)    (MS)

Between
Within
Total

4. Decision Rule.

5. Conclusion.
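
Steps 3 to 5 can be sketched in Python by applying the ANOVA table formulas directly to the Example 9.3 data. This is a minimal sketch, not the lecture's own worked solution: the function name `one_way_anova` is illustrative, and the significance level (alpha = 0.05) with its tabled critical value F(2, 12) = 3.89 is an assumption, since the notes do not state the level used.

```python
# One-way ANOVA table for Example 9.3, built from the table formulas.
# Times to relief (minutes) as listed in the example.
groups = {
    "A": [30, 35, 40, 25, 35],
    "B": [25, 20, 30, 25, 30],
    "C": [15, 20, 25, 20, 20],
}


def one_way_anova(groups):
    """Return SS, df, MS and F for a one-way layout (illustrative helper)."""
    all_values = [x for values in groups.values() for x in values]
    N = len(all_values)   # total sample size
    k = len(groups)       # number of treatments
    grand_mean = sum(all_values) / N

    ss_b = 0.0  # between: sum over groups of n_j * (group mean - grand mean)^2
    ss_w = 0.0  # within: sum of (observation - its own group mean)^2
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_b += len(values) * (group_mean - grand_mean) ** 2
        ss_w += sum((x - group_mean) ** 2 for x in values)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)

    ms_b = ss_b / (k - 1)
    ms_w = ss_w / (N - k)
    return {
        "ss_b": ss_b, "ss_w": ss_w, "ss_total": ss_total,
        "df_b": k - 1, "df_w": N - k, "df_total": N - 1,
        "ms_b": ms_b, "ms_w": ms_w, "F": ms_b / ms_w,
    }


table = one_way_anova(groups)

# Assumption: alpha = 0.05, so the tabled critical value is F(2, 12) = 3.89.
F_CRITICAL = 3.89
reject_h0 = table["F"] >= F_CRITICAL

print(f"Between: SS = {table['ss_b']:.3f}, df = {table['df_b']}, MS = {table['ms_b']:.3f}")
print(f"Within:  SS = {table['ss_w']:.3f}, df = {table['df_w']}, MS = {table['ms_w']:.3f}")
print(f"Total:   SS = {table['ss_total']:.3f}, df = {table['df_total']}")
print(f"F = {table['F']:.2f}; reject H0: {reject_h0}")
```

Under these assumptions the computed F exceeds the critical value, so the sketch rejects the hypothesis of equal means.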

This ANOVA procedure involves several calculations (as do many
statistical procedures). The calculations are generally
performed using statistical software on a computer, so we'll
use SAS to evaluate this same example.
SAS CODE:
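options ps=62 ls=80;        /* page size and line size of the listing */
data headache;
  input trt $ time;         /* trt = drug (character), time = minutes to relief */
  cards;
A 30
A 35
A 40
A 25
A 35
B 25
B 20
B 30
B 25
B 30
C 15
C 20
C 25
C 20
C 20
;
run;
proc print data=headache;   /* list the data as read */
run;
proc anova data=headache;   /* one-way ANOVA of time by treatment */
  class trt;
  model time=trt;
run;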

SAS OUTPUT:
The SAS System

Obs    trt    time
  1    A        30
  2    A        35
  3    A        40
  4    A        25
  5    A        35
  6    B        25
  7    B        20
  8    B        30
  9    B        25
 10    B        30
 11    C        15
 12    C        20
 13    C        25
 14    C        20
 15    C        20

The ANOVA Procedure

Class Level Information
Class    Levels    Values
trt           3    A B C

Number of Observations Read    15
Number of Observations Used    15

The ANOVA Procedure

Dependent Variable: time

                              Sum of
Source             DF        Squares     Mean Square    F Value    Pr > F
Model               2    423.3333333     211.6666667      10.16    0.0026
Error              12    250.0000000      20.8333333
Corrected Total    14    673.3333333

R-Square    Coeff Var    Root MSE    time Mean
0.628713     17.33299    4.564355     26.33333

Source    DF       Anova SS    Mean Square    F Value    Pr > F
trt        2    423.3333333    211.6666667      10.16    0.0026

Note: SAS has two procedures for analysis of variance
applications. The first is the ANOVA procedure, which is used
when the sample sizes are equal, and the second is the GLM
(general linear models) procedure, which can be used whether
the sample sizes are equal or unequal. Since the sample sizes
are equal in Example 9.3, we used the ANOVA procedure.
Example 9.5: Testing Difference in Mean Weight Gain Among 4
Different Diets
A study is developed to examine the effects of vitamin and milk
supplements on infant weight gain. Four diet plans are
considered: Diet A involves a regular diet plus the vitamin
supplement, Diet B involves a regular diet plus the special milk
formula, Diet C is our control diet (no restrictions) and Diet D
involves a regular diet plus the vitamin and the special milk
formula. Twenty infants are selected for the investigation and
each is randomized to one of the four competing diet
programs. The following table displays weight gains, measured
in pounds, after 1 month on the assigned diet:
Diet A    Diet B    Diet C    Diet D
2.0       1.6       1.5       2.1
1.5       1.9       2.0       2.4
2.4       2.1       1.8       1.9
1.9       1.1       1.3       1.8
2.6       1.7       1.2       2.2

1.) Set up the hypothesis.

2.) Use SAS to compute the ANOVA Table; make a decision
and conclusion based on your output.
SAS CODE:
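options ps=62 ls=80;       /* page size and line size of the listing */
data infants;
  input diet $ gain;       /* diet = plan (character), gain = pounds gained */
  cards;
A 2.0
A 1.5
A 2.4
A 1.9
A 2.6
B 1.6
B 1.9
B 2.1
B 1.1
B 1.7
C 1.5
C 2.0
C 1.8
C 1.3
C 1.2
D 2.1
D 2.4
D 1.9
D 1.8
D 2.2
;
run;
proc print data=infants;   /* list the data as read */
run;
proc glm data=infants;     /* one-way ANOVA of gain by diet */
  class diet;
  model gain=diet;
run;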
SAS OUTPUT:

The SAS System

Obs    diet    gain
  1    A        2.0
  2    A        1.5
  3    A        2.4
  4    A        1.9
  5    A        2.6
  6    B        1.6
  7    B        1.9
  8    B        2.1
  9    B        1.1
 10    B        1.7
 11    C        1.5
 12    C        2.0
 13    C        1.8
 14    C        1.3
 15    C        1.2
 16    D        2.1
 17    D        2.4
 18    D        1.9
 19    D        1.8
 20    D        2.2

The SAS System

The GLM Procedure

Class Level Information
Class    Levels    Values
diet          4    A B C D

Number of Observations Read    20
Number of Observations Used    20

The SAS System

The GLM Procedure

Dependent Variable: gain

                              Sum of
Source             DF        Squares     Mean Square    F Value    Pr > F
Model               3     1.09400000      0.36466667       2.92    0.0659
Error              16     1.99600000      0.12475000
Corrected Total    19     3.09000000

R-Square    Coeff Var    Root MSE    gain Mean
0.354045     19.09187    0.353200     1.850000

Source    DF      Type I SS    Mean Square    F Value    Pr > F
diet       3     1.09400000     0.36466667       2.92    0.0659

Source    DF    Type III SS    Mean Square    F Value    Pr > F
diet       3     1.09400000     0.36466667       2.92    0.0659

Decision:

Conclusion:
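
One way to read the decision off the GLM output is sketched below. The mean squares and p-value are copied from the listing; the significance level alpha = 0.05 is an assumption, since the notes do not state which level to use, and the variable names are illustrative.

```python
# Values copied from the GLM output for Example 9.5.
ms_model = 0.36466667   # Mean Square for Model (diet), df = 3
ms_error = 0.12475000   # Mean Square for Error, df = 16
p_value = 0.0659        # Pr > F as printed by SAS

ALPHA = 0.05  # assumed significance level (not stated in the notes)

# The F statistic is the ratio of the between to the within mean square.
f_value = ms_model / ms_error

# Decision rule based on the p-value: reject H0 when p < alpha.
reject_h0 = p_value < ALPHA

print(f"F = {f_value:.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

At the assumed 5% level the p-value exceeds alpha, so this sketch does not reject the hypothesis that the four mean weight gains are equal.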

Note: We always state conclusions in terms of the alternative
hypothesis. Whether we reject or do not reject the null, we
conclude that there is sufficient or insufficient evidence to
say that the means are not equal.

Fixed Versus Random Effects Models (Section 9.3, Page 424)

There are two types of analysis of variance applications: fixed
effects models and random effects models.
Fixed Effects Models:

Random Effects Models:

Note: We will only be using fixed effects models in the
upcoming sections. Basically, these formulas only apply to
fixed effects models.

Evaluating Treatment Effects (Section 9.4, Page 424)

This section applies only when the decision is to reject H_0. If
an ANOVA has been performed and a significant difference in
means has been established, we then want to figure out how
much of the variation in the data is due to the treatments.
We use the following statistic, the ratio of the variation due to
the treatments (SS_b) to the total variation:
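
As a quick numeric illustration, the ratio can be computed from the sums of squares in the Example 9.3 ANOVA output. This is only a sketch of the arithmetic; the variable names are illustrative. Many texts call this ratio eta-squared, and it is the quantity SAS labels R-Square.

```python
# Sums of squares copied from the ANOVA output for Example 9.3.
ss_b = 423.3333333       # between-treatments (Model) sum of squares
ss_total = 673.3333333   # Corrected Total sum of squares

# Proportion of the total variation attributable to the treatments.
ratio = ss_b / ss_total

print(f"SS_b / SS_total = {ratio:.4f}")
```

About 63% of the variation in time to relief is attributable to the drug assigned, which agrees with the R-Square value in the SAS listing.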


Multiple Comparisons Procedures (Section 9.5, Page 425)

Now that we know when to reject or not reject the null
hypothesis, let's consider some new comparisons. Say we
decide to reject the null hypothesis and conclude that not all
means are equal. What if we wanted to know, specifically,
which means aren't equal? For example, in Example 9.3, we
wanted to test whether the mean times to relief of three
different headache medications differed:

And we came up with the decision to reject the null hypothesis.
So, we are saying that the mean times to relief differ
significantly for at least 2 of the headache medications.
Suppose we are particularly interested in comparing only the
first two medications:

Or the first and third:

Tests of this type are called pairwise comparisons, since they
involve pairs of treatment means.
It is, however, possible to construct more complicated
comparisons. For example, compare the mean time to relief for
patients assigned to either Drug A or B to the mean time to
relief for patients assigned to Drug C.

Both pairwise (two-at-a-time) and more complicated
comparisons are generally called contrasts.
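
A contrast can be written as a weighted sum of the treatment means whose weights add to zero. The sketch below estimates the comparisons mentioned above from the Example 9.3 data. The helper name `contrast` and the particular coefficient sets are illustrative assumptions; this shows only the point estimates, not a test.

```python
# Times to relief (minutes) from Example 9.3.
data = {
    "A": [30, 35, 40, 25, 35],
    "B": [25, 20, 30, 25, 30],
    "C": [15, 20, 25, 20, 20],
}
means = {drug: sum(times) / len(times) for drug, times in data.items()}


def contrast(coefficients, means):
    """Estimate sum of c_j * mean_j; the coefficients must add to zero."""
    if abs(sum(coefficients.values())) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    return sum(coefficients[drug] * means[drug] for drug in means)


# Pairwise contrasts: one mean minus another.
a_vs_b = contrast({"A": 1, "B": -1, "C": 0}, means)
a_vs_c = contrast({"A": 1, "B": 0, "C": -1}, means)

# More complicated contrast: average of Drugs A and B versus Drug C.
ab_vs_c = contrast({"A": 0.5, "B": 0.5, "C": -1}, means)

print(a_vs_b, a_vs_c, ab_vs_c)
```

The pairwise contrasts are simply differences of two sample means, while the last contrast compares the average of the A and B means with the C mean.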


There are a number of statistical procedures for handling these
applications, which are called multiple comparison procedures
(MCPs). For pairwise (two-at-a-time) comparisons, we will be
looking at 2 popular multiple comparison procedures, the
Scheffe and Tukey procedures. Next class, we'll look at a
different method for more complicated contrasts.
Remember: These MCPs are only used when we've come up
with the decision of rejecting H_0 in our ANOVA and a
conclusion that the treatment means are significantly different.
The Scheffe Procedure
The Scheffe procedure is a multiple comparison procedure that
controls the familywise error rate. This means that
P(type I error) is controlled (and equal to \alpha) over the family of
all comparisons.
Recall: Type I error?

Note: The Scheffe procedure is most commonly used when
more than a few contrasts are involved; however, it has lower
statistical power than competing procedures.
Outline of the Scheffe Procedure:
1. Set up the hypotheses:

2. Compute the test statistic:


3. Decision Rule:

4. Conclusion. (We should all know how to write a
conclusion by now!)

Okay, let's do an example:
Example 9.7: Recall Example 9.3.
We compared the mean time to relief of headache pain under
3 competing medications and had the following hypotheses:

Analysis of Variance Table

Source of    Sums of Squares    Degrees of      Mean Squares    F
Variation    (SS)               Freedom (df)    (MS)

Between      423.329            2               211.66          10.1598
Within       250                12              20.833
Total        673.329            14

Since we don't know which of the 3 treatment means differ,
we now wish to compare the medications two at a time
(i.e., pairwise comparisons).

Summary Statistics by Treatment

n_1 = 5             n_2 = 5             n_3 = 5
\bar{x}_1 = 33      \bar{x}_2 = 26      \bar{x}_3 = 20
s_1 = 5.7           s_2 = 4.2           s_3 = 3.5

Drug A versus Drug B:

1. Hypothesis:

2. Test Statistic:

3. Decision:

4. Conclusion:

Drug A versus Drug C:

1. Hypothesis:

2. Test Statistic:

3. Decision:

4. Conclusion:

Drug B versus Drug C:


1. Hypothesis:

2. Test Statistic:

3. Decision:

4. Conclusion:

Therefore, it is shown through the Scheffe comparison
procedure that \mu_1 \neq \mu_3.
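
The three pairwise comparisons can be checked with a short sketch. It assumes the usual pairwise form of the Scheffe test, F = (\bar{x}_i - \bar{x}_j)^2 / (s_w^2 (1/n_i + 1/n_j)), with H_0 rejected when F is at least (k - 1) times the tabled F critical value. That form, the level alpha = 0.05, and the tabled value F(2, 12) = 3.89 are assumptions, since the notes leave the test statistic and decision rule blank.

```python
from itertools import combinations

# Sample means and sizes from the summary table; s_w^2 from the ANOVA table.
means = {"A": 33.0, "B": 26.0, "C": 20.0}
sizes = {"A": 5, "B": 5, "C": 5}
ms_w = 20.833   # within mean square (s_w^2)
k = 3           # number of treatments

# Assumptions: alpha = 0.05 and tabled F(k - 1, N - k) = F(2, 12) = 3.89.
F_TABLE = 3.89
critical = (k - 1) * F_TABLE   # Scheffe critical value

results = {}
for i, j in combinations(means, 2):
    # Squared mean difference scaled by its estimated variance.
    f_stat = (means[i] - means[j]) ** 2 / (ms_w * (1 / sizes[i] + 1 / sizes[j]))
    results[(i, j)] = (f_stat, f_stat >= critical)

for (i, j), (f_stat, reject) in results.items():
    decision = "reject H0" if reject else "do not reject H0"
    print(f"Drug {i} vs Drug {j}: F = {f_stat:.2f}, critical = {critical:.2f}, {decision}")
```

Under these assumptions only the Drug A versus Drug C comparison exceeds the critical value, which is consistent with the conclusion stated above.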
