
One-way Analysis of Variance (ANOVA) (Chapter 8, Daniel)

Suppose we wish to compare k population means (k ≥ 2). This situation can arise in two
ways. If the study is observational, we are obtaining independently drawn samples from
k distinct populations and we wish to compare the population means for some numerical
response of interest. If the study is experimental, then we are using a completely
randomized design to obtain our data from k distinct treatment groups. In a completely
randomized design the experimental units are randomly assigned to one of k treatments
and the response value from each unit is obtained. The mean of the numerical response
of interest is then compared across the different treatment groups.
There are two main questions of interest:
1) Are there at least two population means that differ?
2) If so, which population means differ, and by how much do they differ?
More formally:

$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$

$H_a:$ at least two population means differ, i.e. $\mu_i \neq \mu_j$ for some $i \neq j$.
If we reject the null then we use comparative methods to answer question 2 above.
Assumptions
1. Samples are drawn independently (completely randomized design).
2. Population variances are equal, i.e. $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$.
3. Populations are normally distributed.
We examine how to check these assumptions in the examples that follow.
The test procedure compares the variation in observations between samples to the
variation within samples. If the variation between samples is large relative to the
variation within samples, we are likely to conclude that the population means are not all
equal. The diagrams below illustrate this idea.
[Diagrams: in the first, Between Group Variation >> Within Group Variation (conclude the population means differ); in the second, Between Group Variation ≈ Within Group Variation (fail to conclude the population means differ).]

The procedure is called analysis of variance because we use variation to
decide whether the population means differ.
Within Group Variation
To measure the variation within groups we use the following formula:

$$s_W^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2}{n_1 + n_2 + \cdots + n_k - k}$$

This is an estimate of the variance common to all k populations ($\sigma^2$).
Between Group Variation
To measure the variation between groups we use the following formula:

$$s_B^2 = \frac{n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2 + \cdots + n_k(\bar{x}_k - \bar{x})^2}{k - 1}$$

where $\bar{x}_i$ = sample mean of the response for the $i$th treatment group, and
$\bar{x}$ = grand mean of the response from all treatment groups.
The larger the differences between the k sample means for the treatment groups are, the
larger $s_B^2$ will be.
If the null hypothesis is true, i.e. the k treatments have the same population mean, we
expect the between-group variation to be about the same as the within-group variation
($s_B^2 \approx s_W^2$).
If $s_B^2 > s_W^2$ we have evidence against the null hypothesis in support of the alternative, i.e.
we have evidence to suggest that at least two of the population means are different.
Test Statistic

$$F = \frac{s_B^2}{s_W^2} \sim F\text{-distribution with numerator } df = k - 1 \text{ and denominator } df = n - k$$

The larger F is, the stronger the evidence against the null hypothesis, i.e. big F ⇒ reject $H_0$.
Note: This is an extension of the pooled variance estimate from the two-sample t-test for
independent samples ($s_p^2$) to more than two populations. Here n = overall sample size.
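To make the arithmetic behind $s_W^2$, $s_B^2$ and $F$ concrete, here is a minimal sketch in Python (not part of JMP) that computes them from raw group data; the three arrays are made-up values for illustration only.

```python
import numpy as np
from scipy import stats

# Hypothetical responses for k = 3 groups (values are made up for illustration)
groups = [np.array([11.4, 11.0, 5.5, 9.4, 13.6]),
          np.array([-0.5, -9.3, -5.4, 12.3, -2.0]),
          np.array([1.7, 0.7, -0.1, -0.7, -3.5])]

k = len(groups)
n = sum(len(g) for g in groups)                      # overall sample size
grand_mean = np.concatenate(groups).mean()

# Within-group variance: pooled (weighted) average of the k sample variances
s2_W = sum((len(g) - 1) * g.var(ddof=1) for g in groups) / (n - k)

# Between-group variance: spread of the group means around the grand mean
s2_B = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups) / (k - 1)

F = s2_B / s2_W
p = stats.f.sf(F, k - 1, n - k)                      # upper-tail F probability
print(F, p)
```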
EXAMPLE 1 - Weight Gain in Anorexia Patients
Data File: Anorexia.JMP
These data give the pre- and post-weights of patients being treated for anorexia nervosa.
There are actually three different treatment plans being used in this study, and we wish to
compare their performance in terms of the mean weight gain of the patients. The patients
in this study were randomly assigned to one of three therapies.
The variables in the data file are:
group - treatment group (1 = Family therapy, 2 = Standard therapy, 3 = Cognitive Behavioral therapy)
Treatment - treatment group by name (Behavioral, Family, Standard)
prewt - weight at the beginning of treatment
postwt - weight at the end of the study period
Weight Gain - weight gained (or lost) during treatment (postwt - prewt)
We begin our analysis by examining comparative displays of the weight gained across
the three treatment methods. To do this select Fit Y by X from the Analyze menu,
place the grouping variable, Treatment, in the X box and the response, Weight Gain,
in the Y box, and click OK. Here boxplots, mean diamonds, normal quantile plots,
and comparison circles have been added.
Things to consider from this graphical display:
Do there appear to be differences in the mean weight gain?
Are the weight changes normally distributed?
Is the variation in weight gain equal across therapies?
Checking the Equality of Variance Assumption
To test whether it is reasonable to assume the population variances are equal for these
three therapies select UnEqual Variances from the Oneway Analysis pull-down menu.
We have no evidence to conclude that the variances/standard deviations of the weight
gains for the different treatment programs differ (p >> .05).
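Outside of JMP the same check can be sketched with Levene-type tests (JMP's UnEqual Variances report includes several tests; Levene and Brown-Forsythe are used here as stand-ins). The three lists below are made-up weight gains, not the actual Anorexia.JMP values.

```python
from scipy import stats

# Hypothetical weight gains for the three therapies (illustrative values only)
family     = [11.4, 11.0, 5.5, 9.4, 13.6]
standard   = [-0.5, -9.3, -5.4, 12.3, -2.0]
behavioral = [1.7, 0.7, -0.1, -0.7, -3.5]

# Levene's test (centered at the mean) and Brown-Forsythe (centered at the median)
print(stats.levene(family, standard, behavioral, center='mean'))
print(stats.levene(family, standard, behavioral, center='median'))
# Large p-values give no evidence that the population variances differ.
```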
ONE-WAY ANOVA TEST FOR COMPARING THE THERAPY MEANS
To test the null hypothesis that the mean weight gain is the same for each of the therapy
methods we will perform the standard one-way ANOVA test. To do this in JMP select
Means, Anova/t-Test from the Oneway Analysis pull-down menu. The results of the
test are shown in the Analysis of Variance box.
The p-value contained in the ANOVA table is .0065; thus we reject the null hypothesis at
the .05 level and conclude that statistically significant differences exist in the mean weight
gain experienced by patients in the different therapy groups.
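For comparison, the same F test can be run outside JMP; a minimal sketch using scipy is below, again with made-up weight gains standing in for the Anorexia.JMP data.

```python
from scipy.stats import f_oneway

# Hypothetical weight gains for the three therapies (illustrative values only)
family     = [11.4, 11.0, 5.5, 9.4, 13.6, 9.0]
standard   = [-0.5, -9.3, -5.4, 12.3, -2.0, -10.2]
behavioral = [1.7, 0.7, -0.1, -0.7, -3.5, 14.9]

F, p = f_oneway(family, standard, behavioral)   # one-way ANOVA F test
print(F, p)                                     # reject H0 at the .05 level if p < .05
```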
MULTIPLE COMPARISONS
Because we have concluded that the mean weight gains across the treatment methods are not
all equal, it is natural to ask the secondary question:
Which means are significantly different from one another?
We could consider performing a series of two-sample t-tests and constructing confidence
intervals for independent samples to compare all possible pairs of means; however, if the
number of treatment groups is large we will almost certainly find some pair of treatment
means to be significantly different. Why? Consider a situation where we have k = 7 different
treatments that we wish to compare. To compare all possible pairs of means (1 vs. 2, 1
vs. 3, ..., 6 vs. 7) would require performing a total of $\frac{k(k-1)}{2} = 21$ two-sample t-tests.
If we used $\alpha = P(\text{Type I Error}) = .05$ for each test, we would expect to make $21(.05) \approx 1$ Type
I Error, i.e. we expect to find one pair of means to be significantly different when in
fact they are not. This problem only becomes worse as the number of groups, k, gets
larger.
Note (annotation on the JMP ANOVA output): $F = \dfrac{s_B^2}{s_W^2} = \dfrac{307.22}{56.68} = 5.42$.
The between-group variation is 5.42 times larger than the
within-group variation! This provides strong evidence against
the null hypothesis (p = .0065).
Note (hypotheses for the equality of variance test): $H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$ versus
$H_a:$ the population variances are not all equal.
Experiment-wise Error Rate
Another way to think about this is to consider the probability of making no Type I Errors
when making our pair-wise comparisons. When k = 7, for example, the probability of
making no Type I Errors is $(.95)^{21} = .3406$, so the probability that we make at least one
Type I Error is $1 - .3406 = .6594$, a 65.94% chance. Certainly this is unacceptable! Why
would you conduct a statistical analysis when you know you have about a 66% chance of making an
error in your conclusions? This probability is called the experiment-wise error rate.
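The arithmetic above can be verified with a few lines (the numbers simply restate the k = 7 case):

```python
k = 7
m = k * (k - 1) // 2            # number of pairwise comparisons: 21
alpha = 0.05
p_no_error = (1 - alpha) ** m   # probability of making no Type I errors
print(m, round(p_no_error, 4), round(1 - p_no_error, 4))   # 21 0.3406 0.6594
```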
Bonferroni Correction
There are several different ways to control the experiment-wise error rate. One of the
easiest is the Bonferroni Correction. If we plan on making m comparisons or conducting m
significance tests, the Bonferroni Correction is to simply use $\alpha/m$ as our significance
level rather than $\alpha$. This simple correction guarantees that our experiment-wise error
rate will be no larger than $\alpha$. It implies that our p-values will have to be less than
$\alpha/m$ rather than $\alpha$ to be considered statistically significant.
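A minimal sketch of the Bonferroni idea in Python: run all m pairwise two-sample t-tests but compare each p-value to α/m. The group values below are made up for illustration.

```python
from itertools import combinations
from scipy.stats import ttest_ind

# Hypothetical data for three treatment groups (illustrative values only)
groups = {"Family":     [11.4, 11.0, 5.5, 9.4, 13.6],
          "Standard":   [-0.5, -9.3, -5.4, 12.3, -2.0],
          "Behavioral": [1.7, 0.7, -0.1, -0.7, -3.5]}

alpha = 0.05
m = len(groups) * (len(groups) - 1) // 2          # number of pairwise comparisons
for (name1, x1), (name2, x2) in combinations(groups.items(), 2):
    t, p = ttest_ind(x1, x2)                      # usual two-sample t-test
    verdict = "significant" if p < alpha / m else "not significant"
    print(f"{name1} vs {name2}: p = {p:.4f} ({verdict} at alpha/m = {alpha/m:.4f})")
```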
Multiple Comparison Procedures for Pair-wise Comparisons of k Population Means
When performing pair-wise comparisons of population means in ANOVA there are
several different methods that can be employed. These methods depend on the types of
pair-wise comparisons we wish to perform. The different types available in JMP are
summarized briefly below:
Compare each pair using the usual two-sample t-test for independent
samples. This choice does not provide any experiment-wise error rate
protection! (DON'T USE)
Compare all pairs using Tukey's Honest Significant Difference (HSD)
approach. This is the best choice if you are interested in comparing each
possible pair of treatments.
Compare the means with the "Best" using Hsu's method. The best
mean can either be the minimum (if smaller is better for the response) or
the maximum (if bigger is better for the response).
Compare each mean to a control group using Dunnett's method. This
compares each treatment mean to a control group only. You must
identify the control group in JMP by clicking on an observation in your
comparative plot corresponding to the control group before selecting this
option.
Multiple Comparison Options in JMP
EXAMPLE 1 - Weight Gain in Anorexia Patients (cont'd)
For these data we are probably interested in comparing each of the treatments to one
another. For this we will use Tukey's multiple comparison procedure for comparing all
pairs of population means. Select Compare Means from the Oneway Analysis menu
and highlight the All Pairs, Tukey HSD option. Beside the graph you will now notice
circles plotted: one circle for each group, each centered at the mean for the corresponding
group. The size of a circle is inversely proportional to the sample size, so larger circles
are drawn for groups with smaller sample sizes. These circles are called comparison
circles and can be used to see which pairs of means are significantly different from each
other. To do this, click on the circle for one of the treatments. Notice that the selected
treatment group will appear highlighted in the plot window and its circle will become red
and bold. The means that are significantly different from the selected treatment group
will have circles that are gray. These color differences are also conveyed in the group
labels on the horizontal axis. In the plot below we have selected the Standard treatment
group.
The results of the pair-wise comparisons are also contained in the output window.
The matrix labeled "Comparisons for all pairs using Tukey-Kramer HSD" identifies pairs of
means that are significantly different via positive entries in the matrix. Here we see
only treatments 2 and 3 significantly differ.
The next table conveys the same information by using different letters to represent
populations that have significantly different means. Notice treatments 2 and 3 are not
connected by the same letter, so they are significantly different.
Finally, the CIs in the Ordered Differences section give estimates for the differences in
the population means. Here we see that the mean weight gain for patients in treatment 3 is
estimated to be between 2.09 lbs. and 13.34 lbs. larger than the mean weight gain for
patients receiving treatment 2 (see highlighted section below).
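Tukey's HSD can also be sketched outside JMP with statsmodels; the gains and group labels below are placeholders for the Weight Gain and Treatment columns, not the actual data.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Placeholder response values and group labels (illustrative only)
gain = np.array([11.4, 11.0, 5.5, 9.4,      # "Family"
                 -0.5, -9.3, -5.4, 12.3,    # "Standard"
                 1.7, 0.7, -0.1, -0.7])     # "Behavioral"
treatment = np.array(["Family"] * 4 + ["Standard"] * 4 + ["Behavioral"] * 4)

# Tukey-Kramer all-pairs comparisons with a .05 family-wise error rate
result = pairwise_tukeyhsd(gain, treatment, alpha=0.05)
print(result)   # pairwise differences, adjusted p-values, and confidence intervals
```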
EXAMPLE 2 - Motor Skills in Children and Lead Exposure
These data come from a study of children living in close proximity to a lead smelter in El
Paso, TX. The study was conducted in 1972-73: children living near the lead smelter
were sampled and their blood lead levels were measured in both 1972 and 1973. IQ
tests and motor skills tests were conducted on these children as well. In this example we
will examine differences in finger-wrist tap scores for three groups of children. The
groups were determined using their blood lead levels from 1972 and 1973 as follows:
Lead Group 1 = children whose blood lead levels were below 40
micrograms/dl in both 1972 and 1973 (we can think of these children as a "control
group").
Lead Group 2 = children whose lead levels were above 40 micrograms/dl
in 1973 (we can think of these children as the "currently exposed" group).
Lead Group 3 = children whose lead levels were above 40 micrograms/dl
in 1972, but below 40 in 1973 (we can think of these children as the
"previously exposed" group).
The response that was measured (MAXFWT) was the maximum finger-wrist tapping
score for the child using both the left and right hands to do the tapping (to remove hand
dominance). These data are contained in the JMP file: Maxfwt Lead.JMP.
Select Fit Y by X and place Lead Group in the X, Factor box and MAXFWT in the
Y, Response box. After selecting Quantiles and Normal Quantile Plots we obtain the
following comparative display.
The wrist-tap scores appear to be approximately normally distributed. The variance in
the wrist-tap scores may appear to be a bit larger for Lead Group 1; however, because this
is the largest group we expect to see more observations in the extremes. The interquartile
ranges for the three lead groups appear to be approximately equal. We can check the
equality of variance assumption formally by selecting the UnEqual Variances option.
Formally Checking the Equality of the Population Variances
ONE-WAY ANOVA FOR COMPARING THE MEAN WRIST-TAP SCORES
ACROSS LEAD GROUP
To test the null hypothesis that the mean finger-wrist tap score is the same for each of the
lead exposure groups we will perform the standard one-way ANOVA test. To do this in
JMP select Means, Anova from the Oneway Analysis pull-down menu. The results of
the test are shown in the Analysis of Variance box.
The p-value contained in the ANOVA table is .0125; thus we reject the null hypothesis at
the .05 level and conclude that statistically significant differences exist in the mean finger-wrist
tap scores of the children in the different lead exposure groups (p < .05).
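If the MAXFWT data were exported from JMP to a text file, the same two checks (equal variances, then the one-way ANOVA) could be sketched as below; the file name and column names are assumptions for illustration, not the actual Maxfwt Lead.JMP contents.

```python
import pandas as pd
from scipy.stats import levene, f_oneway

# Hypothetical CSV export of the JMP table with columns "LeadGroup" and "MAXFWT"
df = pd.read_csv("maxfwt_lead.csv")
scores = [grp["MAXFWT"].dropna().to_numpy()
          for _, grp in df.groupby("LeadGroup")]

print(levene(*scores))     # equality of variances across the three lead groups
print(f_oneway(*scores))   # one-way ANOVA comparing the mean wrist-tap scores
```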
Multiple Comparisons using Tukey's HSD
Note (annotation on the equality of variance output): none of the equality of variance
tests suggests that the population variances are unequal (p > .05).
Note (annotation on the Tukey HSD output): only the mean finger-wrist tap scores of lead
groups 1 and 2 significantly differ, i.e. children who currently have high lead levels
have a significantly smaller mean than children who did not have high lead levels in
either year of the study.
Tukey's HSD Pair-wise Comparisons (cont'd)
The results of the pair-wise comparisons are also contained in the output window.
The matrix labeled "Comparisons for all pairs using Tukey-Kramer HSD" identifies pairs of
means that are significantly different via positive entries in the matrix. Here we see
only lead groups 1 and 2 significantly differ.
The next table conveys the same information by using different letters to represent
populations that have significantly different means. Notice lead groups 1 and 2 are not
connected by the same letter, so they are significantly different.
Finally, the CIs in the Ordered Differences section give estimates for the differences in
the population means. Here we see that the mean finger-wrist tap score for children who
currently have a high blood lead level is estimated to be between 0.83 and 14.18 taps
smaller than the mean finger-wrist tap score for children who did not have high lead
levels in either year of the study.
Randomized Complete Block (RCB) Designs (Section 8.3, Daniel)
EXAMPLE 1 - Comparing Methods of Determining Blood Serum Level
Data File: Serum-Meth.JMP
The goal of this study was to determine whether four different methods for determining blood
serum level significantly differ in the readings they give. Suppose we plan to
have 6 readings from each method which we will then use to make our comparisons.
One approach we could take would be to find 24 volunteers, randomly allocate six
subjects to each method, and compare the readings obtained using the four methods.
(Note: this is called a completely randomized design.) There is one major problem with
this approach; what is it?
Instead of taking this approach it would clearly be better to use each method on the same
subject. This removes subject-to-subject variation from the results and will allow us to
get a clearer picture of the actual differences in the methods. Also, if we truly only wish
to have 6 readings for each method, this approach will only require the use of 6 subjects
versus the 24 subjects the completely randomized approach discussed above requires,
thus reducing the "cost" of the experiment.
The experimental design where each patient's serum level is determined using each
method is called a randomized complete block (RCB) design. Here the patients serve as
the blocks; the term randomized refers to the fact that the methods will be applied to each
patient's blood sample in a random order, and complete refers to the fact that each method
is used on each patient's blood sample. In some experiments where blocking is used it is
not possible to apply each treatment to each block, resulting in what is called an
incomplete block design. These are less common and we will not discuss them in this
class.

The table below contains the raw data from the RCB experiment to compare the serum
determination methods.
Subject   Method 1   Method 2   Method 3   Method 4
   1         360        435        391        502
   2        1035       1152       1002       1230
   3         632        750        591        804
   4         581        703        583        790
   5         463        520        471        502
   6        1131       1340       1144       1300
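For reference, here is a sketch of how the table above could be entered in Python for analysis outside JMP (pandas is only a stand-in for the JMP data table); the data frame built here is reused in the sketches that follow.

```python
import pandas as pd

# Serum readings from the table above: 6 subjects (blocks) x 4 methods
readings = [
    [360, 435, 391, 502],
    [1035, 1152, 1002, 1230],
    [632, 750, 591, 804],
    [581, 703, 583, 790],
    [463, 520, 471, 502],
    [1131, 1340, 1144, 1300],
]
df = pd.DataFrame([(subj, meth, y)
                   for subj, row in enumerate(readings, start=1)
                   for meth, y in enumerate(row, start=1)],
                  columns=["subject", "method", "serum"])
print(df.head())
```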

Visualizing the need for Blocking
Select Fit Y by X from the Analyze menu and place Serum Level in the Y, Response box
and Method in the X, Factor box. The resulting comparative plot is shown below. Does
there appear to be any difference in the serum levels obtained from the four methods?
This plot completely ignores the fact that the same six blood samples were used for each
method. We can incorporate this fact visually by selecting Oneway Analysis >
Matching Column... and then highlighting Patient in the list. This will have the following
effect on the plot.
Now we can clearly see that ignoring the fact that the same six blood samples were used for
each method is a big mistake!
Below we show how to correctly analyze these data.
Correct Analysis of RCB Design Data in JMP
First select Fit Y by X from the Analyze menu and place Serum Level in the Y,
Response box, Method in the X, Factor box, and Patient in the Block box. The results
from JMP are shown below.
Notice the Y axis is "Serum Level - Block Centered". This means that what we are
seeing in the display is the differences in the serum level readings after adjusting for the fact
that the readings for each method came from the same 6 patients. Examining the data in
this way we can clearly see that the methods differ in the serum level reported when
measuring blood samples from the same patient.
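The "block centered" adjustment JMP makes here can be sketched directly: subtract each subject's own mean and add back the grand mean, so only method-to-method differences remain. This sketch assumes the data frame df built in the earlier sketch.

```python
# Block-center the serum readings: remove subject-to-subject differences
grand_mean = df["serum"].mean()
df["serum_block_centered"] = (df["serum"]
                              - df.groupby("subject")["serum"].transform("mean")
                              + grand_mean)
print(df.groupby("method")["serum_block_centered"].mean())   # block-corrected method means
```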
The results of the ANOVA clearly show we have strong evidence that the four methods
do not give the same readings when measuring the same blood sample (p < .0001).
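The blocked ANOVA itself can be sketched with statsmodels by fitting Method and Subject (the block) as additive factors; the Method row of the resulting table corresponds to the blocked F test JMP reports (again assuming the df from the earlier sketch).

```python
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Additive two-way model: treatment (method) plus block (subject), no interaction
rcb_model = ols("serum ~ C(method) + C(subject)", data=df).fit()
print(sm.stats.anova_lm(rcb_model, typ=2))   # C(method) row gives the blocked F test
```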
The tables below give the block-corrected mean for each method and the block means
used to make the adjustment.
"s was the case with one3way "68." (completely randomied) we may still wish to
determine which methods have significantly different mean serum level readings when
measuring the same blood sample. *e can again use multiple comparison procedures.
Select Compare Means... > All Pairs, Tukey's HSD.
We can see that methods 4 and 2 differ significantly from methods 1 and 3, but not from each
other; likewise, methods 1 and 3 do not differ significantly from each other. The
confidence intervals quantify the size of the difference we can expect on average when
measuring the same blood samples. For example, we see that method 4 will give
serum levels between 90.28 and 225.05 higher than method 3 on average when
measuring the same blood sample. Other comparisons can be interpreted in similar
fashion.