
Peter Major

ECON1320: Lecture 3 - Analysis of Variance (ANOVA), Chapter 11.1-11.4


Key Concepts/deliverables:
1. Basic Concepts of Experimental Design
2. Hypothesis test using completely randomised design (one-way ANOVA)
3. Tukey's HSD test and Tukey-Kramer procedure
4. Hypothesis test using the randomised block design (two-way ANOVA without replication)
1. Experimental Design
Researchers conduct experiments to test a hypothesis by controlling one or more variables.
Throughout this summary we will consider one of the first experiments most of us did at primary school:
growing bean stalks from a packet of seeds.
Experiments contain two types of variables: independent variables ("factors") and dependent variables ("response variables").






Consider our bean growing example. Some variables may be grouped as follows:
Independent: Amount of water, type of fertiliser, amount of fertiliser, quantity of soil, amount of
light (lumens) from the sun, size of seed, weight of seed
- Treatment variables: What can we change/control when we do the experiment?
o Amount of water, type of fertiliser, amount of fertiliser, quantity of soil
- Classification variables: What was pre-determined prior to the experiment?
o Lumens, size of seed, weight of seed
Each independent variable has two or more levels or classifications that can be categorical or
numerical.
What we mean by this is that each variable can take more than one value (it wouldn't be a
variable if it couldn't!).
In our example consider the variables: amount of water, type of fertiliser, and lumens
Amount of water: 0-50 mL, 50-100 mL, 100-150 mL (Numerical)
Type of fertiliser: phosphate fertiliser, non-phosphate fertiliser, manure (Categorical)
Lumens: 200-400 lumens, 400-600 lumens, 600-800 lumens (Numerical)

Types of variables (diagram):
Independent Variables ("Factors")
- Treatment variable: controlled/varied by the researcher
- Classification variable: a characteristic of the experimental subjects that is present prior to the
experiment and out of the experimenter's control. They cannot change it!
Dependent Variables ("Response variables")
2. Hypothesis test using completely randomised design (one-way ANOVA)
The completely randomised design is an experimental design where there is one treatment variable
(sometimes called independent variable) with two or more treatment levels, and one dependent
variable. This design is analysed by analysis of variance.
For example, a completely randomised version of our plant growing experiment would be the following:
1. Dependent variable: Growth rate of bean seeds (what we are interested in finding out)
2. Independent variable: Type of fertiliser (Phosphate fertiliser, non-phosphate fertiliser, manure)

In this case we have ONE treatment or independent variable (type of fertiliser) that can take three
treatment levels.

A VIOLATION would be:
1. Dependent variable: Growth rate (what we are interested in finding out)
2. Independent variables: Type of fertiliser, amount of water
This second example is NOT a completely randomised design as there are TWO treatment variables:
Type of fertiliser AND amount of water.
ANOVA: We analyse a completely randomised design using ANOVA (ANalysis Of VAriance).
ANOVA takes the total variation in the results of a study and breaks it into
possible causes:
1. Variation due to experimental error: Variation within groups
2. Variation due to treatment effects: Variation between/among groups
In our plant growing example:
1. Dependent variable: Growth rate of bean seeds (what we are interested in finding out)
2. Independent variable: Type of fertiliser (Phosphate fertiliser, non-phosphate fertiliser, manure)
Let's say that we planted 5 bean seeds with each treatment level (5 bean seeds with phosphate, 5 with
non-phosphate and 5 with manure) and measured their growth after 14 days.
When we measure the growth of our plants after the 14 days, it's highly unlikely that any of them will have
grown EXACTLY the same amount! Breaking apart the reasons why this is so into two groups, we have:
1. Variation due to experimental error: We can't measure exactly the same quantities of
fertiliser; we can't measure the bean stalks' height exactly!
2. Variation due to treatment effects: This is what we're really interested in! Different growth
amounts due to the different types of fertiliser!
ANOVA assumptions:
1. Samples should be independently selected and randomly assigned to the levels of the treatment
factor
- Can draw a random sample or assign treatments to ensure this.
2. The dependent variable is normally distributed in the population for each treatment level
- Can now test this (Chi-square goodness-of-fit test)
3. The variance associated with each variable level in the population is the same (equal)
(homogeneity of variance).
- Can now test this (F-test)
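As a rough illustration of the third assumption, the sample variances of the three fertiliser groups can be compared as a ratio. This is only a sketch: it uses the growth data from the plant example in these notes, the helper name `variance_ratio` is made up for illustration, and the resulting ratio would still need to be compared with an F critical value from tables for the relevant degrees of freedom.

```python
from statistics import variance

# Growth after 14 days (cm), taken from the plant example in these notes.
growth = {
    "Phosphate": [5, 6, 5, 5.5, 4],
    "Non-phosphate": [4.5, 5, 6, 3, 3.5],
    "Manure": [6, 5.5, 4, 5.5, 7],
}

# Sample variance (divisor n - 1) for each treatment level.
sample_var = {name: variance(values) for name, values in growth.items()}


def variance_ratio(var_a, var_b):
    """Hypothetical helper: F-type ratio of two variances, larger over smaller."""
    return max(var_a, var_b) / min(var_a, var_b)


# Most extreme comparison: largest versus smallest sample variance.
largest_ratio = variance_ratio(max(sample_var.values()), min(sample_var.values()))

for name, var in sample_var.items():
    print(f"{name}: variance = {var:.3f}")
print(f"Largest variance ratio = {largest_ratio:.2f}")
```

A ratio close to 1 is consistent with equal variances; the further it is above 1, the more doubtful the assumption becomes.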


Plant example and ANOVA:












Where: C = number of treatment levels (3 in our case)
N = total number of observations (15 in our case: 5 per fertiliser type)
SSC = variation among (or between) groups (variation between phosphate, non-phosphate and manure)
SSE = variation within groups (due to experimental error)
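For reference, the quantities defined above can be written out as formulas. The notation is an assumption made for this sketch rather than something taken from the notes: $x_{ij}$ is the $i$-th observation at treatment level $j$, $n_j$ is the number of observations at level $j$, $\bar{x}_j$ is the mean of level $j$ and $\bar{x}$ is the grand mean of all $N$ observations.

```latex
\begin{align*}
SSC &= \sum_{j=1}^{C} n_j\,(\bar{x}_j - \bar{x})^2, & MSC &= \frac{SSC}{C-1} \\
SSE &= \sum_{j=1}^{C}\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2, & MSE &= \frac{SSE}{N-C} \\
SST &= \sum_{j=1}^{C}\sum_{i=1}^{n_j} (x_{ij} - \bar{x})^2 = SSC + SSE, & F &= \frac{MSC}{MSE}
\end{align*}
```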
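Intuitively: the further apart the treatment means are relative to the spread within each group, the larger SSC is relative to SSE, and the larger the F statistic.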

Hypothesis Test:
Step 1: State Hypothesis
$H_0: \mu_1 = \mu_2 = \mu_3$ (no treatment effect is present: different fertilisers have no effect)
$H_1$: at least one mean is different (treatment effect present: fertilisers influence growth)
Step 2: Decision Rule (always upper-tail F-test)
Reject $H_0$ if $F_{calc} > F_{\alpha,(C-1,N-C)} = F_{0.05,(2,12)} = 3.89$
Step 3: Calculate Test Statistic
$F_{calc} = MSC/MSE = 1.73$
Step 4: Decision: Do not reject $H_0$ as $F_{calc} < F_{crit}$ (1.73 < 3.89)

Step 5: Conclusion: There is insufficient evidence at the 5% level of significance to conclude that at
least one of the fertiliser types has a different mean growth.
[Figure: Mean plant growth by fertiliser type. Small variation among treatment means: small SSC.]
Results (cm):
Phosphate    Non-phosphate    Manure
5            4.5              6
6            5                5.5
5            6                4
5.5          3                5.5
4            3.5              7
Anova: Completely Randomised Design

SUMMARY
Groups           Count   Sum    Average   Variance
Phosphate        5       25.5   5.1       0.55
Non-phosphate    5       22     4.4       1.425
Manure           5       28     5.6       1.175

ANOVA
Source of Variation   df         SS            MS                F
Among Groups          2 (C-1)    3.63 (SSC)    1.816667 (MSC)    1.730159 (MSC/MSE)
Within Groups         12 (N-C)   12.6 (SSE)    1.05 (MSE)
Total                 14 (N-1)   16.23 (SST)
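The ANOVA figures for the plant example can be reproduced by hand. The following is a minimal sketch in plain Python, using the growth data from these notes; the function name `one_way_anova` is an illustrative choice, and the critical value 3.89 is the table value quoted in the decision rule rather than something the code calculates.

```python
from statistics import mean

# Growth after 14 days (cm), taken from the plant example in these notes.
growth = {
    "Phosphate": [5, 6, 5, 5.5, 4],
    "Non-phosphate": [4.5, 5, 6, 3, 3.5],
    "Manure": [6, 5.5, 4, 5.5, 7],
}


def one_way_anova(groups):
    """Return (SSC, SSE, MSC, MSE, F) for a completely randomised design."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    c = len(groups)            # number of treatment levels
    n_total = len(all_values)  # total number of observations
    # Variation among groups: group means around the grand mean.
    ssc = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation within groups: observations around their own group mean.
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
    msc = ssc / (c - 1)
    mse = sse / (n_total - c)
    return ssc, sse, msc, mse, msc / mse


ssc, sse, msc, mse, f_calc = one_way_anova(list(growth.values()))

F_CRIT = 3.89  # F(0.05; 2, 12) from tables, as quoted in the notes
reject_h0 = f_calc > F_CRIT

print(f"SSC = {ssc:.4f}, SSE = {sse:.4f}")
print(f"MSC = {msc:.6f}, MSE = {mse:.4f}, F = {f_calc:.6f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Run on this data, SSC comes out at about 3.63 and F at about 1.73, which is below 3.89, so the sketch reaches the same "do not reject" decision as the hypothesis test.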
[Figure: Mean flame height with different fuel types. Large variation among treatment means: large SSC.]
3. Tukey's HSD test and Tukey-Kramer Procedure:
If we REJECT $H_0$ we know that at least one of the means is different! The one-way ANOVA F-test only
tells us whether ALL the means are equal or not. It does not tell us which ones are different!
Note: if we do not reject $H_0$ then there is NO NEED to perform either of these tests.
There are two tests that allow us to find out which means differ: Tukey's HSD and the Tukey-Kramer
Procedure.
We must ask ourselves a question to choose the appropriate test: are the sample sizes in each treatment level equal?

In our example: At the 5% level of significance we did not reject $H_0$, so we found no evidence that the fertilisers differ. Hypothetically though,
if we had rejected $H_0$ (i.e. at least one mean is different), what test would we use?
Question: Were the sample sizes in each treatment level equal?
Answer: YES! We had a sample size of 5 for each treatment type (5 phosphate, 5 non-phosphate and
5 manure). Therefore we would use Tukey's HSD test to find out which means differed (had we
rejected the null hypothesis).

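Tukey's HSD test:
1. Compute all possible pairs of absolute differences between the sample means
2. For each pair, compute the critical range (will be the same for each):

$HSD = q_{\alpha,C,N-C}\sqrt{\frac{MSE}{n}}$

where $q_{\alpha,C,N-C}$ is the critical value of the studentised range distribution and $n$ is the common sample size.
3. A given pair is significantly different at level $\alpha$ if the absolute difference from part 1 exceeds the critical
range in part 2.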
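The three steps of Tukey's HSD can be sketched on the plant example, purely as an illustration (the F-test did not reject $H_0$, so the procedure would not actually be needed here). The group means, MSE and sample size come from the notes; the studentised range value `Q_CRIT = 3.77` for $\alpha = 0.05$, $C = 3$ and $N - C = 12$ is an assumed table value and should be checked against the course tables.

```python
from itertools import combinations
from math import sqrt

# Sample means (cm) and MSE from the one-way ANOVA in these notes.
means = {"Phosphate": 5.1, "Non-phosphate": 4.4, "Manure": 5.6}
MSE = 1.05
N_PER_GROUP = 5
Q_CRIT = 3.77  # ASSUMED table value q(0.05; C=3, N-C=12); verify before use

# Step 2: critical range (identical for every pair when sample sizes are equal).
hsd = Q_CRIT * sqrt(MSE / N_PER_GROUP)

# Steps 1 and 3: absolute difference for each pair, compared with the critical range.
results = {}
for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    results[(a, b)] = (diff, diff > hsd)

print(f"Critical range (HSD) = {hsd:.3f}")
for (a, b), (diff, significant) in results.items():
    verdict = "different" if significant else "not significantly different"
    print(f"{a} vs {b}: |difference| = {diff:.2f} -> {verdict}")
```

With these inputs no pair exceeds the critical range, which agrees with the F-test's failure to reject $H_0$.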
Tukey-Kramer Procedure:
1. Compute all possible pairs of absolute differences (Same as Tukey's HSD)
2. For each pair, compute the critical range (this can differ between pairs, because it depends on the two sample sizes $n_r$ and $n_s$):

$\text{critical range} = q_{\alpha,C,N-C}\sqrt{\frac{MSE}{2}\left(\frac{1}{n_r}+\frac{1}{n_s}\right)}$
(only different step!)
3. A given pair is significantly different at level $\alpha$ if the absolute difference from part 1 exceeds the critical
range in part 2. (Same as Tukey's HSD)
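A short sketch shows how the Tukey-Kramer critical range relates to Tukey's HSD. The function name `tukey_kramer_range` is illustrative, `Q_CRIT = 3.77` is an assumed table value for $\alpha = 0.05$, $C = 3$, $N - C = 12$, and the unequal sample sizes in the second call are hypothetical: in a real unbalanced design MSE and the degrees of freedom for $q$ would change too.

```python
from math import isclose, sqrt


def tukey_kramer_range(q_crit, mse, n_r, n_s):
    """Critical range for comparing two groups with sample sizes n_r and n_s."""
    return q_crit * sqrt((mse / 2) * (1 / n_r + 1 / n_s))


MSE = 1.05     # from the one-way ANOVA in these notes
Q_CRIT = 3.77  # ASSUMED table value q(0.05; C=3, N-C=12); verify before use

# With equal sample sizes the formula collapses to Tukey's HSD.
equal_sizes = tukey_kramer_range(Q_CRIT, MSE, 5, 5)
hsd = Q_CRIT * sqrt(MSE / 5)

# HYPOTHETICAL unequal sizes (5 and 4), holding q and MSE fixed for illustration only.
unequal_sizes = tukey_kramer_range(Q_CRIT, MSE, 5, 4)

print(f"Equal sizes (5, 5):   {equal_sizes:.3f}  (HSD = {hsd:.3f})")
print(f"Unequal sizes (5, 4): {unequal_sizes:.3f}")
```

The smaller one of the two samples is, the wider the critical range for that pair, which is why the range has to be recomputed pair by pair when sample sizes differ.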




Decision rule (diagram): Are the sample sizes in each level equal?
- Yes: Tukey's HSD
- No: Tukey-Kramer Procedure
4. The Randomised Block-Design
The randomised block-design is another experimental design. It is similar to the completely
randomised design, which was the experimental design covered in Part 2. The key difference is that:
1. Part 2: Completely randomised design: One treatment or independent variable
2. Part 4: Randomised Block-Design: One treatment variable of interest, but ALSO a second
independent variable (called the blocking variable).
Thinking back to our plant example:
1. Dependent variable: Growth rate of bean seeds (what we are interested in finding out)
2. Independent variable: Type of fertiliser (phosphate fertiliser, non-phosphate fertiliser, manure)
We also know that other variables affect the growth rate of the bean seed, but we only want to see the
effect of our type of fertiliser! To isolate the effect of fertiliser we can use blocking variables to block
the effect of other independent variables. In this case, we may want to block the effect of different
amounts of sunlight reaching each pot (we can't put all the pots on exactly the same spot of ground, so
some may receive more sun than others; blocking lets us separate this effect from the growth due to
fertiliser).
The ANOVA table for the two-factor design has an additional row for the blocking effect:
Plant Growth (cm)     Phosphate   Non-Phosphate   Manure
lumens (1000-1200)    9           7.65            7.497
lumens (800-1000)     8           6.8             6.664
lumens (600-800)      7.6         6.46            6.3308
lumens (400-600)      8           6.8             6.664
lumens (200-400)      4           3.4             3.332
(Columns: treatment type. Rows: blocking type.)
ANOVA
Source of Variation    SS                 df              MS                   F
Columns (Treatment)    4.5257481 (SSC)    2 (C-1)         2.262874056 (MSC)    72.17457 (MSC/MSE)
Rows (Blocking)        35.627722 (SSR)    4 (n-1)         8.906930389 (MSR)    284.0873 (MSR/MSE)
Error                  0.2508223 (SSE)    8 (C-1)(n-1)    0.031352789 (MSE)
Total                  40.404292 (SST)    14 (N-1)
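Where:
C = number of treatment levels (3 fertiliser types)
n = number of blocks (5 lumen levels)
N = total number of observations (15)
SST = total sum of squares
SSC = sum of squares columns (treatment)
SSR = sum of squares rows (blocking)
SSE = sum of squares error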
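Written out as formulas, the sums of squares partition the total variation as follows. The notation is an assumption made for this sketch: $x_{ij}$ is the observation in block (row) $i$ and treatment (column) $j$, $\bar{x}_{\cdot j}$ is the mean of column $j$, $\bar{x}_{i\cdot}$ is the mean of row $i$, $\bar{x}$ is the grand mean, $C$ is the number of treatment levels and $n$ is the number of blocks.

```latex
\begin{align*}
SSC &= n \sum_{j=1}^{C} (\bar{x}_{\cdot j} - \bar{x})^2, & MSC &= \frac{SSC}{C-1} \\
SSR &= C \sum_{i=1}^{n} (\bar{x}_{i\cdot} - \bar{x})^2, & MSR &= \frac{SSR}{n-1} \\
SSE &= \sum_{i=1}^{n}\sum_{j=1}^{C} (x_{ij} - \bar{x}_{\cdot j} - \bar{x}_{i\cdot} + \bar{x})^2, & MSE &= \frac{SSE}{(C-1)(n-1)} \\
SST &= \sum_{i=1}^{n}\sum_{j=1}^{C} (x_{ij} - \bar{x})^2 = SSC + SSR + SSE
\end{align*}
```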
Hypothesis Test: We do two hypothesis tests simultaneously! One for treatment and one for blocking.
Test for the treatment effect (fertiliser type):
Step 1: State Hypothesis
$H_0: \mu_1 = \mu_2 = \mu_3$ (no treatment effect is present)
$H_1$: at least one treatment mean is different (treatment effect)
Step 2: Decision Rule (always upper-tail F-test)
Reject $H_0$ if $F_{calc} > F_{\alpha,(C-1,(C-1)(n-1))} = F_{0.05,(2,8)} = 4.46$
Step 3: Calculate Test Statistic
$F_{calc} = MSC/MSE = 72.17$
Step 4: Decision: Reject $H_0$ as $F_{calc} > F_{crit}$ (72.17 > 4.46)

Step 5: Conclusion: There is sufficient evidence at the 5% level
of significance to conclude that at least one of the fertiliser types
has a different mean growth to the others.
Test for the blocking effect (lumen level):
Step 1: State Hypothesis
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$, where these are the block (row) means (no blocking effects)
$H_1$: at least one block mean is different (blocking effects)
Step 2: Decision Rule (always upper-tail F-test)
Reject $H_0$ if $F_{calc} > F_{\alpha,(n-1,(C-1)(n-1))} = F_{0.05,(4,8)} = 3.84$
Step 3: Calculate Test Statistic
$F_{calc} = MSR/MSE = 284.09$
Step 4: Decision: Reject $H_0$ as $F_{calc} > F_{crit}$ (284.09 > 3.84)

Step 5: Conclusion: There is sufficient evidence at the 5%
level of significance to conclude that at least one of the
lumen levels has a different mean growth
to the others. (There are blocking effects)
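Both F statistics can be checked with a short calculation. This is a minimal sketch in plain Python, using the plant growth table from these notes (rows are lumen blocks, columns are fertiliser types); the function name `randomised_block_anova` is illustrative, and the critical values 4.46 and 3.84 are the table values quoted in the decision rules rather than something the code calculates.

```python
from statistics import mean

# Plant growth (cm): rows = lumen blocks, columns = Phosphate, Non-phosphate, Manure.
table = [
    [9, 7.65, 7.497],     # lumens 1000-1200
    [8, 6.8, 6.664],      # lumens 800-1000
    [7.6, 6.46, 6.3308],  # lumens 600-800
    [8, 6.8, 6.664],      # lumens 400-600
    [4, 3.4, 3.332],      # lumens 200-400
]


def randomised_block_anova(rows):
    """Return (SSC, SSR, SSE, F_treatment, F_block) for a randomised block design."""
    n = len(rows)     # number of blocks
    c = len(rows[0])  # number of treatment levels
    grand = mean([x for row in rows for x in row])
    col_means = [mean([row[j] for row in rows]) for j in range(c)]
    row_means = [mean(row) for row in rows]
    ssc = n * sum((m - grand) ** 2 for m in col_means)
    ssr = c * sum((m - grand) ** 2 for m in row_means)
    # Error: what is left after removing the column and row effects from each cell.
    sse = sum(
        (rows[i][j] - col_means[j] - row_means[i] + grand) ** 2
        for i in range(n)
        for j in range(c)
    )
    mse = sse / ((c - 1) * (n - 1))
    return ssc, ssr, sse, (ssc / (c - 1)) / mse, (ssr / (n - 1)) / mse


ssc, ssr, sse, f_treatment, f_block = randomised_block_anova(table)

F_CRIT_TREATMENT = 4.46  # F(0.05; 2, 8) from tables, as quoted in the notes
F_CRIT_BLOCK = 3.84      # F(0.05; 4, 8) from tables, as quoted in the notes

print(f"SSC = {ssc:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"F (treatment) = {f_treatment:.2f}, reject H0: {f_treatment > F_CRIT_TREATMENT}")
print(f"F (blocking)  = {f_block:.2f}, reject H0: {f_block > F_CRIT_BLOCK}")
```

Both statistics land far above their critical values, matching the two "reject" decisions above.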
