Analysis of Variance

Analysis of Variance (ANOVA)


Analysis of variance is a technique that allows us to compare two or
more populations.
Analysis of variance is a procedure which determines whether
differences exist between population means.

One-Way ANOVA: An example


Does the readability of magazine advertisements differ from one class of
magazine to another?

To see whether and how the readability of magazine advertisements
differs from one class of magazine to another, magazines are first
grouped into three classes according to the educational level of their
readers.

Ten magazines are considered in each group and then three of them
are selected at random from each of the three groups. The magazines
thus selected are as follows.
Group 1, Highest Educational Level
1. Scientific American 2. Fortune 3. The New Yorker.
Group 2, Medium Educational Level
4. Sports Illustrated 5. Newsweek 6. People.
Group 3, Lowest Educational Level
7. National Enquirer 8. Grit 9. True Confessions.

Then six advertisements are randomly selected from each of the above
nine selected magazines.
Readability of an advertisement is measured in terms of three variables
viz.
WORDS: number of words
SEN: number of sentences
SYL3: number of 3+ syllable words.

Questions arise as to whether significant differences exist in the
above three characteristics of advertising copy among the magazines
or the groups of magazines.
Also relevant to readability are the number of words per sentence
and the proportion of 3+ syllable words.

Here we will measure the readability of an advertisement by the
proportion of 3+ syllable words.
Thus the proportion of 3+ syllable words is the response variable here.
Here educational level is used as a grouping variable.
This grouping variable is sometimes called a factor.
This is the only factor under consideration, hence the term one-way
ANOVA.
Here there are three groups, i.e., factor levels:
Group 1: Highest Educational Level
Group 2: Medium Educational Level
Group 3: Lowest Educational Level

Strip chart of Proportion of 3+ Syllable Words

Boxplot of Proportion of 3+ Syllable Words

Plot of Means

One-Way ANOVA: An example


The null hypothesis in this case is:
$H_0: \mu_1 = \mu_2 = \mu_3$
i.e., there are no differences between population means.
Our alternative hypothesis becomes:
$H_1$: at least two means differ.
Now we need some method to check the above hypothesis.

One-Way ANOVA: Theory


Independent samples are drawn from k populations.
These populations are referred to as treatments.

Assumptions
1. The populations are normally distributed.
2. The population standard deviations are unknown but assumed equal.
3. Samples are selected independently from each population.

Hypothesis
The null hypothesis in this case is:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
i.e., there are no differences between population means.
Our alternative hypothesis becomes:
$H_1$: at least two means differ.

Test Statistic
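A statistic that measures the proximity of the sample means to each
other is of interest here.
Such a statistic exists, and is called the between-treatments variation.
It is denoted SST (Sum of Squares for Treatments) and is given by

$$SST = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2,$$

where $n_j$ and $\bar{x}_j$ are the size and mean of the sample from
treatment $j$, and $\bar{\bar{x}}$ is the grand mean.

A large SST indicates large variation between sample means, which
supports $H_1$.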

Test Statistic
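SST gave us the between-treatments variation.
A second statistic, SSE (Sum of Squares for Error), measures the
within-treatments variation.
SSE is given by

$$SSE = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2 = \sum_{j=1}^{k} (n_j - 1) s_j^2,$$

where $x_{ij}$ is the $i$-th observation from treatment $j$, and
$\bar{x}_j$, $s_j^2$ and $n_j$ are the mean, variance and size of the
sample from treatment $j$.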

Test Statistic
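The mean square for treatments (MST) is given by

$$MST = \frac{SST}{k - 1}.$$

The mean square for error (MSE) is given by

$$MSE = \frac{SSE}{n - k},$$

where $n$ is the total number of observations.
The test statistic is given by

$$F = \frac{MST}{MSE},$$

which, under $H_0$, has an $F$-distribution with $k - 1$ and $n - k$ degrees of freedom.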
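
The quantities above can be computed directly from the samples. The following is a minimal sketch in Python, assuming the usual definitions: SST is the sum over treatments of the sample size times the squared deviation of the treatment mean from the grand mean, and SSE is the sum of squared deviations of each observation from its own treatment mean. The function name `one_way_anova` and the toy data are hypothetical and are not the magazine data.

```python
from statistics import mean

def one_way_anova(groups):
    """Return SST, SSE, MST, MSE, F and the degrees of freedom.

    `groups` is a list of samples, one list of numbers per treatment.
    """
    k = len(groups)                               # number of treatments
    n = sum(len(g) for g in groups)               # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [mean(g) for g in groups]

    # Between-treatments variation
    sst = sum(len(g) * (m - grand_mean) ** 2
              for g, m in zip(groups, group_means))
    # Within-treatments variation
    sse = sum((x - m) ** 2
              for g, m in zip(groups, group_means) for x in g)

    mst = sst / (k - 1)
    mse = sse / (n - k)
    return {"SST": sst, "SSE": sse, "MST": mst, "MSE": mse,
            "F": mst / mse, "df": (k - 1, n - k)}

# Hypothetical toy data: three treatments with three observations each
toy = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]]
result = one_way_anova(toy)
print(result)
```

For the toy data the treatment means are 2, 3 and 7 and the grand mean is 4, so SST = 42, SSE = 6 and F = 21 with 2 and 6 degrees of freedom.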

Test Statistic
The $p$-value is given by
$p$-value $= P(F_{k-1,\,n-k} > F_{obs})$,
where $F_{obs}$ is the observed test-statistic value.
Reject $H_0$ in favor of $H_1$ if the $p$-value is small.

ANOVA Table
Source of     Degrees of    Sum of      Mean Sum of          F-value        p-value
Variation     Freedom       Squares     Squares

Treatments    k - 1         SST         MST = SST/(k - 1)    F = MST/MSE    P(F_{k-1,n-k} > F)
Residuals     n - k         SSE         MSE = SSE/(n - k)

Reject $H_0$ in favour of $H_1$ if the p-value is small.

ANOVA Table for Example


Source of     Degrees of    Sum of      Mean Sum of    F-value    p-value
Variation     Freedom       Squares     Squares

Groups        2             0.038       0.019          6.461      0.003
Error         51            0.150       0.003

Looking at the small p-value, there is enough evidence to reject the null
hypothesis. Thus there is indeed a significant difference in the readability of magazine
advertisements from one class of magazines to another.
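
The entries of this table can be cross-checked by hand. The sketch below is an illustrative check, not the computation that produced the table. It takes the treatment degrees of freedom as k - 1 = 2 for the three groups and uses the rounded sums of squares as printed, so agreement is only up to rounding. The helper `f_sf_two_numerator_df` is a hypothetical name. It relies on the closed form $P(F > f) = (1 + 2f/\nu)^{-\nu/2}$, which is valid only when the numerator degrees of freedom equal 2.

```python
# Values as printed in the ANOVA table (rounded to three decimals)
ss_groups, df_groups = 0.038, 2    # df_groups = k - 1 with k = 3 groups (assumed)
ss_error, df_error = 0.150, 51

ms_groups = ss_groups / df_groups          # mean square for groups
ms_error = ss_error / df_error             # mean square for error
f_from_rounded = ms_groups / ms_error      # F recomputed from rounded inputs

def f_sf_two_numerator_df(f, df2):
    """P(F > f) for an F distribution with 2 and df2 degrees of freedom.

    This closed form holds only when the numerator df is exactly 2.
    """
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)

# p-value for the F-value reported in the table
p_value = f_sf_two_numerator_df(6.461, df_error)

print(round(ms_groups, 3), round(ms_error, 3),
      round(f_from_rounded, 2), round(p_value, 3))
```

The recomputed mean squares, F-value and p-value agree with the table to the precision shown there.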

Checking the Assumptions


Here we use the Shapiro-Wilk test to check whether the residuals are
normally distributed.
The observed p-value is 0.4.
Thus there is not enough evidence against the null hypothesis that
the residuals are normally distributed.

Here we use Bartlett's test for checking the homogeneity of variances.
The observed p-value is 0.1.
Thus there is not enough evidence to reject the null hypothesis that
the population variances are all equal.
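
The slides do not say how the test was run. As an illustration only, here is a minimal sketch of the standard Bartlett statistic, which compares the logarithm of the pooled variance with the logarithms of the individual sample variances and is approximately chi-square with k - 1 degrees of freedom under the null hypothesis. The function names and toy data are hypothetical. The p-value helper uses the closed form exp(-x/2), which is valid only for 2 degrees of freedom, i.e. for three groups.

```python
import math
from statistics import variance

def bartlett_statistic(groups):
    """Bartlett's test statistic for equality of variances.

    Every group needs at least two observations and a non-zero variance.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    dfs = [len(g) - 1 for g in groups]
    variances = [variance(g) for g in groups]          # sample variances
    pooled = sum(d * v for d, v in zip(dfs, variances)) / (n_total - k)

    numerator = ((n_total - k) * math.log(pooled)
                 - sum(d * math.log(v) for d, v in zip(dfs, variances)))
    correction = 1.0 + (sum(1.0 / d for d in dfs)
                        - 1.0 / (n_total - k)) / (3.0 * (k - 1))
    return numerator / correction

def chi2_sf_two_df(x):
    """P(X > x) for a chi-square variable with 2 df (three groups only)."""
    return math.exp(-x / 2.0)

# Hypothetical toy data: equal spreads, then one group with a larger spread
equal_spread = [[1.0, 2.0, 3.0], [11.0, 12.0, 13.0], [21.0, 22.0, 23.0]]
unequal_spread = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0], [5.0, 6.0, 7.0]]

stat_equal = bartlett_statistic(equal_spread)
stat_unequal = bartlett_statistic(unequal_spread)
print(stat_equal, chi2_sf_two_df(stat_equal))
print(stat_unequal, chi2_sf_two_df(stat_unequal))
```

When all sample variances coincide the statistic is zero and the p-value is 1. The statistic grows as the sample variances diverge.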

Multiple Comparison
When the one-way ANOVA finds significant differences between the
population means, it is natural to ask which means differ.
Here we will use the following two techniques for performing this
follow-up analysis:
1. Bonferroni's method
2. Tukey's method

Bonferroni's method
Pairwise comparisons using t tests with pooled SD

           Group 1    Group 2
Group 2    0.0281
Group 3    1.0000     0.0039
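
Bonferroni's adjustment multiplies each unadjusted pairwise p-value by the number of comparisons m and caps the result at 1. With three groups there are m = 3 pairwise comparisons. A minimal sketch follows. The function name `bonferroni_adjust` and the raw p-values are hypothetical, since the unadjusted p-values are not given in the slides.

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]

# Hypothetical unadjusted p-values for three pairwise comparisons
raw = [0.01, 0.40, 0.002]
adjusted = bonferroni_adjust(raw)
print(adjusted)
```

The cap explains why an adjusted p-value can be reported as exactly 1.0000.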

Tukey's method
       Difference in Means    Lower Limit    Upper Limit    Adjusted p-value
2-1    -0.049                 -0.092         -0.005         0.024
3-1     0.013                 -0.031          0.056         0.761
3-2     0.061                  0.017          0.105         0.004

Tukey's HSD
