
Experiments with a Single Factor

The Analysis of Variance

Marco E. Sanjun, Ph.D.

What If There Are More Than Two Factor Levels?

- The t-test does not directly apply
- There are lots of practical situations where there are either more than two levels of interest, or there are several factors of simultaneous interest
- The Analysis of Variance (ANOVA) is the appropriate analysis engine for these types of experiments (Chapter 3 of the textbook)
- The ANOVA was developed by Fisher in the early 1920s and was initially applied to agricultural experiments
- It is used extensively today for industrial experiments
Single Factor ANOVA Example

- To find the effect of the percentage of cotton fiber on cloth strength:
  - The engineer knows that the cotton content should be between 10 and 40 percent
  - The engineer decides to test specimens at five levels: 15%, 20%, 25%, 30%, and 35%
  - He/she also decides to take 5 samples at each level of cotton content
- Hence, this is a single-factor experiment with a = 5 levels and n = 5 replicates

    Cotton Weight Percent    Experimental Run Number
    15                        1   2   3   4   5
    20                        6   7   8   9  10
    25                       11  12  13  14  15
    30                       16  17  18  19  20
    35                       21  22  23  24  25

Single Factor ANOVA

An Idea: conduct 5C2 = 10 pairwise comparisons to test

    H0: μ1 = μ2 = μ3 = μ4 = μ5

Problem: distortion of the Type I error. If every test has α = 0.05, and the tests can be assumed independent, then the total probability of accepting H0 when it is true would be (1 − α)^10 = 0.95^10 ≈ 0.60, so the chance of at least one false rejection is about 0.40, not 0.05.

The 25 runs are performed in random order:

    Test Sequence   Run Number   Cotton Weight Percentage
     1               8           20
     2              18           30
     3              10           20
     4              23           35
     5              17           30
     6               5           15
     7              14           25
     8               6           20
     9              15           25
    10              20           30
    11               9           20
    12               4           15
    13              12           25
    14               7           20
    15               1           15
    16              24           35
    17              21           35
    18              11           25
    19               2           15
    20              13           25
    21              22           35
    22              16           30
    23              25           35
    24              19           30
    25               3           15
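The Type I error inflation described above is easy to verify numerically. This short sketch (assuming independent tests, as the slide does) computes the family-wise error rate for the 10 pairwise tests at α = 0.05:

```python
# Probability that all 10 independent pairwise tests accept H0 when it is
# true, and the resulting family-wise (experiment-wise) Type I error rate.
alpha = 0.05   # per-test Type I error
m = 10         # number of pairwise comparisons: 5 choose 2

p_all_accept = (1 - alpha) ** m         # chance of no false rejection
family_wise_error = 1 - p_all_accept    # chance of at least one

print(round(p_all_accept, 2))       # 0.6
print(round(family_wise_error, 2))  # 0.4
```

This is exactly why the ANOVA F-test, which controls the Type I error at α for the joint hypothesis, is preferred over many pairwise t-tests.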

An Example (See pg. 62)

- Does changing the cotton weight percent change the mean tensile strength?
- Is there an optimum level for cotton content?

The Analysis of Variance (Sec. 3-3, pg. 65)

- In general, there will be a levels of the factor, or a treatments, and n replicates of the experiment, run in random order: a completely randomized design (CRD)
- N = an total runs
- We consider the fixed effects case; the random effects case will be discussed later
- The objective is to test hypotheses about the equality of the a treatment means
The Analysis of Variance

- The name "analysis of variance" stems from a partitioning of the total variability in the response variable into components that are consistent with a model for the experiment
- The basic single-factor ANOVA model is

      y_ij = μ + τ_i + ε_ij,   i = 1, 2, ..., a;  j = 1, 2, ..., n

  where μ is an overall mean, τ_i is the ith treatment effect, and ε_ij is the experimental error, with ε_ij ~ NID(0, σ²)

Models for the Data

There are several ways to write a model for the data:

- y_ij = μ + τ_i + ε_ij is called the effects model
- Let μ_i = μ + τ_i; then y_ij = μ_i + ε_ij is called the means model
- Regression models can also be employed
The Analysis of Variance

- Total variability is measured by the total sum of squares:

      SS_T = Σ_{i=1..a} Σ_{j=1..n} (y_ij − ȳ..)²

- The basic ANOVA partitioning is:

      Σ_{i} Σ_{j} (y_ij − ȳ..)² = Σ_{i} Σ_{j} [(ȳ_i. − ȳ..) + (y_ij − ȳ_i.)]²
                                = n Σ_{i} (ȳ_i. − ȳ..)² + Σ_{i} Σ_{j} (y_ij − ȳ_i.)²

      SS_T = SS_Treatments + SS_E
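As a concrete check of this partition, here is a short sketch using the tensile-strength observations from the textbook's cotton example (the raw values are taken from the text and are an assumption here; the slides themselves show only the treatment means 9.8, 15.4, 17.6, 21.6, 10.8, which these data reproduce):

```python
# Cloth strength for 5 cotton-weight levels, n = 5 replicates each
# (raw observations assumed from the textbook example)
data = {15: [7, 7, 15, 11, 9],
        20: [12, 17, 12, 18, 18],
        25: [14, 18, 18, 19, 19],
        30: [19, 25, 22, 19, 23],
        35: [7, 10, 11, 15, 11]}

all_y = [y for ys in data.values() for y in ys]
grand_mean = sum(all_y) / len(all_y)                 # y-bar..

ss_total = sum((y - grand_mean) ** 2 for y in all_y)
ss_treat = sum(len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2
               for ys in data.values())
ss_error = sum((y - sum(ys) / len(ys)) ** 2
               for ys in data.values() for y in ys)

print(round(ss_total, 2), round(ss_treat, 2), round(ss_error, 2))
# SS_T = SS_Treatments + SS_E: 636.96 = 475.76 + 161.2
```

These sums of squares match the ANOVA computer output shown later in these slides.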

The Analysis of Variance

SS_T = SS_Treatments + SS_E

- A large value of SS_Treatments reflects large differences in treatment means
- A small value of SS_Treatments likely indicates no differences in treatment means
- The formal statistical hypotheses are:

      H0: μ1 = μ2 = ... = μa
      H1: at least one mean is different
The Analysis of Variance

- While sums of squares cannot be directly compared to test the hypothesis of equal means, mean squares can be compared
- A mean square is a sum of squares divided by its degrees of freedom:

      df_Total = df_Treatments + df_Error
      an − 1 = (a − 1) + a(n − 1)

      MS_Treatments = SS_Treatments / (a − 1),   MS_E = SS_E / [a(n − 1)]

- If the treatment means are equal, the treatment and error mean squares will be (theoretically) equal
- If the treatment means differ, the treatment mean square will be larger than the error mean square
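Using the sums of squares reported in the ANOVA computer output later in these slides (SS_Treatments = 475.76, SS_E = 161.20, with a = 5 and n = 5), the mean squares and F statistic work out as:

```python
a, n = 5, 5                           # treatments and replicates
ss_treat, ss_error = 475.76, 161.20   # from the ANOVA output in these slides

ms_treat = ss_treat / (a - 1)         # 118.94, on a-1 = 4 df
ms_error = ss_error / (a * (n - 1))   # 8.06,  on a(n-1) = 20 df
f0 = ms_treat / ms_error              # 14.76

# F_{0.05, 4, 20} is about 2.87 (from F tables), so F0 = 14.76 >> 2.87
# and the hypothesis of equal treatment means is rejected.
print(round(ms_treat, 2), round(ms_error, 2), round(f0, 2))  # 118.94 8.06 14.76
```

The large gap between MS_Treatments and MS_E is exactly the pattern the last bullet above describes.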

The Analysis of Variance is Summarized in a Table

- The reference distribution for F0 is the F distribution with a − 1 and a(n − 1) degrees of freedom
- Reject the null hypothesis (equal treatment means) if

      F0 > F_{α, a−1, a(n−1)}
Single Factor Analysis

Note that the treatment effect of five preselected levels is being studied here. Hence, the results cannot be extended beyond these five levels. This is called the Fixed Effects Model.

If the treatment levels are instead chosen as a random sample from a larger population, then τ_i is a random variable. In that case, knowledge about the particular τ_i's that were investigated is of little use. Instead, we test hypotheses about the variability of the τ_i and try to estimate this variability. This is called the Random Effects Model.

Single Factor ANOVA

Analysis of the Fixed Effects Model

τ_i is the deviation from the overall mean for the ith treatment. Hence,

    Σ_{i=1..a} τ_i = 0

Also, we have that

    y_i. = Σ_{j=1..n} y_ij,        ȳ_i. = y_i. / n
    y.. = Σ_{i} Σ_{j} y_ij,        ȳ.. = y.. / (an) = y.. / N

The hypotheses to be tested are

    H0: μ1 = μ2 = ... = μa
    H1: μi ≠ μj for at least one pair (i, j)

where μi = μ + τi. An equivalent set of hypotheses is

    H0: τ1 = τ2 = ... = τa = 0
    H1: τi ≠ 0 for at least one i

Decomposition of the total sum of squares:

    SS_T = Σ_{i} Σ_{j} (y_ij − ȳ..)²
         = n Σ_{i} (ȳ_i. − ȳ..)² + Σ_{i} Σ_{j} (y_ij − ȳ_i.)²
         = SS_Treatments + SS_E
Single Factor ANOVA

Analysis of the Fixed Effects Model (Contd.)

We have seen that SS_T = SS_Treatments + SS_E. Define the mean squares as

    MS_Treatments = SS_Treatments / (a − 1)     (a − 1 = treatment dof)
    MS_E = SS_E / (N − a)                       (N − a = error dof)

Total dof = N − 1, so error dof = total dof − treatment dof = N − 1 − (a − 1) = N − a.

It can be shown easily that

    E[MS_E] = σ²
    E[MS_Treatments] = σ² + n Σ_{i=1..a} τi² / (a − 1)

That is, under the null hypothesis (all τi = 0), E[MS_Treatments] = E[MS_E]; but under H1, MS_Treatments tends to be larger than MS_E.

Single Factor ANOVA

The test statistic is

    F0 = MS_Treatments / MS_E  ~  F_{a−1, N−a}

The null hypothesis H0 is rejected if F0 > F_{α, a−1, N−a}.

Other formulas for computing the sums of squares are:

    SS_T = Σ_{i} Σ_{j} y_ij² − y..² / N
    SS_Treatments = (1/n) Σ_{i} y_i.² − y..² / N
    SS_E = SS_T − SS_Treatments
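These computing formulas need only the raw totals. A sketch with the cotton data (raw observations assumed from the textbook example) confirms that they reproduce the definitional sums of squares computed earlier:

```python
# Textbook cotton data (assumed from the text; means match these slides)
groups = [[7, 7, 15, 11, 9], [12, 17, 12, 18, 18], [14, 18, 18, 19, 19],
          [19, 25, 22, 19, 23], [7, 10, 11, 15, 11]]
n = 5
N = sum(len(g) for g in groups)                  # 25 total runs

sum_sq = sum(y * y for g in groups for y in g)   # sum of y_ij^2
y_dotdot = sum(sum(g) for g in groups)           # grand total y..

ss_total = sum_sq - y_dotdot ** 2 / N                                # 636.96
ss_treat = sum(sum(g) ** 2 for g in groups) / n - y_dotdot ** 2 / N  # 475.76
ss_error = ss_total - ss_treat                                       # 161.20

print(round(ss_total, 2), round(ss_treat, 2), round(ss_error, 2))
```

Only the treatment totals y_i. and the grand total y.. are needed, which is why these forms were preferred for hand computation.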

    Source of Variation         Sum of Squares   Degrees of Freedom   Mean Square      F0
    Between treatments          SS_Treatments    a − 1                MS_Treatments    MS_Treatments / MS_E
    Error (within treatments)   SS_E             N − a                MS_E
    Total                       SS_T             N − 1
Single Factor ANOVA

Estimation of Model Parameters and Confidence Intervals

The single-factor model is y_ij = μ + τ_i + ε_ij. Estimates of the overall mean and the treatment effects are

    μ̂ = ȳ..
    τ̂_i = ȳ_i. − ȳ..,   i = 1, 2, ..., a

Now let μ_i = μ + τ_i. Then an estimate of μ_i is

    μ̂_i = μ̂ + τ̂_i = ȳ_i.

If we assume that the errors are normally distributed, then

    ȳ_i. ~ NID(μ_i, σ²/n)

Thus, if σ² were known, confidence intervals could be obtained from the normal distribution. Using MS_E as an estimator of σ², we base the C.I. on the t-distribution instead. A 100(1 − α)% C.I. on the ith treatment mean μ_i is

    ȳ_i. ± t_{α/2, N−a} · sqrt(MS_E / n)
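For instance, a 95% C.I. on the 30% cotton treatment mean (ȳ4. = 21.60 in the Design-Expert output later in these slides) can be sketched as follows; t_{0.025, 20} ≈ 2.086 is taken from t tables:

```python
import math

ms_error, n = 8.06, 5        # MS_E and replicates from the ANOVA output
y_bar_4 = 21.60              # mean of the 30% cotton treatment
t_crit = 2.086               # t_{0.025, 20} from t tables

half_width = t_crit * math.sqrt(ms_error / n)
lo, hi = y_bar_4 - half_width, y_bar_4 + half_width
print(round(lo, 2), round(hi, 2))   # 18.95 24.25
```

So with 95% confidence, 18.95 ≤ μ4 ≤ 24.25.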

Single Factor ANOVA

A 100(1 − α)% C.I. on the difference between any two treatment means, say μ_i − μ_j, is

    ȳ_i. − ȳ_j. ± t_{α/2, N−a} · sqrt(2 MS_E / n)

Unbalanced Data

If the number of observations taken within each treatment differs, then the design is unbalanced. Let n_i be the number of observations within treatment i. Then we have:

    N = Σ_{i=1..a} n_i
    SS_T = Σ_{i=1..a} Σ_{j=1..n_i} y_ij² − y..² / N
    SS_Treatments = Σ_{i=1..a} y_i.² / n_i − y..² / N
ANOVA Computer Output (Design-Expert)

Response: Strength
ANOVA for Selected Factorial Model
Analysis of variance table [Partial sum of squares]

    Source       Sum of Squares   DF   Mean Square   F Value   Prob > F
    Model            475.76        4      118.94      14.76    < 0.0001
    A                475.76        4      118.94      14.76    < 0.0001
    Pure Error       161.20       20        8.06
    Cor Total        636.96       24

    Std. Dev.    2.84     R-Squared        0.7469
    Mean        15.04     Adj R-Squared    0.6963
    C.V.        18.88     Pred R-Squared   0.6046
    PRESS      251.88     Adeq Precision   9.294

The Reference Distribution:

[Plot of the F(4, 20) reference distribution with the observed value F0 = 14.76]
Graphical View of the Results

[Design-Expert one-factor plot: Strength (7 to 25) vs. A: Cotton Weight % (15 to 35), showing the design points at each level; replicated points are labeled "2"]

Single Factor ANOVA

Model Adequacy Checking

Partitioning the total variance to test formally for no differences in treatment means requires that the following assumptions are satisfied:
- The observations are adequately described by the model y_ij = μ + τ_i + ε_ij
- The errors are NID(0, σ²)

Violations of these assumptions, and model adequacy in general, can easily be investigated by examining the residuals. The residual for observation j in treatment i is

    e_ij = y_ij − ŷ_ij

where ŷ_ij is the fitted value of y_ij, obtained as

    ŷ_ij = μ̂ + τ̂_i = ȳ.. + (ȳ_i. − ȳ..) = ȳ_i.

Residuals should be structureless.
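Since the fitted value in the one-factor model is just the treatment average, the residuals reduce to deviations from the group means. A sketch with the cotton data (raw observations assumed from the textbook example):

```python
# Textbook cotton data (assumed from the text)
groups = {15: [7, 7, 15, 11, 9], 20: [12, 17, 12, 18, 18],
          25: [14, 18, 18, 19, 19], 30: [19, 25, 22, 19, 23],
          35: [7, 10, 11, 15, 11]}

residuals = []
for level, ys in groups.items():
    fitted = sum(ys) / len(ys)             # y-hat_ij = y-bar_i.
    residuals += [y - fitted for y in ys]  # e_ij = y_ij - y-bar_i.

# Residuals sum to zero within every treatment (and hence overall);
# their range here is -3.8 to 5.2, matching the residual plots below.
print(round(min(residuals), 1), round(max(residuals), 1))  # -3.8 5.2
```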

Single Factor ANOVA

The Normality Assumption

A useful procedure for checking the normality assumption is to construct a normal probability plot of the residuals:
- Calculate the residuals
- Rank them in ascending order
- Calculate the cumulative probability P_k = (k − 0.5)/n for the kth ranked residual
- Plot the residuals vs. P_k × 100

If the plot resembles a straight line, then normality is not violated. Points falling far from the line are often called outliers. The presence of one or more outliers can seriously distort the ANOVA, so potential outliers call for careful investigation.
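The plotting positions P_k = (k − 0.5)/n from the steps above can be generated directly. This sketch ranks the residuals of the cotton data (raw observations assumed from the textbook example) and pairs them with their cumulative probabilities:

```python
# Textbook cotton data (assumed from the text)
groups = [[7, 7, 15, 11, 9], [12, 17, 12, 18, 18], [14, 18, 18, 19, 19],
          [19, 25, 22, 19, 23], [7, 10, 11, 15, 11]]

residuals = sorted(y - sum(g) / len(g) for g in groups for y in g)
n = len(residuals)                              # 25 residuals in all
pk = [(k - 0.5) / n for k in range(1, n + 1)]   # cumulative probabilities

# The pairs (residual, 100 * Pk) are the coordinates of the normal
# probability plot; an approximately straight line supports normality.
print(pk[0], pk[-1])   # 0.02 0.98
```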

Single Factor ANOVA

A rough check for outliers can be done using standardized residuals:

    d_ij = e_ij / sqrt(MS_E)   ~   approximately N(0, 1)

if ε_ij ~ N(0, σ²). Then most d_ij should fall within ±3. Those that fall outside are potential outliers.
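A sketch of this outlier check with the cotton data (raw observations assumed from the textbook example, MS_E = 8.06 from the ANOVA output):

```python
import math

# Textbook cotton data (assumed from the text)
groups = [[7, 7, 15, 11, 9], [12, 17, 12, 18, 18], [14, 18, 18, 19, 19],
          [19, 25, 22, 19, 23], [7, 10, 11, 15, 11]]
ms_error = 8.06                                 # from the ANOVA output

d = [(y - sum(g) / len(g)) / math.sqrt(ms_error)
     for g in groups for y in g]                # standardized residuals

# The largest standardized residual is 5.2 / sqrt(8.06) ~ 1.83, well
# inside +/-3, so no observation is flagged as a potential outlier.
print(round(max(abs(x) for x in d), 2))   # 1.83
```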

[Normal probability plot of the residuals: Normal % probability (10 to 99) vs. Residual (−3.8 to 5.2)]

Single Factor ANOVA

Plot of Residuals in Time Sequence

[Plot of ε_ij against time]

Such a plot is helpful in detecting correlation between residuals:
- Evidence of runs of positive or negative residuals indicates positive correlation, which implies that the independence assumption is violated
- An increase in the spread of the residuals over time indicates an increase in variance. This violates the constant-variance assumption and is a potentially serious problem
Residuals vs. Run

[Plot of residuals (−3.8 to 5.2) against run number (1 to 25)]

Single Factor ANOVA

Statistical Tests for Equality of Variance

In addition to residual plots, a formal test can be used:

    H0: σ1² = σ2² = ... = σa²
    H1: the above is not true for at least one σi²

Bartlett's test statistic satisfies χ0² ~ χ²_{a−1} (approximately) when the random samples are drawn from normal distributions. The statistic is

    χ0² = 2.3026 q / c

where

    q = (N − a) log10 Sp² − Σ_{i=1..a} (n_i − 1) log10 S_i²
    c = 1 + [1 / (3(a − 1))] [ Σ_{i=1..a} (n_i − 1)^(−1) − (N − a)^(−1) ]
    Sp² = Σ_{i=1..a} (n_i − 1) S_i² / (N − a)
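Bartlett's statistic for the cotton example can be sketched directly from these formulas (raw observations assumed from the textbook; note the pooled variance Sp² equals MS_E = 8.06):

```python
import math

# Textbook cotton data (assumed from the text)
groups = [[7, 7, 15, 11, 9], [12, 17, 12, 18, 18], [14, 18, 18, 19, 19],
          [19, 25, 22, 19, 23], [7, 10, 11, 15, 11]]
a = len(groups)
N = sum(len(g) for g in groups)

def sample_var(g):                 # S_i^2 with n_i - 1 in the denominator
    m = sum(g) / len(g)
    return sum((y - m) ** 2 for y in g) / (len(g) - 1)

s2 = [sample_var(g) for g in groups]            # 11.2, 9.8, 4.3, 6.8, 8.2
sp2 = sum((len(g) - 1) * v
          for g, v in zip(groups, s2)) / (N - a)  # pooled variance = MS_E

q = (N - a) * math.log10(sp2) - sum((len(g) - 1) * math.log10(v)
                                    for g, v in zip(groups, s2))
c = 1 + (1 / (3 * (a - 1))) * (sum(1 / (len(g) - 1) for g in groups)
                               - 1 / (N - a))
chi2_0 = 2.3026 * q / c
print(round(chi2_0, 2))   # 0.93, far below chi^2_{0.05, 4} = 9.49
```

Since χ0² = 0.93 is well below the critical value, there is no evidence against equal variances for these data.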

Single Factor ANOVA

Here S_i² is the sample variance of the ith population. q is large when the S_i² differ greatly, and equals zero when all the S_i² are equal. Hence, reject H0 if

    χ0² > χ²_{α, a−1}

Single Factor ANOVA

If the assumption of variance homogeneity is violated:
- The F test is only slightly affected in the balanced fixed effects model
- The problem is more serious for unbalanced designs

The usual approach is to apply a variance-stabilizing transformation; the conclusions of the ANOVA then apply to the transformed population. Some variance-stabilizing transformations are:

    y*_ij = sqrt(y_ij)  or  y*_ij = sqrt(1 + y_ij)   for Poisson data
    y*_ij = log(y_ij)                                 for lognormal data

Single Factor ANOVA

Comparison Among Treatment Means

After H0: μ1 = μ2 = ... = μa is rejected, we need to find exactly which means differ. For this, comparisons among groups of treatment means may be useful.

Multiple Comparison Methods: Contrasts

Some sample multiple-comparison hypotheses are:

    H0: μ4 = μ5   vs.   H1: μ4 ≠ μ5

This hypothesis could be tested by investigating an appropriate linear combination of treatment totals, say y4. − y5. = 0. Similarly,

    H0: μ1 + μ3 = μ4 + μ5   vs.   H1: μ1 + μ3 ≠ μ4 + μ5

implies testing with y1. + y3. − y4. − y5. = 0. In general, a multiple comparison implies a linear combination of treatment totals such as

    C = Σ_{i=1..a} c_i y_i.

with the restriction that Σ_{i=1..a} c_i = 0. C is called a contrast.

Single Factor ANOVA

The sum of squares of a contrast C is calculated as

    SS_C = ( Σ_{i=1..a} c_i y_i. )² / ( n Σ_{i=1..a} c_i² )

which has a single degree of freedom. For an unbalanced design we require Σ_{i=1..a} n_i c_i = 0, and

    SS_C = ( Σ_{i=1..a} c_i y_i. )² / ( Σ_{i=1..a} n_i c_i² )

A contrast is tested with

    SS_C / MS_E ~ F_{1, N−a}
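A sketch of one contrast from the cotton example: C = y4. − y5. (coefficients 0, 0, 0, 1, −1), tested with MS_E = 8.06. The treatment totals are 5 times the treatment means reported later in these slides:

```python
n = 5
totals = [49, 77, 88, 108, 54]    # y_i. = 5 * (9.8, 15.4, 17.6, 21.6, 10.8)
coeffs = [0, 0, 0, 1, -1]         # contrast testing H0: mu4 = mu5
assert sum(coeffs) == 0           # contrast restriction

contrast = sum(c * t for c, t in zip(coeffs, totals))      # C = 54
ss_c = contrast ** 2 / (n * sum(c * c for c in coeffs))    # 291.6, 1 df

ms_error = 8.06
f0 = ss_c / ms_error              # compare with F_{alpha, 1, 20}
print(round(ss_c, 1), round(f0, 2))   # 291.6 36.18
```

Since 36.18 far exceeds any reasonable F critical value on (1, 20) df, the 30% and 35% means clearly differ.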

Single Factor ANOVA

Orthogonal Contrasts

Two contrasts with coefficients {c_i} and {d_i} are orthogonal if

    Σ_{i=1..a} c_i d_i = 0        (balanced design)
    Σ_{i=1..a} n_i c_i d_i = 0    (unbalanced design)

For a treatments there are a − 1 orthogonal contrasts, and there are many ways to choose them. For example, for 3 treatments (a = 3):

    Treatment      Orthogonal Contrast Coefficients
    1 (control)        −2      0
    2 (level 1)         1      1
    3 (level 2)         1     −1

Single Factor ANOVA

Scheffé's Method for Comparing All Contrasts

Scheffé (1953) proposed a method for comparing any and all contrasts between treatment means. In this method, the Type I error is at most α for any of the possible comparisons. Suppose we want to test m contrasts in the treatment means:

    Γ_u = c_1u μ1 + c_2u μ2 + ... + c_au μa,   u = 1, 2, ..., m

The corresponding contrast in the treatment averages is

    C_u = c_1u ȳ1. + c_2u ȳ2. + ... + c_au ȳa. = Σ_{i=1..a} c_iu ȳ_i.

and the standard error of this contrast is

    S_Cu = sqrt( MS_E Σ_{i=1..a} c_iu² / n_i )

The critical value against which C_u should be compared is

    S_{α,u} = S_Cu sqrt( (a − 1) F_{α, a−1, N−a} )

Reject the hypothesis Γ_u = 0 if |C_u| > S_{α,u}.
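A sketch of Scheffé's criterion for the single contrast μ4 − μ5 in the cotton example (treatment averages from the Design-Expert output below; F_{0.05, 4, 20} ≈ 2.87 from F tables):

```python
import math

means = [9.8, 15.4, 17.6, 21.6, 10.8]   # treatment averages y-bar_i.
n_i = [5] * 5
ms_error = 8.06
coeffs = [0, 0, 0, 1, -1]               # contrast mu4 - mu5

c_u = sum(c * m for c, m in zip(coeffs, means))               # 10.8
s_cu = math.sqrt(ms_error * sum(c * c / n
                                for c, n in zip(coeffs, n_i)))

a = 5
f_crit = 2.87                           # F_{0.05, a-1, N-a} from tables
s_alpha = s_cu * math.sqrt((a - 1) * f_crit)

print(round(c_u, 1), round(s_alpha, 2))   # 10.8 6.08
```

Since |C_u| = 10.8 exceeds the Scheffé critical value 6.08, the contrast is significant even at this most conservative level.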

Single Factor ANOVA

Comparing Pairs of Treatment Means

If we want to compare all (a choose 2) pairs of treatment means:

    H0: μi = μj   vs.   H1: μi ≠ μj   for all i ≠ j

Method 1: The LSD Method (Least Significant Difference)

    t0 = ( ȳ_i. − ȳ_j. ) / sqrt( MS_E (1/n_i + 1/n_j) )  ~  t_{N−a}

Reject H0 if

    | ȳ_i. − ȳ_j. | > t_{α/2, N−a} sqrt( MS_E (1/n_i + 1/n_j) )

Method 2: Duncan's Multiple Range Test, a useful test for comparing all pairs of means.

Method 3: Tukey's Test
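The LSD rejection rule above can be sketched for the balanced cotton example (means from the output below, MS_E = 8.06, t_{0.025, 20} ≈ 2.086 from t tables):

```python
import math
from itertools import combinations

means = {15: 9.8, 20: 15.4, 25: 17.6, 30: 21.6, 35: 10.8}  # treatment means
n, ms_error, t_crit = 5, 8.06, 2.086    # t_{0.025, 20} from tables

# In the balanced case the rejection threshold is one common value
lsd = t_crit * math.sqrt(2 * ms_error / n)     # ~3.75

significant = [(i, j) for i, j in combinations(means, 2)
               if abs(means[i] - means[j]) > lsd]
print(round(lsd, 2), len(significant))
# LSD = 3.75; 8 of the 10 pairs differ (all except 15 vs 35 and 20 vs 25)
```

This agrees with the Design-Expert output below, where only 1 vs 5 and 2 vs 3 have non-significant p-values.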

Design-Expert Output

Treatment Means (Adjusted, If Necessary)

                 Estimated   Standard
                 Mean        Error
    1-15          9.80       1.27
    2-20         15.40       1.27
    3-25         17.60       1.27
    4-30         21.60       1.27
    5-35         10.80       1.27

                 Mean              Standard   t for H0
    Treatment    Difference   DF   Error      Coeff=0    Prob > |t|
    1 vs 2        -5.60        1   1.80       -3.12      0.0054
    1 vs 3        -7.80        1   1.80       -4.34      0.0003
    1 vs 4       -11.80        1   1.80       -6.57      < 0.0001
    1 vs 5        -1.00        1   1.80       -0.56      0.5838
    2 vs 3        -2.20        1   1.80       -1.23      0.2347
    2 vs 4        -6.20        1   1.80       -3.45      0.0025
    2 vs 5         4.60        1   1.80        2.56      0.0186
    3 vs 4        -4.00        1   1.80       -2.23      0.0375
    3 vs 5         6.80        1   1.80        3.79      0.0012
    4 vs 5        10.80        1   1.80        6.01      < 0.0001

Graphical Comparison of Means (Text, pg. 89)

For the case of quantitative factors, a regression model is often useful:

Response: Strength
ANOVA for Response Surface Cubic Model
Analysis of variance table [Partial sum of squares]

    Source        Sum of Squares   DF   Mean Square   F Value   Prob > F
    Model             441.81        3      147.27      15.85    < 0.0001
    A                  90.84        1       90.84       9.78    0.0051
    A2                343.21        1      343.21      36.93    < 0.0001
    A3                 64.98        1       64.98       6.99    0.0152
    Residual          195.15       21        9.29
    Lack of Fit        33.95        1       33.95       4.21    0.0535
    Pure Error        161.20       20        8.06
    Cor Total         636.96       24

    Factor        Coefficient Estimate   DF   Standard Error   95% CI Low   95% CI High   VIF
    Intercept          19.47              1        0.95           17.49        21.44
    A-Cotton %          8.10              1        2.59            2.71        13.49       9.03
    A2                 -8.86              1        1.46          -11.89        -5.83       1.00
    A3                 -7.60              1        2.87          -13.58        -1.62       9.03

The Regression Model

Final equation in terms of actual factors:

    Strength = +62.61143
               − 9.01143    * Cotton Weight %
               + 0.48143    * Cotton Weight %^2
               − 7.60000E-003 * Cotton Weight %^3

This is an empirical model of the experimental results.

[Plot of the fitted cubic and the design points: Strength (7 to 25) vs. A: Cotton Weight % (15.00 to 35.00)]
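The fitted cubic can be evaluated directly. As a consistency check, at the design centre (25% cotton) the actual-units equation reproduces the 19.47 intercept reported in the coded-units table above:

```python
def predicted_strength(pct):
    """Fitted cubic in actual factor units (coefficients from the slide)."""
    return (62.61143 - 9.01143 * pct + 0.48143 * pct ** 2
            - 7.60000e-03 * pct ** 3)

for pct in (15, 20, 25, 30, 35):
    print(pct, round(predicted_strength(pct), 2))
# predicted_strength(25) gives ~19.47, the centre-point prediction
```

An empirical model like this can be used to interpolate between the tested cotton levels and to locate the optimum content.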

Sample Size Determination

- An FAQ in designed experiments
- The answer depends on many things, including what type of experiment is being contemplated, how it will be conducted, the available resources, and the desired sensitivity
- Sensitivity refers to the difference in means that the experimenter wishes to detect
- Generally, increasing the number of replications increases the sensitivity, i.e., makes it easier to detect small differences in means
- Recall the hypotheses:

      H0: μ1 = μ2 = ... = μa
      H1: μi ≠ μj for at least one pair (i, j)

- Choice of Sample Size

  Operating characteristic (OC) curves can be used to guide the experimenter in selecting the number of replicates so that the design will be sensitive to important potential differences in the treatments, i.e., to limit the Type II error to an acceptable value:

      β = 1 − P{reject H0 | H0 is false}
        = 1 − P{F0 > F_{α, a−1, N−a} | H0 is false}

  Note that if H0 is false, then F0 = MS_Treatments / MS_E follows a non-central F_{a−1, N−a} distribution with a non-centrality parameter.

  OC curves are available that depict the relationship between β and the parameter Φ, where

      Φ² = n Σ_{i=1..a} τi² / (a σ²)

  The τi are estimated as follows: let μ1, μ2, ..., μa be the values of the treatment means for which we would like to reject H0. Then

      τi = μi − (1/a) Σ_{i} μi   for all i

- The σ² needed for calculating Φ² may either be known from a previous experiment or be estimated using judgement.
- OC curves are available for α = 0.05 and α = 0.01 in Appendix Table V.
Steps in Calculating Sample Size

1. Select acceptable values of the α and β errors.
2. Choose the actual values μ1, μ2, ..., μa for which you would want to reject H0.
3. Calculate τi for all i.
4. Choose a value for σ² (from a previous experiment, or using judgement).
5. Select a value of n:
   - Calculate Φ
   - Check the OC curve (corresponding to the selected value of α, with ν1 = a − 1 and ν2 = an − a) for the β error at that value of Φ
6. If the value of β is acceptable, stop: your sample (replication) size is n. Otherwise, set n ← n + 1 and go to step 5.

- Since it is difficult to come up with values of μ1, μ2, ..., μa, an alternative approach is to select the minimum difference (D) between two means that should be detected. Then we have:

      Φ² = n D² / (2 a σ²)

- Since this gives the minimum value of Φ² (it corresponds to the least-favourable configuration, with two means D apart and the rest at the grand average), the resulting sample size n is a conservative one. That is, the resulting β error is an upper bound, and the corresponding power of the test (1 − β) is at least as great as specified by the experimenter.
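A sketch of the minimum-difference calculation for the a = 5 treatment experiment, using illustrative values D = 10 and σ² = 25 (both assumed here, with σ² estimated by judgement):

```python
# Phi^2 = n D^2 / (2 a sigma^2) grows linearly in n, so each extra
# replicate pushes Phi up the OC curve and drives the beta error down.
a = 5            # number of treatments
D = 10.0         # minimum difference in means worth detecting (assumed)
sigma2 = 25.0    # error variance, estimated by judgement (assumed)

for n in range(3, 7):
    phi2 = n * D ** 2 / (2 * a * sigma2)
    phi = phi2 ** 0.5
    # beta is then read from the OC curve for alpha, nu1 = a-1, nu2 = a(n-1)
    print(n, round(phi, 2))
```

For each candidate n, the printed Φ is looked up on the appropriate OC curve; n is increased until the β read from the curve is acceptable.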

Non-Parametric Methods

The Kruskal-Wallis Test (1952)

- Rank the observations y_ij in ascending order, and replace them by their ranks R_ij (tied observations receive the average of their ranks).
- Calculate the test statistic:

      H = (1/S²) [ Σ_{i=1..a} R_i.² / n_i − N(N + 1)² / 4 ]

  where

      S² = [1 / (N − 1)] [ Σ_{i=1..a} Σ_{j=1..n_i} R_ij² − N(N + 1)² / 4 ]

- Reject the null hypothesis if: H > χ²_{α, a−1}
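A sketch of the Kruskal-Wallis test applied to the cotton data (raw observations assumed from the textbook example), using average ranks for ties as the tie-corrected S² requires:

```python
# Textbook cotton data (assumed from the text)
groups = [[7, 7, 15, 11, 9], [12, 17, 12, 18, 18], [14, 18, 18, 19, 19],
          [19, 25, 22, 19, 23], [7, 10, 11, 15, 11]]
all_y = sorted(y for g in groups for y in g)
N = len(all_y)

# Average rank for each value: tied values share the mean of their positions
rank = {}
for v in set(all_y):
    positions = [k + 1 for k, y in enumerate(all_y) if y == v]
    rank[v] = sum(positions) / len(positions)

ranks = [[rank[y] for y in g] for g in groups]
correction = N * (N + 1) ** 2 / 4

s2 = (sum(r ** 2 for g in ranks for r in g) - correction) / (N - 1)
h = (sum(sum(g) ** 2 / len(g) for g in ranks) - correction) / s2

print(round(h, 2))   # 19.06 > chi^2_{0.05, 4} = 9.49, so reject H0
```

The non-parametric conclusion agrees with the parametric F-test: cotton content affects strength.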
