
Computing an ANOVA table

Four specimens of each of five brands (BRAND) of a synthetic wood veneer material are subjected to a friction
test, and a measure of wear is determined for each specimen. All tests are made on the same machine in
completely random order. The data are stored in a SAS data set named VENEER.

Brand Wear measurement
ACME 2.3
ACME 2.1
ACME 2.4
ACME 2.5
CHAMP 2.2
CHAMP 2.3
CHAMP 2.4
CHAMP 2.6
AJAX 2.2
AJAX 2.0
AJAX 1.9
AJAX 2.1
TUFFY 2.4
TUFFY 2.7
TUFFY 2.6
TUFFY 2.7
XTRA 2.3
XTRA 2.5
XTRA 2.3
XTRA 2.4

An appropriate analysis of variance has the basic form

Source of Variation DF
BRAND 4
Error 15
Total 19
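The degrees of freedom in this table follow directly from the design. Write $t = 5$ for the number of brands, $n = 4$ for the number of specimens per brand, and $N = tn = 20$ for the total number of observations. These symbols are introduced here for convenience.

```latex
\begin{aligned}
\mathrm{df}_{\text{BRAND}} &= t - 1 = 5 - 1 = 4,\\
\mathrm{df}_{\text{Error}} &= N - t = 20 - 5 = 15,\\
\mathrm{df}_{\text{Total}} &= N - 1 = 20 - 1 = 19 = 4 + 15.
\end{aligned}
```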

SAS coding

data veneer;
input brand $ wear;
cards;
ACME 2.3
ACME 2.1
ACME 2.4
ACME 2.5
CHAMP 2.2
CHAMP 2.3
CHAMP 2.4
CHAMP 2.6
AJAX 2.2
AJAX 2.0
AJAX 1.9
AJAX 2.1
TUFFY 2.4
TUFFY 2.7
TUFFY 2.6
TUFFY 2.7
XTRA 2.3
XTRA 2.5
XTRA 2.3
XTRA 2.4
;
proc print data=veneer; /* to show the data */
run;
/* the following SAS statements produce the analysis of variance */
proc glm data=veneer;
class brand;
model wear=brand;
/* the MEANS statement computes the treatment (brand) means;        */
/* the HOVTEST option tests the homogeneity of variance assumption */
means brand/hovtest;
run;
quit; /* PROC GLM stays active after RUN; QUIT ends it */

THE OUTPUT:

Obs brand wear

1 ACME 2.3
2 ACME 2.1
3 ACME 2.4
4 ACME 2.5
5 CHAMP 2.2
6 CHAMP 2.3
7 CHAMP 2.4
8 CHAMP 2.6
9 AJAX 2.2
10 AJAX 2.0
11 AJAX 1.9
12 AJAX 2.1
13 TUFFY 2.4
14 TUFFY 2.7
15 TUFFY 2.6
16 TUFFY 2.7
17 XTRA 2.3
18 XTRA 2.5
19 XTRA 2.3
20 XTRA 2.4
Class Level Information

Class Levels Values

brand 5 ACME AJAX CHAMP TUFFY XTRA

Number of Observations Read 20


Number of Observations Used 20

The GLM Procedure

Dependent Variable: wear

Source DF Sum of Squares Mean Square F Value Pr > F

Model 4 0.61700000 0.15425000 7.40 0.0017
Error 15 0.31250000 0.02083333
Corrected Total 19 0.92950000

R-Square Coeff Var Root MSE wear Mean

0.663798 6.155120 0.144338 2.345000

Source DF Type I SS Mean Square F Value Pr > F

brand 4 0.61700000 0.15425000 7.40 0.0017

Source DF Type III SS Mean Square F Value Pr > F

brand 4 0.61700000 0.15425000 7.40 0.0017

Levene's Test for Homogeneity of wear Variance
ANOVA of Squared Deviations from Group Means

Source DF Sum of Squares Mean Square F Value Pr > F

brand 4 0.000659 0.000165 0.53 0.7149
Error 15 0.00466 0.000310

The GLM Procedure

Level of -------------wear------------
brand N Mean Std Dev

ACME 4 2.32500000 0.17078251
AJAX 4 2.05000000 0.12909944
CHAMP 4 2.37500000 0.17078251
TUFFY 4 2.60000000 0.14142136
XTRA 4 2.37500000 0.09574271

Notice that PROC GLM gives the same analysis-of-variance computations as PROC ANOVA, although they are
labeled somewhat differently. In addition to the MODEL sum of squares, PROC GLM computes two sets of sums
of squares for BRAND, Type I and Type III, rather than the single SS computed by the ANOVA procedure.

For the one-way classification, as well as for balanced multi-way classifications, the GLM Type I, GLM
Type III, and PROC ANOVA sums of squares are identical. For unbalanced multi-way data and for
multiple regression models, the Type I and Type III SS are in general different.
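As a hand check on the Model and Error lines of the output, the one-way sums of squares can be rebuilt from the brand means and standard deviations printed by the MEANS statement. This uses $n = 4$ per brand and grand mean $\bar{y}_{\cdot\cdot} = 2.345$, with the brands in the order ACME, AJAX, CHAMP, TUFFY, XTRA. The results agree with the output up to rounding of the printed values.

```latex
\begin{aligned}
SS_{\text{BRAND}} &= n\sum_i (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2
  = 4\,[(-0.020)^2 + (-0.295)^2 + (0.030)^2 + (0.255)^2 + (0.030)^2]
  = 4\,(0.15425) = 0.61700,\\
SS_{\text{Error}} &= \sum_i (n - 1)\,s_i^2
  = 3\,(0.029167 + 0.016667 + 0.029167 + 0.020000 + 0.009167) = 0.31250,\\
F &= \frac{SS_{\text{BRAND}}/4}{SS_{\text{Error}}/15} = \frac{0.15425}{0.020833} = 7.40,
\qquad R^2 = \frac{0.61700}{0.92950} = 0.6638.
\end{aligned}
```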
The HOVTEST output appears as “Levene’s Test for Homogeneity of wear Variance”. Its F-value, 0.53,
tests the null hypothesis that the variance of the observations within each brand is the same for all five
brands. There is clearly no evidence that this assumption fails for these data: the p-value, 0.7149, is well
above 0.05, so we fail to reject the hypothesis of equal variances.
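The output heading states that Levene's test is an ordinary one-way ANOVA applied to the squared deviations from the brand means. The sketch below reproduces that computation by hand from the brand standard deviations. The symbol $z_{ij}$ is introduced here for the squared deviations, and the brands are again taken in alphabetical order.

```latex
\begin{aligned}
z_{ij} &= (y_{ij} - \bar{y}_{i\cdot})^2, \qquad \bar{z}_{i\cdot} = \tfrac{n-1}{n}\,s_i^2 = \tfrac{3}{4}\,s_i^2,\\
(\bar{z}_{\mathrm{ACME}}, \bar{z}_{\mathrm{AJAX}}, \bar{z}_{\mathrm{CHAMP}}, \bar{z}_{\mathrm{TUFFY}}, \bar{z}_{\mathrm{XTRA}})
  &= (0.021875,\ 0.012500,\ 0.021875,\ 0.015000,\ 0.006875), \qquad \bar{z}_{\cdot\cdot} = 0.015625,\\
SS_{\text{BRAND}}(z) &= 4\sum_i (\bar{z}_{i\cdot} - \bar{z}_{\cdot\cdot})^2 = 0.000659, \qquad
SS_{\text{Error}}(z) = \sum_{i,j} (z_{ij} - \bar{z}_{i\cdot})^2 = 0.004656,\\
F &= \frac{0.000659/4}{0.004656/15} = \frac{0.000165}{0.000310} = 0.53.
\end{aligned}
```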

Computing Means, Multiple Comparisons of Means and Confidence Intervals


To obtain means and multiple comparisons of means, we use a MEANS statement after the MODEL
statement. The following statement gives the BRAND means and LSD comparisons of the BRAND means:

means brand/lsd;

OUTPUT:
t Tests (LSD) for wear

NOTE: This test controls the Type I comparisonwise error rate, not the experimentwise
error rate.

Alpha 0.05
Error Degrees of Freedom 15
Error Mean Square 0.020833
Critical Value of t 2.13145
Least Significant Difference 0.2175

Means with the same letter are not significantly different.

t Grouping Mean N brand


A 2.6000 4 TUFFY

B 2.3750 4 XTRA
B
B 2.3750 4 CHAMP
B
B 2.3250 4 ACME

C 2.0500 4 AJAX

From the output, the mean and the number of observations (N) are given for each BRAND. Under the
heading “t Grouping” are sequences of A’s, B’s and C’s. Means are joined by the same letter if they are
not significantly different according to the t-test, or equivalently if their difference is less than the LSD.
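The Least Significant Difference in the output is the critical $t$ value times the standard error of a difference between two brand means. Here $MSE = 0.020833$ and each mean is based on $n = 4$ observations. Comparing a few observed differences with the LSD reproduces the letter groupings:

```latex
\begin{aligned}
LSD &= t_{0.025,\,15}\sqrt{MSE\left(\tfrac{1}{n} + \tfrac{1}{n}\right)}
     = 2.13145\sqrt{0.020833\left(\tfrac{1}{4} + \tfrac{1}{4}\right)}
     = 2.13145 \times 0.10206 = 0.2175,\\
\bar{y}_{\mathrm{TUFFY}} - \bar{y}_{\mathrm{XTRA}} &= 2.600 - 2.375 = 0.225 > 0.2175 \quad (\text{different letters}),\\
\bar{y}_{\mathrm{XTRA}} - \bar{y}_{\mathrm{ACME}} &= 2.375 - 2.325 = 0.050 < 0.2175 \quad (\text{both labeled B}),\\
\bar{y}_{\mathrm{ACME}} - \bar{y}_{\mathrm{AJAX}} &= 2.325 - 2.050 = 0.275 > 0.2175 \quad (\text{different letters}).
\end{aligned}
```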

The BRAND means for XTRA, CHAMP, and ACME are not significantly different and are joined by a
sequence of B’s. The means for AJAX and TUFFY are significantly different from all other means, so they
are labeled with a single C and a single A, respectively, and no other means are labeled with A’s or C’s.

To obtain confidence intervals for the means instead of comparisons of the means, we can specify the
CLM option:

means brand/lsd clm;


OUTPUT:

t Confidence Intervals for wear

Alpha 0.05
Error Degrees of Freedom 15
Error Mean Square 0.020833
Critical Value of t 2.13145
Half Width of Confidence Interval 0.153824

brand N Mean 95% Confidence Limits

TUFFY 4 2.60000 2.44618 2.75382
XTRA 4 2.37500 2.22118 2.52882
CHAMP 4 2.37500 2.22118 2.52882
ACME 4 2.32500 2.17118 2.47882
AJAX 4 2.05000 1.89618 2.20382
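Each interval in this output is the brand mean plus or minus the half width. The half width is the critical $t$ value times the standard error of a single brand mean, $\sqrt{MSE/n}$, with $MSE = 0.020833$ and $n = 4$. For TUFFY, for example:

```latex
\begin{aligned}
\text{half width} &= t_{0.025,\,15}\sqrt{\frac{MSE}{n}} = 2.13145\sqrt{\frac{0.020833}{4}}
  = 2.13145 \times 0.072169 = 0.153824,\\
\bar{y}_{\mathrm{TUFFY}} \pm 0.153824 &= 2.60000 \pm 0.153824 = (2.44618,\ 2.75382).
\end{aligned}
```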

The CONTRAST Statement


Multiple comparison procedures are useful when there are no particular comparisons of special interest
and we want to make all comparisons among the means.

But in most situations there is something about the classification criterion that suggests specific
comparisons. For example, suppose you know something about the companies that manufacture the five
brands of synthetic wood veneer material. Let's say we know that ACME and AJAX are produced by a U.S.
company named A-Line, that CHAMP is produced by a U.S. company named C-Line, and that TUFFY and
XTRA are produced by non-U.S. companies.

Then, we would probably be interested in comparing certain groups of means with other groups of
means. These would be called planned comparisons, because they are suggested by the structure of the
classification criterion (BRAND) rather than the data.

We use contrasts to make planned comparisons. In SAS, PROC ANOVA does not have a CONTRAST
statement, but PROC GLM does, so we must use PROC GLM to compute contrasts. A CONTRAST statement is
an optional statement placed after the MODEL statement, in the same way as a MEANS statement.

To define a contrast, we first express the comparisons as null hypotheses concerning linear
combinations of means to be tested. For the comparisons indicated above, we have the following
null hypotheses:

(i) U.S. versus non-U.S.

$H_0: \frac{1}{3}(\mu_{\mathrm{ACME}} + \mu_{\mathrm{AJAX}} + \mu_{\mathrm{CHAMP}}) = \frac{1}{2}(\mu_{\mathrm{TUFFY}} + \mu_{\mathrm{XTRA}})$

(ii) A-Line versus C-Line

$H_0: \frac{1}{2}(\mu_{\mathrm{ACME}} + \mu_{\mathrm{AJAX}}) = \mu_{\mathrm{CHAMP}}$

(iii) ACME versus AJAX

$H_0: \mu_{\mathrm{ACME}} = \mu_{\mathrm{AJAX}}$

(iv) TUFFY versus XTRA

$H_0: \mu_{\mathrm{TUFFY}} = \mu_{\mathrm{XTRA}}$

The basic form of the CONTRAST statement is

CONTRAST 'label' effect-name effect-coefficients;

where label is a character string used to label the output, effect-name is a term on the right-hand side
of the MODEL statement, and effect-coefficients is a list of numbers that specifies the linear combination
of parameters in the null hypothesis.

The hypothesis must be expressed as a linear combination of the means equal to 0, for example,
$H_0: \mu_{\mathrm{ACME}} - \mu_{\mathrm{AJAX}} = 0$. In terms of all the means, the null hypothesis is

$H_0: 1\cdot\mu_{\mathrm{ACME}} - 1\cdot\mu_{\mathrm{AJAX}} + 0\cdot\mu_{\mathrm{CHAMP}} + 0\cdot\mu_{\mathrm{TUFFY}} + 0\cdot\mu_{\mathrm{XTRA}} = 0.$

Notice that the BRAND levels are listed in alphabetical order in the Class Level Information output
(ACME, AJAX, CHAMP, TUFFY, XTRA). The effect coefficients in the CONTRAST statement are matched to
the BRAND levels in this same order, so the coefficients 1 -1 0 0 0 below apply to ACME, AJAX, CHAMP,
TUFFY and XTRA, respectively.

proc glm;
class brand;
model wear=brand;
contrast 'ACME vs AJAX' brand 1 -1 0 0 0;
run;

OUTPUT:

Contrast DF Contrast SS Mean Square F Value Pr > F

ACME vs AJAX 1 0.15125000 0.15125000 7.26 0.0166

This output shows the sum of squares for the contrast and an F-value for testing
$H_0: \mu_{\mathrm{ACME}} = \mu_{\mathrm{AJAX}}$. The p-value of 0.0166 tells us the two means are significantly different at the 0.05 level.
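For a contrast $L = \sum_i c_i\mu_i$ with $n$ observations per brand, the contrast sum of squares (1 df) is $\hat{L}^2 / (\sum_i c_i^2/n)$, where $\hat{L} = \sum_i c_i\bar{y}_{i\cdot}$. Applying this to ACME vs AJAX with $n = 4$ and $MSE = 0.020833$ reproduces the output:

```latex
\begin{aligned}
\hat{L} &= \bar{y}_{\mathrm{ACME}} - \bar{y}_{\mathrm{AJAX}} = 2.325 - 2.050 = 0.275,\\
SS_L &= \frac{\hat{L}^2}{\sum_i c_i^2/n} = \frac{0.275^2}{\left(1^2 + (-1)^2\right)/4} = \frac{0.075625}{0.5} = 0.15125,\\
F &= \frac{SS_L/1}{MSE} = \frac{0.15125}{0.020833} = 7.26.
\end{aligned}
```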

Note: Actually, we don’t have to include the trailing zeros in the CONTRAST statement. We can simply
use
contrast 'ACME vs AJAX' brand 1 -1;

By default, if we omit the trailing coefficients they are assumed to be zeros.

Following the same procedure, to test $H_0: \mu_{\mathrm{TUFFY}} = \mu_{\mathrm{XTRA}}$, use the statements
proc glm;
class brand;
model wear=brand;
contrast 'TUFFY vs XTRA' brand 0 0 0 1 -1;
run;

OUTPUT:
Contrast DF Contrast SS Mean Square F Value Pr > F

TUFFY vs XTRA 1 0.10125000 0.10125000 4.86 0.0435

The contrast U.S. versus non-U.S. is a little more complicated because it involves fractions. We can use
the statement

contrast 'US vs NON-US' brand 0.3333 0.3333 0.3333 -0.5 -0.5;

It is usually easier to multiply all coefficients by the least common denominator to get rid of the
fractions. This is legitimate because the hypothesis we are testing with a CONTRAST statement is that a
linear combination is equal to 0, and multiplication by a constant does not change whether the
hypothesis is true or false.

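In the case of U.S. versus non-U.S.,

$H_0: \frac{1}{3}(\mu_{\mathrm{ACME}} + \mu_{\mathrm{AJAX}} + \mu_{\mathrm{CHAMP}}) = \frac{1}{2}(\mu_{\mathrm{TUFFY}} + \mu_{\mathrm{XTRA}})$ is equivalent to

$H_0: 2(\mu_{\mathrm{ACME}} + \mu_{\mathrm{AJAX}} + \mu_{\mathrm{CHAMP}}) - 3(\mu_{\mathrm{TUFFY}} + \mu_{\mathrm{XTRA}}) = 0.$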

This tells us the appropriate CONTRAST statement is

contrast 'US vs NON-US' brand 2 2 2 -3 -3;

OUTPUT:
Contrast DF Contrast SS Mean Square F Value Pr > F

US vs NON-US 1 0.27075000 0.27075000 13.00 0.0026
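The contrast sum of squares for a contrast with coefficients $c_i$ is $\hat{L}^2/(\sum_i c_i^2/n)$, where $\hat{L} = \sum_i c_i\bar{y}_{i\cdot}$ and $n = 4$. This formula checks two things at once. First, it reproduces the output value. Second, it confirms that rescaling the coefficients leaves the result unchanged. The computation below uses both the integer coefficients (2, 2, 2, -3, -3) and the fractional ones (1/3, 1/3, 1/3, -1/2, -1/2):

```latex
\begin{aligned}
\hat{L}_{\text{int}} &= 2(2.325 + 2.050 + 2.375) - 3(2.600 + 2.375) = 13.500 - 14.925 = -1.425,\\
SS &= \frac{(-1.425)^2}{\left(3 \cdot 2^2 + 2 \cdot 3^2\right)/4} = \frac{2.030625}{7.5} = 0.27075,\\
\hat{L}_{\text{frac}} &= \tfrac{1}{3}(6.750) - \tfrac{1}{2}(4.975) = 2.2500 - 2.4875 = -0.2375,\\
SS &= \frac{(-0.2375)^2}{\left(3 \cdot \tfrac{1}{9} + 2 \cdot \tfrac{1}{4}\right)/4} = \frac{0.05640625}{5/24} = 0.27075,\\
F &= \frac{0.27075}{0.020833} = 13.00.
\end{aligned}
```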

The GLM procedure lets us use as many CONTRAST statements as we want, but good statistical
practice ordinarily dictates that this number should not exceed the number of degrees of freedom for
the effect (in this case, 4). Moreover, we should be aware of the inflation of the overall (experimentwise)
Type I error rate when we run several CONTRAST statements.

Run the following:


proc glm;
class brand;
model wear=brand;
contrast 'US vs NON-US' brand 2 2 2 -3 -3;
contrast 'A-L vs C-L' brand 1 1 -2 0 0;
contrast 'ACME vs AJAX' brand 1 -1 0 0 0;
contrast 'TUFFY vs XTRA' brand 0 0 0 1 -1;
run;

OUTPUT:

Contrast DF Contrast SS Mean Square F Value Pr > F

US vs NON-US 1 0.27075000 0.27075000 13.00 0.0026
A-L vs C-L 1 0.09375000 0.09375000 4.50 0.0510
ACME vs AJAX 1 0.15125000 0.15125000 7.26 0.0166
TUFFY vs XTRA 1 0.10125000 0.10125000 4.86 0.0435

Notice that the p-value for ACME versus AJAX is the same in the presence of the other CONTRAST
statements as it was when that contrast was run alone in the previous output: the computations for one
CONTRAST statement are unaffected by the presence of other CONTRAST statements. Note also that the
four contrasts above have a special property called orthogonality (they are orthogonal contrasts): with
equal numbers of observations per brand, the coefficients of any two of them have a cross-product sum of zero.
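Orthogonality can be checked directly from the coefficient vectors, taken in the alphabetical order ACME, AJAX, CHAMP, TUFFY, XTRA. The four contrasts are mutually orthogonal, and there are as many of them as BRAND has degrees of freedom. As a result, their sums of squares add up exactly to the BRAND sum of squares in the ANOVA table:

```latex
\begin{aligned}
&(2,2,2,-3,-3)\cdot(1,1,-2,0,0) = 2 + 2 - 4 = 0, \qquad
 (2,2,2,-3,-3)\cdot(1,-1,0,0,0) = 2 - 2 = 0,\\
&(2,2,2,-3,-3)\cdot(0,0,0,1,-1) = -3 + 3 = 0, \qquad
 (1,1,-2,0,0)\cdot(1,-1,0,0,0) = 1 - 1 = 0,\\
&(1,1,-2,0,0)\cdot(0,0,0,1,-1) = 0, \qquad
 (1,-1,0,0,0)\cdot(0,0,0,1,-1) = 0,\\[4pt]
&0.27075 + 0.09375 + 0.15125 + 0.10125 = 0.61700 = SS_{\text{BRAND}}.
\end{aligned}
```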

Linear Combinations of Model Parameters


The coefficients in a CONTRAST statement have been discussed as coefficients in a linear combination
of means. In fact, they are coefficients on the effect parameters in the MODEL statement. It is easier to
think in terms of means, but PROC GLM works in terms of model parameters. Therefore, we must be
able to translate between the two sets of parameters.
We need to know the relationship between coefficients on a linear combination of means and the
corresponding coefficients on linear combinations of model effect parameters. The coefficient of an
effect parameter in a linear combination of effect parameters is equal to the coefficient on the
corresponding mean in the linear combination of means. For example, consider the contrast A-Line vs C-
Line. The linear combination in terms of means is

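$2\mu_{\mathrm{CHAMP}} - \mu_{\mathrm{ACME}} - \mu_{\mathrm{AJAX}}$

Since each brand mean is $\mu_i = \mu + \tau_i$ in the one-way model, this becomes

$2(\mu + \tau_{\mathrm{CHAMP}}) - (\mu + \tau_{\mathrm{ACME}}) - (\mu + \tau_{\mathrm{AJAX}}) = 2\tau_{\mathrm{CHAMP}} - \tau_{\mathrm{ACME}} - \tau_{\mathrm{AJAX}}$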

We can see that the coefficient on $\tau_{\mathrm{CHAMP}}$ is the same as the coefficient on $\mu_{\mathrm{CHAMP}}$, the coefficient on
$\tau_{\mathrm{ACME}}$ is the same as the coefficient on $\mu_{\mathrm{ACME}}$, and so on.

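The same bookkeeping works for any set of coefficients $c_i$. Substituting $\mu_i = \mu + \tau_i$ shows why the overall mean $\mu$ drops out whenever the coefficients sum to zero, as they do for every contrast used so far:

```latex
\sum_i c_i\,\mu_i = \sum_i c_i(\mu + \tau_i) = \Big(\sum_i c_i\Big)\mu + \sum_i c_i\,\tau_i
= \sum_i c_i\,\tau_i \quad \text{when } \sum_i c_i = 0.
```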
Testing Several Contrasts Simultaneously


Suppose we want to test several contrasts simultaneously. Let's say we want to test for differences among
the three means for the U.S. brands. The null hypothesis is

$H_0: \mu_{\mathrm{ACME}} = \mu_{\mathrm{AJAX}} = \mu_{\mathrm{CHAMP}}$.

This hypothesis equation actually embodies two equations that can be expressed in several ways. One
way to express the hypothesis in terms of two equations is

$H_0: \mu_{\mathrm{ACME}} = \mu_{\mathrm{AJAX}}$ and $H_0: \mu_{\mathrm{ACME}} = \mu_{\mathrm{CHAMP}}$.

The two hypotheses are equivalent because the three means are all equal if and only if the first is equal
to the second and the first is equal to the third.

Try running the following statement in the PROC GLM step, after the MODEL statement:


contrast 'US BRANDS' brand 1 -1 0 0 0, brand 1 0 -1 0 0;

OUTPUT:

Contrast DF Contrast SS Mean Square F Value Pr > F

US BRANDS 2 0.24500000 0.12250000 5.88 0.0130


Notice that the SS for the contrast has 2 degrees of freedom. This is because we are testing two
equations simultaneously. The F-statistic of 5.88 and its p-value of 0.0130 tell us the three U.S. means
are significantly different at the 0.05 level.
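Because this contrast tests equality of the three U.S. means, its 2-df sum of squares is simply the between-brand sum of squares computed from those three means alone. Their average is $(2.325 + 2.050 + 2.375)/3 = 2.25$, each mean is based on $n = 4$ observations, and $MSE = 0.020833$:

```latex
\begin{aligned}
SS &= 4\left[(2.325 - 2.25)^2 + (2.050 - 2.25)^2 + (2.375 - 2.25)^2\right]\\
   &= 4\,(0.005625 + 0.040000 + 0.015625) = 4\,(0.06125) = 0.24500,\\
F &= \frac{0.24500/2}{0.020833} = \frac{0.12250}{0.020833} = 5.88.
\end{aligned}
```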

Another way to express the hypothesis in terms of two equations is

$H_0: \mu_{\mathrm{ACME}} = \mu_{\mathrm{AJAX}}$ and $H_0: 2\mu_{\mathrm{CHAMP}} = \mu_{\mathrm{ACME}} + \mu_{\mathrm{AJAX}}$.

A contrast for this version of the hypothesis is

contrast 'US BRANDS' brand 1 -1 0 0 0,
                     brand 1 1 -2 0 0;

OUTPUT:

Contrast DF Contrast SS Mean Square F Value Pr > F

US BRANDS 2 0.24500000 0.12250000 5.88 0.0130
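The 2-df sum of squares is the same for both versions because the two sets of equations describe the same hypothesis. In this second version the two rows, (1, -1, 0, 0, 0) and (1, 1, -2, 0, 0), are orthogonal. Their single-df sums of squares, which appeared as the separate contrasts ACME vs AJAX and A-L vs C-L earlier, therefore add up to it:

```latex
SS_{\text{US BRANDS}} = SS_{\text{ACME vs AJAX}} + SS_{\text{A-L vs C-L}} = 0.15125 + 0.09375 = 0.24500.
```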

Estimating Linear Combinations of Parameters: The ESTIMATE Statement


The CONTRAST statement is used to construct an F-test of the hypothesis that a linear combination of
parameters is equal to 0. In many applications we also want an estimate of the linear combination
of parameters, along with the standard error of the estimate. The ESTIMATE statement is used in much
the same way as a CONTRAST statement.

To estimate the difference $\mu_{\mathrm{ACME}} - \mu_{\mathrm{AJAX}}$, use the following statements:

proc glm;
class brand;
model wear=brand;
estimate 'ACME vs AJAX' brand 1 -1 0 0 0;
run;

OUTPUT:
Parameter Estimate Standard Error t Value Pr > |t|

ACME vs AJAX 0.27500000 0.10206207 2.69 0.0166

The output includes the value of the estimate, its standard error, a t-statistic for testing whether the
difference is significantly different from 0, and a p-value for the t-statistic. Note that the p-value (0.0166)
for the t-test is the same as for the F-test for the contrast ACME vs AJAX earlier. This is because the
two tests are equivalent; the F is equal to the square of the t.
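The following hand check uses $\bar{y}_{\mathrm{ACME}} = 2.325$, $\bar{y}_{\mathrm{AJAX}} = 2.050$, the Error Mean Square $MSE = 0.020833$, and $n = 4$ specimens per brand. It computes the estimate, its standard error and the t-value, and shows that squaring t recovers the contrast F-value:

```latex
\begin{aligned}
\text{estimate} &= 2.325 - 2.050 = 0.275,\\
SE &= \sqrt{0.020833\left(\tfrac{1}{4} + \tfrac{1}{4}\right)} = \sqrt{0.0104167} = 0.10206,\\
t &= \frac{0.275}{0.10206} = 2.69, \qquad t^2 = 7.26 = F.
\end{aligned}
```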

The estimate of $\mu_{\mathrm{ACME}} - \mu_{\mathrm{AJAX}}$ can be computed as $\bar{y}_{\mathrm{ACME}} - \bar{y}_{\mathrm{AJAX}}$. The standard error is

$\sqrt{MS(\text{ERROR})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$.
To estimate $\mu_{\mathrm{CHAMP}} - \frac{1}{2}(\mu_{\mathrm{ACME}} + \mu_{\mathrm{AJAX}})$, we can use the following statement:

estimate 'AL vs CL' brand -0.5 -0.5 1 0 0;

In an ESTIMATE statement, unlike a CONTRAST statement, the coefficients above cannot be replaced by the
rescaled coefficients (-1 -1 2 0 0): that set of coefficients would estimate twice the mean difference of
interest. We can avoid the fractions by using the DIVISOR option:

estimate 'AL vs CL' brand -1 -1 2 0 0/divisor=2;

OUTPUT:
Parameter Estimate Standard Error t Value Pr > |t|

AL vs CL 0.18750000 0.08838835 2.12 0.0510
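Consider a general linear combination $\sum_i c_i\bar{y}_{i\cdot}$ of brand means, each based on $n$ observations. Its standard error is $\sqrt{MSE\sum_i c_i^2/n}$. With the coefficients (-0.5, -0.5, 1, 0, 0), $MSE = 0.020833$ and $n = 4$, this reproduces the output. The t-value is also the square root of the F for the contrast A-L vs C-L.

```latex
\begin{aligned}
\text{estimate} &= 2.375 - \tfrac{1}{2}(2.325 + 2.050) = 2.375 - 2.1875 = 0.1875,\\
SE &= \sqrt{0.020833 \cdot \frac{0.25 + 0.25 + 1}{4}} = \sqrt{0.0078125} = 0.08839,\\
t &= \frac{0.1875}{0.08839} = 2.12, \qquad t^2 = 4.50 = F_{\text{A-L vs C-L}}.
\end{aligned}
```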

Now suppose we want to estimate a linear combination of means that does not represent a comparison
of two groups of means. Let's say we want to estimate the average of the three U.S. means,
$\frac{1}{3}(\mu_{\mathrm{ACME}} + \mu_{\mathrm{AJAX}} + \mu_{\mathrm{CHAMP}})$. The coefficients do not sum to 0, so we can't simply take the coefficients of
the means and use them in the ESTIMATE statement as coefficients on model effect parameters. The $\mu$
parameter does not disappear when we convert from means to effect parameters:

$\frac{1}{3}(\mu_{\mathrm{ACME}} + \mu_{\mathrm{AJAX}} + \mu_{\mathrm{CHAMP}}) = \frac{1}{3}\left[(\mu + \tau_{\mathrm{ACME}}) + (\mu + \tau_{\mathrm{AJAX}}) + (\mu + \tau_{\mathrm{CHAMP}})\right] = \mu + \frac{1}{3}(\tau_{\mathrm{ACME}} + \tau_{\mathrm{AJAX}} + \tau_{\mathrm{CHAMP}})$

Notice that the parameter $\mu$ remains in the linear combination of model effect parameters. This
parameter is called INTERCEPT in CONTRAST and ESTIMATE statements. An appropriate ESTIMATE
statement is
estimate 'US MEAN' intercept 1 brand 0.3333 0.3333 0.3333 0 0;

or equivalently

estimate 'US MEAN' intercept 3 brand 1 1 1 0 0/divisor=3;

OUTPUT:
Parameter Estimate Standard Error t Value Pr > |t|
US MEAN 2.25000000 0.04166667 54.00 <.0001
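This output can be checked by hand. For a linear combination $\sum_i c_i\bar{y}_{i\cdot}$ of brand means, each based on $n = 4$ observations, the standard error is $\sqrt{MSE\sum_i c_i^2/n}$. Here the coefficient is 1/3 on each of the three U.S. means and $MSE = 0.020833$:

```latex
\begin{aligned}
\text{estimate} &= \tfrac{1}{3}(2.325 + 2.050 + 2.375) = 2.25,\\
SE &= \sqrt{0.020833 \cdot \frac{3 \cdot (1/3)^2}{4}} = \sqrt{\frac{0.020833}{12}} = 0.041667,\\
t &= \frac{2.25}{0.041667} = 54.0.
\end{aligned}
```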

In this application the estimate and its standard error are useful. For example, we can construct a 95%
confidence interval:

$2.25 \pm 2.13(0.0417)$.
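The arithmetic below uses the critical value $t_{0.025,\,15} = 2.13145$ from the earlier LSD output, which is rounded to 2.13 above:

```latex
2.25 \pm 2.13145 \times 0.041667 = 2.25 \pm 0.0888 \quad\Longrightarrow\quad (2.161,\ 2.339).
```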
