Four specimens of each of five brands (BRAND) of a synthetic wood veneer material are subjected to a friction
test. A measure of wear is determined for each specimen. All tests are made on the same machine in
completely random order. The data are stored in a SAS data set named VENEER.
Brand    Wear measurements
ACME     2.3  2.1  2.4  2.5
CHAMP    2.2  2.3  2.4  2.6
AJAX     2.2  2.0  1.9  2.1
TUFFY    2.4  2.7  2.6  2.7
XTRA     2.3  2.5  2.3  2.4
Source of Variation    DF
BRAND                   4
Error                  15
Total                  19
SAS code
data veneer;
input brand $ wear;
cards;
ACME 2.3
ACME 2.1
ACME 2.4
ACME 2.5
CHAMP 2.2
CHAMP 2.3
CHAMP 2.4
CHAMP 2.6
AJAX 2.2
AJAX 2.0
AJAX 1.9
AJAX 2.1
TUFFY 2.4
TUFFY 2.7
TUFFY 2.6
TUFFY 2.7
XTRA 2.3
XTRA 2.5
XTRA 2.3
XTRA 2.4
;
proc print data=veneer; /* to show the data*/
run;
/* the following SAS statements produce the analysis of variance*/
proc glm data=veneer;
class brand;
model wear=brand;
means brand/hovtest; /* the MEANS statement causes the treatment means to be
computed*/
/*the HOVTEST option computes statistics to test the homogeneity of variance
assumption*/
run;
THE OUTPUT:
(The analysis-of-variance table lists Source, DF, Sum of Squares, Mean Square, F Value, and Pr > F. The MEANS output lists, for each level of brand, N and the Mean and Std Dev of wear. The numeric values are not reproduced here.)
Notice that you will get the same computations from PROC GLM as from PROC ANOVA for the analysis
of variance, although they are labeled somewhat differently. In addition to the MODEL sum of squares,
PROC GLM computes two sets of SS for BRAND, the Type I and Type III sums of squares, rather than the
single SS computed by the ANOVA procedure.
For the one-way classification, as well as for balanced multi-way classifications, the GLM-Type I, GLM-
Type III, and PROC ANOVA sums of squares are identical. For unbalanced multi-way data and for
multiple regression models, the Type I and Type III SS are different.
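For comparison, the same analysis of variance can be produced with PROC ANOVA, whose CLASS and MODEL statements use the same syntax:

```sas
/* PROC ANOVA: appropriate for balanced data such as these */
proc anova data=veneer;
   class brand;
   model wear=brand;
run;
```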
The HOVTEST output appears as “Levene’s Test for Homogeneity of WEAR Variance”. The F-value, 0.53,
tests the null hypothesis that the variances among observations within each treatment are equal. There
is clearly no evidence of a failure of this assumption for these data (the p-value is not < 0.05, so we fail to
reject the hypothesis of equal variances).
means brand/lsd;
OUTPUT:
t Tests (LSD) for wear
NOTE: This test controls the Type I comparisonwise error rate, not the experimentwise
error rate.
Alpha 0.05
Error Degrees of Freedom 15
Error Mean Square 0.020833
Critical Value of t 2.13145
Least Significant Difference 0.2175
t Grouping          Mean      N    brand

         A        2.6000      4    TUFFY

         B        2.3750      4    XTRA
         B
         B        2.3750      4    CHAMP
         B
         B        2.3250      4    ACME

         C        2.0500      4    AJAX
From the output, means and the number of observations (N) are produced for each BRAND. Under the
heading “t Grouping” are sequences of A's, B's, and C's. Means are joined by the same letter if they are
not significantly different according to the t-test, or equivalently, if their difference is less than the LSD.
The BRAND means for XTRA, CHAMP, and ACME are not significantly different and are joined by a
sequence of B’s. The means for AJAX and TUFFY are found to be significantly different from all other
means so they are labeled with a single C and A, respectively, and no other means are labeled with A’s
and C’s.
To obtain confidence intervals about the means instead of comparisons of the means, we can specify the
CLM option:
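The statement producing this output is the MEANS statement with the CLM option:

```sas
means brand/clm;   /* CLM requests confidence limits for each mean */
```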
Alpha 0.05
Error Degrees of Freedom 15
Error Mean Square 0.020833
Critical Value of t 2.13145
Half Width of Confidence Interval 0.153824
But in most situations there is something about the classification criterion that suggests specific
comparisons. For example, suppose you know something about the companies that manufacture the five
brands of synthetic wood veneer material. Say we know that ACME and AJAX are produced by a U.S.
company named A-Line, that CHAMP is produced by a U.S. company named C-Line, and that TUFFY and
XTRA are produced by non-U.S. companies.
Then, we would probably be interested in comparing certain groups of means with other groups of
means. These would be called planned comparisons, because they are suggested by the structure of the
classification criterion (BRAND) rather than the data.
We use contrasts to make planned comparisons. In SAS, PROC ANOVA does not have a CONTRAST
statement, but the GLM procedure does, so we must use PROC GLM to compute contrasts. We use
CONTRAST as an optional statement the same way we use a MEANS statement.
To define a contrast, we should first express the comparisons as null hypotheses concerning linear
combinations of means to be tested. For the comparisons indicated above, we would have the following
null hypotheses:

H0: μACME = μAJAX
H0: μTUFFY = μXTRA
H0: (1/3)(μACME + μAJAX + μCHAMP) = (1/2)(μTUFFY + μXTRA)

Each hypothesis must be expressed as a linear combination of the means set equal to 0, for example,
H0: μACME − μAJAX = 0. In terms of all the means, this null hypothesis is

H0: (1)μACME + (−1)μAJAX + (0)μCHAMP + (0)μTUFFY + (0)μXTRA = 0

Notice that the BRAND means are listed in alphabetical order, so we insert the coefficients on the
BRAND means into the list of effect coefficients in the CONTRAST statement in that same alphabetical
order.
proc glm;
class brand;
model wear=brand;
contrast 'ACME vs AJAX' brand 1 -1 0 0 0;
run;
OUTPUT:
The output shows a sum of squares for the contrast and an F-value for testing
H0: μACME = μAJAX. The p-value tells us the means are significantly different at the 0.0166 level.
Note: Actually, we don’t have to include the trailing zeros in the CONTRAST statement. We can simply
use
contrast 'ACME vs AJAX' brand 1 -1;
Following the same procedure, to test H0: μTUFFY = μXTRA, use the statement
proc glm;
class brand;
model wear=brand;
contrast 'TUFFY vs XTRA' brand 0 0 0 1 -1;
run;
OUTPUT:
Contrast DF Contrast SS Mean Square F Value Pr > F
The contrast U.S. versus non-U.S. is a little more complicated because it involves fractions. We can use
the statement
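With the fractional coefficients written directly (the contrast label 'US vs NON-US' is illustrative), the statement would be:

```sas
/* 1/3 on each U.S. brand (ACME, AJAX, CHAMP), -1/2 on each
   non-U.S. brand (TUFFY, XTRA); label is illustrative */
contrast 'US vs NON-US' brand .33333 .33333 .33333 -.5 -.5;
```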
It is usually easier to multiply all coefficients by the least common denominator to get rid of the
fractions. This is legitimate because the hypothesis we are testing with a CONTRAST statement is that a
linear combination is equal to 0, and multiplication by a constant does not change whether the
hypothesis is true or false.
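With the coefficients multiplied by 6, the least common denominator of 3 and 2, the statement becomes (label again illustrative):

```sas
contrast 'US vs NON-US' brand 2 2 2 -3 -3;  /* 6 x (1/3 1/3 1/3 -1/2 -1/2) */
```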
OUTPUT:
Contrast DF Contrast SS Mean Square F Value Pr > F
The GLM procedure enables us to run as many CONTRAST statements as we want, but good statistical
practice ordinarily indicates that this number should not exceed the number of degrees of freedom for
the effect (in this case 4). Moreover, we should be aware of the inflation of the overall (experimentwise)
Type I error rate when we run several CONTRAST statements.
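For example, the planned comparisons above could all be requested in a single run (labels illustrative):

```sas
proc glm data=veneer;
   class brand;
   model wear=brand;
   /* three single-df contrasts, within the 4 df for BRAND */
   contrast 'ACME vs AJAX'  brand 1 -1 0 0 0;
   contrast 'TUFFY vs XTRA' brand 0 0 0 1 -1;
   contrast 'US vs NON-US'  brand 2 2 2 -3 -3;
run;
```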
OUTPUT:
We can see that the coefficient on μCHAMP is the same as the coefficient on τCHAMP; the coefficient on
μACME is the same as the coefficient on τACME, and so on.
This hypothesis, H0: μACME = μAJAX = μCHAMP, actually embodies two equations that can be expressed in several
ways. One way to express the hypothesis in terms of two equations is

H0: μACME = μAJAX and μACME = μCHAMP

The two hypotheses are equivalent because the three means are all equal if and only if the first is equal
to the second and the first is equal to the third.
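In PROC GLM, a hypothesis with more than one equation is specified as a multi-row contrast, with the rows separated by commas; a sketch for the two equations above (the label is illustrative):

```sas
/* 2-df contrast: mu_ACME = mu_AJAX and mu_ACME = mu_CHAMP */
contrast 'ACME=AJAX=CHAMP' brand 1 -1  0 0 0,
                           brand 1  0 -1 0 0;
```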
OUTPUT:
proc glm;
class brand;
model wear=brand;
estimate 'ACME vs AJAX' brand 1 -1 0 0 0;
run;
OUTPUT:
Standard
Parameter Estimate Error t Value Pr > |t|
The output shown includes the value of the estimate, a standard error, a t-statistic for testing whether
the difference is significantly different from 0, and a p-value for the t-statistic. Note that the p-value
(0.0166) for the t-test is the same as for the F-test for the contrast ACME vs AJAX earlier. This is because
the two tests are equivalent; the F is equal to the square of the t.
The estimate of μACME − μAJAX can be computed as ȳACME − ȳAJAX. The standard error is

sqrt( MSERROR × (1/n1 + 1/n2) )

Here that is sqrt( 0.020833 × (1/4 + 1/4) ) ≈ 0.1021, and the estimate is 2.325 − 2.050 = 0.275, so
t = 0.275/0.1021 ≈ 2.69.
To estimate μCHAMP − (1/2)(μACME + μAJAX), we can use the following statement:
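A statement with the fractional coefficients, entered in alphabetical order of the brands (the label is illustrative), would be:

```sas
/* -1/2 on ACME and AJAX, 1 on CHAMP; label is illustrative */
estimate 'CHAMP vs A-LINE' brand -.5 -.5 1 0 0;
```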
The coefficients in the above ESTIMATE statement are not equivalent to the coefficients (-1 -1 2 0 0) as
they would be in a CONTRAST statement. The latter set of coefficients would actually estimate twice the
mean difference of interest. We can avoid the fractions by using the DIVISOR option:
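With the DIVISOR= option, integer coefficients can be given and each is divided by the stated value (label again illustrative):

```sas
estimate 'CHAMP vs A-LINE' brand -1 -1 2 0 0 / divisor=2;
```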
OUTPUT:
Standard
Parameter Estimate Error t Value Pr > |t|
Now suppose we want to estimate a linear combination of means that does not represent a comparison
of two groups of means. Say we want to estimate the average of the three U.S. means,
(1/3)(μACME + μAJAX + μCHAMP). These coefficients do not sum to 0, so we can't simply take the coefficients on
the means and use them in the ESTIMATE statement as coefficients on model effect parameters. The
parameter μ does not disappear when we convert from means to effect parameters:

(1/3)(μACME + μAJAX + μCHAMP)
    = (1/3)[(μ + τACME) + (μ + τAJAX) + (μ + τCHAMP)]
    = μ + (1/3)(τACME + τAJAX + τCHAMP)

Notice that the parameter μ remains in the linear combination of model effect parameters. This
parameter is called INTERCEPT in CONTRAST and ESTIMATE statements. An appropriate ESTIMATE
statement is
estimate 'US MEAN' intercept 1 brand 0.3333 0.3333 0.3333 0 0;
or equivalently
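The equivalent statement with integer coefficients and the DIVISOR= option is:

```sas
/* (3*intercept + tau_ACME + tau_AJAX + tau_CHAMP)/3 = mu + (1/3)(sum of taus) */
estimate 'US MEAN' intercept 3 brand 1 1 1 0 0 / divisor=3;
```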
OUTPUT:
Standard
Parameter Estimate Error t Value Pr > |t|
US MEAN 2.25000000 0.04166667 54.00 <.0001
In this application the estimate and its standard error are useful. For example, we can construct a 95%
confidence interval:

2.25 ± (2.131)(0.0417), that is, 2.25 ± 0.089, or (2.161, 2.339).