
MTE3105 Statistics

Topic 5 Analysis of variance (ANOVA)

1.1 Synopsis

In this course, students will revisit the concepts of probability and explore inferential statistics such as analysis of variance (ANOVA) in hypothesis testing. The importance of using appropriate statistical methods to solve real-life problems is emphasized.

1.2 Learning Outcomes

1. Understand the theoretical and empirical concepts of ANOVA
2. Use inferential statistics such as ANOVA in hypothesis testing
3. Calculate ANOVA by hand
4. Calculate ANOVA using Excel

1.3 Conceptual Framework

TESTING HYPOTHESIS

ONE WAY ANOVA

TWO WAY ANOVA

CHI-SQUARE

LINEAR REGRESSION


1.4 ANOVA

Analysis of variance compares two or more populations of interval data. Specifically, we are interested in determining whether differences exist between the population means. The procedure works by analyzing the sample variance.

1.4.1 Definitions

F-distribution
The distribution of the ratio of two independent chi-square variables, each divided by its degrees of freedom. If the population variances are equal, this simplifies to the ratio of the sample variances.

Analysis of Variance (ANOVA)
A technique used to test a hypothesis concerning the means of three or more populations.

One-Way Analysis of Variance
Analysis of variance when there is only one independent variable. The null hypothesis is that all population means are equal; the alternative hypothesis is that at least one mean is different.

Between Group Variation
The variation due to differences between the samples, denoted SS(B) for Sum of Squares Between groups. If the sample means are close to each other (and therefore to the Grand Mean), this will be small. There are k samples involved, with one data value for each sample (the sample mean), so there are k - 1 degrees of freedom.

Between Group Variance
The variance due to differences between the samples, denoted MS(B) for Mean Square Between groups. This is the between group variation divided by its degrees of freedom.

Within Group Variation
The variation due to differences within individual samples, denoted SS(W) for Sum of Squares Within groups. Each sample is considered independently; no interaction between samples is involved. The degrees of freedom is equal to the sum of the individual degrees of freedom for each sample. Since each sample has degrees of freedom equal to one less than its sample size, and there are k samples, the total degrees of freedom is k less than the total sample size: df = N - k.

Within Group Variance
The variance due to the differences within individual samples, denoted MS(W) for Mean Square Within groups. This is the within group variation divided by its degrees of freedom.

Scheffe Test
A test used to find where the differences between means lie when the Analysis of Variance indicates the means are not all equal. The Scheffe test is generally used when the sample sizes are different.

Tukey Test
A test used to find where the differences between the means lie when the Analysis of Variance indicates the means are not all equal. The Tukey test is generally used when the sample sizes are all the same.

Two-Way Analysis of Variance
An extension of the one-way analysis of variance in which there are two independent variables. There are three sets of hypotheses with the two-way ANOVA. The first null hypothesis is that there is no interaction between the two factors. The second null hypothesis is that the population means of the first factor are equal. The third null hypothesis is that the population means of the second factor are equal.

Factors The two independent variables in a two-way ANOVA.

Treatment Groups Groups formed by making all possible combinations of the two factors. For example, if the first factor has 3 levels and the second factor has 2 levels, then there will be 3x2=6 different treatment groups.


Interaction Effect
The effect one factor has on the other factor.

Main Effect
The effects of the independent variables.

1.4.2 ONE WAY ANOVA

In the analysis of variance, the approach is conceptually similar to the t-test, although the method differs. When you want to compare more than two means, the one-way Analysis of Variance (ANOVA) is used. Say, for example, you conducted an experiment in which you compared the effectiveness of three teaching methods in enhancing reading comprehension. A One-Way Analysis of Variance is a way to test the equality of three or more means at one time by using variances.

Assumptions

1. The populations from which the samples were obtained must be normally or approximately normally distributed.
2. The samples must be independent.
3. The variances of the populations must be equal.

1.4.3 How ANOVA works

ANOVA measures two sources of variation in the data and compares their relative sizes.

Variation BETWEEN groups: for each data value, look at the difference between its group mean and the overall mean:

$$(\bar{x}_i - \bar{x})^2$$

Variation WITHIN groups: for each data value, look at the difference between that value and the mean of its group:

$$(x_{ij} - \bar{x}_i)^2$$


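The ANOVA F-statistic is the ratio of the between-group variance (mean square between groups, MSG) to the within-group variance (mean square within groups, or error, MSE):

$$F = \frac{\text{Between}}{\text{Within}} = \frac{MSG}{MSE}$$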

A large F is evidence against H0, since it indicates that there is more difference between groups than within groups.

We want to measure the amount of variation due to BETWEEN group variation and WITHIN group variation. For each data value, we calculate its contribution to:

BETWEEN group variation: $(\bar{x}_i - \bar{x})^2$
WITHIN group variation: $(x_{ij} - \bar{x}_i)^2$
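The two contributions can be tallied directly. The following Python sketch is only an illustration: the three small groups are made-up toy data, and the variable names are assumptions chosen for readability. It accumulates the between-group and within-group squared deviations for every data value and then forms the F ratio from the corresponding mean squares.

```python
# Toy data (hypothetical, chosen so the arithmetic is easy to follow).
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

all_values = [x for g in groups for x in g]
grand_mean = sum(all_values) / len(all_values)

ss_between = 0.0
ss_within = 0.0
for g in groups:
    group_mean = sum(g) / len(g)
    for x in g:
        # Between: the value's group mean versus the overall mean.
        ss_between += (group_mean - grand_mean) ** 2
        # Within: the value versus the mean of its own group.
        ss_within += (x - group_mean) ** 2

# Total variation of every value about the overall mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_values)

k = len(groups)            # number of groups
n_total = len(all_values)  # total number of data values
f_ratio = (ss_between / (k - 1)) / (ss_within / (n_total - k))

print(round(ss_between, 4), round(ss_within, 4), round(ss_total, 4), round(f_ratio, 4))
```

With these toy values the between-group sum of squares is 26 and the within-group sum of squares is 6. Their total, 32, equals the total sum of squares about the overall mean, and the F ratio is 13.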

1.4.4 Example problem using One-way Analysis of Variance

Three groups of students, 5 in each group, were receiving therapy for severe test anxiety. Group 1 received 5 hours of therapy, group 2 - 10 hours and group 3 - 15 hours. At the end of therapy each subject completed an evaluation of test anxiety (the dependent variable in the study). Did the amount of therapy have an effect on the level of test anxiety? The three groups of students received the following scores on the Test Anxiety Index (TAI) at the end of treatment.

TAI Scores for Three Groups of Students

Group 1 - 5 hours    Group 2 - 10 hours    Group 3 - 15 hours
48                   55                    51
50                   52                    52
53                   53                    50
52                   55                    53
50                   53                    50


The following table contains the quantities we need to calculate the means for the three groups, the sum of squares, and the degrees of freedom:

Worksheet for Test Anxiety Study

        Group 1 - 5 hours     Group 2 - 10 hours    Group 3 - 15 hours
        X1      (X1)^2        X2      (X2)^2        X3      (X3)^2
        48      2304          55      3025          51      2601
        50      2500          52      2704          52      2704
        53      2809          53      2809          50      2500
        52      2704          55      3025          53      2809
        50      2500          53      2809          50      2500
        ------  ------        ------  ------        ------  ------
Sum     253     12817         268     14372         256     13114

The mean for group 1 is 253/5 = 50.6, the mean for group 2 is 268/5 = 53.6, and the mean for group 3 is 256/5 = 51.2. Are the differences between these three means significant? We can use analysis of variance to answer that question. Since we only have one independent variable, amount of therapy, we will use one-way analysis of variance. If we were concerned with the effect of two independent variables on the dependent variable, then we would use two-way analysis of variance.

First we will calculate SSB, the sum of squares between groups, where X1 is a score from Group 1, X2 is a score from Group 2, X3 is a score from Group 3, n1 is the number of subjects in group 1, n2 is the number of subjects in group 2, n3 is the number of subjects in group 3, XT is a score from any subject in the total group of subjects, and NT is the total number of subjects in all groups.

$$SSB = \frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \frac{(\sum X_3)^2}{n_3} - \frac{(\sum X_T)^2}{N_T} = \frac{253^2}{5} + \frac{268^2}{5} + \frac{256^2}{5} - \frac{777^2}{15} = 40273.8 - 40248.6 = 25.2$$

The degrees of freedom between groups is: dfB = K - 1 = 3 - 1 = 2, where K is the number of groups.

Next we calculate SSW, the sum of squares within groups.

$$SSW = \sum X_1^2 + \sum X_2^2 + \sum X_3^2 - \left[ \frac{(\sum X_1)^2}{n_1} + \frac{(\sum X_2)^2}{n_2} + \frac{(\sum X_3)^2}{n_3} \right] = 40303 - 40273.8 = 29.2$$

The degrees of freedom within groups is: dfW = NT - K = 15 - 3 = 12, where NT is the total number of subjects.

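Finally, we will calculate SST, the total sum of squares.

$$SST = \sum X_T^2 - \frac{(\sum X_T)^2}{N_T} = 40303 - \frac{777^2}{15} = 40303 - 40248.6 = 54.4$$

As a check, SST = SSB + SSW: 54.4 = 25.2 + 29.2.

We can now calculate MSB, the mean square between groups, MSW, the mean square within groups, and F, the F ratio.

$$MSB = \frac{SSB}{df_B} = \frac{25.2}{2} = 12.60 \qquad MSW = \frac{SSW}{df_W} = \frac{29.2}{12} = 2.43 \qquad F = \frac{MSB}{MSW} = \frac{12.60}{2.4333} = 5.178$$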
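The hand calculation for the test anxiety data can be cross-checked with a few lines of Python. This is a sketch under the assumption that the sums of squares are computed with the usual computational (sum and sum-of-squares) formulas; the variable names are chosen here for illustration and do not come from the textbook.

```python
# TAI scores from the worked example.
group1 = [48, 50, 53, 52, 50]  # 5 hours of therapy
group2 = [55, 52, 53, 55, 53]  # 10 hours of therapy
group3 = [51, 52, 50, 53, 50]  # 15 hours of therapy
groups = [group1, group2, group3]

k = len(groups)                                     # number of groups (K)
n_total = sum(len(g) for g in groups)               # total subjects (NT)
sum_all = sum(sum(g) for g in groups)               # sum of all scores
sum_sq_all = sum(x * x for g in groups for x in g)  # sum of all squared scores

# (sum of all scores)^2 / NT
correction = sum_all ** 2 / n_total
# (sum X1)^2/n1 + (sum X2)^2/n2 + (sum X3)^2/n3
group_term = sum(sum(g) ** 2 / len(g) for g in groups)

ssb = group_term - correction   # between groups
ssw = sum_sq_all - group_term   # within groups
sst = sum_sq_all - correction   # total

msb = ssb / (k - 1)
msw = ssw / (n_total - k)
f_ratio = msb / msw

print(f"SSB={ssb:.1f} SSW={ssw:.1f} SST={sst:.1f}")
print(f"MSB={msb:.2f} MSW={msw:.2f} F={f_ratio:.3f}")
```

The script reproduces SSB = 25.2, SSW = 29.2, SST = 54.4 and F = 5.178, and it confirms the check SST = SSB + SSW.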

To test the significance of the F value we obtained, we need to compare it with the critical F value with an alpha level of .05, 2 degrees of freedom between groups (or degrees of freedom in the numerator of the F ratio), and 12 degrees of freedom within groups (or degrees of freedom in the denominator of the F ratio). We can look up the critical value of F in Appendix Table D of the text book (The 5 percent (Lightface Type) and 1 percent (Boldface Type) points for the Distribution of F), pages 319-326. Look in the table under column 2 (2 degrees of freedom for the numerator) and row 12 (12 degrees of freedom for the denominator) and read the non-boldfaced entry (for .05 level) of 3.88 - this is the critical value for F.
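When no table is at hand, the critical value can also be computed. The sketch below relies on a special case: for an F distribution with exactly 2 degrees of freedom in the numerator, the upper tail probability has the closed form P(F > f) = (1 + 2f/dfW)^(-dfW/2), which can be inverted for the critical value. The function names are hypothetical, and the shortcut does not apply when the numerator has any other degrees of freedom (that is, when there are not exactly three groups).

```python
def f_tail_prob_df1_2(f_value, df_within):
    """Upper tail probability P(F > f_value) for an F(2, df_within) distribution."""
    return (1 + 2 * f_value / df_within) ** (-df_within / 2)


def f_critical_df1_2(alpha, df_within):
    """Critical value of F(2, df_within) at significance level alpha.

    Obtained by solving (1 + 2f/df_within)^(-df_within/2) = alpha for f.
    """
    return (df_within / 2) * (alpha ** (-2 / df_within) - 1)


critical_value = f_critical_df1_2(0.05, 12)
p_value = f_tail_prob_df1_2(5.178, 12)

print(f"F.05(2,12) = {critical_value:.4f}")
print(f"p for F = 5.178: {p_value:.4f}")
```

The computed critical value is about 3.885, which the table reports as 3.88, and the tail probability for F = 5.178 is about .024, below the .05 alpha level.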


One way of indicating this critical value of F at the .05 level, with 2 degrees of freedom between groups and 12 degrees of freedom within groups, is F.05(2,12) = 3.88.

When using analysis of variance, it is a common practice to present the results of the analysis in an analysis of variance table. This table, which shows the source of variation, the sum of squares, the degrees of freedom, the mean squares, the F ratio, and the probability, is sometimes presented in a research article. The analysis of variance table for our problem would appear as follows:

Analysis of Variance Table

Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square    F Ratio    p
Between Groups         25.20             2                     12.60          5.178      <.05
Within Groups          29.20             12                    2.43
Total                  54.40             14

4.5 Calculating ANOVA by hand

Example: A researcher was interested in studying the effects of three different textbooks on mathematics achievement. To investigate the effects, the three books were used in three different schools that had equivalent demographic characteristics. The three schools employed the same teaching methods. At the end of the program, a mathematics test was administered to the students. Five scores from each school were randomly selected, and the scores are as follows.

Text Book A    Text Book B    Text Book C
54             53             49
49             56             53
52             57             47
55             51             50
48             59             54

With α = .05, test whether the means of the three populations are equal.

1. State the independent variable and the dependent variable in this study.
2. State the assumptions for using a one-way ANOVA.
3. State the null hypothesis and the alternative hypothesis.
4. Compute SSB, SSw and SST.
5. Compute the between and within samples variances.
6. Indicate the value of Fcritical.
7. Compute the F value.
8. Create an ANOVA table and fill in the above information.
9. Describe the conclusion.

Solution:

              Text Book A    Text Book B    Text Book C
              54             53             49
              49             56             53
              52             57             47
              55             51             50
              48             59             54
Total         T1 = 258       T2 = 276       T3 = 253
Sum of X^2    13350          15276          12835
n             n1 = 5         n2 = 5         n3 = 5
Mean          51.6           55.2           50.6

1) Independent variable: textbook (three different textbooks)
   Dependent variable: scores of mathematics achievement

2) The assumptions for using one-way ANOVA:

   1. The populations are normally distributed.
   2. The variances of the populations are equal.
   3. Scores are independent.
   4. Samples are independent.
   5. Samples are random.

3) Null hypothesis, H0: $\mu_1 = \mu_2 = \mu_3$ (the three group means are equal)
   Alternative hypothesis, Ha: at least one of the means is unequal

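4) a) Sum of Squares Between Groups (SSB)

$$SSB = \frac{T_1^2}{n_1} + \frac{T_2^2}{n_2} + \frac{T_3^2}{n_3} - \frac{(T_1 + T_2 + T_3)^2}{N} = \frac{(258)^2}{5} + \frac{(276)^2}{5} + \frac{(253)^2}{5} - \frac{(787)^2}{15} = 58.5333$$

   b) Sum of Squares Within Groups (SSw)

$$SSw = \sum X^2 - \left[ \frac{T_1^2}{n_1} + \frac{T_2^2}{n_2} + \frac{T_3^2}{n_3} \right] = (13350 + 15276 + 12835) - \left[ \frac{(258)^2}{5} + \frac{(276)^2}{5} + \frac{(253)^2}{5} \right] = 41,461 - 41,349.8 = 111.2$$

   c) Sum of Squares Total (SST)

   SST = SSB + SSw = 58.5333 + 111.2 = 169.7333

5) Between Group Variance: MSB = SSB / (k - 1) = 58.5333 / 2 = 29.2667
   Within Group Variance: MSw = SSw / (N - k) = 111.2 / 12 = 9.2667

6) The value of Fcritical

   Fcritical = F(0.05, 2, 12) = 3.89
   Decision rule: Reject H0 if F > 3.89

7) The value of F

   F = MSB / MSw = 29.2667 / 9.2667 = 3.16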


8) One-Way ANOVA Table

Sources of Variation    Sum of Squares (SS)    Degrees of Freedom (df)    Mean Square (MS)    Test Statistic Value (F)    F critical
Between                 58.5333                2                          29.2667             3.16                        3.89
Within                  111.2000               12                         9.2667
Total                   169.7333               14

9) Conclusion

F = 3.16 and Fcritical = 3.89. Since F is less than Fcritical, we fail to reject H0. The data do not provide sufficient evidence that the population means differ (F(2,12) = 3.16, α = 0.05). The differences among the three sample means can be attributed to sampling error.
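Steps 4 to 9 of this solution can be collected into one reusable routine. The Python sketch below is an illustration only: the function name `one_way_anova` and the dictionary keys are hypothetical, and the critical value 3.89 is simply copied from step 6 rather than computed.

```python
def one_way_anova(groups):
    """One-way ANOVA quantities for a list of samples (computational formulas)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_total = sum(sum(g) for g in groups)
    sum_of_squares = sum(x * x for g in groups for x in g)
    # T1^2/n1 + T2^2/n2 + ... for the group totals
    group_term = sum(sum(g) ** 2 / len(g) for g in groups)

    ssb = group_term - grand_total ** 2 / n_total
    ssw = sum_of_squares - group_term
    df_b, df_w = k - 1, n_total - k
    msb, msw = ssb / df_b, ssw / df_w
    return {"SSB": ssb, "SSW": ssw, "SST": ssb + ssw,
            "dfB": df_b, "dfW": df_w,
            "MSB": msb, "MSW": msw, "F": msb / msw}


scores = {
    "Text Book A": [54, 49, 52, 55, 48],
    "Text Book B": [53, 56, 57, 51, 59],
    "Text Book C": [49, 53, 47, 50, 54],
}
result = one_way_anova(list(scores.values()))

F_CRITICAL = 3.89  # F(0.05, 2, 12), taken from step 6 of the solution
decision = "reject H0" if result["F"] > F_CRITICAL else "fail to reject H0"

print(f"Between  SS={result['SSB']:.4f}  df={result['dfB']}  MS={result['MSB']:.4f}  F={result['F']:.2f}")
print(f"Within   SS={result['SSW']:.4f}  df={result['dfW']}  MS={result['MSW']:.4f}")
print(f"Total    SS={result['SST']:.4f}  df={result['dfB'] + result['dfW']}")
print("Decision:", decision)
```

Run on the textbook scores, the routine gives SSB = 58.5333, SSw = 111.2000 and F = 3.16, so the decision is to fail to reject H0, in agreement with the hand calculation.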

4.6 Using the Excel Spreadsheet Program to Calculate One-Way Analysis of Variance

The Excel spreadsheet program has a tool to calculate One-Way Analysis of Variance, which simplifies our computational task considerably. Let's use the same research problem we already considered, but use the spreadsheet program to do the calculations.

Research Problem: Three groups of students, 5 in each group, were receiving therapy for severe test anxiety. Group 1 received 5 hours of therapy, group 2 - 10 hours and group 3 - 15 hours. At the end of therapy each subject completed an evaluation of test anxiety (the dependent variable in the study). Did the amount of therapy have an effect on the level of test anxiety?


In this problem we are comparing the differences among the means representing three levels of the independent variable (hours of therapy). This would be an appropriate situation for one-way analysis of variance. The three groups of students received the following scores on the Test Anxiety Index (TAI) at the end of treatment.

TAI Scores for Three Groups of Students

Group 1 - 5 hours    Group 2 - 10 hours    Group 3 - 15 hours
48                   55                    51
50                   52                    52
53                   53                    50
52                   55                    53
50                   53                    50

The first step in solving this problem is to enter the TAI scores for the three groups of subjects into an Excel Worksheet. After we have done this our worksheet should look as follows:


In the Excel Worksheet select Data Analysis under the Tools menu. If Data Analysis is not available, you must install the Data Analysis Tools. You can install them as follows:

1. Select Add-Ins from the Tools menu.
2. In the Add-Ins window click on the box next to Analysis ToolPak to select it.
3. Click OK. You have now installed the ToolPak.

With the Data Analysis Tools installed, select Data Analysis under the Tools menu. In the Data Analysis window scroll down and select Anova: Single Factor. Complete the Anova: Single Factor window as follows:

1. Enter $A$2:$C$7 in the Input Range: box (or you can enter that value automatically by clicking in the box and then selecting the range of cells A2 through C7). Note that we have included the labels, Group 1, Group 2, and Group 3, in the range of cells we selected.
2. Click the Columns button to indicate that our data is grouped by columns.
3. Click the Labels in first row box to indicate that we are using labels (Group 1, Group 2, and Group 3).
4. Enter .05 in the Alpha: box.
5. Under Output Options click the button for Output range: and enter $A$9 in the Output range: box (or click in the box and then click on the cell A9 to cause it to appear in the box).
6. Click OK.


Your spreadsheet should now appear as follows:

The results of the one-way analysis of variance can be seen in the resultant tables. The means for the three groups (as well as the count, sum, and variance for each group) can be seen in the SUMMARY table. The ANOVA table shows the same results as we put in the Analysis of Variance table when we calculated the results ourselves. The value of F is shown to be 5.178082192, which rounds to 5.18 and agrees with the value we obtained when we calculated F by hand. The P-value is shown as .02391684, which is below .05, so the result is significant at the .05 level. Since we set our alpha level at .05, we will simply indicate that p < .05. There is an additional entry in the table showing the critical value of F at the .05 level (F crit), which is 3.88529031; this agrees with the value (3.88) we looked up in Appendix Table D in the textbook. Unfortunately, the spreadsheet program does not have a tool to calculate the Scheffe test, so we will have to calculate those values the way we did before. The results of our Scheffe tests were:


Summary of Scheffe Test Results

Group One versus Group Two      4.62
Group One versus Group Three    0.18
Group Two versus Group Three    2.96

We now have all the information we need to complete the six step statistical inference process:

1. State the null hypothesis and the alternative hypothesis based on your research question.

   Note: Our null hypothesis, for the F test, states that there are no differences among the three means. The alternate hypothesis states that there are significant differences among some or all of the individual means. An unequivocal way of stating this is not H0.

2. Set the alpha level.

   Note: As usual we will set our alpha level at .05, so we have 5 chances in 100 of making a type I error.

3. Calculate the value of the appropriate statistic. Also indicate the degrees of freedom for the statistical test if necessary and the results of any post hoc test, if they were conducted.

   F(2,12) = 5.178, value of the F ratio
   F.05(2,12) = 3.88, critical value of F
   F12 = 4.630, Scheffe test value for comparing means 1 and 2
   F13 = 0.185, Scheffe test value for comparing means 1 and 3
   F23 = 2.963, Scheffe test value for comparing means 2 and 3

4. Write the decision rule for rejecting the null hypothesis.

   Reject H0 if F is >= 3.88

   Note: To write the decision rule we had to know the critical value for F, with an alpha level of .05, 2 degrees of freedom in the numerator (df between groups) and 12 degrees of freedom in the denominator (df within groups). We can do this by looking at Appendix Table D and noting the tabled value for the .05 level in the column for 2 df and the row for 12 df.

5. Write a summary statement based on the decision.

   Reject H0, p < .05

   Note: Since our calculated value of F (5.178) is greater than 3.88, we reject the null hypothesis and accept the alternative hypothesis.

6. Write a statement of results in standard English.

   There is a significant difference among the scores the three groups of students received on the Test Anxiety Index. Group 1 (the five hour therapy group) has a significantly lower score on the TAI than does Group 2 (the ten hour therapy group).

We can see that the Excel spreadsheet program gives us an easy way to calculate the F ratio. It also provides us with an analysis of variance table which shows, among other things, the critical value of F for the alpha level we specified, and the probability level (p) of the result.
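The Scheffe values quoted above can be checked with a short script. The formula for the Scheffe statistic is not shown in this section, so the sketch assumes the common textbook form F = (mean_i - mean_j)^2 / (MSW (1/n_i + 1/n_j)(K - 1)), with each result compared against the same critical value used for the overall F test. The variable names are illustrative.

```python
from itertools import combinations

# Group means and within-group mean square from the test anxiety example.
means = {1: 50.6, 2: 53.6, 3: 51.2}
n_per_group = 5
k = len(means)
msw = 29.2 / 12        # SSW / dfW
f_critical = 3.88      # F.05(2,12) from Appendix Table D

scheffe = {}
for i, j in combinations(sorted(means), 2):
    # Assumed form of the Scheffe statistic (see the paragraph above the code).
    denominator = msw * (1 / n_per_group + 1 / n_per_group) * (k - 1)
    scheffe[(i, j)] = (means[i] - means[j]) ** 2 / denominator

for (i, j), value in scheffe.items():
    verdict = "significant" if value >= f_critical else "not significant"
    print(f"Group {i} versus Group {j}: {value:.2f} ({verdict})")
```

Under that assumption the script gives 4.62, 0.18 and 2.96, matching the summary table, and only the Group 1 versus Group 2 comparison exceeds 3.88. The slightly different values listed in step 3 (4.630, 0.185, 2.963) are what this formula yields when MSW is first rounded to 2.43.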


Question:

(1) State the assumptions of ANOVA.
(2) Describe the rationale of ANOVA, stating the ANOVA table.
(3) Solve the following problem using One-Way ANOVA.

Four types of advertising displays were set up in 12 retail outlets, with three outlets randomly assigned to each of the displays, for the purpose of studying the point-of-sale impact of the displays. The relevant information is given in the following table.

Type of Display    A1    A2    A3    A4
Sales              44    54    38    61
                   40    53    48    48
                   43    59    46    47

Carry out the Analysis of Variance to test the differences among the mean sales values for the four types of displays, using the 5 percent level of significance.

(i) State the Null Hypothesis and Alternative Hypothesis. Give a step-by-step solution using all the required formulas. Give the ANOVA table and comment on the conclusion.
(ii) Use Excel to solve the above problem using the data given in the above table and comment on the conclusion.
