
Chapter 11 Study Guide

Inference for Distributions of Categorical Data


This condensed note packet is primarily based on the AP Statistics textbook. Please read through the book and fill this out as needed. It is no substitute for proper reading, but hopefully this study guide will assist you in your understanding of the material. Do what works for you in terms of homework (I would advise odd-numbered problems + review) and/or quizzes. If you have questions, please feel free to contact me, Enze Chen, through Facebook, email (echen84), or in person. Good luck!

Introduction
Thus far, we have learned how to conduct a hypothesis test for a population mean and population proportion (Ch. 9), and we have also compared the proportion of successes for two populations (Ch. 10). But what if we wanted to look at the distribution for a categorical variable in a population? Does there exist a statistically significant difference between observed and expected counts? The chi-square test (read "kai"; Greek symbol χ²) allows us to determine whether a hypothesized distribution is valid, the details of which will be fleshed out in this chapter.

Section 11.1 - Chi-Square Goodness-of-Fit Tests


The first of three tests we will learn is the chi-square goodness-of-fit test, which allows us to test the distribution for a single categorical variable. We demonstrate this using the canonical M&M example.

Example 1: Mars, Inc. says that the distribution of M&Ms is as follows: 24% Blue, 20% Orange, 16% Green, 14% Yellow, 13% Red, and 13% Brown. Suppose that the following _____-_____ table gives the data from Enze's bag of M&Ms.

Color:   Blue   Orange   Green   Yellow   Red   Brown   Total
Count:   9      8        12      15       10    6       60

Does this differ from the stated distribution? Let's look at the proportion of Blue M&Ms, which happens to be 9/60 = 0.15, while the given is 0.24. Using what we have learned before, we could perform a one-sample z test for a proportion (Ch. 9) to test the hypotheses

H0: ___________
Ha: ___________

This is bad. Not only is it inefficient because we would have to test each color, but this method also leads to multiple (possibly contradicting![1]) comparisons and examines each color as a separate subgroup rather than treating the random sample of all 60 candies together. Therefore, we turn to the χ² goodness-of-fit test, analyzing the distribution of color collectively.

Parameter: We wish to analyze the color distribution of M&Ms. You can just mention the distribution of interest. No need to search for a specific p or μ.

[1] See Appendix I in Study Guide for more information


Hypothesis and Test Statistic
We begin by stating our hypotheses for the categorical variable, color. This can be done in two ways.

Using words:
H0: ________________________________________________________________
________________________________________________________________
Ha: ________________________________________________________________
________________________________________________________________

Using symbols:
H0: ________________________________________________________________
________________________________________________________________
Ha: ________________________________________________________________
________________________________________________________________

Either way is fine and sufficient. However, in the alternative hypothesis, you cannot say that _______ the hypothesized proportions are incorrect; at least one has to differ.

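Now, while we listed the hypotheses in terms of proportions, we are actually working with counts (9 Blue; NOT 0.15 Blue) when running the χ² test. We wish to compare the ____________ counts from our sample with the ____________ counts if H0 is true. The greater the difference between these values, the lower the probability is of randomly selecting a sample as extreme as ours, and the greater evidence we have of _____________ H0. One way of analyzing the data is to construct a data table as shown:

Color (Categorical Variable) | Observed | Expected | (Observed − Expected)² | (Observed − Expected)² / Expected
Blue                         | ________ | ________ | ______________________ | ________________________________
Orange                       | ________ | ________ | ______________________ | ________________________________
Green                        | ________ | ________ | ______________________ | ________________________________
Yellow                       | ________ | ________ | ______________________ | ________________________________
Red                          | ________ | ________ | ______________________ | ________________________________
Brown                        | ________ | ________ | ______________________ | ________________________________

To find the expected counts, multiply each of the hypothesized proportions by the total number of candies in the sample, so for Blue, it would be E_Blue = (0.24)(60) = 14.40. DON'T use just proportions!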
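As a quick check on the arithmetic, here is a minimal Python sketch of the expected-count step for the M&M bag. The variable names (`claimed`, `expected`) are my own choices for illustration, not anything from the textbook.

```python
# Hypothesized color proportions claimed by Mars, Inc. (from Example 1).
claimed = {"Blue": 0.24, "Orange": 0.20, "Green": 0.16,
           "Yellow": 0.14, "Red": 0.13, "Brown": 0.13}
n = 60  # total number of candies in the sample bag

# Expected count = hypothesized proportion x sample size.
expected = {color: p * n for color, p in claimed.items()}

for color, count in expected.items():
    print(f"{color:<7}{count:6.2f}")

# The expected counts should add back up to the sample size.
print(f"{'Total':<7}{sum(expected.values()):6.2f}")
```

Notice that the expected counts do not have to be whole numbers, even though the observed counts always are.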

Finally, we want to add up all the values in the last column to find the chi-square statistic[2].

General Formula: χ² = Σ (Observed − Expected)² / Expected, where the sum is taken over all categories.
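If you want to check your table by machine, the sketch below builds each row (observed, expected, component) and adds up the last column. It is only an illustration with names I made up; the data are the counts and claimed proportions from Example 1.

```python
# Observed counts from Enze's bag and the claimed proportions (Example 1).
observed = {"Blue": 9, "Orange": 8, "Green": 12,
            "Yellow": 15, "Red": 10, "Brown": 6}
claimed = {"Blue": 0.24, "Orange": 0.20, "Green": 0.16,
           "Yellow": 0.14, "Red": 0.13, "Brown": 0.13}
n = sum(observed.values())

chi_square = 0.0
print(f"{'Color':<8}{'Obs':>5}{'Exp':>8}{'(O-E)^2/E':>12}")
for color, o in observed.items():
    e = claimed[color] * n           # expected count under H0
    component = (o - e) ** 2 / e     # this category's contribution
    chi_square += component
    print(f"{color:<8}{o:>5}{e:>8.2f}{component:>12.4f}")

# Rounds to the 10.180 quoted in the notes.
print(f"chi-square = {chi_square:.3f}")
```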

In the M&M example, this value comes out to be ____________.

[2] See Appendix II for more information.

χ² Distribution and P-values
Now that we have found χ² = 10.180, we wonder if this value is significant. We can locate this value on a chi-square distribution[3] and find the corresponding __________, similar to the procedure we would use for a z-test or t-test. However, it is important to note that the χ² distribution is NOT ___________. In fact, it is ________-___________, with degrees of freedom = __________ ___ _____________ − 1.

We can find the P-value in two ways, using Table C or the calculator (recommended). The P-value is the probability of getting a value of χ² as large as or larger than the test statistic, in this case 10.180, when H0 is true.

Using Table C, we look in the row with df = ______. Our χ² value of 10.180 lies between 9.24 and 11.07, corresponding to a P-value between _______ and _______ (found in corresponding top row). Usually, Table C can only give us an interval in which P falls.

Using the calculator (pg 683), we use the χ²cdf command in the DISTR menu, asking for the area underneath the χ² distribution with df = ______ greater than χ² = 10.180. Choosing an arbitrarily large end value (e.g., 1000), we input χ²cdf(10.180, 1000, 5) = 0.070293, so P = _______. This method is more precise.
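For anyone without a calculator handy, the upper-tail area can also be sketched in plain Python. The function below is my own helper (the name `chi_square_upper_tail` is made up) and it is not how the calculator does it internally; it relies on standard closed-form sums that hold when df is a whole number.

```python
import math

def chi_square_upper_tail(x, df):
    """Area under the chi-square density to the right of x (integer df)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^k / k! for k = 0 .. df/2 - 1.
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: a normal-tail piece plus a finite sum in powers of sqrt(x).
    root = math.sqrt(x)
    normal_tail = math.erfc(root / math.sqrt(2))        # 2 * P(Z > sqrt(x))
    normal_pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term               # adds root^(2r-1) / (1*3*...*(2r-1))
        term *= x / (2 * r + 1)
    return normal_tail + 2 * normal_pdf * total

p_value = chi_square_upper_tail(10.180, 5)
print(f"P-value = {p_value:.4f}")
```

Because this computes the entire right tail, there is no need to pick an arbitrarily large end value such as 1000.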

In either situation, both of which are valid, we find that our P-value is greater than ___ = _______, and so we _________ ____ ___________ H0. We _________ have sufficient evidence to conclude that the company's claimed color distribution is incorrect.

Assumptions
In order to carry out a χ² test, we need to check the Random, Large Sample Size, and Independent conditions.

Random: The data come from a __________ sample or _____________ experiment.

Large Sample Size: All ____________ counts are at least ______. This is different from the previous tests! Also, these are __________ counts, not ___________.

Independent: Individual observations are ______________. When sampling, check that ___________.

We have now learned all the steps for performing a χ² goodness-of-fit test. We shall demonstrate in the following two examples, taken from the textbook for guidance. Like other significance tests, we will refer to the _______________ acronym to cover all necessary steps.

[3] See Appendix III for more information.


Example 2: Birthdays (from pg. 686)
Are births evenly distributed across the days of the week? The one-way table below shows the distribution of births across the days of the week in a random sample of 140 births from local records in a large city:

Day:      Sunday   Monday   Tuesday   Wednesday   Thursday   Friday   Saturday
Births:   13       23       24        20          27         18       15

Do these data give significant evidence that local births are not equally likely on all days of the week?

Parameter:

Hypotheses: H0:

Ha:

Assumptions: Random Large Sample Size Independence

Name the Test:

Test Statistic: χ² =


             Sunday   Monday   Tuesday   Wednesday   Thursday   Friday   Saturday
Observed     13       23       24        20          27         18       15
Expected     ____     ____     ____      ____        ____       ____     ____
(O − E)²/E   ____     ____     ____      ____        ____       ____     ____

Degrees of Freedom =

Obtain a P-value:

Make a decision:

Statement in Context:

Example 2: Birthdays (continued) If fewer babies are actually born on Saturday and Sunday than on other days, what type of error did we make based on the conclusion drawn?

Using Technology (Refer to pg. 687)
As always, there is the option of calculating the test statistic on your calculator to save time (thank goodness). Enter the observed counts into L1/list1 and the expected counts into L2/list2. Find the appropriate test function, "χ² GOF-Test," and calculate. Be sure to still write down the test statistic, P-value, and degrees of freedom. "χ² GOF-Test" is not on TI-83 models. Look on Mrs. Carson's website for a program.


Example 3: Genetics
AP Biology aficionados should be familiar with chi-square goodness-of-fit tests in the context of genetics and Punnett squares (the χ² test was a recent addition to the AP Biology Exam). If the ratio GG:Gg:gg is predicted to be 1:2:1 and we observe 23:50:11 out of a total of 84 samples, do these data differ significantly from the predicted values at α = 0.05? (Condensed for space; use PHANTOMS; refer to pg. 689)

Follow-up Analysis
If results are ever significant, perform a follow-up analysis to see which individual components, the (O − E)²/E for each category, affected χ² the most. In the genetics example, it would be the gg group. On the calculator, the components are stored in a list called CNTRB ("contribution").
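To see the follow-up analysis in action, here is a small Python sketch for the genetics example. It plays the role of the CNTRB list: one component per genotype, plus a lookup of the largest one. The names are mine, chosen for illustration.

```python
# Observed genotype counts and the predicted 1:2:1 ratio (Example 3).
observed = {"GG": 23, "Gg": 50, "gg": 11}
ratio = {"GG": 1, "Gg": 2, "gg": 1}

n = sum(observed.values())
ratio_total = sum(ratio.values())

# One (O - E)^2 / E component per genotype, like the calculator's CNTRB list.
components = {}
for genotype, o in observed.items():
    e = n * ratio[genotype] / ratio_total    # expected count under H0
    components[genotype] = (o - e) ** 2 / e

chi_square = sum(components.values())
largest = max(components, key=components.get)

for genotype, c in components.items():
    print(f"{genotype}: {c:.3f}")
print(f"chi-square = {chi_square:.3f}; largest contribution from {largest}")
```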


Section 11.2 - Inference for Relationships


Now we wish to compare the proportions of successes in more than two populations or for more than two treatments. We will learn two new tests to achieve this goal, one to see if the distribution of a categorical variable is the same for several populations, and another to examine if there is an association between two variables. Both methods rely on constructing a ______ - ______ table.

Example 1: We will follow the given example in the book, "Does Background Music Influence What Customers Buy?" Below is the data table.

Music - Observed
Wine      None   French   Italian   Total
French    30     39       30        99
Italian   11     1        19        31
Other     43     35       35        113
Total     84     75       84        243

Music - Expected
Wine      None   French   Italian   Total
French    ____   ____     ____      99
Italian   ____   ____     ____      31
Other     ____   ____     ____      113
Total     84     75       84        243

To analyze the data for similarities and differences, we compute the _______________ distributions of the type of wine sold for each treatment [math omitted]. We see that the proportion of French wine sold is considerably higher when French music is playing, while the proportion of Italian wine sold is considerably lower. Previously, we learned how to perform a two-sample z test for a difference in two proportions, but here we are comparing more than two groups and do not want multiple comparisons;[1] instead, we can perform a chi-square test for homogeneity. Although this is a different test, it has many parallels to the χ² goodness-of-fit test we learned previously.

Parameter: _________________________________________________________________
_________________________________________________________________

Hypothesis and χ²:
The hypotheses are stated as follows (statement of no difference):
H0: __________________________________________________________________
__________________________________________________________________
Ha: __________________________________________________________________
__________________________________________________________________

Again, we are looking for an overall difference/deviation, so any significant difference, regardless of being _____-sided or _____-sided, will lead us to ___________ H0.

Fortunately for χ² tests, the χ² statistic is calculated the same way each time. Since we are given our observed counts, we need to find the _____________ counts; here, the two-way table comes in handy.

The formula to find expected counts is: (________________)(__________________) / (________________)

For example, for French wine bought when no music is playing, we multiply the "categorical totals" and divide by the overall total: (99)(84)/243 = 34.22. Fill out the table above. In our computations for χ², we can just write out the first few terms and the last term, using an ellipsis (...) to fill in the middle; the calculator will take care of the rest. The degrees of freedom is the product (____________ ___ ________ - 1)(_____________ ___ __________ - 1). In this problem, our χ² = ____________ and df = ______.
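Once you have filled out the expected table by hand, a short Python sketch like the one below can confirm it. It starts from the observed wine counts, rebuilds the margins, and then applies the same multiply-and-divide step used for the 34.22 above. The layout (rows are wine types, columns are music types) and all names are my own assumptions for illustration.

```python
# Observed counts: rows are wine types, columns are music types.
wines = ["French", "Italian", "Other"]
music = ["None", "French", "Italian"]
observed = [
    [30, 39, 30],   # French wine
    [11, 1, 19],    # Italian wine
    [43, 35, 35],   # Other wine
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Same step as (99)(84)/243 in the text, repeated for every cell.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Degrees of freedom from the dimensions of the table (totals excluded).
df = (len(wines) - 1) * (len(music) - 1)

print(f"{'Wine':<9}" + "".join(f"{m:>9}" for m in music))
for wine, row in zip(wines, expected):
    print(f"{wine:<9}" + "".join(f"{e:>9.2f}" for e in row))
print("df =", df)
```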

Assumptions
Our assumptions here are the same.

Random: The data come from separate ___________ ___________.

Large Sample Size: All expected counts are at least ______.

Independent: Samples and individual observations are _________________. When sampling, check that _______ < ____.

P-value and Conclusion
We can use either Table C or technology to find the associated P-value given χ² and df. The calculator gives a more precise answer (χ²cdf) and is used here to obtain a P-value of _________.

Since our P-value of _________ is ____________ than α = 0.05, we __________ have sufficient evidence to reject H0 and conclude there ______ a difference in the distributions of wine purchases at this store when no music, French music, or Italian music is played. Remember, we cannot state for Ha that "all the proportions are different"; we can only say "some of the proportions are not equal."

Follow-up Analysis
If we reject the null hypothesis, we should perform a follow-up analysis to see which of the individual components contributed most to the χ² statistic. In the above example, two of the cells accounted for a large proportion of the overall statistic, so we are led to believe that the sale of _________ wine is strongly affected by Italian and French music.
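Here is a rough Python sketch of that follow-up step for the wine data. It computes one component per cell and sorts them from largest to smallest so the big contributors stand out. The dictionary keyed by (wine, music) pairs is just my own bookkeeping choice.

```python
# Observed counts: rows are wine types, columns are music types.
wines = ["French", "Italian", "Other"]
music = ["None", "French", "Italian"]
observed = [
    [30, 39, 30],
    [11, 1, 19],
    [43, 35, 35],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# One (O - E)^2 / E component per cell, keyed by (wine, music).
components = {}
for i, wine in enumerate(wines):
    for j, tune in enumerate(music):
        e = row_totals[i] * col_totals[j] / grand_total
        components[(wine, tune)] = (observed[i][j] - e) ** 2 / e

chi_square = sum(components.values())
ranked = sorted(components, key=components.get, reverse=True)

print(f"chi-square = {chi_square:.2f}")
for wine, tune in ranked:
    share = components[(wine, tune)] / chi_square
    print(f"{wine} wine, {tune} music: {components[(wine, tune)]:.3f} ({share:.0%})")
```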

χ² test for Homogeneity on the Calculator (Refer to pg. 705-706)
We can perform the χ² test on the calculator (χ² 2-way test), which greatly simplifies calculations. Make sure to write down relevant assumptions, test statistic, degrees of freedom, P-value, components, and conclusion in context.

Two more good examples are found on pages 707 and 710, and there are many more in the homework problems. In accordance with the other hypothesis tests, use PHANTOMS to help guide your thought process through the solution. I will omit them here for sake of space.

χ² test for Association
The final test you will learn is called a chi-square test for association (also known as a chi-square test for independence; I will stick with the former for consistency/preference). There is a subtle difference between this test and the χ² test for homogeneity (in the Hypotheses and Conclusion), but for the most part the two tests are performed in a similar fashion. In particular, what we did in the first part (in the music and phone call examples) was take data from several independent samples/groups (different types of music playing and cell vs. landline) and compare their distributions. However, what we are about to do next is take a single random sample of individuals chosen from a single population and analyze the relationship between two categorical variables.

Let's look at an example. The one given in the book is "Do Angry People Have More Heart Disease?" We wish to compare CHD vs. Anger. However, rather than sampling different groups of individuals based on their level of anger and seeing whether the distribution of CHD is the same (homogeneous) across groups, we sampled 8474 people collectively and compared the two variables, sorting the people into a two-way table. We wish to see whether the sample gives evidence of an association between the variables in the entire population. Ask yourself how the data were collected to decide which of the two tests to use. Do not forget the statistics mantra, "correlation does NOT imply causation." Having an association means that knowing the value of one variable changes the probabilities for the other variable, not necessarily that one causes the other (in this manner, it is helpful to think of "independent" vs. "dependent").

Hypotheses: Like before, the null hypothesis is a statement of no difference (no association; independent).
H0: _____________________________________________________________
_____________________________________________________________
Ha: _____________________________________________________________
_____________________________________________________________

Assumptions
Random: ___________________________________________________________________
Large Sample Size: __________________________________________________________
Independent: _________________________________________________________________

χ² statistic and P-value
These, along with expected counts, are found in the same way as in the test for homogeneity. χ² = _____________, df = ____________, P-value = _____________.

Conclusion: Because our P-value is ________ than α = 0.05, we ________ have sufficient evidence to __________ H0 and conclude that anger level and heart disease _________ associated in the population of people with normal blood pressure.

If we had sufficient evidence to reject the null hypothesis in the previous problem, don't forget to conduct a follow-up analysis. The calculator option is the same as for homogeneity (χ² 2-way test).

Once again, if you haven't noticed, these last two χ² tests are very similar, which makes them easy to perform but makes it difficult to tell which one applies. In my opinion, the best way is to look at the method of sampling and decide from there. On FRQs, be precise with your diction during the Hypothesis and Conclusion steps.

Interesting Tidbits (pg. 720-721)
Sometimes we are given some quantitative data (e.g., income) and wish to conduct a χ² test. To do so, we can group the individual incomes into income ranges, effectively turning income into a categorical variable.
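To make the grouping idea concrete, here is a tiny Python sketch. Everything in it is hypothetical: the incomes are made-up numbers and the cutoffs are arbitrary choices of mine, not values from the textbook.

```python
from collections import Counter

# Hypothetical annual incomes in dollars (invented for illustration only).
incomes = [18500, 42000, 27500, 61000, 95000,
           33000, 48000, 120000, 15000, 73000]

def income_range(income):
    """Assign an income to exactly one range (cutoffs are arbitrary)."""
    if income < 25000:
        return "under $25,000"
    if income < 50000:
        return "$25,000 to $49,999"
    if income < 100000:
        return "$50,000 to $99,999"
    return "$100,000 and over"

# The counts per range are what would go into the one-way or two-way table.
counts = Counter(income_range(x) for x in incomes)

for label, count in counts.items():
    print(f"{label:<20}{count}")
```

The ranges must not overlap, and every individual has to land in exactly one of them, so that the counts add up to the sample size.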

Sometimes, depending on how we sample, the large sample size condition may not always be met. In that case, we might be able to combine some rows together, and meet the sample size condition without screwing up the results. Neat.


Appendix I - Multiple Comparisons


The hypothesis tests we've learned are powerful tools [and will be liberally featured on the AP exam], but they must be used wisely [for with great power comes great responsibility]. We use chi-square tests for categorical variables with several categories because we wish to see if the distribution for the sample as a whole differs from what is expected, rather than any particular subgroup. Furthermore, let's assume [using z/t-tests] that when comparing one category (say, blue M&Ms) we find a significant difference, but for another (say, red M&Ms) we do not. These two computations would lead to different results/conclusions, and we are left unsure about whether or not we can truly reject the null hypothesis.

Appendix II - The Chi-square Statistic χ²


Essentially, χ² is a measure of how far away observed counts are from their respective expected counts. The difference is squared in the numerator to maintain positive values, and dividing by expected counts allows categories with the larger relative difference to contribute more heavily to the total. For example: Let's say your expected counts were 10 and 100, and you ended up getting 11 and 101. The difference between these values is the same, namely, 1. However, the first pair had a 10% increase while the second had only a 1% increase. Clearly we would want the first pair to be represented more (weighted heavily). Therefore, when we take (difference)²/expected, we end up with 1/10 = 0.10 and 1/100 = 0.01 respectively, and the first pair contributes more to the overall χ² statistic, 0.10 + 0.01 = 0.11.
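The same toy numbers can be run through a few lines of Python, just to watch the weighting happen. The list of (observed, expected) pairs is simply the example above written out.

```python
# (observed, expected) pairs from the example above.
pairs = [(11, 10), (101, 100)]

# Same absolute difference of 1, but very different relative differences.
components = [(o - e) ** 2 / e for o, e in pairs]
total = sum(components)

for (o, e), c in zip(pairs, components):
    print(f"observed {o}, expected {e}: component = {c:.2f}")
print(f"chi-square = {total:.2f}")
```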

Appendix III - The Chi-square Distributions


If we compare many χ² distributions at varying degrees of freedom, we find that as degrees of freedom (df) increase, the density curves become less skewed (tending more towards normal). In addition:
The mean of a particular χ² distribution is equal to its degrees of freedom.
The peak of the χ² density curve is at df − 2 (for df > 2).
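If you would like to see these two facts numerically, the sketch below evaluates the χ² density on a fine grid and estimates the peak location and the mean. This is only a rough numerical check with helper names I made up, assuming df > 2 and a grid that stops at 100 (far enough out for the df values tried here).

```python
import math

def chi_square_density(x, df):
    """Height of the chi-square density curve at x."""
    if x <= 0:
        return 0.0
    half = df / 2
    log_height = ((half - 1) * math.log(x) - x / 2
                  - half * math.log(2) - math.lgamma(half))
    return math.exp(log_height)

def peak_and_mean(df, upper=100.0, steps=100_000):
    """Estimate the peak location and the mean with a midpoint grid."""
    width = upper / steps
    xs = [(i + 0.5) * width for i in range(steps)]
    heights = [chi_square_density(x, df) for x in xs]
    peak = xs[max(range(steps), key=heights.__getitem__)]
    mean = sum(x * h for x, h in zip(xs, heights)) * width
    return peak, mean

for df in (4, 8, 16):
    peak, mean = peak_and_mean(df)
    print(f"df = {df:>2}: peak near {peak:.2f}, mean near {mean:.2f}")
```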
