
TEST FOR INDEPENDENCE

By: Ondoy, Rica Jan; Procianos, Enoch (Reporter); Sampaco, Nisa

Introduction
In the test for independence, the claim is that the row and column variables are independent of each other. This is the null hypothesis. The multiplication rule says that if two events are independent, the probability of both occurring is the product of the probabilities of each occurring. This is the key to working the test for independence. If you end up rejecting the null hypothesis, you conclude that the data do not support the assumption of independence and that the row and column variables are dependent. Remember, all hypothesis testing is done under the assumption that the null hypothesis is true. The test statistic is the same as in the chi-square goodness-of-fit test, and the principle behind the two tests is the same. The test for independence is always a right-tailed test. In fact, you can think of the test for independence as a goodness-of-fit test where the data are arranged in table form. This table is called a contingency table.
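The link between the multiplication rule and the expected count of a cell can be written out in a few lines. The sketch below assumes notation not defined at this point in the text: n is the grand total, n_r and n_c are the totals of row r and column c, and E_{r,c} is the expected count of the cell in row r and column c.

```latex
\begin{align*}
P(A = r \text{ and } B = c) &= P(A = r)\,P(B = c)
    && \text{(multiplication rule, assuming } H_0\text{)} \\
&\approx \frac{n_r}{n} \cdot \frac{n_c}{n}
    && \text{(marginal probabilities estimated from the sample)} \\
E_{r,c} &= n \cdot \frac{n_r}{n} \cdot \frac{n_c}{n} = \frac{n_r \, n_c}{n}
    && \text{(expected count = sample size} \times \text{probability)}
\end{align*}
```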

The test statistic has a chi-square distribution when the following assumptions are met:
- The data are obtained from a random sample.
- The expected frequency of each category is at least 5.

The following are properties of the test for independence:
- The data are the observed frequencies.
- The data are arranged into a contingency table.
- The degrees of freedom are the degrees of freedom for the row variable times the degrees of freedom for the column variable. It is not one less than the sample size.
- It is always a right-tailed test.
- It has a chi-square distribution.
- The expected value of each cell is computed by taking the row total times the column total and dividing by the grand total.
- The value of the test statistic does not change if the rows and columns are interchanged (the table is transposed).
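The last property can be checked numerically. The following is a minimal sketch, not a reference implementation: the function names and the 2 x 3 table of counts are hypothetical, chosen only for illustration. It computes the statistic for a table, for its transpose, and for the same table with its rows swapped.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square(table):
    """Sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def degrees_of_freedom(table):
    """(number of rows - 1) * (number of columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Hypothetical observed counts, for illustration only.
observed = [[20, 30, 50],
            [30, 30, 40]]
transposed = [list(col) for col in zip(*observed)]  # 3 x 2 version of the same data
reordered = [observed[1], observed[0]]              # same data, rows swapped

stat = chi_square(observed)
stat_transposed = chi_square(transposed)
stat_reordered = chi_square(reordered)

print(stat, stat_transposed, stat_reordered)
print(degrees_of_freedom(observed), degrees_of_freedom(transposed))
```

Each cell contributes the same term however the table is laid out, so the three statistics agree up to floating-point rounding, and the degrees of freedom are the same for the table and its transpose.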

This approach consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results.

State the Hypotheses
Suppose that Variable A has r levels, and Variable B has c levels. The null hypothesis states that knowing the level of Variable A does not help you predict the level of Variable B. That is, the variables are independent.

H0: Variable A and Variable B are independent.
Ha: Variable A and Variable B are not independent.

The alternative hypothesis is that knowing the level of Variable A can help you predict the level of Variable B.
Note: Support for the alternative hypothesis suggests that the variables are related; but the relationship is not necessarily causal, in the sense that one variable "causes" the other.

Formulate an Analysis Plan
The analysis plan describes how to use sample data to reject or fail to reject the null hypothesis. The plan should specify the following elements.
Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10, but any value between 0 and 1 can be used.

Test method. Use the chi-square test for independence to determine whether there is a significant relationship between two categorical variables.

Analyze Sample Data
Using sample data, find the degrees of freedom, expected frequencies, test statistic, and the P-value associated with the test statistic.

Degrees of freedom. The degrees of freedom (v) is equal to:

v = (r - 1) * (c - 1)
where r is the number of levels of Variable A (rows) and c is the number of levels of Variable B (columns).

Expected frequencies. The expected frequency counts are computed separately for each level of one categorical variable at each level of the other categorical variable. Compute r * c expected frequencies, according to the following formula.

E_{r,c} = (n_r * n_c) / n

where E_{r,c} is the expected frequency count for level r of Variable A and level c of Variable B, n_r is the total number of sample observations at level r of Variable A, n_c is the total number of sample observations at level c of Variable B, and n is the total sample size.
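The expected-frequency formula can be sketched in a few lines of Python. This is an illustrative sketch only: the function name and the 2 x 2 table are hypothetical. It also checks the assumption stated earlier that every expected frequency is at least 5.

```python
def expected_frequencies(observed):
    """Return E[r][c] = (row total r) * (column total c) / (grand total)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[n_r * n_c / n for n_c in col_totals] for n_r in row_totals]


# Hypothetical observed counts, for illustration only.
observed = [[30, 10],
            [20, 40]]

expected = expected_frequencies(observed)

# The chi-square approximation assumes every expected frequency is at least 5.
meets_assumption = all(e >= 5 for row in expected for e in row)

print(expected)
print(meets_assumption)
```

Here the row totals are 40 and 60, both column totals are 50, and n is 100, so the expected counts are 20 in each cell of the first row and 30 in each cell of the second.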

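Test statistic. The test statistic is a chi-square random variable (χ²) defined by the following equation.

χ² = Σ [ (O_{r,c} - E_{r,c})² / E_{r,c} ]

where O_{r,c} is the observed frequency count at level r of Variable A and level c of Variable B, E_{r,c} is the expected frequency count at those levels, and the sum runs over all r * c cells.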


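Interpret Results
If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting the null hypothesis when the P-value is less than the significance level. Equivalently, the null hypothesis is rejected when the test statistic exceeds the critical value of the chi-square distribution at that significance level, which is the approach taken in the sample problems.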
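The decision rule can be sketched without a statistics library, because the right-tail probability of a chi-square variable with integer degrees of freedom has a closed form. In the sketch below, the function names `chi2_sf` and `decide` and the statistic value 6.5 are hypothetical, introduced here for illustration; in practice a statistics package or a printed table would normally supply the P-value.

```python
import math


def chi2_sf(x, df):
    """Right-tail probability P(X >= x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i=0}^{df/2 - 1} y^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum_{i=1}^{(df-1)/2} y^(i - 1/2) / Gamma(i + 1/2)
    total = 0.0
    for i in range(1, (df - 1) // 2 + 1):
        total += y ** (i - 0.5) / math.gamma(i + 0.5)
    return math.erfc(math.sqrt(y)) + math.exp(-y) * total


def decide(statistic, df, alpha):
    """Return (p_value, reject), where reject is True when p_value < alpha."""
    p_value = chi2_sf(statistic, df)
    return p_value, p_value < alpha


# Hypothetical test statistic, for illustration only.
p_value, reject = decide(statistic=6.5, df=2, alpha=0.05)
print(p_value, reject)
```

As a sanity check, `chi2_sf(5.991, 2)` and `chi2_sf(11.345, 3)` come out very close to 0.05 and 0.01, which matches the critical values quoted in the sample problems.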

Sample Problems:
1.) We wish to determine whether the opinions of the voting residents of the state of Illinois concerning a new tax reform are independent of their levels of income. A random sample of 1000 voters is classified by income bracket (low, medium, or high) and by whether or not they favor a new tax reform. The observed frequencies are presented in the contingency table below.

                 Tax Reform
Income Level     For     Against
Low              182     154
Medium           213     138
High             203     110

Solution:
1. State the hypotheses:
H0: A voter's opinion concerning the new tax reform and his or her level of income are independent.
Ha: A voter's opinion concerning the new tax reform and his or her level of income are not independent.

2. Get the marginal frequencies or the sum of the frequencies in each row and column. Expected frequencies are shown in parentheses.

                 Tax Reform
Income Level     For            Against        Total
Low              182 (200.9)    154 (135.1)    336
Medium           213 (209.9)    138 (141.1)    351
High             203 (187.2)    110 (125.8)    313
Total            598            402            1000

3. Find the degrees of freedom (v): with 3 income levels (rows) and 2 opinions (columns), v = (3 - 1) * (2 - 1) = 2


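4. Find the expected frequencies (E_{r,c}):
E_{1,1} = (336)(598)/1000 = 200.9
E_{1,2} = (336)(402)/1000 = 135.1
E_{2,1} = (351)(598)/1000 = 209.9
E_{2,2} = (351)(402)/1000 = 141.1
E_{3,1} = (313)(598)/1000 = 187.2
E_{3,2} = (313)(402)/1000 = 125.8

5. Solve for the calculated chi-squared value (χ²) and find its probability:
χ² = (182 - 200.9)²/200.9 + (154 - 135.1)²/135.1 + (213 - 209.9)²/209.9 + (138 - 141.1)²/141.1 + (203 - 187.2)²/187.2 + (110 - 125.8)²/125.8 ≈ 7.85
For v = 2, the corresponding P-value is approximately 0.02.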

6. Find the critical value of χ² at the chosen significance level. In this case we choose a significance level (α) of 0.05. From Table A.5 we find that χ²_0.05 (v = 2) = 5.991.

7. Draw a conclusion based on the statistic, which is approximated by the chi-squared distribution. Since χ² > χ²_0.05 with v = 2 at the 0.05 level of significance, we reject the null hypothesis (H0). We conclude that a voter's opinion concerning the new tax reform and his or her level of income are not independent.
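The arithmetic of Problem 1 can be re-checked with a short script. This is a sketch using only the counts and the critical value given in the problem; the variable names are illustrative. It keeps the expected frequencies unrounded, so the statistic comes out near 7.88, while rounding them to one decimal place as in step 4 gives a value near 7.85. Either way it exceeds 5.991.

```python
import math

# Observed counts from the tax-reform table: rows are income levels
# (Low, Medium, High), columns are (For, Against).
observed = [[182, 154],
            [213, 138],
            [203, 110]]

row_totals = [sum(row) for row in observed]        # 336, 351, 313
col_totals = [sum(col) for col in zip(*observed)]  # 598, 402
n = sum(row_totals)                                # 1000

# Expected count per cell: row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 2 only, the right-tail probability has the closed form exp(-x / 2).
p_value = math.exp(-chi2 / 2)

critical_value = 5.991  # chi-square critical value for alpha = 0.05, v = 2 (from the text)
reject_h0 = chi2 > critical_value

print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value:.4f}, reject H0: {reject_h0}")
```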


2.) A quality control engineer wants to compare the production processes of machine parts manufactured by four different companies. The engineer randomly samples a total of 270 parts from the four companies and summarizes the results in the following table. Based upon the results of the sample, can the quality control engineer conclude that part quality and manufacturer are independent at a significance level of 1%?

                 Company A   Company B   Company C   Company D
Defective        8           8           10          9
Non-defective    60          65          60          50

Solution:
1. State the hypotheses:
H0: Machine part quality and manufacturer are independent.
Ha: Machine part quality and manufacturer are not independent.

2. Get the marginal frequencies or the sum of the frequencies in each row and column. Expected frequencies are shown in parentheses.

             Defective     Non-defective   Total
Company A    8 (8.81)      60 (59.19)      68
Company B    8 (9.46)      65 (63.54)      73
Company C    10 (9.07)     60 (60.93)      70
Company D    9 (7.65)      50 (51.35)      59
Total        35            235             270

3. Find the degrees of freedom (v): with 4 companies (rows) and 2 quality levels (columns), v = (4 - 1) * (2 - 1) = 3

4. Find the expected frequencies (E_{r,c}):
E_{1,1} = (68)(35)/270 = 8.81
E_{1,2} = (68)(235)/270 = 59.19
E_{2,1} = (73)(35)/270 = 9.46
E_{2,2} = (73)(235)/270 = 63.54
E_{3,1} = (70)(35)/270 = 9.07
E_{3,2} = (70)(235)/270 = 60.93
E_{4,1} = (59)(35)/270 = 7.65
E_{4,2} = (59)(235)/270 = 51.35

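5. Solve for the calculated chi-squared value (χ²) and find its probability:
χ² = (8 - 8.81)²/8.81 + (60 - 59.19)²/59.19 + (8 - 9.46)²/9.46 + (65 - 63.54)²/63.54 + (10 - 9.07)²/9.07 + (60 - 60.93)²/60.93 + (9 - 7.65)²/7.65 + (50 - 51.35)²/51.35 ≈ 0.73
For v = 3, the corresponding P-value is approximately 0.87.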


6. Find the critical value of χ² at the given significance level. In this case the given significance level (α) is 0.01. From Table A.5 we find that χ²_0.01 (v = 3) = 11.345.

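7. Draw a conclusion based on the statistic, which is approximated by the chi-squared distribution. Since χ² < χ²_0.01 with v = 3 at the 0.01 level of significance, we fail to reject the null hypothesis (H0). The sample does not provide sufficient evidence that part quality and manufacturer are dependent.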
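Problem 2 can be re-checked the same way. The sketch below uses only the counts and the critical value from the problem; the variable names are illustrative, and the P-value line uses a closed form that holds for 3 degrees of freedom only.

```python
import math

# Observed counts: rows are Companies A-D, columns are (Defective, Non-defective).
observed = [[8, 60],
            [8, 65],
            [10, 60],
            [9, 50]]

row_totals = [sum(row) for row in observed]        # 68, 73, 70, 59
col_totals = [sum(col) for col in zip(*observed)]  # 35, 235
n = sum(row_totals)                                # 270

# Expected count per cell: row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 3 only: P(X >= x) = erfc(sqrt(x/2)) + 2 * sqrt(x / (2*pi)) * exp(-x/2).
y = chi2 / 2
p_value = math.erfc(math.sqrt(y)) + 2 * math.sqrt(y / math.pi) * math.exp(-y)

critical_value = 11.345  # chi-square critical value for alpha = 0.01, v = 3 (from the text)
reject_h0 = chi2 > critical_value

print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value:.3f}, reject H0: {reject_h0}")
```

The statistic is far below the critical value, so the script reaches the same decision as step 7: the null hypothesis is not rejected.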
