CROSS TABULATION & THE χ² TEST FOR INDEPENDENCE

Cross tabulation: frequency count of the joint occurrence of two variables.

χ² test for independence: hypothesis test used to evaluate assertions about the relationship
between cross tabulated variables.
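
As a minimal sketch of what "frequency count of the joint occurrence" means, the snippet below tallies coded (PREFERENCE, INCOME) pairs into a table. The six records are hypothetical and exist only to illustrate the counting step; they are not the survey data.

```python
from collections import Counter

# Hypothetical coded responses, one (PREFERENCE, INCOME) pair per respondent.
# These records are made up purely to illustrate the counting step.
responses = [(1, 1), (1, 1), (2, 3), (3, 3), (3, 2), (1, 1)]

# Count how often each (PREFERENCE, INCOME) combination occurs.
joint_counts = Counter(responses)

# Lay the counts out as a table: rows = PREFERENCE codes, columns = INCOME codes.
codes = (1, 2, 3)
table = [[joint_counts[(pref, inc)] for inc in codes] for pref in codes]

for row in table:
    print(row)
```
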
Example Research Problem
Miller Brewing Company hires you to determine if people's preferences for light beer depend
upon their demographic characteristics. They intend to use your results to develop an
advertising plan that will target those who prefer light beer. Your research design includes
collecting data about beer preferences, sex, and income from a randomly selected group of
beer drinkers. One of the hypotheses you test is that preference for light beer depends upon
the consumer's income. To cross tabulate the variables and test for a relationship you code the
variables as:

PREFERENCE(1) = dislikes light beer             INCOME(1) = low
PREFERENCE(2) = indifferent about light beer    INCOME(2) = middle
PREFERENCE(3) = prefers light beer              INCOME(3) = high

Using SPSS Analyze>Descriptive Statistics>Crosstabs>Statistics>Chi-square, you generate
the following output:
NOTE: In SPSS, put the dependent variable in the rows.
PREFERENCE * INCOME Crosstabulation

                                               INCOME
PREFERENCE                             low   middle     high    Total
dislikes     Count                      22       13        3       38
             Expected Count           13.8     14.3      9.9     38.0
             % within PREFERENCE     57.9%    34.2%     7.9%   100.0%
             % within INCOME         40.7%    23.2%     7.7%    25.5%
indifferent  Count                      18       21       11       50
             Expected Count           18.1     18.8     13.1     50.0
             % within PREFERENCE     36.0%    42.0%    22.0%   100.0%
             % within INCOME         33.3%    37.5%    28.2%    33.6%
prefers      Count                      14       22       25       61
             Expected Count           22.1     22.9     16.0     61.0
             % within PREFERENCE     23.0%    36.1%    41.0%   100.0%
             % within INCOME         25.9%    39.3%    64.1%    40.9%
Total        Count                      54       56       39      149
             Expected Count           54.0     56.0     39.0    149.0
             % within PREFERENCE     36.2%    37.6%    26.2%   100.0%
             % within INCOME        100.0%   100.0%   100.0%   100.0%

Chi-Square Tests

                                 Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square              18.597(a)   4   .001
Likelihood Ratio                19.390      4   .001
Linear-by-Linear Association    17.698      1   .000
N of Valid Cases                   149

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 9.95.

Symmetric Measures

                              Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
Ordinal by Ordinal   Gamma     .458   .093                   4.592          .000
N of Valid Cases                149

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

HOW THE χ² TEST FOR INDEPENDENCE WORKS

1) The hypothesis test is based on the presumption that the variables are not related,
i.e., the value of the dependent variable is not being affected by the value of the
independent variable. When this is the case, you would expect the observations in
each independent variable category to be spread across the dependent variable
categories in approximately the same proportions.

2) The hypothesis test evaluates the difference between the values actually observed
(O) and the values that would be expected (E) if the variables are not related.

3) As part of the analysis, SPSS calculates the Es and compares them to the Os. The
comparison will find either:

a) Little difference between O and E. This means that what was actually observed is
what you would have expected to observe if the variables were not related. In this
case, the variables are independent (not related) because the value of the
dependent variable is not correlated with the value of the independent variable.

b) Considerable difference between O and E. This means that what was actually
observed is different from what you would have expected to observe if the variables
were not related. In this case, the variables are not independent (are related)
because the value of the dependent variable is correlated with the value of the
independent variable.
4) STATISTICAL ACCURACY REQUIRES:

for a 2x2 (2 rows and 2 columns) cross tabulation, all Es ≥ 5.

for a cross tabulation larger than 2x2, no more than 20% of Es < 5.

SPSS Analyze>Descriptive Statistics>Crosstabs>Statistics>Chi-square will
report the value of the minimum expected frequency and the percent of Es that are
less than 5. If the requirements on E are not met, combine rows and/or columns
by recoding your data. SPSS Analyze>Descriptive Statistics>Crosstabs>Cells
reports the expected frequencies for each cell, which can be used to decide which
rows and/or columns to combine.
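
The expected counts and the accuracy requirement described above can be sketched in a few lines. This is an illustration rather than what SPSS does internally. It assumes the usual independence formula E = (row total x column total) / N, which the handout does not state explicitly, and it uses the observed counts from the PREFERENCE * INCOME crosstabulation.

```python
# Observed counts (O) from the PREFERENCE * INCOME crosstabulation.
# Rows: dislikes, indifferent, prefers; columns: low, middle, high income.
observed = [
    [22, 13, 3],
    [18, 21, 11],
    [14, 22, 25],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Assumed formula: under independence, E = row total * column total / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Check the requirement on expected counts.
cells = [e for row in expected for e in row]
min_expected = min(cells)
share_below_5 = sum(e < 5 for e in cells) / len(cells)

is_2x2 = len(observed) == 2 and len(observed[0]) == 2
rule_met = (min_expected >= 5) if is_2x2 else (share_below_5 <= 0.20)

for row in expected:
    print([round(e, 1) for e in row])
print("minimum expected count:", round(min_expected, 2))
print("requirement met:", rule_met)
```

The rounded values agree with the Expected Count rows and the minimum expected count of 9.95 reported in the SPSS output.
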

STEPS FOR THE χ² TEST FOR INDEPENDENCE

STEP 1: Formulate the hypotheses

The null hypothesis states that no relationship exists between the two variables.
The alternative hypothesis states that a relationship exists between the two variables.

STEP 2: Conduct the test


Set the α level and reject H0 if the p-value is less than α.

STEP 3: State the results

TESTING FOR INDEPENDENCE BETWEEN INCOME AND BEER PREFERENCE

STEP 1: Formulate the hypotheses


H0: No relationship exists between income and beer preference.
Ha: A relationship exists between income and beer preference.


STEP 2: Conduct the test: Reject H0 because the obtained significance level [Asymp.
Sig. (2-sided)] is .001, indicating highly significant results.

STEP 3: State the results


The results indicate the presence of a relationship between income and preference for
light beer.
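
For readers who want to check the SPSS figures by hand, the sketch below recomputes the Pearson χ² statistic and its p-value for the income and beer preference table. It assumes the standard formulas χ² = Σ (O − E)² / E and df = (rows − 1)(columns − 1), which the handout does not spell out. The helper chi2_upper_tail_even_df is a hypothetical name; it uses a closed form that is valid only for even df. The α of .05 is also an assumption, since the example does not state the level used.

```python
import math

# Observed counts: rows = dislikes, indifferent, prefers; columns = low, middle, high.
observed = [
    [22, 13, 3],
    [18, 21, 11],
    [14, 22, 25],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square: sum of (O - E)^2 / E over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x); closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


p_value = chi2_upper_tail_even_df(chi_square, df)
alpha = 0.05  # assumed level; the handout's example does not state one

print(round(chi_square, 3), df, round(p_value, 3))
print("reject H0" if p_value < alpha else "do not reject H0")
```
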
INTERPRETING THE RESULTS
1) Percent of row totals and percent of column totals are used to identify the nature
of the relationship. Use SPSS Analyze>Descriptive Statistics>Crosstabs>Cells
to obtain the percent of row and percent of column tables.

2) Cramer's V is used to measure the strength of the relationship when the cross tab
involves a nominal scaled variable.
a) 0 ≤ V ≤ 1
b) The closer V is to 1 (0), the stronger (weaker) the relationship.

Use SPSS Analyze>Descriptive Statistics>Crosstabs>Statistics to obtain
Cramer's V.

3) Gamma (γ) is used to measure the strength and direction of the relationship
when both variables are at least ordinal scaled.
a) -1 ≤ γ ≤ 1
b) The closer γ is to 1 (-1), the stronger the positive (negative) relationship.

Use SPSS Analyze>Descriptive Statistics>Crosstabs>Statistics to obtain γ.
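
The handout gives no formulas for these two measures. As a rough sketch, assuming the standard definitions V = sqrt(χ² / (N x min(rows − 1, columns − 1))) and γ = (C − D) / (C + D), where C and D are the numbers of concordant and discordant pairs, both can be reproduced from figures in the SPSS output. The function names are hypothetical. The gamma example uses the income and beer preference counts; the Cramer's V example uses the χ² value and sample size reported in the handout's Best Buy output.

```python
import math


def cramers_v(chi_square, n, n_rows, n_cols):
    """Cramer's V from a Pearson chi-square value (assumed standard definition)."""
    return math.sqrt(chi_square / (n * min(n_rows - 1, n_cols - 1)))


def goodman_kruskal_gamma(table):
    """Gamma = (C - D) / (C + D) for a table whose rows and columns are ordered."""
    concordant = 0
    discordant = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):  # only rows below row i
                for m in range(n_cols):
                    if m > j:    # lower and to the right: concordant pairs
                        concordant += table[i][j] * table[k][m]
                    elif m < j:  # lower and to the left: discordant pairs
                        discordant += table[i][j] * table[k][m]
    return (concordant - discordant) / (concordant + discordant)


# Rows: dislikes, indifferent, prefers; columns: low, middle, high income.
beer = [
    [22, 13, 3],
    [18, 21, 11],
    [14, 22, 25],
]
gamma_beer = goodman_kruskal_gamma(beer)

# Best Buy output: Pearson chi-square = 32.247, N = 200, 3x3 table.
v_best_buy = cramers_v(32.247, 200, 3, 3)

print(round(gamma_beer, 3), round(v_best_buy, 3))
```

Both rounded values agree with the SPSS Symmetric Measures output (gamma of .458 for the beer data, Cramer's V of .284 for the Best Buy data).
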

INTERPRETING & COMMUNICATING RESEARCH RESULTS


In order to determine whether preferences for light beer depend upon consumers' income, 149
beer drinkers' preferences for light beer were cross tabulated with their income. The χ²
test indicates the presence of a relationship (p-value = .001, Appendix 1, p.7). The
gamma value of .458 indicates a moderate positive relationship. This means that higher
income consumers tend to prefer light beer. The cross tabulated results expressed as a
percentage of the row responses (PREFERENCE) affirm this result. 41.0 percent of
those who expressed a preference for light beer are in the HIGH income group, while
only 23.0 percent are in the LOW income group. Most notable is that more than half
(57.9 percent) of those who expressed a dislike for light beer are in the LOW income
group. The percentages of column responses (INCOME) provide additional support.
64.1 percent of the HIGH income group prefers light beer, while only 25.9 percent of
the LOW income group shares this preference.
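
The row and column percentages quoted in this write-up can be checked directly from the observed counts. The sketch below is only an illustration of the arithmetic: "% within PREFERENCE" divides each cell by its row total, and "% within INCOME" divides each cell by its column total. The variable names are hypothetical.

```python
# Observed counts: rows = dislikes, indifferent, prefers; columns = low, middle, high.
observed = [
    [22, 13, 3],
    [18, 21, 11],
    [14, 22, 25],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]

# % within PREFERENCE: each cell as a percentage of its row total.
row_pct = [[100 * o / r for o in row] for row, r in zip(observed, row_totals)]

# % within INCOME: each cell as a percentage of its column total.
col_pct = [[100 * o / c for o, c in zip(row, col_totals)] for row in observed]

# Of those who prefer light beer, the share in the HIGH and LOW income groups.
print(round(row_pct[2][2], 1), round(row_pct[2][0], 1))
# Of the HIGH and LOW income groups, the share that prefers light beer.
print(round(col_pct[2][2], 1), round(col_pct[2][0], 1))
```
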

Example Research Problem:

The manager of the Onalaska Best Buy hires you to determine whether there is a relationship between the
advertising medium through which customers heard about the store's 24 Hour Sale and the amount they spend.
During the 24 Hour Sale you randomly sample 200 customers who have made purchases. The output from the
analysis of these data appears below. The MEDIUM and EXPENDITURE variables are coded:

MEDIUM(1) = Newspaper     EXPENDITURE(1) = Under $100
MEDIUM(2) = Radio         EXPENDITURE(2) = $100-$199.99
MEDIUM(3) = Television    EXPENDITURE(3) = $200 or more

EXPENDITURE * Medium Crosstabulation

                                                    Medium
EXPENDITURE                          Newspaper      Radio Television      Total
Under $100   Count                          21         10         25         56
             Expected Count               15.1       18.2       22.7       56.0
             % within EXPENDITURE        37.5%      17.9%      44.6%     100.0%
             % within Medium             38.9%      15.4%      30.9%      28.0%
$100-199.99  Count                          26         27         13         66
             Expected Count               17.8       21.5       26.7       66.0
             % within EXPENDITURE        39.4%      40.9%      19.7%     100.0%
             % within Medium             48.1%      41.5%      16.0%      33.0%
$200 or more Count                           7         28         43         78
             Expected Count               21.1       25.4       31.6       78.0
             % within EXPENDITURE         9.0%      35.9%      55.1%     100.0%
             % within Medium             13.0%      43.1%      53.1%      39.0%
Total        Count                          54         65         81        200
             Expected Count               54.0       65.0       81.0      200.0
             % within EXPENDITURE        27.0%      32.5%      40.5%     100.0%
             % within Medium            100.0%     100.0%     100.0%     100.0%

Symmetric Measures

                                              Value   Approx. Sig.
Nominal by Nominal   Phi                       .402   .000
                     Cramer's V                .284   .000
                     Contingency Coefficient   .373   .000
N of Valid Cases                                200

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

Chi-Square Tests

                                 Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square              32.247(a)   4   .000
Likelihood Ratio                36.685      4   .000
Linear-by-Linear Association     9.703      1   .002
N of Valid Cases                   200

a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 15.12.

Taking everything into consideration, what do you conclude about the influence advertising media have on
expenditures?
