
MTE 3105 STATISTICS

HYPOTHESIS TESTING 8
Chi-Square Tests

Dr. Lam Kah Kei
Chi-Square Tests
Chi-square tests determine whether a distribution of observed
frequencies differs from the theoretical expected frequencies.
Chi-square statistics use categorical data, so instead of
means and variances, the test uses frequencies.

Types of Chi-Square Tests
Test of Goodness of Fit
Test of Independence / Test of Association
Test for Independence
The test of independence analyzes the relationship
between two nominal variables.
The procedure uses the special term "independent"
to mean not related (not associated), and "not
independent" to mean related (associated).
Two variables are independent if the occurrence of
one of the variables does not affect the occurrence
of the other.
The two nominal variables form a contingency table
of cells.
SW388R6
Data Analysis and Computers I

Independence Defined
When two variables are independent, there is
no relationship between them. We would
expect the frequency breakdown of the
dependent variable to be similar for all
groups.
Independence Demonstrated
Suppose we are interested in the relationship between
gender and attending college.

If there is no relationship between gender and attending
college, and 40% of our total sample attend college, we would
expect 40% of the males in our sample to attend college and
40% of the females to attend college.

If there is a relationship between gender and attending
college, we would expect a higher proportion of one group to
attend college than the other group, e.g. 60% to 20%.
Displaying Independent and Dependent
Relationships
[Figure: two bar charts of the proportion attending college (vertical axis, 0% to 100%) for males, females, and the total sample.
Independent Relationship between Gender and College: males 40%, females 40%, total 40%.
Dependent Relationship between Gender and College: males 60%, females 20%, total 40%.]
When the variables are
independent, the proportion
in both groups is close to the
same size as the proportion
for the total sample.
When group membership
makes a difference, the
dependent relationship is
indicated by one group having
a higher proportion than the
proportion for the total
sample.
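As a rough illustration of the two charts, the sketch below compares each group's proportion with the total-sample proportion. The slides give only percentages, so the group sizes (100 males and 100 females) and the function name are assumptions made purely for illustration.

```python
# Hypothetical counts: the slides report only percentages, so a group size
# of 100 per gender is assumed here purely for illustration.
def attendance_proportions(attend_male, n_male, attend_female, n_female):
    """Return the proportion attending college for males, females, and the total sample."""
    total = (attend_male + attend_female) / (n_male + n_female)
    return attend_male / n_male, attend_female / n_female, total

independent = attendance_proportions(40, 100, 40, 100)
dependent = attendance_proportions(60, 100, 20, 100)

print(independent)  # every group matches the total proportion
print(dependent)    # group proportions differ from the total proportion
```

In the first case both groups sit at the total-sample proportion; in the second, one group is above it and the other below.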
Contingency Table
Example: Is Disease Associated With Exposure?
2 x 2 contingency table. How many cells?
[Figure: 2 x 2 contingency table with the cells, row totals, column totals, and total labelled]
Displaying Data in Contingency Table

Contingency Table
Example: language preference and type of school
3 x 2 contingency table





How many schools?
How many cells in the table?
Expected Cell Frequencies ($E_i$)

Test of Independence


Hypotheses
The null hypothesis is that the two variables are independent.
The data are consistent with it when the observed counts in the
sample are close to the expected counts.

The research hypothesis states that the two variables are
dependent, or related. The data support it when the observed counts
for the categories of the variables in the sample differ
substantially from the expected counts.

The decision rule for the chi-square test of independence is the
same as for our other statistical tests.
Chi-Square Test of Independence
Example: A study examined whether the identification of giftedness
by schools for the deaf is related to the schools' language
preference. The table below shows the observed number of schools
that did not identify students as gifted (Level I) and the number
of schools that did (Level II), according to language preference.
Test at the 5% significance level whether a relationship exists
between the type of school and language preference.
1. Determine the hypotheses.
$H_0$: There is no association between type of school and
language preference (equivalently, type of school and language
preference are independent).
$H_1$: There is an association between type of school and
language preference (equivalently, type of school and language
preference are not independent).
2. Calculate expected frequencies for each cell category
(contingency table).
3. Specify the distribution, df, and level of significance.
Not all expected frequencies satisfy E > 5, but some books state the assumption as:
every expected frequency is at least 1 and no more than 20% of the expected
frequencies are less than 5. Under that rule, use the $\chi^2$ distribution with
df = (r-1)(c-1) = (3-1)(2-1) = 2 and $\alpha = .05$.

4. Determine the critical value and rejection rule.
From the table, the critical value is $\chi^2_{2,\,.05} = 5.991$.
Reject the null hypothesis if the computed $\chi^2 > 5.991$.
[Figure: $\chi^2$ distribution, df = 2, with the critical $\chi^2$ = 5.991 marked]
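The tabulated critical value can be checked by hand in this particular case. For df = 2 the upper-tail probability of the chi-square distribution is exp(-x/2), so the critical value is -2 ln(alpha). The sketch below relies on that closed form, which holds for df = 2 only; the function name is illustrative.

```python
import math

def chi2_critical_df2(alpha):
    """Critical value of the chi-square distribution with df = 2.

    For df = 2, P(X > x) = exp(-x / 2), so solving for x gives -2 * ln(alpha).
    """
    return -2.0 * math.log(alpha)

critical = chi2_critical_df2(0.05)
print(round(critical, 3))  # 5.991
```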
5. Compute the value of the test statistic.
Calculated $\chi^2 = 2.857$
6. Determine whether to reject $H_0$ and make a decision.
The test statistic 2.857 < 5.991 does not fall in the rejection
region, so fail to reject $H_0$.

There is insufficient evidence at the 5% significance level that
the type of school is associated with language preference.
[Figure: $\chi^2$ distribution with the test statistic 2.857 and the critical value 5.991 marked]
Another example
Try this.
Human blood is classified as type A, B, AB, or O. In addition,
human blood can be classified as Rh positive (Rh+) or Rh negative
(Rh-). A survey of 500 randomly selected individuals gave the
following results:

Rh level   Blood type
           A      B      AB     O
Rh+        176    28     22     198
Rh-        30     12     4      30

Test at the 0.05 significance level whether blood type and Rh
level are related.
Solution
1. Determine the hypotheses.
$H_0$: Blood type and Rh level are not related (independent).
$H_1$: Blood type and Rh level are related (dependent).
2. Calculate expected frequencies for each cell category.
$$E = \frac{(\text{Row total})(\text{Column total})}{\text{Table total}}$$
Expected frequencies are shown in parentheses.

Rh level   Blood type
           A               B            AB            O
Rh+        176 (174.688)   28 (33.92)   22 (22.048)   198 (193.344)
Rh-        30 (31.312)     12 (6.08)    4 (3.952)     30 (34.656)
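The expected frequencies in parentheses can be reproduced from the observed counts alone. The sketch below applies the rule (row total)(column total) / (table total) to every cell; the function name and the nested-list layout are illustrative choices, not part of the original slides.

```python
def expected_frequencies(observed):
    """Expected count per cell: (row total)(column total) / (table total)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]

# Observed counts from the blood-type example (rows: Rh+, Rh-; columns: A, B, AB, O).
observed = [[176, 28, 22, 198],
            [30, 12, 4, 30]]

expected = expected_frequencies(observed)
for row in expected:
    print([round(e, 3) for e in row])
```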
3. Specify the distribution, df, and level of significance.
All expected frequencies satisfy E > 1, and only 1 of 8 (fewer than 20%) of the
expected frequencies is less than 5, so use the $\chi^2$ distribution with
df = (r-1)(c-1) = (2-1)(4-1) = 3 and $\alpha = .05$.
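The conditions in this step can be checked mechanically. The sketch below takes the expected frequencies from the blood-type example and reports the degrees of freedom, the smallest expected frequency, and the share of cells with E below 5. The helper name is hypothetical.

```python
def check_chi2_conditions(expected):
    """Return (df, smallest E, share of cells with E < 5) for a table of expected counts."""
    cells = [e for row in expected for e in row]
    df = (len(expected) - 1) * (len(expected[0]) - 1)
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return df, min(cells), share_below_5

# Expected counts from the blood-type example.
expected = [[174.688, 33.92, 22.048, 193.344],
            [31.312, 6.08, 3.952, 34.656]]

df, smallest, share = check_chi2_conditions(expected)
print(df, smallest, share)  # 3 3.952 0.125
```

A share of 0.125 corresponds to the 1 cell out of 8 noted in the step, which is below the 20% limit.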

4. Determine the critical value and rejection rule.
From the table, the critical value is $\chi^2_{3,\,.05} = 7.815$.
Reject the null hypothesis if $\chi^2 > 7.815$.
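As a sanity check on the tabulated value, the area to the right of 7.815 under the chi-square curve with df = 3 should be about 0.05. The sketch below uses a closed form for the upper tail that is valid for df = 3 only; the function name is illustrative.

```python
import math

def chi2_upper_tail_df3(x):
    """P(X > x) for a chi-square variable with df = 3 (closed form, df = 3 only)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

tail = chi2_upper_tail_df3(7.815)
print(round(tail, 3))  # 0.05
```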
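5. Compute the value of the test statistic.
$$
\begin{aligned}
\chi^2 &= \sum \frac{(O - E)^2}{E} \\
&= \frac{(176 - 174.688)^2}{174.688} + \frac{(28 - 33.92)^2}{33.92} + \frac{(22 - 22.048)^2}{22.048} + \frac{(198 - 193.344)^2}{193.344} \\
&\quad + \frac{(30 - 31.312)^2}{31.312} + \frac{(12 - 6.08)^2}{6.08} + \frac{(4 - 3.952)^2}{3.952} + \frac{(30 - 34.656)^2}{34.656} \\
&= 7.601
\end{aligned}
$$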
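The sum in this step can be verified with a few lines of code. The sketch below adds (O - E)^2 / E over the eight cells of the blood-type example and compares the result with the critical value 7.815; the function name is an illustrative choice.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))

# Observed and expected counts from the blood-type example.
observed = [[176, 28, 22, 198], [30, 12, 4, 30]]
expected = [[174.688, 33.92, 22.048, 193.344], [31.312, 6.08, 3.952, 34.656]]

statistic = chi_square_statistic(observed, expected)
print(round(statistic, 3))  # 7.601
print(statistic > 7.815)    # False: the statistic is below the critical value
```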


6. Determine whether to reject $H_0$ and make a decision.
The test statistic 7.601 < 7.815, so do not reject $H_0$.

There is insufficient evidence at the $\alpha = .05$ significance level to
support the claim that Rh level and blood type are related (dependent).
Yates Correction
For a 2 x 2 contingency table, Yates correction
should be applied when calculating $\chi^2$, where
$$\chi^2 = \sum \frac{(|O - E| - 0.5)^2}{E}$$
Try this.
A driving school examined the results of 100
candidates who took their test for the first
time. It was found that out of 40 men, 28
passed and out of 60 women, 34 passed. Do
these results indicate, at the 5% significance
level, a relationship between the gender of
the candidate and the ability to pass the
driving test at the first attempt?
$H_0$: There is no relationship between the gender of a
candidate and the ability to pass at the first attempt.
$H_1$: There is a relationship between the gender of a
candidate and the ability to pass at the first attempt.
Contingency table: 2 x 2

          Driving test result
Gender    Pass    Fail    Total
Male      28      12      40
Female    34      26      60
Total     62      38      100
Calculated expected frequencies

          Driving test result
Gender    Pass    Fail    Total
Male      24.8    15.2    40
Female    37.2    22.8    60
Total     62      38      100

All expected frequencies > 5, so use the $\chi^2$ distribution with df = (2-1)(2-1) = 1.
Critical value $\chi^2$ = 3.841; reject the null hypothesis if $\chi^2 > 3.841$.
Using Yates correction,

O          E          $(|O - E| - 0.5)^2 / E$
28         24.8       0.293
34         37.2       0.195
12         15.2       0.479
26         22.8       0.319
ΣO = 100   ΣE = 100   1.289

$\chi^2 = 1.289 < 3.841$, so do not reject the null hypothesis. There is not
enough evidence at the 5% significance level to indicate a relationship
between gender and the ability to pass the driving test at the first attempt.
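The Yates-corrected total can be reproduced directly from the observed 2 x 2 table. The sketch below computes each expected frequency from the row and column totals and then sums (|O - E| - 0.5)^2 / E over the four cells; the function name is hypothetical.

```python
def yates_chi_square(observed):
    """Yates-corrected chi-square for a 2 x 2 table: sum of (|O - E| - 0.5)^2 / E."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency for this cell
            statistic += (abs(o - e) - 0.5) ** 2 / e
    return statistic

# Driving-test data (rows: male, female; columns: pass, fail).
observed = [[28, 12], [34, 26]]
statistic = yates_chi_square(observed)
print(round(statistic, 3))  # 1.289
print(statistic > 3.841)    # False: do not reject the null hypothesis
```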
