
© 2007 Pearson Education

Chapter 5: Hypothesis Testing and Statistical Inference

Hypothesis Testing

Hypothesis testing involves drawing inferences about two contrasting propositions (hypotheses) relating to the value of a population parameter, one of which is assumed to be true in the absence of contradictory data.

We seek evidence to determine whether the hypothesis can be rejected; if it cannot, we can only assume it to be true, but we have not statistically proven it true.

Hypothesis Testing Procedure
1. Formulate the hypothesis
2. Select a level of significance, which defines
the risk of drawing an incorrect conclusion
that a true hypothesis is false
3. Determine a decision rule
4. Collect data and calculate a test statistic
5. Apply the decision rule and draw a
conclusion

Hypothesis Formulation

Null hypothesis, $H_0$: a statement that is accepted as correct

Alternative hypothesis, $H_1$: a proposition that must be true if $H_0$ is false

Formulating the correct set of hypotheses depends on the burden of proof: what you wish to prove statistically should be $H_1$.

Tests involving a single population parameter are called one-sample tests; tests involving two populations are called two-sample tests.

Types of Hypothesis Tests

One Sample Tests

$H_0$: population parameter $\ge$ constant vs. $H_1$: population parameter $<$ constant

$H_0$: population parameter $\le$ constant vs. $H_1$: population parameter $>$ constant

$H_0$: population parameter $=$ constant vs. $H_1$: population parameter $\neq$ constant

Two Sample Tests

$H_0$: population parameter (1) $-$ population parameter (2) $\ge 0$ vs. $H_1$: population parameter (1) $-$ population parameter (2) $< 0$

$H_0$: population parameter (1) $-$ population parameter (2) $\le 0$ vs. $H_1$: population parameter (1) $-$ population parameter (2) $> 0$

$H_0$: population parameter (1) $-$ population parameter (2) $= 0$ vs. $H_1$: population parameter (1) $-$ population parameter (2) $\neq 0$

Four Outcomes
1. The null hypothesis is actually true, and the
test correctly fails to reject it.
2. The null hypothesis is actually false, and the
hypothesis test correctly reaches this
conclusion.
3. The null hypothesis is actually true, but the
hypothesis test incorrectly rejects it (Type I
error).
4. The null hypothesis is actually false, but the
hypothesis test incorrectly fails to reject it
(Type II error).

Quantifying Outcomes

Probability of Type I error (rejecting $H_0$ when it is true) $= \alpha =$ level of significance

Probability of correctly failing to reject $H_0$ $= 1 - \alpha =$ confidence coefficient

Probability of Type II error (failing to reject $H_0$ when it is false) $= \beta$

Probability of correctly rejecting $H_0$ when it is false $= 1 - \beta =$ power of the test
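To make the meaning of $\alpha$ concrete, the Python sketch below (an illustration added here, not an Excel or PHStat procedure) simulates many samples from a normal population in which $H_0$ is true at its boundary ($\mu = \mu_0$), applies a lower-tailed one-sample t-test to each, and reports the fraction of samples in which $H_0$ is wrongly rejected. Under these assumptions that fraction should be close to $\alpha$. It assumes NumPy and SciPy are installed; the function name `type_i_error_rate` and every parameter value (mean 30, standard deviation 20, n = 44, 20,000 trials) are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

def type_i_error_rate(mu0=30.0, sigma=20.0, n=44, alpha=0.05, trials=20_000, seed=1):
    """Estimate P(reject H0 | H0 true) for H0: mu >= mu0 vs. H1: mu < mu0.

    Samples are drawn with the true mean equal to mu0, so every rejection
    is a Type I error. All parameter values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    samples = rng.normal(loc=mu0, scale=sigma, size=(trials, n))
    xbar = samples.mean(axis=1)
    s = samples.std(axis=1, ddof=1)            # sample standard deviations
    t = (xbar - mu0) / (s / np.sqrt(n))        # one t statistic per sample
    t_crit = stats.t.ppf(alpha, df=n - 1)      # lower-tail critical value -t_{n-1, alpha}
    return np.mean(t < t_crit)                 # fraction of (wrong) rejections

rate = type_i_error_rate()
print(f"Estimated Type I error rate: {rate:.4f} (alpha = 0.05)")
```

Rerunning with a true mean below $\mu_0$ would instead estimate the power $1 - \beta$ for that alternative.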

Decision Rules

Compute a test statistic from sample data and compare it to the hypothesized sampling distribution of the test statistic

Divide the sampling distribution into a rejection region and a non-rejection region.

If the test statistic falls in the rejection region, reject $H_0$ (concluding that $H_1$ is true); otherwise, fail to reject $H_0$

Rejection Regions

Hypothesis Tests and Spreadsheet Support
Type of Test | Excel/PHStat Procedure
One sample test for mean, $\sigma$ known | PHStat: One Sample Tests > Z-Test for the Mean, Sigma Known
One sample test for mean, $\sigma$ unknown | PHStat: One Sample Tests > t-Test for the Mean, Sigma Unknown
One sample test for proportion | PHStat: One Sample Tests > Z-Test for the Proportion
Two sample test for means, $\sigma$ known | Excel z-test: Two-Sample for Means; PHStat: Two Sample Tests > Z-Test for Differences in Two Means
Two sample test for means, $\sigma$ unknown, unequal | Excel t-test: Two-Sample Assuming Unequal Variances

Hypothesis Tests and Spreadsheet Support (cont'd)
Type of Test | Excel/PHStat Procedure
Two sample test for means, $\sigma$ unknown, assumed equal | Excel t-test: Two-Sample Assuming Equal Variances; PHStat: Two Sample Tests > t-Test for Differences in Two Means
Paired two sample test for means | Excel t-test: Paired Two-Sample for Means
Two sample test for proportions | PHStat: Two Sample Tests > Z-Test for Differences in Two Proportions
Equality of variances | Excel F-test Two-Sample for Variances; PHStat: Two Sample Tests > F-Test for Differences in Two Variances

One Sample Tests for Means: Standard Deviation Unknown

Example hypothesis: $H_0: \mu \ge \mu_0$ versus $H_1: \mu < \mu_0$

Test statistic: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$

Reject $H_0$ if $t < -t_{n-1,\alpha}$


Example
For the Customer Support Survey.xls data, test the hypotheses

$H_0$: mean response time $\ge$ 30 minutes

$H_1$: mean response time $<$ 30 minutes

Sample mean = 21.91; sample standard deviation = 19.49; n = 44 observations

Reject $H_0$ because $t = (21.91 - 30)/(19.49/\sqrt{44}) = -2.75 < -t_{43,0.05} = -1.6811$
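The calculation in this example can be reproduced in a few lines. The sketch below is an illustration (not the PHStat procedure), assuming SciPy is available; it computes the test statistic from the summary statistics, looks up $-t_{43,0.05}$ with `scipy.stats.t.ppf` instead of a printed table, and also reports the lower-tail p-value.

```python
import math
from scipy import stats

# Summary statistics from the Customer Support Survey example
xbar, s, n = 21.91, 19.49, 44
mu0, alpha = 30.0, 0.05

t_stat = (xbar - mu0) / (s / math.sqrt(n))     # test statistic
t_crit = stats.t.ppf(alpha, df=n - 1)         # -t_{43, 0.05}
p_value = stats.t.cdf(t_stat, df=n - 1)       # lower-tail p-value

print(f"t = {t_stat:.4f}, critical value = {t_crit:.4f}, p-value = {p_value:.4f}")
print("Reject H0" if t_stat < t_crit else "Do not reject H0")
```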

PHStat Tool: t-Test for Mean

PHStat menu > One Sample Tests > t-Test for the Mean, Sigma Unknown

Enter null hypothesis and alpha

Enter sample statistics or data range

Choose type of test

Results

Using p-Values

p-value = probability of obtaining a test statistic value equal to or more extreme than that obtained from the sample data when $H_0$ is true

[Figure: p-value as the tail area beyond the test statistic, for a lower one-tailed test and a two-tailed test]
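The p-value is a tail area of the test statistic's sampling distribution, so its computation depends on the direction of the test. A minimal Python sketch, assuming SciPy; the helper name `t_test_p_value` and its `tail` argument are hypothetical names chosen for illustration. The statistic $-2.75$ with 43 degrees of freedom is taken from the customer support example.

```python
from scipy import stats

def t_test_p_value(t_stat, df, tail):
    """p-value for a t statistic; tail is 'lower', 'upper', or 'two'."""
    if tail == "lower":
        return stats.t.cdf(t_stat, df)            # P(T <= t)
    if tail == "upper":
        return stats.t.sf(t_stat, df)             # P(T >= t)
    if tail == "two":
        return 2 * stats.t.sf(abs(t_stat), df)    # area in both tails
    raise ValueError("tail must be 'lower', 'upper', or 'two'")

# Statistic from the customer support example (t is about -2.75 with 43 df)
for tail in ("lower", "upper", "two"):
    print(tail, round(t_test_p_value(-2.75, 43, tail), 4))
```

With a p-value in hand, the decision rule becomes: reject $H_0$ when the p-value is less than $\alpha$.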

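One Sample Tests for Proportions

Example hypothesis: $H_0: \pi \ge \pi_0$ versus $H_1: \pi < \pi_0$

Test statistic: $z = \dfrac{p - \pi_0}{\sqrt{\pi_0(1 - \pi_0)/n}}$, where $p$ is the sample proportion

Reject $H_0$ if $z < -z_\alpha$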

Example

For the Customer Support Survey.xls data, test the hypothesis that the proportion of overall quality responses in the top two boxes is at least 0.75

$H_0: \pi \ge 0.75$

$H_1: \pi < 0.75$

Sample proportion = 0.682; n = 44

$z = (0.682 - 0.75)/\sqrt{0.75(1 - 0.75)/44} = -1.04$. For a level of significance of 0.05, the critical value of z is $-1.645$; because $-1.04 > -1.645$, we cannot reject the null hypothesis
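The same one-sample z-test for a proportion can be sketched in Python, assuming SciPy; this is an illustration rather than the PHStat tool. It uses only the sample proportion and sample size given in the example.

```python
import math
from scipy import stats

p_hat, n = 0.682, 44          # sample proportion and sample size from the example
pi0, alpha = 0.75, 0.05

z = (p_hat - pi0) / math.sqrt(pi0 * (1 - pi0) / n)
z_crit = stats.norm.ppf(alpha)          # lower-tail critical value, about -1.645
p_value = stats.norm.cdf(z)             # lower-tail p-value
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, p-value = {p_value:.3f}")
print("Reject H0" if z < z_crit else "Cannot reject H0")
```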

PHStat Tool: One Sample z-Test for Proportions

PHStat > One Sample Tests > z-Test for the Proportion

Enter null hypothesis, significance level, number of successes, and sample size

Enter type of test

Results

Type II Errors and the Power of a Test

The probability of a Type II error, $\beta$, and the power of the test, $1 - \beta$, cannot be chosen directly by the experimenter (unlike $\alpha$).

The power of the test depends on the true value of the population mean, the level of confidence used, and the sample size.

A power curve shows $1 - \beta$ as a function of $\mu_1$, the true value of the population mean.
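A power curve can be sketched numerically. The Python function below assumes NumPy and SciPy, treats $\sigma$ as known (a normal approximation to the t-test), and borrows $\mu_0 = 30$, $\sigma = 19.49$, $n = 44$, and $\alpha = 0.05$ from the customer support example purely as illustrative inputs; the name `power_lower_tail` is hypothetical. For a lower-tailed test that rejects when $(\bar{x} - \mu_0)/(\sigma/\sqrt{n}) < -z_\alpha$, the power at a true mean $\mu_1$ is $\Phi\left(-z_\alpha + (\mu_0 - \mu_1)\sqrt{n}/\sigma\right)$, where $\Phi$ is the standard normal CDF.

```python
import numpy as np
from scipy import stats

def power_lower_tail(mu1, mu0=30.0, sigma=19.49, n=44, alpha=0.05):
    """Power of a lower-tailed z-test of H0: mu >= mu0 when the true mean is mu1.

    Assumes sigma is known (normal approximation); H0 is rejected when
    (xbar - mu0) / (sigma / sqrt(n)) < -z_alpha.
    """
    z_alpha = stats.norm.ppf(1 - alpha)
    shift = (mu0 - np.asarray(mu1, dtype=float)) * np.sqrt(n) / sigma
    return stats.norm.cdf(-z_alpha + shift)

mus = np.arange(20.0, 31.0, 2.0)      # candidate true means mu1: 20, 22, ..., 30
for mu1, power in zip(mus, power_lower_tail(mus)):
    print(f"mu1 = {mu1:4.1f}  power = {power:.3f}")
```

At $\mu_1 = \mu_0$ the curve equals $\alpha$, and it rises toward 1 as the true mean moves further into the $H_1$ region; increasing `n` makes it rise faster.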

Example Power Curve

Two Sample Tests for Means: Standard Deviation Known

Example hypothesis: $H_0: \mu_1 - \mu_2 \ge 0$ versus $H_1: \mu_1 - \mu_2 < 0$

Test statistic: $z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$

Reject $H_0$ if $z < -z_\alpha$
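The two-sample z statistic above translates directly into code. This Python sketch assumes SciPy; the helper `two_sample_z` and the summary numbers passed to it are hypothetical, chosen only to illustrate a lower-tailed test.

```python
import math
from scipy import stats

def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2):
    """z statistic for mu1 - mu2 when both population standard deviations are known."""
    return (xbar1 - xbar2) / math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Hypothetical summary data, for illustration only
z = two_sample_z(xbar1=48.0, xbar2=50.0, sigma1=4.0, sigma2=3.0, n1=40, n2=36)
alpha = 0.05
z_crit = -stats.norm.ppf(1 - alpha)     # -z_alpha for the lower-tailed test
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {z < z_crit}")
```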


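Two Sample Tests for Means: Sigma Unknown and Equal

Example hypothesis: $H_0: \mu_1 - \mu_2 \le 0$ versus $H_1: \mu_1 - \mu_2 > 0$

Test statistic: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \cdot \dfrac{n_1 + n_2}{n_1 n_2}}}$

Reject $H_0$ if $t > t_{n_1+n_2-2,\alpha}$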
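The pooled-variance statistic can be computed from summary statistics and cross-checked against SciPy's `scipy.stats.ttest_ind_from_stats` with `equal_var=True`. This is a sketch under stated assumptions: SciPy is installed, the helper `pooled_t` is a hypothetical name, and the summary statistics are invented for illustration. SciPy reports a two-sided p-value by default, so only its statistic is compared here.

```python
import math
from scipy import stats

def pooled_t(xbar1, xbar2, s1, s2, n1, n2):
    """Pooled-variance t statistic and degrees of freedom (equal variances assumed)."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)   # pooled variance
    t = (xbar1 - xbar2) / math.sqrt(sp2 * (n1 + n2) / (n1 * n2))
    return t, n1 + n2 - 2

# Hypothetical summary statistics, for illustration only
t, df = pooled_t(xbar1=52.0, xbar2=49.0, s1=6.0, s2=5.5, n1=20, n2=25)
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha, df)      # upper-tail critical value t_{df, alpha}
print(f"t = {t:.3f}, df = {df}, critical value = {t_crit:.3f}, reject H0: {t > t_crit}")

# Cross-check the statistic against SciPy
check = stats.ttest_ind_from_stats(52.0, 6.0, 20, 49.0, 5.5, 25, equal_var=True)
print(f"SciPy t = {check.statistic:.3f}")
```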


Two Sample Tests for Means: Sigma Unknown and Unequal

Example hypothesis: $H_0: \mu_1 - \mu_2 = 0$ versus $H_1: \mu_1 - \mu_2 \neq 0$

Test statistic: $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$ with $df = \dfrac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$

Reject $H_0$ if $t > t_{df,\alpha/2}$ or $t < -t_{df,\alpha/2}$
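For unequal variances, the degrees of freedom come from the formula above and are usually not an integer. The Python sketch below assumes SciPy (whose t distribution accepts fractional degrees of freedom) and compares the result with `scipy.stats.ttest_ind_from_stats(..., equal_var=False)`, SciPy's Welch test. The helper `welch_t` and the summary statistics are hypothetical.

```python
import math
from scipy import stats

def welch_t(xbar1, xbar2, s1, s2, n1, n2):
    """Unequal-variance t statistic and its approximate degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (xbar1 - xbar2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Hypothetical summary statistics, for illustration only
t, df = welch_t(xbar1=15.2, xbar2=12.9, s1=4.1, s2=1.8, n1=12, n2=30)
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df)      # two-tailed critical value t_{df, alpha/2}
p_value = 2 * stats.t.sf(abs(t), df)
print(f"t = {t:.3f}, df = {df:.1f}, reject H0: {abs(t) > t_crit}, p-value = {p_value:.4f}")

check = stats.ttest_ind_from_stats(15.2, 4.1, 12, 12.9, 1.8, 30, equal_var=False)
print(f"SciPy: t = {check.statistic:.3f}, p-value = {check.pvalue:.4f}")
```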

Excel Data Analysis Tool: Two Sample t-Tests

Tools > Data Analysis > t-test: Two-Sample Assuming Unequal Variances, or t-test: Two-Sample Assuming Equal Variances

Enter range of data, hypothesized mean difference, and level of significance

Tool allows you to test $H_0: \mu_1 - \mu_2 = d$

Output is provided for upper-tail test only

For a lower-tail test, change the sign on t Critical one-tail, and subtract P(T<=t) one-tail from 1.0 for the correct p-value

PHStat Tool: Two Sample t-Tests

PHStat > Two Sample Tests > t-Test for Differences in Two Means

Test assumes equal variances

Must compute and enter the sample mean, sample standard deviation, and sample size

Comparison of Excel and PHStat Results: Lower-Tail Test

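Two Sample Test for Means With Paired Samples

Example hypothesis: $H_0$: average difference $\mu_D = 0$ versus $H_1$: average difference $\mu_D \neq 0$

Test statistic: $t = \dfrac{\bar{D}}{s_D/\sqrt{n}}$, where $\bar{D}$ and $s_D$ are the mean and standard deviation of the $n$ paired differences

Reject $H_0$ if $t > t_{n-1,\alpha/2}$ or $t < -t_{n-1,\alpha/2}$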
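A paired test is a one-sample t-test on the differences. The Python sketch below, assuming NumPy and SciPy, computes the statistic from hypothetical before/after measurements (invented for illustration) and checks it against `scipy.stats.ttest_rel`, SciPy's paired t-test.

```python
import numpy as np
from scipy import stats

# Hypothetical paired observations on the same eight units, for illustration only
before = np.array([12.1, 10.4, 13.8, 9.9, 11.5, 12.7, 10.8, 14.2])
after = np.array([11.4, 10.9, 12.6, 9.1, 11.0, 12.9, 10.1, 13.0])

d = before - after                       # paired differences D_i
n = d.size
t = d.mean() / (d.std(ddof=1) / np.sqrt(n))
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, n - 1)
print(f"t = {t:.3f}, +/- critical value = {t_crit:.3f}, reject H0: {abs(t) > t_crit}")

check = stats.ttest_rel(before, after)   # SciPy's paired t-test (two-sided)
print(f"SciPy t = {check.statistic:.3f}, p-value = {check.pvalue:.4f}")
```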


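Two Sample Tests for Proportions

Example hypothesis: $H_0: \pi_1 - \pi_2 = 0$ versus $H_1: \pi_1 - \pi_2 \neq 0$

Test statistic: $z = \dfrac{p_1 - p_2}{\sqrt{\bar{p}(1 - \bar{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$, where $\bar{p} = \dfrac{\text{number of successes in both samples}}{n_1 + n_2}$

Reject $H_0$ if $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$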
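The pooled proportion $\bar{p}$ combines the successes from both samples before computing the standard error. A minimal Python sketch, assuming SciPy; the helper `two_proportion_z` and the counts are hypothetical.

```python
import math
from scipy import stats

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: pi1 - pi2 = 0 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)        # successes in both samples / total sample size
    se = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical counts, for illustration only
z = two_proportion_z(x1=30, n1=50, x2=20, n2=50)
alpha = 0.05
z_crit = stats.norm.ppf(1 - alpha / 2)   # about 1.96
print(f"z = {z:.3f}, reject H0: {abs(z) > z_crit}")
```

With 30/50 versus 20/50 successes, $\bar{p} = 0.5$, the standard error is 0.1, and $z = 0.2/0.1 = 2.0$, which exceeds 1.96.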


Hypothesis Tests and Confidence Intervals

If a $100(1 - \alpha)\%$ confidence interval contains the hypothesized value, then we would not reject the null hypothesis based on this value with a level of significance $\alpha$.

Example hypothesis: $H_0: \mu \ge \mu_0$ versus $H_1: \mu < \mu_0$

If a $100(1 - \alpha)\%$ confidence interval does not contain $\mu_0$, then we can reject $H_0$
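The correspondence between intervals and tests can be checked numerically. The Python sketch below (assuming SciPy) uses the customer support summary statistics but, as an assumption made for this illustration, pairs a two-sided 95% interval with the two-tailed version of the test, the setting in which "interval excludes $\mu_0$" and "p-value $< \alpha$" agree exactly; a one-tailed test corresponds to a one-sided confidence bound instead.

```python
import math
from scipy import stats

# Customer Support Survey summary statistics; two-tailed version of the test
xbar, s, n = 21.91, 19.49, 44
mu0, alpha = 30.0, 0.05

se = s / math.sqrt(n)
t_half = stats.t.ppf(1 - alpha / 2, n - 1)
ci = (xbar - t_half * se, xbar + t_half * se)    # 100(1 - alpha)% confidence interval

t_stat = (xbar - mu0) / se
p_two = 2 * stats.t.sf(abs(t_stat), n - 1)        # two-tailed p-value

print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f}); contains mu0: {ci[0] <= mu0 <= ci[1]}")
print(f"two-tailed p-value = {p_two:.4f}; reject H0: {p_two < alpha}")
```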

F-Test for Differences in Two Variances

Hypothesis: $H_0: \sigma_1^2 - \sigma_2^2 = 0$ versus $H_1: \sigma_1^2 - \sigma_2^2 \neq 0$

Test statistic: $F = s_1^2/s_2^2$

Assume $s_1^2 > s_2^2$

Reject $H_0$ if $F > F_{\alpha/2,\,n_1-1,\,n_2-1}$ (see Appendix A.4)

Assumes both samples are drawn from normal distributions
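Placing the larger sample variance in the numerator means only the upper critical value $F_{\alpha/2}$ is needed. A Python sketch, assuming SciPy; `f_test_two_variances` is a hypothetical helper and the inputs are invented for illustration.

```python
from scipy import stats

def f_test_two_variances(s1, n1, s2, n2, alpha=0.05):
    """Two-tailed F-test of H0: sigma1^2 = sigma2^2 with the larger variance on top."""
    if s1**2 < s2**2:                    # relabel so that s1^2 > s2^2, as the test assumes
        s1, n1, s2, n2 = s2, n2, s1, n1
    F = s1**2 / s2**2
    F_crit = stats.f.ppf(1 - alpha / 2, n1 - 1, n2 - 1)   # upper critical value F_{alpha/2}
    return F, F_crit, F > F_crit

# Hypothetical sample standard deviations and sizes, for illustration only
F, F_crit, reject = f_test_two_variances(s1=4.1, n1=12, s2=1.8, n2=30)
print(f"F = {F:.3f}, critical value = {F_crit:.3f}, reject H0: {reject}")
```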

Excel Data Analysis Tool: F-Test for Equality of Variances

Tools > Data Analysis > F-Test Two-Sample for Variances

Specify data ranges

Use $\alpha/2$ for the significance level!

If the variance of Variable 1 is greater than the variance of Variable 2, the output will specify the upper tail; otherwise, you obtain the lower-tail information.

PHStat Tool: F-Test for Differences in Variances

PHStat menu > Two Sample Tests > F-Test for Differences in Two Variances

Compute and enter sample standard deviations

Enter the significance level $\alpha$, not $\alpha/2$ as in Excel

Excel and PHStat Results

Analysis of Variance (ANOVA)

Compare the means of m different groups (factors) to determine if all are equal

$H_0: \mu_1 = \mu_2 = \cdots = \mu_m$

$H_1$: at least one mean is different from the others

ANOVA Theory

$n_j$ = number of observations in sample j

SST = total variation in the data

SSB = variation between groups

SSW = variation within groups


SST = SSB + SSW



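$SST = \sum_{j=1}^{m} \sum_{i=1}^{n_j} (X_{ij} - \bar{X})^2$

$SSB = \sum_{j=1}^{m} n_j (\bar{X}_j - \bar{X})^2$

$SSW = \sum_{j=1}^{m} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2$

where $X_{ij}$ is observation i in sample j, $\bar{X}_j$ is the mean of sample j, and $\bar{X}$ is the overall mean of all n observations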

ANOVA Test Statistic

$MSB = SSB/(m - 1)$

$MSW = SSW/(n - m)$

Test statistic: $F = MSB/MSW$

Has an F-distribution with $m - 1$ and $n - m$ degrees of freedom

Reject $H_0$ if $F > F_{\alpha,\,m-1,\,n-m}$
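The ANOVA sums of squares and F statistic can be computed directly from the definitions above. The Python sketch below assumes NumPy and SciPy and uses hypothetical data for three groups; it confirms that $SST = SSB + SSW$ and that the F statistic matches `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy import stats

# Hypothetical observations for m = 3 groups, for illustration only
groups = [
    np.array([23.0, 25.5, 21.8, 24.1, 26.0]),
    np.array([27.2, 29.0, 26.4, 28.8]),
    np.array([22.5, 20.9, 24.0, 21.7, 23.3, 22.0]),
]
all_x = np.concatenate(groups)
n, m = all_x.size, len(groups)
grand_mean = all_x.mean()

SST = np.sum((all_x - grand_mean) ** 2)                          # total variation
SSB = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups) # between groups
SSW = sum(np.sum((g - g.mean()) ** 2) for g in groups)           # within groups

MSB = SSB / (m - 1)
MSW = SSW / (n - m)
F = MSB / MSW
alpha = 0.05
F_crit = stats.f.ppf(1 - alpha, m - 1, n - m)   # upper-tail critical value F_{alpha, m-1, n-m}
print(f"SST = {SST:.3f}, SSB + SSW = {SSB + SSW:.3f}")
print(f"F = {F:.3f}, critical value = {F_crit:.3f}, reject H0: {F > F_crit}")

check = stats.f_oneway(*groups)
print(f"SciPy F = {check.statistic:.3f}, p-value = {check.pvalue:.4f}")
```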

Excel Data Analysis Tool for ANOVA

Tools > Data Analysis > ANOVA: Single Factor

ANOVA Results

ANOVA Assumptions

The m groups or factor levels being studied represent populations whose outcome measures are

randomly and independently obtained

normally distributed

of equal variance

Violation of these assumptions can affect the true level of significance and power of the test.

Nonparametric Tests

Used when assumptions (usually normality) are violated. Examples:

Wilcoxon rank sum test for testing the difference between two medians

Kruskal-Wallis rank test for determining whether multiple populations have equal medians.

Both supported by PHStat
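Outside PHStat, both rank tests are available in SciPy as `scipy.stats.ranksums` (Wilcoxon rank sum, normal approximation) and `scipy.stats.kruskal` (Kruskal-Wallis H). The samples below are hypothetical and deliberately well separated, so both tests should reject at $\alpha = 0.05$; this is a usage sketch assuming SciPy is installed.

```python
from scipy import stats

# Hypothetical samples, for illustration only
a = [1.2, 1.5, 1.9, 2.2, 2.4, 2.8, 3.1, 3.3]
b = [3.6, 3.9, 4.4, 4.7, 5.0, 5.2, 5.9, 6.3]
c = [6.8, 7.1, 7.5, 7.7, 8.2]

rs = stats.ranksums(a, b)                 # Wilcoxon rank sum test for two samples
kw = stats.kruskal(a[:5], b[:5], c)       # Kruskal-Wallis H test for three samples
print(f"Wilcoxon rank sum: statistic = {rs.statistic:.3f}, p-value = {rs.pvalue:.4f}")
print(f"Kruskal-Wallis: H = {kw.statistic:.3f}, p-value = {kw.pvalue:.4f}")
```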



Tukey-Kramer Multiple Comparison Procedure

ANOVA cannot identify which means may differ from the rest

PHStat menu > Multiple Sample Tests > Tukey-Kramer Multiple Comparison Procedure

Enter Q Statistic from Table A.5

Chi-Square Test for Independence

Test whether two categorical variables are independent

$H_0$: the two categorical variables are independent

$H_1$: the two categorical variables are dependent

Example

Is gender independent of holding a CPA in an accounting firm?

Chi-Square Test for Independence

Test statistic: $\chi^2 = \sum \dfrac{(f_o - f_e)^2}{f_e}$, where $f_o$ = observed frequency and $f_e$ = expected frequency if $H_0$ is true, in the cells of the contingency table

Reject $H_0$ if $\chi^2 > \chi^2_{\alpha,\,(r-1)(c-1)}$

PHStat tool available in Multiple Sample Tests menu

Example
Expected frequencies:
           No CPA    CPA    Total
Female      6.74     7.26     14
Male        6.26     6.74     13
Total      13       14        27
Critical value with $\alpha = 0.05$ and $(2 - 1)(2 - 1) = 1$ df is 3.841; therefore, we cannot reject the null hypothesis that the two categorical variables are independent.
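Each expected frequency is (row total × column total) / grand total, so the table above can be rebuilt from its margins. The Python sketch below (assuming NumPy and SciPy) does that and looks up the critical value. The slide does not list the observed counts, so the observed table here is hypothetical (chosen only to have the same margins) and is used solely to show that the hand formula agrees with `scipy.stats.chi2_contingency` without continuity correction.

```python
import numpy as np
from scipy import stats

# Marginal totals from the CPA example: rows = (Female, Male), columns = (No CPA, CPA)
row_totals = np.array([14, 13])
col_totals = np.array([13, 14])
grand_total = row_totals.sum()

# f_e = (row total x column total) / grand total
expected = np.outer(row_totals, col_totals) / grand_total
r, c = expected.shape
df = (r - 1) * (c - 1)
chi2_crit = stats.chi2.ppf(0.95, df)      # critical value for alpha = 0.05

def chi_square_stat(observed, expected):
    """Chi-square statistic: sum over cells of (f_o - f_e)^2 / f_e."""
    observed = np.asarray(observed, dtype=float)
    return np.sum((observed - expected) ** 2 / expected)

print(np.round(expected, 2))
print(f"df = {df}, critical value = {chi2_crit:.3f}")

# Hypothetical observed counts with the same margins, for illustration only
observed = np.array([[8, 6], [5, 8]])
stat, p, dof, exp = stats.chi2_contingency(observed, correction=False)
print(f"chi-square = {chi_square_stat(observed, expected):.3f} (SciPy: {stat:.3f}), p-value = {p:.3f}")
```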

PHStat Procedure Results

Design of Experiments

A test or series of tests that enables the experimenter to compare two or more methods to determine which is better, or to determine levels of controllable factors that optimize the yield of a process or minimize the variability of a response variable.

Factorial Experiments

All combinations of levels of each factor are considered.

With m factors at k levels, there are $k^m$ experiments.

Example: Suppose that temperature and reaction time are thought to be important factors in the percent yield of a chemical process. Currently, the process operates at a temperature of 100 degrees and a 60-minute reaction time. In an effort to reduce costs and improve yield, the plant manager wants to determine whether changing the temperature and reaction time will have any significant effect on the percent yield, and if so, to identify the best levels of these factors to optimize the yield.

Designed Experiment

Analyze the effect of two levels of each factor (for instance, temperature at 100 and 125 degrees, and time at 60 and 90 minutes)

The different combinations of levels of each factor are commonly called treatments.
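A full factorial design can be enumerated with `itertools.product`, which yields every combination of factor levels. This Python sketch uses the two levels per factor from the chemical-process example; with m = 2 factors at k = 2 levels it produces $2^2 = 4$ treatments.

```python
import itertools

# Factor levels from the chemical-process example
levels = {
    "temperature": [100, 125],   # degrees
    "time": [60, 90],            # minutes
}

# Every combination of levels is one treatment: k**m runs for m factors at k levels
treatments = list(itertools.product(*levels.values()))
for temp, time in treatments:
    print(f"temperature = {temp}, time = {time}")
```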

Treatment Combinations
[Figure: 2 × 2 grid of treatment combinations A through D, with each factor at its low and high level]

Experimental Results

Main Effects

Measures the difference in the response that results from different factor levels

Calculations

Temperature effect = (Average yield at high level) - (Average yield at low level)
$= (B + D)/2 - (A + C)/2$
$= (90.5 + 81)/2 - (84 + 88.5)/2$
$= 85.75 - 86.25 = -0.5$ percent.

Reaction time effect = (Average yield at high level) - (Average yield at low level)
$= (C + D)/2 - (A + B)/2$
$= (88.5 + 81)/2 - (84 + 90.5)/2$
$= 84.75 - 87.25 = -2.5$ percent.


Interactions

When the effect of changing one factor depends on the level of other factors.

When interactions are present, we cannot estimate response changes by simply adding main effects; the effect of one factor must be interpreted relative to levels of the other factor.

Interaction Calculations

Take the average response when the factors are both at the high level or both at the low level, and subtract the average response when the factors are at opposite levels.

Temperature × Time Interaction
= (Average yield, both factors at same level) - (Average yield, both factors at opposite levels)
$= (A + D)/2 - (B + C)/2$
$= (84 + 81)/2 - (90.5 + 88.5)/2 = 82.5 - 89.5 = -7.0$ percent
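The main-effect and interaction formulas are simple averages of the four treatment yields, so they are easy to script. In this Python sketch the labeling of A through D is inferred from the formulas above (temperature is high in B and D; time is high in C and D), and the yields are the ones used in the calculations.

```python
# Average yields (percent) for the four treatment combinations used in the calculations:
# A = low temp / low time, B = high temp / low time, C = low temp / high time, D = high temp / high time
A, B, C, D = 84.0, 90.5, 88.5, 81.0

temperature_effect = (B + D) / 2 - (A + C) / 2   # high temperature minus low temperature
time_effect = (C + D) / 2 - (A + B) / 2          # high time minus low time
interaction = (A + D) / 2 - (B + C) / 2          # same levels minus opposite levels

print(f"Temperature effect: {temperature_effect:+.2f} percent")
print(f"Reaction time effect: {time_effect:+.2f} percent")
print(f"Temperature x time interaction: {interaction:+.2f} percent")
```

The interaction (-7.0) is much larger in magnitude than either main effect, which is why the effect of one factor here cannot be read without knowing the level of the other.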

Graphical Illustration of Interactions

Two-Way ANOVA

Method for analyzing variation in a 2-factor experiment

SST = SSA + SSB + SSAB + SSW


where
SST = total sum of squares
SSA = sum of squares due to factor A
SSB = sum of squares due to factor B
SSAB = sum of squares due to interaction
SSW = sum of squares due to random variation (error)

Mean Squares

$MSA = SSA/(r - 1)$

$MSB = SSB/(c - 1)$

$MSAB = SSAB/[(r - 1)(c - 1)]$

$MSW = SSW/[rc(k - 1)]$,
where r and c are the numbers of levels of factors A and B, and k = number of replications of each treatment combination.

Hypothesis Tests

Compute F statistics by dividing each mean square by MSW.

F = MSA/MSW tests the null hypothesis that means for each treatment level of factor A are the same against the alternative hypothesis that not all means are equal.

F = MSB/MSW tests the null hypothesis that means for each treatment level of factor B are the same against the alternative hypothesis that not all means are equal.

F = MSAB/MSW tests the null hypothesis that the interaction between factors A and B is zero against the alternative hypothesis that the interaction is not zero.
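For a balanced design, the two-way sums of squares, mean squares, and F tests can be computed directly with NumPy. This is a sketch under stated assumptions: NumPy and SciPy are installed, the data array is invented for illustration (r = 2 levels of factor A, c = 2 levels of factor B, k = 3 replications), and p-values come from `scipy.stats.f.sf`. It also confirms the decomposition $SST = SSA + SSB + SSAB + SSW$.

```python
import numpy as np
from scipy import stats

# Hypothetical balanced data, for illustration only: y[i, j, l] is replication l
# of factor A at level i and factor B at level j (r = 2, c = 2, k = 3).
y = np.array([
    [[84.2, 83.5, 84.6], [88.9, 88.1, 88.4]],
    [[90.1, 91.0, 90.4], [81.3, 80.6, 81.2]],
])
r, c, k = y.shape
grand = y.mean()
mean_a = y.mean(axis=(1, 2))          # factor A level means
mean_b = y.mean(axis=(0, 2))          # factor B level means
mean_ab = y.mean(axis=2)              # treatment-combination (cell) means

SST = np.sum((y - grand) ** 2)
SSA = c * k * np.sum((mean_a - grand) ** 2)
SSB = r * k * np.sum((mean_b - grand) ** 2)
SSAB = k * np.sum((mean_ab - mean_a[:, None] - mean_b[None, :] + grand) ** 2)
SSW = np.sum((y - mean_ab[:, :, None]) ** 2)

df_within = r * c * (k - 1)
MSW = SSW / df_within
for name, ss, df in [("A", SSA, r - 1), ("B", SSB, c - 1), ("AB", SSAB, (r - 1) * (c - 1))]:
    F = (ss / df) / MSW                        # mean square divided by MSW
    p = stats.f.sf(F, df, df_within)           # upper-tail p-value
    print(f"{name}: F = {F:.2f}, p-value = {p:.4g}")
```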

Excel Anova: Two-Factor with Replication

Results: examine p-values for significance
