
Inference about population variance

(one sample case)


Module II: Application of Epidemiology & Biostatistics in Public Health
Variance
Measures level of risk
Large variation is considered bad (poor quality)

Sample Variance

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

Population Variance

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$
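As a quick illustration of the two variance formulas, here is a minimal Python sketch using only the standard library. The data values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import statistics

# Hypothetical measurements (not from the slides), for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n
ss = sum((x - mean) ** 2 for x in data)  # sum of squared deviations from the mean

sample_var = ss / (n - 1)   # s^2: divide by n - 1
population_var = ss / n     # sigma^2: divide by N (data treated as the whole population)

# The standard library functions follow the same two formulas.
assert abs(sample_var - statistics.variance(data)) < 1e-12
assert abs(population_var - statistics.pvariance(data)) < 1e-12

print(sample_var, population_var)
```

The only difference between the two results is the divisor: n - 1 for a sample, N for a full population.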
Point estimate for population variance
An estimate of the population variance ($\sigma^2$) is
provided by the sample variance ($s^2$)

To obtain a confidence interval and perform a hypothesis test for a variance,
we must know the shape and properties of the
probability distribution of the sample variance.
Sampling distribution of sample variance ($s^2$):
If the underlying population is normal, then the sample
variance ($s^2$) is an unbiased estimator of the population
variance ($\sigma^2$); that is, the mean of all possible sample variances
equals the population variance, i.e.

$$E(s^2) = \sigma^2$$


Chi-square ($\chi^2$) Distribution
If the observations (data) follow a normal distribution
with population mean ($\mu$) and standard deviation ($\sigma$),
then the quantity $(n-1)s^2/\sigma^2$ has a chi-square ($\chi^2$)
distribution with df = (n-1)

Properties of Chi-square ($\chi^2$) distribution:
The distribution is not symmetric; it is skewed to the right
The mean of the chi-square distribution is equal to its
degrees of freedom (df)
It has many shapes, which depend on its degrees of
freedom (df)
Chi-square distribution for different df
How to read a value from the Chi-square table

Find the critical chi-square value for 15 df when $\alpha$ =
0.05 and the test is right tailed.

Find the critical chi-square value for 10 df when $\alpha$ =
0.05 and the test is left tailed.

Find the critical chi-square values for 22 df when $\alpha$ =
0.05 and a two tailed test is conducted.
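The three look-ups can also be checked numerically. The sketch below is one possible approach using only the Python standard library: it evaluates the chi-square CDF with a series for the incomplete gamma function and inverts it by bisection. The helper names `chi2_cdf` and `chi2_critical` are made up for this illustration. A printed table gives approximately 24.996, 3.940, and the pair 10.982 and 36.781 for these cases.

```python
import math

def chi2_cdf(x, df):
    """P(X <= x) for a chi-square variable, using the series for the
    regularized lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0
    k = 0
    while term > 1e-15 * total:
        k += 1
        term *= z / (a + k)
        total += term
    return math.exp(a * math.log(z) - z - math.lgamma(a + 1.0)) * total

def chi2_critical(right_tail_area, df):
    """Value that leaves the given area to its right, found by bisection."""
    target = 1.0 - right_tail_area
    lo, hi = 0.0, df + 20.0 * math.sqrt(2.0 * df) + 50.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

right_15 = chi2_critical(0.05, 15)   # right tailed, alpha = 0.05
left_10 = chi2_critical(0.95, 10)    # left tailed, alpha = 0.05 (area 0.95 to the right)
two_22 = (chi2_critical(0.975, 22), chi2_critical(0.025, 22))  # two tailed, alpha/2 in each tail
print(round(right_15, 3), round(left_10, 3), [round(v, 3) for v in two_22])
```

Note that the left-tailed case is looked up with a right-tail area of 1 - 0.05 = 0.95, and the two-tailed case splits 0.05 into 0.025 per tail.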
Chi-square test Assumptions & Formula
Assumptions
The sample must be randomly selected from the population
The population must be normally distributed for the variable
under study
The observations must be independent of one another

Formula for Chi-square test for a single variance

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$

Degrees of freedom (df) = n-1
n = sample size
$s^2$ = sample variance
$\sigma^2$ = population variance
Testing hypothesis - Population Variance $\sigma^2$

Step I
State the null and alternative hypothesis

Two tailed                          One tailed (Right)                   One tailed (Left)
$H_0: \sigma^2 = \sigma_0^2$        $H_0: \sigma^2 \le \sigma_0^2$       $H_0: \sigma^2 \ge \sigma_0^2$
$H_1: \sigma^2 \neq \sigma_0^2$     $H_1: \sigma^2 > \sigma_0^2$         $H_1: \sigma^2 < \sigma_0^2$

Step II
Level of significance

Step III
State associated test statistics & calculate its value

Step IV
State the critical region

Step V
Make an appropriate decision
Confidence Interval for
Population Variance ($\sigma^2$)

$$\frac{(n-1)s^2}{\chi^2_{df,\,\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{df,\,1-\alpha/2}}$$

Where $\chi^2$ has (n-1) degrees of freedom
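Rejection Region
(the second subscript is the area to the right of the critical value)

Right tailed test
$\chi^2 > \chi^2_{df,\,\alpha}$

Left tailed test
$\chi^2 < \chi^2_{df,\,1-\alpha}$

Two tailed test
$\chi^2 < \chi^2_{df,\,1-\alpha/2}$ or $\chi^2 > \chi^2_{df,\,\alpha/2}$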









Example
A cigarette manufacturer wishes to test the claim that the
variance of the nicotine content of its cigarettes is 0.644.
Nicotine content is measured in mg; assume that it is
normally distributed. A sample of 20 cigarettes has a
standard deviation of 1.00 mg. At $\alpha$ = 0.05, is there
enough evidence to reject the manufacturer's claim?
Also calculate and interpret the 99% CI.

State the null and alternative hypothesis
Level of significance
State associated test statistics & calculate its value
State the critical region
Make an appropriate decision
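The five steps for this example can be sketched as follows. This is one reading of the problem: the claim "variance is 0.644" is treated as a two-tailed test, and the chi-square critical values for 19 df are quoted from a standard table rather than from the slides.

```latex
\begin{align*}
&H_0: \sigma^2 = 0.644 \ (\text{claim}), \qquad H_1: \sigma^2 \neq 0.644, \qquad \alpha = 0.05 \\
&\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{(20-1)(1.00)^2}{0.644} \approx 29.5, \qquad df = 19 \\
&\text{Critical values: } \chi^2_{19,\,0.975} = 8.907, \quad \chi^2_{19,\,0.025} = 32.852 \\
&8.907 < 29.5 < 32.852 \ \Rightarrow\ \text{do not reject } H_0 \\
&\text{99\% CI: } \frac{19(1.00)^2}{38.582} < \sigma^2 < \frac{19(1.00)^2}{6.844}
  \ \Rightarrow\ 0.49 < \sigma^2 < 2.78
\end{align*}
```

Under these assumptions there is not enough evidence to reject the manufacturer's claim. The 99% interval also contains the claimed value 0.644, which is consistent with that decision.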

How to calculate p-values

df = 19
$\chi^2$ = 29.5
$\alpha$ = 0.05

Reject the null hypothesis (significant) if the
observed p-value is less than $\alpha$.

Note: subtract the value from 1 if the test is left tailed.
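For the numbers above (df = 19, chi-square = 29.5), the p-value can be approximated without a table. The sketch below uses only the Python standard library. The helper `chi2_cdf` is written for this illustration, and doubling the smaller tail for a two-tailed test is a common convention assumed here, not something the slides state.

```python
import math

def chi2_cdf(x, df):
    """P(X <= x) for a chi-square variable, using the series for the
    regularized lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0
    k = 0
    while term > 1e-15 * total:
        k += 1
        term *= z / (a + k)
        total += term
    return math.exp(a * math.log(z) - z - math.lgamma(a + 1.0)) * total

df, chi2_stat, alpha = 19, 29.5, 0.05

right_tail = 1.0 - chi2_cdf(chi2_stat, df)     # p-value for a right-tailed test
left_tail = 1.0 - right_tail                   # p-value for a left-tailed test
two_tailed = 2.0 * min(right_tail, left_tail)  # assumed convention for a two-tailed test

print(round(right_tail, 3), round(two_tailed, 3), right_tail < alpha)
```

The right-tail area falls between 0.05 and 0.10, which agrees with the table: 29.5 lies between the 19 df critical values 27.204 (area 0.10) and 30.144 (area 0.05), so the null hypothesis is not rejected at $\alpha$ = 0.05.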
Inference about population variance
(two independent sample)
Testing the Assumption of Equal Population
Variance
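Comparing population variance equality
$\sigma_1^2 = \sigma_2^2$
$\sigma_1^2 \neq \sigma_2^2$

Hypothesis
$H_0: \sigma_1^2 = \sigma_2^2$ or $\sigma_1 = \sigma_2$
$H_a: \sigma_1^2 \neq \sigma_2^2$ or $\sigma_1 \neq \sigma_2$

Ratio of two population variances
$H_0: \sigma_1^2 / \sigma_2^2 = 1 \quad (\sigma_1^2 = \sigma_2^2)$
$H_a: \sigma_1^2 / \sigma_2^2 \neq 1 \quad (\sigma_1^2 \neq \sigma_2^2)$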
F-distribution
The F statistic will be used
The larger of the two variances is placed in the numerator

$$F = \frac{s_1^2}{s_2^2}$$

Properties
Ratio of two independent chi-square variables, each divided by its df
Only positive values are possible, like $\chi^2$
Shape depends on the number of df in each sample (df1 & df2)
The mean value of F is approximately equal to 1
Using an F-table, we can determine a rejection region based on
$\alpha$
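A minimal Python sketch of the F statistic follows, applying the rule that the larger sample variance goes in the numerator. The function name `f_statistic` and the sample values are hypothetical and serve only as an illustration.

```python
def f_statistic(var_a, n_a, var_b, n_b):
    """Return (F, df_numerator, df_denominator), placing the larger
    sample variance in the numerator."""
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1

# Hypothetical sample variances and sample sizes (not from the slides).
F, df1, df2 = f_statistic(10.0, 18, 36.0, 26)
print(F, df1, df2)
```

Because the larger variance is always on top, F is never below 1, and the degrees of freedom follow whichever sample ended up in the numerator.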
Assumption & F critical value
Assumption
The two populations from which the samples were obtained
must be normally distributed
The samples must be independent of each other

F critical value
Find the critical value for a right tailed F test when $\alpha$ = 0.05,
df for the numerator is 15 and df for the denominator is 21.

Find the critical value for a two tailed F test when $\alpha$ = 0.05,
sample size for the numerator is 15 and for the denominator is
21.

If the $\alpha$ value is different (a lower-tail value), e.g. $F_{0.95,\,3,\,5}$:






$$F_{1-\alpha,\,df_1,\,df_2} = \frac{1}{F_{\alpha,\,df_2,\,df_1}}$$
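The look-ups in this section can be sketched as below. The value 2.18 is quoted from a standard F table, not from the slides. For the two-tailed case only the set-up is shown, since the required entry depends on the table in use.

```latex
\begin{align*}
&\text{Right tailed, } \alpha = 0.05,\ df_1 = 15,\ df_2 = 21: \quad F_{0.05,\,15,\,21} \approx 2.18 \\
&\text{Two tailed, } \alpha = 0.05,\ n_1 = 15,\ n_2 = 21: \quad df_1 = 14,\ df_2 = 20,\
  \text{look up } F_{\alpha/2} = F_{0.025,\,14,\,20} \\
&\text{Lower-tail value by the reciprocal rule: } F_{0.95,\,21,\,15} = \frac{1}{F_{0.05,\,15,\,21}}
  \approx \frac{1}{2.18} \approx 0.46
\end{align*}
```

When the larger variance is placed in the numerator, a two-tailed test needs only the upper critical value at $\alpha/2$.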
Testing hypothesis - Two Population Variances

Step I
State the null and alternative hypothesis

Two tailed                              One tailed (Right)                       One tailed (Left)
$H_0: \sigma_1^2 = \sigma_2^2$          $H_0: \sigma_1^2 \le \sigma_2^2$         $H_0: \sigma_1^2 \ge \sigma_2^2$
$H_1: \sigma_1^2 \neq \sigma_2^2$       $H_1: \sigma_1^2 > \sigma_2^2$           $H_1: \sigma_1^2 < \sigma_2^2$

Step II
Level of significance

Step III
State associated test statistics & calculate its value

Step IV
State the critical region

Step V
Make an appropriate decision
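Confidence Interval for the
Ratio of Population Variances ($\sigma_1^2 / \sigma_2^2$)

$$\frac{s_1^2}{s_2^2} \cdot \frac{1}{F_{\alpha/2,\,df_1,\,df_2}} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{s_1^2}{s_2^2} \cdot \frac{1}{F_{1-\alpha/2,\,df_1,\,df_2}}$$

Where $df_1 = n_1 - 1$ & $df_2 = n_2 - 1$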
Rejection Region

Right tailed test
$F > F_{\alpha,\,df}$

Left tailed test
$F < F_{1-\alpha,\,df}$

Two tailed test
$F < F_{1-\alpha/2,\,df}$ or $F > F_{\alpha/2,\,df}$






Example
A medical researcher wishes to see whether the
variance of the heart rates (beats/min) of smokers is
different from the variance of the heart rates of people
who do not smoke. Using $\alpha$ = 0.05, is there enough
evidence to support the claim?
Construct a 95% CI.

State the null and alternative hypothesis
Level of significance
State associated test statistics & calculate its value
State the critical region
Make an appropriate decision

t test for two independent samples
Samples are independent when they are not related

Two different options
When the population variances are not equal
When the population variances are equal

Assumptions
It is a statistical test for comparing the means of two populations
Used when the populations are normally or approximately normally
distributed

Used when
$\sigma$ is unknown
$n \le 30$

Degree of freedom
d.f. = $n - 1$

When variances are unequal

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

Where
$\bar{x}_1 - \bar{x}_2$ is the observed difference between the two sample
means
$\mu_1 - \mu_2 = 0$
The denominator is the standard error of the difference
between the two means

When variances are equal

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}\ \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

Where
$n_1 + n_2 - 2$ = df $[(n_1 - 1) + (n_2 - 1)]$
Pooled estimate of variance
Weighted average of the two sample variances, using
the degrees of freedom of each variance as the weights

Note:


To use t test, first we have to use F test to determine
whether the variances are equal or not.
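The two forms of the t statistic can be sketched in Python as below, with $\mu_1 - \mu_2 = 0$ under the null hypothesis. The function names and the summary statistics are hypothetical and serve only as an illustration.

```python
import math

def t_unequal_var(x1, s1, n1, x2, s2, n2):
    """t statistic when the variances are not assumed equal (H0: mu1 - mu2 = 0)."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return (x1 - x2) / se

def t_pooled(x1, s1, n1, x2, s2, n2):
    """t statistic and df when the variances are assumed equal (H0: mu1 - mu2 = 0)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df  # weighted average of variances
    se = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)
    return (x1 - x2) / se, df

# Hypothetical summary statistics (mean, SD, n) for two groups, not from the slides.
g1 = (52.0, 6.0, 12)
g2 = (47.0, 4.0, 12)

t_w = t_unequal_var(*g1, *g2)
t_p, df_p = t_pooled(*g1, *g2)
print(round(t_w, 3), round(t_p, 3), df_p)
```

With equal sample sizes, as in this made-up example, the two statistics coincide. They differ when $n_1 \neq n_2$, which is why the choice between the two forms matters.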

Testing hypothesis - Two population means

Step I
State the null and alternative hypothesis

Two tailed                  One tailed (Right)           One tailed (Left)
$H_0: \mu_1 = \mu_2$        $H_0: \mu_1 \le \mu_2$       $H_0: \mu_1 \ge \mu_2$
$H_1: \mu_1 \neq \mu_2$     $H_1: \mu_1 > \mu_2$         $H_1: \mu_1 < \mu_2$

Step II
Level of significance

Step III
State associated test statistics & calculate its value

Step IV
State the critical region

Step V
Make an appropriate decision

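Confidence Interval for
Difference of two means
Variances unequal

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

Variances equal

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}\ \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$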
Example
A researcher wishes to determine whether the
salaries of residents employed by private hospitals
are higher than those of residents employed by
government-owned hospitals. She selected a sample
of residents from each type of hospital and
calculated the means and SDs of their salaries. At $\alpha$ =
0.01, can she conclude that the private hospitals pay
more than government hospitals?
Find the CI.

Assume that the populations are approximately normally
distributed and the variances are equal.





State the null and alternative hypothesis
Level of significance
State associated test statistics & calculate its value
State the critical region
Make an appropriate decision

Private             Government
x̄1 = $26800        x̄2 = $25400
s1 = $600           s2 = $450
n1 = 10             n2 = 8
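A worked sketch of the test for this example follows, using the equal-variance (pooled) form as the problem instructs. The critical value 2.583 is quoted from a standard t table for 16 df and a one-tailed $\alpha$ of 0.01. The confidence level for the interval is not stated in the problem, so the interval is not computed here.

```latex
\begin{align*}
&H_0: \mu_1 \le \mu_2, \qquad H_1: \mu_1 > \mu_2 \ (\text{claim}), \qquad \alpha = 0.01 \\
&s_p^2 = \frac{(10-1)(600)^2 + (8-1)(450)^2}{10 + 8 - 2} = \frac{4\,657\,500}{16} = 291\,093.75 \\
&t = \frac{(26\,800 - 25\,400) - 0}{\sqrt{291\,093.75}\ \sqrt{\tfrac{1}{10} + \tfrac{1}{8}}}
   \approx \frac{1400}{255.92} \approx 5.47, \qquad df = 16 \\
&\text{Critical value: } t_{0.01,\,16} = 2.583; \qquad 5.47 > 2.583 \ \Rightarrow\ \text{reject } H_0
\end{align*}
```

Under these assumptions there is enough evidence to conclude that the private hospitals pay more than the government hospitals.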
Thank You
