
STAT 225

Lecture 32-33 Notes

Confidence Intervals
Why do we bother analyzing data? We want to draw conclusions from the data we have observed.
Why can't we just accept our sample mean as the official mean for the population?
Every time we draw a new sample we will more than likely observe a different value for the sample mean
$\bar{X}$. This is due to sampling variability.
Two most common types of formal statistical inference:
Confidence Intervals: Used when we want to provide an estimate for a population parameter.
Significance Tests: Used when we want to assess the evidence provided by the data in favor of
some claim about the population (yes/no question about the population).
Confidence Intervals allow us to estimate the population mean.
The true mean for the population exists and is a fixed real number, but we just don't know what it is.
Using our sample statistic, we can create a net to give us an estimate of where to expect the population
parameter to be. For a fixed sample size the size of the net depends on how much confidence we want
to place on our estimate.
Defn: A level $(1-\alpha)$ confidence interval for a population parameter is an interval computed from the
sample data by a method that has probability $(1-\alpha)$ of producing an interval containing the true value of
the parameter.

If we just take a single sample, our single confidence interval net may or may not include the population
parameter.
However, if we take many samples of the same size and create a confidence interval from each sample
statistic, over the long run $100(1-\alpha)\%$ of our confidence intervals will contain the true population
parameter (if we are using a $100(1-\alpha)\%$ confidence interval).
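The long-run interpretation can be checked with a small simulation. The sketch below is illustrative only: the population mean (100), standard deviation (15), sample size, and number of repetitions are assumed values, not figures from the lecture. It builds the known-$\sigma$ interval $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ for each simulated sample and reports the fraction of intervals that contain the true mean.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage(mu=100.0, sigma=15.0, n=20, conf=0.95, reps=2000, seed=1):
    """Fraction of simulated known-sigma intervals that contain mu.

    All default values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    moe = z * sigma / sqrt(n)                     # same MOE for every sample
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        if xbar - moe <= mu <= xbar + moe:
            hits += 1
    return hits / reps

print("observed coverage:", coverage())
```

Any single interval either contains the mean or it does not; only the long-run fraction should sit near the stated confidence level.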

Confidence interval for a population mean $\mu$:

$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$

where $z_{\alpha/2}$ denotes the $100(1 - \alpha/2)$th percentile from the std. normal distribution.

$\alpha$      $z_{\alpha/2}$     Percentile
.10     1.645     95th
.05     1.96      97.5th
.01     2.576     99.5th
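The critical values in the table can be reproduced from the standard normal quantile function. A minimal sketch using Python's standard library follows; the helper name `z_critical` is an assumption made for this example.

```python
from statistics import NormalDist

def z_critical(alpha):
    """Return z_{alpha/2}, the 100(1 - alpha/2)th percentile of N(0, 1)."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.01):
    percentile = 100 * (1 - alpha / 2)
    print(f"alpha = {alpha:.2f}   percentile = {percentile:.1f}th   z = {z_critical(alpha):.3f}")
```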

Q: For a fixed sample size $n$, what happens to the width of the confidence interval as the confidence level
$(1-\alpha)$ increases?

By increasing the confidence level we are essentially increasing the size of our net.

Q: For a fixed confidence level $(1-\alpha)$, what happens to the width of the confidence interval as the sample
size increases?

A smaller net is good because it gives you more information. It is a smaller range for where to expect
the true population parameter to reside.

In general a confidence interval will take the following form:

(Estimate) $\pm$ (Confidence Coefficient) * (Std. Dev. of Estimate)

We will refer to (Confidence Coefficient) * (Std. Dev. of Estimate) as the Margin of Error (MOE).

Note: The MOE determines the width of the interval. The width is 2*MOE. The estimate is always in
the center of the interval.
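To see how the width responds to the confidence level and the sample size, the sketch below tabulates the MOE and the width 2*MOE for a few combinations. The value $\sigma = 10$ and the sample sizes are assumed purely for illustration.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma, n, conf):
    """MOE = z_{alpha/2} * sigma / sqrt(n) for a known-sigma interval."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return z * sigma / sqrt(n)

sigma = 10.0  # hypothetical population standard deviation
for conf in (0.90, 0.95, 0.99):
    for n in (25, 100):
        moe = margin_of_error(sigma, n, conf)
        print(f"conf = {conf:.2f}  n = {n:3d}  MOE = {moe:.3f}  width = {2 * moe:.3f}")
```

Because $n$ enters through $\sqrt{n}$, quadrupling the sample size only halves the MOE.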
Recall that when an SRS sampling scheme is employed, the mean and standard deviation of the sample
mean are:

$\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

Also recall that if the underlying distribution is normal, then so is the distribution of the sample mean
regardless of the sample size, i.e. if $X \sim N(\mu, \sigma)$ then $\bar{X} \sim N(\mu, \frac{\sigma}{\sqrt{n}})$ for any $n$.

As well, if the sample size is large ($n \ge 30$) and an SRS scheme was used, then no matter what the underlying
distribution of $X$ is, the Central Limit Theorem (CLT) allows us to approximate probabilities about the
sample mean $\bar{X}$.
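The behavior of the sample mean can be checked empirically. The sketch below assumes a strongly right-skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen only for illustration) and compares the mean and standard deviation of many simulated sample means with $\mu$ and $\sigma/\sqrt{n}$.

```python
import random
from math import sqrt
from statistics import fmean, stdev

def simulate_sample_means(n=30, reps=4000, seed=7):
    """Return `reps` sample means, each from an SRS of size n.

    The population is Exponential(rate=1), so mu = 1 and sigma = 1.
    This population is an assumption made for the demonstration.
    """
    rng = random.Random(seed)
    return [fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

means = simulate_sample_means()
print("mean of sample means:", round(fmean(means), 3), "(theory: 1)")
print("sd of sample means:  ", round(stdev(means), 3), "(theory:", round(1 / sqrt(30), 3), ")")
```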

Ex. 1: SAT scores of Purdue University applicants follow a normal distribution with unknown mean $\mu$
and known standard deviation $\sigma = 185$. A sample of 15 applicants' SAT scores produces a sample mean of
1097. Calculate both a 95% and a 99% confidence interval for the true population mean $\mu$.
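One way to check a hand calculation for Ex. 1 is a short script. The function name `z_interval` is an assumption for this sketch; it applies the known-$\sigma$ formula $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ with the values given in the exercise.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf):
    """Known-sigma confidence interval for a population mean."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    moe = z * sigma / sqrt(n)
    return xbar - moe, xbar + moe

# Values from Ex. 1: xbar = 1097, sigma = 185, n = 15
for conf in (0.95, 0.99):
    lo, hi = z_interval(1097, 185, 15, conf)
    print(f"{conf:.0%} CI: ({lo:.1f}, {hi:.1f})")
```

The script uses unrounded critical values, so its endpoints can differ slightly in the last digit from a calculation that uses table values such as 1.96.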

What happens if the MOE is too large?


There are two ways to reduce it:
1) Draw a larger sample
2) Decrease the confidence level

Choosing the correct sample size:


A wise practitioner of statistics never plans data collection without at the same time planning the
inference. You can arrange to have both high confidence and a small margin of error. Here's how:

$MOE = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$

For a given MOE, confidence level, and standard deviation just solve the equation for n. The resulting
quantity gives a lower bound for the sample size. Note: The sample size n affects the MOE while the
size of the population has nothing to do with it.
For example, if the maximum allowable MOE is given by $E$ then we can set up an inequality as follows:

$z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le E$

Solving for $n$ we get the following lower bound for the sample size:

$n \ge \left( \frac{z_{\alpha/2} \, \sigma}{E} \right)^2$
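Since $n$ must be a whole number, the lower bound is rounded up. A minimal sketch follows; the function name and the inputs ($\sigma = 15$, $E = 2$, 95% confidence) are hypothetical and chosen only to illustrate the calculation.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_for_mean(sigma, max_moe, conf):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= max_moe."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil((z * sigma / max_moe) ** 2)  # always round up, never down

n = sample_size_for_mean(sigma=15.0, max_moe=2.0, conf=0.95)
z = NormalDist().inv_cdf(0.975)
print("required n:", n)
print("MOE at n:    ", round(z * 15.0 / sqrt(n), 4))
print("MOE at n - 1:", round(z * 15.0 / sqrt(n - 1), 4))
```

Rounding down would leave the MOE slightly above the allowed maximum, which is why the result is treated as a lower bound.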

Ex. 2: You want to rent an unfurnished one-bedroom apartment in Boston next year. The mean monthly
rent for a random sample of 32 apartments advertised in the local newspaper is $1,400. Assume that the
standard deviation is $220. Find a 95% confidence interval for the mean monthly rent for unfurnished
one-bedroom apartments available for rent in this community.

How large a sample of one-bedroom apartments would be needed to estimate the mean
within $50 with 90% confidence?

Some Cautions: You may only use the formula $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$ under certain circumstances.

The data should be produced from a SRS from the population.

Do not use if the sampling is anything more complicated than an SRS (such as stratified or
multistage sampling).

Data must be collected correctly (no bias). The margin of error covers only random sampling
errors. Under-coverage and non-response errors are not covered.

Outliers can have a big effect on the confidence interval. (This makes sense because we use the
mean and standard deviation to get a CI, recall both of these statistics are sensitive to extreme
observations).

You must know the standard deviation of the underlying population, $\sigma$.

If the sample size is small and the underlying distribution is not Normal, the true confidence level
will be different from the value $(1-\alpha)$ used in computing the interval.

Ex. 3: A questionnaire of drinking habits was given to a random sample of fraternity members, and
each student was asked to report the # of beers he had drunk in the past month. The sample of 30
students resulted in an average of 22 beers with a population standard deviation of 9 beers.
a) Give a 90% confidence interval for the mean number of beers drunk by fraternity
members in the past month.

b) Is it true that 90% of the fraternity members each month drink the number of beers that
lie in the interval you found in part (a)? Explain your answer.

c) What is the margin of error for the 90% confidence interval?

d) How many students should you sample if you want a margin of error of 1 for a 90%
confidence interval?

Ex. 4: A random sample of 30 STAT 225 students' Exam 1 scores yields a mean of 82.83. Assuming
the population standard deviation is 10:
a) Find the 90% confidence interval for the mean score for STAT 225 students.

b) Find the 95% confidence interval.

c) Find the 99% confidence interval.

d) How do the margins of error in (a), (b), and (c) change as the confidence level increases?
Why?

Ex. 5: An agronomist examines the cellulose content of a variety of alfalfa hay. Suppose that the
cellulose content in the population has standard deviation $\sigma = 8$ mg/g. A sample of 32 cuttings has mean
cellulose content $\bar{x} = 145$ mg/g.
a) Give a 95% confidence interval for the mean cellulose content in the population.

Ex. 6: To assess the accuracy of a laboratory scale, a standard weight known to weigh 10 grams is
weighed repeatedly. The scale readings are Normally distributed with unknown mean $\mu$ (which is 10
grams if the scale is unbiased). The standard deviation of the scale readings is known to be .0002 grams.
a) The weight is measured 5 times. The mean result is 10.0023 grams. Give a 98% confidence
interval for the mean of repeated measurements of the weight.

b) How many measurements must be averaged to get a margin of error of .0001 with 98%
confidence?

Confidence Intervals for Proportions


n

X Xj
j1

X n X j # Successes

n j 1 n
# Trials

0 p 1

Note: p is essentially a sample mean where all the data points in the sample take on the value 0 or 1.

p = 0 only when all trials result in failures. p = 1 only when all trials result in successes.
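A tiny illustration of the note above: with outcomes coded as 0 and 1, the sample mean and the success proportion are the same number. The eight outcomes below are made up for the example.

```python
from statistics import fmean

# Hypothetical trial outcomes: 1 = success, 0 = failure
outcomes = [1, 0, 1, 1, 0, 1, 0, 1]

p_hat = sum(outcomes) / len(outcomes)  # (# successes) / (# trials)
x_bar = fmean(outcomes)                # ordinary sample mean of the 0/1 data

print("p_hat =", p_hat, " sample mean =", x_bar)
```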

Q: How large of a SRS sample do we need in order to achieve accurate Normal approximations?

We will use the following rule of thumb:

$np \ge 5$ and $n(1-p) \ge 5$

Of course we don't know $p$, so this is impossible to check and will only come into play when we
hypothesize a particular value for $p$.

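Mean, standard deviation, and standard error of $\hat{p}$:

$\mu_{\hat{p}} = E[\hat{p}] = p$

$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$

$SE_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

Hence, if the rule of thumb is satisfied and SRS is used, by the CLT we have (approximately):

$\hat{p} \sim N\left( p, \sqrt{\frac{p(1-p)}{n}} \right)$

$\frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \sim N(0,1)$

$\frac{\hat{p} - p}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}} \xrightarrow{D} N(0,1)$ for large $n$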

$100(1-\alpha)\%$ Confidence interval for $p$:

$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

We may use this formula only when the rule of thumb for C.I.s is satisfied.
$z^*$                  1.645    1.96    2.576
Confidence level    0.90     0.95    0.99
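The interval for a proportion can be sketched in a few lines. The function name and the counts used below (60 successes in 150 trials) are hypothetical; the formula is the large-sample interval $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$, which uses the standard error because $p$ itself is unknown.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, conf):
    """Large-sample confidence interval for a population proportion p."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    se = sqrt(p_hat * (1 - p_hat) / n)            # standard error of p_hat
    return p_hat - z * se, p_hat + z * se

# Hypothetical data: 60 successes out of 150 trials
lo, hi = proportion_interval(successes=60, n=150, conf=0.95)
print(f"95% CI for p: ({lo:.4f}, {hi:.4f})")
```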

As before, it may be the case that some maximum allowable margin of error, $m$, is given and we want to
find the corresponding sample size needed to achieve a MOE $\le m$:

$n \ge \left( \frac{z_{\alpha/2}}{m} \right)^2 p^*(1-p^*)$

where $p^*$ is some initial estimate of $p$, perhaps available from past studies. If a good estimate is not
available we can always take the most conservative approach and let $p^* = 1/2$.
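The sample-size bound for a proportion, including the conservative choice of 1/2 for the initial estimate, might be coded as follows. The function name, the argument `p_star`, and the inputs (margin of error 0.03 at 95% confidence, initial estimate 0.2) are assumptions made for this sketch.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(max_moe, conf, p_star=0.5):
    """Smallest n with z_{alpha/2} * sqrt(p_star * (1 - p_star) / n) <= max_moe.

    p_star is an initial guess at p; the default 0.5 maximizes
    p_star * (1 - p_star) and so gives the most conservative (largest) n.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil((z / max_moe) ** 2 * p_star * (1 - p_star))

print("conservative (p* = 0.5):", sample_size_for_proportion(0.03, 0.95))
print("with p* = 0.2:          ", sample_size_for_proportion(0.03, 0.95, p_star=0.2))
```

A planning value far from 1/2 can reduce the required sample size noticeably, but only if that value is trustworthy.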

Ex. 7: The South African mathematician John Kerrich, while a prisoner of war during WWII, tossed a coin
1,000 times and obtained 527 heads.
a) Construct a 95% confidence interval for $p$, the true probability of the coin landing on heads.

b) How large of a sample size would John Kerrich need in order to construct a 95% confidence
interval with a margin of error less than .025?

Ex. 8: To obtain an estimate of the proportion, p, of New York City residents who feel that the quality of
life in New York City has become worse in the past few years, a telephone poll by Time/CNN revealed
that 686 out of 1,009 residents said that life has become worse.
a) Give a point estimate for p.

b) Find an approximate 98% confidence interval for p.

Ex. 9: A light bulb manufacturer sells a light bulb that has a mean life of 1,450 hours with a standard
deviation of 33.7 hours. A new manufacturing process is being tested and there is interest in knowing
the mean life of the new bulbs. How large a sample is required so that $\bar{x} \pm 5$ is a 95% confidence
interval for $\mu$? You may assume the change in the standard deviation is minimal.

Ex. 10: A standard 6-sided die has been loaded to change the probability of rolling a 6. In order to
estimate p, the new probability of rolling a 6, how many times must the die be rolled so that we are 99%
confident that the maximum error of the estimate of p is E = 0.02?

Ex. 11: Some college professors and students examined 137 Canadian Geese for patent schistosome in
the year they hatched. Of these 137 birds, 54 were infected. They were interested in estimating p, the
proportion of infected birds of this type. For future studies determine the sample size n so that the
estimate of p is within E = 0.04 of the unknown p with 90% confidence.

Ex. 12: Out of 1,000 welds that have been made on a tower, it is suspected that 15% of the welds are
defective. To estimate p, the proportion of defective welds, how many welds must be inspected to have
95% confidence, approximately, that the maximum error of the estimate of p is 0.04?

Ex. 13: A quality engineer wanted to be 98% confident that the maximum error of the estimate of the
mean strength, $\mu$, of the left hinge on a vanity cover molded by a machine is 0.25. A preliminary sample
of size $n = 32$ parts yielded a sample mean of $\bar{x} = 35.68$ and a sample standard deviation of $s = 1.723$.
Assuming $\sigma \approx s$, find the necessary sample size.

Ex. 14: A well-known bank credit card firm wishes to estimate the proportion of credit card holders who
carry a nonzero balance at the end of the month and incur an interest charge. Assume that the desired
margin of error is 0.03 at 98% confidence.
a) How large a sample should be selected if it is anticipated that roughly 79% of the firm's card holders
carry a nonzero balance at the end of the month?

b) How large a sample should be selected if no planning value for the proportion could be specified?

Ex. 15: Cincinnati/Northern Kentucky International Airport had the second highest on-time arrival rate
for 2005 among the nation's busiest airports (The Cincinnati Enquirer, February 3, 2006). Assume the
findings were based on 455 on-time arrivals out of a sample of 550 flights.
a) Develop a point estimate of the on-time arrival rate for the airport.

b) Construct and interpret a 95% confidence interval for the on-time arrival rate of the population of all
flights at the airport during 2005.
