Confidence Intervals
Hypothesis tests
January 19
Patrick Breheny
Recap
It turns out that the interval (1.9, 3.5) does this job, with a
confidence level of 95%
We will discuss the nuts and bolts of constructing confidence
intervals often during the rest of the course
First, though, we need to understand what a confidence interval is:
Why (1.9, 3.5)? Why not (1.6, 3.3)?
And what the heck does a confidence level of 95% mean?
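One way to make "95% confidence" concrete is the repeated-sampling view: if the whole experiment were replicated many times and an interval computed each time, about 95% of those intervals would cover the truth. A minimal simulation sketch (hypothetical normal data and a textbook normal-approximation interval, not the polio calculation itself):

```python
import random
import statistics

def mean_ci_95(sample):
    """Normal-approximation 95% confidence interval for a mean."""
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / len(sample) ** 0.5
    return (m - 1.96 * se, m + 1.96 * se)

random.seed(1)
true_mean = 2.7          # known here only because we are simulating
reps = 2000
covered = 0
for _ in range(reps):
    sample = [random.gauss(true_mean, 1.0) for _ in range(50)]
    lo, hi = mean_ci_95(sample)
    covered += lo <= true_mean <= hi
print(covered / reps)    # typically close to 0.95
```

In real data the truth is unknown, which is exactly why the procedure's long-run coverage is the only guarantee we can state.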
[Figure: confidence intervals across replications 1-5]
Confidence levels
Amount of information
It is hopefully obvious that the more information you collect,
the less uncertainty you should have about the truth
Doing this experiment on thousands of children should allow
you to pin down the answer to a tighter interval than if only
hundreds of children were involved
It may be surprising that the interval is as wide as it is for the
polio study: after all, hundreds of thousands of children were
involved
However, keep in mind that only a very small percentage of those
children actually contracted polio; the 99.9% of children in
both groups who never got polio tell us very little about
whether the vaccine worked or not
Only about 200 children in the study actually contracted
polio, and these are the children who tell us how effective the
vaccine is (note that 200 is a lot smaller than 400,000!)
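The way sample size tightens an interval can be sketched with the standard 1/sqrt(n) behavior of a proportion's standard error (a generic textbook formula, not the polio analysis itself):

```python
import math

def halfwidth_95(p, n):
    """Half-width of a normal-approximation 95% CI for a proportion."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

# Quadrupling the sample size only halves the interval's half-width:
for n in (100, 400, 1600):
    print(n, halfwidth_95(0.5, n))
```

This is also why 400,000 enrolled children yield less precision than the raw count suggests: the information effectively scales with the roughly 200 polio cases, not the total enrollment.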
Precision of measurement
The final factor that determines the width of a confidence
interval is the precision with which things are measured
I mentioned that the diagnosis of polio is not black and white;
misdiagnoses are possible
Every misdiagnosis increases our uncertainty about the effect
of the vaccine
As another example, consider a study of whether an
intervention reduces blood pressure
Blood pressure is quite variable, so researchers in such studies
will often measure subjects' blood pressure several times at
different points in the day, then take the average
The average will be more precise than any individual
measurement, thereby reducing the researchers' uncertainty
about the effect of the treatment
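The averaging idea can be checked directly: the spread of an average of k independent readings shrinks by a factor of sqrt(k). A toy simulation with made-up blood-pressure numbers (true value 120, reading-to-reading SD 10, both hypothetical):

```python
import random
import statistics

random.seed(2)

def reading():
    # one noisy blood-pressure measurement (hypothetical parameters)
    return random.gauss(120, 10)

def average_of(k):
    return sum(reading() for _ in range(k)) / k

singles = [average_of(1) for _ in range(2000)]
averages = [average_of(4) for _ in range(2000)]
# SD of an average of 4 readings is about half the SD of one reading
print(statistics.stdev(singles), statistics.stdev(averages))
```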
[Figure: widths of 95% and 99% confidence intervals]
Introduction
Confidence intervals and hypothesis tests
Significance
p-value misconceptions
Hypotheses
The specific value corresponds to a certain hypothesis about
the world
For example, in our polio example, a ratio of 1 corresponded
to the hypothesis that the vaccine provides no benefit or harm
compared to placebo
This specific value of interest is called the null hypothesis
("null" referring to the notion that nothing is different
between the two groups; the observed differences are entirely
due to random chance)
The goal of hypothesis testing is to weigh the evidence and
deliver a number that quantifies whether or not the null
hypothesis is plausible in light of the data
p-values
All hypothesis tests are based on calculating the probability
of obtaining results as extreme as or more extreme than
the one observed in the sample, given that the null
hypothesis is true
This probability is denoted p and called the p-value of the test
The smaller the p-value is, the stronger the evidence against
the null:
A p-value of 0.5 says that if the null hypothesis were true, then
we would obtain a sample that looks like the observed sample
50% of the time; the null hypothesis looks quite reasonable
A p-value of 0.001 says that if the null hypothesis were true,
then only 1 out of every 1,000 samples would resemble the
observed sample; the null hypothesis looks doubtful
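As a concrete instance of this definition (a generic coin-flip example, not the polio data): suppose we observe 60 heads in 100 flips of a supposedly fair coin. The p-value is the probability, under the null p = 0.5, of a result at least as far from the expected 50 heads:

```python
from math import comb

def two_sided_p(k, n, p=0.5):
    """P(result at least as far from n*p as k is) for X ~ Binomial(n, p)."""
    dev = abs(k - n * p)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(n + 1)
               if abs(i - n * p) >= dev)

print(two_sided_p(60, 100))   # about 0.057: some evidence against fairness
```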
Conclusion
In general, then, confidence intervals and hypothesis tests lead to
similar conclusions
For example, in our polio example, both methods indicated
that the study provided strong evidence that the vaccine
reduced the probability of contracting polio well beyond what
you would expect by chance alone
This is a good thing; it would be confusing otherwise
However, the information provided by each technique is
different: the confidence interval attempts to provide
likely values for a parameter of interest, while the hypothesis
test attempts to measure the evidence against the
hypothesis that the parameter is equal to a certain, specific
number
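The agreement between the two is no accident: for a typical normal-approximation analysis, a null value falls outside the 95% interval exactly when testing it gives p < .05. A sketch with made-up summary numbers (estimate 2.7, standard error 0.4, null value 1; none of these are the polio figures):

```python
import math

est, se, null = 2.7, 0.4, 1.0       # hypothetical summary statistics

# 95% confidence interval
lo, hi = est - 1.96 * se, est + 1.96 * se

# Two-sided normal p-value; erfc(z / sqrt(2)) equals 2 * (1 - Phi(z))
z = abs(est - null) / se
p = math.erfc(z / math.sqrt(2))

print(null < lo or null > hi, p < 0.05)   # True True: the two agree
```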
p-value cutoffs
Types of error
                        Null hypothesis
                        True            False
p > cutoff (accept)     Correct         Type II error
p < cutoff (reject)     Type I error    Correct
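The Type I error cell can be quantified: if the null is true and we reject whenever p < .05, we commit a Type I error about 5% of the time. A quick simulation sketch, applying a one-sample z-test to data generated under the null (a hypothetical setup, not any study from these slides):

```python
import math
import random
import statistics

random.seed(3)

def z_test_p(sample, null=0.0):
    """Two-sided normal-approximation p-value for a sample mean."""
    se = statistics.stdev(sample) / len(sample) ** 0.5
    z = abs(statistics.mean(sample) - null) / se
    return math.erfc(z / math.sqrt(2))

reps = 2000
false_rejections = sum(
    z_test_p([random.gauss(0.0, 1.0) for _ in range(100)]) < 0.05
    for _ in range(reps))
print(false_rejections / reps)   # near 0.05, the chosen cutoff
```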
Significance
The proper balance of these two sorts of errors certainly
depends on the situation and the type of research being
conducted
That being said, the scientific community generally starts to
be convinced at around the p = .01 to p = .10 level
The term "statistically significant" is often used to describe
p-values below .05; the modifiers "borderline significant"
(p < .1) and "highly significant" (p < .01) are also used
However, don't let these clearly arbitrary cutoffs distract you
from the main idea that p-values measure how far off the data
are from what the theory predicts: a p-value of .04 and a
p-value of 0.000001 are not at all the same thing, even though
both are "significant"
p-value misconceptions
Reporting p-values
Interpretation
Conditional probability
Hypothetical example
Real example
You may be thinking, "that's clearly ridiculous; no one would
reach such a conclusion in real life"
Unfortunately, you would be mistaken: this happens all the
time
As an example, the Women's Health Initiative found that
low-fat diets reduce the risk of breast cancer with a p-value of
.07
The New York Times headline: "Study finds low-fat diets
won't stop cancer"
The lead editorial claimed that the trial represented strong
evidence that the war against fats was mostly in vain, and
sounded the death knell for the belief that reducing the
percentage of total fat in the diet is important for health
Nexium
As an example of statistical vs. clinical significance, consider
the story of Nexium, a heartburn medication developed by
AstraZeneca
AstraZeneca originally developed the phenomenally successful
drug Prilosec
However, with the patent on the drug set to expire, the
company modified Prilosec slightly and showed that for a
condition called erosive esophagitis, the new drug's healing
rate was 90%, compared to Prilosec's 87%
Because the sample size was so large (over 5,000), this finding
was statistically significant, and AstraZeneca called the new
drug Nexium
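The arithmetic behind "significant because n is large" can be sketched with a standard two-proportion z-test. The counts below are hypothetical, chosen only to match the quoted 90% vs. 87% rates (the actual trial's arm sizes are not given in these slides):

```python
import math

def two_prop_p(x1, n1, x2, n2):
    """Two-sided z-test p-value for a difference in proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = abs(p1 - p2) / se
    return math.erfc(z / math.sqrt(2))

# 90% vs 87% is statistically significant with 2,500 per arm...
print(two_prop_p(2250, 2500, 2175, 2500) < 0.05)   # True
# ...but the same rates would not be with 100 per arm
print(two_prop_p(90, 100, 87, 100) < 0.05)         # False
```

A 3-point difference in healing rate can thus be statistically detectable without being clinically meaningful.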
Nexium (cont'd)