Lecture 1
John J. Chen, Ph.D. Assistant Professor in Biostatistics Stony Brook University School of Medicine
Biostatistics for Fellows and Residents of GCRC & Surgery Dept. December 1, 2004
Outline of Lecture 1
1. The importance of medical statistics
2. The goal of statistics
3. How to obtain a good sample?
4. Types of data
5. Descriptive statistics
6. Graphical ways of presenting data
7. Normal distribution
8. Area under the curve
Objectives of Lecture 1
1. To be able to define statistics and to distinguish between population and sample
2. To be able to use real-life examples to illustrate the goal of statistics
3. To be able to define and calculate different descriptive statistics
4. To be able to calculate the area under the curve (AUC) of a normal distribution
Warning Signs
(Figure panels: Reader; Researcher.)
Medical students may not like statistics, but as doctors they will.
Martin Bland, Letter to the Editor, 1998. BMJ; 316:1674.
Medical students may not like statistics, but as good doctors they will have to understand statistics.
John Chen, 2004, Advice to GCRC & Surgery Fellows and Residents
Definition of Statistics
The theory and methodology for study design and for describing, analyzing, and interpreting data generated from such studies.
(Figure: the Population, described by parameters ($\mu$, $\sigma$), and the Sample, described by statistics ($\bar{X}$, S); links labeled Probability and Descriptive Statistics.)
Sampling Techniques
1. Simple random sample
2. Stratified sample
3. Systematic sample
4. Cluster sample
5. Convenience sample
(Figure: types of data; recoverable labels: Quantitative; Numeric, Continuous.)
Measures of variability:
- Standard deviation
- Variance
- Range
Sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$
If the n values are sorted in ascending order, the median is:
- the ((n + 1)/2)th term, if n is odd
- the average of the (n/2)th and (n/2 + 1)th terms, if n is even
(Figure: unimodal and multimodal distributions.)
Measures of Variability
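1. Standard deviation (SD)
Sample SD: $S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
Population SD: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
2. Variance
$S^2$ = sample variance (the square of the sample SD)
$\sigma^2$ = population variance (the square of the population SD)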
An Example
Consider the following values {2, 3, 6, 9, 2} and calculate the following:
Mean? Median? Mode? Range? SD? Variance?
An Example (cont.)
{2, 3, 6, 9, 2}
Mean = $\frac{1}{n}\sum_{i=1}^{n} X_i = \frac{2+2+3+6+9}{5} = \frac{22}{5} = 4.4$
An Example (cont.)
{2, 3, 6, 9, 2}
Order from low to high: {2, 2, 3, 6, 9}
Median = the ((n + 1)/2)th term = ((5 + 1)/2)th term = 3rd term of {2, 2, 3, 6, 9} = 3
An Example (cont.)
{2, 3, 6, 9, 2} Mode = value occurring most frequently =2
An Example (cont.)
{2, 3, 6, 9, 2} Order from low to high: {2, 2, 3, 6, 9} Range = highest value - lowest value = 9 - 2 = 7
An Example (cont.)
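{2, 3, 6, 9, 2}
Standard deviation = $S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{(2-4.4)^2 + (3-4.4)^2 + (6-4.4)^2 + (9-4.4)^2 + (2-4.4)^2}{5 - 1}} = \sqrt{\frac{37.2}{4}} = 3.05$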
An Example (cont.)
{2, 3, 6, 9, 2} Variance = (Standard deviation)$^2$ = $3.05^2$ = 9.30
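The worked example above can be checked with a short script. This is a minimal Python sketch, not part of the original lecture: `median_by_rule` and `sample_sd` are hypothetical helper names that follow the formulas above (median by the (n + 1)/2 rule, SD with the n - 1 denominator), and the results are cross-checked against Python's standard `statistics` module.

```python
import math
import statistics


def median_by_rule(values):
    """Median via the (n + 1)/2 rule: the ((n + 1)/2)th term if n is odd,
    otherwise the average of the (n/2)th and (n/2 + 1)th terms."""
    xs = sorted(values)
    n = len(xs)
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]          # 1-based term -> 0-based index
    return (xs[n // 2 - 1] + xs[n // 2]) / 2


def sample_sd(values):
    """Sample SD: square root of the sum of squared deviations over n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1))


data = [2, 3, 6, 9, 2]
summary = {
    "mean": sum(data) / len(data),
    "median": median_by_rule(data),
    "mode": statistics.mode(data),
    "range": max(data) - min(data),
    "sd": sample_sd(data),
}
summary["variance"] = summary["sd"] ** 2

# The standard library's stdev/variance also divide by n - 1, so they should agree.
assert math.isclose(summary["sd"], statistics.stdev(data))
assert math.isclose(summary["variance"], statistics.variance(data))

for name, value in summary.items():
    print(f"{name:>8}: {value:.2f}")
```

Running it should print mean 4.40, median 3.00, mode 2.00, range 7.00, SD 3.05 and variance 9.30, matching the hand calculation.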
SEX      Frequency   Percent   Cumulative Percent
F        496         77.4      77.4
M        145         22.6      100.0
Total    641         100.0
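The Percent and Cumulative Percent columns are simple ratios of the frequencies. A small sketch, assuming only the two counts shown (F = 496, M = 145):

```python
counts = {"F": 496, "M": 145}   # SEX frequencies from the table above
total = sum(counts.values())

rows = []
cumulative = 0.0
for sex, freq in counts.items():
    pct = 100 * freq / total            # percent of all subjects
    cumulative += pct                   # running (cumulative) percent
    rows.append((sex, freq, round(pct, 1), round(cumulative, 1)))

for sex, freq, pct, cum in rows:
    print(f"{sex:<6}{freq:>6}{pct:>8.1f}{cum:>8.1f}")
print(f"{'Total':<6}{total:>6}{100.0:>8.1f}")
```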
(Figure: bar chart of Count by SEX, F and M.)
(Figure: histogram, N = 641.)
(Figure: box plots by SEX, N = 496 and 145.)
(Figure: plot of Height (cm) and Weight (kg).)
Normal distribution:
- Bell-shaped curve with its highest point at the mean $\mu$
- Symmetric about $\mu$
- Unimodal
- Continuous distribution
- Approaches the horizontal axis but never touches it
(Figure: normal curve marked at $\mu - 2\sigma$ and $\mu + 2\sigma$; standard normal curve marked at Z = -2, -1, 0, +1, +2.)
Two-Sided Tail Probabilities of the Normal Curve, P(|Z| > z)
(columns: z to one decimal; rows: second decimal of z)
Z       .0      ...   .9      1.0     1.1     ...
0.00    1.000   ...   .3681   .3173   .2713   ...
0.01    .9920   ...   .3628   .3125   .2670   ...
0.02    .9840   ...   .3576   .3077   .2627   ...
...
0.09    ...
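Area beyond $\mu \pm 1\sigma$ (Z = -1 and +1): 0.3173 in total, i.e., 0.3173/2 = 0.1587 in each tail; the area within $\mu \pm 1\sigma$ is 1 - 0.3173 = 0.6827 (about 68%).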
(Figure: standard normal curve marked at $-2\sigma$ and $\mu = 0$.)
Z       .0      ...   1.9     2.0     2.1     ...
0.00    1.000   ...   .0574   .0455   .0357   ...
0.01    .9920   ...   .0561   .0444   .0349   ...
0.02    .9840   ...   .0549   .0434   .0340   ...
...
0.09    ...
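Area beyond $\mu \pm 2\sigma$ (Z = -2 and +2): 0.0455 in total, i.e., 0.0455/2 = 0.02275 in each tail; the area within $\mu \pm 2\sigma$ is 1 - 0.0455 = 0.9545 (about 95%).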
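The tabled two-sided tail areas can be reproduced without a table: for a standard normal Z, P(|Z| > z) = erfc(z/√2). The sketch below uses Python's `math.erfc`; the helper name `two_sided_tail` is illustrative.

```python
import math


def two_sided_tail(z):
    """P(|Z| > z) for a standard normal Z; equals erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))


for z in (0.9, 1.0, 1.96, 2.0, 3.0):
    tail = two_sided_tail(z)
    print(f"z = {z:4.2f}: P(|Z| > z) = {tail:.4f}, one tail = {tail / 2:.4f}, "
          f"within ±z = {1 - tail:.4f}")
```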
(Figure: normal curve with $\mu$ = 65 pg/ml, marked at 40, 65 and 90 pg/ml, i.e., at Z = -2, 0 and +2.)
(Figure: normal curve with mean 10, marked at Z = -3.)
Z       .0      ...   2.9     3.0     3.1     ...
0.00    1.000   ...   .0037   .0027   .0019   ...
0.01    .9920   ...   .0036   .0026   .0019   ...
0.02    .9840   ...   .0035   .0025   .0018   ...
...
0.09    ...
P(X < 4): Z = (4 - 10)/2 = -3, so the area = 0.0027/2 = 0.00135.
P(X > 12): Z = (12 - 10)/2 = 1; from the normal table, P(|Z| > 1) = 0.3173, so the area = 0.3173/2 = 0.1587.
(Figure: normal curve with mean 10, marked at 8, 10 and 14; Z = -1 at 8.)
Z       .0      ...   1.0     ...   2.0     ...
0.00    1.000   ...   .3173   ...   .0455   ...
0.01    .9920   ...   .3125   ...   .0444   ...
0.02    .9840   ...   .3077   ...   .0434   ...
...
0.09    ...
Below 8: Z = (8 - 10)/2 = -1, so the area = 0.3173/2 = 0.1587.
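The examples above appear to use a normal distribution with mean 10 and SD 2 (inferred from the worked z-scores, e.g. (12 - 10)/2 = 1; the full problem statements are not recoverable). Under that assumption, Python's `statistics.NormalDist` gives the same areas directly from the cumulative distribution function:

```python
from statistics import NormalDist

# Assumption: mean 10 and SD 2, inferred from the worked z-scores such as (12 - 10)/2 = 1.
X = NormalDist(mu=10, sigma=2)

p_below_4 = X.cdf(4)           # Z = (4 - 10)/2 = -3
p_above_12 = 1 - X.cdf(12)     # Z = (12 - 10)/2 = +1
p_below_8 = X.cdf(8)           # Z = (8 - 10)/2 = -1

print(f"P(X < 4)  = {p_below_4:.5f}")
print(f"P(X > 12) = {p_above_12:.4f}")
print(f"P(X < 8)  = {p_below_8:.4f}")
```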
Review of Lecture 1
1. The importance of medical statistics
2. The goal of statistics
3. How to obtain a good sample?
4. Types of data
5. Descriptive statistics
6. Graphical ways of presenting data
7. Normal distribution
8. Area under the curve
Achieving Objectives
1. To be able to define statistics and to distinguish between population and sample
2. To be able to use real-life examples to illustrate the goal of statistics
3. To be able to define and calculate different descriptive statistics
4. To be able to calculate the area under the curve (AUC) of a normal distribution
Next Month
1. Goals of statistics; descriptive statistics; normal distribution; AUC
2. Sampling distribution; CI; hypothesis testing; p-value; power
3. Common statistical tests: one-sample t-test, two independent samples t-test, two paired samples t-test, chi-squared test, Fisher's exact test
Three Lectures
1. Goals of statistics; descriptive statistics; normal distribution; AUC
2. Sampling distribution; CI; hypothesis testing; p-value; power
3. Common statistical tests: one-sample t-test, two independent samples t-test, two paired samples t-test, chi-squared test, Fisher's exact test
Lecture notes:
http://ms.cc.sunysb.edu/~jjchen
An Example
Establishing a Serum Creatinine Reference Range
200 healthy volunteers aged 25 to 35 years were evaluated for serum creatinine (sCr). The values (mg/dL) follow approximately a normal distribution, $N(1.2, 0.2^2)$, i.e., mean 1.2 and SD 0.2. What is the 95% (middle-range) reference (or normal) range?
Solution: $Z = (X - \mu)/\sigma$, so $X = \mu + Z\sigma$, with Z = -1.96 and +1.96 bounding the middle 95%.
Therefore, X(low) = 1.2 - 1.96 × 0.2 = 0.8 mg/dL and X(high) = 1.2 + 1.96 × 0.2 = 1.6 mg/dL.
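The same reference range can be computed from the normal model in a few lines. This sketch uses `statistics.NormalDist` (Python 3.8+); the exact multiplier is 1.95996..., which the slide rounds to 1.96.

```python
from statistics import NormalDist

scr = NormalDist(mu=1.2, sigma=0.2)        # sCr in healthy volunteers, mg/dL
z = NormalDist().inv_cdf(0.975)            # multiplier for the middle 95%, about 1.96

low = scr.mean - z * scr.stdev
high = scr.mean + z * scr.stdev
# Equivalent shortcut: scr.inv_cdf(0.025) and scr.inv_cdf(0.975)
print(f"95% reference range: {low:.2f} to {high:.2f} mg/dL")
```

It prints a range of about 0.81 to 1.59 mg/dL, which rounds to the slide's 0.8 to 1.6 mg/dL.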
Outline of Lecture 2
1. Sampling distribution
2. Central Limit Theorem
3. Confidence interval
4. t-distribution
5. Hypothesis testing
6. Types I & II errors, statistical power
7. p-value
Lecture notes:
http://ms.cc.sunysb.edu/~jjchen
Objectives of Lecture 2
1. To describe sampling distribution and Central Limit Theorem, and comprehend their importance in Statistics
2. To correctly construct and interpret confidence intervals for population means
3. To describe basic steps of hypothesis testing, using real-life examples
4. To correctly define type I & II errors, statistical power, effect size
5. To correctly interpret p-value of a statistical test
Sampling Distribution
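The distribution of individual observations versus the distribution of sample means
(Figure: the population distribution and the sampling distribution of $\bar{X}$.)
1. If the population is normal (exactly) or n is large (approximately, by the Central Limit Theorem), $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$.
2. Given that the population has mean $\mu$, the mean of the sampling distribution of $\bar{X}$ is $\mu_{\bar{X}} = \mu$.
3. If the population has variance $\sigma^2$, the standard deviation of the sampling distribution, or the standard error (a measure of the amount of sampling error), is $\text{s.e.}(\bar{X}) = \sigma_{\bar{X}} = \sigma/\sqrt{n}$.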
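A quick simulation illustrates the mean and standard error of the sampling distribution (and the Central Limit Theorem): even for a skewed population, sample means centre on $\mu$ and their SD is close to $\sigma/\sqrt{n}$. The population here (exponential with mean and SD 10), the sample size n = 30, the number of repetitions and the seed are all arbitrary choices for illustration.

```python
import random
import statistics

random.seed(2004)                 # fixed seed for a reproducible run
rate = 0.1                        # hypothetical skewed population: exponential, mean = SD = 1/rate
mu = sigma = 1 / rate
n, n_samples = 30, 10_000

# Draw many samples of size n and keep each sample mean.
means = [statistics.fmean(random.expovariate(rate) for _ in range(n))
         for _ in range(n_samples)]

print(f"mean of sample means: {statistics.fmean(means):.2f}   (population mean = {mu:.2f})")
print(f"SD of sample means:   {statistics.stdev(means):.3f}  (sigma/sqrt(n) = {sigma / n ** 0.5:.3f})")
```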
Confidence Intervals
95% CI for $\mu$? Prob(?? < $\mu$ < ??) = 0.95
Z       .0      ...   1.8     1.9     2.0     ...
0.00    1.000   ...   .0719   .0574   .0455   ...
...
0.05    .9601   ...   .0643   .0512   .0404   ...
0.06    .9522   ...   .0629   .0500   .0394   ...
From the table, P(|Z| > 1.96) = 0.0500, so
Prob($\bar{X} - 1.96\,\sigma/\sqrt{n} < \mu < \bar{X} + 1.96\,\sigma/\sqrt{n}$) = 0.95
Confidence Intervals
95% Confidence Interval for $\mu$: $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$
Definition 1 (informal): We can be 95% confident that the interval between the lower and upper bounds contains the true mean ($\mu$).
Definition 2: In repeated sampling, 95% of the intervals constructed from sample means ($\bar{X}$) will contain the true mean ($\mu$).
Confidence Interval
A simulation demo:
http://www.ruf.rice.edu/%7Elane/stat_sim/conf_interval/index.html
Confidence Intervals
90% CI for $\mu$?
(Two-Sided Tail Probabilities of the Normal Curve)
Z       ...   1.5     1.6     1.7     ...
0.00    ...   .1336   .1096   .0891   ...
...
0.04    ...   .1236   .1010   .0819   ...
0.05    ...   .1211   .0989   .0801   ...
0.06    ...   .1188   .0969   .0784   ...
...
0.09    ...   .1118   .0910   .0735   ...
Confidence Intervals
CIs for $\mu$:
90% CI: $\bar{X} \pm 1.65\,(\sigma/\sqrt{n})$
95% CI: $\bar{X} \pm 1.96\,(\sigma/\sqrt{n})$
99% CI: $\bar{X} \pm 2.58\,(\sigma/\sqrt{n})$
Confidence Intervals
Problem:
A fellow wanted to determine the average serum creatinine level among healthy elderly male subjects from Stony Brook village. From the literature he found that the standard deviation of serum creatinine is around 0.15 mg/dL for various studied patient groups, but he could not find any information about the mean ($\mu$) of serum creatinine among local elderly males. The fellow decided to measure 30 healthy elderly male volunteers from Stony Brook, and their average creatinine level was 0.94 mg/dL. What is the 95% CI for $\mu$?
Confidence Intervals
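Solution: 95% CI = $\bar{X} \pm 1.96\,(\sigma/\sqrt{n}) = 0.94 \pm 1.96 \times (0.15/\sqrt{30}) = 0.94 \pm 0.05$ = (0.89, 0.99) mg/dL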
Confidence Intervals
Problem: Total knee replacement usually requires a few days of hospital stay after the surgery. The length of stay was recorded for 90 patients with total knee replacement at Hospital XYZ. The sample mean was 4.20 days and the sample s.d.=1.05. Construct a 90% CI for the population mean length of hospital stay for total knee replacement.
Confidence Intervals
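Solution: As n = 90 is relatively large, the sample s.d. can be used to approximate the population s.d. 90% CI = $\bar{X} \pm 1.65\,(\sigma/\sqrt{n}) = 4.20 \pm 1.65 \times (1.05/\sqrt{90}) = 4.20 \pm 0.18$ = (4.02, 4.38) days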
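Both z-based intervals can be computed with one small function. In this sketch, `z_interval` is a hypothetical helper; it takes the multiplier from `NormalDist().inv_cdf`, so the 90% interval uses z = 1.645 rather than the rounded 1.65, which does not change the two-decimal result here.

```python
import math
from statistics import NormalDist


def z_interval(xbar, sigma, n, level=0.95):
    """CI for a mean with known (or large-sample) SD: xbar ± z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # e.g. 1.96 for 95%
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width


examples = [
    ("sCr, 95%", (0.94, 0.15, 30, 0.95)),   # sigma = 0.15 mg/dL from the literature
    ("LOS, 90%", (4.20, 1.05, 90, 0.90)),   # sample s.d. standing in for sigma (n = 90)
]
for label, args in examples:
    low, high = z_interval(*args)
    print(f"{label}: ({low:.2f}, {high:.2f})")
```

It prints (0.89, 0.99) mg/dL for the creatinine example and (4.02, 4.38) days for the length-of-stay example.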
The t - Distribution
William S. Gosset (1876-1937)
For a small sample from a normal distribution with unknown population standard deviation, $t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$ follows a t-distribution with n - 1 degrees of freedom.
The (Student's) t-distribution is very similar to the normal distribution, but with heavier tails.
(tail probabilities of the t-distributions; 2Q = two-sided, Q = one-sided in parentheses)
d.f.  0.10 (0.05)  0.05 (0.025)  0.01 (0.005)  0.005 (0.0025)  0.001 (0.0005)
1     6.3138       12.706        63.657        127.32          636.62
2     2.9200       4.3026        9.9251        14.0911         31.6075
3     2.3534       3.1825        5.8408        7.4533          12.9258
4     2.1318       2.7764        4.6040        5.5980          8.6087
5     2.0151       2.5706        4.0323        4.7734          6.8701
6     1.9432       2.4469        3.7075        4.3169          5.9590
7     1.8946       2.3646        3.4995        4.0293          5.4088
8     1.8595       2.3060        3.3555        3.8326          5.0421
9     1.8331       2.2621        3.2498        3.6895          4.7805
10    1.8125       2.2281        3.1693        3.5814          4.5871
11    1.7959       2.2010        3.1057        3.4967          4.4374
12    1.7823       2.1788        3.0545        3.4285          4.3184
13    1.7709       2.1604        3.0122        3.3726          4.2215
14    1.7613       2.1448        2.9768        3.3258          4.1412
15    1.7530       2.1314        2.9467        3.2862          4.0735
16    1.7459       2.1199        2.9207        3.2521          4.0157
17    1.7396       2.1098        2.8982        3.2226          3.9659
18    1.7341       2.1009        2.8784        3.1967          3.9224
19    1.7291       2.0930        2.8609        3.1738          3.8841
20    1.7247       2.0860        2.8453        3.1535          3.8502
21    1.7207       2.0796        2.8313        3.1353          3.8200
22    1.7171       2.0739        2.8187        3.1189          3.7928
23    1.7139       2.0687        2.8073        3.1041          3.7683
24    1.7109       2.0639        2.7969        3.0906          3.7461
25    1.7081       2.0595        2.7874        3.0783          3.7258
26    1.7056       2.0555        2.7787        3.0670          3.7073
27    1.7033       2.0518        2.7707        3.0566          3.6903
28    1.7011       2.0484        2.7632        3.0470          3.6746
29    1.6991       2.0452        2.7564        3.0382          3.6601
30    1.6973       2.0423        2.7500        3.0299          3.6466
Confidence Intervals
Problem:
A fellow wanted to determine the average serum creatinine level among healthy elderly adult male subjects from Stony Brook village. In the literature she could not find any information on $\mu$ or $\sigma$ of sCr among local healthy elderly males. She measured 15 healthy elderly male volunteers from Stony Brook; the sample mean sCr was 0.94 mg/dL, with a sample standard deviation of 0.15 mg/dL. What is the 95% CI for $\mu$?
Critical t-value
(tail probabilities of the t-distributions)
d.f.  0.10 (0.05)  0.05 (0.025)  0.01 (0.005)  0.005 (0.0025)  0.001 (0.0005)
1     ...
...
13    1.7709       2.1604        3.0122        ...             ...
14    1.7613       2.1448        2.9768        ...             ...
15    1.7530       2.1314        2.9467        ...             ...
...
Confidence Intervals
Solution: with n - 1 = 14 degrees of freedom, 95% CI for $\mu$ = $\bar{X} \pm 2.14\,(s/\sqrt{n}) = 0.94 \pm 2.14 \times (0.15/\sqrt{15}) = 0.94 \pm 0.08$ = (0.86, 1.02) mg/dL
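The t-based interval differs only in the multiplier. Python's standard library has no t quantile function, so this sketch copies the critical values from the t table (the dictionary `T_CRIT_95` is an illustrative assumption covering only d.f. 13 to 15); with SciPy installed, `scipy.stats.t.ppf(0.975, df)` would supply them instead.

```python
import math

# Two-sided 0.05 critical values copied from the t table (d.f. = 13, 14, 15 only).
T_CRIT_95 = {13: 2.1604, 14: 2.1448, 15: 2.1314}


def t_interval(xbar, s, n):
    """95% CI for a mean when sigma is unknown: xbar ± t(n - 1) * s / sqrt(n)."""
    t = T_CRIT_95[n - 1]                 # KeyError for sample sizes outside the copied rows
    half_width = t * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


low, high = t_interval(0.94, 0.15, 15)
print(f"95% CI: ({low:.2f}, {high:.2f}) mg/dL")
```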
Hypothesis Testing
Example: The serum creatinine normal range depends on the population studied. A fellow wanted to evaluate the mean serum creatinine among adult males living in Stony Brook. In the literature she found one well-established study showing an average sCr of 1.18 mg/dL for adult males living on the West Coast. Based on her knowledge and experience, however, she believed that the mean ($\mu$) sCr among local adult males should be different. She decided to check this by measuring sCr in 49 local adult male volunteers.
Hypothesis Testing
Basic steps of hypothesis testing:
1. State the null (H0) and alternative (H1) hypotheses
2. Choose a significance level, $\alpha$ (usually 0.05 or 0.01)
3. Determine the critical (or rejection) region and the non-rejection region, based on the sampling distribution
4. Based on the sample, calculate the test statistic and compare it with the critical values
5. Make a decision, and state the conclusion
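Critical value
Assume the population standard deviation is 0.15 mg/dL (based on the literature for other similar studies).
Step 1. State H0 and H1: H0: $\mu_{sCr} = 1.18$ vs. H1: $\mu_{sCr} \neq 1.18$.
Test statistic, with sample mean $\bar{X} = 1.22$: $Z = \frac{\bar{X} - \mu_{sCr}}{\sigma/\sqrt{n}} = \frac{1.22 - 1.18}{0.15/\sqrt{49}} = 1.87$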
Statistical Decision
Type II error: $\beta$
Power: $1 - \beta$
Statistical Decision
                      Truth
Decision              H0 True                   H0 False
Reject H0             $\alpha$ (Type I error)   $1 - \beta$ (power)
Not reject H0         $1 - \alpha$              $\beta$ (Type II error)
Note: Statistically significant does not necessarily mean biologically (or clinically) significant!
p-Values
Interpretation: The p-value is the probability of obtaining a result as extreme as, or more extreme than, the one observed in the current sample, given that the null hypothesis is true.
p-Values
Stony Brook Adult Male sCr Example: H0: $\mu$ = 1.18, $\sigma$ = 0.15, $\bar{X}$ = 1.22
p-value = ?
(Figure: sampling distribution centered at 1.18, with the observed mean 1.22 marked at Z = 1.87.)
Z       0.0   ...   1.7   1.8     ...
...
0.07    ...   ...   ...   0.061   ...
...
0.09    ...
p-Values
Stony Brook Adult Male sCr Example: H0: $\mu$ = 1.18, $\sigma$ = 0.15, $\bar{X}$ = 1.22
p-value = 0.061 (the two-sided area beyond Z = -1.87 and Z = 1.87)
p-Values
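What if $\bar{X}$ = 1.23? H0: $\mu$ = 1.18, $\sigma$ = 0.15, n = 49.
$Z = \frac{\bar{X} - \mu_{sCr}}{\sigma/\sqrt{n}} = \frac{1.23 - 1.18}{0.15/\sqrt{49}} = 2.33$
p-value = ?
(Figure: sampling distribution centered at 1.18, with the observed mean 1.23 marked at Z = 2.33.)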
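Steps 4 and 5 for both versions of the example can be sketched as a two-sided one-sample z-test. The function name `one_sample_z_test` is hypothetical; the p-value uses the unrounded Z, so for $\bar{X}$ = 1.22 it comes out near 0.062 rather than the table-based 0.061.

```python
import math
from statistics import NormalDist


def one_sample_z_test(xbar, mu0, sigma, n):
    """Two-sided z-test of H0: mu = mu0 with known sigma; returns (z, p)."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))      # area in both tails beyond |z|
    return z, p


results = {}
for xbar in (1.22, 1.23):
    z, p = one_sample_z_test(xbar, mu0=1.18, sigma=0.15, n=49)
    results[xbar] = (z, p)
    decision = "reject H0" if p < 0.05 else "do not reject H0"
    print(f"xbar = {xbar}: Z = {z:.2f}, p = {p:.3f} -> {decision} at alpha = 0.05")
```

With $\bar{X}$ = 1.22 the sketch does not reject H0 (p about 0.062); with $\bar{X}$ = 1.23 it rejects H0 (p about 0.020).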
Review of Lecture 2
1. Sampling distribution
2. Central Limit Theorem
3. Confidence interval
4. t-distribution
5. Hypothesis testing
6. Types I & II errors, statistical power
7. p-value
Achieving Objectives
1. To describe sampling distribution and Central Limit Theorem, and comprehend their importance in Statistics
2. To correctly construct and interpret confidence intervals for population means
3. To describe basic steps of hypothesis testing, using real-life examples
4. To correctly define type I & II errors, statistical power, effect size
5. To correctly interpret p-value of a statistical test
Next Month
1. Goals of statistics; descriptive statistics; normal distribution; AUC
2. Sampling distribution; CI; hypothesis testing; p-value; power
3. Common statistical tests: one-sample t-test, two independent samples t-test, two paired samples t-test, chi-squared test, Fisher's exact test
Lecture notes:
http://ms.cc.sunysb.edu/~jjchen
Three Lectures
1. Goals of statistics; descriptive statistics; normal distribution; AUC
2. Sampling distribution; CI; hypothesis testing; p-value; power
3. Common statistical tests: one-sample t-test, two independent samples t-test, two paired samples t-test, chi-squared test, Fisher's exact test
Outline of Lecture 3
1. Hypothesis testing
2. Types I & II errors, statistical power
3. p-value
4. One-sample t-test
5. Two independent samples t-test
6. Two paired samples t-test
7. Chi-squared test & Fisher's exact test
8. Local biostatistical resources
Lecture notes:
http://ms.cc.sunysb.edu/~jjchen
Statistical Decision
Design factors:
- effect size
- power ($1 - \beta$)
- alpha level
- standard deviation
- sample size
(Figure: rejection and non-rejection regions around the critical value; Power: $1 - \beta$; Type II error: $\beta$.)
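How the design factors interact can be seen from the power formula for a two-sided one-sample z-test, power = Φ(δ√n/σ - z_{α/2}) + Φ(-δ√n/σ - z_{α/2}), where δ is the true difference from μ0. This is a sketch under that textbook z-test model, not a formula taken from the lecture; δ = 0.04 and σ = 0.15 below are illustrative values borrowed from the sCr example, and `z_test_power` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def z_test_power(delta, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test when the true mean differs from mu0 by delta."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = delta * math.sqrt(n) / sigma          # standardized effect size times sqrt(n)
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Larger effect size, larger n, larger alpha or smaller SD all increase power.
for n in (25, 49, 100, 200):
    print(f"n = {n:3d}: power = {z_test_power(0.04, 0.15, n):.2f}")
```

With no true difference (δ = 0) the "power" equals α, which is a quick sanity check on the formula.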
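(Figure: sampling distribution centered at 100, marked at 88 and 112, corresponding to t = -2.0 and +2.0.)
$t_{24,\,0.05}$ = ?
Critical t Value
(tail probabilities of the t-distributions)
d.f.  0.10 (0.05)  0.05 (0.025)  0.01 (0.005)  0.005 (0.0025)  0.001 (0.0005)
1     ...
...
23    1.7139       2.0687        2.8073        3.1041          3.7683
24    1.7109       2.0639        2.7969        3.0906          3.7461
25    1.7081       2.0595        2.7874        3.0783          3.7258
...
$t = \frac{112 - 100}{30/\sqrt{25}} = \frac{12}{6} = 2.0$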
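The one-sample t computation above, with its comparison against the tabled critical value, as a short sketch. Reading the slide's numbers as μ0 = 100, $\bar{X}$ = 112, s = 30 and n = 25 is an assumption (the problem statement itself did not survive), and the critical value 2.0639 is copied from the t table for d.f. = 24.

```python
import math

# Assumed reading of the worked example: mu0 = 100, xbar = 112, s = 30, n = 25.
mu0, xbar, s, n = 100, 112, 30, 25
t = (xbar - mu0) / (s / math.sqrt(n))      # (112 - 100) / (30 / 5) = 2.0
t_crit = 2.0639                            # two-sided 0.05, d.f. = 24, from the t table

print(f"t = {t:.2f} on {n - 1} d.f.; critical value = {t_crit}")
print("reject H0" if abs(t) > t_crit else "do not reject H0")
```

Because t = 2.0 falls just short of 2.0639, this sketch does not reject H0 at the two-sided 0.05 level.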
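(Figure label: Population 2 ($\mu_2$, $\sigma$).)
Standard error of the difference in means: $S_{\bar{x}_1 - \bar{x}_2} = S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
Pooled variance: $S_p^2 = \frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2}$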
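$\bar{x}_1 = 20.1$, $\bar{x}_2 = 18.9$, so $\bar{x}_1 - \bar{x}_2 = 1.2$
$S_p^2 = \frac{832.59 + 618.75}{22} = 65.97$
$S_{\bar{x}_1 - \bar{x}_2} = S_p \sqrt{\frac{1}{12} + \frac{1}{12}} = 3.3$
$t = \frac{1.2}{3.3} = 0.36$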
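$t_{0.05,\,22}$ = ?
t Table
(tail probabilities of the t-distributions)
d.f.  0.10 (0.05)  0.05 (0.025)  0.01 (0.005)  0.005 (0.0025)  0.001 (0.0005)
1     ...
...
21    1.7207       2.0796        2.8313        3.1353          3.8200
22    1.7171       2.0739        2.8187        3.1189          3.7928
23    1.7139       2.0687        2.8073        3.1041          3.7683
...
$t_{0.05,\,22}$ = 2.07
(Figure: t-distribution with critical values -2.07 and 2.07; the observed t = 0.36 lies between them, in the non-rejection region.)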
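The pooled-variance calculation above can be sketched as follows. The group variances are back-calculated from the sums of squares in the worked example (832.59/11 = 75.69 and 618.75/11 = 56.25, assuming n1 = n2 = 12), and `pooled_t` is an illustrative helper name.

```python
import math


def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Two independent samples t statistic with pooled variance; returns (t, d.f.)."""
    df = n1 + n2 - 2
    sp2 = (var1 * (n1 - 1) + var2 * (n2 - 1)) / df       # pooled variance Sp^2
    se = math.sqrt(sp2) * math.sqrt(1 / n1 + 1 / n2)      # SE of mean1 - mean2
    return (mean1 - mean2) / se, df


# Assumption: variances back-calculated from the sums of squares, n1 = n2 = 12.
t, df = pooled_t(20.1, 832.59 / 11, 12, 18.9, 618.75 / 11, 12)
t_crit = 2.0739      # two-sided 0.05, d.f. = 22, from the t table
print(f"t = {t:.2f} on {df} d.f.; |t| < {t_crit}: {abs(t) < t_crit}")
```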
Same subject for both treatments:
- placebo (X1) versus active (X2)
- before (X1) versus after (X2)
Intra-individual comparison, e.g., left (X1) versus right (X2)
Approach: reduce the data to a one-sample t-test problem. First, calculate the difference, d = X2 - X1, for each subject; then perform a one-sample t-test on the d scores, with d.f. = n - 1.
$t = \frac{\bar{d} - 0}{S_d/\sqrt{n}}$
Paired t - Test
Problem: Does the medication significantly lower blood pressure?
Subject                  1     2     3     4     5
Reaction to Placebo      150   180   148   172   160
Reaction to Med.         130   148   126   150   136
Paired t - Test
Subject                     1     2     3     4     5     Total
Reaction to Placebo         150   180   148   172   160
Reaction to Medication      130   148   126   150   136
d (Placebo - Medication)    20    32    22    22    24    120
Paired t - Test
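Solution: $\bar{d} = \frac{\sum_i d_i}{n} = \frac{120}{5} = 24$
$S_d = \sqrt{\frac{\sum_i (d_i - \bar{d})^2}{n - 1}} = \sqrt{\frac{88}{4}} = \sqrt{22} = 4.69$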
Critical t value: $t_{5-1,\,0.05}$ = ?
(tail probabilities of the t-distributions)
d.f.  0.10 (0.05)  0.05 (0.025)  0.01 (0.005)  0.005 (0.0025)  0.001 (0.0005)
1     ...
...
3     2.3534       3.1825        5.8408        7.4533          12.9258
4     2.1318       2.7764        4.6040        5.5980          8.6087
5     2.0151       2.5706        4.0323        4.7734          6.8701
...
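The paired analysis reduces to a one-sample t statistic on the differences. This sketch uses the subject data from the table and takes d as placebo minus medication, matching the d column; the resulting t is compared with the tabled values for d.f. = 4.

```python
import math
import statistics

placebo = [150, 180, 148, 172, 160]
medication = [130, 148, 126, 150, 136]

d = [p - m for p, m in zip(placebo, medication)]   # placebo minus medication
n = len(d)
d_bar = statistics.fmean(d)
s_d = statistics.stdev(d)                          # SD of the differences (n - 1 denominator)
t = (d_bar - 0) / (s_d / math.sqrt(n))             # one-sample t on the d scores

print(f"d = {d}, mean d = {d_bar}, S_d = {s_d:.2f}, t = {t:.2f} on {n - 1} d.f.")
```

It gives t ≈ 11.44, far beyond both 2.1318 (one-sided 0.05) and 2.7764 (two-sided 0.05) for d.f. = 4.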
Paired t - Test
$\chi^2$ Tests
Critical values of the $\chi^2$ distribution (upper-tail probability)
d.f.  0.100    0.050    0.010    0.005    0.001
1     2.7055   3.8415   6.6349   7.8944   10.828
2     4.6052   5.9915   9.2102   10.5963  13.8173
3     6.2514   7.8147   11.3447  12.8383  16.2672
4     7.7795   9.4877   13.2768  14.8605  18.4667
5     9.2363   11.0705  15.0864  16.7495  20.5165
6     10.6447  12.5916  16.8118  18.5479  22.4599
7     12.0171  14.0671  18.4751  20.2776  24.3219
8     13.3616  15.5073  20.0900  21.9549  26.1237
9     14.6836  16.9190  21.6658  23.5891  27.8768
10    15.9872  18.3071  23.2095  25.1886  29.5871
11    17.2750  19.6751  24.7250  26.7569  31.2628
12    18.5493  21.0261  26.2170  28.2999  32.9099
13    19.8120  22.3621  27.6882  29.8195  34.5283
14    21.0641  23.6848  29.1409  31.3198  36.1258
15    22.3071  24.9958  30.5778  32.8014  37.6973
16    23.5418  26.2962  31.9998  34.2675  39.2520
17    24.7690  27.5871  33.4086  35.7186  40.7908
18    25.9894  28.8693  34.8052  37.1562  42.3131
19    27.2036  30.1435  36.1912  38.5823  43.8206
20    28.4120  31.4104  37.5660  39.9970  45.3141
21    29.6151  32.6706  38.9321  41.4017  46.7982
22    30.8133  33.9244  40.2893  42.7955  48.2678
23    32.0069  35.1725  41.6383  44.1808  49.7262
24    33.1962  36.4151  42.9797  45.5585  51.1831
25    34.3816  37.6525  44.3144  46.9280  52.6165
$\chi^2$ Tests
Goodness of Fit: observed frequencies on a single variable are compared with a corresponding set of expected (or theoretical) frequencies:
$\chi^2 = \sum \frac{(O - E)^2}{E}$
$\chi^2$ Tests
$\chi^2$ = 1.6 + 0.4 + 0.4 + 2.5 + 1.6 + 2.5 = 9.0, with d.f. = (# of categories) - 1 = 6 - 1 = 5. From the Chi-sq. Table, $\chi^2_{5,\,0.10}$ = 9.2363 > 9.0, i.e., do not reject the null hypothesis; the p-value is slightly above 0.10 (about 0.11).
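The p-value can be computed rather than bracketed from the table. Python's standard library has no chi-square distribution, so this sketch implements the upper tail from the power series for the regularized lower incomplete gamma function; the helper `chi2_sf` is illustrative and intended only for moderate values like those in the table (SciPy users could call `scipy.stats.chi2.sf` instead).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df d.f. > x).

    Uses gamma(s, h) = h**s * e**(-h) * sum_k h**k / (s (s+1) ... (s+k)),
    with s = df/2 and h = x/2, then divides by Gamma(s).
    """
    if x <= 0:
        return 1.0
    s, h = df / 2, x / 2
    term = 1.0 / s                 # k = 0 term of the series
    total = term
    k = 0
    while term > 1e-15 * total:    # terms eventually shrink geometrically
        k += 1
        term *= h / (s + k)
        total += term
    lower = total * math.exp(s * math.log(h) - h - math.lgamma(s))   # regularized P(s, h)
    return 1.0 - lower


chi2_stat = 1.6 + 0.4 + 0.4 + 2.5 + 1.6 + 2.5
dof = 6 - 1
print(f"chi2 = {chi2_stat:.1f}, d.f. = {dof}, p = {chi2_sf(chi2_stat, dof):.3f}")
```

For $\chi^2$ = 9.0 on 5 d.f. it returns about 0.109.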
$\chi^2$ Tests
Test of Independence: two categorical variables are involved, and the observed and expected frequencies are compared. Here the expected frequencies are those the researcher would expect if the two variables were independent of each other.
Observed:
        R1    R2    Total
C1      A     B     A+B
$\chi^2$ Tests of Independence
Observed:
          BIOP+   BIOP-   Total
DRE+      50      10      60
DRE-      20      20      40
Total     70      30      100
$\chi^2$ Tests of Independence
Solution:
O     50    10    20    20
E     ?     ?     ?     ?
Each expected count is (row total × column total)/N; e.g., for DRE+/BIOP+, E = (60 × 70)/100 = 42
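$\chi^2$ Tests of Independence
Solution:
O             50     10     20     20
E             42     18     28     12
(O-E)         8      -8     -8     8
(O-E)^2       64     64     64     64
(O-E)^2/E     1.52   3.56   2.29   5.33
$\chi^2 = \sum \frac{(O - E)^2}{E}$ = 12.70, with d.f. = (2 - 1)(2 - 1) = 1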
$\chi^2$ Table
d.f.   ...   0.005    0.001    ...
1      ...   7.8944   10.828   ...
2
3
4
...
Since the $\chi^2$ statistic exceeds 10.828 (d.f. = 1), the DRE and biopsy results are not independent (p < 0.001).
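The expected counts and $\chi^2$ statistic for the DRE/biopsy table can be sketched in a few lines. No continuity correction is applied, matching the hand calculation; for 1 d.f. the upper tail is P($\chi^2$ > x) = erfc(√(x/2)), which avoids needing a chi-square table.

```python
import math

observed = [[50, 10],   # DRE+ : BIOP+, BIOP-
            [20, 20]]   # DRE- : BIOP+, BIOP-

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n     # expected count under independence
        chi2 += (o - e) ** 2 / e

dof = (len(observed) - 1) * (len(observed[0]) - 1)
p = math.erfc(math.sqrt(chi2 / 2))               # upper tail; valid only for 1 d.f.
print(f"chi2 = {chi2:.2f}, d.f. = {dof}, p = {p:.4f}")
```

The statistic comes out at about 12.70 with p ≈ 0.0004, consistent with p < 0.001.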
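Fisher's exact test: which was poured first, milk or tea?
(Figure: 2 × 2 tables of the actual pouring order against the woman's guess, rows and columns labeled Milk and Tea.)
Milk  Tea       Milk  Tea       Milk  Tea
 0     4         3     1         4     0
 4     0         1     3         0     4
Therefore, the experiment did not establish a significant association between the actual order of pouring and the woman's guess.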
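Fisher's exact test can be computed directly from hypergeometric probabilities. This sketch assumes the classic setup of 8 cups, 4 poured milk first, with the woman asked to pick the 4 milk-first cups, and treats the middle table (3 correct, 1 wrong) as the observed result; both are assumptions about the figure, and `prob_table` is an illustrative helper.

```python
from math import comb


def prob_table(a, row1, col1, n):
    """Hypergeometric probability of a 2x2 table with top-left cell a and fixed margins."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


# Assumptions: 8 cups, 4 milk-first; she labels 4 cups "milk first"; 3 of them are right.
n, row1, col1 = 8, 4, 4
observed_a = 3
possible = range(max(0, row1 + col1 - n), min(row1, col1) + 1)

p_obs = prob_table(observed_a, row1, col1, n)
one_sided = sum(prob_table(a, row1, col1, n) for a in possible if a >= observed_a)
two_sided = sum(prob_table(a, row1, col1, n) for a in possible
                if prob_table(a, row1, col1, n) <= p_obs + 1e-12)   # tables no more likely than observed
print(f"P(table) = {p_obs:.4f}, one-sided p = {one_sided:.4f}, two-sided p = {two_sided:.4f}")
```

Under these assumptions the one-sided p-value is 17/70 ≈ 0.243 and the two-sided p-value 34/70 ≈ 0.486, consistent with the conclusion that no significant association was established.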
A Summary of Statistics
Sampling: Population, Parameters ($\mu$, $\sigma$); Probability
Descriptive Statistics: Sample, Statistics ($\bar{X}$, S)
Normal, t, and Chi-sq distributions; sampling distribution; Central Limit Theorem; CI, hypothesis testing; p-value; Type I & II errors
Categorical variable:
- One categorical variable: Chi-sq GOF test
- Two categorical variables: Chi-sq independence test; Fisher's exact test (small expected counts)
Mixture of continuous and categorical independent variables: general linear models
Mixture of independent variables, binary outcome: logistic regression
Time-to-event data: survival analysis
No distributional assumptions: non-parametrics
Others: longitudinal analysis, factor analysis
Review of Lecture 3
1. Hypothesis testing
2. Types I & II errors, statistical power
3. p-value
4. One-sample t-test
5. Two independent samples t-test
6. Two paired samples t-test
7. Chi-squared test & Fisher's exact test
8. Local biostatistical resources
Lecture notes:
http://ms.cc.sunysb.edu/~jjchen