Statistics Cheat Sheet

Module 2 Exam Cheat Sheet
Basics of statistics

Variables
-Continuous: interval (e.g. temperature), ratio (e.g. weight). More precision and power
-Ordinal: e.g. Glasgow scale
-Categorical: clinical significance

Measures of central tendency
-Mean: influenced by outliers and skewness
-Median
-Mode

Measures of dispersion
-Range = greatest value - smallest value
-IQR = 75th percentile - 25th percentile
-Standard deviation, variance

Statistical tests
-Parametric: normal distribution
-Non-parametric: non-normal distribution

Skewness
±0.5 to ±1 (moderate), >1 or <-1 (high), 0 = symmetric (normal distribution)

Kurtosis (excess kurtosis)
+ leptokurtic
0 mesokurtic (normal)
- platykurtic
A normal distribution has a kurtosis of 3, i.e. an excess kurtosis (kurtosis - 3) of 0; the signs above refer to excess kurtosis.
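The moment formulas behind these skewness and kurtosis rules of thumb can be sketched in plain Python. This is a minimal illustration on simulated, hypothetical data, assuming the simple population (divide-by-n) moment estimators; packages such as Stata may apply small-sample corrections, so their values can differ slightly.

```python
import random

def _central_moment(xs, k):
    """k-th central moment (population version, divides by n)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)

def skewness(xs):
    """Moment skewness m3 / m2^1.5: about 0 for a symmetric distribution."""
    return _central_moment(xs, 3) / _central_moment(xs, 2) ** 1.5

def kurtosis(xs, excess=True):
    """Moment kurtosis m4 / m2^2; excess=True subtracts 3 so a normal curve gives about 0."""
    k = _central_moment(xs, 4) / _central_moment(xs, 2) ** 2
    return k - 3 if excess else k

rng = random.Random(1)
normal = [rng.gauss(0, 1) for _ in range(100_000)]
right_skewed = [rng.expovariate(1.0) for _ in range(100_000)]

print(f"normal: skew = {skewness(normal):.2f}, kurtosis = {kurtosis(normal, excess=False):.2f}, "
      f"excess = {kurtosis(normal):.2f}")
print(f"exponential: skew = {skewness(right_skewed):.2f}, excess kurtosis = {kurtosis(right_skewed):.2f}")
```

The normal sample gives a kurtosis close to 3 (excess close to 0, mesokurtic), while the exponential sample is highly right-skewed (skewness near 2) and leptokurtic (positive excess kurtosis).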
Key concepts
Shapiro-Wilk: test for assessing normality. H0: data comes from a normally distributed population.
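In Python the same test is available as `scipy.stats.shapiro`. A minimal sketch, assuming SciPy is installed; the two samples are simulated and hypothetical.

```python
import random
from scipy import stats

rng = random.Random(42)
samples = {
    "normal": [rng.gauss(50, 10) for _ in range(80)],
    "right-skewed": [rng.expovariate(0.1) for _ in range(80)],
}

p_values = {}
for name, data in samples.items():
    w, p = stats.shapiro(data)
    p_values[name] = p
    # Small p-value: reject H0 that the data come from a normally distributed population
    decision = "reject H0 (not normal)" if p < 0.05 else "fail to reject H0"
    print(f"{name}: W = {w:.3f}, p = {p:.4f} -> {decision}")
```

A non-significant result does not prove normality; it only means the test found no evidence against it, and with very large samples even trivial departures become significant.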
Central limit theorem: with a large sample it is acceptable to use parametric tests even if the population is not normally distributed, because the sampling distribution of the mean is approximately normal (n ≥ 100 is considered large).
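A small simulation makes the central limit theorem concrete: means of samples drawn from a strongly right-skewed (exponential) population become more symmetric as the sample size grows. This is a sketch with simulated data; the sample sizes and number of repetitions are arbitrary choices for illustration. For this population the skewness of the sample mean shrinks roughly like 2/√n.

```python
import random
import statistics

def sample_means(n, reps=2000, seed=0):
    """Means of `reps` samples of size n from a right-skewed exponential population."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

def skewness(xs):
    """Moment skewness of a list of values."""
    m, s = statistics.fmean(xs), statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

for n in (2, 10, 100):
    means = sample_means(n)
    print(f"n = {n:>3}: skewness of the sample means = {skewness(means):.2f}")
```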
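Outcome: dependent variable
Explanatory variable: independent/predictor/factor (ANOVA)
P-value: the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true. It is not the probability that the results are a false positive; the type I error rate is fixed by the significance level (alpha).
p < 0.05: reject the null hypothesis. Statistically significant.
p ≥ 0.05: fail to reject the null hypothesis. Not statistically significant.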
Type I error: incorrect rejection of a true null hypothesis. False positive.
Type II error: failure to reject a false null hypothesis. False negative.
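Type I and type II errors can be illustrated by simulating many studies. This sketch assumes a one-sample z-test with known SD = 1 (chosen so that only the standard library is needed) and hypothetical settings (n = 30, true effect 0.5 SD). When H0 is true, the share of rejections approximates alpha; when H0 is false, the share of non-rejections approximates the type II error rate (1 - power).

```python
import math
import random

def z_test_p(sample, mu0=0.0, sigma=1.0):
    """Two-sided p-value of a one-sample z-test (sigma assumed known)."""
    z = (sum(sample) / len(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    return math.erfc(abs(z) / math.sqrt(2))

def rejection_rate(true_mean, n=30, reps=4000, alpha=0.05, seed=1):
    """Share of simulated studies in which H0 (mean = 0) is rejected."""
    rng = random.Random(seed)
    rejected = sum(
        z_test_p([rng.gauss(true_mean, 1.0) for _ in range(n)]) < alpha
        for _ in range(reps)
    )
    return rejected / reps

type_i_rate = rejection_rate(true_mean=0.0)  # H0 true: every rejection is a false positive
power = rejection_rate(true_mean=0.5)        # H0 false: rejections are true positives
type_ii_rate = 1 - power                     # failures to reject a false H0
print(f"type I error ~ {type_i_rate:.3f}, power ~ {power:.3f}, type II error ~ {type_ii_rate:.3f}")
```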
H0 (null hypothesis): statement that the effect described in the experimental hypothesis does not exist.
Paired: dependent samples, repeated measurements in the same subjects.
Unpaired: independent samples, one measurement in each of two different groups.
One-tailed: tests for a difference in an expected (specified) direction.
Two-tailed: tests for a difference in either direction.
Homoskedasticity: variance among individuals/groups/residuals is equal
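Homoskedasticity between groups is commonly checked with Levene's test (H0: the group variances are equal). The test is not named in these notes; this is a minimal sketch assuming SciPy is installed, on simulated, hypothetical groups. Note that `scipy.stats.levene` centres on the median by default (the Brown-Forsythe variant).

```python
import random
from scipy import stats

rng = random.Random(7)
group_a = [rng.gauss(10, 1) for _ in range(40)]
group_b = [rng.gauss(10, 1) for _ in range(40)]  # same spread as group_a
group_c = [rng.gauss(10, 4) for _ in range(40)]  # four times the SD of group_a

_, p_similar = stats.levene(group_a, group_b)
_, p_different = stats.levene(group_a, group_c)
print(f"A vs B: p = {p_similar:.3f}")    # usually > 0.05: homoskedasticity plausible
print(f"A vs C: p = {p_different:.3g}")  # < 0.05: variances differ
```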
Parametric tests

T-test
-H0: mean1 = mean2 (mean1 - mean2 = 0)
-Notes: can be paired/unpaired and two-tailed/one-tailed

ANOVA
-H0: mean1 = mean2 = ... = mean n
-H1: at least one of the means is different
-Assumptions: independent observations, homoskedasticity
-Notes: Bonferroni correction for multiple comparisons: divide alpha (0.05) by the number of comparisons (equivalently, multiply each p-value by that number)

Pearson correlation
-H0: Pearson correlation coefficient = 0, no correlation (two-tailed) / coefficient ≤ 0, no positive correlation (one-tailed)
-r coefficient:
Little or no relationship: 0-0.25
Fair relationship: 0.25-0.50
Moderate to good relationship: 0.50-0.75
Good to excellent relationship: above 0.75

Linear regression
-H0: slope (b coefficient) is equal to zero: no effect, a change in the predictor is not associated with a change in the response
-H1: slope (b coefficient) is different from zero
-Assumptions: linearity, homoscedasticity of the residuals, independence of errors, normality of residuals
-Linear regression equation/best-fit line: Y = a + bX = intercept + beta coefficient (X)
-Least squares method: finds the values of a and b that minimize the sum of the squared vertical distances from the line to each of the points.
-Residual: deviation of an observed value from the fitted line

Non-parametric tests

Pearson chi-square
-H0: no association between the independent variable and the outcome
-Notes: goodness of fit: compares observed with expected values

Fisher's exact test
-H0: no association between the independent variable and the outcome
-Notes: used for contingency tables with (expected) values less than 5

Spearman correlation
-H0: rank correlation = 0
-Notes: a statistically significant correlation and an important/meaningful correlation are not the same

Mann-Whitney/Wilcoxon
-H0: group 1 = group 2
-Wilcoxon rank-sum test (Mann-Whitney): unpaired groups
-Wilcoxon signed-rank test: paired groups
-If the test statistic (rank sum) is smaller than or equal to the critical value, there is a significant difference between the 2 groups

Kruskal-Wallis
-Non-parametric equivalent of one-way ANOVA; the test statistic is compared to a critical value
-H0: median1 = median2 = ... = median n
-H1: at least one of the medians is different
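Each test in these two tables has a counterpart in `scipy.stats`. This is a minimal sketch, assuming SciPy is installed. All data are hypothetical, the `describe_r` helper is an illustration of the r cut-offs above (how values exactly on a cut-off are handled is an assumption), and Bonferroni is applied to hypothetical post hoc pairwise t-tests after ANOVA.

```python
from scipy import stats

# Hypothetical paired data: same 8 patients before and after treatment
before = [140, 152, 138, 147, 160, 155, 149, 143]
after = [132, 149, 136, 141, 151, 150, 148, 136]
# Hypothetical independent groups
group1 = [5.1, 4.8, 6.0, 5.5, 5.9, 6.2, 5.4]
group2 = [6.8, 7.1, 6.5, 7.4, 6.9, 7.7, 7.0]
group3 = [5.7, 6.1, 6.4, 5.8, 6.6, 6.3, 6.7]

p = {
    # Parametric tests
    "unpaired t-test": stats.ttest_ind(group1, group2).pvalue,
    "paired t-test": stats.ttest_rel(before, after).pvalue,
    "one-way ANOVA": stats.f_oneway(group1, group2, group3).pvalue,
    "Pearson correlation": stats.pearsonr(before, after)[1],
    "regression slope": stats.linregress(before, after).pvalue,
    # Non-parametric tests
    "Mann-Whitney / Wilcoxon rank-sum": stats.mannwhitneyu(group1, group2, alternative="two-sided").pvalue,
    "Wilcoxon signed-rank": stats.wilcoxon(before, after).pvalue,
    "Kruskal-Wallis": stats.kruskal(group1, group2, group3).pvalue,
    "Spearman correlation": stats.spearmanr(before, after)[1],
}

# Hypothetical 2x2 table: rows = exposed/unexposed, columns = outcome yes/no
table = [[12, 8], [5, 15]]
chi2_stat, p["Pearson chi-square"], dof, expected = stats.chi2_contingency(table)
_, p["Fisher's exact"] = stats.fisher_exact(table)

# Bonferroni: divide alpha by the number of pairwise comparisons after ANOVA
pairs = {"1 vs 2": (group1, group2), "1 vs 3": (group1, group3), "2 vs 3": (group2, group3)}
alpha_bonferroni = 0.05 / len(pairs)
posthoc = {name: stats.ttest_ind(a, b).pvalue < alpha_bonferroni for name, (a, b) in pairs.items()}

def describe_r(r):
    """Verbal label for |r| using the cut-offs in the table (boundary handling assumed)."""
    r = abs(r)
    if r < 0.25:
        return "little or no relationship"
    if r < 0.50:
        return "fair relationship"
    if r <= 0.75:
        return "moderate to good relationship"
    return "good to excellent relationship"

for test, value in p.items():
    print(f"{test:<35} p = {value:.4f}")
print(f"Bonferroni alpha = {alpha_bonferroni:.4f}, significant pairs: {posthoc}")
r = stats.pearsonr(before, after)[0]
print(f"Pearson r = {r:.2f}: {describe_r(r)}")
```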
Stata outputs and graphs
(Figures: annotated Stata output for the t-test, ANOVA, Pearson chi-square, Pearson correlation and linear regression, with the two-sided and one-sided p-values marked.)
Sample size

Parameters
-Power (1 - β): 80-90%
-Significance level (α): 0.01/0.05
-Variability of the observations:
Continuous: expected mean difference, dispersion (SD)
Categorical: proportion in both groups
-Minimum expected difference (effect size)/smallest effect: difference in means or proportions divided by the standard deviation (Cohen's d)
-Significance criterion (p-value): 0.05

Methods

Pilot study
-Definition: results of a previous small feasibility study
-Average sample size: small
-Advantages: exactly the same design as the actual study
-Disadvantages: selection bias, overestimation of the treatment effect, underestimation of population variability

Minimal clinically significant difference
-Definition: calculation based on clinical experience; consider a difference of X% to be clinically relevant
-Average sample size: large
-Advantages: increases study power; easy for statistical analysis
-Disadvantages: waste of resources, unnecessary exposure of patients, high costs

Results from other trials
-Definition: calculation from similar studies
-Average sample size: small-medium
-Advantages: easier determination of the difference between interventions
-Disadvantages: the trials have to be as similar as possible

Sensitivity analysis: see how the sample size changes in response to changes in the parameters.

When to increase sample size?
-Decrease alpha: less type I error
-Want more power: less type II error
-Expecting a very small treatment/exposure effect: biologically reasonable, but may be less clinically significant
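The parameters above feed the standard normal-approximation formulas for two parallel groups. This is a minimal sketch, assuming a two-sided test, equal group sizes and, for proportions, the unpooled-variance formula; software based on the t distribution may return a slightly larger n. The loop is a small sensitivity analysis.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def n_per_group_means(d, alpha=0.05, power=0.80):
    """Per-group n to compare two means; d = mean difference / SD (Cohen's d)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = Z.inv_cdf(power)           # power = 1 - beta
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

def n_per_group_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group n to compare two proportions (unpooled normal approximation)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Sensitivity analysis: how n changes when the parameters change
for d in (0.5, 0.25):
    for alpha in (0.05, 0.01):
        for power in (0.80, 0.90):
            n = n_per_group_means(d, alpha, power)
            print(f"d = {d}, alpha = {alpha}, power = {power:.0%}: n = {n} per group")
```

With d = 0.5, alpha = 0.05 and 80% power this gives 63 per group; lowering alpha, raising power or halving the effect size all increase n, as in the list above.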
Survival analysis
Time to event

Functions
-Survival function S(t): probability of surviving at least to time t.
-Hazard function h(t): conditional probability (instantaneous rate) of dying at time t, having survived to that time.

The graph of S(t) against t is called the survival curve. It displays the cumulative probability (the survival probability) of an individual remaining free of the endpoint at any time after baseline.

Hazard: probability of dying (or experiencing the event in question) given that patients have survived up to a given point in time, i.e. the risk of death at that moment.
Kaplan-Meier
-Can be used to estimate the survival curve from the observed survival times.
-It does not give a p-value to compare two groups statistically.
-Differences between two groups can be seen visually, but the curves alone do not show whether the difference is statistically significant.
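The product-limit calculation behind a Kaplan-Meier curve can be sketched in plain Python. This is a minimal illustration with hypothetical data: at each time an event occurs, S(t) is multiplied by (1 - deaths / number at risk), and censored subjects simply leave the risk set without producing a step.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of S(t).

    times: follow-up time of each subject; events: 1 = event observed, 0 = censored.
    Returns a list of (t, S(t)) at each distinct time with at least one event.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # events and censored subjects both leave the risk set
    return curve

# Hypothetical data: 5 patients; the one followed up to t = 3 is censored
curve = kaplan_meier([1, 2, 3, 4, 5], [1, 1, 0, 1, 1])
for t, s in curve:
    print(f"t = {t}: S(t) = {s:.2f}")
```

Here S(t) falls to 0.80 at t = 1 and 0.60 at t = 2; the censored patient at t = 3 produces no step but reduces the number at risk, so S(t) halves to 0.30 at t = 4 and reaches 0 at t = 5.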
Log-rank test
-Comparison of two survival curves. H0: there is no difference between the population survival curves.
-Includes censored cases
-Takes time into account
-Assumption: proportional hazards
-Tests whether there is a difference between the survival times of different groups, but it does not allow other explanatory variables to be taken into account.
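The log-rank statistic compares, at every event time, the observed number of events in one group with the number expected if H0 were true. This is a minimal two-group sketch with hypothetical data; it uses the usual hypergeometric variance and a chi-square distribution with 1 degree of freedom for the p-value.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test (events: 1 = observed, 0 = censored).

    Returns (chi-square statistic with 1 df, p-value)."""
    data = ([(t, e, "A") for t, e in zip(times_a, events_a)]
            + [(t, e, "B") for t, e in zip(times_b, events_b)])
    o_minus_e = 0.0  # observed minus expected events in group A
    variance = 0.0
    for t in sorted({t for t, e, _ in data if e}):
        at_risk = [g for ti, _, g in data if ti >= t]  # subjects still followed at time t
        n, n_a = len(at_risk), at_risk.count("A")
        d = sum(1 for ti, e, _ in data if ti == t and e)
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == "A")
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value

# Hypothetical follow-up times (months); event 0 = censored
chi2, p = logrank_test([3, 5, 7, 9, 12], [1, 1, 1, 0, 1],
                       [10, 14, 16, 18, 20], [1, 0, 1, 1, 0])
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```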
Cox proportional hazards
-Analogous to a multiple regression model
-Adjusts for covariates and confounders (several independent variables)
-The response (dependent) variable is the hazard.
-Assumption: the hazard ratio does not depend on time (proportional hazards)
-Can be expressed in logarithms: the log hazard is a linear function of the independent variables.
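Written out, the Cox model takes the standard form below, with h_0(t) the unspecified baseline hazard and b_1, ..., b_k the coefficients of the k independent variables. Exponentiating a coefficient gives the hazard ratio for a one-unit increase in that variable (other variables held constant), which under the proportional-hazards assumption does not change with t.

```latex
\begin{aligned}
h(t \mid X) &= h_0(t)\,\exp(b_1 X_1 + b_2 X_2 + \dots + b_k X_k) \\
\log h(t \mid X) &= \log h_0(t) + b_1 X_1 + b_2 X_2 + \dots + b_k X_k \\
\mathrm{HR}_j &= \frac{h(t \mid X_j = x + 1)}{h(t \mid X_j = x)} = e^{b_j}
\end{aligned}
```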
Censoring
The survival time is called censored if the event is not observed by the end of the study:
-Subjects for whom the event has not occurred at the end of the follow-up period
-Patients lost to follow-up during the study, or patients who experienced a competing event (e.g., death from some other disease or from an accident).
Missing data

Types of missing data
-Missing Completely at Random (MCAR): not related to the outcome or to the independent variables. Best case.
-Missing at Random (MAR): related to the independent variables. Can use ITT.
-Missing Not at Random (MNAR): missing data is related to the outcome (the worst case; there are no simple and effective methods of addressing this type of missing data).
Complete Case Analysis (CCA)/Listwise Deletion
-Simplest
-Worst approach
-Patients who dropped out are excluded from all analyses
-Biased results: randomization is lost
-Decreases study power, increases type II error
-Used when dropouts are balanced across treatment groups
-Used when there is a low rate of attrition (dropouts)
-MCAR
Last Observation Carried Forward (LOCF)
-After dropping out, patients are assumed to keep the same constant value as their last observation.
-Used with ITT; includes all the study subjects in the final analysis
-Mimics real life
-Accepted by FDA
-It can lead to biased estimates, especially for patients lost just after the start of the trial and for imbalanced losses
-Decreases the statistical power
-Assumption: gradual improvement of the dependent variable from the start of the trial till its end.
Baseline Observation Carried Forward (BOCF)
-Missing data is replaced with the baseline value of the patient
-Assumes that the patient returns to his/her baseline value
-Not commonly used
-Underestimates the effects of treatment
-Bias towards the null hypothesis
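The three approaches above can be compared on a toy longitudinal dataset. This is a minimal sketch: the patients, scores and dictionary layout are hypothetical, `None` marks a missing visit after dropout, and the baseline (first) visit is assumed always observed.

```python
# Hypothetical trial data: score at each visit; None = missing after dropout
visits = {
    "P1": [50, 45, 40, 35],
    "P2": [52, 47, None, None],
    "P3": [48, None, None, None],
}

def complete_case(data):
    """CCA / listwise deletion: keep only patients with no missing visits."""
    return {pid: values for pid, values in data.items() if None not in values}

def locf(data):
    """LOCF: replace each missing value with the last observed value."""
    filled = {}
    for pid, values in data.items():
        last, row = None, []
        for v in values:
            if v is not None:
                last = v
            row.append(last)
        filled[pid] = row
    return filled

def bocf(data):
    """BOCF: replace each missing value with the patient's baseline (first) value."""
    return {pid: [values[0] if v is None else v for v in values]
            for pid, values in data.items()}

print(complete_case(visits))  # only P1 remains
print(locf(visits))
print(bocf(visits))
```

P2's final score is 47 under LOCF but 52 under BOCF: if treatment lowers the score, BOCF erases the improvement seen before dropout, which is why it biases results towards the null hypothesis.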
Regression substitution
-Missing values are replaced by values estimated from regression models performed on the non-missing values
-Reduces bias
-Involves specific training
-Not commonly used, questioned by reviewers
-Underestimates the variance between groups
-MAR
-Assumption: most variables have complete data for most participants and they are strongly correlated with the variable with the missing data

Worst case scenario
-Substitutes the missing values with the less favorable result.
-More trust in the study results, especially when the results are positive
-Cannot be used with high dropout rates
-It can underestimate the true magnitude of the intervention effect
-Studies with a dichotomous outcome
-Used with a small amount of missing data
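Both approaches can be sketched in a few lines of plain Python. The data are hypothetical, the regression uses a single predictor (baseline value) fitted by least squares on the complete cases, and the worst-case rule assumes a dichotomous outcome where success favours treatment. Imputed values fall exactly on the fitted line, which is why regression substitution underestimates variability.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def regression_substitution(baseline, follow_up):
    """Fill missing follow-up values (None) from a regression fitted on the complete cases."""
    pairs = [(x, y) for x, y in zip(baseline, follow_up) if y is not None]
    a, b = fit_line([x for x, _ in pairs], [y for _, y in pairs])
    return [a + b * x if y is None else y for x, y in zip(baseline, follow_up)]

def worst_case(outcomes, arm):
    """Dichotomous outcome (1 = success, 0 = failure, None = missing):
    missing patients count as failures in the treatment arm and successes in the control arm."""
    fill = 0 if arm == "treatment" else 1
    return [fill if o is None else o for o in outcomes]

baseline = [10, 12, 14, 16, 18]
follow_up = [21, 25, None, 33, None]
print(regression_substitution(baseline, follow_up))  # missing values predicted from the fitted line

treatment = [1, 1, None, 1, 0, None]
control = [0, 1, None, 0, 0, 1]
print(worst_case(treatment, "treatment"), worst_case(control, "control"))
```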
Meta-analysis
Pooled effect size calculation
-Continuous outcome: mean and standard deviation (inverse variance method)
-Categorical outcome: relative risk or odds ratio
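The inverse variance method weights each study by 1/SE², so larger, more precise studies count more. This is a minimal fixed-effect sketch for a continuous outcome (mean difference) with hypothetical studies; relative risks and odds ratios are usually pooled the same way on the log scale. The last line reads the result like a forest-plot diamond: the line of no effect is 0 for a difference (it would be 1 for a ratio).

```python
import math

def mean_difference(m1, sd1, n1, m2, sd2, n2):
    """Mean difference between two groups and its standard error."""
    return m1 - m2, math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

def inverse_variance_pool(effects, ses, z=1.96):
    """Fixed-effect pooled estimate and 95% CI: each study is weighted by 1 / SE^2."""
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    return pooled, pooled - z * se_pooled, pooled + z * se_pooled

# Hypothetical studies: (mean, SD, n) in the treatment group, then in the control group
studies = [
    ((12.0, 4.0, 40), (14.0, 4.5, 40)),
    ((11.5, 3.8, 60), (13.0, 4.2, 55)),
    ((13.0, 5.0, 25), (13.5, 5.2, 30)),
]
effects, ses = zip(*(mean_difference(*t, *c) for t, c in studies))
pooled, low, high = inverse_variance_pool(effects, ses)
# The diamond "touches the line" if the CI includes 0 (no difference)
significant = not (low <= 0 <= high)
print(f"pooled mean difference = {pooled:.2f} (95% CI {low:.2f} to {high:.2f}); significant: {significant}")
```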
Cumulative meta-analysis: it gives the changes in the effect size over time. Studies
are added one at a time and summarized as each new study is added.
Sensitivity analysis: it gives estimates with one study excluded at a time, to assess whether the results are being changed/driven by one study in particular.
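Cumulative and leave-one-out (sensitivity) analyses simply repeat the pooling on different subsets of studies. A minimal sketch, assuming fixed-effect inverse-variance pooling and hypothetical studies given as (label, effect, standard error), sorted by publication year.

```python
def pooled_estimate(studies):
    """Inverse-variance (fixed-effect) pooled effect of (label, effect, se) tuples."""
    weights = [1 / se ** 2 for _, _, se in studies]
    return sum(w * eff for w, (_, eff, _) in zip(weights, studies)) / sum(weights)

def cumulative_meta_analysis(studies):
    """Pooled effect after adding studies one at a time (studies sorted by date)."""
    return [(studies[k - 1][0], pooled_estimate(studies[:k])) for k in range(1, len(studies) + 1)]

def leave_one_out(studies):
    """Pooled effect with each study excluded in turn."""
    return [(label, pooled_estimate(studies[:i] + studies[i + 1:]))
            for i, (label, _, _) in enumerate(studies)]

# Hypothetical studies (label, effect, standard error), sorted by publication year
studies = [("Study 2001", 1.0, 0.5), ("Study 2004", 2.0, 0.5),
           ("Study 2008", 3.0, 0.5), ("Study 2012", 10.0, 0.5)]
print(cumulative_meta_analysis(studies))
print(leave_one_out(studies))
```

Here the pooled effect is 4.0 with all four studies but 2.0 once "Study 2012" is excluded, so that single study is driving the result.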
Graphs
-Forest plot: summary of the data, effect sizes.
-Funnel plot: publication bias. Asymmetry: publication bias/overestimation of treatment effect/increased standard error.

Interpreting forest plot
-Diamond on the left: fewer episodes of the outcome in the tx group
-Diamond on the right: more episodes of the outcome in the tx group
-Diamond touches the line (of no effect): no statistically significant difference
-Diamond doesn't touch the line: statistically significant difference between groups
Assessing heterogeneity
-Quantitative: Cochran's Q statistic
-Qualitative: forest plot, funnel plot
Covariate analysis: increases the efficiency (precision) of the estimated association between the dependent and independent variables.
Subgroup analysis: assesses whether the effect of the treatment (independent variable) on the dependent variable differs across subgroups of another variable (e.g. gender: males and females). Interaction test.
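Cochran's Q sums the weighted squared deviations of each study from the pooled (inverse-variance) estimate and is compared with a chi-square distribution with k - 1 degrees of freedom. This is a minimal sketch, assuming SciPy is installed for the chi-square tail probability; the I² statistic, (Q - df)/Q, is not mentioned in the notes above but is commonly reported alongside Q. The study effects are hypothetical.

```python
from scipy.stats import chi2

def heterogeneity(effects, ses):
    """Cochran's Q, its chi-square p-value (k - 1 df) and I^2 (%)."""
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    p_value = chi2.sf(q, df)
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, p_value, i_squared

# Hypothetical study effects (e.g. mean differences) and standard errors
q, p_q, i2 = heterogeneity([-2.0, -1.5, -0.5, 1.2], [0.9, 0.7, 1.4, 0.6])
print(f"Q = {q:.2f}, p = {p_q:.3f}, I^2 = {i2:.0f}%")
```

A small p-value for Q (or a large I²) suggests the studies are estimating different effects, which may call for a random-effects model or a subgroup analysis.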
