
Module 2 Exam Cheat Sheet

Basics of statistics

Variables
- Continuous: interval (e.g. temperature), ratio (e.g. weight). More precision and power.
- Ordinal: e.g. Glasgow scale
- Categorical: clinical significance

Measures of central tendency
- Mean: influenced by outliers and skewness
- Median
- Mode

Measures of dispersion
- Range = greatest value − smallest value
- IQR = 75th percentile − 25th percentile
- Standard deviation, variance

Kurtosis (excess)
- Positive: leptokurtic
- 0: mesokurtic (normal)
- Negative: platykurtic

Statistical tests
- Parametric: normal distribution
- Non-parametric: non-normal distribution

Skewness
- 0 = symmetric (normal distribution)
- Between ±0.5 and ±1: moderate skew
- >1 or <−1: high skew

A normal distribution has a kurtosis of 3 (an excess kurtosis of 0).
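These conventions can be checked numerically. A minimal sketch with scipy.stats (the simulated sample is purely illustrative); note that scipy's default kurtosis is the Fisher (excess) definition, so a normal sample gives ~0, while `fisher=False` gives the Pearson definition with ~3:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(size=100_000)  # large sample from a normal distribution

skew = stats.skew(sample)                        # ~0: symmetric
excess_kurt = stats.kurtosis(sample)             # Fisher (excess) definition: normal -> ~0
raw_kurt = stats.kurtosis(sample, fisher=False)  # Pearson definition: normal -> ~3
```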

Key concepts
Shapiro-Wilk: test for assessing normality. H0: data comes from a normally distributed population.
Central limit theorem: with large samples it is acceptable to use parametric tests; even if the population is not normally distributed, the sampling distribution of the mean is approximately normal (n ≥ 100 is considered large).
Outcome: dependent
Explanatory variable: independent/predictor/factor (ANOVA)
P-value: the probability of obtaining results at least as extreme as those observed if the null hypothesis is true; relates to the risk of a false positive (type I error).
P < 0.05: reject the null hypothesis. Statistically significant.
P ≥ 0.05: fail to reject the null hypothesis. Not statistically significant.
Type I error: incorrect rejection of a true null hypothesis. False positive.
Type II error: failure to reject a false null hypothesis. False negative.
H0: null hypothesis statement that the effect described in the experimental hypothesis does not exist
Paired: dependent samples, different measurements in the same sample.
Unpaired: independent samples, one measurement in two different groups.
One-tailed: tests for a difference in an expected direction
Two-tailed: tests for a difference in either direction
Homoskedasticity: variance among individuals/groups/residuals is equal
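As an example, the Shapiro-Wilk test is available in scipy.stats (the two simulated samples below are invented for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
normal_sample = rng.normal(size=200)
skewed_sample = rng.exponential(size=200)

# Shapiro-Wilk, H0: the sample comes from a normally distributed population
_, p_normal = stats.shapiro(normal_sample)
_, p_skewed = stats.shapiro(skewed_sample)
# p < 0.05  -> reject H0 (evidence of non-normality)
# p >= 0.05 -> fail to reject H0 (consistent with normality)
```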

Parametric tests

T-test
- H0: mean1 = mean2 (i.e. mean1 − mean2 = 0)
- Can be paired/unpaired and one-tailed/two-tailed

ANOVA
- H0: mean1 = mean2 = … = mean n
- H1: at least one of the means is different
- Assumptions: independent observations, homoskedasticity
- Bonferroni: adjusts p-values against the alpha = 0.05 level for multiple comparisons

Pearson correlation
- H0 (two-tailed): Pearson correlation coefficient = 0, no correlation
- H0 (one-tailed): coefficient ≤ 0, no positive correlation
- r coefficient:
  - Little or no relationship: 0–0.25
  - Fair relationship: 0.25–0.50
  - Moderate to good relationship: 0.50–0.75
  - Good to excellent relationship: above 0.75

Linear regression
- H0: slope (b coefficient) is equal to zero: no effect, a change in the predictor is not associated with a change in the response
- H1: slope (b coefficient) is different from zero
- Assumptions: linearity, homoscedasticity of the residuals, independence of errors, normality of residuals
- Equation/best-fit line: Y = a + bX = intercept + beta coefficient (X)
- Least squares method: finds the values of a and b that minimize the squared vertical distances from the line to each of the points
- Residual: deviation from the population line
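The parametric tests above map directly onto scipy.stats calls. A sketch with simulated data (the group values and effect sizes are invented for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(10.0, 2.0, size=50)
group_b = rng.normal(13.0, 2.0, size=50)
group_c = rng.normal(16.0, 2.0, size=50)

# Unpaired two-tailed t-test, H0: mean_a = mean_b
t_stat, t_p = stats.ttest_ind(group_a, group_b)

# One-way ANOVA, H0: all group means are equal
f_stat, f_p = stats.f_oneway(group_a, group_b, group_c)

# Pearson correlation, H0: correlation coefficient = 0
x = np.arange(50, dtype=float)
y = 3.0 * x + rng.normal(0.0, 5.0, size=50)
r, r_p = stats.pearsonr(x, y)

# Simple linear regression (least squares), H0: slope = 0
fit = stats.linregress(x, y)  # fit.intercept = a, fit.slope = b in Y = a + bX
```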

Non-parametric tests

Pearson chi-square
- H0: no association between the independent variable and the outcome
- Goodness of fit: compares observed with expected values

Fisher's exact test
- H0: no association between the independent variable and the outcome
- Used for contingency tables with expected values less than 5

Spearman correlation
- H0: rank correlation = 0
- A statistically significant correlation and an important/meaningful correlation are not the same

Mann-Whitney/Wilcoxon
- H0: group 1 = group 2
- Wilcoxon rank-sum test: unpaired groups
- Wilcoxon signed-rank test: paired groups
- If the rank sum is smaller than or equal to the critical value, there is a significant difference between the two groups

Kruskal-Wallis
- Non-parametric analogue of ANOVA; compares the statistic to a critical value
- H0: mean1 = mean2 = … = mean n
- H1: at least one of the means is different

Stata outputs and graphs
- T-test: two-sided p-value
- ANOVA
- Pearson chi-square: one-sided p-value
- Pearson correlation
- Linear regression: p-value

Sample size

Parameters
- Power (1 − β): 80–90%
- Significance level (α): 0.01/0.05
- Variability of the observations:
  - Continuous: expected mean difference and dispersion (SD)
  - Categorical: proportion in both groups
- Minimum expected difference (effect size)/smallest effect: difference in means or proportions divided by the standard deviation (Cohen's d)
- Significance criterion (p-value): 0.05

Methods

Pilot study
- Definition: results of a previous small feasibility study
- Average sample size: small
- Advantages: exactly the same design as the actual study
- Disadvantages: selection bias; overestimation of treatment effect; underestimation of population variability

Minimally clinically significant difference
- Definition: calculation based on clinical experience; consider a difference of X% to be clinically relevant
- Average sample size: large
- Advantages: increases study power; easy for statistical analysis
- Disadvantages: waste of resources; unnecessary exposure of patients; high costs

Results from other trials
- Definition: calculation from similar studies
- Average sample size: small–medium
- Advantages: easier determination of the difference between interventions
- Disadvantages: the trials have to be as similar as possible

Sensitivity analysis: see how the sample size changes in response to changes in the parameters.

When to increase sample size?
- Decrease alpha: less type I error
- Want more power: less type II error
- Expecting a very small treatment/exposure effect: biologically reasonable, but may be less clinically significant
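The parameters above plug into the standard normal-approximation formula for a two-group comparison of means. A sketch using Cohen's d (this is the z-based approximation; exact calculations based on the noncentral t distribution give slightly larger n):

```python
import math
from scipy import stats

def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Sample size per group for a two-sample, two-tailed comparison of
    means, using the normal-approximation formula and Cohen's d."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)  # significance criterion (type I error)
    z_beta = stats.norm.ppf(power)           # power = 1 - beta (type II error)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size_d ** 2)

n_medium = n_per_group(0.5)  # medium effect size -> 63 per group
n_small = n_per_group(0.2)   # smaller expected effect needs a larger sample
```

Note how halving the expected effect roughly quadruples the required sample, which is why a very small expected treatment effect calls for increasing the sample size.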

Survival analysis
Time to event

Functions
- Survival function S(t): probability of surviving at least to time t.
- Hazard function h(t): conditional probability of dying at time t having survived to that time.
- The graph of S(t) against t is called the survival curve. It displays the cumulative probability (the survival probability) of an individual remaining free of the endpoint at any time after baseline.
- Hazard: probability of dying (or experiencing the event in question) given that patients have survived up to a given point in time, or the risk of death at that moment.

Kaplan-Meier
- Does not give a p-value to compare two groups statistically.
- Allows a visual impression of whether two groups differ.
- Can be used to estimate the survival curve from the observed survival times.
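The product-limit estimate behind the Kaplan-Meier curve can be sketched in a few lines of numpy (toy data invented for illustration; real analyses would use a survival library):

```python
import numpy as np

def kaplan_meier(times, events):
    """Product-limit estimate of the survival function S(t).

    times:  follow-up time for each subject
    events: 1 if the event was observed, 0 if the subject was censored
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    curve = []
    s = 1.0
    for t in np.unique(times[events == 1]):      # the curve steps only at event times
        n_at_risk = int(np.sum(times >= t))      # still under follow-up at t
        n_events = int(np.sum((times == t) & (events == 1)))
        s *= 1.0 - n_events / n_at_risk          # conditional survival step
        curve.append((t, s))
    return curve

# Six subjects; subjects with event=0 are censored (e.g. lost to follow-up)
curve = kaplan_meier([1, 2, 3, 4, 5, 6], [1, 1, 0, 1, 0, 1])
```

Censored subjects leave the risk set without forcing a step down, which is how the estimator "includes censored cases."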

Logrank test
- Comparison of two survival curves.
- H0: there is no difference between the population survival curves.
- Includes censored cases and takes time into account.
- Assumption: proportional hazards.
- Tests whether there is a difference between the survival times of different groups, but does not allow other explanatory variables to be taken into account.

Cox proportional hazards
- Analogous to a multiple regression model.
- Adjusts for covariates and confounders; several independent variables.
- The response (dependent) variable is the hazard.
- Assumption: the hazard ratio does not depend on time.
- Can be expressed in logarithms.

Censoring
- A survival time is called censored if the event is not observed by the end of the study.
- Subjects for whom the event has not occurred at the end of the follow-up period.
- Patients lost to follow-up during the study, or patients who underwent a competing event (e.g., death from some other disease or from an accident).

Missing data
Types of missing data
- Missing Completely at Random (MCAR): not related to the outcome or to the independent variables. Best case.
- Missing at Random (MAR): related to the independent variables. Can use ITT.
- Missing Not at Random (MNAR): missing data are related to the outcome. Worst case; there are no simple and effective methods of addressing this type of missing data.

Complete Case Analysis (CCA)/Listwise Deletion
- Simplest but worst approach.
- Patients who dropped out are excluded from all analyses.
- Biased results: randomization is lost.
- Decreases study power, increases type II error.
- Used when dropouts are balanced across treatment groups and when there is a low rate of attrition (dropouts).
- Assumes MCAR.
Last Observation Carried Forward (LOCF)
- After dropping out, patients keep the constant value of their last observation.
- Used with ITT; includes all study subjects in the final analysis.
- Mimics real life; accepted by the FDA.
- Can lead to biased estimates, especially for patients lost just after the start of the trial and for imbalanced losses.
- Decreases statistical power.
- Assumption: gradual improvement of the dependent variable from the start of the trial to its end.
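LOCF is essentially a per-patient forward fill. A minimal sketch with pandas (the patients, visits, and scores are made up for illustration; NaN marks values missing after dropout):

```python
import numpy as np
import pandas as pd

# Hypothetical trial data: one row per visit per patient
df = pd.DataFrame({
    "patient": ["A", "A", "A", "B", "B", "B"],
    "visit":   [1, 2, 3, 1, 2, 3],
    "score":   [10.0, 12.0, np.nan, 8.0, np.nan, np.nan],
})

# LOCF: within each patient, carry the last observed value forward
df["score_locf"] = df.groupby("patient")["score"].ffill()
```

Patient A's missing visit 3 is filled with 12.0 (the last observation), and patient B's visits 2 and 3 are filled with the visit 1 value.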

Baseline Observation Carried Forward (BOCF)
- Missing data are replaced with the patient's baseline value.
- Assumes the patient returns to his/her baseline value.
- Not commonly used.
- Underestimates the effects of treatment; bias towards the null hypothesis.

Regression substitution
- Missing values are replaced by values estimated from regression models fitted on the non-missing values.
- Reduces bias.
- Involves specific training.
- Not commonly used; questioned by reviewers.
- Underestimates the variance between groups.
- MNAR.
- Assumption: most variables have complete data for most participants, and they are strongly correlated with the variable with the missing data.

Worst case scenario
- Substitutes the missing values with the least favorable result.
- Gives more trust in the study results, especially when the results are positive.
- Cannot be used with high dropout.
- Can underestimate the true magnitude of the intervention effect.
- Used in studies with a dichotomous outcome and with a small amount of missing data.

Meta-analysis
Pooled effect size calculation
Continuous outcome: mean and standard deviation (inverse variance method)
Categorical outcome: relative risk or odds ratio
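The fixed-effect inverse-variance method weights each study by the inverse of its squared standard error, so more precise studies contribute more to the pooled estimate. A sketch with invented per-study effects:

```python
import numpy as np

# Hypothetical per-study effect estimates (mean differences) and standard errors
effects = np.array([0.40, 0.55, 0.30])
std_errs = np.array([0.10, 0.20, 0.15])

# Fixed-effect inverse-variance pooling: weight = 1 / SE^2
weights = 1.0 / std_errs ** 2
pooled = float(np.sum(weights * effects) / np.sum(weights))
pooled_se = float(np.sqrt(1.0 / np.sum(weights)))

# 95% confidence interval for the pooled effect
ci_low, ci_high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
```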

Cumulative meta-analysis: it gives the changes in the effect size over time. Studies
are added one at a time and summarized as each new study is added.
Sensitivity analysis: it gives estimates with exclusion of one study at a time, to assess whether the results are being changed/driven by one study in particular.

Interpreting the forest plot
- Diamond on the left: fewer episodes of the outcome in the treatment group.
- Diamond on the right: more episodes of the outcome in the treatment group.
- Diamond touches the line of no effect: no statistically significant difference.
- Diamond does not touch the line: statistically significant difference between groups.

Graphs
- Forest plot: summary of the data and effect sizes.
- Funnel plot: publication bias. Asymmetry suggests publication bias, overestimation of the treatment effect, or increased standard error.

Assessing heterogeneity
- Quantitative: Cochran's Q statistic
- Qualitative: forest plot, funnel plot

Covariate analysis: increases the efficiency of estimating a given association between the dependent and independent variable.
Subgroup analysis: assesses whether the treatment effect (independent variable) on the dependent variable differs across levels of another variable (e.g. gender: males vs. females). Uses an interaction test.
