Basics of statistics
Variables
Measures of dispersion
Statistical tests
Kurtosis
+ leptokurtic
0 mesokurtic (normal)
- platykurtic
Skewness
Around +/- 0.5 (moderate); >1 or <-1 (high); 0 = symmetric, as in a normal distribution
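The kurtosis and skewness values above can be computed from the central moments of a sample. The following is a minimal sketch, assuming the simple (biased) moment formulas and excess kurtosis (kurtosis minus 3, so that a normal distribution gives 0). The function name and the example data are hypothetical.

```python
from statistics import fmean

def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) using biased moment estimators."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0           # 0 for a normal distribution
    return skew, excess_kurtosis

# Hypothetical data: one symmetric sample, one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 10]

print(skewness_and_excess_kurtosis(symmetric))     # skewness is 0, kurtosis is negative
print(skewness_and_excess_kurtosis(right_tailed))  # skewness is above 1 (high)
```

Statistical packages often apply small-sample corrections, so their values may differ slightly from these.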
Key concepts
Shapiro-Wilk: test for assessing normality. H0: data comes from a normally distributed population.
Central limit theorem: with large samples it is acceptable to use parametric tests even if the population is not normally distributed, because the sampling distribution of the mean is approximately normal. (A sample of 100 is considered large.)
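A small simulation can illustrate the central limit theorem. This sketch assumes an exponential population (strongly right-skewed), a sample size of 100, and 2000 repeated samples. All of these choices, and the helper name `skewness`, are illustrative and not part of the original notes.

```python
import random
from statistics import fmean

def skewness(values):
    """Biased moment estimator of skewness (0 for a symmetric distribution)."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(42)  # fixed seed so the run is repeatable

# Individual observations from a skewed (exponential) population.
population_draws = [rng.expovariate(1.0) for _ in range(2000)]

# Means of 2000 samples, each of size 100, from the same population.
sample_means = [fmean([rng.expovariate(1.0) for _ in range(100)])
                for _ in range(2000)]

print("skewness of raw observations:", round(skewness(population_draws), 2))
print("skewness of sample means:    ", round(skewness(sample_means), 2))
```

The raw observations stay clearly skewed, while the sample means are close to symmetric, which is what justifies parametric tests on means in large samples.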
Outcome: dependent
Explanatory variable: independent/predictor/factor (ANOVA)
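P-value: the probability of obtaining results at least as extreme as those observed if the null hypothesis is true. The significance level (alpha) is the accepted chance of a false positive (type I error).
P < 0.05: reject the null hypothesis. Statistically significant.
P ≥ 0.05: fail to reject the null hypothesis. Not statistically significant.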
Type I error: incorrect rejection of a true null hypothesis. False positive.
Type II error: failure to reject a false null hypothesis. False negative.
H0 (null hypothesis): statement that the effect described in the experimental hypothesis does not exist.
Paired: dependent samples, different measurements in the same sample.
Unpaired: independent samples, one measurement in two different groups.
One-tailed: tests for a difference in an expected direction.
Two-tailed: tests for a difference in either direction.
Homoskedasticity: variance among individuals/groups/residuals is equal
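To make the p-value and the one-tailed versus two-tailed distinction concrete, here is a minimal sketch of a one-sample z-test. It assumes the population standard deviation is known (in practice a t-test is more common). The function name and the numbers are hypothetical.

```python
from statistics import NormalDist

def z_test_p_values(sample_mean, null_mean, sd, n):
    """Return (z, one-tailed p for 'mean > null', two-tailed p)."""
    standard_error = sd / n ** 0.5
    z = (sample_mean - null_mean) / standard_error
    normal = NormalDist()                      # standard normal distribution
    p_one_tailed = 1.0 - normal.cdf(z)         # expected direction: greater than null
    p_two_tailed = 2.0 * (1.0 - normal.cdf(abs(z)))  # difference in either direction
    return z, p_one_tailed, p_two_tailed

# Hypothetical example: observed mean 103 vs. H0 mean 100, SD 15, n = 100.
z, p_one, p_two = z_test_p_values(103, 100, 15, 100)
print(f"z = {z:.2f}, one-tailed p = {p_one:.4f}, two-tailed p = {p_two:.4f}")
# z is 2.00; the one-tailed p is about 0.023 and the two-tailed p about 0.046,
# so H0 is rejected at the 0.05 level in both cases.
```

When the observed effect lies in the expected direction, the one-tailed p-value is half the two-tailed one, which is why the choice of tails should be made before looking at the data.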
Parametric tests
T-test
ANOVA
Hypothesis: H0: mean1 = mean2 = ... = mean n
Assumptions: independent observations; homoskedasticity
Pearson correlation
Notes: r coefficient:
Little or no relationship: 0-0.25
Fair relationship: 0.25-0.50
Moderate to good relationship: 0.50-0.75
Good to excellent relationship: above 0.75
Linear regression
Assumptions: linearity, homoskedasticity of the residuals, independence of errors, normality of residuals.
Non-parametric tests
Pearson Chi-square
Spearman correlation
Kruskal-Wallis
Notes: same as ANOVA. Compares to critical value.
Hypothesis: H0: mean1 = mean2 = ... = mean n
H1: at least one of the means is different
Two-sided p-value
ANOVA
Pearson Chi-square
One-sided p-value
Pearson correlation
Linear regression
p-value
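The r coefficient bands given in these notes for Pearson correlation can be applied as in the sketch below. The cut-offs in the notes overlap at the boundaries (0.25, 0.50, 0.75), so this example assumes each boundary value belongs to the lower band. The function names and the data are hypothetical.

```python
from statistics import fmean

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def describe_strength(r):
    """Map |r| to the bands in the notes (boundary handling is an assumption)."""
    size = abs(r)
    if size <= 0.25:
        return "little or no relationship"
    if size <= 0.50:
        return "fair relationship"
    if size <= 0.75:
        return "moderate to good relationship"
    return "good to excellent relationship"

# Hypothetical paired measurements.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3), describe_strength(r))
```

The sign of r gives the direction of the relationship; the bands describe only its strength.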
Sample size
Parameters
Method
Pilot study
Minimal clinically significant difference
Power (1 - β): 80-90%
Significance level (α): 0.01/0.05
Variability of the observations:
Continuous: expected mean difference, dispersion (SD)
Categorical: proportion in both groups
Minimum expected difference (effect size)/smallest effect: difference in means or proportions divided by the standard deviation (Cohen's d for means).
Significance criterion (p-value): 0.05
Definition
Small
Advantages
Disadvantages
Selection bias
Overestimation of treatment effect
Underestimation of population variability
Waste of resources
Unnecessary exposure of patients
High costs
Sensitivity analysis: shows how the sample size changes in response to changes in the parameters.
Small-Medium
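The sample size parameters above (power, significance level, effect size) can be combined as in this minimal sketch. It assumes a two-sided comparison of two independent group means with equal group sizes and uses the common normal approximation n = 2 (z_alpha/2 + z_beta)^2 / d^2 per group, where d is Cohen's d. The function name is hypothetical, and exact t-based calculations give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size_d ** 2)

# Sensitivity analysis: how the sample size reacts to changes in the parameters.
for d in (0.2, 0.5, 0.8):
    for power in (0.80, 0.90):
        print(f"d = {d}, power = {power:.2f}: n per group = {n_per_group(d, power=power)}")
```

Smaller expected effects and higher power both increase the required sample size, which is the trade-off the sensitivity analysis is meant to expose.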
Survival analysis
Time to event
Functions
Survival function S(t): probability of surviving at least to time t.
Hazard function h(t): conditional probability of dying at time t, having survived to that time.
The graph of S(t) against t is called the survival curve. It displays the cumulative probability (the survival probability) of an individual remaining free of the endpoint at any time after baseline.
Hazard: probability of dying (or experiencing the event in question) given that patients have survived up to a given point in time, or the risk of death at that moment.
Kaplan-Meier
Can be used to estimate the survival curve from the observed survival times.
Does not give a p-value to statistically compare two groups.
Can suggest visually that two groups differ, but the curves alone do not establish statistical significance.
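As an illustration of how the Kaplan-Meier survival curve is estimated from observed survival times, here is a minimal sketch of the product-limit calculation. The function name and the data are hypothetical, and a dedicated survival analysis package would normally be used in practice.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is True if the event was observed at times[i],
    False if the observation was censored at that time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Count events and censorings that share the same time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk  # product-limit step
            curve.append((t, survival))
        at_risk -= leaving                    # censored subjects also leave the risk set
    return curve

# Hypothetical follow-up times (months); False marks a censored subject.
times = [2, 3, 3, 5, 8]
events = [True, True, False, True, False]
for t, s in kaplan_meier(times, events):
    print(f"S({t}) = {s:.2f}")
```

With these data the estimate steps down to 0.80, 0.60, and 0.30 at months 2, 3, and 5. Censored subjects contribute to the number at risk until they leave, but do not cause a step.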
Logrank test
Comparison of two survival curves.
H0: there is no difference between the population survival curves.
Includes censored cases.
Takes time into account.
Assumption: proportional hazards.
Tests whether there is a difference between the survival times of different groups, but does not allow other explanatory variables to be taken into account.
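The logrank comparison described above can be sketched as follows. This is a minimal two-group version, assuming the usual observed-minus-expected statistic with a hypergeometric variance, compared against a chi-square distribution with 1 degree of freedom. The function name and the data are hypothetical.

```python
from math import erfc, sqrt

def logrank_test(times_a, events_a, times_b, events_b):
    """Return (chi-square statistic, p-value) for two groups."""
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e}
        | {t for t, e in zip(times_b, events_b) if e}
    )
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A just before t
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B just before t
        d_a = sum(1 for x, e in zip(times_a, events_a) if e and x == t)
        d_b = sum(1 for x, e in zip(times_b, events_b) if e and x == t)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n                # expected events in A under H0
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = erfc(sqrt(chi2 / 2))               # chi-square survival function, 1 df
    return chi2, p_value

# Hypothetical data: group A fails early, group B fails late; no censoring.
a_times, a_events = [1, 2, 3, 4, 5], [True] * 5
b_times, b_events = [10, 11, 12, 13, 14], [True] * 5
chi2, p = logrank_test(a_times, a_events, b_times, b_events)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

Censored subjects are handled by keeping them in the risk set until their censoring time, which is how the test includes censored cases and takes time into account.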
Cox proportional hazards
Analogous to a multiple regression model.
Adjusts for covariates and confounders.
Allows several independent variables.
The response (dependent) variable is the hazard.
Assumption: the hazard ratio does not depend on time.
Can be expressed in logarithms (the log hazard is a linear function of the covariates).
Censoring
The survival time is called censored if the event is not observed by the end of the study. Censored cases include:
Subjects for whom the event has not occurred at the end of the follow-up period.
Patients lost to follow-up during the study, or patients who underwent a competing event (e.g., death from some other disease or from an accident).
Missing data
Type of data
Missing Completely at Random (MCAR): not related to the outcome or to the independent variables. Best case.
Missing at Random (MAR): related to the independent variables. Can use intention-to-treat (ITT) analysis.
Missing Not at Random (MNAR): missing data are related to the outcome (the worst case; there are no simple and effective methods of addressing this type of missing data).
Baseline Observation Carried Forward (BOCF)
-Missing data are replaced with the patient's baseline value
-Assumes that the patient returns to his/her baseline value
-Not commonly used
-Underestimates the effects of treatment
-Biases towards the null hypothesis
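A minimal sketch of BOCF, assuming missing follow-up values are stored as `None`. The function name and the patient values are hypothetical.

```python
from statistics import fmean

def bocf(baseline, followups):
    """Replace each missing follow-up value (None) with the baseline value."""
    return [baseline if value is None else value for value in followups]

# Hypothetical pain scores for one patient: baseline 50, two missed visits.
baseline = 50
followups = [45, None, 40, None]

imputed = bocf(baseline, followups)
print(imputed)  # [45, 50, 40, 50]

# Mean change from baseline using only observed visits vs. after BOCF.
observed = [v for v in followups if v is not None]
print(fmean(observed) - baseline)  # -7.5
print(fmean(imputed) - baseline)   # -3.75
```

The imputed mean change is closer to zero than the observed one, which shows how BOCF pulls the estimated treatment effect towards the null hypothesis.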
Regression substitution
Meta-analysis
Pooled effect size calculation
Continuous outcome: mean and standard deviation (inverse variance method)
Categorical outcome: relative risk or odds ratio
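The inverse variance method mentioned above weights each study by the reciprocal of its squared standard error. Below is a minimal sketch assuming a fixed-effect model; the function name and the study values are hypothetical. For relative risks or odds ratios, the same calculation is usually applied to the log of the ratio.

```python
def inverse_variance_pool(effects, standard_errors):
    """Return (pooled effect, pooled standard error) under a fixed-effect model."""
    weights = [1 / se ** 2 for se in standard_errors]  # more precise studies weigh more
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = (1 / total_weight) ** 0.5
    return pooled, pooled_se

# Hypothetical mean differences and standard errors from two studies.
effects = [0.2, 0.5]
standard_errors = [0.1, 0.2]

pooled, pooled_se = inverse_variance_pool(effects, standard_errors)
print(f"pooled effect = {pooled:.2f}, SE = {pooled_se:.3f}")
```

Here the first study has four times the weight of the second, so the pooled estimate (0.26) sits much closer to 0.2 than to 0.5.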
Cumulative meta-analysis: shows how the effect size changes over time. Studies are added one at a time, and the pooled estimate is recalculated as each new study is added.
Sensitivity analysis: gives estimates with one study excluded at a time, to assess whether the results are being driven by one study in particular.
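Both procedures can be sketched with a simple pooling helper. This example assumes fixed-effect inverse variance pooling and studies listed in chronological order. The function names and the study values are hypothetical.

```python
def pool(effects, standard_errors):
    """Fixed-effect inverse variance pooled estimate."""
    weights = [1 / se ** 2 for se in standard_errors]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)

def cumulative_estimates(effects, standard_errors):
    """Pooled estimate after adding studies one at a time, in the order given."""
    return [pool(effects[:k], standard_errors[:k])
            for k in range(1, len(effects) + 1)]

def leave_one_out_estimates(effects, standard_errors):
    """Pooled estimate with each study excluded in turn."""
    return [pool(effects[:i] + effects[i + 1:],
                 standard_errors[:i] + standard_errors[i + 1:])
            for i in range(len(effects))]

# Hypothetical studies in chronological order; the third is an outlier.
effects = [0.10, 0.12, 0.90]
standard_errors = [0.1, 0.1, 0.1]

print([round(e, 3) for e in cumulative_estimates(effects, standard_errors)])
print([round(e, 3) for e in leave_one_out_estimates(effects, standard_errors)])
```

In this example the pooled estimate drops from about 0.37 to 0.11 when the third study is excluded, the kind of pattern that suggests a single study is driving the result.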
Covariate analysis: adjusting for covariates increases the efficiency (precision) of the estimated association between the dependent and independent variables.
Subgroup analysis: assesses whether the effect of treatment (independent variable) on the dependent variable differs across the levels of another variable (e.g. gender: males and females). Uses an interaction test.