Always mention removing outliers so they don’t skew the dataset
For a given statistical test, the sample size is calculated from statistical power, effect size, and
significance level.
Type I error: false positive - rejecting a true null; Type II error: false negative - failing to
reject a false null
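The sample-size-from-power idea above can be sketched with the normal approximation (a simplification - an exact t-based power analysis gives a slightly larger n; the effect size, alpha, and power values are just the conventional defaults):

```python
import math
from scipy.stats import norm

def n_per_group(effect_size, alpha=0.05, power=0.8, two_sample=True):
    """Normal-approximation sample size for a two-sided test."""
    z_alpha = norm.ppf(1 - alpha / 2)   # critical z for two-sided alpha
    z_beta = norm.ppf(power)            # z corresponding to the desired power
    n = ((z_alpha + z_beta) / effect_size) ** 2
    if two_sample:
        n *= 2                          # per-group n for two independent groups
    return math.ceil(n)                 # round up to whole participants

print(n_per_group(0.5))                 # medium effect (Cohen's d = 0.5) -> 63 per group
```

The exact t-distribution calculation (e.g. statsmodels' `TTestIndPower`) gives ~64 per group for the same inputs, so the approximation is close but slightly optimistic.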
One-Sample T-test
- Assumptions
o Independent random sampling
o Normal distribution
o SD of the sampled population equals that of the comparison population
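A quick sketch of the one-sample t-test with scipy (the data here are made up for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=5.3, scale=1.0, size=30)  # e.g. 30 measured scores

# H0: the population mean equals 5.0
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```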
Chi-square
- Overview: measures whether there is a relationship between two categorical variables, i.e.
whether the distributions of the categorical variables differ from each other.
- statistical independence means that the frequency distribution of a variable is the
same for all levels of some other variable.
- Expected frequencies are the frequencies we expect in our sample if the null
hypothesis holds.
- Observed frequencies
- One-way Chi Square/Goodness of fit test
- Only one IV (one variable, like a one sample t-test)
- Determines whether the relative frequencies in the observed
categories are similar to, or statistically different from, the hypothesized
relative frequencies within those same categories
- Alternative hypothesis - at least one level of the IV differs from its hypothesized relative frequency
- Assumptions
- Nominal/ordinal (categorical) data for DV
- Independence of observations
- Groups of the categorical IV should be mutually exclusive (e.g. a male
employee will be counted only under the male level, and cannot be
counted under the female level)
- An expected frequency of at least 5 in each group of your categorical IV
- How to interpret results
- The chi square value should be significant at the 0.05 level - implying that
the observed frequencies differ significantly from the hypothesized frequencies
- Effect size - Cramer’s phi
- 0.1 - small, 0.3 - medium, 0.5 - large
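A one-way (goodness-of-fit) chi-square sketch with scipy, using made-up counts; the effect size here is computed as sqrt(chi2/n), which for a one-way table is Cohen's w (equivalent to phi):

```python
import numpy as np
from scipy.stats import chisquare

observed = np.array([45, 35, 20])        # counts in three categories (made up)
expected = np.array([100 / 3] * 3)       # H0: equal relative frequencies

chi2, p = chisquare(f_obs=observed, f_exp=expected)

n = observed.sum()
w = np.sqrt(chi2 / n)                    # effect size (Cohen's w / phi)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, w = {w:.2f}")
```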
- Two-way Chi Square/Test of Independence
- A 2x2 (or larger) design with two variables, similar to an interaction effect in an ANOVA
- Null - the two IVs are independent of each other
- Is the outcome in one variable related to the outcome in some other
variable
- Assumptions
- The two IVs should be measured using categorical (nominal/ordinal) data
- Each IV should have at least two levels that are independent of each
other/mutually exclusive
- How to interpret results
- Pearson Chi Square should be statistically significant at the 0.05 level -
implying that there is a statistically significant association between the two
IVs (they seem to be interacting)
- Effect size - Cramer’s phi
- 0.1 - small, 0.3 - medium, 0.5 - large
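A two-way chi-square (test of independence) sketch with scipy on a made-up 2x2 table, with Cramér's phi computed by hand:

```python
import numpy as np
from scipy.stats import chi2_contingency

# 2x2 contingency table: rows = levels of IV1, columns = levels of IV2 (made up)
table = np.array([[30, 10],
                  [20, 40]])

chi2, p, dof, expected = chi2_contingency(table)

n = table.sum()
# Cramér's phi/V: sqrt(chi2 / (n * (min(rows, cols) - 1)))
phi = np.sqrt(chi2 / (n * (min(table.shape) - 1)))
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, phi = {phi:.2f}")
```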
One-way ANOVA
- Overview: compare the means of two or more sample groups and determine whether any of
those means are statistically significantly different from each other.
- One IV multiple levels
- Independent
- Assumptions
- HOV - homogeneity of variance (Levene’s test)
- Independent Random Sampling
- Normal Distribution
- How to interpret results
- F ratio should be significant at the 0.05 level - only tells you that a
statistically significant difference exists but not where it exists
- Use post-hoc tests
- if IV has only two levels - use an independent samples t-
test
- If IV has three levels - use Fisher’s LSD (because only
three pairs of comparisons)
- If IV has more than three levels - use Tukey’s HSD
Note: Can use a Bonferroni adjusted alpha level for post-hoc tests
- Effect size - ETA SQUARED
- Small: .01
- Medium: .06
- Large: .14
- http://imaging.mrc-cbu.cam.ac.uk/statswiki/FAQ/effectSize
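A sketch of the independent one-way ANOVA workflow above with scipy (made-up data): Levene's test for HOV, the omnibus F test, then eta squared computed by hand as SS_between / SS_total:

```python
import numpy as np
from scipy import stats

g1 = [23, 25, 21, 27, 24]   # made-up scores, three levels of one IV
g2 = [30, 31, 29, 33, 28]
g3 = [22, 20, 24, 23, 21]

# Assumption check: Levene's test for homogeneity of variance
lev_stat, lev_p = stats.levene(g1, g2, g3)

# Omnibus one-way ANOVA
f_stat, p_value = stats.f_oneway(g1, g2, g3)

# Effect size: eta squared = SS_between / SS_total
all_data = np.concatenate([g1, g2, g3])
grand = all_data.mean()
ss_between = sum(len(g) * (np.mean(g) - grand) ** 2 for g in (g1, g2, g3))
ss_total = ((all_data - grand) ** 2).sum()
eta_sq = ss_between / ss_total

print(f"F = {f_stat:.2f}, p = {p_value:.4f}, eta^2 = {eta_sq:.2f}")
```

A significant F would then be followed up with the post-hoc tests listed above.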
- Repeated Measures
- Assumptions
- Sphericity (Mauchly’s W) - the variances of the difference scores between
any two levels of the IV should be roughly equal
- Independent Random Sampling
- Normal Distribution
- How to interpret results
- F ratio should be significant at the 0.05 level - only tells you that a
statistically significant difference exists but not where it exists
- Use post-hoc tests
- if IV has only two levels - use a dependent (paired) samples t-test
- If IV has three levels - use Fisher’s LSD (because only
three pairs of comparisons)
- If IV has more than three levels - use Tukey’s HSD
Note: Can use a Bonferroni adjusted alpha level for post-hoc tests
- Effect size - ETA SQUARED
- Small: .01
- Medium: .06
- Large: .14
- http://imaging.mrc-cbu.cam.ac.uk/statswiki/FAQ/effectSize
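scipy has no built-in repeated-measures ANOVA, so here is a hand-rolled sketch of the F ratio on a made-up subjects × conditions matrix (the condition sum of squares is separated from the subject sum of squares, and the error term is what remains):

```python
import numpy as np
from scipy.stats import f as f_dist

# rows = subjects, columns = conditions (repeated measures, made-up data)
X = np.array([[8.0, 7.0, 6.0],
              [9.0, 8.0, 6.0],
              [7.0, 6.0, 5.0],
              [8.0, 7.0, 7.0],
              [9.0, 8.0, 7.0]])
n, k = X.shape
grand = X.mean()

ss_cond = n * ((X.mean(axis=0) - grand) ** 2).sum()   # between conditions
ss_subj = k * ((X.mean(axis=1) - grand) ** 2).sum()   # between subjects
ss_total = ((X - grand) ** 2).sum()
ss_error = ss_total - ss_cond - ss_subj               # condition-by-subject residual

df_cond, df_error = k - 1, (n - 1) * (k - 1)
F = (ss_cond / df_cond) / (ss_error / df_error)
p = f_dist.sf(F, df_cond, df_error)
print(f"F({df_cond}, {df_error}) = {F:.2f}, p = {p:.4f}")
```

In practice a library routine (e.g. statsmodels' `AnovaRM`) would also report the Mauchly/sphericity diagnostics noted above.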
Two-way ANOVA
- Overview: understand if there is an interaction between the two independent variables on
the dependent variable
- Multiple IVs with multiple levels
Note - for fully independent/repeated measures designs, follow the same protocol but check the
assumptions of the respective one-way design
Mixed design:
- Null: the means of IV1 conditions are equal; the means of IV2 conditions are
equal; no interaction between IV1 & IV2
- Assumptions
- HOV
- Homogeneity of Covariance across groups (Box’s M)
- Sphericity
- Independent Random Sampling
- Normal Distribution
- How to interpret results
- You will have two main effects and one interaction effect (3 F ratios)
- If main effects significant, use post-hoc tests for follow up (same as One-
Way ANOVAs)
- If the interaction effect is significant, interpret it before (and give it more
weight than) the significant main effects
- If IV(s) has two levels - post-hoc will be a t-test
- If IV(s) has more than two levels - simple main effect - keeping
one level of IV1 constant, and doing a one-way ANOVA for the
other IV
- Again, this will tell you that a difference exists, but you
don’t know where, so maybe follow up with a Fisher’s
LSD/Tukey’s HSD
- Check whether the interaction is ordinal (lines do not cross) or disordinal (lines cross)
- Effect size
- Eta squared for main effects of each of the two IVs
- 0.01 - small, 0.06 - medium, 0.14 - large
- Partial eta squared for the interaction effect between the two IVs
- 0.01 - small, 0.09 - medium, 0.25 - large
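The simple-main-effect step above (hold one level of IV1 constant, run a one-way ANOVA across the levels of IV2) can be sketched like this with made-up data for a hypothetical 2x3 design:

```python
from scipy.stats import f_oneway

# Hypothetical 2x3 design: scores[IV1 level][IV2 level] = list of DV values
scores = {
    "low":  {"a": [3, 4, 3, 5], "b": [6, 7, 6, 8], "c": [9, 8, 9, 10]},
    "high": {"a": [5, 5, 6, 4], "b": [5, 6, 5, 5], "c": [6, 5, 6, 5]},
}

# Simple main effect of IV2 at each level of IV1
results = {}
for level, groups in scores.items():
    F, p = f_oneway(*groups.values())
    results[level] = (F, p)
    print(f"IV2 effect at IV1 = {level}: F = {F:.2f}, p = {p:.4f}")
```

If the simple main effect is significant at some level of IV1, follow up with Fisher's LSD / Tukey's HSD as noted above to locate the specific differences.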
Correlation
- Requirements:
- Each variable should be continuous
- Each observation/participant should have a pair of values (related variables)
- Assumptions:
- Linearity
- Independent random sampling
- Normal distribution
- Absence of influential outliers
- Homoscedasticity (look at scatterplot, distance from data points to straight line
should be roughly equal)
- Used for:
- Test-retest reliability
- Internal consistency (split-half)
- Interrater reliability
- Criterion validity
- Strength of correlation, Pearson’s r:
- .1, .3, .5 (Cohen, 1988)
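A minimal Pearson correlation sketch with scipy (made-up paired values; in practice also inspect the scatterplot for the linearity, outlier, and homoscedasticity assumptions listed above):

```python
import numpy as np
from scipy.stats import pearsonr

x = np.array([2, 4, 5, 7, 9, 10, 12])    # variable 1 (made up)
y = np.array([1, 3, 4, 6, 8, 10, 13])    # variable 2, paired with x

r, p = pearsonr(x, y)
print(f"r = {r:.2f}, p = {p:.4f}")
```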
Linear Regression
- Assumptions:
- Linearity
- Independent random sampling
- Normal distribution
- Absence of influential outliers
- Homoscedasticity (look at scatterplot, distance from data points to straight line
should be roughly equal)
- How to interpret results
- R squared - The amount of variance accounted for by the IV
- Small - 0.01, Medium - 0.09, Large - 0.25
- Check the beta weight for per unit change in DV resulting from per unit change in
IV - sign of beta weight will explain direction of relationship
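A simple linear regression sketch with scipy (made-up data): the slope is the per-unit change in the DV for a unit change in the IV, its sign gives the direction, and r squared gives the variance accounted for:

```python
import numpy as np
from scipy.stats import linregress

hours = np.array([1, 2, 3, 4, 5, 6])         # IV (made up)
score = np.array([52, 55, 61, 64, 70, 73])   # DV

res = linregress(hours, score)
r_squared = res.rvalue ** 2                  # variance accounted for by the IV
print(f"slope = {res.slope:.2f}, intercept = {res.intercept:.2f}, "
      f"R^2 = {r_squared:.2f}, p = {res.pvalue:.4f}")
```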
Logistic Regression
- Overview:
- predicting, for every unit increase in the IV, the likelihood of a dichotomous
outcome
- model the probability of an event occurring depending on the values of the
independent variables, which can be categorical or numerical
- estimate the probability that an event occurs for a randomly selected observation
versus the probability that the event does not occur
- predict the effect of a series of variables on a binary response
- classify observations by estimating the probability that an observation is in a
particular category
- Assumptions (http://www.statisticssolutions.com/assumptions-of-logistic-regression/)
- NOT REQUIRED: First, logistic regression does not require a linear relationship
between the dependent and independent variables. Second, the error terms
(residuals) do not need to be normally distributed. Third, homoscedasticity is not
required. Finally, the dependent variable in logistic regression is not measured on
an interval or ratio scale. However, some other assumptions still apply.
- Dependent variable to be binary
- Observations to be independent of each other - in other words, the observations
should not come from repeated measurements or matched data.
- Little or no multicollinearity among the independent variables - this means
that the independent variables should not be too highly correlated with each other
- Linearity of independent variables and log odds. although this analysis does
not require the dependent and independent variables to be related linearly, it
requires that the independent variables are linearly related to the log odds.
- Logistic regression typically requires a large sample size. A general guideline
is that you need a minimum of 10 cases with the least frequent outcome for each
independent variable in your model. For example, if you have 5 independent
variables and the expected probability of your least frequent outcome is .10, then
you would need a minimum sample size of 500 (10*5 / .10).
- How to interpret results
- Nagelkerke R squared - amount of variance accounted for by predictors in the
regression model in predicting the DV
- Odds-ratio - change in odds of predicting DV with one unit change in one IV,
holding all other variables constant
- Example - if position grade was found significant, then it would suggest
that a change in one unit of grade will affect (increase/decrease) the
likelihood of return by (odds ratio) times - this is with reference to
expatriate training
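A minimal sketch of the grade/return example above, with made-up data; a real analysis would use a library fitter (e.g. statsmodels), but here the two coefficients are found by plain gradient ascent on the log-likelihood so the example is self-contained, and the odds ratio is exp(beta):

```python
import numpy as np

# Hypothetical data: x = position grade, y = 1 if the employee returned, else 0
x = np.array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5], dtype=float)
y = np.array([0, 0, 0, 1, 0, 1, 1, 1, 1, 1], dtype=float)

# Fit log-odds(y) = b0 + b1*x by gradient ascent on the log-likelihood
b0, b1 = 0.0, 0.0
lr = 0.01
for _ in range(50000):
    p = 1 / (1 + np.exp(-(b0 + b1 * x)))   # predicted probability of return
    b0 += lr * (y - p).sum()               # gradient w.r.t. intercept
    b1 += lr * ((y - p) * x).sum()         # gradient w.r.t. grade coefficient

odds_ratio = np.exp(b1)   # change in odds of returning per unit increase in grade
print(f"b1 = {b1:.2f}, odds ratio = {odds_ratio:.2f}")
```

An odds ratio above 1 means each unit increase in grade multiplies the odds of return by that factor, holding other predictors constant.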
Multiple Regression
- Assumptions of Multiple Regression (COHEN, 2013)
- Minimal sample size - 41 + number of predictors (accounts for …)
- Independent Random Sampling – individual cases should be selected
independently of each other
- Normal Distributions – all variables involved in the multiple regression are
normally distributed
- Homoscedasticity – errors from the regression surface (e.g. line, plane, etc.) have
the same variance in all locations
- Multivariate Outliers – combinations of values on three or more variables that are
unusual and may indicate measurement errors or psychological phenomena
- Measuring Leverage, Residuals and Influence – we probably want to do an outlier
analysis to ensure that these are in check
- Leverage – outliers that can easily rotate the regression line
- Residuals – a point’s observed value minus the value predicted by the regression line
- Influence – outliers that have leverage and large residual values
- Dichotomous Predictors – all categorical variables have been coded into numeric
values
- Categorical variables with more than two levels have been coded into
dichotomous categorical variables – these are IVs, therefore multiple
logistic regression not done
- Problems with Multiple Regression that will be addressed prior/post-test
- Multicollinearity – no two variables are perfectly or highly correlated, or
predicted by a combination of other variables
- Shrinkage – when regression model based on one sample but to be applied to
another
- Cross-validation to address shrinkage – take one half of sample and create
regression model, use beta weights to apply to the other half of sample and
see if it works
- Having too many predictors - use a Bonferroni-adjusted alpha for your
regression model; minimal sample size = 41 + # of predictors - cite the stats
textbook, Cohen (2013)
- How to interpret results
- Check R squared value for amount of variance explained by predictors in the
regression model
- Check beta weights to see what the per unit change in DV resulting from per unit
change in IV will be (for each IV) - the sign of the beta weight will indicate
positive/negative relationship
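A multiple regression sketch with numpy least squares on simulated data (the coefficients and noise level are made up): the fitted weights recover the per-unit effect of each IV, and R squared is computed as 1 - SS_residual / SS_total:

```python
import numpy as np

# Simulated predictors and outcome: y = 2*x1 - 1*x2 + noise
rng = np.random.default_rng(42)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x1, x2])   # intercept column + predictors
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ coefs
ss_res = ((y - y_hat) ** 2).sum()
ss_tot = ((y - y.mean()) ** 2).sum()
r_squared = 1 - ss_res / ss_tot             # variance explained by the predictors
print(f"weights = {coefs.round(2)}, R^2 = {r_squared:.2f}")
```

The signs of the recovered weights (positive for x1, negative for x2) give the direction of each relationship, as described above.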
NOTE: We will only use predictors that have statistically significant correlations with the DV in
the regression model - relationship can be positive/negative or weak/moderate/strong
EFFECT SIZE CHART (we don’t know about eta squared - TBD)
Effect size | r (correlation), Cramer’s phi (for chi square) | R2 (R squared), partial eta squared | f | Cohen’s d (t-tests)