
Basic Statistics Formula Sheet

Steven W. Nydick May 25, 2012

This document is intended only to review basic concepts and formulas from an introductory statistics course. Only mean-based procedures are reviewed, and emphasis is placed on a simple understanding of when to use each method. After reviewing and understanding this document, one should then learn about more complex procedures and methods in statistics. However, keep in mind the assumptions behind certain procedures, and know that statistical procedures are sometimes flexible enough to handle data that do not strictly match the assumptions.

Descriptive Statistics
Elementary Descriptives (Univariate & Bivariate)
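Name | Population Symbol | Sample Symbol | Sample Calculation | Main Problems | Alternatives
--- | --- | --- | --- | --- | ---
Mean | $\mu_x$ | $\bar{x}$ | $\bar{x} = \frac{\sum x}{N}$ | Sensitive to outliers | Median, Mode
Variance | $\sigma^2_x$ | $s^2_x$ | $s^2_x = \frac{\sum (x - \bar{x})^2}{N - 1}$ | Sensitive to outliers | MAD, IQR
Standard Dev | $\sigma_x$ | $s_x$ | $s_x = \sqrt{s^2_x}$ | Biased | MAD
Covariance | $\sigma_{xy}$ | $s_{xy}$ | $s_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{N - 1}$ | Outliers, uninterpretable units | Correlation
Correlation | $\rho_{xy}$ | $r_{xy}$ | $r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{\sum (z_x z_y)}{N - 1}$ | Range restriction, outliers, nonlinearity |
z-score | $z_x$ | $z_x$ | $z_x = \frac{x - \bar{x}}{s_x}$; $\bar{z} = 0$; $s^2_z = 1$ | Doesn't make distribution normal |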
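The sample calculations for the elementary descriptives can be sketched in plain Python. This is only an illustration: the function names and the data in `x` and `y` are made up for the example and do not come from the formula sheet. It also checks that the two correlation formulas (covariance over the product of standard deviations, and the sum of z-score products over N - 1) agree.

```python
import math

def mean(values):
    """Arithmetic mean: sum of x divided by N."""
    return sum(values) / len(values)

def sample_variance(values):
    """Sample variance with the N - 1 denominator."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def sample_sd(values):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(values))

def sample_covariance(xs, ys):
    """Sample covariance with the N - 1 denominator."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Correlation as covariance divided by the product of the SDs."""
    return sample_covariance(xs, ys) / (sample_sd(xs) * sample_sd(ys))

def z_scores(values):
    """Standardize each score: (x - mean) / SD."""
    m, s = mean(values), sample_sd(values)
    return [(v - m) / s for v in values]

# Made-up illustration data (not from the formula sheet).
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 6.0]

r_from_cov = correlation(x, y)
zx, zy = z_scores(x), z_scores(y)
# Second correlation formula: sum of z-score products over N - 1.
r_from_z = sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)
```

The z-scores computed this way have a mean of 0 and a sample variance of 1, matching the last row of the table.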

Simple Linear Regression (Usually Quantitative IV; Quantitative DV)


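Part | Population Symbol | Sample Symbol | Sample Calculation | Meaning
--- | --- | --- | --- | ---
Regular Equation | $y_i = \alpha + \beta x_i + \epsilon_i$ | $y_i = a + b x_i + e_i$ | $\hat{y}_i = a + b x_i$ | Predict $y$ from $x$
Slope | $\beta$ | $b$ | $b = \frac{s_{xy}}{s^2_x} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$ | Predicted change in $y$ for unit change in $x$
Intercept | $\alpha$ | $a$ | $a = \bar{y} - b\bar{x}$ | Predicted $y$ for $x = 0$
Standardized Equation | $z_{y_i} = \rho_{xy} z_{x_i} + \epsilon_i$ | $z_{y_i} = r_{xy} z_{x_i} + e_i$ | $\hat{z}_{y_i} = r_{xy} z_{x_i}$ | Predict $z_y$ from $z_x$
Slope (standardized) | $\rho_{xy}$ | $r_{xy}$ | $r_{xy} = \frac{s_{xy}}{s_x s_y} = b \frac{s_x}{s_y}$ | Predicted change in $z_y$ for unit change in $z_x$
Intercept (standardized) | None | None | $0$ | Predicted $z_y$ for $z_x = 0$ is $0$
Effect Size | $P^2$ | $R^2$ | $r^2_{\hat{y}y} = r^2_{xy}$ | Variance in $y$ accounted for by regression line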
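As a rough sketch of the regression calculations, the slope, intercept, and effect size can be computed directly from the deviation sums. The function name `fit_simple_regression` and the data are hypothetical, chosen only for illustration. Here R² is computed as 1 minus the residual sum of squares over the total sum of squares, which for simple linear regression equals the squared correlation.

```python
def fit_simple_regression(xs, ys):
    """Return (a, b, r_squared) for the line y-hat = a + b * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Deviation sums; the N - 1 terms cancel in the slope ratio.
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_xx = sum((x - x_bar) ** 2 for x in xs)
    sum_yy = sum((y - y_bar) ** 2 for y in ys)
    b = sum_xy / sum_xx      # slope: s_xy / s_x^2
    a = y_bar - b * x_bar    # intercept: y-bar minus b times x-bar
    # Proportion of variance in y accounted for by the line.
    ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - ss_residual / sum_yy
    return a, b, r_squared

# Made-up illustration data (not from the formula sheet).
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 6.0]
a, b, r_squared = fit_simple_regression(x, y)
```

For these numbers the slope works out to 0.7 and the intercept to -0.5, so each unit increase in x predicts a 0.7 increase in y.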

Inferential Statistics
t-tests (Categorical IV (1 or 2 Groups); Quantitative DV)
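Test | Statistic | Parameter | Standard Deviation | Standard Error | df | t-obt
--- | --- | --- | --- | --- | --- | ---
One Sample | $\bar{x}$ | $\mu$ | $s_x = \sqrt{\frac{\sum (x - \bar{x})^2}{N - 1}}$ | $\frac{s_x}{\sqrt{N}}$ | $N - 1$ | $t_{obt} = \frac{\bar{x} - \mu_0}{s_x / \sqrt{N}}$
Paired Samples | $\bar{D}$ | $\mu_D$ | $s_D = \sqrt{\frac{\sum (D - \bar{D})^2}{N_D - 1}}$ | $\frac{s_D}{\sqrt{N_D}}$ | $N_D - 1$ | $t_{obt} = \frac{\bar{D} - \mu_{D0}}{s_D / \sqrt{N_D}}$
Independent Samples | $\bar{x}_1 - \bar{x}_2$ | $\mu_1 - \mu_2$ | $s_p = \sqrt{\frac{(n_1 - 1)s^2_1 + (n_2 - 1)s^2_2}{n_1 + n_2 - 2}}$ | $s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ | $n_1 + n_2 - 2$ | $t_{obt} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
Correlation | $r$ | $\rho = 0$ | NA | NA | $N - 2$ | $t_{obt} = \frac{r}{\sqrt{\frac{1 - r^2}{N - 2}}}$
Regression (FYI) | $a$ & $b$ | $\alpha$ & $\beta$ | $\hat{\sigma}_e = \sqrt{\frac{\sum (y - \hat{y})^2}{N - 2}}$ | $s_a$ & $s_b$ | $N - 2$ | $t_{obt} = \frac{a - \alpha_0}{s_a}$ & $t_{obt} = \frac{b - \beta_0}{s_b}$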
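The one-sample and independent-samples rows of the t-test table can be sketched as follows. The function names and scores are invented for illustration, and the sketch returns only the obtained t and its df; comparing against a critical value still requires a t table or a statistics library.

```python
import math

def sample_sd(values):
    """Sample standard deviation with the N - 1 denominator."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def one_sample_t(values, mu0):
    """Return (t_obt, df) for H0: mu = mu0."""
    n = len(values)
    x_bar = sum(values) / n
    standard_error = sample_sd(values) / math.sqrt(n)
    return (x_bar - mu0) / standard_error, n - 1

def independent_samples_t(group1, group2, diff0=0.0):
    """Return (t_obt, df) for H0: mu1 - mu2 = diff0, using the pooled SD."""
    n1, n2 = len(group1), len(group2)
    mean1 = sum(group1) / n1
    mean2 = sum(group2) / n2
    var1 = sample_sd(group1) ** 2
    var2 = sample_sd(group2) ** 2
    # Pooled SD weights each group's variance by its degrees of freedom.
    s_pooled = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    standard_error = s_pooled * math.sqrt(1 / n1 + 1 / n2)
    return ((mean1 - mean2) - diff0) / standard_error, n1 + n2 - 2

# Made-up illustration data (not from the formula sheet).
treatment = [5.0, 7.0, 8.0, 6.0, 9.0]
control = [3.0, 4.0, 6.0, 5.0, 2.0]

t_one, df_one = one_sample_t(treatment, mu0=5.0)
t_ind, df_ind = independent_samples_t(treatment, control)
```

A paired-samples test follows the one-sample pattern: compute the difference scores D first, then pass them to `one_sample_t` with the hypothesized mean difference.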

t-tests Hypotheses/Rejection
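Question | One Sample | Paired Sample | Independent Sample | When to Reject
--- | --- | --- | --- | ---
Greater Than? | $H_0: \mu \le \#$; $H_1: \mu > \#$ | $H_0: \mu_D \le \#$; $H_1: \mu_D > \#$ | $H_0: \mu_1 - \mu_2 \le \#$; $H_1: \mu_1 - \mu_2 > \#$ | Extreme positive numbers: $t_{obt} > t_{crit}$ (one-tailed)
Less Than? | $H_0: \mu \ge \#$; $H_1: \mu < \#$ | $H_0: \mu_D \ge \#$; $H_1: \mu_D < \#$ | $H_0: \mu_1 - \mu_2 \ge \#$; $H_1: \mu_1 - \mu_2 < \#$ | Extreme negative numbers: $t_{obt} < t_{crit}$ (one-tailed)
Not Equal To? | $H_0: \mu = \#$; $H_1: \mu \ne \#$ | $H_0: \mu_D = \#$; $H_1: \mu_D \ne \#$ | $H_0: \mu_1 - \mu_2 = \#$; $H_1: \mu_1 - \mu_2 \ne \#$ | Extreme numbers (negative and positive): $\lvert t_{obt} \rvert > \lvert t_{crit} \rvert$ (two-tailed)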

t-tests Miscellaneous
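Test | Confidence Interval, $(1 - \alpha)\%$ | Unstandardized Effect Size | Standardized Effect Size
--- | --- | --- | ---
One Sample | $\bar{x} \pm t_{N - 1;\, crit(2\text{-tailed})} \times \frac{s_x}{\sqrt{N}}$ | $\bar{x} - \mu_0$ | $d = \frac{\bar{x} - \mu_0}{s_x}$
Paired Samples | $\bar{D} \pm t_{N_D - 1;\, crit(2\text{-tailed})} \times \frac{s_D}{\sqrt{N_D}}$ | $\bar{D}$ | $d = \frac{\bar{D}}{s_D}$
Independent Samples | $(\bar{x}_1 - \bar{x}_2) \pm t_{n_1 + n_2 - 2;\, crit(2\text{-tailed})} \times s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ | $\bar{x}_1 - \bar{x}_2$ | $d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}$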
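For the one-sample case, the confidence interval and standardized effect size can be sketched like this. The function name, the scores, and the hypothesized mean are assumptions made for the example. The critical value 2.776 is the usual table value for a two-tailed test with df = 4 and alpha = .05; it is passed in by hand because the Python standard library has no t distribution.

```python
import math

def one_sample_ci_and_d(values, mu0, t_crit):
    """Return ((lower, upper), d) for a one-sample design.

    t_crit is the two-tailed critical t for N - 1 df, looked up by the caller.
    """
    n = len(values)
    x_bar = sum(values) / n
    s = math.sqrt(sum((v - x_bar) ** 2 for v in values) / (n - 1))
    margin = t_crit * s / math.sqrt(n)   # critical t times the standard error
    interval = (x_bar - margin, x_bar + margin)
    d = (x_bar - mu0) / s                # standardized effect size
    return interval, d

# Made-up illustration data (not from the formula sheet).
scores = [5.0, 7.0, 8.0, 6.0, 9.0]
T_CRIT_DF4_TWO_TAILED_05 = 2.776  # from a standard t table, df = 4

interval, d = one_sample_ci_and_d(scores, mu0=5.0, t_crit=T_CRIT_DF4_TWO_TAILED_05)
```

Note that the interval divides the standard deviation by the square root of N, while the effect size divides the mean difference by the standard deviation itself, so d does not grow with sample size.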

One-Way ANOVA (Categorical IV (Usually 3 or More Groups); Quantitative DV)


Source | Sums of Sq. | df | Mean Sq. | F-stat | Effect Size
--- | --- | --- | --- | --- | ---
Between | $\sum_{j=1}^{g} n_j (\bar{x}_j - \bar{x}_G)^2$ | $g - 1$ | $SSB / df_B$ | $MSB / MSW$ | $\eta^2 = \frac{SSB}{SST}$
Within | $\sum_{j=1}^{g} (n_j - 1) s^2_j$ | $N - g$ | $SSW / df_W$ | |
Total | $\sum_{i,j} (x_{ij} - \bar{x}_G)^2$ | $N - 1$ | | |

1. We perform ANOVA because of family-wise error: the probability of rejecting at least one true $H_0$ during multiple tests.
2. $\bar{x}_G$ is the grand mean, or the average of all scores ignoring group membership.
3. $\bar{x}_j$ is the mean of group $j$; $n_j$ is the number of people in group $j$; $g$ is the number of groups; $N$ is the total number of people.
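The ANOVA source table can be sketched in a few lines of Python. The function name `one_way_anova`, the dictionary keys, and the three groups of scores are all made up for illustration. The within-group sum of squares is accumulated as squared deviations from each group mean, which equals the sum of (n_j - 1) times each group variance.

```python
def one_way_anova(groups):
    """Return the sums of squares, df, F, and eta squared for a list of groups."""
    all_scores = [score for group in groups for score in group]
    n_total = len(all_scores)
    n_groups = len(groups)
    grand_mean = sum(all_scores) / n_total

    ss_between = 0.0
    ss_within = 0.0
    for group in groups:
        n_j = len(group)
        mean_j = sum(group) / n_j
        ss_between += n_j * (mean_j - grand_mean) ** 2
        # Same as (n_j - 1) * s_j^2 for this group.
        ss_within += sum((score - mean_j) ** 2 for score in group)
    ss_total = sum((score - grand_mean) ** 2 for score in all_scores)

    df_between = n_groups - 1
    df_within = n_total - n_groups
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_total,
        "df_between": df_between,
        "df_within": df_within,
        "f_stat": ms_between / ms_within,
        "eta_squared": ss_between / ss_total,
    }

# Made-up illustration data (not from the formula sheet).
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
result = one_way_anova(groups)
```

A useful check on any hand calculation is that the between and within sums of squares add up to the total sum of squares.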

One-Way ANOVA Hypotheses/Rejection


Question | Hypotheses | When to Reject
--- | --- | ---
Is at least one mean different? | $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$; $H_1$: At least one $\mu$ is different from at least one other $\mu$ | Extreme positive numbers: $F_{obt} > F_{crit}$

Remember Post-Hoc Tests: LSD, Bonferroni, Tukey (what are the rank orderings of the means?)

Chi Square ($\chi^2$) (Categorical IV; Categorical DV)


Test | Hypotheses | Observed | Expected | df | $\chi^2$ Stat | When to Reject
--- | --- | --- | --- | --- | --- | ---
Independence | $H_0$: Vars are Independent; $H_1$: Vars are Dependent | From Table | $N p_j p_k$ | (Cols - 1)(Rows - 1) | $\sum_{i=1}^{R} \sum_{j=1}^{C} \frac{(f_{O_{ij}} - f_{E_{ij}})^2}{f_{E_{ij}}}$ | Extreme positive numbers: $\chi^2_{obt} > \chi^2_{crit}$
Goodness of Fit | $H_0$: Model Fits; $H_1$: Model Doesn't Fit | From Table | $N p_i$ | Cells - 1 | $\sum_{i=1}^{C} \frac{(f_{O_i} - f_{E_i})^2}{f_{E_i}}$ | Extreme positive numbers: $\chi^2_{obt} > \chi^2_{crit}$

1. Remember: the sum is over the number of cells/columns/rows (not the number of people).
2. For Test of Independence: $p_j$ and $p_k$ are the marginal proportions of variable $j$ and variable $k$ respectively.
3. For Goodness of Fit: $p_i$ is the expected proportion in cell $i$ if the data fit the model.
4. $N$ is the total number of people.
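The test of independence can be sketched as below, with expected frequencies built from the marginal proportions as N times p_j times p_k. The function name and the 2 x 2 table of counts are invented for the example, and the sketch stops at the obtained statistic and df; the critical value would come from a chi-square table.

```python
def chi_square_independence(table):
    """Return (chi_square_obt, df) for a table of observed counts (list of rows)."""
    n_rows = len(table)
    n_cols = len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    n = sum(row_totals)

    chi_square = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Expected count: N times the two marginal proportions.
            expected = n * (row_totals[i] / n) * (col_totals[j] / n)
            chi_square += (table[i][j] - expected) ** 2 / expected

    df = (n_cols - 1) * (n_rows - 1)
    return chi_square, df

# Made-up illustration data (not from the formula sheet).
observed = [[10, 20],
            [30, 40]]
chi_square_obt, df = chi_square_independence(observed)
```

The loop runs over the cells of the table, not over the people, which is the point of the first reminder in the list above.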

Assumptions of Statistical Models


Correlation
1. Estimating: Relationship is linear
2. Estimating: No outliers
3. Estimating: No range restriction
4. Testing: Bivariate normality

Regression
1. Relationship is linear
2. Bivariate normality
3. Homoskedasticity (constant error variance)
4. Independence of pairs of observations

One Sample t-test


1. $x$ is normally distributed in the population
2. Independence of observations

Independent Samples t-test


1. Each group is normally distributed in the population
2. Homogeneity of variance (both groups have the same variance in the population)
3. Independence of observations within and between groups (random sampling & random assignment)

Paired Samples t-test


1. Difference scores are normally distributed in the population
2. Independence of pairs of observations

One-Way ANOVA
1. Each group is normally distributed in the population
2. Homogeneity of variance
3. Independence of observations within and between groups

Chi Square ($\chi^2$)
1. No small expected frequencies
   - Total number of observations at least 20
   - Expected number in any cell at least 5
2. Independence of observations
   - Each individual is only in ONE cell of the table

Central Limit Theorem

Given a population distribution with a mean $\mu$ and a variance $\sigma^2$, the sampling distribution of the mean using sample size $N$ (or, to put it another way, the distribution of sample means) will have a mean of $\mu_{\bar{x}} = \mu$ and a variance equal to $\sigma^2_{\bar{x}} = \frac{\sigma^2}{N}$, which implies that $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}}$. Furthermore, the distribution will approach the normal distribution as $N$, the sample size, increases.

Possible Decisions/Outcomes

Decision | $H_0$ True | $H_0$ False
--- | --- | ---
Rejecting $H_0$ | Type I Error ($\alpha$) | Correct Decision ($1 - \beta$; Power)
Not Rejecting $H_0$ | Correct Decision ($1 - \alpha$) | Type II Error ($\beta$)

Power Increases If: $N \uparrow$, $\alpha \uparrow$, $\sigma \downarrow$, Mean Difference $\uparrow$, or One-Tailed Test
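The Central Limit Theorem's claims about the mean and variance of the sampling distribution can be checked with a small simulation. This is only a sketch under stated assumptions: the population is taken to be uniform on [0, 1) (mean 0.5, variance 1/12), and the sample size, replication count, and seed are arbitrary choices for the example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

N = 25               # size of each sample
REPLICATIONS = 5000  # number of sample means to draw

# Assumed population: uniform on [0, 1), with mean 1/2 and variance 1/12.
population_mean = 0.5
population_variance = 1 / 12

# Draw many samples of size N and record each sample mean.
sample_means = [
    sum(random.random() for _ in range(N)) / N
    for _ in range(REPLICATIONS)
]

mean_of_means = statistics.mean(sample_means)
variance_of_means = statistics.variance(sample_means)

# Values predicted for the sampling distribution of the mean.
expected_mean = population_mean
expected_variance = population_variance / N
```

With these settings the simulated mean and variance of the sample means should land close to the predicted values, and a histogram of `sample_means` would look roughly normal even though the population is flat.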
