1 Summary Statistics
2 Normality
3 The Correlation Matrix
4 Regression Analysis: The Basics
5 Regression Analysis: Hypothesis Testing and Modelling Issues
5.1 Significance of individual parameters
5.2 Explanatory power: adjR2
5.3 Significance of the equation: The F-test
5.4 Dummy variables
5.5 The standard error
5.6 Collinearity
5.7 The general-to-specific modelling methodology
Statistical analysis begins with the collection of a sample of data. These data can be
subjected to various kinds of statistical analysis. The purpose of the analysis is to draw
inferences regarding the population from which the sample is drawn.
1 Summary Statistics
The mean is the arithmetic average. The standard deviation and variance measure the
variability of the observations around the mean. The greater the variability, the larger
the standard deviation. The variance is the standard deviation squared. The minimum
(Min) is the smallest observation. The maximum (Max) is the largest observation. The
mode is the most frequently occurring observation. The median (Med) is the middle
observation when the observations are arranged in order of size.
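
As an illustration, the summary statistics described above can be computed with Python's
standard statistics module. This is a minimal sketch: the function name summarize and the
sample values are made up for the example, and it assumes the sample (n - 1) versions of
the standard deviation and variance, which the text does not specify.

```python
import statistics


def summarize(observations):
    """Return the summary statistics described in this section for a list of numbers."""
    return {
        "mean": statistics.mean(observations),
        # Assumption: sample formulas (divisor n - 1) are intended.
        "sd": statistics.stdev(observations),
        "variance": statistics.variance(observations),
        "min": min(observations),
        "max": max(observations),
        "mode": statistics.mode(observations),
        "median": statistics.median(observations),
    }


# Hypothetical sample, invented for illustration only.
sample = [2, 4, 4, 5, 7, 9, 11]
stats = summarize(sample)

for name, value in stats.items():
    print(f"{name}: {value}")
```

For this sample the variance equals the standard deviation squared, as stated above, and
the mode (4) and median (5) differ from the mean (6).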
2 Normality
Many basic methods of statistical analysis assume that the observations follow an
approximately normal distribution. A normal distribution can be understood as a symmetric,
bell-shaped frequency distribution centred on the mean. There are two elementary tests of
normality.
Skewness is the absence of symmetry. In a skewed distribution the hump lies to one side
of the mean, with a longer tail on the other side. If there is no skewness then the
statistical measure of skewness (s) is zero.
There are tests available to show whether s differs significantly from zero. As a rule of
thumb, |s| > 1 indicates potentially serious non-normality.
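
The rule of thumb can be sketched in Python as follows. The text does not say which
formula defines s, so this example assumes the common moment coefficient of skewness
(the third central moment divided by the second central moment raised to the power 1.5).
The function names and both data sets are hypothetical.

```python
def skewness(observations):
    """Moment coefficient of skewness: m3 / m2**1.5 (an assumed definition of s)."""
    n = len(observations)
    mean = sum(observations) / n
    m2 = sum((x - mean) ** 2 for x in observations) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in observations) / n  # third central moment
    return m3 / m2 ** 1.5


def seriously_skewed(observations, threshold=1.0):
    """Apply the rule of thumb |s| > 1 from the text."""
    return abs(skewness(observations)) > threshold


# Hypothetical data, invented for illustration only.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 2, 3, 20]

print(skewness(symmetric), seriously_skewed(symmetric))
print(skewness(right_tailed), seriously_skewed(right_tailed))
```

The symmetric sample gives s = 0, while the sample with one large observation in the
right tail gives a positive s above 1 and is flagged by the rule of thumb. This is only a
screening device and does not replace a formal test of whether s differs from zero.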
5.6 Collinearity
One of the assumptions underlying OLS regression methods is that the explanatory
variables are "linearly independent". If two independent variables are highly correlated,
they do not differ sufficiently from each other for their separate effects to be
estimated reliably. As a rule of thumb, if the correlation coefficient r between two
variables satisfies |r| ≥ 0.7, the variables should not both appear as independent
variables in a regression equation, because they lack independent explanatory power.
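
A screening step based on this rule of thumb might look like the sketch below. It
assumes that r is the ordinary Pearson correlation coefficient; the function names, the
variable names and the data are hypothetical and serve only to illustrate the check.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def collinear_pairs(variables, threshold=0.7):
    """Return (name_a, name_b, r) for every pair of variables with |r| >= threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(variables.items(), 2):
        r = pearson_r(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical candidate explanatory variables, invented for illustration only.
data = {
    "income": [10, 20, 30, 40, 50],
    "wealth": [12, 19, 33, 41, 48],
    "age": [30, 25, 45, 20, 40],
}

flagged = collinear_pairs(data)
for name_a, name_b, r in flagged:
    print(f"{name_a} and {name_b}: r = {r:.3f}")
```

In this made-up data only income and wealth exceed the threshold, so under the rule of
thumb one of the two would be dropped before estimating the regression.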