
Multiple Regression

Muhammad Zaheer
MAJOR TYPES OF MULTIPLE
REGRESSION
• Standard or simultaneous
• Hierarchical or sequential
• Stepwise
Standard or simultaneous
• In standard multiple regression, all the
independent (or predictor) variables are
entered into the equation simultaneously.
• Each independent variable is evaluated in
terms of its predictive power, over and above
that offered by all the other independent
variables.
• This is the most commonly used multiple
regression analysis.
Hierarchical multiple regression
• In hierarchical regression (also called sequential
regression), the independent variables are
entered into the equation in the order specified
by the researcher based on theoretical grounds.
• Variables or sets of variables are entered in steps
(or blocks), with each independent variable being
assessed in terms of what it adds to the
prediction of the dependent variable after the
previous variables have been controlled for.
Stepwise multiple regression
• In stepwise regression, the researcher
provides a list of independent variables and
then allows the program to select which
variables it will enter and in which order they
go into the equation, based on a set of
statistical criteria.
• There are three different versions of this
approach: forward selection, backward
deletion and stepwise regression.
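• In SPSS, the default statistical criteria are based on the significance of the F value for each variable: a variable is entered if its probability of F-to-enter is .05 or less, and removed if its probability of F-to-remove is .10 or more.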
ASSUMPTIONS OF MULTIPLE
REGRESSION
• Sample size
– Stevens (1996, p. 72) recommends that ‘for social
science research, about 15 participants per
predictor are needed for a reliable equation’.
– Tabachnick and Fidell (2007, p. 123) give a formula
for calculating sample size requirements, taking
into account the number of independent variables
that you wish to use: N > 50 + 8m (where m =
number of independent variables).
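– For example, with m = 4 independent variables (as in the hierarchical example later in these slides) you would need more than 50 + 8(4) = 82 cases; with m = 2, more than 50 + 8(2) = 66 cases.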
ASSUMPTIONS OF MULTIPLE
REGRESSION
• Multicollinearity and singularity
This refers to the relationship among the independent
variables. Multicollinearity exists when the independent
variables are highly correlated (r = .9 and above).
Singularity occurs when one independent variable is
actually a combination of other independent variables
(e.g. when both subscale scores and the total score of a
scale are included).
• Outliers
Multiple regression is very sensitive to outliers (very high
or very low scores). Checking for extreme scores should
be part of the initial data screening. You should do this for
all the variables, both dependent and independent, that
you will be using in your regression analysis.
ASSUMPTIONS OF MULTIPLE
REGRESSION
• Normality, linearity, homoscedasticity, independence of residuals

• These all refer to various aspects of the distribution of scores and the nature of the
underlying relationship between the variables. These assumptions can be checked
from the residuals scatterplots which are generated as part of the multiple regression
procedure.

Residuals are the differences between the obtained and the predicted dependent
variable (DV) scores (expressed in symbols after this list). The residuals scatterplots allow you to check:
• normality: the residuals should be normally distributed about the predicted DV
scores
• linearity: the residuals should have a straight-line relationship with predicted DV
scores
• homoscedasticity: the variance of the residuals about predicted DV scores should
be the same for all predicted scores.
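In symbols, the residual for each case is simply:
residual = obtained DV score − predicted DV score
A large positive or negative residual therefore flags a case that the model predicts poorly.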
Example
• File name: survey4ED.sav
Variables:
• Total perceived stress (tpstress)
• Total Perceived Control of Internal States
(tpcoiss)
• Total Mastery (tmast)
• Total Social Desirability (tmarlow)
• Age: age in years.
Research Questions
1. How well do the two measures of control
(mastery, PCOISS) predict perceived stress?
How much variance in perceived stress scores
can be explained by scores on these two scales?
2. Which is the best predictor of perceived stress:
control of external events (Mastery Scale) or
control of internal states (PCOISS)?
3. If we control for the possible effect of age and
socially desirable responding, is this set of
variables still able to predict a significant amount
of the variance in perceived stress?
Required
• one continuous dependent variable (Total perceived stress)
• two or more continuous independent variables (mastery,
PCOISS). (You can also use dichotomous independent
variables, e.g. males=1, females=2.)

• What it does: Multiple regression tells you how much of
the variance in your dependent variable can be explained
by your independent variables.
• It also gives you an indication of the relative contribution of
each independent variable.
• Tests allow you to determine the statistical significance of
the results, in terms of both the model itself and the
individual independent variables.
STANDARD MULTIPLE REGRESSION
• Question 1: How well do the two measures of
control (mastery, PCOISS) predict perceived
stress? How much variance in perceived stress
scores can be explained by scores on these
two scales?
• Question 2: Which is the best predictor of
perceived stress: control of external events
(Mastery Scale) or control of internal states
(PCOISS)?
Procedure
• Before you start the following procedure,
choose Edit from the menu, select Options,
and make sure there is a tick in the box No
scientific notation for small numbers in tables.
Procedure
1. From the menu at the top of the screen, click on Analyze, then select
Regression, then Linear.
2. Click on your continuous dependent variable (e.g. Total perceived stress:
tpstress) and move it into the Dependent box.
3. Click on your independent variables (Total Mastery: tmast; Total PCOISS:
tpcoiss) and click on the arrow to move them into the Independent box.
4. For Method, make sure Enter is selected. (This will give you standard
multiple regression.)
5. Click on the Statistics button.
• Select the following: Estimates, Confidence intervals, Model fit,
Descriptives, Part and partial correlations and Collinearity diagnostics.
• In the Residuals section, select Casewise diagnostics and Outliers outside 3
standard deviations. Click on Continue.
Procedure
6. Click on the Options button. In the Missing Values section, select
Exclude cases pairwise. Click on Continue.
7. Click on the Plots button.
• Click on *ZRESID and the arrow button to move this into the Y box.
• Click on *ZPRED and the arrow button to move this into the X box.
• In the section headed Standardized Residual Plots, tick the Normal
probability plot option. Click on Continue.
8. Click on the Save button.
• In the section labelled Distances, tick the Mahalanobis and
Cook’s boxes.
• Click on Continue and then OK (or on Paste to save to Syntax Editor).
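For reference, clicking on Paste rather than OK records the analysis in the Syntax Editor. A sketch of what the pasted syntax looks like for the steps above is shown below (exact subcommands and defaults may vary slightly between SPSS versions):

* Standard multiple regression: both predictors entered in a single block.
REGRESSION
  /DESCRIPTIVES MEAN STDDEV CORR SIG N
  /MISSING PAIRWISE
  /STATISTICS COEFF OUTS CI(95) R ANOVA COLLIN TOL ZPP
  /DEPENDENT tpstress
  /METHOD=ENTER tmast tpcoiss
  /SCATTERPLOT=(*ZRESID ,*ZPRED)
  /RESIDUALS NORMPROB(ZRESID)
  /CASEWISE PLOT(ZRESID) OUTLIERS(3)
  /SAVE MAHAL COOK.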
SPSS Output
Step 1: Checking the assumptions
• Multicollinearity
• The correlations between the variables in your
model are provided in the table labelled
Correlations. Check that the correlations between
your independent variables are not too high
(preferably less than .7).
• The other value given is the VIF (Variance
inflation factor). VIF values above 10 would be a
concern here, indicating multicollinearity.
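• VIF is the inverse of the Tolerance value reported in the same table: Tolerance = 1 − R² (where R² comes from regressing that independent variable on all the other independent variables), and VIF = 1/Tolerance. A VIF above 10 therefore corresponds to a Tolerance below .10.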
Step 1: Checking the assumptions
• Outliers, normality, linearity,
homoscedasticity, independence of residuals
• Inspect the Normal Probability Plot (P-P) of
the Regression Standardised Residual and the
Scatterplot
• In the Normal P-P Plot, you are hoping that the
points will lie in a reasonably straight diagonal
line from bottom left to top right, suggesting no
major deviations from normality.
Step 1: Checking the assumptions
• In the Scatterplot of the standardised residuals (the
second plot displayed), you are hoping that the
residuals will be roughly rectangularly distributed,
with most of the scores concentrated in the centre
(along the 0 point).
• Casewise Diagnostics. This presents information
about cases that have standardised residual
values above 3.0 or below –3.0. In a normally
distributed sample, we would expect fewer than
1 per cent of cases (about 0.3 per cent) to fall
outside this range. In this sample, we have found
one case (case number 165) with a residual
value of –3.48.
Step 2: Evaluating the model
• Look in the Model Summary box and check
the value given under the heading R Square.
This tells you how much of the variance in the
dependent variable (perceived stress) is
explained by the model (which includes the
variables of Total Mastery and Total PCOISS).
Step 3: Evaluating each of the
independent variables
• The next thing we want to know is which of the variables
included in the model contributed to the prediction of the
dependent variable. We find this information in the output
box labelled Coefficients.
• Look in the column labelled Beta under Standardised
Coefficients. To compare the different variables it is
important that you look at the standardised coefficients,
not the unstandardised ones. ‘Standardised’ means that
these values for each of the different variables have been
converted to the same scale so that you can compare them.
• If you were interested in constructing a regression
equation, you would use the unstandardised coefficient
values listed as B.
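• For example, a prediction equation built from the
unstandardised coefficients would take the form:
predicted tpstress = constant + (B for tmast × tmast score) + (B for tpcoiss × tpcoiss score)
with the constant and B values read from the B column. The
standardised Beta values are simply the B values rescaled
(Beta = B × SD of the predictor ÷ SD of the DV), which is what
puts the variables on the same scale.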
Step 3: Evaluating each of the
independent variables
• In this case, we are interested in comparing the
contribution of each independent variable; therefore
we will use the beta values. Look down the Beta
column and find which beta value is the largest
(ignoring any negative signs out the front). In this case
the largest beta coefficient is –.42, which is for Total
Mastery. This means that this variable makes the
strongest unique contribution to explaining the
dependent variable, when the variance explained by all
other variables in the model is controlled for.
• The Beta value for Total PCOISS was slightly lower (–
.36), indicating that it made less of a unique
contribution.
HIERARCHICAL MULTIPLE
REGRESSION
• Let us evaluate the ability of the model (which
includes Total Mastery and Total PCOISS) to
predict perceived stress scores, after controlling
for a number of additional variables (age, social
desirability)
• Question 3: If we control for the possible effect of
age and socially desirable responding, is our set
of variables (Mastery, PCOISS) still able to predict
a significant amount of the variance in perceived
stress?
Procedure
1. From the menu at the top of the screen, click on Analyze, then
select Regression, then Linear.
2. Choose your continuous dependent variable (e.g. total perceived
stress: tpstress) and move it into the Dependent box.
3. Move the variables you wish to control for into the Independent box
(e.g. age, total social desirability: tmarlow). This will be the first block
of variables to be entered in the analysis (Block 1 of 1).
4. Click on the button marked Next. This will give you a second
independent variables box to enter your second block of variables into
(you should see Block 2 of 2).
5. Choose your next block of independent variables (e.g. Total Mastery:
tmast, Total PCOISS: tpcoiss).
6. In the Method box, make sure that this is set to the default (Enter).
7. Click on the Statistics button. Select the following: Estimates,
Model fit, R squared change, Descriptives, Part and partial
correlations and Collinearity diagnostics. Click on Continue.
8. Click on the Options button. In the Missing Values section,
click on Exclude cases pairwise. Click on Continue.
9. Click on the Plots button:
• Click on *ZRESID and the arrow button to move this into the Y
box.
• Click on *ZPRED and the arrow button to move this into the X
box.
• In the section headed Standardized Residual Plots, tick the
Normal probability plot option. Click on Continue.
10. Click on the Save button. Click on Mahalanobis and Cook’s.
Click on Continue and then OK (or on Paste to save to Syntax
Editor).
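As with the standard analysis, clicking on Paste rather than OK saves the analysis as syntax. A sketch of the pasted syntax is shown below (again, exact subcommands may vary slightly between SPSS versions); note that the two consecutive /METHOD=ENTER lines correspond to the two blocks:

* Hierarchical multiple regression: age and tmarlow in Block 1, tmast and tpcoiss in Block 2.
REGRESSION
  /DESCRIPTIVES MEAN STDDEV CORR SIG N
  /MISSING PAIRWISE
  /STATISTICS COEFF OUTS R ANOVA CHANGE COLLIN TOL ZPP
  /DEPENDENT tpstress
  /METHOD=ENTER age tmarlow
  /METHOD=ENTER tmast tpcoiss
  /SCATTERPLOT=(*ZRESID ,*ZPRED)
  /RESIDUALS NORMPROB(ZRESID)
  /SAVE MAHAL COOK.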
INTERPRETATION OF OUTPUT
• In the Model Summary box there are two
models listed. Model 1 refers to the first block
of variables that were entered (Total social
desirability and age), while Model 2 includes
all the variables that were entered in both
blocks (Total social desirability, age, Total
Mastery, Total PCOISS).
Step 1: Evaluating the model
• Check the R Square values in the first Model
summary box. After the variables in Block 1
(social desirability & age) have been entered, the
overall model explains 5.7 per cent of the
variance (.057 × 100).
• After Block 2 variables (Total Mastery, Total
PCOISS) have also been included, the model as a
whole explains 47.4 per cent (.474 × 100). It is
important to note that this second R square value
includes all the variables from both blocks, not
just those included in the second step.
• To find out how much of this overall variance is explained by
our variables of interest (Mastery, PCOISS) after the effects of
age and socially desirable responding are removed, you need
to look in the column labelled R Square change.
• In the output presented above you will see, on the line
marked Model 2, that the R square change value is .42. This
means that Mastery and PCOISS explain an additional 42 per
cent (.42 × 100) of the variance in perceived stress, even
when the effects of age and socially desirable responding are
statistically controlled for.
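• As a quick check, the R Square change value is simply the
difference between the two R Square values: .474 − .057 = .417,
which rounds to the .42 reported in the output.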
• This is a statistically significant contribution, as indicated by
the Sig. F change value for this line (.000). The ANOVA table
indicates that the model as a whole (which includes both
blocks of variables) is significant (F (4, 421) = 94.78, p < .0005).
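• The degrees of freedom reflect the analysis: the first value (4) is
the number of independent variables, and the second is the
residual degrees of freedom, N − 4 − 1 = 421, implying that 426
cases were included in the analysis.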
Step 2: Evaluating each of the independent
variables
• Look in the Coefficients table in the Model 2 row. This
summarises the results, with all the variables entered into the
equation. Scanning the Sig. column, there are only two
variables that make a unique statistically significant
contribution (less than .05).
• In order of importance (according to their beta values), they
are: Mastery (beta = –.44) and Total PCOISS (beta = –.33).
Neither age nor social desirability made a unique
contribution.
• Remember, these beta values represent the unique
contribution of each variable, when the overlapping effects of
all other variables are statistically removed.
• In different equations, with a different set of independent
variables, or with a different sample these values would
change.
Assignment
• Data file: sleep4ED.sav.
1. Conduct a standard multiple regression to explore factors that impact on people’s
level of daytime sleepiness. For your dependent variable, use the Sleepiness and
Associated Sensations Scale total score (totSAS). For independent variables, use
sex, age, physical fitness rating (fitrate) and scores on the HADS Depression Scale
(depress). Assess how much of the variance in total sleepiness scores is explained
by the set of variables (check your R square value). Which of the variables make a
unique significant contribution (check your beta values)?
2. Repeat the above analysis, but this time use a hierarchical multiple regression
procedure, entering sex and age in the first block of variables and physical fitness
and depression scores in the second block. After controlling for the demographic
variables of sex and age, do the other two predictor variables make a significant
contribution to explaining variance in sleepiness scores? How much additional
variance in sleepiness is explained by physical fitness and depression, after
controlling for sex and age?
