
SW388R7 Data Analysis & Computers II Slide 1

Multiple Regression Basic Relationships


Purpose of multiple regression
Different types of multiple regression
  Standard multiple regression
  Hierarchical multiple regression
  Stepwise multiple regression
Steps in solving regression problems

SW388R7 Data Analysis & Computers II Slide 2

Purpose of multiple regression


The purpose of multiple regression is to analyze the relationship between metric or dichotomous independent variables and a metric dependent variable. If there is a relationship, using the information in the independent variables will improve our accuracy in predicting values for the dependent variable.

SW388R7 Data Analysis & Computers II Slide 3

Types of multiple regression

There are three types of multiple regression, each of which is designed to answer a different question:

Standard multiple regression is used to evaluate the relationships between a set of independent variables and a dependent variable.

Hierarchical, or sequential, regression is used to examine the relationships between a set of independent variables and a dependent variable, after controlling for the effects of some other independent variables on the dependent variable.

Stepwise, or statistical, regression is used to identify the subset of independent variables that has the strongest relationship to a dependent variable.

SW388R7 Data Analysis & Computers II Slide 4

Standard multiple regression

In standard multiple regression, all of the independent variables are entered into the regression equation at the same time. Multiple R and R² measure the strength of the relationship between the set of independent variables and the dependent variable. An F test is used to determine if the relationship can be generalized to the population represented by the sample. A t-test is used to evaluate the individual relationship between each independent variable and the dependent variable.

SW388R7 Data Analysis & Computers II Slide 5

Hierarchical multiple regression

In hierarchical multiple regression, the independent variables are entered in two stages. In the first stage, the independent variables that we want to control for are entered into the regression. In the second stage, the independent variables whose relationship we want to examine after the controls are entered. A statistical test of the change in R² from the first stage is used to evaluate the importance of the variables entered in the second stage.

SW388R7 Data Analysis & Computers II Slide 6

Stepwise multiple regression

Stepwise regression is designed to find the most parsimonious set of predictors that are most effective in predicting the dependent variable. Variables are added to the regression equation one at a time, using the statistical criterion of maximizing the R² of the included variables. When none of the possible additions can make a statistically significant improvement in R², the analysis stops.
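The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SPSS's implementation: the function names, the toy data, and the fixed `f_to_enter` threshold of 4.0 (standing in for the significance test on the improvement in R²) are all hypothetical, and the sketch only adds variables, as the slide describes.

```python
# Illustrative forward selection using only the standard library.
# Assumption: a fixed F-to-enter threshold replaces the significance test.

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def r_squared(columns, y):
    """R² of a least squares fit of y on the given columns plus a constant."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    xtx = [[sum(row[a] * row[b] for row in rows) for b in range(p)] for a in range(p)]
    xty = [sum(rows[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve_linear(xtx, xty)
    mean_y = sum(y) / n
    predicted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in rows]
    ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_residual / ss_total


def forward_stepwise(predictors, y, f_to_enter=4.0):
    """Add one predictor at a time, choosing the one that maximizes R²."""
    chosen, r2 = [], 0.0
    remaining = list(predictors)
    n = len(y)
    while remaining:
        scored = [(r_squared([predictors[c] for c in chosen + [name]], y), name)
                  for name in remaining]
        best_r2, best_name = max(scored)
        df_residual = n - (len(chosen) + 1) - 1
        # Stop when the F for the change in R² cannot be computed.
        if df_residual <= 0 or best_r2 >= 1.0:
            break
        f_change = (best_r2 - r2) / ((1.0 - best_r2) / df_residual)
        if f_change < f_to_enter:
            break  # no remaining variable improves R² enough
        chosen.append(best_name)
        remaining.remove(best_name)
        r2 = best_r2
    return chosen, r2


# Toy data (made up for illustration): y depends on x1, not on x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
x2 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0]
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0, 0.2, -0.15, 0.1, -0.05]
y = [2.0 * a + e for a, e in zip(x1, noise)]

selected, final_r2 = forward_stepwise({"x1": x1, "x2": x2}, y)
print(selected, round(final_r2, 3))
```

The loop mirrors the two rules in the text: pick the addition that yields the largest R², and stop when the best available addition no longer clears the entry criterion.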

SW388R7 Data Analysis & Computers II Slide 7

Problem 1 - standard multiple regression


In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Assume that there is no problem with missing data, violation of assumptions, or outliers, and that the split sample validation will confirm the generalizability of the results. Use a level of significance of 0.05. The variables "strength of affiliation" [reliten] and "frequency of prayer" [pray] have a strong relationship to the variable "frequency of attendance at religious services" [attend].

Survey respondents who were less strongly affiliated with their religion attended religious services less often. Survey respondents who prayed less often attended religious services less often.
1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

SW388R7 Data Analysis & Computers II Slide 8

Dissecting problem 1 - 1
When a problem states that there is a relationship between some independent variables and a dependent variable, we do standard multiple regression.

The variables listed first in the problem statement are the independent variables (ivs): "strength of affiliation" [reliten] and "frequency of prayer" [pray].

The variable they are related to is the dependent variable (dv): "frequency of attendance at religious services" [attend].

SW388R7 Data Analysis & Computers II Slide 9

Dissecting problem 1 - 2
In order for a problem to be true, we will have to find a statistically significant relationship between the ivs and the dv, and a relationship of the correct strength.

The relationship of each of the independent variables to the dependent variable must be statistically significant and interpreted correctly.

SW388R7 Data Analysis & Computers II Slide 10

Request a standard multiple regression

To compute a multiple regression in SPSS, select the Regression | Linear command from the Analyze menu.

SW388R7 Data Analysis & Computers II Slide 11

Specify the variables and selection method


First, move the dependent variable attend to the Dependent text box.

Second, move the independent variables reliten and pray to the Independent(s) list box.

Third, select the method for entering the variables into the analysis from the drop down Method menu. In this example, we accept the default of Enter for direct entry of all variables, which produces a standard multiple regression.

Fourth, click on the Statistics button to specify the statistics options that we want.

SW388R7 Data Analysis & Computers II Slide 12

Specify the statistics output options

First, mark the checkboxes for Estimates on the Regression Coefficients panel.

Second, mark the checkboxes for Model Fit and Descriptives.

Third, click on the Continue button to close the dialog box.

SW388R7 Data Analysis & Computers II Slide 13

Request the regression output

Click on the OK button to request the regression output.

SW388R7 Data Analysis & Computers II Slide 14

LEVEL OF MEASUREMENT
Multiple regression requires that the dependent variable be metric and the independent variables be metric or dichotomous. "Frequency of attendance at religious services" [attend] is an ordinal level variable, which satisfies the level of measurement requirement if we follow the convention of treating ordinal level variables as metric variables. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation. "Strength of affiliation" [reliten] and "frequency of prayer" [pray] are ordinal level variables. If we follow the convention of treating ordinal level variables as metric variables, the level of measurement requirement for multiple regression analysis is satisfied. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.

SW388R7 Data Analysis & Computers II Slide 15

SAMPLE SIZE
Descriptive Statistics

Variable                                  Mean   Std. Deviation   N
HOW OFTEN R ATTENDS RELIGIOUS SERVICES    3.15   2.653            113
STRENGTH OF AFFILIATION                   2.12   1.084            113
HOW OFTEN DOES R PRAY                     2.90   1.575            113

The minimum ratio of valid cases to independent variables for multiple regression is 5 to 1. With 113 valid cases and 2 independent variables, the ratio for this analysis is 56.5 to 1, which satisfies the minimum requirement. In addition, the ratio of 56.5 to 1 satisfies the preferred ratio of 15 to 1.
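The ratio check above is simple arithmetic, and a short sketch makes it reusable for the later problems. The function name and its return shape are hypothetical; the 5 to 1 and 15 to 1 thresholds are the ones stated on this slide.

```python
def check_sample_size(valid_cases, num_predictors, minimum=5.0, preferred=15.0):
    """Return the cases-to-predictors ratio and whether it meets each threshold."""
    ratio = valid_cases / num_predictors
    return ratio, ratio >= minimum, ratio >= preferred


# Problem 1: 113 valid cases and 2 independent variables.
ratio, meets_minimum, meets_preferred = check_sample_size(113, 2)
print(f"{ratio} to 1, minimum met: {meets_minimum}, preferred met: {meets_preferred}")
```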

SW388R7 Data Analysis & Computers II Slide 16

OVERALL RELATIONSHIP BETWEEN INDEPENDENT AND DEPENDENT VARIABLES - 1

The probability of the F statistic (49.824) for the overall regression relationship is <0.001, less than or equal to the level of significance of 0.05. We reject the null hypothesis that there is no relationship between the set of independent variables and the dependent variable (R² = 0). We support the research hypothesis that there is a statistically significant relationship between the set of independent variables and the dependent variable.

ANOVA (b)

Model 1       Sum of Squares   df    Mean Square   F        Sig.
Regression    374.757          2     187.379       49.824   .000 (a)
Residual      413.685          110   3.761
Total         788.442          112

a. Predictors: (Constant), HOW OFTEN DOES R PRAY, STRENGTH OF AFFILIATION
b. Dependent Variable: HOW OFTEN R ATTENDS RELIGIOUS SERVICES
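As a check on how the entries in the ANOVA table relate to each other, the F statistic can be recomputed from the sums of squares and degrees of freedom. This is a minimal sketch with a hypothetical function name; the inputs are the rounded values printed in the table, so the result agrees with SPSS only up to rounding.

```python
def f_statistic(ss_regression, df_regression, ss_residual, df_residual):
    """F is the regression mean square divided by the residual mean square."""
    mean_square_regression = ss_regression / df_regression
    mean_square_residual = ss_residual / df_residual
    return mean_square_regression / mean_square_residual


# Values from the ANOVA table for problem 1.
f_value = f_statistic(374.757, 2, 413.685, 110)
print(round(f_value, 3))
```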

SW388R7 Data Analysis & Computers II Slide 17

OVERALL RELATIONSHIP BETWEEN INDEPENDENT AND DEPENDENT VARIABLES - 2

The Multiple R for the relationship between the set of independent variables and the dependent variable is 0.689, which would be characterized as strong using the rule of thumb that a correlation less than or equal to 0.20 is characterized as very weak; greater than 0.20 and less than or equal to 0.40 is weak; greater than 0.40 and less than or equal to 0.60 is moderate; greater than 0.60 and less than or equal to 0.80 is strong; and greater than 0.80 is very strong.

Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .689 (a)   .475       .466                1.939

a. Predictors: (Constant), HOW OFTEN DOES R PRAY, STRENGTH OF AFFILIATION
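The Model Summary values can be reproduced from the ANOVA sums of squares, and the rule of thumb for strength is easy to express as a lookup. The sketch below uses hypothetical function names; the adjusted R² formula is the standard one, and the strength bands are the ones quoted on this slide.

```python
import math


def model_summary(ss_regression, ss_total, n_cases, n_predictors):
    """Return (Multiple R, R², adjusted R²) from the ANOVA sums of squares."""
    r2 = ss_regression / ss_total
    multiple_r = math.sqrt(r2)
    adjusted_r2 = 1.0 - (1.0 - r2) * (n_cases - 1) / (n_cases - n_predictors - 1)
    return multiple_r, r2, adjusted_r2


def describe_strength(r):
    """Apply the slide's rule of thumb to a correlation."""
    r = abs(r)
    if r <= 0.20:
        return "very weak"
    if r <= 0.40:
        return "weak"
    if r <= 0.60:
        return "moderate"
    if r <= 0.80:
        return "strong"
    return "very strong"


# Problem 1: regression SS 374.757, total SS 788.442, 113 cases, 2 predictors.
multiple_r, r2, adjusted_r2 = model_summary(374.757, 788.442, 113, 2)
print(round(multiple_r, 3), round(r2, 3), round(adjusted_r2, 3), describe_strength(multiple_r))
```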

SW388R7 Data Analysis & Computers II Slide 18

RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 1


For the independent variable strength of affiliation, the probability of the t statistic (-5.857) for the b coefficient is <0.001 which is less than or equal to the level of significance of 0.05. We reject the null hypothesis that the slope associated with strength of affiliation is equal to zero (b = 0) and conclude that there is a statistically significant relationship between strength of affiliation and frequency of attendance at religious services.

Coefficients (a)

                            Unstandardized          Standardized
Model 1                     B        Std. Error     Beta           t        Sig.
(Constant)                  7.167    .442                          16.206   .000
STRENGTH OF AFFILIATION     -1.138   .194           -.465          -5.857   .000
HOW OFTEN DOES R PRAY       -.554    .134           -.329          -4.145   .000

a. Dependent Variable: HOW OFTEN R ATTENDS RELIGIOUS SERVICES
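The t statistic for a coefficient is the coefficient divided by its standard error. The sketch below recomputes t from the B and Std. Error columns; because those columns are rounded to three decimals, the recomputed values match the printed t column only approximately. The dictionary and function names are hypothetical.

```python
def t_statistic(b, std_error):
    """t for the null hypothesis that the slope is zero."""
    return b / std_error


# (B, Std. Error) pairs as printed in the Coefficients table.
coefficients = {
    "(Constant)": (7.167, 0.442),
    "STRENGTH OF AFFILIATION": (-1.138, 0.194),
    "HOW OFTEN DOES R PRAY": (-0.554, 0.134),
}

for name, (b, se) in coefficients.items():
    print(f"{name}: t is roughly {t_statistic(b, se):.2f}")
```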

SW388R7 Data Analysis & Computers II Slide 19

RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 2



The b coefficient associated with strength of affiliation (-1.138) is negative, indicating an inverse relationship in which higher numeric values for strength of affiliation are associated with lower numeric values for frequency of attendance at religious services. Since both variables are ordinal level, we will have to look at the coding for each before we can make a correct interpretation. For ordinal level variables the numeric codes can be associated with labels in ascending or descending order.

SW388R7 Data Analysis & Computers II Slide 20

RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 3


The independent variable strength of affiliation is an ordinal variable that is coded so that higher numeric values are associated with survey respondents who were less strongly affiliated with their religion.

SW388R7 Data Analysis & Computers II Slide 21

RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 4

The dependent variable frequency of attendance at religious services is also an ordinal variable. It is coded so that lower numeric values are associated with survey respondents who attended religious services less often.

Therefore, the negative value of b implies that survey respondents who were less strongly affiliated with their religion attended religious services less often.

SW388R7 Data Analysis & Computers II Slide 22

RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 5


For the independent variable frequency of prayer, the probability of the t statistic (-4.145) for the b coefficient is <0.001 which is less than or equal to the level of significance of 0.05. We reject the null hypothesis that the slope associated with frequency of prayer is equal to zero (b = 0) and conclude that there is a statistically significant relationship between frequency of prayer and frequency of attendance at religious services.


SW388R7 Data Analysis & Computers II Slide 23

RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 6



The b coefficient associated with frequency of prayer (-0.554) is negative, indicating an inverse relationship in which higher numeric values for frequency of prayer are associated with lower numeric values for frequency of attendance at religious services. Since both variables are ordinal level, we will have to look at the coding for each before we can make a correct interpretation. For ordinal level variables the numeric codes can be associated with labels in ascending or descending order.

SW388R7 Data Analysis & Computers II Slide 24

RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 7

The independent variable frequency of prayer is an ordinal variable that is coded so that higher numeric values are associated with survey respondents who prayed less often.

SW388R7 Data Analysis & Computers II Slide 25

RELATIONSHIP OF INDIVIDUAL INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 8

The dependent variable frequency of attendance at religious services is also an ordinal variable. It is coded so that lower numeric values are associated with survey respondents who attended religious services less often.

Therefore, the negative value of b implies that survey respondents who prayed less often attended religious services less often.

SW388R7 Data Analysis & Computers II Slide 26

Answer to problem 1
The independent and dependent variables were metric (ordinal). The ratio of cases to independent variables was 56.5 to 1. The overall relationship was statistically significant and its strength was characterized correctly. The b coefficients for all variables were statistically significant and the directions of the relationships were characterized correctly. The answer to the question is true with caution. The caution is added because of the ordinal variables.

SW388R7 Data Analysis & Computers II Slide 27

Problem 2 - hierarchical regression


In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Assume that there is no problem with missing data, violation of assumptions, or outliers, and that the split sample validation will confirm the generalizability of the results. Use a level of significance of 0.05.

After controlling for the effects of the variables "age" [age] and "sex" [sex], the addition of the variables "happiness of marriage" [hapmar], "condition of health" [health], and "attitude toward life" [life] reduces the error in predicting "general happiness" [happy] by 36.1%.

After controlling for age and sex, the variables happiness of marriage, condition of health, and attitude toward life each make an individual contribution to reducing the error in predicting general happiness. Survey respondents who were less happy with their marriages were less happy overall. Survey respondents who said they were not as healthy were less happy overall. Survey respondents who felt life was less exciting were less happy overall.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

SW388R7 Data Analysis & Computers II Slide 28

Dissecting problem 2 - 1
The variables listed first in the problem statement are the independent variables (ivs) whose effect we want to control before we test for the relationship: "age" [age] and "sex" [sex].

The variables that we add in after the control variables are the independent variables that we think will have a statistical relationship to the dependent variable: "happiness of marriage" [hapmar], "condition of health" [health], and "attitude toward life" [life].

The variable that is to be predicted or related to is the dependent variable (dv): "general happiness" [happy].

SW388R7 Data Analysis & Computers II Slide 29

Dissecting problem 2 - 2
In order for a problem to be true, the relationship between the added variables and the dependent variable must be statistically significant, and the strength of the relationship after including the control variables must be correctly stated.

We are generally not interested in whether or not the control variables have a statistically significant relationship to the dependent variable.

The relationship between each of the independent variables entered after the control variables and the dependent variable must be statistically significant and interpreted correctly.

SW388R7 Data Analysis & Computers II Slide 30

Request a hierarchical multiple regression

To compute a multiple regression in SPSS, select the Regression | Linear command from the Analyze menu.

SW388R7 Data Analysis & Computers II Slide 31

Specify independent variables to control for


First, move the dependent variable happy to the Dependent text box.

Second, move the independent variables to control for, age and sex, to the Independent(s) list box.

Third, select the method for entering the variables into the analysis from the drop down Method menu. In this example, we accept the default of Enter for direct entry of all variables in the first block, which will force the controls into the regression.

Fourth, click on the Next button to tell SPSS to add another block of variables to the regression analysis.

SW388R7 Data Analysis & Computers II Slide 32

Add the other independent variables


SPSS identifies that we will now be adding variables to a second block.

First, move the other independent variables hapmar, health and life to the Independent(s) list box for block 2.

Second, click on the Statistics button to specify the statistics options that we want.

SW388R7 Data Analysis & Computers II Slide 33

Specify the statistics output options

First, mark the checkboxes for Estimates on the Regression Coefficients panel.

Second, mark the checkboxes for Model Fit, Descriptives, and R squared change. The R squared change statistic will tell us whether or not the variables added after the controls have a relationship to the dependent variable.

Third, click on the Continue button to close the dialog box.

SW388R7 Data Analysis & Computers II Slide 34

Request the regression output

Click on the OK button to request the regression output.

SW388R7 Data Analysis & Computers II Slide 35

LEVEL OF MEASUREMENT
Multiple regression requires that the dependent variable be metric and the independent variables be metric or dichotomous. "General happiness" [happy] is an ordinal level variable, which satisfies the level of measurement requirement if we follow the convention of treating ordinal level variables as metric variables. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.

"Age" [age] is an interval level variable, which satisfies the level of measurement requirements for multiple regression analysis.
"Happiness of marriage" [hapmar], "condition of health" [health], and "attitude toward life" [life] are ordinal level variables. If we follow the convention of treating ordinal level variables as metric variables, the level of measurement requirement for multiple regression analysis is satisfied. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation. "Sex" [sex] is a dichotomous or dummy-coded nominal variable which may be included in multiple regression analysis.

SW388R7 Data Analysis & Computers II Slide 36

SAMPLE SIZE
Descriptive Statistics

Variable                    Mean    Std. Deviation   N
GENERAL HAPPINESS           1.63    .626             90
AGE OF RESPONDENT           45.50   15.221           90
RESPONDENTS SEX             1.61    .490             90
HAPPINESS OF MARRIAGE       1.42    .540             90
CONDITION OF HEALTH         1.80    .810             90
IS LIFE EXCITING OR DULL    1.49    .525             90

The minimum ratio of valid cases to independent variables for multiple regression is 5 to 1. With 90 valid cases and 5 independent variables, the ratio for this analysis is 18.0 to 1, which satisfies the minimum requirement. In addition, the ratio of 18.0 to 1 satisfies the preferred ratio of 15 to 1.

SW388R7 Data Analysis & Computers II Slide 37

OVERALL RELATIONSHIP BETWEEN INDEPENDENT AND DEPENDENT VARIABLES


ANOVA (c)

Model            Sum of Squares   df   Mean Square   F       Sig.
1   Regression   .006             2    .003          .007    .993 (a)
    Residual     34.894           87   .401
    Total        34.900           89
2   Regression   12.601           5    2.520         9.493   .000 (b)
    Residual     22.299           84   .265
    Total        34.900           89

a. Predictors: (Constant), RESPONDENTS SEX, AGE OF RESPONDENT
b. Predictors: (Constant), RESPONDENTS SEX, AGE OF RESPONDENT, IS LIFE EXCITING OR DULL, HAPPINESS OF MARRIAGE, CONDITION OF HEALTH
c. Dependent Variable: GENERAL HAPPINESS

The probability of the F statistic (9.493) for the overall regression relationship for all independent variables is <0.001, less than or equal to the level of significance of 0.05. We reject the null hypothesis that there is no relationship between the set of all independent variables and the dependent variable (R² = 0). We support the research hypothesis that there is a statistically significant relationship between the set of all independent variables and the dependent variable.

SW388R7 Data Analysis & Computers II Slide 38

REDUCTION IN ERROR IN PREDICTING DEPENDENT VARIABLE - 1


Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .013 (a)   .000       -.023               .633
2       .601 (b)   .361       .323                .515

Change Statistics

Model   R Square Change   F Change   df1   df2   Sig. F Change
1       .000              .007       2     87    .993
2       .361              15.814     3     84    .000

a. Predictors: (Constant), RESPONDENTS SEX, AGE OF RESPONDENT
b. Predictors: (Constant), RESPONDENTS SEX, AGE OF RESPONDENT, IS LIFE EXCITING OR DULL, HAPPINESS OF MARRIAGE, CONDITION OF HEALTH

The R Square Change statistic for the increase in R² associated with the added variables (happiness of marriage, condition of health, and attitude toward life) is 0.361. Using a proportional reduction in error interpretation for R², information provided by the added variables reduces our error in predicting general happiness by 36.1%.

SW388R7 Data Analysis & Computers II Slide 39

REDUCTION IN ERROR IN PREDICTING DEPENDENT VARIABLE - 2



The probability of the F statistic (15.814) for the change in R² associated with the addition of the predictor variables to the regression analysis containing the control variables is <0.001, less than or equal to the level of significance of 0.05. We reject the null hypothesis that there is no improvement in the relationship between the set of independent variables and the dependent variable when the predictors are added (R² Change = 0). We support the research hypothesis that there is a statistically significant improvement in the relationship between the set of independent variables and the dependent variable.
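The F Change statistic can be approximated from the two R² values in the Model Summary. The sketch below uses the standard formula for comparing nested regression models; the function name is hypothetical, and because the R² inputs are rounded to three decimals the result is close to, but not exactly, the 15.814 that SPSS reports.

```python
def f_change(r2_reduced, r2_full, n_cases, k_full, k_added):
    """F test for the increase in R² when k_added predictors join the model.

    k_full is the number of predictors in the larger model.
    """
    df1 = k_added
    df2 = n_cases - k_full - 1
    f_value = ((r2_full - r2_reduced) / df1) / ((1.0 - r2_full) / df2)
    return f_value, df1, df2


# Problem 2: R² rises from .000 (controls only) to .361 (all five predictors).
f_value, df1, df2 = f_change(0.000, 0.361, 90, 5, 3)
print(f"F change is roughly {f_value:.1f} on ({df1}, {df2}) degrees of freedom")
```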

SW388R7 Data Analysis & Computers II Slide 40

RELATIONSHIP OF ADDED INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 1


Coefficients (a)

                                 Unstandardized         Standardized
Model                            B       Std. Error     Beta           t       Sig.
1   (Constant)                   1.594   .341                          4.677   .000
    AGE OF RESPONDENT            .000    .005           .012           .107    .915
    RESPONDENTS SEX              .011    .140           .008           .078    .938
2   (Constant)                   .432    .341                          1.268   .208
    AGE OF RESPONDENT            -.001   .004           -.035          -.385   .701
    RESPONDENTS SEX              -.013   .115           -.010          -.113   .911
    HAPPINESS OF MARRIAGE        .599    .104           .517           5.741   .000
    CONDITION OF HEALTH          .101    .072           .131           1.408   .163
    IS LIFE EXCITING OR DULL     .170    .108           .142           1.570   .120

a. Dependent Variable: GENERAL HAPPINESS

If there is a relationship between each added individual independent variable and the dependent variable, the probability of the statistical test of the b coefficient (slope of the regression line) will be less than or equal to the level of significance. The null hypothesis for this test states that b is equal to zero, indicating a flat regression line and no relationship.

If we reject the null hypothesis and find that there is a relationship between the variables, the sign of the b coefficient indicates the direction of the relationship for the data values. If b is greater than zero, the relationship is positive or direct. If b is less than zero, the relationship is negative or inverse. If the variable is dichotomous or ordinal, the direction of the coding must be taken into account to make a correct interpretation.

SW388R7 Data Analysis & Computers II Slide 41

RELATIONSHIP OF ADDED INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 2



For the independent variable happiness of marriage, the probability of the t statistic (5.741) for the b coefficient is <0.001 which is less than or equal to the level of significance of 0.05.

We reject the null hypothesis that the slope associated with happiness of marriage is equal to zero (b = 0) and conclude that there is a statistically significant relationship between happiness of marriage and general happiness.

SW388R7 Data Analysis & Computers II Slide 42

RELATIONSHIP OF ADDED INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 3


The b coefficient associated with happiness of marriage (0.599) is positive, indicating a direct relationship in which higher numeric values for happiness of marriage are associated with higher numeric values for general happiness.

SW388R7 Data Analysis & Computers II Slide 43

RELATIONSHIP OF ADDED INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 4


The independent variable happiness of marriage is an ordinal variable that is coded so that higher numeric values are associated with survey respondents who were less happy with their marriages.

SW388R7 Data Analysis & Computers II Slide 44

RELATIONSHIP OF ADDED INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 5

The dependent variable general happiness is also an ordinal variable. It is coded so that higher numeric values are associated with survey respondents who were less happy overall.

Therefore, the positive value of b implies that survey respondents who were less happy with their marriages were less happy overall.

SW388R7 Data Analysis & Computers II Slide 45

RELATIONSHIP OF ADDED INDEPENDENT VARIABLES TO DEPENDENT VARIABLE - 6


Coefficients(a)

                                 Unstandardized        Standardized
                                 Coefficients          Coefficients
Model                            B        Std. Error   Beta          t        Sig.
1   (Constant)                   1.594    .341                       4.677    .000
    AGE OF RESPONDENT             .000    .005          .012          .107    .915
    RESPONDENTS SEX               .011    .140          .008          .078    .938
2   (Constant)                    .432    .341                       1.268    .208
    AGE OF RESPONDENT            -.001    .004         -.035         -.385    .701
    RESPONDENTS SEX              -.013    .115         -.010         -.113    .911
    HAPPINESS OF MARRIAGE         .599    .101          .517         5.741    .000
    CONDITION OF HEALTH           .170    .104          .131         1.408    .163
    IS LIFE EXCITING OR DULL      .072    .108          .142         1.570    .120

a. Dependent Variable: GENERAL HAPPINESS

For the independent variable condition of health, the probability of the t statistic (1.408) for the b coefficient is 0.163 which is greater than the level of significance of 0.05. We fail to reject the null hypothesis that the slope associated with condition of health is equal to zero (b = 0) and conclude that there is not a statistically significant relationship between condition of health and general happiness. The statement in the problem that "survey respondents who said they were not as healthy were less happy overall" is incorrect.
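The decision rule applied to each coefficient can be sketched in a few lines of Python. This is only an illustration: the t and Sig. values are copied from the Model 2 rows of the Coefficients table, and the names `ALPHA`, `model_2_rows`, and `is_significant` are hypothetical, not part of SPSS.

```python
# Level of significance stated in the problem.
ALPHA = 0.05

# (predictor, t statistic, Sig.) for Model 2, copied from the Coefficients table.
model_2_rows = [
    ("AGE OF RESPONDENT", -0.385, 0.701),
    ("RESPONDENTS SEX", -0.113, 0.911),
    ("HAPPINESS OF MARRIAGE", 5.741, 0.000),
    ("CONDITION OF HEALTH", 1.408, 0.163),
    ("IS LIFE EXCITING OR DULL", 1.570, 0.120),
]


def is_significant(sig, alpha=ALPHA):
    """Reject the null hypothesis b = 0 when the probability is <= alpha."""
    return sig <= alpha


significant = [name for name, _t, sig in model_2_rows if is_significant(sig)]
print(significant)  # ['HAPPINESS OF MARRIAGE']
```

Only happiness of marriage passes the test, which is why the claim about condition of health cannot be supported.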

SW388R7 Data Analysis & Computers II Slide 46

Answer to problem 2

The independent and dependent variables were metric or dichotomous. Some were ordinal. The ratio of cases to independent variables was 18.0 to 1. The overall relationship was statistically significant and its strength was characterized correctly. The change in R² associated with adding the second block of variables was statistically significant and correctly interpreted. The b coefficient for happiness of marriage was statistically significant and correctly interpreted. The b coefficient for condition of health was not statistically significant. We cannot conclude that there was a relationship between condition of health and general happiness. The answer to the question is false.

SW388R7 Data Analysis & Computers II Slide 47

Problem 3 Stepwise Regression


26. In the dataset GSS2000.sav, is the following statement true, false, or an incorrect application of a statistic? Assume that there is no problem with missing data, violation of assumptions, or outliers, and that the split sample validation will confirm the generalizability of the results. Use a level of significance of 0.05.

From the list of variables "number of hours worked in the past week" [hrs1], "occupational prestige score" [prestg80], "highest year of school completed" [educ], and "highest academic degree" [degree], the best predictors of "total family income" [income98] are "highest academic degree" [degree] and "occupational prestige score" [prestg80]. Highest academic degree and occupational prestige score have a moderate relationship to total family income. The most important predictor of total family income is occupational prestige score. The second most important predictor of total family income is highest academic degree. Survey respondents who had higher academic degrees had higher total family incomes. Survey respondents who had more prestigious occupations had higher total family incomes.

1. True
2. True with caution
3. False
4. Inappropriate application of a statistic

SW388R7 Data Analysis & Computers II Slide 48

Dissecting problem 3 - 1
The variables listed first in the problem statement are the independent variables from which the computer will select the best subset using statistical criteria.

The variable that is to be predicted or related to is the dependent variable (dv): "total family income" [income98].

The best predictors are the variables that meet the statistical criteria for inclusion in the model.

SW388R7 Data Analysis & Computers II Slide 49

Dissecting problem 3 - 2
In order for a problem to be true, we will have to find:
a statistically significant relationship between the included ivs and the dv
a relationship of the correct strength

The importance of the variables is provided by the stepwise order of entry of the variables into the regression analysis.

SW388R7 Data Analysis & Computers II Slide 50

Dissecting problem 3 - 3
The relationship between each of the included independent variables and the dependent variable must be statistically significant and interpreted correctly.

Since the statistical significance of a variable's contribution toward explaining the variance in the dependent variable is almost always used as the criterion for inclusion, the statistical significance of the relationships is usually assured.

SW388R7 Data Analysis & Computers II Slide 51

Request a stepwise multiple regression

To compute a multiple regression in SPSS, select the Regression | Linear command from the Analyze menu.

SW388R7 Data Analysis & Computers II Slide 52

Specify variables and method for selecting variables


First, move the dependent variable income98 to the Dependent text box.

Second, move the independent variables hrs1, prestg80, educ, and degree to the Independent(s) list box.

Third, select the Stepwise method for entering the variables into the analysis from the drop down Method menu.

SW388R7 Data Analysis & Computers II Slide 53

Open statistics options dialog box

First, click on the Statistics button to specify the statistics options that we want.

SW388R7 Data Analysis & Computers II Slide 54

Specify the statistics output options

First, mark the checkbox for Estimates on the Regression Coefficients panel.

Second, mark the checkboxes for Model Fit and Descriptives.

Third, click on the Continue button to close the dialog box.

SW388R7 Data Analysis & Computers II Slide 55

Request the regression output

Click on the OK button to request the regression output.

SW388R7 Data Analysis & Computers II Slide 56

LEVEL OF MEASUREMENT
Multiple regression requires that the dependent variable be metric and the independent variables be metric or dichotomous. "Total family income" [income98] is an ordinal level variable, which satisfies the level of measurement requirement if we follow the convention of treating ordinal level variables as metric variables. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.

"Number of hours worked in the past week" [hrs1], "occupational prestige score" [prestg80], and "highest year of school completed" [educ] are interval level variables, which satisfy the level of measurement requirements for multiple regression analysis.

"Highest academic degree" [degree] is an ordinal level variable. If we follow the convention of treating ordinal level variables as metric variables, the level of measurement requirement for multiple regression analysis is satisfied. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.

SW388R7 Data Analysis & Computers II Slide 57

SAMPLE SIZE
Descriptive Statistics

                                          Mean     Std. Deviation    N
TOTAL FAMILY INCOME                       17.06     4.130            151
NUMBER OF HOURS WORKED LAST WEEK          41.45    12.076            151
RS OCCUPATIONAL PRESTIGE SCORE (1980)     45.64    14.183            151
HIGHEST YEAR OF SCHOOL COMPLETED          14.00     2.587            151
RS HIGHEST DEGREE                          1.74     1.159            151

The minimum ratio of valid cases to independent variables for stepwise multiple regression is 5 to 1. With 151 valid cases and 4 independent variables, the ratio for this analysis is 37.75 to 1, which satisfies the minimum requirement. However, the ratio of 37.75 to 1 does not satisfy the preferred ratio of 50 to 1. A caution should be added to the interpretation of the analysis and a split sample validation should be conducted.
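The sample size check is simple arithmetic and can be sketched as follows. The thresholds of 5 to 1 and 50 to 1 are the ones stated for stepwise regression; the function name `sample_size_check` and its return labels are hypothetical, chosen only for this illustration.

```python
def sample_size_check(n_cases, n_predictors, minimum=5.0, preferred=50.0):
    """Return the cases-to-predictors ratio and a verdict for it."""
    ratio = n_cases / n_predictors
    if ratio < minimum:
        verdict = "below minimum"
    elif ratio < preferred:
        verdict = "minimum met, add caution"
    else:
        verdict = "preferred ratio met"
    return ratio, verdict


# 151 valid cases and 4 candidate independent variables, as in this problem.
ratio, verdict = sample_size_check(151, 4)
print(ratio, verdict)  # 37.75 minimum met, add caution
```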

SW388R7 Data Analysis & Computers II Slide 58

RELATIONSHIP BETWEEN BEST PREDICTORS AND THE DEPENDENT VARIABLE - 1


The best subset of predictors for total family income included the independent variables: highest academic degree and occupational prestige score.

Variables Entered/Removed(a)

Model   Variables Entered                        Variables Removed
1       RS HIGHEST DEGREE                        (none)
2       RS OCCUPATIONAL PRESTIGE SCORE (1980)    (none)

Method (both models): Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).

a. Dependent Variable: TOTAL FAMILY INCOME

SW388R7 Data Analysis & Computers II Slide 59

RELATIONSHIP BETWEEN BEST PREDICTORS AND THE DEPENDENT VARIABLE - 2


The probability of the F statistic (29.146) for the regression relationship which includes these variables is <0.001, less than or equal to the level of significance of 0.05. We reject the null hypothesis that there is no relationship between the best subset of independent variables and the dependent variable (R² = 0). We support the research hypothesis that there is a statistically significant relationship between the best subset of independent variables and the dependent variable.

ANOVA(c)

Model              Sum of Squares    df     Mean Square    F         Sig.
1   Regression       620.049           1    620.049        47.661    .000(a)
    Residual        1938.415         149     13.009
    Total           2558.464         150
2   Regression       722.947           2    361.473        29.146    .000(b)
    Residual        1835.517         148     12.402
    Total           2558.464         150

a. Predictors: (Constant), RS HIGHEST DEGREE
b. Predictors: (Constant), RS HIGHEST DEGREE, RS OCCUPATIONAL PRESTIGE SCORE (1980)
c. Dependent Variable: TOTAL FAMILY INCOME
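As a check on how the ANOVA table fits together, the F statistic can be recomputed from the sums of squares and degrees of freedom: F is the regression mean square divided by the residual mean square. The sketch below uses the values reported in the table; the function name `f_statistic` is hypothetical.

```python
def f_statistic(ss_regression, df_regression, ss_residual, df_residual):
    """F = (SS_regression / df_regression) / (SS_residual / df_residual)."""
    mean_square_regression = ss_regression / df_regression
    mean_square_residual = ss_residual / df_residual
    return mean_square_regression / mean_square_residual


# Model 1: RS HIGHEST DEGREE only.
f_model_1 = f_statistic(620.049, 1, 1938.415, 149)
# Model 2: RS HIGHEST DEGREE and RS OCCUPATIONAL PRESTIGE SCORE (1980).
f_model_2 = f_statistic(722.947, 2, 1835.517, 148)

print(round(f_model_1, 3), round(f_model_2, 3))
```

Both values agree with the F column of the table (47.661 and 29.146) to three decimal places.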

SW388R7 Data Analysis & Computers II Slide 60

RELATIONSHIP BETWEEN BEST PREDICTORS AND THE DEPENDENT VARIABLE - 3


Model Summary

Model    R          R Square    Adjusted R Square    Std. Error of the Estimate
1        .492(a)    .242        .237                 3.607
2        .532(b)    .283        .273                 3.522

a. Predictors: (Constant), RS HIGHEST DEGREE
b. Predictors: (Constant), RS HIGHEST DEGREE, RS OCCUPATIONAL PRESTIGE SCORE (1980)

The Multiple R for the relationship between the subset of independent variables that best predict the dependent variable is 0.532, which would be characterized as moderate using the rule of thumb that a correlation less than or equal to 0.20 is characterized as very weak; greater than 0.20 and less than or equal to 0.40 is weak; greater than 0.40 and less than or equal to 0.60 is moderate; greater than 0.60 and less than or equal to 0.80 is strong; and greater than 0.80 is very strong.
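The rule of thumb can be written as a small lookup, shown here as a sketch together with the standard adjusted R² formula, 1 - (1 - R²)(n - 1)/(n - k - 1), applied to the rounded values in the Model Summary table. The function names are hypothetical; n = 151 is the number of valid cases and k is the number of predictors in each model.

```python
def describe_strength(r):
    """Label a correlation using the rule of thumb quoted above."""
    r = abs(r)
    if r <= 0.20:
        return "very weak"
    if r <= 0.40:
        return "weak"
    if r <= 0.60:
        return "moderate"
    if r <= 0.80:
        return "strong"
    return "very strong"


def adjusted_r_squared(r_squared, n_cases, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r_squared) * (n_cases - 1) / (n_cases - n_predictors - 1)


print(describe_strength(0.532))                    # moderate
print(round(adjusted_r_squared(0.242, 151, 1), 3))  # 0.237
print(round(adjusted_r_squared(0.283, 151, 2), 3))  # 0.273
```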

SW388R7 Data Analysis & Computers II Slide 61

RELATIONSHIP BETWEEN BEST PREDICTORS AND THE DEPENDENT VARIABLE - 4


Based on the table of "Variables Entered/Removed," the most important predictor of total family income is highest academic degree. The second most important predictor of total family income is occupational prestige score. The importance of the predictors stated in the problem is not correct.

Variables Entered/Removed(a)

Model   Variables Entered                        Variables Removed
1       RS HIGHEST DEGREE                        (none)
2       RS OCCUPATIONAL PRESTIGE SCORE (1980)    (none)

Method (both models): Stepwise (Criteria: Probability-of-F-to-enter <= .050, Probability-of-F-to-remove >= .100).

a. Dependent Variable: TOTAL FAMILY INCOME

SW388R7 Data Analysis & Computers II Slide 62

Answer to problem 3
The independent and dependent variables were metric, interval or ordinal. The ratio of cases to independent variables was 37.75 to 1. The relationship of the included variables was statistically significant and the strength of the relationship was characterized correctly. However, the order of entry, or importance, was not stated correctly in the problem. The answer to the question is false.

SW388R7 Data Analysis & Computers II Slide 63

Standard multiple regression - 1


The following is a guide to the decision process for answering problems about standard multiple regression analysis:
Dependent variable metric? Independent variables metric or dichotomous?

No

Inappropriate application of a statistic

Yes

Ratio of cases to independent variables at least 5 to 1?

No

Inappropriate application of a statistic

Yes

Probability of ANOVA test of regression less than/equal to level of significance?

No

False

Yes

SW388R7 Data Analysis & Computers II Slide 64

Standard multiple regression - 2

Strength of relationship for included variables interpreted correctly?

No

False

Yes

Probability of relationship between each IV and DV <= level of significance?

No

False

Yes

Direction of relationship between each IV and DV interpreted correctly?

No

False

Yes

SW388R7 Data Analysis & Computers II Slide 65

Standard multiple regression - 3

Any independent variable or dependent variable ordinal level of measurement?

Yes

True with caution

No

Ratio of cases to independent variables at preferred sample size of at least 15 to 1?

No

True with caution

Yes

True
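The decision process for standard multiple regression can be sketched as a chain of checks that returns at the first failure. This is an illustrative reading of the flowchart, not SPSS functionality; the function name, parameter names, and the example inputs are all hypothetical.

```python
INAPPROPRIATE = "Inappropriate application of a statistic"


def standard_regression_answer(levels_ok, n_cases, n_ivs, anova_sig,
                               strength_correct, iv_sigs, directions_correct,
                               any_ordinal, alpha=0.05):
    """Walk the standard multiple regression flowchart from top to bottom."""
    ratio = n_cases / n_ivs
    if not levels_ok or ratio < 5:
        return INAPPROPRIATE
    if anova_sig > alpha or not strength_correct:
        return "False"
    if any(sig > alpha for sig in iv_sigs) or not directions_correct:
        return "False"
    if any_ordinal or ratio < 15:
        return "True with caution"
    return "True"


# Hypothetical inputs: 90 cases, 3 predictors, everything significant,
# but one variable is ordinal, so the answer carries a caution.
example = standard_regression_answer(
    levels_ok=True, n_cases=90, n_ivs=3, anova_sig=0.001,
    strength_correct=True, iv_sigs=[0.001, 0.020, 0.049],
    directions_correct=True, any_ordinal=True,
)
print(example)  # True with caution
```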

SW388R7 Data Analysis & Computers II Slide 66

Hierarchical regression - 1
The following is a guide to the decision process for answering problems about hierarchical regression analysis:
Dependent variable metric? Independent variables metric or dichotomous?

No

Inappropriate application of a statistic

Yes

Ratio of cases to independent variables at least 5 to 1?

No

Inappropriate application of a statistic

Yes
Probability of ANOVA test of regression less than/equal to level of significance?

No

False

Yes

SW388R7 Data Analysis & Computers II Slide 67

Hierarchical regression - 2

Probability of F test for change in R² less than or equal to level of significance?

No

False

Yes

Change in R² correctly reported and interpreted?

No

False

Yes
Probability of relationship between each IV added after controls and DV less than or equal to level of significance?

No

False

Yes

SW388R7 Data Analysis & Computers II Slide 68

Hierarchical regression - 3

Direction of relationship between each IV added after controls and DV interpreted correctly?

No

False

Yes

Any independent variable or dependent variable ordinal level of measurement?

Yes

True with caution

No

Ratio of cases to independent variables at preferred sample size of at least 15 to 1?

No

True with caution

Yes

True

SW388R7 Data Analysis & Computers II Slide 69

Stepwise regression - 1
The following is a guide to the decision process for answering problems about stepwise regression analysis:
Dependent variable metric? Independent variables metric or dichotomous?

No

Inappropriate application of a statistic

Yes

Ratio of cases to independent variables at least 5 to 1?

No

Inappropriate application of a statistic

Yes

Is the list of independent variables selected for inclusion correct?

No

False

Yes

SW388R7 Data Analysis & Computers II Slide 70

Stepwise regression - 2

Probability of ANOVA test of regression less than/equal to level of significance?

No

False

Yes

Strength of relationship for included variables interpreted correctly?

No

False

Yes

Is the stated order of importance of the independent variables correct?

No

False

Yes

SW388R7 Data Analysis & Computers II Slide 71

Stepwise regression - 3

Probability of relationship between each included IV and DV less than or equal to level of significance?

No

False

Yes

Direction of relationship between each included IV and DV interpreted correctly?

No

False

Yes

SW388R7 Data Analysis & Computers II Slide 72

Stepwise regression - 4


Any independent variable or dependent variable ordinal level of measurement?

Yes

True with caution

No

Ratio of cases to independent variables at preferred sample size of at least 50 to 1?

No

True with caution

Yes

True
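The stepwise decision process adds two checks to the standard one: whether the stated list of predictors matches the variables the procedure selected, and whether the stated order of importance matches the order of entry. The sketch below applies that reading of the flowchart to problem 3, using the values reported in the output for that problem. The function and parameter names are hypothetical, and the coefficient probabilities are left as placeholders because the flowchart exits before they are needed.

```python
INAPPROPRIATE = "Inappropriate application of a statistic"


def stepwise_answer(levels_ok, n_cases, n_candidate_ivs, stated_predictors,
                    entry_order, anova_sig, strength_correct,
                    stated_importance, iv_sigs, directions_correct,
                    any_ordinal, alpha=0.05):
    """Walk the stepwise regression flowchart from top to bottom."""
    ratio = n_cases / n_candidate_ivs
    if not levels_ok or ratio < 5:
        return INAPPROPRIATE
    if set(stated_predictors) != set(entry_order):
        return "False"
    if anova_sig > alpha or not strength_correct:
        return "False"
    if list(stated_importance) != list(entry_order):
        return "False"
    if any(sig > alpha for sig in iv_sigs) or not directions_correct:
        return "False"
    if any_ordinal or ratio < 50:
        return "True with caution"
    return "True"


problem_3 = stepwise_answer(
    levels_ok=True,                  # ordinal treated as metric by convention
    n_cases=151, n_candidate_ivs=4,  # ratio of 37.75 to 1
    stated_predictors=["degree", "prestg80"],
    entry_order=["degree", "prestg80"],        # from Variables Entered/Removed
    anova_sig=0.000, strength_correct=True,    # Multiple R = 0.532, moderate
    stated_importance=["prestg80", "degree"],  # order claimed in the problem
    iv_sigs=[], directions_correct=True,       # placeholders, never reached
    any_ordinal=True,
)
print(problem_3)  # False
```

The answer is "False" because the order of importance stated in the problem reverses the order of entry.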
