
MULTIPLE LINEAR REGRESSION Selecting the best regression

In either multiple regression or polynomial regression, if a large number of predictor variables are available, we are usually faced with the problem of selecting a subset of these variables which will provide the best model with the minimum number of predictor variables.

Minitab has four methods available for selecting the “best” subset:

Considering all possible regressions
Forward selection
Backward elimination
Stepwise regression

These methods are applied to the cement composition data set below.

Woods, H. (1932) ‘Effects of composition of Portland cement on heat evolved during hardening’, Industrial and Engineering Chemistry, 24, 1207–14.

X1  percentage of tricalcium aluminate
X2  percentage of tricalcium silicate
X3  percentage of tetracalcium alumino ferrite
X4  percentage of dicalcium silicate
X5  heat evolved in calories per gram of cement

Row X1 X2 X3 X4 X5

1 7 26 6 60 78.5
2 1 29 15 52 74.3
3 11 56 8 20 104.3
4 11 31 8 47 87.6
5 7 52 6 33 95.9
6 11 55 9 22 109.2
7 3 71 17 6 102.7
8 1 31 22 44 72.5
9 2 54 18 22 93.1
10 21 47 4 26 115.9
11 1 40 23 34 83.8
12 11 66 9 12 113.3
13 10 68 8 12 109.4

ALL POSSIBLE REGRESSIONS Introduction

The selection of a particular subset of predictor variables must be based on some criterion for comparing subsets, a particular subset being preferred if it gives predictions that are in some sense better. A measure of the fit of a subset model can be defined by

Jp = (1/σ²) Σᵢ₌₁ⁿ mse(ŷi)

where mse(ŷi) is the mean square error for each fitted value. Better subsets have smaller values of Jp, and one of the simplest estimates of Jp is Mallows’ Cp, which for a model with p parameters can be expressed as

Cp = RSSp/σ̂² + 2p − n
   = (RSSp − RSSk′)/σ̂² + p − (k′ − p)
   = (k′ − p)(Fp − 1) + p

where σ̂² is estimated from the full model with k′ parameters and Fp is the F statistic for testing that the predictors left out of the subset model but included in the full model have zero coefficients.

Cp is a measure of the differences in fitting errors between the full model and a particular subset model. For the full model Ck′ = k′, and Mallows suggests that good models have Cp ≃ p.

Using the appropriate commands or menu options, Minitab will produce a list giving, for the various subsets of predictor variables, the values of R², S and Cp. The subset model is chosen on the basis of a value of Cp close to p, a large value of R² and a small value of S.
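
As an illustration, the all-possible-regressions search and the Cp formula above can be reproduced outside Minitab. The following is a sketch in Python using statsmodels; the DataFrame name cement and the printing code are ours, not part of the original session.

from itertools import combinations

import numpy as np
import pandas as pd
import statsmodels.api as sm

# The cement composition data tabulated above.
cement = pd.DataFrame({
    "X1": [7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10],
    "X2": [26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68],
    "X3": [6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8],
    "X4": [60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12],
    "X5": [78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7,
           72.5, 93.1, 115.9, 83.8, 113.3, 109.4],
})

y = cement["X5"]
predictors = ["X1", "X2", "X3", "X4"]
n = len(y)

# sigma-hat-squared is taken from the full model, as in the Cp formula.
full = sm.OLS(y, sm.add_constant(cement[predictors])).fit()
sigma2 = full.mse_resid

rows = []
for k in range(1, 5):
    for subset in combinations(predictors, k):
        fit = sm.OLS(y, sm.add_constant(cement[list(subset)])).fit()
        p = k + 1  # number of parameters, including the constant
        cp = fit.ssr / sigma2 + 2 * p - n
        rows.append((subset, 100 * fit.rsquared, np.sqrt(fit.mse_resid), cp))

# Good models have Cp close to p, large R-Sq and small S.
for subset, r2, s, cp in sorted(rows, key=lambda r: r[3]):
    print(f"{' '.join(subset):11s}  R-Sq={r2:5.1f}  S={s:6.4f}  Cp={cp:6.1f}")

This reproduces the R², S and Cp values in the Minitab output that follows.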

ALL POSSIBLE REGRESSIONS Minitab output

Variables      R² (%)

X4             67.5
X2             66.6
X1             53.4
X3             28.6

X1 X2          97.9
X1 X4          97.2
X3 X4          93.5
X2 X3          84.7
X2 X4          68.0
X1 X3          54.8

X1 X2 X4       98.234
X1 X2 X3       98.228
X1 X3 X4       98.1
X2 X3 X4       97.3

X1 X2 X3 X4    98.2

MTB > BReg 'X5' 'X1' 'X2' 'X3' 'X4';


SUBC> NVars 1 4;
SUBC> Best 2;
SUBC> Constant.

Best Subsets Regression: X5 versus X1, X2, X3, X4

Response is X5

                 Mallows                  X X X X
Vars   R-Sq  R-Sq(adj)   C-p        S     1 2 3 4
  1    67.5     64.5    138.7    8.9639         X
  1    66.6     63.6    142.5    9.0771     X
  2    97.9     97.4      2.7    2.4063   X X
  2    97.2     96.7      5.5    2.7343   X     X
  3    98.2     97.6      3.0    2.3087   X X   X
  3    98.2     97.6      3.0    2.3121   X X X
  4    98.2     97.4      5.0    2.4460   X X X X

FORWARD SELECTION Introduction

In simple linear regression, the relationship between Y and X1 is measured by the Pearson correlation coefficient rYX1. The relationship between Y and X1 adjusted for X2 is the partial correlation between Y and X1 adjusted for X2, denoted rYX1|X2. The partial correlation coefficient is calculated as the correlation coefficient between the residuals in the regression of Y on X2 and the residuals in the regression of X1 on X2.
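
For example, the partial correlation of X5 and X1 adjusted for X2 can be computed directly from these two sets of residuals. This is a sketch reusing the cement DataFrame defined in the Cp example above; the variable names are ours.

import numpy as np
import statsmodels.api as sm

# Residuals of X5 on X2, and of X1 on X2, for the cement data.
Z = sm.add_constant(cement["X2"])
res_y  = sm.OLS(cement["X5"], Z).fit().resid
res_x1 = sm.OLS(cement["X1"], Z).fit().resid

# Their Pearson correlation is the partial correlation r(X5, X1 | X2).
print(np.corrcoef(res_y, res_x1)[0, 1])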

The forward selection method begins by selecting as the first predictor the one which has the highest correlation with the response, provided that this predictor is significant. At each subsequent step, the remaining predictor with the highest partial correlation with the response is added to the model, as long as its sequential sum of squares in the regression is significant. The process stops when the sequential sum of squares of the best remaining predictor is no longer significant.

For a sequential sum of squares to be considered significant, its p-value must be less than a specified significance level, which in Minitab is the value of Alpha to enter in the Stepwise regression menu. The Minitab default value is 0.25.
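
A sketch of this procedure in Python follows, again reusing the cement DataFrame; the function forward_select is ours. Choosing the candidate with the smallest p-value for its sequential sum of squares is equivalent to choosing the one with the highest partial correlation in absolute value.

import statsmodels.api as sm

def forward_select(data, response, candidates, alpha=0.25):
    chosen = []
    candidates = list(candidates)
    while candidates:
        # p-value of each candidate's sequential SS when it is added last;
        # for a single term this t-test p-value equals the partial F p-value.
        pvals = {}
        for var in candidates:
            X = sm.add_constant(data[chosen + [var]])
            pvals[var] = sm.OLS(data[response], X).fit().pvalues[var]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:   # no remaining candidate is significant
            break
        chosen.append(best)
        candidates.remove(best)
    return chosen

# Reproduces the path in the output below: X4 enters, then X1, then X2.
print(forward_select(cement, "X5", ["X1", "X2", "X3", "X4"]))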

FORWARD SELECTION Minitab output

MTB > corr c1-c5

Correlations (Pearson)

X1 X2 X3 X4
X2 0.229
X3 -0.824 -0.139
X4 -0.245 -0.973 0.030
X5 0.731 0.816 -0.535 -0.821

MTB > regr c5 1 c4

Regression Analysis

The regression equation is
X5 = 118 - 0.738 X4

Predictor       Coef    StDev      T      P
Constant     117.568    5.262  22.34  0.000
X4           -0.7382   0.1546  -4.77  0.000

S = 8.964 R-Sq = 67.5% R-Sq(adj) = 64.5%

Analysis of Variance

Source DF SS MS F P
Regression 1 1831.9 1831.9 22.80 0.000
Error 11 883.9 80.4
Total 12 2715.8

FORWARD SELECTION Minitab output

Stepwise Regression: X5 versus X1, X2, X3, X4

Forward selection. Alpha-to-Enter: 0.25

Response is X5 on 4 predictors, with N = 13

Step             1        2        3
Constant    117.57   103.10    71.65

X4          -0.738   -0.614   -0.237
T-Value      -4.77   -12.62    -1.37
P-Value      0.001    0.000    0.205

X1                     1.44     1.45
T-Value               10.40    12.41
P-Value               0.000    0.000

X2                              0.42
T-Value                         2.24
P-Value                        0.052

S             8.96     2.73     2.31
R-Sq         67.45    97.25    98.23
R-Sq(adj)    64.50    96.70    97.64
Mallows C-p  138.7      5.5      3.0

BACKWARD ELIMINATION Introduction

In backward elimination, all predictors are placed in the model at the start and are then progressively removed. At any stage, the predictor removed is the one which, if it were fitted as the last predictor in the model, would have the smallest sequential sum of squares, provided that this sequential sum of squares is not significant. The process stops when every predictor in the model, fitted as the last predictor, has a significant sequential sum of squares.

A sequential sum of squares is considered to be not significant if its p-value is greater than the value specified in Minitab for Alpha to remove. The default value in Minitab for Alpha to remove is 0.1.
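
A matching sketch in Python (reusing the cement DataFrame; the function name is ours). Removing the predictor with the smallest sequential sum of squares when fitted last is equivalent to removing the one with the largest p-value.

import statsmodels.api as sm

def backward_eliminate(data, response, predictors, alpha=0.10):
    chosen = list(predictors)
    while chosen:
        fit = sm.OLS(data[response], sm.add_constant(data[chosen])).fit()
        # p-values of each predictor's sequential SS when fitted last.
        pvals = fit.pvalues.drop("const")
        worst = pvals.idxmax()
        if pvals[worst] <= alpha:   # every predictor is significant: stop
            break
        chosen.remove(worst)
    return chosen

# Reproduces the path in the output below: X3 is dropped, then X4.
print(backward_eliminate(cement, "X5", ["X1", "X2", "X3", "X4"]))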

BACKWARD ELIMINATION Minitab output

Stepwise Regression: X5 versus X1, X2, X3, X4

Backward elimination. Alpha-to-Remove: 0.1

Response is X5 on 4 predictors, with N = 13

Step             1        2        3
Constant     62.41    71.65    52.58

X1            1.55     1.45     1.47
T-Value       2.08    12.41    12.10
P-Value      0.071    0.000    0.000

X2           0.510    0.416    0.662
T-Value       0.70     2.24    14.44
P-Value      0.501    0.052    0.000

X3            0.10
T-Value       0.14
P-Value      0.896

X4           -0.14    -0.24
T-Value      -0.20    -1.37
P-Value      0.844    0.205

S             2.45     2.31     2.41
R-Sq         98.24    98.23    97.87
R-Sq(adj)    97.36    97.64    97.44
Mallows C-p    5.0      3.0      2.7

STEPWISE Introduction and Minitab output

Stepwise regression combines aspects of forward selection and backward elimination. It starts by adding variables in the same way as forward selection but, at each step at which the model contains two or more predictors, it allows one of them to be removed if its sequential sum of squares is not significant.
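
The following sketch combines the two routines above (Python, reusing the cement DataFrame; with equal Alpha values the scheme can in principle cycle, although it terminates on these data).

import statsmodels.api as sm

def stepwise(data, response, candidates, alpha_in=0.15, alpha_out=0.15):
    chosen, remaining = [], list(candidates)
    while True:
        # Forward step: enter the remaining candidate with the smallest
        # sequential-SS p-value, if that p-value is below alpha_in.
        entered = False
        pvals = {v: sm.OLS(data[response],
                           sm.add_constant(data[chosen + [v]])).fit().pvalues[v]
                 for v in remaining}
        if pvals:
            best = min(pvals, key=pvals.get)
            if pvals[best] < alpha_in:
                chosen.append(best)
                remaining.remove(best)
                entered = True
        # Backward step: remove predictors whose sequential SS, when they
        # are fitted last, is no longer significant.
        while len(chosen) > 1:
            fit = sm.OLS(data[response], sm.add_constant(data[chosen])).fit()
            p = fit.pvalues.drop("const")
            worst = p.idxmax()
            if p[worst] <= alpha_out:
                break
            chosen.remove(worst)
            remaining.append(worst)
        if not entered:
            return chosen

# Follows the output below: X4, X1 and X2 enter, then X4 is removed.
print(stepwise(cement, "X5", ["X1", "X2", "X3", "X4"]))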

Stepwise Regression: X5 versus X1, X2, X3, X4

Alpha-to-Enter: 0.15 Alpha-to-Remove: 0.15

Response is X5 on 4 predictors, with N = 13

Step             1        2        3        4
Constant    117.57   103.10    71.65    52.58

X4          -0.738   -0.614   -0.237
T-Value      -4.77   -12.62    -1.37
P-Value      0.001    0.000    0.205

X1                     1.44     1.45     1.47
T-Value               10.40    12.41    12.10
P-Value               0.000    0.000    0.000

X2                             0.416    0.662
T-Value                         2.24    14.44
P-Value                        0.052    0.000

S             8.96     2.73     2.31     2.41
R-Sq         67.45    97.25    98.23    97.87
R-Sq(adj)    64.50    96.70    97.64    97.44
Mallows C-p  138.7      5.5      3.0      2.7

