
MULTIPLE LINEAR REGRESSION Selecting the best regression

In either multiple regression or polynomial regression, if a large number of predictor variables are available, we are usually faced with the problem of selecting a subset of these variables which will provide the best model with the minimum number of predictor variables.

Minitab has four methods available for selecting the “best” subset:

Considering all possible regressions
Forward selection
Backward elimination
Stepwise regression

These methods are applied to the cement composition data set below.

Woods, H. (1932) ‘Effects of composition of Portland cement on heat evolved during hardening’, Industrial and Engineering Chemistry, 24, 1207–14.

X1  percentage of tricalcium aluminate
X2  percentage of tricalcium silicate
X3  percentage of tetracalcium alumino ferrite
X4  percentage of dicalcium silicate
X5  heat evolved in calories per gram of cement

Row X1 X2 X3 X4 X5

1 7 26 6 60 78.5
2 1 29 15 52 74.3
3 11 56 8 20 104.3
4 11 31 8 47 87.6
5 7 52 6 33 95.9
6 11 55 9 22 109.2
7 3 71 17 6 102.7
8 1 31 22 44 72.5
9 2 54 18 22 93.1
10 21 47 4 26 115.9
11 1 40 23 34 83.8
12 11 66 9 12 113.3
13 10 68 8 12 109.4

ALL POSSIBLE REGRESSIONS Introduction

The selection of a particular subset of predictor variables must be based on some criterion for comparing subsets, a particular subset being preferred if it gives predictions that are in some sense better. A measure of the fit of a subset model can be defined by

Jp = (1/σ²) Σᵢ₌₁ⁿ mse(ŷi)

where mse(ŷi) is the mean square error for each fitted value. Better subsets have smaller values of Jp, and one of the simplest estimates of Jp is Mallows’ Cp, which for a model with p parameters can be expressed as

Cp = RSSp/σ̂² + 2p − n
   = (RSSp − RSSk′)/σ̂² + p − (k′ − p)
   = (k′ − p)(Fp − 1) + p

where σ̂² is estimated from the full model with k′ parameters and Fp is the F statistic for testing that the predictors left out of the subset model but included in the full model have zero coefficients.

Cp is a measure of the differences in fitting errors between the full model and a particular subset model. For the full model Ck′ = k′, and Mallows suggests that good models have Cp ≃ p.

Using the appropriate commands or menu options, Minitab will produce a list giving, for the various subsets of predictor variables, the values of R², S and Cp. The subset model is chosen on the basis of a value of Cp close to p, a large value of R² and a small value of S.
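
As an illustration, the all-possible-regressions search and the Cp formula above can be reproduced outside Minitab. The following is a sketch in Python using statsmodels; the DataFrame name cement and the printing code are ours, not part of the original session.

from itertools import combinations

import numpy as np
import pandas as pd
import statsmodels.api as sm

# The cement composition data tabulated above.
cement = pd.DataFrame({
    "X1": [7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10],
    "X2": [26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68],
    "X3": [6, 15, 8, 8, 6, 9, 17, 22, 18, 4, 23, 9, 8],
    "X4": [60, 52, 20, 47, 33, 22, 6, 44, 22, 26, 34, 12, 12],
    "X5": [78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7,
           72.5, 93.1, 115.9, 83.8, 113.3, 109.4],
})

y = cement["X5"]
predictors = ["X1", "X2", "X3", "X4"]
n = len(y)

# sigma-hat-squared is taken from the full model, as in the Cp formula.
full = sm.OLS(y, sm.add_constant(cement[predictors])).fit()
sigma2 = full.mse_resid

rows = []
for k in range(1, 5):
    for subset in combinations(predictors, k):
        fit = sm.OLS(y, sm.add_constant(cement[list(subset)])).fit()
        p = k + 1  # number of parameters, including the constant
        cp = fit.ssr / sigma2 + 2 * p - n
        rows.append((subset, 100 * fit.rsquared, np.sqrt(fit.mse_resid), cp))

# Good models have Cp close to p, large R-Sq and small S.
for subset, r2, s, cp in sorted(rows, key=lambda r: r[3]):
    print(f"{' '.join(subset):11s}  R-Sq={r2:5.1f}  S={s:6.4f}  Cp={cp:6.1f}")

This reproduces the R², S and Cp values in the Minitab output that follows.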

ALL POSSIBLE REGRESSIONS Minitab output

Variables      R² (%)

X4             67.5
X2             66.6
X1             53.4
X3             28.6

X1 X2          97.9
X1 X4          97.2
X3 X4          93.5
X2 X3          84.7
X2 X4          68.0
X1 X3          54.8

X1 X2 X4       98.234
X1 X2 X3       98.228
X1 X3 X4       98.1
X2 X3 X4       97.3

X1 X2 X3 X4    98.2

MTB > BReg 'X5' 'X1' 'X2' 'X3' 'X4';


SUBC> NVars 1 4;
SUBC> Best 2;
SUBC> Constant.

Best Subsets Regression: X5 versus X1, X2, X3, X4

Response is X5

                 Mallows                  X X X X
Vars   R-Sq  R-Sq(adj)   C-p        S     1 2 3 4
  1    67.5     64.5    138.7    8.9639         X
  1    66.6     63.6    142.5    9.0771     X
  2    97.9     97.4      2.7    2.4063   X X
  2    97.2     96.7      5.5    2.7343   X     X
  3    98.2     97.6      3.0    2.3087   X X   X
  3    98.2     97.6      3.0    2.3121   X X X
  4    98.2     97.4      5.0    2.4460   X X X X

FORWARD SELECTION Introduction

In simple linear regression, the relationship between Y and X1 is measured by the Pearson correlation coefficient rYX1. The relationship between Y and X1 adjusted for X2 is the partial correlation between Y and X1 adjusted for X2, denoted rYX1|X2. The partial correlation coefficient is calculated as the correlation coefficient between the residuals in the regression of Y on X2 and the residuals in the regression of X1 on X2.
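
For example, the partial correlation of X5 and X1 adjusted for X2 can be computed directly from these two sets of residuals. This is a sketch reusing the cement DataFrame defined in the Cp example above; the variable names are ours.

import numpy as np
import statsmodels.api as sm

# Residuals of X5 on X2, and of X1 on X2, for the cement data.
Z = sm.add_constant(cement["X2"])
res_y  = sm.OLS(cement["X5"], Z).fit().resid
res_x1 = sm.OLS(cement["X1"], Z).fit().resid

# Their Pearson correlation is the partial correlation r(X5, X1 | X2).
print(np.corrcoef(res_y, res_x1)[0, 1])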

The forward selection method begins by selecting as the first predictor the one which has the highest correlation with the response, provided that this predictor is significant. At each subsequent step, the remaining predictor with the highest partial correlation with the response is added to the model, as long as its sequential sum of squares in the regression is significant. The process stops when the sequential sum of squares of the best remaining predictor is no longer significant.

For a sequential sum of squares to be considered significant, its p-value must be less than a specified significance level, which in Minitab is the value of Alpha to enter in the Stepwise regression menu. The Minitab default value is 0.25.
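
A sketch of this procedure in Python follows, again reusing the cement DataFrame; the function forward_select is ours. Choosing the candidate with the smallest p-value for its sequential sum of squares is equivalent to choosing the one with the highest partial correlation in absolute value.

import statsmodels.api as sm

def forward_select(data, response, candidates, alpha=0.25):
    chosen = []
    candidates = list(candidates)
    while candidates:
        # p-value of each candidate's sequential SS when it is added last;
        # for a single term this t-test p-value equals the partial F p-value.
        pvals = {}
        for var in candidates:
            X = sm.add_constant(data[chosen + [var]])
            pvals[var] = sm.OLS(data[response], X).fit().pvalues[var]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:   # no remaining candidate is significant
            break
        chosen.append(best)
        candidates.remove(best)
    return chosen

# Reproduces the path in the output below: X4 enters, then X1, then X2.
print(forward_select(cement, "X5", ["X1", "X2", "X3", "X4"]))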

FORWARD SELECTION Minitab output

MTB > corr c1-c5

Correlations (Pearson)

X1 X2 X3 X4
X2 0.229
X3 -0.824 -0.139
X4 -0.245 -0.973 0.030
X5 0.731 0.816 -0.535 -0.821

MTB > regr c5 1 c4

Regression Analysis

The regression equation is
X5 = 118 - 0.738 X4

Predictor       Coef    StDev      T      P
Constant     117.568    5.262  22.34  0.000
X4           -0.7382   0.1546  -4.77  0.000

S = 8.964 R-Sq = 67.5% R-Sq(adj) = 64.5%

Analysis of Variance

Source DF SS MS F P
Regression 1 1831.9 1831.9 22.80 0.000
Error 11 883.9 80.4
Total 12 2715.8

FORWARD SELECTION Minitab output

Stepwise Regression: X5 versus X1, X2, X3, X4

Forward selection. Alpha-to-Enter: 0.25

Response is X5 on 4 predictors, with N = 13

Step             1        2        3
Constant    117.57   103.10    71.65

X4          -0.738   -0.614   -0.237
T-Value      -4.77   -12.62    -1.37
P-Value      0.001    0.000    0.205

X1                     1.44     1.45
T-Value               10.40    12.41
P-Value               0.000    0.000

X2                              0.42
T-Value                         2.24
P-Value                        0.052

S             8.96     2.73     2.31
R-Sq         67.45    97.25    98.23
R-Sq(adj)    64.50    96.70    97.64
Mallows C-p  138.7      5.5      3.0

BACKWARD ELIMINATION Introduction

In backward elimination, all predictors are placed in the model at the start and are then progressively removed. At any stage, the predictor removed is the one which, if it were fitted as the last predictor in the model, would have the smallest sequential sum of squares, provided that this sequential sum of squares is not significant. The process stops when every predictor in the model, fitted as the last predictor, has a significant sequential sum of squares.

A sequential sum of squares is considered to be not significant if its p-value is greater than the value specified in Minitab for Alpha to remove. The default value in Minitab for Alpha to remove is 0.1.
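
A matching sketch in Python (reusing the cement DataFrame; the function name is ours). Removing the predictor with the smallest sequential sum of squares when fitted last is equivalent to removing the one with the largest p-value.

import statsmodels.api as sm

def backward_eliminate(data, response, predictors, alpha=0.10):
    chosen = list(predictors)
    while chosen:
        fit = sm.OLS(data[response], sm.add_constant(data[chosen])).fit()
        # p-values of each predictor's sequential SS when fitted last.
        pvals = fit.pvalues.drop("const")
        worst = pvals.idxmax()
        if pvals[worst] <= alpha:   # every predictor is significant: stop
            break
        chosen.remove(worst)
    return chosen

# Reproduces the path in the output below: X3 is dropped, then X4.
print(backward_eliminate(cement, "X5", ["X1", "X2", "X3", "X4"]))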

BACKWARD ELIMINATION Minitab output

Stepwise Regression: X5 versus X1, X2, X3, X4

Backward elimination. Alpha-to-Remove: 0.1

Response is X5 on 4 predictors, with N = 13

Step             1        2        3
Constant     62.41    71.65    52.58

X1            1.55     1.45     1.47
T-Value       2.08    12.41    12.10
P-Value      0.071    0.000    0.000

X2           0.510    0.416    0.662
T-Value       0.70     2.24    14.44
P-Value      0.501    0.052    0.000

X3            0.10
T-Value       0.14
P-Value      0.896

X4           -0.14    -0.24
T-Value      -0.20    -1.37
P-Value      0.844    0.205

S             2.45     2.31     2.41
R-Sq         98.24    98.23    97.87
R-Sq(adj)    97.36    97.64    97.44
Mallows C-p    5.0      3.0      2.7

STEPWISE Introduction and Minitab output

Stepwise regression combines aspects of forward selection and backward elimination. It starts by adding variables in the same way as forward selection but, at each step at which the model contains two or more predictors, it allows one of them to be removed if its sequential sum of squares is not significant.
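
The following sketch combines the two routines above (Python, reusing the cement DataFrame; with equal Alpha values the scheme can in principle cycle, although it terminates on these data).

import statsmodels.api as sm

def stepwise(data, response, candidates, alpha_in=0.15, alpha_out=0.15):
    chosen, remaining = [], list(candidates)
    while True:
        # Forward step: enter the remaining candidate with the smallest
        # sequential-SS p-value, if that p-value is below alpha_in.
        entered = False
        pvals = {v: sm.OLS(data[response],
                           sm.add_constant(data[chosen + [v]])).fit().pvalues[v]
                 for v in remaining}
        if pvals:
            best = min(pvals, key=pvals.get)
            if pvals[best] < alpha_in:
                chosen.append(best)
                remaining.remove(best)
                entered = True
        # Backward step: remove predictors whose sequential SS, when they
        # are fitted last, is no longer significant.
        while len(chosen) > 1:
            fit = sm.OLS(data[response], sm.add_constant(data[chosen])).fit()
            p = fit.pvalues.drop("const")
            worst = p.idxmax()
            if p[worst] <= alpha_out:
                break
            chosen.remove(worst)
            remaining.append(worst)
        if not entered:
            return chosen

# Follows the output below: X4, X1 and X2 enter, then X4 is removed.
print(stepwise(cement, "X5", ["X1", "X2", "X3", "X4"]))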

Stepwise Regression: X5 versus X1, X2, X3, X4

Alpha-to-Enter: 0.15 Alpha-to-Remove: 0.15

Response is X5 on 4 predictors, with N = 13

Step             1        2        3        4
Constant    117.57   103.10    71.65    52.58

X4          -0.738   -0.614   -0.237
T-Value      -4.77   -12.62    -1.37
P-Value      0.001    0.000    0.205

X1                     1.44     1.45     1.47
T-Value               10.40    12.41    12.10
P-Value               0.000    0.000    0.000

X2                             0.416    0.662
T-Value                         2.24    14.44
P-Value                        0.052    0.000

S             8.96     2.73     2.31     2.41
R-Sq         67.45    97.25    98.23    97.87
R-Sq(adj)    64.50    96.70    97.64    97.44
Mallows C-p  138.7      5.5      3.0      2.7

