A variable will contribute to the observed variability of the response variable if its
coefficient in the regression equation is non–zero and so we test the null hypothesis
H0 : β1 = . . . = βk = 0 against the alternative H1 : βi ≠ 0 for some i = 1, 2, . . . , k.
MULTIPLE LINEAR REGRESSION Sequential Sums of Squares
The regression sum of squares can be partitioned into components which measure the
contributions to the reduction in the error variability due to each of the predictors.
These sequential sums of squares are obtained by entering the predictor variables
into Minitab in the order X1 , X2 , X3 and so on. If a sequential sum of squares such
as SS(β2 |β0 , β3 ) were wanted, the predictor variables would have to be entered into
Minitab starting with X3 and followed by X2 .
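The order-of-entry idea can be checked outside Minitab. The following Python sketch (not part of the original notes; it uses NumPy and small hypothetical data, not the nursing example) computes each sequential sum of squares as the reduction in residual sum of squares when one more predictor is entered:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical illustrative data, for demonstration only.
n = 30
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)   # deliberately correlated with x1
x3 = rng.normal(size=n)
y = 2 + 1.5 * x1 + 0.8 * x2 + 0.3 * x3 + rng.normal(size=n)

def rss(y, *preds):
    """Residual SS from the least-squares fit of y on an intercept plus the given predictors."""
    X = np.column_stack([np.ones(len(y)), *preds])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

# Sequential sums of squares for the entry order X1, X2, X3:
ss_x1 = rss(y) - rss(y, x1)                    # SS(b1 | b0)
ss_x2 = rss(y, x1) - rss(y, x1, x2)            # SS(b2 | b0, b1)
ss_x3 = rss(y, x1, x2) - rss(y, x1, x2, x3)    # SS(b3 | b0, b1, b2)

# For SS(b2 | b0, b3), enter X3 first and then X2:
ss_x2_given_x3 = rss(y, x3) - rss(y, x3, x2)

# The sequential SS for a given order always add up to the full regression SS.
reg_ss = rss(y) - rss(y, x1, x2, x3)
print(ss_x1 + ss_x2 + ss_x3, reg_ss)
```

Because x1 and x2 are correlated, SS(b2 | b0, b1) and SS(b2 | b0, b3) generally differ, which is why the entry order matters.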
MULTIPLE LINEAR REGRESSION Example
Nurses are given an aptitude test on entrance to nursing school and the scores are
recorded (X1). They are also given a hospital final examination (X2) just prior to the
State Board Examination (Y). It is desired to find a relationship between the State
Board result and the previous marks. The following data are available.
Y X1 X2
450 82 87
468 88 88
457 89 84
505 74 89
495 99 90
525 75 91
525 80 92
540 89 93
525 86 94
530 67 94
If the values of the response variable are stored in c1 and the values for the two
predictor variables are in columns c2 and c3 of the worksheet, the regression
coefficients, the analysis of variance and the residual plots are obtained using the
menu system with the following steps.
For this example, the simple linear regressions of Y on X1 and Y on X2 are also
obtained.
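The analysis of variance for the full model can be reproduced outside Minitab. This Python sketch (NumPy only; not part of the original notes) fits the two-predictor model to the data above and computes the ANOVA quantities directly:

```python
import numpy as np

# Data from the example above.
y  = np.array([450, 468, 457, 505, 495, 525, 525, 540, 525, 530], float)
x1 = np.array([ 82,  88,  89,  74,  99,  75,  80,  89,  86,  67], float)
x2 = np.array([ 87,  88,  84,  89,  90,  91,  92,  93,  94,  94], float)

n, k = len(y), 2
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

rss = float(np.sum((y - X @ beta) ** 2))   # residual SS,   df = n - k - 1 = 7
tss = float(np.sum((y - y.mean()) ** 2))   # total SS,      df = n - 1     = 9
reg_ss = tss - rss                         # regression SS, df = k         = 2

# F = (RegSS / k) / (RSS / (n - k - 1)), the test of H0: b1 = b2 = 0.
f_stat = (reg_ss / k) / (rss / (n - k - 1))
print(f"RegSS = {reg_ss:.1f}, RSS = {rss:.1f}, TSS = {tss:.1f}, F = {f_stat:.2f}")
```

The printed values should agree (up to rounding) with the Minitab ANOVA table for the two-predictor regression shown below.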
Analysis of Variance (regression of Y on X1)
Source DF SS MS F P
Regression 1 1121 1121 1.04 0.338
Residual Error 8 8637 1080
Total 9 9758
Analysis of Variance (regression of Y on X2)
Source DF SS MS F P
Regression 1 8063.6 8063.6 38.07 0.000
Residual Error 8 1694.4 211.8
Total 9 9758.0
Analysis of Variance (regression of Y on X1 and X2; X1 entered first)
Source DF SS MS F P
Regression 2 8109.6 4054.8 17.22 0.002
Residual Error 7 1648.4 235.5
Total 9 9758.0
Source DF Seq SS
X1 1 1121.4
X2 1 6988.2
Analysis of Variance (regression of Y on X1 and X2; X2 entered first)
Source DF SS MS F P
Regression 2 8109.6 4054.8 17.22 0.002
Residual Error 7 1648.4 235.5
Total 9 9758.0
Source DF Seq SS
X2 1 8063.6
X1 1 46.0
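The two sequential SS breakdowns above can be verified directly. This Python sketch (NumPy only; not part of the original notes) enters the predictors in each order and computes the reduction in residual sum of squares at each step:

```python
import numpy as np

y  = np.array([450, 468, 457, 505, 495, 525, 525, 540, 525, 530], float)
x1 = np.array([ 82,  88,  89,  74,  99,  75,  80,  89,  86,  67], float)
x2 = np.array([ 87,  88,  84,  89,  90,  91,  92,  93,  94,  94], float)

def rss(y, *preds):
    """Residual SS from the least-squares fit of y on an intercept plus the given predictors."""
    X = np.column_stack([np.ones(len(y)), *preds])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ beta) ** 2))

# X1 entered first: Seq SS are SS(b1 | b0) and SS(b2 | b0, b1).
seq_x1_first = (rss(y) - rss(y, x1), rss(y, x1) - rss(y, x1, x2))
# X2 entered first: Seq SS are SS(b2 | b0) and SS(b1 | b0, b2).
seq_x2_first = (rss(y) - rss(y, x2), rss(y, x2) - rss(y, x2, x1))

print("X1 first:", seq_x1_first)
print("X2 first:", seq_x2_first)
```

Either order, the two sequential sums of squares add to the same regression SS; only the split between the predictors changes. Note how small SS(b1 | b0, b2) is: once X2 is in the model, X1 adds very little.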
MULTIPLE LINEAR REGRESSION Coefficient of Determination
In simple linear regression, the total sum of squares (TSS) is a measure of the
variability in the response (Y) with no account of the predictor (X). The residual
sum of squares (RSS) measures the variability in the response when the predictor
is used and the reduction in variability is given by TSS - RSS = RegSS.
A measure of the effect of the predictor in reducing the variation in the response is
the reduction in the variation in Y as a proportion of the total variation, that is,
R² = RegSS / TSS = 1 − RSS / TSS.
If any terms are added to the model, the value of R² will be increased even if these
terms do not aid in the prediction of the response variable. The adjusted coefficient
of determination, R²adj, does not necessarily increase if more terms are added to the
model, and in comparing models the one with the largest R²adj is usually chosen.
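For the nursing example, both measures can be computed from the ANOVA quantities. This Python sketch (NumPy only; not part of the original notes) uses the standard adjusted formula R²adj = 1 − (RSS/(n−k−1)) / (TSS/(n−1)):

```python
import numpy as np

y  = np.array([450, 468, 457, 505, 495, 525, 525, 540, 525, 530], float)
x1 = np.array([ 82,  88,  89,  74,  99,  75,  80,  89,  86,  67], float)
x2 = np.array([ 87,  88,  84,  89,  90,  91,  92,  93,  94,  94], float)

n, k = len(y), 2
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

rss = float(np.sum((y - X @ beta) ** 2))
tss = float(np.sum((y - y.mean()) ** 2))

r2 = 1 - rss / tss                                   # ordinary R-squared
r2_adj = 1 - (rss / (n - k - 1)) / (tss / (n - 1))   # adjusted for degrees of freedom
print(f"R2 = {r2:.4f}, R2_adj = {r2_adj:.4f}")
```

The adjusted value is always below R², and it penalises the extra degree of freedom used by each additional predictor.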
MULTIPLE LINEAR REGRESSION Multicollinearity
Multicollinearity occurs when the predictor variables are correlated with one
another, and it has two main consequences.
First, when a predictor variable is added to the regression model, and the predictor
variable is related to the predictor variables already in the model, the least squares
estimates of the regression parameters will change. That is, the least squares
estimates of the regression parameters depend upon which predictors have been
included in the model and a physical interpretation of a regression parameter
becomes uncertain, as a unit change in one predictor variable while holding the
other predictor variables constant is not possible.
Secondly, the significance of any predictor variable with respect to the response
variable is determined by the value of its corresponding t–statistic. When multi-
collinearity exists, predictor variables (which are correlated) contribute redundant
information and this can lead to a reduction in the value of the t–statistic obtained
by fitting the regression using the full set of correlated predictor variables compared
with the values of the t–statistics obtained by fitting a regression with a subset of
the predictor variables. This can cause some of the predictor variables to appear
less important in the regression model and in extreme cases every predictor variable
can appear non–significant whereas some of the predictor variables do accurately
predict the response.
Multicollinearity can be detected using the variance inflation factor of the i-th
predictor,
VIFi = 1 / (1 − Ri²),
where Ri² is the coefficient of determination obtained by regressing Xi on the
remaining predictor variables.
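The variance inflation factors for the nursing example can be computed from this definition. The following Python sketch (NumPy only; not part of the original notes) regresses each predictor on the other to obtain Ri² and then applies the VIF formula:

```python
import numpy as np

x1 = np.array([82, 88, 89, 74, 99, 75, 80, 89, 86, 67], float)
x2 = np.array([87, 88, 84, 89, 90, 91, 92, 93, 94, 94], float)

def r_squared(target, *others):
    """R-squared from regressing `target` on an intercept plus the other predictors."""
    X = np.column_stack([np.ones(len(target)), *others])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    return 1 - float(np.sum(resid ** 2)) / float(np.sum((target - target.mean()) ** 2))

# VIF_i = 1 / (1 - R_i^2); with only two predictors, both VIFs are equal.
vif1 = 1 / (1 - r_squared(x1, x2))
vif2 = 1 / (1 - r_squared(x2, x1))
print(f"VIF(X1) = {vif1:.2f}, VIF(X2) = {vif2:.2f}")
```

A VIF close to 1 indicates little multicollinearity; values above about 5 or 10 are commonly taken as a warning sign. Here both VIFs are close to 1, so multicollinearity is not a concern for these data.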