Regression

MINITAB Users Guide 2 2-1
MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

2
Regression
Regression Overview, 2-2
I Regression, 2-3
I Stepwise Regression, 2-14
I Best Subsets Regression, 2-20
I Fitted Line Plot, 2-24
I Residual Plots, 2-27
Logistic Regression Overview, 2-29
I Binary Logistic Regression, 2-33
I Ordinal Logistic Regression, 2-44
I Nominal Logistic Regression, 2-51
See also,
I Resistant Line, Chapter 8
I Regression with Life Data, Chapter 16
STREGRSN.MK5 Page 1 Friday, December 17, 1999 12:23 PM
Chapter 2 Regression Overview
2-2 MINITAB Users Guide 2
Regression Overview
Regression analysis is used to investigate and model the relationship between a
response variable and one or more predictors. MINITAB provides various least-squares
and logistic regression procedures.
I Use least squares regression when your response variable is continuous.
I Use logistic regression when your response variable is categorical.
Both least squares and logistic regression methods estimate parameters in the model so that the
fit of the model is optimized. Least squares minimizes the sum of squared errors to obtain
parameter estimates, whereas MINITABs logistic regression commands obtain maximum
likelihood estimates of the parameters. See Logistic Regression Overview on page 2-29 for
more information about logistic regression.
Use the table below to assist in selecting a procedure:
Use to
response
type
estimation
method
Regression
(page 2-3)
perform simple or multiple regression: fit a model,
store regression statistics, examine residual diagnostics,
generate point estimates, generate prediction and
confidence intervals, and perform lack-of-fit tests
continuous least squares
Stepwise
(page 2-14)
perform stepwise, forward selection, or backward
elimination which add or remove variables from a
model in order to identify a useful subset of predictors
Best Subsets
(page 2-20)
identify subsets the predictors based on the maximum
R
2
criterion
Fitted Line Plot
(page 2-24)
perform linear and polynomial regression with a single
predictor and plot a regression line through the data;
on the actual or log
10
scale
Residual Plots
(page 2-27)
generate a set of residual plots to use for residual
analysis: normal score plot, a chart of individual
residuals, a histogram of residuals, and a plot of fits
versus residuals
Binary Logistic
(page 2-33)
perform logistic regression on a response with only two
possible values, such as presence or absence
categorical maximum
likelihood
Ordinal Logistic
(page 2-44)
perform logistic regression on a response with three or
more possible values that have a natural order, such as
none, mild, or severe
categorical maximum
likelihood
Nominal
Logistic
(page 2-51)
perform logistic regression on a response with three or
more possible values that have no natural order, such
as sweet, salty, or sour
categorical maximum
likelihood
Regression Regression
Regression
You can use Regression to perform simple and multiple regression using the method of
least squares. Use this procedure for fitting general least squares models, storing
regression statistics, examining residual diagnostics, generating point estimates,
generating prediction and confidence intervals, and performing lack-of-fit tests.
You can also use this command to fit polynomial regression models. However, if you
want to fit a polynomial regression model with a single predictor, you may find it more
advantageous to use Fitted Line Plot (page 2-24).
Data
Enter response and predictor variables in numeric columns of equal length so that each
row in your worksheet contains measurements on one observation or subject.
MINITAB omits all observations that contain missing values in the response or in the
predictors, from calculations of the regression equation and the ANOVA table items.
h To do a linear regression
1 Choose Stat Regression Regression.
2 In Response, enter the column containing the response (Y) variable.
3 In Predictors, enter the columns containing the predictor (X) variables.
4 If you like, use one or more of the options listed below, then click OK.
Options
Graphs subdialog box
I draw five different residual plots for regular, standardized, or deleted residualssee
Choosing a residual type on page 2-5. Available residual plots include a:
Chapter 2 Regression
histogram.
normal probability plot.
plot of residuals versus the fitted values ( ).
plot of residuals versus data order. The row number for each data point is shown
on the x-axis (for example, 1 2 3 4 n).
separate plot for the residuals versus each specified column.
For a discussion, see Residual plots on page 2-6.
Results subdialog box
I display the following in the Session window:
no output
the estimated regression equation, table of coefficients, s, R
2
, and the analysis of
variance table
the default output, which includes the above output plus the sequential sums of
squares and the fits and residuals of unusual observations
the default output, plus the full table of fits and residuals
Options subdialog box
I perform weighted regressionsee Weighted regression on page 2-6
I exclude the intercept term from the regression by unchecking Fit Interceptsee
Regression through the origin on page 2-7
I display the variance inflation factor (VIFa measure of multicollinearity effect)
associated with each predictorsee Variance inflation factor on page 2-7
I display the Durbin-Watson statistic which detects autocorrelation in the residuals
see Detecting autocorrelation in residuals on page 2-7
I display the PRESS statistic and adjusted R-squared
I perform a pure error lack-of-fit test for testing model adequacy when there are
predictor replicatessee Testing lack-of-fit on page 2-8
I perform a data subsetting lack-of-fit test to test the model adequacysee Testing
lack-of-fit on page 2-8
I predict the response, confidence interval, and prediction interval for new
observationssee Prediction of new observations on page 2-9
Storage subdialog box
I store the coefficients, fits, and regular, standardized, and deleted residualssee
Choosing a residual type on page 2-5.
I store the leverages, Cooks distances, and DFITS, for identifying outlierssee
Identifying outliers on page 2-9.
Y

I store the mean square error, the (XX)
-1
matrix, and the R matrix of the QR or
Cholesky decomposition. (The variance-covariance matrix of the coefficients is
MSE*(XX)
-1
.) See Help for information on these matrices.
Residual analysis and regression diagnostics
Regression analysis usually does not end when a regression model has been fit. You also
can examine residual plots and other regression diagnostics to assess if the residuals
appear random and normally distributed. MINITAB provides a number of residual plots
through the Graphs subdialog box. Alternatively, after fits and residuals are stored, you
can use Stat Regression Residual Plots to obtain four plots within a single graph
window.
MINITAB also produces regression diagnostics for identifying outliers or unusual
observations. These observations may have a significant influence upon the regression
results. See Identifying outliers on page 2-9. You might check unusual observations to see
if they are correct. If so, you can try to determine why they are unusual and consider
what effect they have on the regression equation. You might wish to examine how
sensitive the regression results are to the outliers being present. Outliers can suggest
inadequacies in the model or a need for additional information.
Choosing a residual type
You can calculate three types of residuals. Use the table below to help you choose which
type you would like to plot:
Residual type Choose when you want to Calculation
regular examine residuals in the original scale of the data response fit
standardized use a rule of thumb for identifying observations that are
not fit well by the model. A standardized residual greater
than 2, in absolute value, might be considered to be
large. MINITAB displays these observations in a table of
unusual observations, labeled with an R.
(residual) / (standard
deviation of the
residual)
Studentized identify observations that are not fit well by the model.
Removing observations can affect the variance estimate
and also can affect parameter estimates. A large absolute
Studentized residual may indicate that including the
observation in the model increases the error variance or
that it has a large affect upon the parameter estimates, or
both.
(residual) / (standard
deviation of the
residual). The i
th

studentized residual is
computed with the i
th

observation removed.
Residual plots
MINITAB generates residual plots that you can use to examine the goodness of model fit.
You can choose the following residual plots:
I Normal plot of residuals. The points in this plot should generally form a straight line
if the residuals are normally distributed. If the points on the plot depart from a
straight line, the normality assumption may be invalid. To perform a statistical test for
normality, use Stat Basic Statistics Normality Test (page 1-43).
I Histogram of residuals. This plot should resemble a normal (bell-shaped)
distribution with a mean of zero. Substantial clusters of points away from zero may
indicate that factors other than those in the model may be influencing your results.
I Residuals versus fits. This plot should show a random pattern of residuals on both
sides of 0. There should not be any recognizable patterns in the residual plot. The
following may indicate error that is not random:
a series of increasing or decreasing points
a predominance of positive residuals, or a predominance of negative residuals
patterns such as increasing residuals with increasing fits
I Residuals versus order. This is a plot of all residuals in the order that the data was
collected and can be used to find non-random error, especially of time-related
effects.
I Residuals versus other variables. This is a plot of all residuals versus another
variable. Commonly, you might use a predictor or a variable left out of the model
and see if there is a pattern that you may wish to fit.
If certain residual values are of concern, you can brush your graph to identify these
values. See the Brushing Graphs chapter in MINITAB Users Guide 1 for more information.
Weighted regression
Weighted least squares regression is a method for dealing with observations that have
nonconstant variances. If the variances are not constant, observations with
I large variances should be given relatively small weights
I small variances should be given relatively large weights
The usual choice of weights is the inverse of pure error variance in the response.
h To perform weighted regression
1 Choose Stat Regression Regression Options.
2 In Weights, enter the column containing the weights. The weights must be greater
than or equal to zero. Click OK in each dialog box.
If there are n observations in the data set, MINITAB forms an n n matrix W with the
column of weights as its diagonal and zeros elsewhere. MINITAB calculates the regression
coefficients by (XWX)
-1
(XWY). This is equivalent to minimizing a weighted error sum
of squares,
, where w
i
is the weight.
Regression through the origin
By default, the y-intercept term (also called the constant) is included in equation. Thus,
MINITAB fits the model
However, if the response at X = 0 is naturally zero, a model without an intercept can
make sense. If so, Uncheck Fit Intercept in the Options subdialog box, and the
0
term
will be omitted. Thus, MINITAB fits the model
Because it is difficult to interpret the R
2
when the constant is omitted, the R
2
is not
printed. If you wish to compare fits of models with and without intercepts, compare
mean square errors and examine residual plots.
Variance inflation factor
The variance inflation factor (VIF) is used to detect whether one predictor has a strong
linear association with the remaining predictors (the presence of multicollinearity
among the predictors). VIF measures how much the variance of an estimated regression
coefficient increases if your predictors are correlated (multicollinear).
VIF = 1 indicates no relation; VIF > 1, otherwise. The largest VIF among all predictors is
often used as an indicator of severe multicollinearity. Montgomery and Peck [21]
suggest that when VIF is greater than 5-10, then the regression coefficients are poorly
estimated. You should consider the options to break up the multicollinearity: collecting
additional data, deleting predictors, using different predictors, or an alternative to least
square regression. For additional information, see [3], [21].
Detecting autocorrelation in residuals
In linear regression, it is assumed that the residuals are independent (that is, they are
not autocorrelated) of each other. If the independence assumption were violated, some
model fitting results would be questionable. For example, positive correlation between
w
i
y fit ( )
2
[ ]
Y
0

1
+ X
1

2
X
2

k
X
k
+ + + + =
Y
1
X
1

2
X
2

k
X
k
+ + + + =
error terms tends to inflate the t-values for coefficients. Because of that, checking model
assumptions after fitting a model is an important part of regression analysis.
MINITAB provides two methods to check this assumption:
I A graph of residuals versus data order (1 2 3 4 n) can provide a means to visually
inspect residuals for autocorrelation.
I The Durbin-Watson statistic tests for the presence of autocorrelation in regression
residuals by determining whether or not the correlation between two adjacent error
terms is zero. The test is based upon an assumption that errors are generated by a
first-order autoregressive process. If there are missing observations, these are omitted
from the calculations, and only the nonmissing observations are used.
To reach a conclusion from the test, you will need to compare the displayed statistic
with lower and upper bounds in a table. If D > upper bound, no correlation; if D <
lower bound, positive correlation; if D is in between the two bounds, the test is
inconclusive. For additional information, see [4], [22].
Testing lack-of-fit
MINITAB provides two lack-of-fit tests so you can determine whether or not the
regression model adequately fits your data. The pure error lack-of-fit test requires
replicates; the data subsetting lack-of-fit test does not require replicates.
I Pure error lack-of-fit testIf your predictors contain replicates (repeated x values
with one predictor or repeated combinations of x values with multiple predictors),
MINITAB can calculate a pure error test for lack-of-fit. The error term will be
partitioned into pure error (error within replicates) and a lack-of-fit error. The F-test
can be used to test if you have chosen an adequate regression model. For additional
information, see [9], [22], [29].
I Data subsetting lack-of-fit testMINITAB also performs a lack-of-fit test that does
not require replicates but involves subsetting the data, and attempts to identify the
nature of any lack-of-fit. This test is nonstandard, but it can provide information
about the lack-of-fit relative to each variable. See [6] and Help for more information.
MINITAB performs 2k+1 hypothesis tests, where k is the number of predictors, and
then combines them using Bonferroni inequalities to give an overall significance level
of 0.1. A message is printed out for each test for which there is evidence of lack-of-fit.
For each predictor, a curvature test and an interaction test are performed by
comparing the fit above and below the predictor mean using indicator variables. A
test can also be performed by fitting the model to the central portion of the data
and then comparing the error sums of squares of that central data portion to the
error sums of squares of all the data.
Prediction of new observations
If you have new predictor (X) values and you wish to know what the response
would be using the regression equation, then use Prediction intervals for new
observations in the Options subdialog box. Enter constants or columns
containing the new x values, one for each predictor. Columns must be of equal
length. If you enter a constant and a column(s), MINITAB will assume that you
want predicted values for all combinations of constant and column values. You
can change the confidence level from the default 95%, and you can also store
the printed values: fits, standard errors of fits, confidence limits, and prediction
limits. If you use prediction with weights, see Help for obtaining correct results.
Identifying outliers
In addition to graphs, you can store three additional measures for the purpose
of identifying outliers, or unusual observations that can have a significant
influence upon the regression. The measures are leverages, Cooks distance,
and DFITS:
I Leverages are the diagonals of the hat matrix, H = X (XX)
-1
X, where X is
the design matrix. Note that h
i
depends only on the predictors; it does not
involve the response Y. Many people consider h
i
to be large enough to merit
checking if it is more than 2p/n or 3p/n, where p is the number of predictors
(including one for the constant). MINITAB displays these in a table of unusual
observations with high leverage. Those with leverage over 3p/n or 0.99,
whichever is smallest, are marked with an X and those with leverage greater
than 5p/n are marked with XX.
I Cooks distance combines leverages and Studentized residuals into one
overall measure of how unusual the predictor values and response are for
each observation. Large values signify unusual observations. Geometrically,
Cooks distance is a measure of the distance between coefficients calculated
with and without the i
th
observation. Cook [7] and Weisberg [29] suggest
checking observations with Cooks distance > F (.50, p, np), where F is a
value from an F-distribution.
I DFITS, like Cooks distance, combines the leverage and the Studentized
residual into one overall measure of how unusual an observation is. DFITS
(also called DFFITS) is the difference between the fitted values calculated
with and without the i
th
observation, and scaled by stdev (
i
). Belseley, Kuh,
and Welsch [3] suggest that observations with DFITS > 2 should be
considered as unusual. See Help for more details on these measures.
e Example of performing a simple linear regression
You are a manufacturer who wishes to easily obtain a quality measure on a
product, but the procedure is expensive. However, there is a quick-and-dirty
way of doing the same thing that is much less expensive but also is slightly less
precise. You examine the relationship between the two scores to see if you can
Y
p n
predict the desired score (Score2) from the score that is easy to obtain (Score1). You
also obtain a prediction interval for an observation with Score1 being 8.2.
1 Open the worksheet EXH_REGR.MTW.
3 In Response, enter Score2. In Predictors, enter Score1.
4 Click Options.
5 In Predict intervals for new observations, type 8.2. Click OK in each dialog box.
Session
window
output
Regression Analysis: Score2 versus Score1

The regression equation is
Score2 = 1.12 + 0.218 Score1
Predictor Coef SE Coef T P
Constant 1.1177 0.1093 10.23 0.000
Score1 0.21767 0.01740 12.51 0.000
S = 0.1274 R-Sq = 95.7% R-Sq(adj) = 95.1%
Analysis of Variance
Source DF SS MS F P
Regression 1 2.5419 2.5419 156.56 0.000
Residual Error 7 0.1136 0.0162
Total 8 2.6556
Unusual Observations
Obs Score1 Score2 Fit SE Fit Residual St Resid
9 7.50 2.5000 2.7502 0.0519 -0.2502 -2.15R
R denotes an observation with a large standardized residual
Predicted Values for New Observations
New Obs Fit SE Fit 95.0% CI 95.0% PI
1 2.9026 0.0597 ( 2.7614, 3.0439) ( 2.5697, 3.2356)
Values of Predictors for New Observations
New Obs Score1
1 8.20
Interpreting the results
The regression procedure fits the model
Y
0

1
X + + =
where Y is the response, X is the predictor,
0
and
1
are the regression coefficients, and
is an error term having a normal distribution with mean of zero and standard
deviation . MINITAB estimates
0
by b
0
,
1
by b
1
, and by s. The fitted equation is then

where is called the predicted or fitted value. In this example, b
0
is 1.12 and b
1
is
0.218.
Table of Coefficients. The first table in the output gives the estimated coefficients, b
0

and b
1
, along with their standard errors. In addition, a t-value that tests whether the
null hypothesis of the coefficient is equal to zero and the corresponding p-value is
given. In this example, the p-values that are used to test whether the constant and slope
are equal to zero are printed as 0.000, because MINITAB rounds these values to three
decimal points. These p-values are actually less than 0.0005. These values indicate that
there is sufficient evidence that the coefficients are not zero for likely Type I error rates
( levels).
S = 0.1274. This is an estimate of , the estimated standard deviation about the
regression line. Note that
R-Sq = 95.7%. This is R
2
, also called the coefficient of determination. Note that R
2
=
Correlation (Y, )
2
. Also,
R
2
= (SS Regression) / (SS Total)
The R
2
value is the proportion of variability in the Y variable (in this example, Score2)
accounted for by the predictors (in this example, Score1).
R-Sq(adj) = 95.1%. This is R
2
adjusted for degrees of freedom. If a variable is added to
an equation, R
2
will get larger even if the added variable is of no real value. To
compensate for this, MINITAB also prints R-Sq (adj), which is an approximately unbiased
estimate of the population R
2
that is calculated by the formula
converted to a percent, where p is the number of coefficients fit in the regression
equation (2 in our example). In the same notation, the usual R
2
is
Analysis of Variance. This table contains sums of squares (abbreviated SS). SS
Regression is sometimes written SS (Regression | b
0
) and sometimes called SS Model. SS
Error is sometimes written as SS Residual, SSE, or RSS. MS Error is often written as MSE.
SS Total is the total sum of squares corrected for the mean. Use the analysis of variance
Y
b
0
b
1
X + =
Y
s
2
MSError =
Y
R
2
adj ( ) 1
SS Error n p ( )
SS Total n 1 ( )
-------------------------------------------- =
R
2
1
SS Error
SS Total
-------------------- =
table to assess the overall fit. The F-test is a test of the hypothesis H
0
: All regression
coefficients, excepting
0
, are zero.
Unusual Observations. Unusual observations are marked with an X if the predictor is
unusual (large leverage), and they are marked with an R if the response is unusual (large
standardized residual). See Choosing a residual type on page 2-5 and Identifying outliers
on page 2-9. The default is to print only unusual observations. You can choose to print a
full table of fitted values by selecting this option in the Results subdialog box.
The Fit or fitted Y value is sometimes called predicted Y value or . SE Fit is the
(estimated) standard error of the fitted value. St Resid is the standardized residual.
Predicted Values. The interval displayed under 95% CI is the confidence interval for the
population mean of all responses (Score2) that correspond to the given value of the
predictor (Score1 = 8.2). The interval displayed under 95% PI is the prediction interval
for an individual observation taken at Score1 = 8.2. The confidence interval is
appropriate for the data used in the regression. If you have new observations, use the
prediction interval. See Prediction of new observations on page 2-9.
Regression analysis would not be complete without examining residual patterns. The
following multiple regression example and residual plots procedure provide additional
information about regression analysis.
e Example of a multiple regression
As part of a test of solar thermal energy, you measure the total heat flux from homes.
You wish to examine whether total heat flux (Heatflux) can be predicted by insulation,
by the position of the focal points in the east, south, and north directions, and by the
time of day. Data are from [21], page 486. You found, using best subsets regression on
page 2-23, that the best two-predictor model included the variables North and South
and the best three-predictor added the variable East. You would like to evaluate the
three-predictor model using multiple regression.
3 In Response, enter Heatflux.
4 In Predictors, enter North South East. Click OK.
Y

Session
window
output
Regression Analysis: HeatFlux versus North, South, East

HeatFlux = 389 - 24.1 North + 5.32 South + 2.12 East
Predictor Coef SE Coef T P
Constant 389.17 66.09 5.89 0.000
North -24.132 1.869 -12.92 0.000
South 5.3185 0.9629 5.52 0.000
East 2.125 1.214 1.75 0.092
S = 8.598 R-Sq = 87.4% R-Sq(adj) = 85.9%
Source DF SS MS F P
Regression 3 12833.9 4278.0 57.87 0.000
Residual Error 25 1848.1 73.9
Total 28 14681.9
Source DF Seq SS
North 1 10578.7
South 1 2028.9
East 1 226.3
Unusual Observations
Obs North HeatFlux Fit SE Fit Residual St Resid
4 17.5 230.70 210.20 5.03 20.50 2.94R
22 17.6 254.50 237.16 4.24 17.34 2.32R
R denotes an observation with a large standardized residual
MINITAB fits the regression model
where Y is the response, X
1
, X
2
, and X
3
are the predictors,
0
,
1
,
2
, and
3
are the
regression coefficients, and is an error term having a normal distribution with mean of
0 and standard deviation .
The multiple regression output is similar to the simple regression output, but it also
includes the sequential sums of squares. Sequential sums of squares differ from
t-statistics. T-statistics test the null hypothesis that each coefficient is zero, given that all
other variables are present in the model. The sequential sums of squares are the unique
sums of squares of the current variable, given the sums of squares of any previously
entered variables.
For example, in the sequential sums of squares column of the Analysis of Variance table,
the value for North (10578.7) is the sums of squares for North; the value for South
(2028.9) is the unique sums of squares for South given the sums of squares for North;
Y
0

1
X
1

2
X
2

3
X
3
e + + + + =
Chapter 2 Stepwise Regression
and the value for East (226.3) is the unique sums of squares for East given the sums of
squares of North and South.
The first line in the sequential sums of squares table gives SS (b
1
| b
0
), or the reduction
in SS Error due to fitting the b
1
term (an equivalent is to use X1 as a predictor),
assuming that you have already fit b
0
. The next line gives SS (b
2
| b
0
, b
1
), or the
reduction in SS Error due to fitting the b
2
term, assuming that you have already fit the
terms b
0
and b
1
. The next line is SS (b
3
| b
0
, b
1
, b
2
), and so on. If you want a different
sequence, say SS (b
2
| b
0
, b
3
), then repeat the regression procedure and enter X3 first,
then X2. MINITAB does not print p-values for the sequential sums of squares. Except for
the last sequential sums of squares, the mean square error should not be used to test
the significance of these terms.
In this example, t-test p-values of less than 0.0005 indicate that there is significant
evidence that the coefficients of variables North and South are not zero. The coefficient
of the variable East, however, has an t-test p-value of 0.092. If the evidence for the
coefficient not being zero appears insufficient and if it adds little to the prediction, you
may choose the more parsimonious model with predictors North and South. Make this
decision only after examining the residuals. In the residual plots example on page 2-28,
you examine the residuals from the model with predictors North and South.
(Alternatively, you could have used the graphs available in the Graphs subdialog box.)
Stepwise Regression
Stepwise regression removes and adds variables to the regression model for the purpose
of identifying a useful subset of the predictors. MINITAB provides three commonly used
procedures: standard stepwise regression (adds and removes variables), forward
selection (adds variables), and backward elimination (removes variables).
Data
Enter response and predictor variables in the worksheet in numeric columns of equal
length so that each row in your worksheet contains measurements on one observation
or subject. MINITAB automatically omits rows with missing values from the calculations.
Stepwise Regression Regression
h To do a stepwise regression
1 Choose Stat Regression Stepwise.
2 In Response, enter the numeric column containing the response (Y) data.
3 In Predictors, enter the numeric columns containing the predictor (X) variables.
Options
Stepwise dialog box
I By entering variables in Predictors to include in every model, you can designate a
set of predictor variables that cannot be removed from the model, even when their
p-values are less than the Alpha to enter value.
Method subdialog box
I perform standard stepwise regression (adds and removes variables), forward
selection (adds variables), or backward elimination (removes variables).
I when you choose the Stepwise method, you can enter a starting set of predictor
variables in Enter. These variables are removed if their p-values are greater than the
Alpha to enter value. If you want keep variables in the model regardless of their
p-values, enter them in Predictors to include in every model in the main dialog
box. See Stepwise regression (default) on page 2-16.
I when you choose the Stepwise or Forward selection method, you can set the value of
the for entering a new variable in the model in Alpha to enter. See Stepwise
regression (default) and Forward selection on page 2-17.
I when you choose the Stepwise or Backward elimination method, you can set the
value of for removing a variable from the model in Alpha to remove. See Stepwise
regression (default) and Backward elimination on page 2-17.
I display the next best alternate predictors up to the number requested. If a new
predictor is entered into the model, MINITAB displays the predictor which was the
second best choice, the third best choice, and so on, up to the requested number.
I set the number of steps between pauses. See User intervention on page 2-17.
I exclude the intercept term from the regression by unchecking Fit Intercept. See
Regression through the origin on page 2-7.
Method
MINITAB provides three commonly used procedures: standard stepwise regression (page
2-16), forward selection (page 2-17), and backward elimination (page 2-17)
Stepwise regression (default)
The basic method of stepwise regression is to calculate an F-statistic for each variable in
the model. Suppose the model contains X1, , Xj. Then the F-statistic for Xi is
with 1 and n j 1 degrees of freedom.
MINITAB then determines the corresponding p-value. If the p-value for any variable is
greater than Alpha to remove, the variable with the largest p-value is removed from
the model. The regression equation is calculated for this smaller model, the results are
printed, and the procedure proceeds to a new step.
If no variable can be removed, the procedure attempts to add a variable. An F-statistic is
calculated for each variable not yet in the model. Suppose the model, at this stage,
contains X1, , Xp. Then the F-statistic for a new variable, X
j + 1
is
MINITAB then determines the corresponding p-value. The variable with the smallest
p-value is then added, provided its p-value is smaller than Alpha to enter. Adding this
variable is equivalent to choosing the variable with the largest partial correlation or to
choosing the variable that most effectively reduces the error SS. The regression equation
is then calculated, results are displayed, and the procedure goes to a new step.
When no more variables can be entered into or removed from the model, the stepwise
procedure ends.
SSE X1 X i 1 ( ) X i 1 + ( ) Xj , , , , , [ ] SSE X1 Xj , , [ ]
MSE X1 Xj , , [ ]
-------------------------------------------------------------------------------------------------------------------------------------------------
SSE X1 Xj , , [ ] SSE X1 Xj X
j 1 +
, , , [ ]
MSE X1 Xj X
j 1 +
, , , [ ]
---------------------------------------------------------------------------------------------------------- -
Forward selection
This procedure adds variables to the model using the same method as the stepwise
procedure. Once added, however, a variable is never removed. The forward selection
procedure ends when none of the candidate variables have a p-value smaller than
Alpha to enter.
Backward elimination
This procedure starts with the model that contains all the predictors and then removes
variables, one at a time, using the same method as the stepwise procedure. No variable,
however, can re-enter the model. The backward elimination procedure ends when none
of the variables included the model have a p-value greater than Alpha to remove.
User intervention
Stepwise proceeds automatically by steps and then pauses. You can set the number of
steps between pauses in the Options subdialog box.
The number of steps can start at one with the default and maximum determined by the
output width. Set a smaller value if you wish to intervene more often. You must check
Editor Enable Commands in order to intervene and use the procedure interactively.
If you do not, the procedure will run to completion without pausing.
At the pause, MINITAB displays a MORE? prompt. At this prompt, you can continue the
display of steps, terminate the procedure, or intervene by typing a subcommand.
To Type
display another page of steps (or until
no more predictors can enter or leave
the model)
YES
terminate the procedure NO
enter a set of variables ENTER CC
remove a set of variables REMOVE CC
force a set of variables to be in model FORCE CC
display the next best alternate predictors BEST K
set the number of steps between pauses STEPS K
change F to enter FENTER K
change F to remove FREMOVE K
change to enter AENTER K
change to remove AREMOVE K
Use of variable selection procedures
Variable selection procedures can be a valuable tool in data analysis, particularly in the
early stages of building a model. At the same time, these procedures present certain
dangers. Here are some considerations:
I Since the procedures automatically snoop through many models, the model
selected may fit the data too well. That is, the procedure can look at many
variables and select ones which, by pure chance, happen to fit well.
I The three automatic procedures are heuristic algorithms, which often work very well
but which may not select the model with the highest R
2
value (for a given number of
predictors).
I Automatic procedures cannot take into account special knowledge the analyst may
have about the data. Therefore, the model selected may not be the best from a
practical point of view.
e Example of a stepwise regression
Students in an introductory statistics course participated in a simple experiment. Each
student recorded his or her height, weight, gender, smoking preference, usual activity
level, and resting pulse. They all flipped coins, and those whose coins came up heads
ran in place for one minute. Afterward, the entire class recorded their pulses once more.
You wish to find the best predictors for the second pulse rate.
1 Open the worksheet PULSE.MTW.
2 Press c+M to make the Session window active.
3 Check Editor Enable Commands.
4 Choose Stat Regression Stepwise.
5 In Response, enter Pulse2.
6 In Predictors, enter Pulse1 RanWeight.
7 Click Options.
8 In Number of steps between pauses, enter 2. Click OK in each dialog box.
9 At the first More? prompt, type Yes.
10 At the second More? prompt, type No.
Session
window
output
Stepwise Regression: Pulse2 versus Pulse1, Ran, ...

F-to-Enter: 4 F-to-Remove: 4
Response is Pulse2 on 6 predictors, with N = 92
Step 1 2
Constant 10.28 44.48
Pulse1 0.957 0.912
T-Value 7.42 9.74
P-Value 0.000 0.000
Ran -19.1
T-Value -9.05
P-Value 0.000
S 13.5 9.82
R-Sq 37.97 67.71
R-Sq(adj) 37.28 66.98
C-p 103.2 13.5
More? (Yes, No, Subcommand, or Help)
Step 3
Constant 42.62
Pulse1 0.812
T-Value 8.88
P-Value 0.000
Ran -20.1
T-Value -10.09
P-Value 0.000
Sex 7.8
T-Value 3.74
P-Value 0.000
S 9.18
R-Sq 72.14
R-Sq(adj) 71.19
C-p 1.9
More? (Yes, No, Subcommand, or Help)
This example uses six predictors. You requested that MINITAB do two steps of the
automatic stepwise procedure, display the results, and allow you to intervene.
The first page of output gives results for the first two steps. In step 1, the variable
Pulse1 entered the model; in step 2, the variable Ran entered. No variables were
removed on either of the first two steps. For each model, MINITAB displays the constant
Chapter 2 Best Subsets Regression
term, the coefficient and its t-value for each variable in the model, S (square root of
MSE), and R
2
.
Because you answered YES at the MORE? prompt, the automatic procedure continued
for one more step, adding the variable Sex. At this point, no more variables could enter
or leave, so the automatic procedure stopped and again allowed you to intervene.
Because you do not want to intervene, you typed NO.
The stepwise output is designed to present a concise summary of a number of fitted
models. If you want more information on any of the models, you can use the regression
procedure (page 2-3).
Best Subsets Regression
Best subsets regression generates regression models using the maximum R
2
criterion by
first examining all one-predictor regression models and then selecting the two models
giving the largest R
2
. MINITAB displays information on these models, examines all
two-predictor models, selects the two models with the largest R
2
, and displays
information on these two models. This process continues until the model contains all
predictors.
Data
Enter response and predictor variables in the worksheet in numeric columns of equal
length so that each row in your worksheet contains measurements on one unit or
subject. MINITAB automatically omits rows with missing values from all models.
When there are a large number of predictors, there are many subsets. There is a limit to
the number of free predictors (variables which may optionally enter the model). If you
have m free predictors and force q predictors to be in the model, then (m q) must be
less than or equal to 31.
In general, best subsets regression can take a long time when there are 15 or more free
predictors. The length of time, however, varies with the data set. If you have to analyze
a very large data set, consider forcing certain predictors to be in the model, thereby
decreasing the number of free variables.
Best Subsets Regression Regression
h To do a best subsets regression
1 Choose Stat Regression Best Subsets.
2 In Response, enter the numeric column containing the response (Y) data.
3 In Free predictors, enter up to 31 numeric columns containing the predictor (X)
variables. MINITAB can include any of these free predictors if they meet the specified
criteria (for example, greater than F to enter).
Options
Best Subsets Regression dialog box
I force a set of predictors to be in the model by entering these variables in Predictors
in all models. These variables cannot be removed, even when their F-statistics are
less than the F to Enter value.
I display information for a specified number of free variables in the models. You can
give a minimum and maximum. For example, if you specify 3 as the minimum and 6
as the maximum, MINITAB will display the best 3, 4, 5, and 6 variable models (this
number does not include variables forced into the model).
I display information about the best models of each variable number. You can
specify the number of best models to display. For example, if you specify 3,
MINITAB will display the best, second best, and third best models for each number of
variables. You can enter a value from 1 to 5 (the default is 2).
I exclude the intercept term from the regression by unchecking Fit Interceptsee
Regression through the origin on page 2-7.
Chapter 2 Best Subsets Regression
Using the best subsets regression procedure
The best subsets regression procedure can be used to select a group of likely models for
further analysis. The general method is to select the smallest subset that fulfills certain
statistical criteria. The reason that you would use a subset of variables rather than a full
set is because the subset model may actually estimate the regression coefficients and
predict future responses with smaller variance than the full model using all predictors
[15].
The statistics R
2
, adjusted R
2
, C
p
, and s (square root of MSE) are calculated by the best
subsets procedure and can be used as comparison criteria.
Typically, you would only consider subsets that provide the largest R
2
value. However,
R
2
always increases with the size of the subset. For example, the best 5-predictor model
will always have a higher R
2
than the best 4-predictor model. Therefore, R
2
is most
useful when comparing models of the same size. When comparing models with the
same number of predictors, choosing the model with the highest R
2
is equivalent to
choosing the model with the smallest SSE.
Use adjusted R
2
and C
p
to compare models with different numbers of predictors. In this
case, choosing the model with the highest adjusted R
2
is equivalent to choosing the
model with the smallest mean square error (MSE). If adjusted R
2
is negative (usually
when there is a large number of predictors and small R
2
) then MINITAB sets the adjusted
R
2
to zero.
The C
p
statistic is given by the formula
where SSE
p
is SSE for the best model with p parameters (including the intercept, if it is
in the equation), and MSE
m
is the mean square error for the model with all m
predictors.
In general, look for models where C
p
is small and close to p. If the model is adequate
(i.e., fits the data well), then the expected value of C
p
is approximately equal to p (the
number of parameters in the model). A small value of C
p
indicates that the model is
relatively precise (has small variance) in estimating the true regression coefficients and
predicting future responses. This precision will not improve much by adding more
predictors. Models with considerable lack-of-fit have values of C
p
larger than p. See [15]
for additional information on C
p
.
Exercise caution when using variable selection procedures such as best subsets (and
stepwise regression). These procedures are automatic and therefore do not consider the
practical importance of any of the predictors. In addition, anytime you fit a model to
data, the goodness of the fit comes from two basic sources:
I fitting the underlying structure of the data (a structure that will appear in other data
sets gathered in the same way)
I fitting the peculiarities of the one particular data set you analyze
C
p
SSE
p
MSE
m
---------------- - n 2p ( ) =
Best Subsets Regression Regression
Unfortunately, when you search through many models to find the best, as you do in
best subsets regression, a good fit is often chosen largely for the second reason. There
are two ways that you can verify a model obtained by a variable selection procedure.
You can
I verify the model using a new set of data.
I take the original data set and randomly divide it into two parts. Then use the variable
selection procedure on one part to select a model and verify the fit using the second
part.
e Example of best subsets regression
Total heat flux is measured as part of a solar thermal energy test. You wish to see how
total heat flux is predicted by other variables: insolation, the position of the focal points
in the east, south, and north directions, and the time of day. Data are from
Montgomery and Peck [21], page 486.
1 Open the worksheet EXH_REGR.
2 Choose Stat Regression Best Subsets.
4 In Free Predictors, enter Insolation-Time. Click OK.
Session
window
output
Best Subsets Regression: HeatFlux versus Insolation, East, ...

Response is HeatFlux
I
n
s
o S N
l E o o T
a a u r i
t s t t m
Vars R-Sq R-Sq(adj) C-p S i t h h e
1 72.1 71.0 38.5 12.328 X
1 39.4 37.1 112.7 18.154 X
2 85.9 84.8 9.1 8.9321 X X
2 82.0 80.6 17.8 10.076 X X
3 87.4 85.9 7.6 8.5978 X X X
3 86.5 84.9 9.7 8.9110 X X X
4 89.1 87.3 5.8 8.1698 X X X X
4 88.0 86.0 8.2 8.5550 X X X X
5 89.9 87.7 6.0 8.0390 X X X X X
Each line of the output represents a different model. Vars is the number of variables or
predictors in the model. The statistics R
2
, adjusted R
2
, C
p
, and s are displayed next (R
2

Chapter 2 Fitted Line Plot
and adjusted R
2
are converted to percentages). Predictors that are present in the model
are indicated by an X.
In this example, the best one-predictor model uses North (R
2
adj = 71.0) and the
second-best one-predictor model uses Insolation (R
2
adj = 37.1). Moving from the best
one-predictor model to the best two-predictor model increased the adjusted R
2
from
71.0 to 84.8. R
2
usually increases slightly as more predictors are added even when the
new predictors do not improve the model. The best two-predictor model might be
considered as the minimum fit. The multiple regression example on page 2-12 and the
residual plots example on page 2-28 indicate that adding the variable East does not
improve the fit of the model.
Fitted Line Plot
This procedure performs regression with linear and polynomial (second or third order)
terms, if requested, of a single predictor variable and plots a regression line through the
data, on the actual or log
10
scale. Polynomial regression is one method for modeling
curvature in the relationship between a response variable (Y) and a predictor variable
(X) by extending the simple linear regression model to include X
2
and X
3
as predictors.
Data
Enter your response and single predictor variables in the worksheet in numeric columns
of equal length so that each row in your worksheet contains measurements on one unit
or subject. MINITAB automatically omits rows with missing values from the calculations.
h To do a fitted line plot
1 Choose Stat Regression Fitted Line Plot.
2 In Response (Y), enter the numeric column containing the response data.
3 In Predictor (X), enter the numeric column containing the predictor variable.
Fitted Line Plot Regression
Options
Fitted Line Plot dialog box
I choose a linear (default), quadratic, or cubic regression model to automatically
include all lower order terms. See Polynomial regression model choices on page 2-25.
I transform the y-variable by log
10
Y. You can also choose to display the y-scale in the
log
10
scale.
I transform the x-variable by log
10
X. You can also choose to display the plot x scale in
the log
10
scale. If you use this option with polynomials of order greater than one,
then the polynomial regression will be based on powers of the log
10
X.
I display confidence bands and prediction bands about the regression line. You can
also change the confidence level from the default of 95%.
I replace the default title with your own title.
I store the residuals, fits, and regression model coefficients (b
0
, b
1
, b
2
, up to b
3
down
the column, where b
i
is the coefficient of the i
th
power of the predictor or
transformed predictor).
I store the scaled residuals and scaled fits when using the y-variable transformation,
log
10
Y.
Polynomial regression model choices
You can fit the following linear, quadratic, or cubic regression models:
Another way of modeling curvature is to generate additional models by using the log
10
of X and/or Y for linear, quadratic, and cubic models. In addition, taking the log
10
of Y
may be used to reduce right-skewness or nonconstant variance of residuals.
Model type Order Statistical model
linear first
quadratic second
cubic third
Y
0

1
+ X + =
Y
0

1
+ X
2
X
2
+ + =
Y
0

1
+ X
2
X
2
3
X
3
+ + + =
Chapter 2 Fitted Line Plot
e Example of plotting a fitted regression line
You are studying the relationship between a particular machine setting and the amount
of energy consumed. This relationship is known to have considerable curvature, and
you believe that a log transformation of the response variable will produce a more
symmetric error distribution. You choose to model the relationship between the
machine setting and the amount of energy consumed with a quadratic model.
2 Choose Stat Regression Fitted Line Plot.
3 In Response (Y), enter EnergyConsumption.
4 In Predictor (X), enter MachineSetting.
5 Under Type of Regression Model, choose Quadratic.
6 Click Options. Check Logten of Y, Display logscale for Y variable, Display
confidence bands, and Display prediction bands. Click OK in each dialog box.
Session
window
output
Polynomial Regression Analysis: EnergyConsum versus MachineSetti
log(EnergyConsum) = 7.06962 - 0.698628 MachineSetti
+ 0.0173974 MachineSetti**2

S = 0.167696 R-Sq = 93.1 % R-Sq(adj) = 91.1 %
Source DF SS MS F P
Regression 2 2.65326 1.32663 47.1743 0.000
Error 7 0.19685 0.02812
Total 9 2.85012
Source DF Seq SS F P
Linear 1 0.03688 0.1049 0.754
Quadratic 1 2.61638 93.0370 0.000
Graph
window
output

Residual Plots Regression
The quadratic model (p-value = 0.000, or actually p-value < 0.0005) appears to provide
a good fit to the data. The R
2
indicates that machine setting accounts for 93.1% of the
variation in log
10
of the energy consumed. A visual inspection of the plot reveals that
the data are evenly spread about the regression line, implying no systematic lack-of-fit.
The lines labeled CI are the 95% confidence limits for the log
10
of energy consumed.
The lines labeled PI are the 95% prediction limits for new observations.
Residual Plots
You can generate a set of plots to use for residual analysis by storing fits and residuals
using another procedure, such as regression, and then using the Residual Plots
procedure to produce a normal score plot, a chart of individual residuals, a histogram of
residuals, and a plot of fits versus residuals, all on the same graph.
Data
You must save a column of residuals and a column of fits from another MINITAB
procedure. MINITAB automatically omits rows with missing values from the calculations.
h To display the residual plots
1 Choose Stat Regression Residual Plots.
2 In Fits, enter the column containing stored fits.
3 In Residuals, enter the column containing the stored residuals.
4 If you like, use the option listed below, then click OK.
Options
You can replace the default title with your own title.
Chapter 2 Residual Plots
e Example of residual plots
You examine the residuals from the best two-predictor model of the best subsets
regression example on page 2-23. You determined in the multiple regression example
on page 2-12 that adding the third variable from the best three-predictor model may
not add appreciably to the fit. You now examine residual patterns from the best
two-predictor model to further examine goodness-of-fit.
Step 1: Store the residuals and fits from a regression analysis
4 In Predictors, enter South North.
5 Click Storage. Check Fits and Standardized residuals.
6 Click OK in each dialog box.
Step 2: Generate the residual plots
1 Choose Stat Regression Residual Plots.
2 In Fits, enter the column containing the stored fits.
3 In Residuals, enter the column containing the stored residuals. Click OK.
Session
window
output
Residual Plots
TEST 1. One point more than 3.00 sigmas from center line.
Test Failed at points: 22
TEST 2. 9 points in a row on same side of center line.
Test Failed at points: 16
Graph
window
output
Logistic Regression Overview Regression
The residuals plots procedure generates four plots in one graph window. The normal
plot shows an approximately linear pattern that is consistent with a normal distribution.
Similarly, the histogram exhibits a pattern that is consistent with a sample from a
normal distribution. However, the I Chart (a control chart of individual observations)
reveals that one point labeled with a 1the twenty-second valueis outside the three
sigma limits, and another point labeled with a 2 is flagged because it is the ninth in a
row on the same side of the mean.
The plot of residuals versus fits shows that the fit tends to be better for higher predicted
values. Investigation shows that the highest residual coincides with the highest value of
the variable East. Including East in the model and repeating the residual plots procedure
showed that no points are flagged as unusual (not shown). The contribution to the fit
by the variable East may warrant further investigation.
Logistic Regression Overview
Both logistic regression and least squares regression investigate the relationship
between a response variable and one or more predictors. A practical difference between
them is that logistic regression techniques are used with categorical response variables,
and linear regression techniques are used with continuous response variables.
MINITAB provides three logistic regression procedures that you can use to assess the
relationship between one or more predictor variables and a categorical response
variable of the following types:
Both logistic and least squares regression methods estimate parameters in the model so that the
fit of the model is optimized. Least squares minimizes the sum of squared errors to obtain
parameter estimates, whereas logistic regression obtains maximum likelihood estimates of the
parameters using an iterative-reweighted least squares algorithm [19].
Tip You can identify points in the plots using the brushing capabilities. See the Brushing
Graphs chapter in MINITAB Users Guide 1.
Variable
type
Number of
categories Characteristics Examples
Binary 2 two levels success, failure
yes, no
Ordinal 3 or more natural ordering
of the levels
none, mild, severe
fine, medium, coarse
Nominal 3 or more no natural ordering
of the levels
blue, black, red, yellow
sunny, rainy, cloudy
Chapter 2 Logistic Regression Overview
How to specify the model terms
The logistic regression procedures can fit models with:
I up to 9 factors and up to 50 covariates
I crossed or nested factorssee Crossed vs. nested factors on page 3-19
I covariates that are crossed with each other or with factors, or nested within factors
Model continuous predictors as covariates and categorical predictors as factors. Here
are some examples. A is a factor and X is a covariate.
The model for logistic regression is a generalization of the model used in MINITABs
general linear model (GLM) procedure. Any model fit by GLM can also be fit by the
logistic regression procedures. For a discussion of specifying models in general, see
Specifying the model terms on page 3-20 and Specifying reduced models on page 3-21. In
the logistic regression commands, MINITAB assumes any variable in the model is a
covariate unless the variable is specified as a factor. In contrast, GLM assumes that any
variable in the model is a factor unless the variable is specified as a covariate.
Model restrictions
Logistic regression models in MINITAB have the restrictions as GLM models:
I There must be enough data to estimate all the terms in your model, so that the
model is full rank. MINITAB will automatically determine if your model is full rank and
display a message. In most cases, eliminating some unimportant high order
interactions in your model should solve your problem.
I The model must be hierarchical. In a hierarchical model, if an interaction term is
included, all lower order interactions and main effects that comprise the interaction
term must appear in the model.
Model terms
A X AX fits a full model with a covariate crossed with a factor
A | X an alternative way to specify the previous model
A X XX fits a model with a covariate crossed with itself making a squared term
A X(A) fits a model with a covariate nested within a factor
Logistic Regression Overview Regression
Reference levels for factors
MINITAB needs to assign one factor level as the reference level, meaning that the
interpretation of the estimated coefficients is relative to this level. MINITAB designates
the reference level based on the data type:
I For numeric factors, the reference level is the level with the least numeric value.
I For date/time factors, the reference level is the level with the earliest date/time.
I For text factors, the reference level is the level that is first in alphabetical order.
You can change the default reference level in the Options subdialog box.
For more information, Interpreting the parameter estimates relative to the event and the
reference levels on page 2-39.
Logistic regression creates a set of design variables for each factor in the model. If there
are k levels, there will be k1 design variables and the reference level will be coded as 0.
Here are two examples of the default coding scheme:
Reference event for the response variable
MINITAB needs to designate one of the response values as the reference event. MINITAB
defines the reference event based on the data type:
I For numeric factors, the reference event is the greatest numeric value.
I For date/time factors, the reference event is the most recent date/time.
I For text factors, the reference event is the last in alphabetical order.
You can change the default reference event in the Options subdialog box.
For more information, Interpreting the parameter estimates relative to the event and the
reference levels on page 2-39.
Note If you have defined a value order for a text factor, the default rule above does not
apply. MINITAB designates the first value in the defined order as the reference value. See
Ordering Text Categories in the Manipulating Data chapter in MINITAB Users Guide 1.
Factor A with 4 levels
(1 2 3 4)
Factor B with 3 levels
(Temp Pressure Humidity)
reference
level
A1 A2 A3 reference
level
B1 B2
1 0 0 0 Humidity 0 0
2 1 0 0 Pressure 1 0
3 0 1 0 Temp 0 1
4 0 0 1
Note If you have defined a value order for a text factor, the default rule above does not
apply. MINITAB designates the last value in the defined order as the reference event. See
Ordering Text Categories in the Manipulating Data chapter in MINITAB Users Guide 1.
Chapter 2 Logistic Regression Overview
Worksheet structure
Data used for input to the logistic regression procedures may be arranged in two
different ways in your worksheet: as raw (categorical) data, or as frequency (collapsed)
data. For binary logistic regression, there are three additional ways to arrange the data
in your worksheet: as successes and trials, as successes and failures, or as failures and
trials. These ways are illustrated here for the same data.
The response entered as raw data or as frequency data
C1 C2 C3 C4
Response Factor Covar
0 1 12
1 1 12
1 1 12
.
.
.
.
.
.
.
.
.
1 1 12
0 2 12
1 2 12
.
.
.
.
.
.
.
.
.
1 2 12
.
.
.
.
.
.
.
.
.
C1 C2 C3 C4
Response Count Factor Covar
0 1 1 12
1 19 1 12
0 1 2 12
1 19 2 12
0 5 1 24
1 15 1 24
0 4 2 24
1 16 2 24
0 7 1 50
1 13 1 50
0 8 2 50
1 12 2 50
0 11 1 125
1 2 1 125
0 9 2 125
1 11 2 125
0 19 1 200
1 1 1 200
0 18 2 200
1 2 2 200
Raw Data: one row for each
observation
Frequency Data: one row for each
combination of factor and
1
19
1
19
Binary Logistic Regression Regression
The binary response entered as the number of successes, failures, or trials
Enter one row for each combination of factor and covariate.
Use caution when viewing large regression coefficients
If the absolute value of the regression coefficient is large, exercise caution in judging the
p-value of the test. When the absolute regression coefficients are large, their calculated
standard errors can be too large, leading you to conclude that they are not significant
[13]. If you have one or more large absolute regression coefficients for the factor(s) and/
or covariate(s), the best test is to perform logistic regression both with and without
these terms and make a conclusion based upon the change in the log-likelihood.
If you do test the significance of model terms in this way, your test statistic will be
2(log-likelihood from reduced model log-likelihood from full model). To compute
the p-value for this test, choose Calc Probability Distributions Chi-square. In
Degrees of freedom, enter the model degrees of freedom from full model model
degrees of freedom from reduced model, where the model degrees of freedom are the
number of estimated coefficients. Check Input constant, and enter the test statistic
from above. Store the answer in a constant, say k1, and then calculate the p-value as
1 k1 using Calc Calculator.
Binary Logistic Regression
Use binary logistic regression to perform logistic regression on a binary response
variable. A binary variable only has two possible values, such as presence or absence of a
particular disease. A model with one or more predictors is fit using an
iterative-reweighted least squares algorithm to obtain maximum likelihood estimates of
the parameters [19].
Successes and Trials
C1 C2 C3 C4
S T Factor Covar
19 20 1 12
19 20 2 12
15 20 1 24
16 20 2 24
13 20 1 50
12 20 2 50
9 20 1 125
11 20 2 125
1 20 1 200
2 20 2 200
Successes and Failures
C1 C2 C3 C4
S F Factor Covar
19 1 1 12
19 1 2 12
15 5 1 24
16 4 2 24
13 7 1 50
12 8 2 50
9 11 1 125
11 9 2 125
1 19 1 200
2 18 2 200
Failures and Trials
C1 C2 C3 C4
F T Factor Covar
1 20 1 12
1 20 2 12
5 20 1 24
4 20 2 24
7 20 1 50
8 20 2 50
11 20 1 125
9 20 2 125
19 20 1 200
18 20 2 200
Chapter 2 Binary Logistic Regression
Binary logistic regression has also been used to classify observations into one of two
categories, and it may give fewer classification errors than discriminant analysis for some
cases [10], [23].
Data
Your data must be arranged in your worksheet in one of five ways: as raw data, as
frequency data, as successes and trials, as successes and failures, or as failures and trials.
See Worksheet structure on page 2-32.
Factors, covariates, and response data can be numeric, text, or date/time. The reference
level and the reference event depend on the data type. See Reference levels for factors on
page 2-31 and Reference event for the response variable on page 2-31 for details.
The predictors may either be factors (nominal variables) or covariates (continuous
variables). Factors may be crossed or nested. Covariates may be crossed with each other
or with factors, or nested within factors.
The model can include up to 9 factors and 50 covariates. Unless you specify a predictor
in the model as a factor, the predictor is assumed to be a covariate. Model continuous
predictors as covariates and categorical predictors as factors. See How to specify the
model terms on page 2-30 for more information.
MINITAB automatically omits observations with missing values from all calculations.
h To do a binary logistic regression
1 Choose Stat Regression Binary Logistic Regression.
2 Do one of the following:
I If your data is in raw form, choose Response and enter the column containing the
response variable.
I If your data is in frequency form, choose Response and enter the column
containing the response variable. In Frequency, enter the column containing the
count or frequency variable.
I If your data is in success-trial, success-failure, or failure-trial form, choose Success
with Trial, Success with Failure, or Failure with Trial, and enter the respective
columns in the accompanying boxes.
3 In Model, enter the model terms. See How to specify the model terms on page 2-30.
Options
Binary Logistic Regression dialog box
I include categorical variables (factors) in the model
Graphics subdialog box
I plot delta Pearson
2
, delta deviance, delta based on standardized Pearson
residuals, and delta based on Pearson residuals versus:
the estimated event probability for each distinct factor/covariate pattern
the leverage for each distinct factor/covariate pattern
See Regression diagnostics and residual analysis on page 2-38.
I specify the link function: logit (the default), normit (also called probit), or gompit
(also called complementary log-log)see Link functions on page 2-46
I change the reference event of the response or the reference levels for the factors
see Interpreting the parameter estimates relative to the event and the reference levels on
page 2-39
I specify initial values for model parameters or parameter estimates for a validation
modelsee Entering initial values for parameter estimates on page 2-37
I change the maximum number of iterations for reaching convergence (the default is
20)
I change the number of groups for the Hosmer-Lemeshow goodness-of-fit test from
the default of 10see Groups for the Hosmer-Lemeshow goodness-of-fit test on page
2-38
no output.
basic information on response, parameter estimates, the log-likelihood, and the
test for all slopes being zero.
the default output, which includes the above output plus three goodness-of-fit
tests (Pearson, deviance, and Hosmer-Lemeshow), a table of observed and
expected frequencies, and measures of association.
the default, along with factor level values, and tests for terms with more than 1
degree of freedom. If you choose the logit link function, MINITAB also prints two
Brown goodness-of-fit tests.
I display the log-likelihood at each iteration of the parameter estimation process.
I store the following diagnostic measures:
Pearson, standardized Pearson, and deviance residuals
changes (delta) in: Pearson
2
, the deviance statistic, and the estimated regression
coefficients based on either standardized Pearson or Pearson residuals when the
respective factor/covariate patterns are removed
leverages, the diagonals of the hat matrix
See Regression diagnostics and residual analysis on page 2-38.
I store the following characteristics of the estimated regression equation:
predicted probabilities of success
estimated model coefficients, their standard errors, and the variance-covariance
matrix of the estimated coefficients
the log-likelihood for the last maximum likelihood iteration
I store the following aggregated data:
the number of occurrences for each factor/covariate pattern
the number of trials for each factor/covariate pattern
Link functions
MINITAB provides three link functionslogit (the default), normit (also called probit),
and gompit (also called complementary log-log)allowing you to fit a broad class of
binary response models. These are the inverse of the cumulative logistic distribution
function (logit), the inverse of the cumulative standard normal distribution function
(normit), and the inverse of the Gompertz distribution function (gompit). This class of
models is defined by:
g(
j
) = , where
The link function is the inverse of a distribution function. The link functions and their
corresponding distributions are summarized below (pi in the variance is 3.14159):
You want to choose a link function that results in a good fit to your data.
Goodness-of-fit statistics can be used to compare fits using different link functions.
Certain link functions may be used for historical reasons or because they have a special
meaning in a discipline.
An advantage of the logit link function is that it provides an estimate of the odds ratios.
For a comparison of link functions, see [19].
Entering initial values for parameter estimates
There are several scenarios for which you might enter values for parameter estimates.
For example, you may wish to give starting estimates so that the algorithm converges to
a solution, or you may wish to validate a model with an independent sample.
I ConvergenceThe maximum likelihood solution may not converge if the starting
estimates are not in the neighborhood of the true solution. If the algorithm does not
converge to a solution, you can specify what you think are good starting values for
parameter estimates in Starting estimates for algorithm in the Options subdialog
box.
I ValidationYou may also wish to validate the model with an independent sample.
Typically, this is done by splitting the data into two subsets. Use the first set to
estimate and store the coefficients. If you enter these estimates in Estimates for
validation model in the Options subdialog box, MINITAB will use these values as the
parameter estimates rather than calculating new parameter estimates. Then, you can
assess the model fit for the independent sample.
j
= the probability of a response for the j
th
factor/covariate pattern
g(
j
) = the link function (described below)
0
= the intercept
= a vector of predictor variables associated with the j
th
= a vector of unknown coefficients associated with the predictors
Name Link function Distribution Mean Variance
logit g(
j
) = log
e
(
j
/ (1
j
)) logistic 0 pi
2
/ 3
normit g(
j
) =
-1
(
j
) normal 0 1
gompit g(
j
) = log
e
(log
e
(1
j
)) Gompertz (Euler constant) pi
2
/ 6
0
x
j
+
x
j
In both cases, enter a column with the first entry being the constant estimate, and the
remaining entries corresponding to the model terms in the order in which they appear
in the Model box or the output.
Groups for the Hosmer-Lemeshow goodness-of-fit test
The Hosmer-Lemeshow statistic is the chi-square goodness-of-fit statistic from a 2 (the
number of groups) table. The default number of groups is 10. This may work for a large
number of problems but if the number of distinct factor/covariate patterns is small or
large you may wish to adjust the number of groups. Hosmer and Lemeshow suggest
using a minimum of six groups. See [16] for details.
Regression diagnostics and residual analysis
Following any modeling procedure, it is a good idea to assess the validity of your
model. Logistic regression has a collection of diagnostic plots, goodness-of-fits tests,
and other diagnostic measures to do this. These residuals and diagnostic statistics allow
you to identify factor/covariate patterns that are either poorly fit by the model, have a
strong influence upon the estimated parameters, or which have a high leverage.
MINITAB provides different options for each of these, as listed in the following table (See
Help for computational details). Hosmer and Lemeshow [16] suggest that you interpret
these diagnostics jointly to understand any potential problems with the model.
To identify Use... Which measures...
poorly fit factor/
covariate patterns
Pearson residual the difference between the actual and
predicted observation
standardized
Pearson residual
the difference between the actual and
predicted observation, but standardized to
have = 1
deviance residual deviance residuals, a component of deviance
2

delta chi-square changes in the Pearson
2
when the j
th
factor/
covariate pattern is removed
delta deviance changes in the deviance

when the j
th
factor/
covariate pattern is removed
factor/covariate
patterns with a
strong influence
on parameter
estimates
delta beta changes in the coefficients

when the j
th
factor/
covariate pattern is removedbased on
Pearson residuals
delta beta based
on standardized
Pearson residuals
changes in the coefficients when the j
th
factor/
covariate pattern is removedbased on
standardized Pearson residuals
factor/covariate
patterns with a
large leverage
leverage (Hi) leverages of the j
th
factor/covariate pattern, a
measure of how unusual predictor values are
The graphs available in the Graphs subdialog box allow you to visualize some of these
diagnostics jointly; you can plot a measure useful for identifying poorly fit factor/
covariate patterns (delta chi-square or delta deviance) or a measure useful for
identifying a factor/covariate pattern with a strong influence on parameter estimates
(one of the delta beta statistics) versus either the estimated event probability or
leverage. The estimated event probability is the probability of the event, given the data
and model. Leverages are used to assess how unusual the predictor values are (see
Identifying outliers on page 2-9). See [16] for a further discussion of diagnostic plots. You
can use MINITABs graph brushing capabilities to identify points. See the Brushing Graphs
chapter in MINITAB Users Guide 1 for more information.
Interpreting the parameter estimates relative to the event and the
reference levels
The interpretation of the parameter estimates depends on: the link function (see Link
functions on page 2-36), reference event (see Reference event for the response variable on
page 2-31), and reference factor levels (see Reference levels for factors on page 2-31). A
parameter estimate associated with a predictor (factor or covariate) represents the
change in the link function for each unit change in the predictor, while all other
predictors are held constant. A unit change in a factor refers to a comparison of a
certain level to the reference level.
The logit link provides the most natural interpretation of the parameter estimates and is
therefore the default link in MINITAB. A summary of the interpretation follows:
I The odds of a reference event is the ratio of P(event) to P(not event). The parameter
estimate of a predictor (factor or covariate) is the estimated change in the log of
P(event)/P(not event) for each unit change in the predictor, assuming the other
predictors remain constant.
I The parameter estimates can also be used to calculate the odds ratio, or the ratio
between two odds. Exponentiating the parameter estimate of a factor yields the ratio
of P(event)/P(not event) for a certain factor level compared to the reference level.
The odds ratios at different values of the covariate can be constructed relative to
zero. In the covariate case, it may be more meaningful to interpret the odds and not
the odds ratio. Note that a parameter estimate of zero or an odds ratio of one both
imply the same thingthe factor or covariate has no effect.
You can change the event or reference levels in the Options subdialog box if you wish to
change how you view the parameter estimates. To change the event, specify the new
event value in the Event box. To change the reference level for a factor, specify the
factor variable followed by the new reference level in the Reference factor level box.
You can specify reference levels for more than one factor at the same time. If the levels
are text or date/time, enclose them in double quotes.
e Example of a binary logistic regression
You are a researcher who is interested in understanding the effect of smoking and
weight upon resting pulse rate. Because you have categorized the responsepulse
rateinto low and high, a binary logistic regression analysis is appropriate to
investigate the effects of smoking and weight upon pulse rate.
2 Choose Stat Regression Binary Logistic Regression.
3 In Response, enter RestingPulse. In Model, enter Smokes Weight. In Factors
(optional), enter Smokes.
4 Click Graphs. Check Delta chi-square vs probability and Delta chi-square vs
leverage. Click OK.
5 Click Results. Choose In addition, list of factor level values, tests for terms with
more than 1 degree of freedom, and 2 additional goodness-of-fit tests. Click OK
in each dialog box.
Session
window
output

Binary Logistic Regression: RestingPulse versus Smokes, Weight

Link Function: Logit
Response Information
Variable Value Count
RestingP Low 70 (Event)
High 22
Total 92
Factor Information
Factor Levels Values
Smokes 2 No Yes
Logistic Regression Table
Odds 95% CI
Predictor Coef SE Coef Z P Ratio Lower Upper
Constant -1.987 1.679 -1.18 0.237
Smokes
Yes -1.1930 0.5530 -2.16 0.031 0.30 0.10 0.90
Weight 0.02502 0.01226 2.04 0.041 1.03 1.00 1.05
Log-Likelihood = -46.820
Test that all slopes are zero: G = 7.574, DF = 2, P-Value = 0.023
Goodness-of-Fit Tests
Method Chi-Square DF P
Pearson 40.848 47 0.724
Deviance 51.201 47 0.312
Hosmer-Lemeshow 4.745 8 0.784
Brown:
General Alternative 0.905 2 0.636
Symmetric Alternative 0.463 1 0.496
A
B
C
E
D
Graph
window
output

Table of Observed and Expected Frequencies:
(See Hosmer-Lemeshow Test for the Pearson Chi-Square Statistic)
Group
Value 1 2 3 4 5 6 7 8 9 10 Total
Low
Obs 4 6 6 8 8 6 8 12 10 2 70
Exp 4.4 6.4 6.3 6.6 6.9 7.2 8.3 12.9 9.1 1.9
High
Obs 5 4 3 1 1 3 2 3 0 0 22
Exp 4.6 3.6 2.7 2.4 2.1 1.8 1.7 2.1 0.9 0.1
Total 9 10 9 9 9 9 10 15 10 2 92
Measures of Association:
(Between the Response Variable and Predicted Probabilities)
Pairs Number Percent Summary Measures
Concordant 1045 67.9% Somers D 0.38
Discordant 461 29.9% Goodman-Kruskal Gamma 0.39
Ties 34 2.2% Kendalls Tau-a 0.14
Total 1540 100.0%
G
F
Interpretation of results
The Session window output contains the following seven parts:
a. Response Informationdisplays the number of missing observations and the number of
observations that fall into each of the two response categories. The response value that has
been designated as the reference event is the first entry under Value and labeled as the event.
In this case, the reference event is low pulse rate (see Reference event for the response variable
on page 2-31).
B Factor Informationdisplays all the factors in the model, the number of levels for each
factor, and the factor level values. The factor level that has been designated as the reference
level is first entry under Values, the subject does not smoke (see Reference levels for factors on
page 2-31).
C Logistic Regression Tableshows the estimated coefficients (parameter estimates), standard
error of the coefficients, z-values, and p-values. When you use the logit link function, you
also see the odds ratio and a 95% confidence interval for the odds ratio.
I From the output, you can see that both Smokes (z = 2.16, p = 0.031) and Weight
(z = 2.04, p = 0.041) have p-values less than 0.05, indicating that there is
sufficient evidence that the parameters are not zero using a significance level of
= 0.05.
I The coefficient of -1.193 for Smokes represents the estimated change in the log of
P(low pulse)/P(high pulse) when the subject smokes compared to when he/she
does not smoke, with the covariate Weight held constant. The coefficient of
0.0250 for Weight is the estimated change in the log of P(low pulse)/P(high pulse)
with a 1 unit (lb) increase in Weight, with the factor Smokes held constant.
I Although there is evidence that the parameter of Weight is not zero, the odds ratio
is very close to one (1.03), indicating that a one pound increase in weight
minimally effects a persons resting pulse rate. A more meaningful difference
would be found if you compared subjects with a larger weight difference (for
example, if the weight unit is 10 pounds, the odds ratio becomes 1.28, indicating
that the odds of a subject having a low pulse increases by 1.28 times with each 10
pound increase in weight).
I For Smokes, the negative coefficient of -1.193 and the odds ratio of 0.30 indicate
that subjects who smoke tend to have a higher resting pulse rate than subjects
who do not smoke. Given that subjects have the same weight, the odds ratio can
be interpreted as the odds of non-smokers in the sample having a low pulse being
30% of the odds of smokers having a low pulse.
D Next, the last Log-Likelihood from the maximum likelihood iterations is displayed along with
the statistic G. This statistic tests the null hypothesis that all the coefficients associated with
predictors equal zero versus these coefficients not all being equal to zero. In this example, G
= 7.574, with a p-value of 0.023, indicating that there is sufficient evidence that at least one of
the coefficients is different from zero, given that your accepted level is greater than 0.023.
I Note that for factors with more than 1 degree of freedom, MINITAB performs a
multiple degrees of freedom test with a null hypothesis that all the coefficients
associated with the factor are equal to 0 versus them not all being equal to 0. This
example does not have a factor with more than 1 degree of freedom.
E Goodness-of-Fit Testsdisplays Pearson, deviance, and Hosmer-Lemeshow goodness-of-fit
tests. In addition, two Brown testsgeneral alternative and symmetric alternativeare
displayed because you have chosen the logit link function and the selected option in the
Results subdialog box. The goodness-of-fit tests, with p-values ranging from 0.312 to 0.724,
indicate that there is insufficient evidence to claim that the model does not fit the data
adequately. If the p-value is less than your accepted level, the test would reject the null
hypothesis of an adequate fit.
F Table of Observed and Expected Frequenciesallows you to see how well the model fits
the data by comparing the observed and expected frequencies. There is insufficient evidence
that the model does not fit the data well, as the observed and expected frequencies are
similar. This supports the conclusions made by the Goodness of Fit Tests.
G Measures of Associationdisplay a table of the number and percentage of concordant,
discordant, and tied pairs, as well as common rank correlation statistics. These values
measure the association between the observed responses and the predicted probabilities.
I The table of concordant, discordant, and tied pairs is calculated by pairing the
observations with different response values. Here, you have 70 individuals with a
low pulse and 22 with a high pulse, resulting in 70 22 = 1540 pairs with different
response values. Based on the model, a pair is concordant if the individual with a
low pulse rate has a higher probability of having a low pulse, discordant if the
opposite is true, and tied if the probabilities are equal. In this example, 67.9% of
pairs are concordant and 29.9% are discordant. You can use these values as a
comparative measure of prediction, for example in comparing fits with different
sets of predictors or with different link functions.
I Somers D, Goodman-Kruskal Gamma, and Kendalls Tau-a are summaries of the
table of concordant and discordant pairs. These measures most likely lie between
0 and 1 where larger values indicate that the model has a better predictive ability.
In this example, the measure range from 0.14 to 0.39 which implies less than
desirable predictive ability.
PlotsIn the example, you chose two diagnostic plotsdelta Pearson
2
versus the
estimated event probability and delta Pearson
2
versus the leverage. Delta Pearson
2

for the j
th
factor/covariate pattern is the change in the Pearson
2
when all observations
with that factor/covariate pattern are omitted. These two graphs indicate that two
observations are not well fit by the model (high delta
2
). A high delta
2
can be caused
by a high leverage and/or a high Pearson residual. In this case, a high Pearson residual
caused the large delta
2
, because the leverages are less than 0.1. Hosmer and
Lemeshow indicate that delta
2
or delta deviance greater than 3.84 is large.
If you choose Editor Brush, brush these points, and then click on them, they will be
identified as data values 31 and 66. These are individuals with a high resting pulse, who
Chapter 2 Ordinal Logistic Regression
do not smoke, and who have smaller than average weights (Weight = 116, 136
pounds). You might further investigate these cases to see why the model did not fit
them well.
Ordinal Logistic Regression
Use ordinal logistic regression to perform logistic regression on an ordinal response
variable. Ordinal variables are categorical variables that have three or more possible
levels with a natural ordering, such as strongly disagree, disagree, neutral, agree, and
strongly agree. A model with one or more predictors is fit using an iterative-reweighted
least squares algorithm to obtain maximum likelihood estimates of the parameters [19].
Parallel regression lines are assumed, and therefore, a single slope is calculated for each
covariate. In situations where this assumption is not valid, nominal logistic regression,
which generates separate logit functions, is more appropriate.
Data
Your data may be arranged in one of two ways: as raw data or as frequency data. See
Worksheet structure on page 2-32.
Ordinal Logistic Regression Regression
h To do an ordinal logistic regression
1 Choose Stat Regression Ordinal Logistic Regression.
I If you have raw response data, in Response, enter the numeric column containing
the response data.
I If you have frequency data, in Response, enter the numeric column containing
the response values. Then, in Frequency, enter the variable containing the counts.
Options
Ordinal Logistic Regression dialog box
I include categorical variables (factors) in the model
I specify the link function: logit (the default), normit (also called probit), or gompit
(also called complementary log-log)see Link functions on page 2-46
I specify the order of the response values or change the reference levels for the
factorssee Interpreting the parameter estimates relative to the order of response values
and the reference levels on page 2-47
I change the maximum number of iterations for reaching convergence (default is 20)
no output
information on response, parameter estimates, the log-likelihood, and the test for
all slopes being zero
the default output, which includes the above output plus two goodness-of-fit tests
(Pearson and deviance), and measures of association
the default output plus factor level values, and tests for terms with more than one
degree of freedom
I display the log-likelihood at each iteration of the parameter estimation process
I store the following characteristics of the estimated regression equation:
matrix of the estimated coefficients.
the log-likelihood for the last maximum likelihood iteration.
I store the following event probabilities and/or aggregated data:
the number of trials for each factor/covariate pattern.
event probabilities, cumulative event probabilities, and number of occurrences for
each factor/covariate pattern. Note that the number of events (distinct response
values) must be specified to store these values.
Link functions
MINITAB provides three link functionslogit (the default), normit (also called probit),
and gompit (also called complementary log-log)allowing you to fit a broad class of
ordinal response models. These are the inverse of the cumulative logistic distribution
function (logit), the inverse of the cumulative standard normal distribution function
(normit), and the inverse of the Gompertz distribution function (gompit). This class of
models is defined by:
g(
ij
) =
where
k = the number of distinct values of the response or the number of possible events
ij
= the cumulative probability up to and including event i for the j
th
factor/
covariate pattern
g(
ij
) = the link function (described below)
= the constant associated with the i
th
event
i
x
j
+ , i=1, ..., k-1
i
The link function is the inverse of a distribution function. The link functions and their
corresponding distributions are summarized below ( in the variance is 3.14159):
You want to choose a link function that results in a good fit to your data.
Goodness-of-fit statistics can be used to compare the fits using different link functions.
Certain link functions may be used for historical reasons or because they have a special
meaning in a discipline.
An advantage of the logit link function is that it provides an estimate of the odds ratios.
The logit link function is the default. For a comparison of link functions, see [19].
Interpreting the parameter estimates relative to the order of response
values and the reference levels
The interpretation of the parameter estimates depends on: the link function (see Link
functions on page 2-36), reference event (see Reference event for the response variable on
page 2-31), and reference factor levels (see Reference levels for factors on page 2-31). A
parameter estimate associated with a predictor (factor or covariate) represents the
change in the link function for each unit change in the predictor, while all other
predictors are held constant. A unit change in a factor refers to a comparison of a
certain level to the reference level.
The logit link provides the most natural interpretation of the estimated coefficients and
is therefore the default link in MINITAB. A summary of the interpretation follows:
I The odds of a reference event is the ratio of P(event) to P(not event). The estimated
coefficient of a predictor (factor or covariate) is the estimated change in the log of
P(event)/P(not event) for each unit change in the predictor, assuming the other
predictors remain constant.
I The estimated coefficient can also be used to calculate the odds ratio, or the ratio
between two odds. Exponentiating the parameter estimate of a factor yields the ratio
of P(event)/P(not event) for a certain factor level compared to the reference level.
= a vector of predictor variables associated with the j
th
= a vector of coefficients associated with the predictors
Name Link function Distribution Mean Variance
logit g(
ij
) = log
e
(
ij
/ (1
ij
)) logistic 0 pi
2
/ 3
normit g(
ij
) =
-1
(
ij
) normal 0 1
gompit g(
ij
) = log
e
(log
e
(1
ij
)) Gompertz (Euler constant) pi
2
/ 6
x
j
the odds ratio. Note that a coefficient of zero or an odds ratio of one both imply the
same thingthe factor or covariate has no effect.
You can change the order of response values or the reference level in the Options
subdialog box if you wish to change how you view the parameter estimates. If your
responses were coded Low, Medium, and High, rather than 1, 2, 3, the default
alphabetical ordering of the responses would be improper. To change the order of
response values, specify the new order in the Order of the response values box. To
order as Low, Medium, and High, enter these values in this order, each enclosed in
double quotes. To change the reference level for a factor, specify the factor variable and
the new reference level in the Reference factor level box. You can specify reference
levels for more than one factor at the same time. If the levels are text or date/time,
enclose them in double quotes.
e Example of an ordinal logistic regression
Suppose you are a field biologist and you believe that the adult population of
salamanders in the Northeast has gotten smaller over the past few years. You would like
to determine whether any association exists between the length of time a hatched
salamander survives and level of water toxicity, as well as whether there is a regional
effect. Survival time is coded as 1 if it is less than 10 days, 2 if it is equal to 10 to 30
days, and 3 if it is equal to 31 to 60 days.
2 Choose Stat Regression Ordinal Logistic Regression.
3 In Response, enter Survival. In Model, enter Region ToxicLevel. In Factors (optional),
enter Region.
4 Click Results. Choose In addition, list of factor level values, and tests for terms
with more than 1 degree of freedom. Click OK in each dialog box.
Session
window
output

The Session window contains the following five parts:
a. Response Informationdisplays the number of observations that fall into each of the
response categories and the number of missing observations. The ordered response values,
from lowest to highest, are shown. Here, you use the default coding scheme, which orders
Ordinal Logistic Regression: Survival versus Region, ToxicLevel

Link Function: Logit
Survival 1 15
2 46
3 12
Total 73
Factor Information
Region 2 1 2
Odds 95% CI
Const(1) -7.043 1.680 -4.19 0.000
Const(2) -3.523 1.471 -2.39 0.017
Region
2 0.2015 0.4962 0.41 0.685 1.22 0.46 3.23
ToxicLev 0.12129 0.03405 3.56 0.000 1.13 1.06 1.21
Log-likelihood = -59.290
Pearson 122.799 122 0.463
Deviance 100.898 122 0.918
Measures of Association:
(Between the Response Variable and Predicted Probabilities)
Pairs Number Percent Summary Measures
Concordant 1127 79.3% Somers D 0.59
Discordant 288 20.3% Goodman-Kruskal Gamma 0.59
Ties 7 0.5% Kendalls Tau-a 0.32
Total 1422 100.0%
D
A
B
C
E
the values from lowest to highest: 1 is less than 10 days, 2 is equal to 10 to 30 days, and 3 is
equal to 31 to 60 days (see Reference event for the response variable on page 2-31).
level is the first entry under Values, region 1 (see Reference levels for factors on page 2-31).
error of the coefficients, z-values, and p-values. When you use the logit link function,
MINITAB displays the calculated odds ratio and a 95% confidence interval for the odds ratio.
I The values labeled Const(1) and Const(2) are estimated intercepts for the logits of
the cumulative probabilities of survival for less than 10 days, and for 10 to 30 days,
respectively. Because the cumulative probability for the last response value is 1,
there is no need to estimate an intercept for 31 to 60 days.
I The coefficient of 0.2015 for Region is the estimated change in the logit of the
cumulative survival time probability when the region is 2 compared to region
being 1, with the covariate ToxicLevel held constant. Because the p-value for this
parameter estimate is 0.685, there is insufficient evidence to conclude that Region
has an effect upon survival time.
I There is one parameter estimated for each covariate, which gives parallel lines for
the factor levels. Here, the estimated coefficient for the single covariate,
ToxicLevel, is 0.121, with a p-value of less than 0.0005. The p-value indicates that
there is sufficient evidence to conclude that the toxic level affects survival. The
positive coefficient, and an odds ratio that is greater than one, indicates that
higher toxic levels tend to be associated with lower values of survival.
I Next, MINITAB displays the last Log-Likelihood from the maximum likelihood
iterations along with the statistic G. This statistics tests the null hypothesis that all
the coefficients associated with predictors equal to 0 versus them not all being
equal to 0. In this example, G = 14.713 with a p-value of 0.001, indicating that
there is sufficient evidence to conclude that at least one of the coefficients is
different from zero.
D Goodness-of-Fit Testsdisplays both Pearson and deviance goodness-of-fit tests. In our
example, the p-value for the Pearson test is 0.463, and the p-value for the deviance test is
0.918, indicating that there is insufficient evidence to claim that the model does not fit the
data adequately. If the p-value is less than your selected level, the tests rejects the null
hypothesis of an adequate fit.
E Measures of Associationdisplays a table of the number and percentage of concordant,
discordant, and tied pairs, as well as common rank correlation statistics. These values
measure the association between the observed responses and the predicted probabilities.
I The table of concordant, discordant, and tied pairs is calculated by pairing the
observations with different response values. Here, you have fifteen 1s, fourty-six
2s, and twelve 3s, resulting in (15 46) + (15 12) + (46 12) = 1422 pairs of
different response values. Pairs involving the lowest coded response value (the 1-2
Nominal Logistic Regression Regression
and 1-3 value pairs in the example) are concordant if the cumulative probability
up to the lowest response value (here 1) is greater for the observation with the
lowest value. This pattern applies to other value pairs. Pairs involving responses
coded as 2 and 3 in this example are concordant if the cumulative probability up
to 2 is greater for the observation coded as 2. The pair is discordant if the opposite
is true and tied if the cumulative probabilities are equal. In this example, 79.3% of
pairs are concordant, 20.3% are discordant, and 0.5% are ties. You can use these
values as a comparative measure of prediction (for example, when evaluating
predictors and different link functions).
I Somers D, Goodman-Kruskal Gamma, and Kendalls Tau-a are summaries of the
table of concordant and discordant pairs. The numbers have the same numerator
(the number of concordant pairs minus the number of discordant pairs). The
denominators are the total number of pairs with Somers D, the total number of
pairs excepting ties with Goodman-Kruskal Gamma, and the number of all
possible observation pairs for Kendalls Tau-a. These measures most likely lie
between 0 and 1, where larger values indicate that the model has a better
predictive ability.
Nominal Logistic Regression
Use nominal logistic regression to perform logistic regression on a nominal response
variable using an iterative-reweighted least squares algorithm to obtain maximum
likelihood estimates of the parameters [19]. Nominal variables are categorical variables
that have three or more possible levels with no natural ordering. For example, the levels
in a food tasting study may include crunchy, mushy, and crispy.
Data
Your data may be arranged in one of two ways: as raw data or as frequency data. See
Worksheet structure on page 2-32.
Chapter 2 Nominal Logistic Regression
h To do a nominal logistic regression
1 Choose Stat Regression Nominal Logistic Regression.
I If you have raw response data, in Response, enter the numeric column containing
the response data.
I If you have frequency data, in Response, enter the numeric column containing
the response values. Then, in Frequency, enter the variable containing the counts.
Options
Nominal Logistic Regression dialog box
I define the factors (categorical variables) in the model
I change the reference event of the response or the reference levels for the factors
see Interpreting the parameter estimates relative to the reference event and reference
levels on page 2-54
I change the maximum number of iterations for reaching convergence from the
default of 20
no output
basic information on response, parameter estimates, the log-likelihood, and the
test for all slopes being zero
the default output, which includes the above output plus two goodness-of-fit tests
(Pearson and deviance)
the default output, plus factor level values and tests for terms with more than one
degree of freedom
I display the log-likelihood at each iteration of the parameter estimation process
I store characteristics of the estimated regression equation:
matrix of the estimated coefficients.
the log-likelihood for the last maximum likelihood iteration.
I store event probabilities and/or aggregated data:
the number of trials for each factor/covariate pattern.
event probabilities and number of occurrences for each factor/covariate pattern.
Note that the number of events (distinct response values) must be specified to
store these values.
Nominal logistic regression model
The model fit by nominal logistic regression is defined by:
, where
See [16] for additional discussion.
Interpreting the parameter estimates relative to the reference event
and reference levels
The interpretation of the parameter estimates depends upon the designated reference
event (see Reference event for the response variable on page 2-31) and reference factor
levels (see Reference levels for factors on page 2-31). A parameter estimate associated
with a predictor represents the change in the particular logit for each unit change in the
predictor, assuming that all other factors and covariates are held constant. A one unit
change in a factor refers to a comparison of a certain level to the reference level.
If there are k distinct response values, MINITAB estimates k1 sets of parameter estimates.
These are the estimated differences in log odds or logits of levels of the response
variable relative to the reference event. Each set contains a constant and coefficients for
the factors and the covariates. Note that these sets of parameter estimates give
nonparallel lines for the response value. The interpretation of the parameter estimates is
as follows:
I The coefficient of a predictor (factor or covariate) is the estimated change in the log
of P(response level)/P(reference event) for each unit change in the predictor,
assuming the other predictors remain constant.
I The coefficient can also be used to calculate the odds ratio, or the ratio between two
odds. Exponentiating the parameter estimate of a factor yields the ratio of P(response
level)/P(reference event) for a certain factor level compared to the reference level.
k = the number of distinct values of the response or the number of
possible events
ij
= the probability of the i
th
event for the j
th
(
1j
is the probability of the reference event for the j
th
factor/covariate
pattern)
b
i0
= the intercept for the (i1)
st
logit function
= a vector of predictor variables for the j
th
b
i
= a vector of unknown coefficients associated with the predictors for the
(i1)
st
logit function
log
e
i j
1j
-------

i 0
x
j
+
i
, i=2, ..., k =
x
j
the odds ratio. Note that a coefficient of zero or an odds ratio of one both imply the
same thingthe factor or covariate has no effect.
You can change the reference event or reference levels in the Options subdialog box if
you wish to change how you view the parameter estimates. To change the event,
specify the new event value in the Reference event box. To change the reference
event, give the new level. To change the reference level for a factor, specify the factor
variable followed by the new reference level in the Reference factor level box. You can
specify reference levels for more than one factor at the same time. If the levels are text
or date/time, enclose them in double quotes.
e Example of a nominal logistic regression
Suppose you are a grade school curriculum director interested in what children identify
as their favorite subject and how this subject is associated with their age or the teaching
method employed. Thirty children, 10 to 13 years old, had classroom instruction in
science, math, and language arts that employed either lecture or discussion techniques.
At the end of the school year, they were asked to identify their favorite subject. You use
nominal logistic regression because the response is categorical but possesses no implicit
categorical ordering.
2 Choose Stat Regression Nominal Logistic Regression.
3 In Response, enter Subject. In Model, enter TeachingMethod Age. In Factors
(optional), enter TeachingMethod.
4 Click Results. Choose In addition, list of factor level values, tests for terms with
more than 1 degree of freedom. Click OK in each dialog box.
Session
window
output

The Session window output contains the following five parts:
a. Response Informationdisplays the number of observations that fall into each of the
response categories (science, math, and language arts), as well as the number of missing
observations. The response value that has been designated as the reference event is the first
entry under Value and labeled as the reference event. Here, the default coding scheme
defines the reference event as science using reverse alphabetical order.
Nominal Logistic Regression: Subject versus TeachingMethod, Age

Subject science 10 (Reference Event)
math 11
arts 9
Total 30
Factor Information
Teaching 2 discuss lecture
Odds 95% CI
Logit 1: (math/science)
Constant -1.123 4.564 -0.25 0.806
Teaching
lecture -0.5631 0.9376 -0.60 0.548 0.57 0.09 3.58
Age 0.1247 0.4011 0.31 0.756 1.13 0.52 2.49
Logit 2: (arts/science)
Constant -13.848 7.243 -1.91 0.056
Teaching
lecture 2.770 1.372 2.02 0.044 15.96 1.08 234.91
Age 1.0135 0.5845 1.73 0.083 2.76 0.88 8.66
Log-likelihood = -26.446
Pearson 6.953 10 0.730
Deviance 7.886 10 0.640
A
B
C
D
E
level is the first entry under Values. Here, the default coding scheme defines the reference
level as discussion using alphabetical order.
error of the coefficients, z-values, and p-values. MINITAB also displays the odds ratio and a
95% confidence interval for the odds ratio. The coefficient associated with a predictor is the
estimated change in the logit

with a one unit change in the predictor, assuming that all other
factors and covariates are the same.
I If there are k distinct response values, MINITAB estimates k1 sets of parameter
estimates, here labeled as Logit(1) and Logit(2). These are the estimated
differences in log odds or logits of math and language arts, respectively, compared
to science as the reference event. Each set contains a constant and coefficients for
the factors (here, teaching method) and the covariates (here, age). The
TeachingMethod coefficient is the estimated change in the logit when
TeachingMethod is lecture compared to the teaching method being discussion,
with Age held constant. The Age coefficient is the estimated change in the logit
with a one year increase in age with teaching method held constant. These sets of
parameter estimates give nonparallel lines for the response values.
I The first set of estimated logits, labeled Logit(1), are the parameter estimates of
the change in logits of math relative to the reference event, science. The p-values
of 0.548 and 0.756 for TeachingMethod and Age, respectively, indicate that there
is insufficient evidence to conclude that a change in teaching method from
discussion to lecture or a change in age affected the choice of math as favorite
subject compared to science.
I The second set of estimated logits, labeled Logit(2), are the parameter estimates of
the change in logits of language arts relative to the reference event, science. The
p-values of 0.044 and 0.083 for TeachingMethod and Age, respectively, indicate
that there is sufficient evidence, if the p-values are less than your acceptable
level, to conclude that a change in teaching method from discussion to lecture or
a change in age affected the choice of language arts as the favorite subject
compared to science. The positive coefficient for teaching method indicates
students given a lecture style of teaching tend to prefer language arts over science
compared to students given a discussion style of teaching. The estimated odds
ratio of 15.96 implies that the odds of choosing language arts over science is
about 16 times higher for these students when the teaching method changes
from discussion to lecture. The positive coefficient associated with age indicates
that students tend to like language arts over science as they become older.
D Next, MINITAB displays the last Log-Likelihood from the maximum likelihood iterations
along with the statistic G. G is the difference in 2 log-likelihood for a model that only has
the constant terms and the fitted model shown in the Logistic Regression Table. G is the test
statistic for testing the null hypothesis that all the coefficients associated with predictors being
equal to 0 versus them not all being equal to 0. G = 12.825 with a p-value of 0.012, which
indicates that at = 0.05 there is sufficient evidence for at least one coefficient being
different from 0.
Chapter 2 References
E Goodness-of-Fit Testsdisplays Pearson and deviance goodness-of-fit tests. The p-value for
the Pearson test is 0.730 and the p-value for the deviance test is 0.640, indicating that there is
insufficient evidence for the model not fitting the data adequately. If the p-value is less than
your selected level, the test would indicate sufficient evidence for an inadequate fit.
References
[1] A. Agresti (1984). Analysis of Ordinal Categorical Data, John Wiley & Sons, Inc.
[2] A. Agresti (1990). Categorical Data Analysis, John Wiley & Sons, Inc.
[3] D.A. Belsley, E. Kuh, and R.E. Welsch (1980). Regression Diagnostics, John Wiley &
Sons, Inc.
[4] A. Bhargava (1989). Missing Observations and the Use of the Durbin-Watson
Statistic, Biometrika 76, 4, pp.828831.
[5] C.C. Brown (1982). On a Goodness of fit Test for the Logistic Model Based on
Score Statistics, Communications in Statistics, 11, pp.10871105.
[6] D.A. Burn and T.A. Ryan, Jr. (1983). A Diagnostic Test for Lack of Fit in Regression
Models, ASA 1983 Proceedings of the Statistical Computing Section, pp.286290.
[7] R.D. Cook (1977). Detection of Influential Observations in Linear Regression,
Technometrics 19, pp.1518.
[8] R.D. Cook and S. Weisberg (1982). Residuals and Influence in Regression, Chapman
and Hall.
[9] N.R. Draper and H. Smith (1981). Applied Regression Analysis, Second Edition, John
Wiley & Sons, Inc.
[10] S.E. Fienberg (1987). The Analysis of Cross-Classified Categorical Data. The MIT
Press.
[11] M.J. Garside (1971). Some Computational Procedures for the Best Subset
Problem, Applied Statistics 20, pp.815.
[12] James H. Goodnight (1979). A Tutorial on the Sweep Operator, The American
Statistician 33, pp.149158.
[13] W.W. Hauck and A. Donner (1977). Walds test as applied to hypotheses in logit
analysis. Journal of the American Statistical Association 72, 851-853.
[14] D.C. Hoaglin and R.E. Welsch (1978). The Hat Matrix in Regression and ANOVA,
The American Statistician 32, pp.1722, and Corrigenda 32, p.146.
[15] R.R. Hocking (1976). A Biometrics Invited Paper: The Analysis and Selection of
Variables in Linear Regression, Biometrics 32, pp.149.
References Regression
[16] D.W. Hosmer and S. Lemeshow (1989). Applied Logistic Regression, John Wiley &
Sons, Inc.
[17] LINPACK (1979). Linpack Users Guide by J.J. Dongarra, J.R. Bunch, C.B. Moler, and
G.W. Stewart, Society for Industrial and Applied Mathematics, Philadelphia, PA.
[18] J.H. Maindonald (1984). Statistical Computation, John Wiley & Sons, Inc.
[19] P. McCullagh and J.A. Nelder (1992). Generalized Linear Models, Chapman & Hall.
[20] W. Miller (1978). Performing Armchair Roundoff Analysis of Statistical
Algorithms, Communications in Statistics, pp.243255.
[21] D.C. Montgomery and E.A. Peck (1982). Introduction to Linear Regression Analysis.
John Wiley & Sons.
[22] J. Neter, W. Wasserman, and M. Kutner (1985). Applied Linear Statistical Models,
Richard D. Irwin, Inc.
[23] S.J. Press and S. Wilson (1978). Choosing Between Logistic Regression and
Discriminant Analysis, Journal of the American Statistical Association 73, 699-705.
[24] M. Schatzoff, R. Tsao, and S. Fienberg (1968). Efficient Calculation of All Possible
Regressions, Technometrics 10, pp.769779.
[25] G.W. Stewart (1973). Introduction to Matrix Computations, Academic Press.
[26] R.A. Thisted (1988). Elements of Statistical Computing: Numerical Computation,
Chapman & Hall.
[27] P. Velleman and R. Welsch (1981). Efficient Computation of Regression
Diagnostics, The American Statistician 35, pp.234242.
[28] P.F. Velleman, J. Seaman, and I.E. Allen (1977). Evaluating Package Regression
Routines, ASA 1977 Proceedings of the Statistical Computing Section.
[29] S. Weisberg (1980). Applied Linear Regression, John Wiley & Sons, Inc.
Acknowledgments
We are very grateful for help in the design of the regression algorithm from W. Miller
and from G.W. Stewart and for useful suggestions from. P.F. Velleman and S. Weisberg
and many others.
Chapter 2 References

Regression

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Regression

Uploaded by

Copyright:

Available Formats

MINITAB Users Guide 2 2-1

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

STREGRSN.MK5 Page 4 Friday, December 17, 1999 12:23 PM

STREGRSN.MK5 Page 12 Friday, December 17, 1999 12:23 PM

You might also like