Logistic Regression and Discriminant Analysis

Associate Professor Prapon Sahapattana, Ph.D.
GSPA, NIDA
Topics covered
Logistic Regression
Understanding Logistic Regression
Assumptions
The Logistic Model
The Way to Estimate the Parameters of a Logistic Regression Equation
Example of Analysis
How Does the Equation Predict the DV?
Maximum Likelihood Estimation
More on Testing the Goodness of Fit of the Model
Charts and examples in these slides come from Charles M. Friel, Ph.D., Criminal Justice Center, Sam Houston State University.
Topics covered
Discriminant Analysis
The Discriminant Analysis Technique and Its Assumptions
The Discriminant Analysis Model
How Are the Parameters of the Model Estimated?
How to Predict the Group in the DV
Testing the Significance of the IVs
Interpretation of the Coefficients
Comparing the Relative Impact of the Different IVs on the DV
Goodness of Fit of the Model
Dependent Variable with Three Groups
Logistic Regression
A dependency technique
The dependent variable (Y) is binary
The independent variables (Xk) can be metric and/or nonmetric
A technique used to predict a binary dependent variable from one or more metric and/or nonmetric independent variables
Assumptions
More robust and requires fewer assumptions than multiple regression and discriminant analysis
Linearity and multivariate normality among the Xk are not required, but they increase statistical power
Requires a substantial number of cases relative to the number of Xk, particularly nonmetric Xk (the suggested ratio is 30 to 1 or more)
No multicollinearity
No outliers
The Logistic Model
$$\text{Prob(event)} = \frac{1}{1 + e^{-Z}}$$
Prob(event) = the probability that a case is a member of the category of the dependent variable that is coded 1
$$e^{-Z} = \frac{1}{e^{Z}}$$
$$Z = a + b_1X_1 + b_2X_2 + \dots + b_kX_k$$
Z is called a logit or log odds:
$$Z = \ln\left[\frac{\text{Prob(event)}}{\text{Prob(no event)}}\right] = \ln\left[\frac{P}{1 - P}\right]$$
e is a constant, the base of the natural logarithm (2.71828...)
$$e^{Z} = e^{a + b_1X_1 + b_2X_2 + \dots + b_kX_k}$$
$$e^{Z} = \frac{P}{1 - P} = e^{a}\, e^{b_1X_1}\, e^{b_2X_2} \cdots e^{b_kX_k}$$

When Xk increases by one unit, the log odds ln[P / (1 - P)] change by bk units.
When Xk increases by one unit, the odds P / (1 - P) are multiplied by $e^{b_k}$.
If bk is positive (+), $e^{b_k}$ will be greater than 1, and the odds of the event will increase.
If bk is negative (-), $e^{b_k}$ will be less than 1, and the odds of the event will decrease.
If bk is zero (0.0), $e^{b_k}$ will equal 1, and the odds of the event will remain unchanged.
Questions Answered by Logistic Regression
To what extent does each predictor variable contribute to the probability of a case being in one group or the other of the dependent variable?
How well does the model predict or explain group membership in the binary dependent variable?
What is the probability that a particular case is in one group or the other of the dependent variable?
The Way to Estimate the Parameters of a Logistic Regression Equation
Maximum likelihood estimation is used to estimate the parameters a, b1, b2, ..., bk
An iterative algorithm that attempts to estimate the population parameters that most likely produced the data
The process begins with starting values for the estimated parameters, then iteratively changes these values until the best-fitting model is identified
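The iterative search described above can be sketched with Newton-Raphson updates, one common way to maximize the logistic log likelihood. This is a minimal illustration on simulated data, not the exact procedure of any particular statistics package; the function name `fit_logistic_mle` and the simulated coefficients are hypothetical.

```python
import numpy as np

def fit_logistic_mle(X, y, max_iter=25, tol=1e-8):
    """Maximum likelihood fit of a logistic regression by Newton-Raphson.

    X: (n, k) array of predictors (no intercept column); y: (n,) array of 0/1.
    Returns the estimates [a, b1, ..., bk] and the final -2LL.
    """
    X1 = np.column_stack([np.ones(len(y)), X])  # leading column of 1s for the constant a
    beta = np.zeros(X1.shape[1])                 # starting values: every parameter = 0
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))    # current Prob(event) for each case
        gradient = X1.T @ (y - p)                # slope of the log likelihood
        info = X1.T @ (X1 * (p * (1 - p))[:, None])  # information matrix (negative Hessian)
        step = np.linalg.solve(info, gradient)
        beta = beta + step                       # move to a better-fitting set of values
        if np.max(np.abs(step)) < tol:           # stop once the estimates settle
            break
    p = 1.0 / (1.0 + np.exp(-X1 @ beta))
    log_lik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return beta, -2.0 * log_lik

# Simulated data (hypothetical): 200 cases, 2 predictors
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_z = 0.5 + 1.0 * X[:, 0] - 0.8 * X[:, 1]
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-true_z))).astype(float)

beta, m2ll = fit_logistic_mle(X, y)
print(beta, m2ll)
```

At convergence the gradient of the log likelihood is essentially zero, and the final -2LL is lower than that of an intercept-only model.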
Example of Analysis
A court administrator wishes to determine the type of counsel that will be required among probationers whose probation has been revoked due to arrest on a new felony.
Dependent variable (binary)
Type of counsel: 0 = court-appointed counsel, 1 = retained counsel
Independent variables
Length of previous probated sentence (sentence)
Number of prior convictions (pr_conv)
Time to disposition on the previous case (tm_disp)
Pretrial jail time on the previous case (jail_tm)
Example of Analysis
$$\text{Prob(event)} = \frac{1}{1 + e^{-Z}}$$
$$Z = a + b_1(\text{sentence}) + b_2(\text{pr\_conv}) + b_3(\text{tm\_disp}) + b_4(\text{jail\_tm}) = \ln\left[\frac{P}{1 - P}\right]$$
Prob(event) = the probability that a case is a member of the category of the dependent variable that is coded 1 (retained counsel)
Data for Analysis
Logistic regression will predict type of counsel (0 = court appointed, 1 = retained; N0 = N1 = 35).
Analysis Results
$$\text{Prob(event)} = \frac{1}{1 + e^{-Z}}$$
$$Z = a + b_1(\text{sentence}) + b_2(\text{pr\_conv}) + b_3(\text{tm\_disp}) + b_4(\text{jail\_tm})$$
$$Z = 3.956 - 0.31(\text{sentence}) - 0.338(\text{pr\_conv}) - 0.009(\text{tm\_disp}) - 0.025(\text{jail\_tm})$$

How Does the Equation Predict the DV?
If sentence = 8, pr_conv = 1, tm_disp = 34, and jail_tm = 4 days, what is the probability that DV = 1?
$$Z = 3.956 - 0.31(8) - 0.338(1) - 0.009(34) - 0.025(4) = 0.732$$
$$\text{Prob(event)} = \frac{1}{1 + e^{-0.732}} = \frac{1}{1.4809} = 0.6752$$
The event being predicted is the category coded 1 in the dependent variable.
Prob(event) = 0.6752 > Prob(no event) = 0.3248
Thus this case is predicted to have retained counsel.
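The arithmetic above can be checked with a few lines of Python. The coefficients and case values are the slide's; the 0.5 cutoff follows the rule that the case is assigned to whichever category has the larger probability.

```python
import math

# Fitted logistic coefficients from the counsel example (event = retained counsel)
coef = {"const": 3.956, "sentence": -0.31, "pr_conv": -0.338,
        "tm_disp": -0.009, "jail_tm": -0.025}
case = {"sentence": 8, "pr_conv": 1, "tm_disp": 34, "jail_tm": 4}

z = coef["const"] + sum(coef[name] * value for name, value in case.items())
prob_event = 1.0 / (1.0 + math.exp(-z))        # Prob(event) = 1 / (1 + e^-Z)
predicted_group = 1 if prob_event > 0.5 else 0   # larger probability wins

print(round(z, 3), round(prob_event, 4), predicted_group)  # 0.732 0.6752 1
```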
Maximum Likelihood Estimation
Begins by setting starting values for a, b1, b2, ..., bk, and then iteratively changes these values
Attempts at each iteration to improve the goodness of fit of the model to the data
The criterion used to determine whether the iterative changes improved the goodness of fit of the model is the log likelihood (LL).
Likelihood for the perfect model = 1, so LL = 0 (-2LL = 0 for the best fit; the larger -2LL is, the poorer the fit)
The model started with -2LL = 67.02 and tried to reduce the value of -2LL.
The model terminated with -2LL = 54.11.
The statistic used to test the improvement (reduction) in -2LL is chi-square.
R² of Regression and -2LL of Logistic Regression
Both are used to measure the goodness of fit of the model
R² in linear regression ranges from 0 (no relationship) to 1 (perfect fit)
-2LL ranges from 0 (best fit) upward; higher values indicate a poorer fit
How to Determine the Significance of Each IV?
Determined by the Wald statistic
$$\text{Wald} = \left(\frac{b_k}{SE_{b_k}}\right)^2$$
bk = logistic coefficient
SEbk = standard error of the logistic coefficient
Null hypothesis: βk = 0.0 in the population
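A minimal sketch of the Wald test, assuming SciPy is available. The handout does not report standard errors, so the coefficient and standard error below are hypothetical values chosen only to show the calculation. The p-value is taken from a chi-square distribution with 1 df, the usual reference distribution for a single coefficient's Wald statistic.

```python
from scipy.stats import chi2

def wald_test(b, se):
    """Wald statistic (b / SE)^2 and its p-value on 1 df."""
    wald = (b / se) ** 2
    p_value = chi2.sf(wald, df=1)   # upper-tail chi-square probability
    return wald, p_value

# Hypothetical coefficient and standard error (not from the handout)
wald, p = wald_test(b=-0.5, se=0.25)
print(wald, round(p, 4))   # 4.0 0.0455
```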
Expected Change in the Odds (Odds Ratio)
Exp(bk) = the odds ratio, i.e., the expected multiplicative change in the odds of the event
Also used to interpret a coefficient (bk)
$$\text{Odds} = \frac{\text{Prob(event)}}{\text{Prob(no event)}}$$
When Xk changes by 1 unit, the odds are multiplied by Exp(bk).
If bk is (+), Exp(bk) will be greater than 1; Xk increases the odds of the event (1).
If bk is (-), Exp(bk) will be less than 1; Xk decreases the odds of the event (1).
If bk is 0, Exp(bk) will be equal to 1; Xk has no effect on the odds of the event and is not related to the DV.
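Applying Exp(bk) to the counsel example shows what each coefficient means on the odds scale. The sketch uses the slide's coefficients and confirms numerically that raising sentence by one unit, with the other predictors held fixed, multiplies the odds P / (1 - P) by exactly Exp(-0.31).

```python
import math

# Logistic coefficients from the counsel example (event = retained counsel)
a = 3.956
b = {"sentence": -0.31, "pr_conv": -0.338, "tm_disp": -0.009, "jail_tm": -0.025}

for name, bk in b.items():
    print(f"{name:8s} Exp(b) = {math.exp(bk):.3f}")

def odds(x):
    """Odds P / (1 - P) = e^Z for a case described by the dict x."""
    return math.exp(a + sum(b[k] * x[k] for k in b))

base = {"sentence": 8, "pr_conv": 1, "tm_disp": 34, "jail_tm": 4}
plus_one = dict(base, sentence=9)                # same case, one more unit of sentence
print(round(odds(plus_one) / odds(base), 3))     # 0.733, i.e. Exp(-0.31)
```

For example, Exp(-0.31) ≈ 0.733, so each additional unit of sentence is associated with roughly a 27% decrease in the odds of retained counsel, other predictors held constant.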
Z and the Prob(event)
$$\text{Prob(event)} = \frac{1}{1 + e^{-Z}}$$
If Z is positive (+), Prob(event) will be greater than 0.5.
If Z is negative (-), Prob(event) will be less than 0.5.
If Z is equal to 0, Prob(event) will be equal to 0.5.
More on Testing the Goodness of Fit of the Model
Cox & Snell and Nagelkerke R² statistics
Classification table of observations and predictions
Casewise listing of the actual and predicted values of the dependent variable
Analysis of the standardized or studentized residuals
Cox & Snell and Nagelkerke R² Statistics
The Cox & Snell R² is similar to R² in linear regression.
$$R^2_{CS} = 1 - \left(\frac{L_0}{L_1}\right)^{2/N}$$
L0 = likelihood of the null model
L1 = likelihood of the final model
The Cox & Snell R²CS cannot equal 1.0, even when the model fits the data perfectly.
The Nagelkerke R²N is a modification of R²CS that can equal 1.0 if the model fits perfectly; it divides R²CS by its maximum possible value:
$$R^2_{N} = \frac{R^2_{CS}}{1 - L_0^{2/N}}$$
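Both R² statistics can be computed from the -2LL values of the null and final models, since $(L_0/L_1)^{2/N} = e^{-[(-2LL_0) - (-2LL_1)]/N}$. A minimal sketch assuming SciPy; the helper names are hypothetical. The check uses the fact that, for a 35/35 split like the counsel example, the null model has $L_0 = 0.5^{70}$, so even a perfect fit gives a Cox & Snell R² of only 0.75, while Nagelkerke rescales it to 1.0. The likelihood-ratio chi-square (the reduction in -2LL, with df equal to the number of predictors) is included because it uses the same inputs.

```python
import math
from scipy.stats import chi2

def pseudo_r2(m2ll_null, m2ll_model, n):
    """Cox & Snell and Nagelkerke R^2 from -2LL of the null and final models."""
    r2_cs = 1.0 - math.exp(-(m2ll_null - m2ll_model) / n)  # 1 - (L0/L1)^(2/N)
    r2_cs_max = 1.0 - math.exp(-m2ll_null / n)             # 1 - L0^(2/N), ceiling of R2_cs
    r2_n = r2_cs / r2_cs_max                                # Nagelkerke: rescale ceiling to 1
    return r2_cs, r2_n

def lr_chi_square(m2ll_null, m2ll_model, df):
    """Likelihood-ratio chi-square for the reduction in -2LL, with its p-value."""
    stat = m2ll_null - m2ll_model
    return stat, chi2.sf(stat, df)

n = 70
m2ll_null = -2 * n * math.log(0.5)           # intercept-only model for a 35/35 split
r2_cs, r2_n = pseudo_r2(m2ll_null, 0.0, n)   # a perfect model has -2LL = 0
print(round(r2_cs, 4), round(r2_n, 4))       # 0.75 1.0
```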
Classification Table
What percentage of the cases were predicted correctly? What percentage incorrectly?
Overall hit ratio = 82.9% correct
80.0% of court-appointed counsel cases correctly predicted
85.7% of retained counsel cases correctly predicted
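Casewise Listing of the Actual and Predicted Values of the Dependent Variable
Case 24, for example, had a predicted probability of .038 of being in group 1.
The model predicted it to be in group 0.
Residual = .962 (observed - predicted = 1 - .038), so case 24 was actually in group 1 (retained counsel) and was misclassified.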
Discriminant Analysis
Discriminant Analysis Technique
$$Z = a + W_1X_1 + W_2X_2 + \dots + W_kX_k$$
The dependent variable is nonmetric.
The independent variables can be metric and/or nonmetric.
Used to predict or explain a nonmetric dependent variable with two or more categories
Assumptions
The Xk are multivariate normally distributed
Homogeneity of the variance-covariance matrices of the Xk across groups
The Xk are independent (non-collinear)
The relationship is linear in its parameters
Absence of outliers and leverage points
Logistic Regression vs. Discriminant Analysis
Both techniques can be used with a binary DV.
Discriminant analysis can predict a DV with two or more groups.
Discriminant analysis requires more restrictive assumptions than logistic regression.
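Sums of Squares in Discriminant Analysis
$$\text{Total SS} = \sum_{i} (Z_i - \bar{Z})^2$$
$$\text{Between-Groups SS} = \sum_{j} n_j (\bar{Z}_j - \bar{Z})^2$$
$$\text{Within-Groups SS} = \sum_{j} \sum_{i} (Z_{ij} - \bar{Z}_j)^2$$
$$\text{Total SS} = \text{Between-Groups SS} + \text{Within-Groups SS}$$
i = an individual case, j = group j, nj = number of cases in group j
$Z_i$ = individual discriminant score
$\bar{Z}$ = grand mean of the discriminant scores
$\bar{Z}_j$ = mean discriminant score for group j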
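The decomposition can be verified numerically. A minimal sketch with NumPy; the eight scores and their group labels are made up purely for illustration.

```python
import numpy as np

def discriminant_ss(scores, groups):
    """Split the SS of discriminant scores Z into between- and within-group parts."""
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    grand_mean = scores.mean()
    total_ss = np.sum((scores - grand_mean) ** 2)
    between_ss = 0.0
    within_ss = 0.0
    for g in np.unique(groups):
        z_g = scores[groups == g]
        between_ss += len(z_g) * (z_g.mean() - grand_mean) ** 2  # n_j (group mean - grand mean)^2
        within_ss += np.sum((z_g - z_g.mean()) ** 2)             # spread around each group mean
    return total_ss, between_ss, within_ss

# Hypothetical discriminant scores for two groups
z = [-1.2, -0.8, -0.5, 0.1, 0.6, 0.9, 1.4, 1.7]
grp = [0, 0, 0, 0, 1, 1, 1, 1]
tss, bss, wss = discriminant_ss(z, grp)
print(round(tss, 4), round(bss + wss, 4))   # the two values agree: TSS = BSS + WSS
```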

$$Z = a + W_1X_1 + W_2X_2 + \dots + W_kX_k$$
Z = discriminant score, a number used to predict the group membership of a case
a = discriminant constant
Wk = discriminant weight or coefficient, a measure of the extent to which variable Xk discriminates among the groups of the DV
Xk = an IV, which can be metric or nonmetric
Discriminant analysis estimates a and the Wk so that the within-groups SS of Z is as small as possible relative to the total SS (equivalently, the ratio of between-groups SS to within-groups SS is as large as possible)
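For two groups, weights that maximize BSS / WSS have a closed form (Fisher's linear discriminant): W is proportional to $S_W^{-1}(\bar{X}_1 - \bar{X}_0)$, where $S_W$ is the pooled within-groups sums-of-squares-and-cross-products matrix. The sketch below uses simulated data (hypothetical) and checks that no random direction separates the groups better. It finds the weights only up to a scale factor and omits the constant a; statistical packages fix both by their own normalization.

```python
import numpy as np

def fisher_weights(X, y):
    """Two-group discriminant weights W proportional to S_w^-1 (mean_1 - mean_0)."""
    X0, X1 = X[y == 0], X[y == 1]
    d0, d1 = X0 - X0.mean(0), X1 - X1.mean(0)
    s_within = d0.T @ d0 + d1.T @ d1          # pooled within-groups SSCP matrix
    return np.linalg.solve(s_within, X1.mean(0) - X0.mean(0))

def bss_over_wss(X, y, w):
    """Ratio of between- to within-groups SS of the scores Z = X @ w."""
    z = X @ w
    grand = z.mean()
    bss = sum(np.sum(y == g) * (z[y == g].mean() - grand) ** 2 for g in (0, 1))
    wss = sum(np.sum((z[y == g] - z[y == g].mean()) ** 2) for g in (0, 1))
    return bss / wss

rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(40, 2)),
               rng.normal([1.0, 0.5], 1.0, size=(40, 2))])
y = np.repeat([0, 1], 40)

w = fisher_weights(X, y)
best_random = max(bss_over_wss(X, y, d) for d in rng.normal(size=(500, 2)))
print(bss_over_wss(X, y, w) >= best_random)   # True: no random direction does better
```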
Data for Analysis
Dependent variable
Type of sentence (type_sent) (0 = probation, 1 = prison)
Independent variables
Degree of drug dependency (dr_score)
Age at first arrest (age_firs)
Level of work skill (skl_indx)
Seriousness of the crime (ser_indx)
(N = 70)
The Discriminant Analysis Model
$$Z = a + W_1(\text{dr\_score}) + W_2(\text{age\_firs}) + W_3(\text{skl\_indx}) + W_4(\text{ser\_indx})$$
After the dependent and independent variables have been specified, the data should be tested against the model's assumptions.
Methods for selecting the IVs into the model:
Enter all
Stepwise: uses the Wilks' lambda (Λ = WSS / TSS) criterion
Homogeneity of Variance/Covariance Matrices of the Two Groups
The variances are on the diagonals, and the covariances are on the off-diagonals.
Null hypothesis: the variance/covariance matrices of the two groups are the same in the population.
Use Box's M test
Only the test of the assumption of homogeneity of the variance/covariance matrices of the IVs across groups is shown in this handout.
Box's M = 0.361, F = 0.116, p = 0.951
Thus, the null hypothesis is not rejected: there is no evidence that the variance/covariance matrices of the two groups differ in the population.
$$Z = a + W_1(\text{dr\_score}) + W_2(\text{age\_firs}) + W_3(\text{skl\_indx}) + W_4(\text{ser\_indx})$$
Canonical Discriminant Function Coefficients (unstandardized)
| Variable | Function 1 |
|---|---|
| DR_SCORE | -.235 |
| SER_INDX | .564 |
| (Constant) | -.706 |
From the table:
$$Z = -0.706 - 0.235(\text{dr\_score}) + 0.564(\text{ser\_indx})$$
Selection criterion for the IVs: stepwise
Notice that two IVs (age_firs and skl_indx) were dropped from the model.
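How to Predict the Group in the DV
If a case has dr_score = 9 and ser_indx = 1, what type of sentence would this case receive?
The actual sentence for this case was 0 (probation).
From the model:
$$Z = -0.706 - 0.235(9) + 0.564(1) = -2.257 \approx -2.26$$
Higher Z values point toward prison (group 1) and lower values toward probation (group 0); a case is assigned to the group whose centroid (mean discriminant score) is closest to its Z. This strongly negative Z places the case in group 0 (probation), matching its actual sentence.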
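The score and the nearest-centroid rule can be sketched as follows. The discriminant function is the slide's; the group centroids are not reported in this handout, so the centroid values below are HYPOTHETICAL placeholders used only to show how the assignment works.

```python
def discriminant_score(dr_score, ser_indx):
    """Unstandardized discriminant function from the stepwise model."""
    return -0.706 - 0.235 * dr_score + 0.564 * ser_indx

def nearest_centroid(z, centroids):
    """Assign the case to the group whose centroid (mean Z) is closest."""
    return min(centroids, key=lambda group: abs(z - centroids[group]))

z = discriminant_score(dr_score=9, ser_indx=1)
print(round(z, 3))   # -2.257

# HYPOTHETICAL centroids: the handout does not report the group means of Z
example_centroids = {0: -0.5, 1: 0.6}
print(nearest_centroid(z, example_centroids))   # 0 (probation)
```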
Testing the Significance of the IVs
The MANOVA sums of squares are used to calculate Wilks' lambda (Λ) for each predictor:
Use a one-way MANOVA with the grouping variable as the IV and the discriminant predictors as the DVs
Λ = WSS / TSS
Variables in the Analysis
| Step | Variable | Tolerance | Sig. of F to Remove | Wilks' Lambda |
|---|---|---|---|---|
| 1 | SER_INDX | 1.000 | .000 | |
| 2 | SER_INDX | .864 | .000 | .983 |
| | DR_SCORE | .864 | .019 | .832 |
H0: the discriminant coefficients in the population are equal to zero
Interpretation of the Coefficients
When dr_score increases by one unit, the discriminant score Z decreases by 0.235, holding the seriousness of the offense (ser_indx) constant.
The higher the drug score, the more likely the case is to be granted probation (code = 0).
When ser_indx increases by one unit, the discriminant score Z increases by 0.564, holding drug dependency (dr_score) constant.
The more serious the offense, the more likely the case is to be sent to prison (code = 1).
Comparing the Relative Impact of the Different IVs on the DV
How to compare the impact of each IV on the DV?
Compare the standardized discriminant coefficients
Compare the structure coefficients (discriminant loadings)
Comparing the Standardized Discriminant Coefficients
The discriminant coefficients can be converted to standardized coefficients (Ck):
$$Z_z = C_1 z_{X_1} + C_2 z_{X_2} + \dots + C_k z_{X_k}$$
$$C_k = W_k \sqrt{\frac{SS_{W,k}}{N - g}}$$
Wk = the unstandardized discriminant coefficient of variable k
SSW,k = SS of predictor variable k, computed within groups (deviations from each group's mean, pooled across groups), consistent with the N - g divisor
N = total sample size
g = number of DV groups
Calculating Standardized Discriminant Coefficients
Canonical Discriminant Function Coefficients (unstandardized)
| Variable | Function 1 |
|---|---|
| DR_SCORE | -.235 |
| SER_INDX | .564 |
| (Constant) | -.706 |
$$C_{\text{dr\_score}} = -0.235\sqrt{\frac{495.67}{70 - 2}} = -0.6345$$
$$C_{\text{ser\_indx}} = +0.5643\sqrt{\frac{232.857}{70 - 2}} = +1.044$$
Standardized Canonical Discriminant Function Coefficients
| Variable | Function 1 |
|---|---|
| DR_SCORE | -.625 |
| SER_INDX | 1.044 |
SER_INDX has more discriminatory impact on type of sentence than DR_SCORE because its absolute standardized coefficient (1.044) is larger than that of DR_SCORE (about 0.63).
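The two standardized coefficients can be reproduced from the slide's numbers. In this sketch the helper name `standardized_coef` is hypothetical, and it assumes the SS values on the slide are pooled within-groups sums of squares, which is what the N - g divisor implies.

```python
import math

def standardized_coef(w, ss, n, g):
    """C_k = W_k * sqrt(SS_k / (N - g)): weight times pooled within-groups SD."""
    return w * math.sqrt(ss / (n - g))

c_dr = standardized_coef(-0.235, 495.67, n=70, g=2)
c_ser = standardized_coef(0.5643, 232.857, n=70, g=2)
print(round(c_dr, 4), round(c_ser, 3))   # -0.6345 1.044
```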
Structure Coefficients
The correlation between a predictor variable and the discriminant scores produced by the discriminant function
Also called the discriminant loading
The higher the absolute value of the coefficient, the greater the discriminatory impact of the predictor variable on the DV.
The order of the IVs from highest to lowest discriminant power:
SER_INDX, DR_SCORE, SKL_INDX, and AGE_FIRS
Test the goodness of fit of the model by:
Eigenvalues (λ)
Wilks' lambda (Λ)
Classification table
Hit ratio
Maximum chance criterion
t-test of the hit ratio
Press's Q statistic
Casewise plot of the predictions
Eigenvalues (λ)
λ = BSS / WSS
The larger the value of λ, the greater the discriminatory power of the model.
When λ = 0.00 (BSS = 0.0), the model has no discriminatory power.
Eigenvalues
| Function | Eigenvalue | % of Variance | Cumulative % | Canonical Correlation |
|---|---|---|---|---|
| 1(a) | .305 | 100.0 | 100.0 | .483 |
a. First 1 canonical discriminant functions were used in the analysis.
λ = 0.305
The discriminant function explains 100% of the variance that can be explained by the IVs.
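Because TSS = BSS + WSS, the eigenvalue determines both Wilks' lambda and the canonical correlation: Λ = WSS / TSS = 1 / (1 + λ), and the squared canonical correlation is BSS / TSS = λ / (1 + λ). The short check below reproduces the .766 and .483 reported in the output from λ = 0.305.

```python
import math

eigenvalue = 0.305                                         # lambda = BSS / WSS
wilks = 1.0 / (1.0 + eigenvalue)                           # Lambda = WSS / TSS
canonical_r = math.sqrt(eigenvalue / (1.0 + eigenvalue))   # sqrt(BSS / TSS)
print(round(wilks, 3), round(canonical_r, 3))              # 0.766 0.483
```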
Testing the Significance of the Model
Use Wilks' lambda (Λ)
Wilks' Lambda
| Test of Function(s) | Wilks' Lambda | Chi-square | df | Sig. |
|---|---|---|---|---|
| 1 | .766 | 17.837 | 2 | .000 |
H0: $\bar{Z}_0 = \bar{Z}_1 = \bar{Z}$ in the population (the group mean discriminant scores are equal)
χ² = 17.837, df = 2, p ≈ 0.0001
Reject H0 and conclude that the difference in the mean discriminant scores of the two groups is unlikely to be due to sampling error.
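The chi-square in the Wilks' lambda table can be reproduced with Bartlett's approximation, $\chi^2 = -[N - 1 - (p + g)/2]\ln\Lambda$ with $p(g - 1)$ df, where p is the number of predictors in the function (here 2) and g the number of groups. A minimal sketch assuming SciPy; it uses Λ = 1 / (1 + 0.305) rather than the rounded .766 so the result matches the printed 17.837 more closely.

```python
import math
from scipy.stats import chi2

def bartlett_chi_square(wilks_lambda, n, p, g):
    """Bartlett's chi-square approximation for testing Wilks' lambda."""
    stat = -(n - 1 - (p + g) / 2) * math.log(wilks_lambda)
    df = p * (g - 1)
    return stat, df, chi2.sf(stat, df)

stat, df, p_value = bartlett_chi_square(1 / 1.305, n=70, p=2, g=2)
print(round(stat, 2), df, p_value < 0.001)   # 17.84 2 True
```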
Classification Table
How Well Does the Model Predict?
Correctly classified probationers (0) = 73.0%
Correctly classified prisoners (1) = 57.6%
Overall hit ratio = 65.7%
Maximum Chance Criterion
Does the model predict any better than chance?
Maximum chance criterion (MCC): predict that all 70 cases are in the group with the largest number of cases
In the data set, probation group n = 37; prison group n = 33
If all cases were predicted to be in probation, MCC = (37 / 70)(100) = 52.86% correct by chance
MCC = 52.86% vs. the model = 65.71%

Testing the Hit Ratio
To test whether the model hit ratio is significantly better than chance:
t-test for groups of equal size
Press's Q statistic for groups of unequal size
H0: the model hit ratio is no better than chance
For Press's Q statistic:
$$Q = \frac{[N - (n)(g)]^2}{N(g - 1)}$$
N = total number of subjects
n = number of cases correctly classified
g = number of groups
Q is chi-square distributed with df = 1
In the example, Q = [70 - (46)(2)]² / [70(2 - 1)] = 484 / 70 = 6.914, p < 0.01
Reject H0: the model hit ratio is better than chance.
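The chance comparison and Press's Q can be computed together. A minimal sketch assuming SciPy; the 27 and 19 correct classifications are back-calculated from the reported 73.0% of 37 probationers and 57.6% of 33 prisoners, and they sum to the 46 used in the Q calculation.

```python
from scipy.stats import chi2

def press_q(n_total, n_correct, n_groups):
    """Press's Q = [N - n*g]^2 / [N*(g - 1)], compared with chi-square on 1 df."""
    q = (n_total - n_correct * n_groups) ** 2 / (n_total * (n_groups - 1))
    return q, chi2.sf(q, df=1)

n_total = 70
n_correct = 27 + 19                      # probationers + prisoners correctly classified
group_sizes = {"probation": 37, "prison": 33}

mcc = max(group_sizes.values()) / n_total   # always guess the largest group
hit_ratio = n_correct / n_total
q, p_value = press_q(n_total, n_correct, n_groups=2)
print(f"MCC = {mcc:.2%}, hit ratio = {hit_ratio:.2%}, Q = {q:.3f}, p = {p_value:.4f}")
```

This gives MCC = 52.86%, a hit ratio of 65.71%, and Q ≈ 6.91 with p < 0.01.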
Casewise Plot of the Predictions
Number of Multiple Discriminant Functions
When the DV has g groups, (g - 1) functions can be extracted from the data.
DV with 2 groups = 1 function
DV with 3 groups = 2 functions
If the number of IVs (k) is less than the number of groups (g), the number of functions will be k.
2 IVs and a DV with 4 groups = 2 functions
Dependent Variable with Three Groups
Dependent variable: pre-disposition status (jail, bail, or ROR (release on recognizance))
Independent variables:
Age at first arrest (age_firs)
Age at time of arrest (age)
Degree of drug dependency (dr_score)
Number of prior arrests (pr_arrst)
Type of counsel (counsel: 0 = court appointed, 1 = retained)
Discriminant Functions
Canonical Discriminant Function Coefficients (unstandardized)
| Variable | Function 1 | Function 2 |
|---|---|---|
| AGE | -.146 | .253 |
| COUNSEL | 1.946 | 1.682 |
| (Constant) | 2.375 | -6.655 |
Since the DV has 3 groups, 2 functions were extracted.
$$Z_1 = 2.375 - 0.146(\text{age}) + 1.946(\text{counsel})$$
$$Z_2 = -6.655 + 0.253(\text{age}) + 1.682(\text{counsel})$$
The chi-square test of Wilks' lambda is used to test the significance of each function.
Only the first function was found significant (from printout not shown).
Correlation Between Each of the Predictor Variables and the Discriminant Scores Produced by the Two Functions
Structure Matrix
| Variable | Function 1 | Function 2 |
|---|---|---|
| COUNSEL | .867* | .499 |
| AGE_FIRS(a) | .291* | .067 |
| PR_ARRST(a) | .219* | .190 |
| AGE | .654 | .757* |
| DR_SCORE(a) | .109 | .195* |
Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions. Variables ordered by absolute size of correlation within function.
*. Largest absolute correlation between each variable and any discriminant function
a. This variable not used in the analysis.
The IVs counsel, age_firs, and pr_arrst load highest on the first function.
The IVs age and dr_score load highest on the second function.
Casewise plot of the cases
Hit Ratio of the Discriminant Model
Overall Hit Ratio = (44 / 70) (100) = 62.9%

Errors = (26 / 70) (100) = 37.14%
