
Logistic Regression and Discriminant Analysis


Associate Professor Prapon Sahapattana, Ph.D.
GSPA, NIDA
Topics covered
Logistic Regression
Understand Logistic Regression
Assumptions
The Logistical Model
The Way to Estimate Parameters of a Logistical
Regression Equation
Example of Analysis
How Does the Equation Predict the DV?
Maximum Likelihood Estimation
More on Testing the Goodness of Fit of the Model

Charts and examples in these slides come from Charles M. Friel, Ph.D., Criminal Justice Center, Sam Houston State University
Topics covered
Discriminant Analysis
Discriminant Analysis Technique and Its Assumptions
Discriminant Analysis Model
How Are the Parameters of the Model Estimated?
How to Predict the Group in DV?
Test the Significance of IVs
Interpretation of the Coefficients
Compare the Relative Impact of the Different IVs on the DV
Goodness of Fit of the Model
Dependent Variable with Three Groups
Logistic Regression
A dependency technique
The dependent variable (Y) is binary
The independent variables (Xk) can be metric and/or nonmetric
Technique used to predict a binary dependent
variable from one or more metric and/or nonmetric
independent variables
Assumptions
More robust and requires fewer assumptions than
multiple regression and discriminant analysis
Linearity and multivariate normality among Xk are not
necessary but they will increase power
Requires a substantial number of cases relative to
the number of Xk, particularly nonmetric Xk
(suggested number is 30 to 1 or more)
No multicollinearity
No outliers
The Logistical Model

$\text{Prob(event)} = \dfrac{1}{1 + e^{-Z}}$

Prob(event) = the probability that a case is a member of the
category of the dependent variable that is coded 1
$e^{-Z} = 1 / e^{Z}$
$Z = a + b_1X_1 + b_2X_2 + \dots + b_kX_k$
Z is called a logit, or log odds
$Z = \ln[\text{Prob(event)} / \text{Prob(no event)}] = \ln[P / (1 - P)]$
e is a constant, the base of the natural logarithms (2.71828...)

$e^{Z} = e^{a + b_1X_1 + b_2X_2 + \dots + b_kX_k}$

The Logistical Model

$e^{Z} = P / (1 - P) = e^{a}\, e^{b_1X_1}\, e^{b_2X_2} \cdots e^{b_kX_k}$

When Xk increases by one unit, the log odds ln[P / (1 - P)] change by bk units
When Xk increases by one unit, the odds P / (1 - P) are multiplied by $e^{b_k}$

If bk is positive (+), $e^{b_k}$ will be greater than 1, and the odds of the
event will increase
If bk is negative (-), $e^{b_k}$ will be less than 1, and the odds of the event
will decrease
If bk is zero (0.0), $e^{b_k}$ will equal 1, and the odds of the event will
remain unchanged
Questions Answered by Logistical Regression
To what extent does each predictor variable
contribute to the probability of a case being in one
group or the other of the dependent variable?
How well does the model predict or explain group
membership in the binary dependent variable?
What is the probability that a particular case is in one
group or the other of the dependent variable?
The Way to Estimate Parameters of a Logistical
Regression Equation
Maximum likelihood estimation is used to estimate
the parameters a, b1, b2, ..., bk
An iterative algorithm that attempts to estimate the
population parameters that most likely produced the
data
The process begins with starting values for the
estimated parameters, then iteratively changes these
values until the best-fitting model is identified
Example of analysis
A court administrator wishes to determine the type
of counsel that will be required among probationers
whose probation has been revoked because of an arrest on a new felony.
Dependent variable (binary)
Type of counsel: 0 = court appointed counsel, 1 = retained counsel
Independent variables
Length of previous probated sentence (sentence)
Number of prior convictions (pr_conv)
Time to disposition on the previous case (tm_disp)
Pretrial jail time on the previous case (jail_tm)
Example of analysis
$\text{Prob(event)} = \dfrac{1}{1 + e^{-Z}}$

Z = a + b1(sentence) + b2(pr_conv) + b3(tm_disp) + b4(jail_tm)
= ln[P / (1 - P)]

Prob(event) = the probability that a case is a member of the category of the
dependent variable that is coded 1 (retained counsel)
Data for Analysis

Logistic regression will predict type of counsel
(0 = court appointed, 1 = retained, N0 = N1 = 35).
Analysis Results

$\text{Prob(event)} = \dfrac{1}{1 + e^{-Z}}$

Z = a + b1 (sentence) + b2 (pr_conv) + b3 (tm_disp) + b4 (jail_tm)

Z = 3.956 - 0.31 (sentence) - 0.338 (pr_conv) - 0.009 (tm_disp) - 0.025 (jail_tm)


How Does the Equation Predict the DV?
If sentence = 8, pr_conv = 1, tm_disp = 34, and jail_tm = 4
days, what is the probability that the DV = 1?
Z = 3.956 - 0.31(8) - 0.338(1) - 0.009(34) - 0.025(4)
Z = 0.732

$\text{Prob(event)} = \dfrac{1}{1 + e^{-0.732}}$

Prob(event) = 1 / 1.4809 = 0.6752

The event being predicted is the category coded 1 in the
dependent variable.
Prob(event) = 0.6752 > Prob(no event) = 0.3248
Thus this case is predicted to have retained counsel.
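The worked prediction above can be reproduced with a short Python sketch. The coefficients and the example case come from the slides; the names COEF, logit, and prob_event are illustrative only, and the 0.5 cutoff is the one implied by comparing Prob(event) with Prob(no event).

```python
import math

# Coefficients from the "Analysis Results" slide.
COEF = {
    "const": 3.956,
    "sentence": -0.31,
    "pr_conv": -0.338,
    "tm_disp": -0.009,
    "jail_tm": -0.025,
}

def logit(case):
    """Z = a + b1*X1 + ... + bk*Xk for one case (dict of predictor values)."""
    return COEF["const"] + sum(COEF[name] * value for name, value in case.items())

def prob_event(z):
    """Prob(event) = 1 / (1 + e^-Z)."""
    return 1.0 / (1.0 + math.exp(-z))

case = {"sentence": 8, "pr_conv": 1, "tm_disp": 34, "jail_tm": 4}
z = logit(case)
p = prob_event(z)
predicted = 1 if p > 0.5 else 0  # 1 = retained counsel, 0 = court appointed

print(f"Z = {z:.3f}, Prob(event) = {p:.4f}, predicted group = {predicted}")
```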
Maximum Likelihood Estimation
Begins by setting starting values for a, b1, b2, ..., bk, and then
iteratively changes these values,
attempting at each iteration to improve the
goodness of fit of the model to the data
The criterion used to determine whether the iterative
changes improved the goodness of fit of the model
is the log likelihood (LL).
Maximum Likelihood Estimation

Likelihood of a perfect model = 1, so LL = 0 (-2LL = 0 for a perfect fit;
the larger the value of -2LL, the poorer the fit)
The model started with -2LL = 67.02 and tried to reduce the value of -2LL.
The model terminated with -2LL = 54.11.
The chi-square statistic is used to test whether the change in -2LL
is a significant improvement.
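To illustrate the iterative idea, here is a minimal sketch of maximum likelihood estimation for a one-predictor logistic model. It uses Newton-Raphson updates, which is one common choice and not necessarily the algorithm behind the slides' output, and the data are a hypothetical toy set, not the counsel data. It records -2LL at each iteration so the decrease can be inspected.

```python
import math

# HYPOTHETICAL toy data (not from the slides): one predictor x, binary outcome y.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 1, 0, 1]

def prob(a, b, xi):
    return 1.0 / (1.0 + math.exp(-(a + b * xi)))

def neg2ll(a, b):
    """-2 * log likelihood of the data under parameters (a, b)."""
    ll = 0.0
    for xi, yi in zip(x, y):
        p = prob(a, b, xi)
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return -2.0 * ll

def newton_step(a, b):
    """One Newton-Raphson update of (a, b)."""
    g_a = g_b = h_aa = h_ab = h_bb = 0.0
    for xi, yi in zip(x, y):
        p = prob(a, b, xi)
        w = p * (1 - p)
        g_a += yi - p                 # gradient of LL with respect to a
        g_b += (yi - p) * xi          # gradient of LL with respect to b
        h_aa += w                     # entries of the information matrix
        h_ab += w * xi
        h_bb += w * xi * xi
    det = h_aa * h_bb - h_ab ** 2
    da = (h_bb * g_a - h_ab * g_b) / det
    db = (h_aa * g_b - h_ab * g_a) / det
    return a + da, b + db

a, b = 0.0, 0.0                       # starting values
history = [neg2ll(a, b)]
for _ in range(25):
    a, b = newton_step(a, b)
    history.append(neg2ll(a, b))
    if history[-2] - history[-1] < 1e-8:   # stop when -2LL no longer improves
        break

for step, value in enumerate(history):
    print(f"iteration {step}: -2LL = {value:.4f}")
print(f"a = {a:.4f}, b = {b:.4f}")
```

With starting values of zero, the first -2LL equals 2N ln 2, the value for a model that assigns every case a probability of 0.5.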
R2 of Regression and -2LL of Logistical Regression
Both are used to measure the goodness of fit of the
model
R2 in linear regression ranges from 0 (no relationship)
to 1 (perfect relationship)
-2LL ranges from 0 (perfect fit) to larger values (poorer fit)
How to Determine the Significance of Each IV?

Determined by a Wald statistic

$\text{Wald} = (b_k / SE_{b_k})^2$

bk = logistical coefficient
SEbk = standard error of the logistical coefficient
Null hypothesis: $\beta_k$ in the population = 0.0
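A small sketch of the Wald test follows. The coefficient for pr_conv is taken from the slides, but the standard error is a hypothetical value chosen only for illustration, since the slides do not report standard errors. The p-value uses the fact that the Wald statistic is referred to a chi-square distribution with 1 degree of freedom.

```python
import math

def wald_test(b, se):
    """Return (Wald statistic, p-value) for H0: coefficient = 0.

    For a chi-square variable with df = 1, P(X > w) = erfc(sqrt(w / 2)).
    """
    wald = (b / se) ** 2
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value

b_pr_conv = -0.338   # coefficient from the slides
se_pr_conv = 0.15    # HYPOTHETICAL standard error, not from the slides

wald, p_value = wald_test(b_pr_conv, se_pr_conv)
print(f"Wald = {wald:.3f}, p = {p_value:.4f}")
```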
Expected Change in the Odds Ratio

Exp(bk) = expected change in the odds (the odds ratio for a one-unit
increase in Xk)
Also used to interpret a coefficient (bk)

Odds = Prob(event) / Prob(no event)

When Xk changes by 1 unit, the odds are multiplied by Exp(bk).

If bk is (+), Exp(bk) will be greater than 1; Xk increases the odds of
event (1).
If bk is (-), Exp(bk) will be less than 1; Xk decreases the odds of
event (1).
If bk is (0), Exp(bk) will be equal to 1; Xk has no effect on the odds
of the event and is not related to the DV.
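As an illustration, Exp(bk) can be computed for the four coefficients reported in the counsel example. This is a sketch; the dictionary name coefs is illustrative only.

```python
import math

# Logistic coefficients from the "Analysis Results" slide.
coefs = {
    "sentence": -0.31,
    "pr_conv": -0.338,
    "tm_disp": -0.009,
    "jail_tm": -0.025,
}

# Exp(b): factor by which the odds of retained counsel change
# when the predictor increases by one unit.
odds_ratios = {name: math.exp(b) for name, b in coefs.items()}

for name, ratio in odds_ratios.items():
    direction = "decreases" if ratio < 1 else "increases"
    print(f"{name}: Exp(b) = {ratio:.4f} ({direction} the odds)")
```

All four coefficients are negative, so every Exp(b) is below 1 and each predictor lowers the odds of retained counsel.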
Z and the Prob(event)

$\text{Prob(event)} = \dfrac{1}{1 + e^{-Z}}$

If Z is positive (+), Prob(event) will be more than 0.5.
If Z is negative (-), Prob(event) will be less than 0.5.
If Z is equal to 0, Prob(event) will be equal to 0.5.
More on Testing the Goodness of Fit of the Model
Cox-Snell and Nagelkerke R2 statistics
Classification table of observations and predictions
Casewise listing of the actual and predicted values of
the dependent variable
Analysis of the standardized or studentized residuals
Cox & Snell and Nagelkerke R2 statistics
Cox & Snell R2 is similar to R2 in linear regression.

$R_{CS}^2 = 1 - (L_0 / L_1)^{2/N}$

L0 = likelihood of the null model
L1 = likelihood of the final model

The Cox & Snell $R_{CS}^2$ cannot equal 1.0, even if the model
fits the data perfectly.
Nagelkerke $R_N^2$ is a modification of $R_{CS}^2$ that can equal
1.0 if the model is a perfect fit.
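Both statistics can be computed from -2LL values, because $(L_0/L_1)^{2/N} = \exp[-(-2LL_0 - (-2LL_1))/N]$. The sketch below assumes that the starting value -2LL = 67.02 from the estimation slide is the null model's -2LL and that N = 70; the slides do not state this explicitly, and the results are illustrative rather than reported output. The Nagelkerke form used here divides by the maximum attainable Cox & Snell value, $1 - L_0^{2/N}$.

```python
import math

def cox_snell_r2(neg2ll_null, neg2ll_model, n):
    """R2_CS = 1 - (L0 / L1)^(2/N), written in terms of -2LL."""
    return 1.0 - math.exp(-(neg2ll_null - neg2ll_model) / n)

def nagelkerke_r2(neg2ll_null, neg2ll_model, n):
    """Cox & Snell R2 rescaled by its maximum, 1 - L0^(2/N)."""
    max_r2 = 1.0 - math.exp(-neg2ll_null / n)
    return cox_snell_r2(neg2ll_null, neg2ll_model, n) / max_r2

# ASSUMPTION: 67.02 is treated as the null model's -2LL; N = 70 cases.
r2_cs = cox_snell_r2(67.02, 54.11, 70)
r2_n = nagelkerke_r2(67.02, 54.11, 70)
print(f"Cox & Snell R2 = {r2_cs:.4f}, Nagelkerke R2 = {r2_n:.4f}")
```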
Classification Table
What percent of the cases were predicted correctly?
What percent incorrectly?

Overall hit ratio = 82.9% correct
80.0% of court appointed counsel cases correctly predicted
85.7% of retained counsel cases correctly predicted
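The hit ratios can be recomputed from cell counts. The counts below are not shown in the slides; they are reconstructed from the reported percentages and the group sizes N0 = N1 = 35 (28 of 35 is 80.0%, 30 of 35 is 85.7%), so treat them as an assumption.

```python
# Correct predictions per group, RECONSTRUCTED from the reported percentages.
table = {
    0: {"correct": 28, "total": 35},  # court appointed counsel
    1: {"correct": 30, "total": 35},  # retained counsel
}

group_hit = {
    group: 100.0 * cell["correct"] / cell["total"]
    for group, cell in table.items()
}
overall_hit = (
    100.0
    * sum(cell["correct"] for cell in table.values())
    / sum(cell["total"] for cell in table.values())
)

for group, pct in group_hit.items():
    print(f"group {group}: {pct:.1f}% correctly predicted")
print(f"overall hit ratio: {overall_hit:.1f}%")
```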
Casewise listing of the actual and predicted values
of the dependent variable

Case 24, for example, had a predicted probability of .038.
The model predicted it to be in group 0.
Residual = .962 (observed value of 1 minus the predicted probability of .038).
Discriminant Analysis
Discriminant Analysis Technique

Z = a + W1X1 + W2X2 + ... + WkXk

Dependent variable is nonmetric.


Independent variables can be metric and/or
nonmetric.
Used to predict or explain a nonmetric dependent
variable with two or more categories
Assumptions
Xk are multivariate normally distributed

Homogeneity of variance-covariance matrices of Xk
across groups

Xk are independent, non-collinear

The relationship is linear in its parameters

Absence of outliers & leverage points
Logistic Regression v Discriminant Analysis
Both techniques can be used with binary DV.
Discriminant Analysis can predict DV with 2 or more
groups.
Discriminant Analysis requires more restrictive
assumptions than logistic regression.
Sums of Squares in Discriminant Analysis
$\text{Total SS} = \sum_i (Z_i - \bar{Z})^2$
$\text{Between Groups SS} = \sum_j \sum_i (\bar{Z}_j - \bar{Z})^2$
$\text{Within Groups SS} = \sum_j \sum_i (Z_{ij} - \bar{Z}_j)^2$

Total SS = Between Groups SS + Within Groups SS

i = an individual case, j = group j
$Z_i$ = individual discriminant score
$\bar{Z}$ = grand mean of the discriminant scores
$\bar{Z}_j$ = mean discriminant score for group j
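The decomposition can be checked numerically. The discriminant scores below are hypothetical values made up for illustration; the between-groups term is summed over cases, so each group mean is counted once per case in that group.

```python
# HYPOTHETICAL discriminant scores for two groups (not from the slides).
groups = {
    0: [-1.2, -0.4, -0.9, 0.1],
    1: [0.8, 1.5, 0.3, 1.1],
}

def mean(values):
    return sum(values) / len(values)

all_scores = [z for scores in groups.values() for z in scores]
grand_mean = mean(all_scores)

total_ss = sum((z - grand_mean) ** 2 for z in all_scores)
between_ss = sum(
    len(scores) * (mean(scores) - grand_mean) ** 2 for scores in groups.values()
)
within_ss = sum(
    (z - mean(scores)) ** 2 for scores in groups.values() for z in scores
)

print(f"Total SS   = {total_ss:.4f}")
print(f"Between SS = {between_ss:.4f}")
print(f"Within SS  = {within_ss:.4f}")
print(f"Between + Within = {between_ss + within_ss:.4f}")
```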

Discriminant Analysis Model

Z = a + W1X1 + W2X2 + ... + WkXk

Z = discriminant score, a number used to predict the group
membership of a case
a = discriminant constant
Wk = discriminant weight or coefficient, a measure of the extent
to which variable Xk discriminates among the groups of the DV
Xk = an IV, which can be metric or nonmetric
Discriminant analysis uses OLS to estimate the values of
the parameters a and Wk that minimize the Within
Groups SS
Data for Analysis
Dependent Variable
Type of sentence (type_sent)
(0 = probation, 1 = prison)

Independent Variables
Degree of drug dependency (dr_score)
Age at first arrest (age_firs)
Level of work skill (skl_indx)
The seriousness of the crime (ser_indx)
(N = 70)
The Discriminant Analysis Model

Z = a + W1(dr_score) + W2(age_firs) + W3(skl_indx) + W4(ser_indx)

After the dependent and independent variables are
specified, the data should be tested
against the model's assumptions.
Methods for selecting the IVs into the model:
Enter all
Stepwise: use the Wilks' lambda ($\Lambda$ = WSS / TSS) criterion
Homogeneity of Variance/Covariance Matrices of
the Two Groups

The variances are on the diagonals, and the
covariances are on the off-diagonals.
Null hypothesis: the variance/covariance matrices of
the two groups are the same in the population.
Use Box's M test
Homogeneity of Variance/Covariance Matrices of
the Two Groups

Only the assumption of homogeneity of variance/covariance
matrices of the IVs across groups is shown in this handout.
Box's M = 0.361,
F = 0.116, p = 0.951
Thus, we fail to reject the null hypothesis that the variance/covariance
matrices of the two groups are the same in the population.
How Are the Parameters of the Model Estimated?
Z = a + W1(dr_score) + W2(age_firs) + W3(skl_indx) + W4(ser_indx)

Canonical Discriminant Function Coefficients (unstandardized)

             Function 1
DR_SCORE       -.235
SER_INDX        .564
(Constant)     -.706

From the table:
Z = -0.706 - 0.235 (dr_score) + 0.564 (ser_indx)
Selection criterion for IVs: stepwise
Notice that 2 IVs (age_firs and skl_indx) were dropped from the model.
How to Predict the Group in DV?

Z = -0.706 - 0.235 (dr_score) + 0.564 (ser_indx)

If a case has dr_score = 9 and ser_indx = 1, what
type of sentence would be predicted for it?
The actual sentence for this case is 0 (probation).
From the model:
Z = -0.706 - 0.235 (9) + 0.564 (1)
= -2.257
Since the Z value is closer to 0 than to 1, the case is predicted
to be in group 0 (probation).
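The calculation can be sketched in Python as follows. The coefficients and the example case come from the slides; the assignment rule simply mirrors the slide's "closer to 0 than to 1" comparison and is an assumption about how groups are assigned here (in practice, software usually compares Z with the group centroids).

```python
# Unstandardized discriminant coefficients from the slides.
CONSTANT = -0.706
WEIGHTS = {"dr_score": -0.235, "ser_indx": 0.564}

def discriminant_score(case):
    """Z = a + W1*X1 + ... + Wk*Xk for one case."""
    return CONSTANT + sum(WEIGHTS[name] * value for name, value in case.items())

def predict_group(z):
    """ASSUMED rule from the slide: assign the group code (0 or 1) nearest to Z."""
    return 0 if abs(z - 0) < abs(z - 1) else 1

case = {"dr_score": 9, "ser_indx": 1}
z = discriminant_score(case)
group = predict_group(z)  # 0 = probation, 1 = prison
print(f"Z = {z:.3f}, predicted group = {group}")
```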
Test the Significance of IVs
Wilks' lambda ($\Lambda$) is calculated for each predictor from the
MANOVA sums of squares
Use one-way MANOVA with the grouping variable as the
IV and the discriminant predictors as the DVs
$\Lambda$ = WSS / TSS

Variables in the Analysis

Step  Variable   Tolerance  Sig. of F to Remove  Wilks' Lambda
1     SER_INDX   1.000      .000
2     SER_INDX    .864      .000                 .983
      DR_SCORE    .864      .019                 .832

H0: the discriminant coefficients in the population are
equal to zero
Interpretation of the Coefficients
Z = -0.706 - 0.235 (dr_score) + 0.564 (ser_indx)

When dr_score increases by one unit, the discriminant
score Z decreases by 0.235,
holding the seriousness of the offence (ser_indx) constant
The higher the drug dependency score, the more likely the case will be granted
probation (code = 0)
When ser_indx increases by one unit, the discriminant
score Z increases by 0.564,
holding the drug dependency (dr_score) constant
The more serious the offence, the more likely the case will be
sent to prison (code = 1)
Compare the Relative Impact of the Different IVs on the
DV

Z = -0.706 - 0.235 (dr_score) + 0.564 (ser_indx)

How to compare the impact of each IV on the DV?
Compare the standardized discriminant coefficients
Compare the structure coefficients (discriminant loadings)
Compare the Standardized Discriminant
Coefficients
The discriminant coefficients can be converted to
standardized coefficients (Ck)
$Z_z = C_1 Z_{X_1} + C_2 Z_{X_2} + \dots + C_k Z_{X_k}$

$C_k = W_k \sqrt{\sum (X_k - \bar{X}_k)^2 / (N - g)}$
Wk = the unstandardized discriminant coefficient of variable k
$\sum (X_k - \bar{X}_k)^2$ = SS of the predictor variable
N = total sample size
g = number of DV groups
Calculating Standardized Discriminant Coefficients
Canonical Discriminant Function Coefficients (unstandardized)

             Function 1
DR_SCORE       -.235
SER_INDX        .564
(Constant)     -.706

$C_{dr\_score} = -0.235 \sqrt{495.67 / (70 - 2)} = -0.6345$
$C_{ser\_indx} = +0.5643 \sqrt{232.857 / (70 - 2)} = +1.044$

Standardized Canonical Discriminant Function Coefficients

             Function 1
DR_SCORE       -.625
SER_INDX       1.044

SER_INDX has more discriminatory impact on type of
sentence than DR_SCORE because its absolute value (1.044)
is greater than that of DR_SCORE (0.625).
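The hand calculation of the standardized coefficients can be sketched as below, using the weights and sums of squares quoted in the slides. The function name standardized_coefficient is illustrative only.

```python
import math

def standardized_coefficient(w, ss, n, g):
    """C_k = W_k * sqrt(SS_k / (N - g))."""
    return w * math.sqrt(ss / (n - g))

N, G = 70, 2  # total sample size and number of DV groups

c_dr_score = standardized_coefficient(-0.235, 495.67, N, G)
c_ser_indx = standardized_coefficient(0.5643, 232.857, N, G)

print(f"C_dr_score = {c_dr_score:.4f}")
print(f"C_ser_indx = {c_ser_indx:.4f}")

# The predictor with the larger absolute standardized coefficient
# has the greater discriminatory impact.
strongest = max(
    {"dr_score": c_dr_score, "ser_indx": c_ser_indx}.items(),
    key=lambda item: abs(item[1]),
)[0]
print(f"strongest predictor: {strongest}")
```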
Structure Coefficient
The correlation between a predictor variable and the
discriminant scores produced by the discriminant
function
Also called discriminant loading
The higher the absolute value of the coefficient, the
greater the discriminatory impact of the predictor
variable on the DV.

IVs in order of discriminatory power, from highest to lowest:
SER_INDX, DR_SCORE, SKL_INDX, and AGE_FIRS
Goodness of Fit of the Model
Test the goodness of fit of the model by:
Eigenvalues ($\lambda$)
Wilks' Lambda ($\Lambda$)
Classification Table
Hit Ratio
Maximum Chance Criterion
t-test of the Hit Ratio
Press's Q Statistic
Casewise Plot of the Predictions
Eigenvalues ($\lambda$)
$\lambda$ = BSS / WSS
The larger the value of $\lambda$, the greater the
discriminatory power of the model
When $\lambda$ = 0.00, the model has no discriminatory
power (BSS = 0.0).

Eigenvalues

Function  Eigenvalue  % of Variance  Cumulative %  Canonical Correlation
1         .305(a)     100.0          100.0         .483

a. First 1 canonical discriminant functions were used in the analysis.
Eigenvalues ($\lambda$)
$\lambda$ = 0.305
The discriminant function explains 100% of the variance
that can be explained by the IVs.
Test the Significance of the Model
Use Wilks' lambda ($\Lambda$)

Wilks' Lambda

Test of Function(s)  Wilks' Lambda  Chi-square  df  Sig.
1                    .766           17.837      2   .000

H0: $\bar{Z}_0 = \bar{Z}_1 = \bar{Z}$ in the population
$\chi^2$ = 17.837, df = 2, p = 0.0001
Reject H0 and conclude that the difference in the mean
discriminant scores of the two groups did not result from
sampling error.
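For a single discriminant function, the reported figures are tied together by standard relations that the slides do not state, so the following is a sketch under that assumption: canonical correlation = sqrt(λ / (1 + λ)), Wilks' Λ = 1 / (1 + λ), and Bartlett's approximation χ² = -[N - 1 - (p + g)/2] ln Λ with df = p(g - 1), where p is the number of predictors retained (2 here). The small gap from the reported 17.837 comes from rounding of the eigenvalue.

```python
import math

eigenvalue = 0.305   # from the Eigenvalues table
N, p, g = 70, 2, 2   # cases, predictors in the model, DV groups

canonical_corr = math.sqrt(eigenvalue / (1 + eigenvalue))
wilks_lambda = 1 / (1 + eigenvalue)

# Bartlett's chi-square approximation (ASSUMED; not shown in the slides).
chi_square = -(N - 1 - (p + g) / 2) * math.log(wilks_lambda)
df = p * (g - 1)

# For df = 2 the chi-square upper tail has the closed form exp(-x / 2).
p_value = math.exp(-chi_square / 2)

print(f"canonical correlation = {canonical_corr:.3f}")
print(f"Wilks' lambda = {wilks_lambda:.3f}")
print(f"chi-square = {chi_square:.3f}, df = {df}, p = {p_value:.5f}")
```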
Classification Table
How Well Does the Model Predict?

Correctly classified probationers (0) = 73.0%
Correctly classified prisoners (1) = 57.6%
Overall hit ratio = 65.7%
Maximum Chance Criterion
To answer whether the model predicts any better than
chance:
Maximum chance criterion (MCC)
Predict that all 70 cases are in the group with the largest
number of cases
In the data set, probation group n = 37; prison group n = 33
If all cases were predicted to be in probation,
MCC = (37 / 70) (100) = 52.86% correct by chance

MCC = 52.86% v. the model = 65.71%
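The comparison can be written as a few lines of Python. The group sizes come from the slides; the count of 46 correctly classified cases is the one used in the Press's Q example.

```python
group_sizes = {"probation": 37, "prison": 33}  # from the slides
n_correct = 46                                  # cases the model classified correctly

n_total = sum(group_sizes.values())

# Maximum chance criterion: assign every case to the largest group.
mcc = 100.0 * max(group_sizes.values()) / n_total
model_hit_ratio = 100.0 * n_correct / n_total

print(f"MCC = {mcc:.2f}%")
print(f"model hit ratio = {model_hit_ratio:.2f}%")
print("model beats MCC" if model_hit_ratio > mcc else "model does not beat MCC")
```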


Testing the Hit Ratio
To test whether the model hit ratio is significantly better
than chance:
t-test for groups of equal size
Press's Q statistic for groups of unequal size
H0: the model hit ratio is no better than chance
For Press's Q statistic
$Q = [N - (n)(g)]^2 / [N(g - 1)]$
N = total number of subjects
n = number of cases correctly classified
g = number of groups
Q is chi-square distributed with df = 1
In the example, Q = [70 - (46)(2)]^2 / [70(2 - 1)] = 6.914, p < 0.01
Reject H0 -> the model hit ratio is better than chance.
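A sketch of the Press's Q test follows. It assumes the commonly cited form of the statistic, with N(g - 1) in the denominator, and obtains the p-value from the chi-square distribution with df = 1.

```python
import math

def press_q(n_total, n_correct, n_groups):
    """Press's Q = [N - n*g]^2 / [N * (g - 1)] (commonly cited form, ASSUMED here)."""
    return (n_total - n_correct * n_groups) ** 2 / (n_total * (n_groups - 1))

def chi2_df1_p_value(statistic):
    """Upper-tail probability of a chi-square variable with df = 1."""
    return math.erfc(math.sqrt(statistic / 2.0))

q = press_q(n_total=70, n_correct=46, n_groups=2)
p_value = chi2_df1_p_value(q)
print(f"Q = {q:.3f}, p = {p_value:.4f}")
```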
Casewise Plot of the Predictions
Number of Multiple Discriminant Functions
When there are g groups in the DV, (g - 1)
functions can be extracted from the data.
DV with 2 groups = 1 function
DV with 3 groups = 2 functions
If the number of IVs (k) is less than (g - 1),
the number of functions will be k.
2 IVs and DV with 4 groups = 2 functions
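The rule amounts to taking the smaller of (g - 1) and k. A minimal sketch, with an illustrative function name:

```python
def number_of_functions(n_groups, n_predictors):
    """Number of discriminant functions: the smaller of (g - 1) and k."""
    return min(n_groups - 1, n_predictors)

# Examples matching the slide.
print(number_of_functions(2, 4))  # DV with 2 groups
print(number_of_functions(3, 5))  # DV with 3 groups
print(number_of_functions(4, 2))  # 2 IVs and DV with 4 groups
```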
Dependent Variable with Three Groups
Dependent variable: Pre-disposition status (jail, bail,
or ROR (release on recognizance))
Independent variables:
Age at first arrest (age_firs)
Age at time of arrest (age)
Degree of drug dependency (dr_score)
Number of prior arrests (pr_arrst)
Type of counsel (counsel: 0 = court appointed, 1 =
retained)
Discriminant Functions
Canonical Discriminant Function Coefficients (unstandardized)

             Function 1   Function 2
AGE            -.146        .253
COUNSEL        1.946       1.682
(Constant)     2.375      -6.655

Since the DV has 3 groups, 2 functions were extracted.

Z1 = 2.375 - 0.146 (age) + 1.946 (counsel)
Z2 = -6.655 + 0.253 (age) + 1.682 (counsel)
The chi-square test of Wilks' $\Lambda$ is used to test the
significance of each function.
Only the first function is found significant (from printout not
shown).
Correlation Between Each of the Predictor Variables
and the Discriminant Scores Produced By the
Two Functions
Structure Matrix

             Function 1   Function 2
COUNSEL        .867*        .499
AGE_FIRS(a)    .291*        .067
PR_ARRST(a)    .219*        .190
AGE            .654         .757*
DR_SCORE(a)    .109         .195*

Pooled within-groups correlations between discriminating
variables and standardized canonical discriminant functions.
Variables ordered by absolute size of correlation within function.
*. Largest absolute correlation between each variable and
any discriminant function.
a. This variable not used in the analysis.

The IVs counsel, age_firs, and pr_arrst load highest
on the 1st function.
The IVs age and dr_score load highest on the 2nd
function.
Casewise plot of the cases
Hit Ratio of the Discriminant Model

Overall Hit Ratio = (44 / 70) (100) = 62.9%
Errors = (26 / 70) (100) = 37.14%
