Michael Friendly
York University
Logit models
- Plots for logit models
- Diagnostic plots for generalized linear models

Logistic regression
- Binary response
- Model plots
- Effect plots for generalized linear models
- Influence measures and diagnostic plots
Logit models
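For a binary response, each loglinear model is equivalent to a logit model (logistic regression, with categorical predictors),
e.g., Admit ⊥ Gender | Dept (conditional independence ≡ [AD][DG]):

$$\log m_{ijk} = \mu + \lambda^A_i + \lambda^D_j + \lambda^G_k + \lambda^{AD}_{ij} + \lambda^{DG}_{jk}$$

So, for admitted ($i=1$) and rejected ($i=2$), we have:

$$\log m_{1jk} = \mu + \lambda^A_1 + \lambda^D_j + \lambda^G_k + \lambda^{AD}_{1j} + \lambda^{DG}_{jk} \qquad (7)$$
$$\log m_{2jk} = \mu + \lambda^A_2 + \lambda^D_j + \lambda^G_k + \lambda^{AD}_{2j} + \lambda^{DG}_{jk} \qquad (8)$$

Thus, subtracting (7)-(8), terms not involving Admit will cancel:

$$L_{jk} = \log m_{1jk} - \log m_{2jk} = \log(m_{1jk}/m_{2jk}) = (\lambda^A_1 - \lambda^A_2) + (\lambda^{AD}_{1j} - \lambda^{AD}_{2j}) = \alpha + \beta^{Dept}_j \qquad \text{(renaming terms)}$$

where,
- $\alpha$: overall log odds of admission
- $\beta^{Dept}_j$: effect on admissions of department
- associations among predictors are assumed, but don't appear in the logit model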
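The equivalence between the loglinear model [AD][DG] and the logit model with Dept as the only predictor can be checked numerically. The following is a minimal sketch in R, assuming the `UCBAdmissions` table that ships with base R holds the same Berkeley data; the object names (`berk`, `loglin.mod`, `logit.mod`) are illustrative and not from the original slides.

```r
# Sketch: loglinear model [AD][DG] vs. the equivalent logit model,
# using the UCBAdmissions table from base R (assumed to be the Berkeley data).
berk <- as.data.frame(UCBAdmissions)   # columns: Admit, Gender, Dept, Freq

# Loglinear (Poisson) model for the cell frequencies: [AD][DG]
loglin.mod <- glm(Freq ~ Admit * Dept + Dept * Gender,
                  data = berk, family = poisson)

# Logit model for admission, with Dept as the only predictor
logit.mod <- glm((Admit == "Admitted") ~ Dept,
                 data = berk, weights = Freq, family = binomial)

# Loglinear terms that involve Admit (treatment contrasts, baseline
# Admit = "Admitted"); the sign flip turns them into log odds of admission
admit.terms <- grep("^AdmitRejected", names(coef(loglin.mod)))
b.loglin <- -coef(loglin.mod)[admit.terms]
b.logit  <- coef(logit.mod)

# Side-by-side comparison: alpha (Dept A) and the Dept effects
cbind(loglinear = unname(b.loglin), logit = unname(b.logit))
```

Only the terms involving Admit survive the subtraction, so the Gender and Dept*Gender parameters of the loglinear model have no counterpart in the logit model.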
Other loglinear models have similar, simpler forms as logit models, where only the relations of the response to the predictors appear in the equivalent logit model.

Admit ⊥ Gender ⊥ Dept (mutual independence ≡ [A][D][G]):

$$\log m_{ijk} = \mu + \lambda^A_i + \lambda^D_j + \lambda^G_k$$
$$L_{jk} = (\lambda^A_1 - \lambda^A_2) = \alpha$$

Admit ⊥ Gender | Dept, except for Dept. A:

$$\log m_{ijk} = \mu + \lambda^A_i + \lambda^D_j + \lambda^G_k + \lambda^{AD}_{ij} + \lambda^{DG}_{jk} + \delta_{(j=1)} \lambda^{AG}_{ik}$$
$$L_{jk} = \alpha + \beta^{Dept}_j + \delta_{(j=1)} \beta^{Gender}$$

where,
- $\beta^{Dept}_j$: effect on admissions for department $j$
- $\delta_{(j=1)} \beta^{Gender}$: 1 df term for effect of gender in Dept. A

Visualization procedures
- CATPLOT macro: plot predicted, observed log odds from CATMOD
- INFLGLIM macro: influence plots for generalized linear models
- HALFNORM macro: half-normal plot of residuals for generalized linear models

SAS craft
- All SAS procedures output a dataset with observations, fitted values, residuals, diagnostics, etc.
- New model → new output dataset
- Plotting steps remain the same
- Similar ideas for SPSS, R
Plots for logit models
- Plots observed and predicted on the logit scale (type=FUNCTION)
- Main effects model → parallel profiles
- Probabilities on a separate scale (added below)

[Figure: two plots by Department with separate profiles for Gender (Female, Male); logit scale at left, Probability (Admitted) scale at right]
Fit the main effects model with PROC CATMOD:

    %include catdata(berkeley);
    proc catmod order=data data=berkeley;
       weight freq;
       response / out=predict;
       model admit = dept gender / ml;
    run;

- No effect of Gender; big effect of Dept
- LR test (vs. saturated model): the model doesn't fit well
- Why? How to modify?

PROC CATMOD output data set: observed & predicted, probabilities & logits

    dept  gender  admit   _TYPE_     _OBS_  _PRED_  _SEPRED_
    A     Male            FUNCTION   0.492   0.582     0.069
    A     Male    Admit   PROB       0.621   0.642     0.016
    A     Male    Reject  PROB       0.379   0.358     0.016
    A     Female          FUNCTION   1.544   0.682     0.099
    A     Female  Admit   PROB       0.824   0.664     0.022
    A     Female  Reject  PROB       0.176   0.336     0.022
    B     Male            FUNCTION   0.534   0.539     0.086
    B     Male    Admit   PROB       0.630   0.631     0.020
    B     Male    Reject  PROB       0.370   0.369     0.020
    B     Female          FUNCTION   0.754   0.639     0.116
    B     Female  Admit   PROB       0.680   0.654     0.026
    B     Female  Reject  PROB       0.320   0.346     0.026
    ...
    F     Male            FUNCTION  -2.770  -2.724     0.158
    F     Male    Admit   PROB       0.059   0.062     0.009
    F     Male    Reject  PROB       0.941   0.938     0.009
    F     Female          FUNCTION  -2.581  -2.625     0.158
    F     Female  Admit   PROB       0.070   0.068     0.010
    F     Female  Reject  PROB       0.930   0.932     0.010

The output data set (out=predict) contains both the observed and fitted logit values (_TYPE_='FUNCTION') and probabilities (_TYPE_='PROB').
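The FUNCTION and PROB rows of the output data set are related by the inverse logit transformation. A small sketch in R, using the Dept A values copied from the table above; the variable names are illustrative, and the 1.96 multiplier assumes a normal-theory 95% interval computed on the logit scale.

```r
# Dept A values copied from the PROC CATMOD output data set shown above
obs.logit  <- c(Male = 0.492, Female = 1.544)   # _OBS_,    _TYPE_='FUNCTION'
pred.logit <- c(Male = 0.582, Female = 0.682)   # _PRED_,   _TYPE_='FUNCTION'
se.logit   <- c(Male = 0.069, Female = 0.099)   # _SEPRED_, _TYPE_='FUNCTION'

# Inverse logit: p = 1 / (1 + exp(-logit)); reproduces the PROB (Admit) rows
obs.prob  <- plogis(obs.logit)
pred.prob <- plogis(pred.logit)
round(rbind(obs.prob, pred.prob), 3)

# 95% interval computed on the logit scale, then mapped to probabilities
lower <- plogis(pred.logit - 1.96 * se.logit)
upper <- plogis(pred.logit + 1.96 * se.logit)
round(cbind(prob = pred.prob, lower, upper), 3)
```

Computing the interval on the logit scale and transforming its end points keeps the limits inside (0, 1).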
CATPLOT macro
- Plot logit values (_TYPE_='FUNCTION') or probabilities (_TYPE_='PROB')
- With the PSCALE macro, can plot on the logit scale, with a probability scale at right
catberk2.sas

    %pscale(lo=-4, hi=3, anno=pscale);
    title 'Model: logit(Admit) = Dept Gender'
          a=-90 'Probability (Admitted)';
    axis1 order=(-3 to 2) offset=(4)
          label=(a=90 'Log Odds (Admitted)');
    axis2 label=('Department') offset=(4);
    %catplot(data=predict, class=gender, xc=dept,
             type=FUNCTION,   /* plot logit values        */
             z=1.96,          /* show 1.96 x SE -> 95% CI */
             anno=pscale);    /* add probability scale    */
[Figure: Model: logit(Admit) = Dept Gender. Log Odds (Admitted) by Department, profiles for Female and Male, with Probability (Admitted) scale at right]
    proc catmod order=data data=berkeley;
       response / out=predict;
       model admit = dept dept1AG / ml;
    %catplot(data=predict, xc=dept, class=gender,
             type=FUNCTION, z=1.96, legend=legend1);
[Figure: logit(Admit) = Dept DeptA*Gender. Log odds by Department, profiles for Female and Male]
Analysis of Maximum Likelihood Estimates

                             Standard       Chi-
    Parameter     Estimate      Error     Square   Pr > ChiSq
    ----------------------------------------------------------
    Intercept      -0.6685     0.0392     291.22       <.0001
    dept    A       1.1606     0.0705     271.21       <.0001
            B       1.2113     0.0802     227.95       <.0001
            C       0.0528     0.0687       0.59       0.4426
            D      0.00358     0.0727       0.00       0.9607
            E      -0.4210     0.0871      23.34       <.0001
    dept1AG         1.0521     0.2627      16.04       <.0001
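The dept1AG estimate can be read as a log odds ratio. A brief sketch in R, assuming dept1AG is a 0/1 indicator for the gender effect within Dept. A (its construction is not shown in the listing); note that 1.0521 agrees with the difference of the observed Dept A logits in the output data set, 1.544 - 0.492.

```r
# Estimate and standard error for dept1AG, copied from the table above
est <- 1.0521
se  <- 0.2627

# Odds ratio for the gender effect in Dept. A
# (assumes dept1AG is a 0/1 indicator; this is not stated in the slides)
odds.ratio <- exp(est)

# Wald chi-square = (estimate / SE)^2, and a 95% Wald interval for the odds ratio
wald <- (est / se)^2
ci   <- exp(est + c(-1, 1) * qnorm(0.975) * se)

round(c(odds.ratio = odds.ratio, wald = wald, lower = ci[1], upper = ci[2]), 3)
```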
catberk6.sas

    title 'logit(Admit) = Dept DeptA*Gender';
    %catplot(data=predict, x=dept, class=gender,
             type=FUNCTION,   /* plot the log odds */
             z=1.96);         /* 95% error bars    */
[Figure: logit(Admit) = Dept DeptA*Gender. Log odds by Department for Female and Male, with 95% error bars]

Diagnostic plots for GLMs
genberk1.sas

    %include catdata(berkeley);
    *-- make a cell ID variable, joining factors;
    data berkeley;
       set berkeley;
       cell = trim(put(dept,dept.)) ||
              gender ||
              trim(put(admit,yn.));
    %inflglim(data=berkeley, class=dept gender admit,
              resp=freq, model=admit|dept gender|dept,
              dist=poisson, id=cell,
              gx=hat, gy=streschi);
- All cells which do not fit (|r_i| > 2) are for department A
- Males applying to dept A have large leverage → large influence (Cook's D)

Influence plots in R
The influencePlot() function in the car package gives similar plots:
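berkeley-diag.R

    library(car)   # provides influencePlot()
    berkeley <- as.data.frame(UCBAdmissions)
    # ...
    berk.mod <- glm(Freq ~ Dept * (Gender+Admit), data=berkeley, family="poisson")
    # car >= 3.0 takes point-labelling options as a list;
    # older versions used id.n=3, id.col="red"
    influencePlot(berk.mod, id=list(n=3, col="red"))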
HALFNORM macro: Half-normal plot of residuals (Atkinson, 1981)
- Plot ordered absolute residuals, |r|_(i), vs. expected normal values, |z|_(i)
- Standard normal confidence envelope not suitable for GLMs
- Simulate reference line and envelope with simulated confidence intervals

genberk1.sas

    %halfnorm(data=berkeley, class=dept gender admit,
              resp=freq, model=dept|gender dept|admit,
              dist=poisson, id=cell);

[Figure: half-normal plot of residuals with simulated envelope]
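The idea behind the simulated envelope can be sketched in a few lines of R. This is not the HALFNORM macro itself, only an illustration under stated assumptions: it uses `UCBAdmissions` from base R for the Berkeley data, standardized Pearson residuals, 19 simulated data sets, and the range of the simulated order statistics as the envelope. The object names and the seed are arbitrary.

```r
# Sketch of a half-normal plot with a simulated envelope (not the SAS macro)
berk <- as.data.frame(UCBAdmissions)
form <- Freq ~ Dept * Gender + Dept * Admit
mod  <- glm(form, data = berk, family = poisson)

n <- nrow(berk)
# Expected half-normal order statistics (a commonly used approximation)
z.exp <- qnorm((seq_len(n) + n - 1/8) / (2 * n + 1/2))
# Ordered absolute standardized Pearson residuals from the fitted model
r.obs <- sort(abs(rstandard(mod, type = "pearson")))

set.seed(1)    # arbitrary seed, for reproducibility only
nsim <- 19
r.sim <- replicate(nsim, {
  d <- berk
  d$Freq <- simulate(mod)[[1]]              # new counts from the fitted model
  refit <- glm(form, data = d, family = poisson)
  sort(abs(rstandard(refit, type = "pearson")))
})

lo  <- apply(r.sim, 1, min)    # lower envelope
hi  <- apply(r.sim, 1, max)    # upper envelope
mid <- rowMeans(r.sim)         # simulated reference line

plot(z.exp, r.obs, xlab = "Expected value of half-normal quantile",
     ylab = "Absolute standardized residual", ylim = range(r.obs, hi))
lines(z.exp, mid)
lines(z.exp, lo, lty = 2)
lines(z.exp, hi, lty = 2)
```

Points falling well outside the dashed envelope are the cells that the model fits poorly.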
Half-normal plot (Berkeley data):
- Points with largest |residual| labeled
- The model fits well, except in department A.

Logistic regression models

Response variable
- Binary response: success/failure, vote: yes/no
- Binomial data: x successes in n trials (grouped data)
- Ordinal response: none < some < severe depression
- Polytomous response: vote Liberal, Tory, NDP, Green

Explanatory variables
- Quantitative regressors: age, dose
- Transformed regressors: √age, log(dose)
- Polynomial regressors: age², age³, ...
- Categorical predictors: treatment, sex
- Interaction regressors: treatment × age, sex × age
Binary response

[Figure: Arthritis treatment data, linear and logit regressions on age; Probability (Improved) vs. AGE]
[Figure: Logistic and Normal curves; Probability vs. Predictor]
Models:
- The logistic model is a linear model for the log odds, $\mathrm{logit}(\pi_i) = \log[\pi_i/(1-\pi_i)] = \alpha + x_i^T \beta$, but also a multiplicative model for the odds of success, $\pi_i/(1-\pi_i) = \exp(\alpha + x_i^T \beta) = \exp(\alpha)\,\exp(x_i^T \beta)$
- An equivalent (non-linear) form of the model may be specified for the probability, $\pi_i$, itself, $\pi_i = \{1 + \exp(-[\alpha + x_i^T \beta])\}^{-1}$

Fitting logistic models
- CLASS statement (V7+): no need for dummy variables for discrete predictors; can specify order and parameterization (effect, polynomial, reference cell)
- MODEL statement allows GLM syntax, e.g.,

    proc logistic;
       class Sex Treat;
       model Better = Sex | Treat | Age @2;

  which is equivalent to Better = Sex Treat Age Sex*Treat Sex*Age Treat*Age

Model plots
- Plot with standard procedures (PROC GCHART, GPLOT)
- Utility macros (BARS, LABEL, POINTS, PSCALE, etc.) for custom displays
- Effect plots: plot hierarchical subset of effects, averaging over those not included
- INFLOGIS macro: influence plots for logistic regression models
- ADDVAR macro: added variable plots for new predictors or transformations of old
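The three equivalent forms of the logistic model (linear in the log odds, multiplicative in the odds, non-linear in the probability) can be verified numerically. A minimal sketch in R; the values of `alpha`, `beta` and `x` are made up for illustration and are not estimates from the slides.

```r
# Illustrative (made-up) parameter values and predictor values
alpha <- -2
beta  <- 0.05
x     <- c(20, 40, 60, 80)

eta <- alpha + beta * x            # linear predictor = log odds

# Non-linear form for the probability
p <- 1 / (1 + exp(-eta))

# Multiplicative form for the odds
odds <- exp(alpha) * exp(beta * x)

# The forms agree: odds = p / (1 - p), and logit(p) recovers eta
check.odds  <- all.equal(odds, p / (1 - p))
check.logit <- all.equal(qlogis(p), eta)
check.prob  <- all.equal(plogis(eta), p)

data.frame(x, log.odds = eta, odds, prob = p)
```

Each unit increase in x multiplies the odds by exp(beta), whatever the starting value of x.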
Empirical logit plots
- Linearity: Is a linear relation realistic?
- Smoothing: Discrete data often requires smoothing to see!

The LOGODDS macro:
- Show the data: plot (0/1) responses [stacked or jittered]
- Divide X into groups (e.g., deciles); compute the empirical logit, $\log\left[\frac{y_i + 1/2}{n_i - y_i + 1/2}\right]$, for each
- Linear logistic regression, plus smoothed curve (LOWESS macro)
    %include catdata(arthrit);
    %logodds(data=arthrit, x=age, y=Better,   /* vars to plot */
             smooth=0.5,     /* LOWESS smoothing parameter */
             plot=logit);    /* plot on logit scale */
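The empirical logit computation described for the LOGODDS macro can be sketched in R. This is an illustration only, not the macro: the data are simulated (not the arthritis data), and the function name `emp_logit` and all other names are hypothetical.

```r
# Simulated binary data (NOT the arthritis data), for illustration only
set.seed(42)
n <- 200
x <- round(runif(n, 20, 80))
y <- rbinom(n, 1, plogis(-3 + 0.05 * x))

# Empirical logit: log[(y + 1/2) / (n - y + 1/2)]; finite even when y = 0 or y = n
emp_logit <- function(y, n) log((y + 0.5) / (n - y + 0.5))

# Divide x into groups at its deciles
breaks <- unique(quantile(x, probs = seq(0, 1, 0.1)))
grp <- cut(x, breaks = breaks, include.lowest = TRUE)

xm <- tapply(x, grp, mean)      # group mean of x
yi <- tapply(y, grp, sum)       # successes per group
ni <- tapply(y, grp, length)    # trials per group
el <- emp_logit(yi, ni)

# Empirical logits with the fitted linear logistic regression (logit scale)
fit <- glm(y ~ x, family = binomial)
plot(xm, el, xlab = "x (group mean)", ylab = "Empirical logit")
abline(coef = coef(fit))
```

Adding 1/2 to the numerator and denominator is what keeps the logit defined for groups in which all responses are 0 or all are 1.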
[Figure: arthritis data, observed responses with fitted and smoothed curves vs. AGE, on the logit scale and as Prob (Better)]

- SAS: PROC LOESS, lowess macro; R: lowess()
- There is a hint that the relation may be non-linear
- But data is thin at the extremes

PROC LOGISTIC: Fitting and plotting

glogist1c.sas

    proc logistic data=arthrit descending;
       class sex (ref=last) treat (ref=first) / param=ref;
       model better = sex treat age;
       output out=results p=prob l=lower u=upper
              xbeta=logit stdxbeta=selogit / alpha=.33;

Type III Analysis of Effects

                         Wald
    Effect    DF   Chi-Square   Pr > ChiSq
    sex        1       6.2576       0.0124
    treat      1      10.7596       0.0010
    age        1       5.5655       0.0183
Analysis of Maximum Likelihood Estimates

                                      Standard         Wald
    Parameter          DF   Estimate     Error   Chi-Square   Pr > ChiSq
    Intercept           1    -4.5033    1.3074      11.8649       0.0006
    sex    Female       1     1.4878    0.5948       6.2576       0.0124
    treat  Treated      1     1.7598    0.5365      10.7596       0.0010
    age                 1     0.0487    0.0207       5.5655       0.0183

Odds Ratio Estimates

                                    Point         95% Wald
    Effect                       Estimate    Confidence Limits
    sex   Female vs Male            4.427     1.380     14.204
    treat Treated vs Placebo        5.811     2.031     16.632
    age                             1.050     1.008      1.093

Parameter estimates (reference cell coding):
- β1 = 1.49 ⇒ for Females, the odds of being better are e^1.49 = 4.43 times those for Males
- β2 = 1.76 ⇒ for Treated, the odds of being better are e^1.76 = 5.81 times those for Placebo
- β3 = 0.0487 ⇒ odds ratio = 1.05 ⇒ odds of improvement increase 5% each year. Over 10 years, the odds of improvement are multiplied by e^(10 × 0.0487) = 1.63, a 63% increase.
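The odds ratios and their Wald confidence limits follow directly from the estimates and standard errors. A short sketch in R reproducing them from the numbers printed in the PROC LOGISTIC output; the object names are illustrative.

```r
# Estimates and standard errors copied from the PROC LOGISTIC output
est <- c(sexFemale = 1.4878, treatTreated = 1.7598, age = 0.0487)
se  <- c(sexFemale = 0.5948, treatTreated = 0.5365, age = 0.0207)

z <- qnorm(0.975)                      # multiplier for a 95% Wald interval
or <- cbind(OR    = exp(est),
            lower = exp(est - z * se),
            upper = exp(est + z * se))
round(or, 3)

# Multiplier of the odds of improvement over 10 years of age
or10 <- exp(10 * est[["age"]])
round(or10, 2)
```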
The output data set results (from the OUTPUT statement) contains:
- prob: predicted probabilities, with CI (lower, upper)
- logit: predicted logit, with standard error selogit
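The OUTPUT statement uses alpha=.33, which corresponds to an interval of roughly ±1 standard error. A small sketch in R of why, and of how limits on the probability scale relate to the logit and its standard error; the values of `logit` and `selogit` below are made up for illustration.

```r
# Normal quantile multiplier implied by alpha = .33
alpha <- 0.33
z <- qnorm(1 - alpha / 2)     # close to 1, so the limits are about +/- 1 SE
z

# Illustrative (made-up) predicted logit and standard error
logit   <- 0.5
selogit <- 0.4

# Limits formed on the logit scale, then transformed to probabilities
ci <- c(lower = plogis(logit - z * selogit),
        prob  = plogis(logit),
        upper = plogis(logit + z * selogit))
round(ci, 3)
```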
    proc gplot data=results;
       plot (logit prob) * age = treat;
       by sex;
       symbol1 v=circle i=join l=3 c=black;
       symbol2 v=dot    i=join l=1 c=red;

SYMBOL statement: define the point value (v=), interpolate option (i=), line style (l=), color (c=), etc.

[Figure: fitted log odds and Probability Improved vs. Age, for Treated and Placebo, in separate panels for Female and Male]
glogist1c.sas

    *-- Error bars, on logit scale;
    %bars(data=results, var=logit, class=age, cvar=treat, by=age,
          barlen=selogit, out=bars);
    *-- Custom legends and panel labels;
    %label(data=results, y=logit, x=age, xoff=1, cvar=treat, by=sex,
           subset=last.treat, out=label1, pos=6, text=treat);
    %label(data=results, y=2.5, x=20, size=2, by=sex,
           subset=first.sex, out=label2, pos=6, text=sex);
    *-- Probability scales at right;
    %pscale(out=pscale, byvar=sex, byval=%str('Female','Male'));

    *-- Join ANNOTATE datasets;
    data bars;
       set label1 label2 bars pscale;
    proc sort;
       by sex;

    title ' ' h=1.8 a=-90 'Probability Improved'   /* right axis label */
              h=2.5 a=-90 ' ';                      /* extra space */
    goptions hby=0;                                 /* suppress BY values */
    proc gplot data=results;
       plot logit * age = treat /
            vaxis=axis1 haxis=axis2 hm=1 vm=1
            nolegend anno=bars frame;
       by sex;
       axis1 label=(a=90 'Log Odds Improved') order=(-3 to 3);
       axis2 order=(20 to 80 by 10) offset=(2,6);
       symbol1 v=+ i=join l=3 c=black;
       symbol2 v=- i=join l=1 c=red;
       label age='Age';
    run;

[Figure: Log Odds Improved vs. Age for Treated and Placebo, with error bars, in separate panels for Female and Male; Probability Improved scale at right]
Effect plots
General ideas

    proc logistic data=arthrit descending;
       class sex (ref=last) treat (ref=first) / param=ref;
       model better = treat sex | age @2;
       output out=results p=prob l=lower u=upper
              xbeta=logit stdxbeta=selogit / alpha=.33;
EFFPLOT macro
- Works with PROC REG, PROC GLM, PROC LOGISTIC, PROC GENMOD
- Uses MEANPLOT macro to do the plotting
- Some limitations: can't plot correct standard errors

Note: This provides a general means to visualize interactions in all linear and generalized linear models.

R: effects package
- Most general: handles linear models (lm()), generalized linear models (glm()), multinomial (multinom()) and proportional-odds (polr()) models
- allEffects(model) calculates effects for all high-order terms in model
- plot(allEffects(model)) plots them
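What an effect plot displays can be approximated by hand: vary the focal predictor over a grid while holding the other predictors at typical values, then predict. The sketch below uses only base R and the built-in `infert` data set, chosen purely for illustration (it is not one of the data sets in the slides); it is a simplified stand-in for the idea, not the algorithm of the effects package.

```r
# Sketch of a hand-computed effect, using the built-in 'infert' data
# (illustration only; not a data set from the slides)
mod <- glm(case ~ age + parity + induced + spontaneous,
           data = infert, family = binomial)

# Focal predictor: 'spontaneous'; other predictors held at their means
grid <- data.frame(age         = mean(infert$age),
                   parity      = mean(infert$parity),
                   induced     = mean(infert$induced),
                   spontaneous = 0:2)

# Predict on the logit scale, then transform fit and limits to probabilities
pred <- predict(mod, newdata = grid, type = "link", se.fit = TRUE)
z <- qnorm(0.975)
eff <- data.frame(spontaneous = grid$spontaneous,
                  prob  = plogis(pred$fit),
                  lower = plogis(pred$fit - z * pred$se.fit),
                  upper = plogis(pred$fit + z * pred$se.fit))
eff

plot(eff$spontaneous, eff$prob, type = "b", ylim = c(0, 1),
     xlab = "spontaneous", ylab = "Probability(case)")
arrows(eff$spontaneous, eff$lower, eff$spontaneous, eff$upper,
       angle = 90, code = 3, length = 0.05)
```

For a model with interactions, the same grid idea applies, with the grid formed over all predictors in the high-order term.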
Effect plots software
cowles-logistic-eff.sas

    proc logistic data=cowles outest=parm descending;
       class Sex;
       model Volunteer = Sex Extraver | Neurot / lackfit;
       effectplot contour(x=Neurot y=Extraver) / at(sex=1.5) noobs;
    run;

arthritis-logistic-ods.sas

    %include catdata(arthrit);
    ods graphics on;
    proc logistic data=arthrit descending
         plots(only)=(effect(plotby=sex sliceby=treat showobs
                             clband alpha=0.33));
       class sex (ref=last) treat (ref=first) / param=ref;
       model better = sex treat age / clodds=wald;
    run;
    ods graphics off;
    Coefficients:
                              Estimate Std. Error z value Pr(>|z|)
    (Intercept)              -2.358207   0.501320  -4.704 2.55e-06 ***
    sexmale                  -0.247152   0.111631  -2.214  0.02683 *
    neuroticism               0.110777   0.037648   2.942  0.00326 **
    extraversion              0.166816   0.037719   4.423 9.75e-06 ***
    neuroticism:extraversion -0.008552   0.002934  -2.915  0.00355 **
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 1933.5  on 1420  degrees of freedom
    Residual deviance: 1897.4  on 1416  degrees of freedom
    AIC: 1907.4
[Figure: effect plot of Prob(Volunteer) for the neuroticism × extraversion interaction]

Arrests
In Dec. 2002, the Toronto Star examined the issue of racial profiling, by analyzing a data base of 600,000+ arrest records from 1996-2002. They focused on a subset of arrests for which police action was discretionary, e.g., simple possession of small quantities of marijuana, where the police could:
- Release the arrestee with a summons (like a parking ticket)
- Bring to police station, hold for bail, etc. (harsher treatment)

Response variable: released (Yes, No)
Main predictor of interest: skin-colour of arrestee (black, white)
Control variables:
- year, age, sex
- employed, citizen: Yes, No
- checks: number of police data bases (previous arrests, previous convictions, parole status, etc.) in which the arrestee's name was found

    > library(effects)
    > data(Arrests)
    > some(Arrests)
         released colour year age  sex employed citizen checks
    915        No  Black 2001  35 Male      Yes     Yes      4
    1568      Yes  White 2002  21 Male      Yes     Yes      0
    2981      Yes  White 2000  23 Male      Yes     Yes      2
    3381      Yes  Black 1998  23 Male       No     Yes      2
    3516      Yes  White 2002  22 Male      Yes     Yes      0
    4128       No  White 2001  29 Male      Yes     Yes      1
    4142      Yes  Black 1998  23 Male      Yes     Yes      3
    4634      Yes  White 2001  18 Male      Yes     Yes      0
    4732      Yes  White 1999  21 Male      Yes     Yes      3
    5183      Yes  White 1999  19 Male      Yes     Yes      0
Logistic regression model with all main effects, plus interactions of colour:year and colour:age

    > arrests.mod <- glm(released ~ employed + citizen + checks + colour *
    +     year + colour * age, family = binomial, data = Arrests)
    > Anova(arrests.mod)
    Analysis of Deviance Table (Type II tests)

    Response: released
                LR Chisq Df Pr(>Chisq)
    employed      72.673  1  < 2.2e-16 ***
    citizen       25.783  1  3.820e-07 ***
    checks       205.211  1  < 2.2e-16 ***
    colour        19.572  1  9.687e-06 ***
    year           6.087  5  0.2978477
    age            0.459  1  0.4982736
    colour:year   21.720  5  0.0005917 ***
    colour:age    13.886  1  0.0001942 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[Figure: effect plot for colour; Probability(released) for Black and White]
colour × year:
- Up to 2000, strong evidence for differential treatment of blacks and whites
- Also evidence to support Police claim of effect of training to reduce racial effects in treatment

[Figure: effect plot of Probability(released) vs. year, 1997-2002]

colour × age:
- Opposite age effects for blacks and whites
- Young blacks treated more harshly than young whites
- Older blacks treated less harshly than older whites

[Figure: effect plot of Probability(released) vs. age]
    > arrests.effects <- allEffects(arrests.mod,
    +     xlevels = list(age = seq(15, 45, 5)))
    > plot(arrests.effects, ylab = "Probability(released)", ask = FALSE)
[Figure: effect plots for employed, citizen, checks, colour × year and colour × age; y-axis Probability(released)]
NB: These plots are computed at average levels of quantitative variables, but at reference levels of class variables: Sex=Male, citizen=Yes, employed=Yes
Influence measures and diagnostic plots
- Leverage: Potential impact of an individual case ∼ distance from the centroid of the predictors
- Residuals: Which observations are poorly fitted?
- Influence: Actual impact of an individual case ∼ leverage × residual
    proc logistic data=arthrit descending;
       model better = sex treat age / influence;

- C, CBAR: analogs of Cook's D in OLS ∼ standardized change in regression coefficients when the i-th case is deleted
- DIFCHISQ, DIFDEV: change in χ² (Pearson, deviance) when the i-th case is deleted
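The same three ingredients (leverage, residual, influence) are available for any glm object in base R. A minimal sketch, again using the built-in `infert` data purely for illustration; Cook's distance plays the role that C and CBAR play in PROC LOGISTIC, and the bubble scaling is an arbitrary choice.

```r
# Sketch: leverage, residuals and influence for a logistic regression,
# using the built-in 'infert' data (illustration only)
mod <- glm(case ~ age + parity + induced + spontaneous,
           data = infert, family = binomial)

h  <- hatvalues(mod)         # leverage
rs <- rstudent(mod)          # studentized residuals
cd <- cooks.distance(mod)    # influence (analogous to C, CBAR)

# The three most influential cases
top <- order(cd, decreasing = TRUE)[1:3]
data.frame(case = top, hat = h[top], rstudent = rs[top], cook = cd[top])

# Bubble plot: residual vs. leverage, symbol area proportional to Cook's D
plot(h, rs, cex = 0.5 + 3 * sqrt(cd / max(cd)),
     xlab = "Leverage (hat value)", ylab = "Studentized residual")
text(h[top], rs[top], labels = top, pos = 4)
```

The leverages sum to the number of estimated coefficients, which is why cutoffs such as 2p/n or 3p/n are commonly used for "large" hat values.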
[Figure: Arthritis treatment data. Change in Pearson Chi Square vs. Estimated Probability; bubble size: influence on coefficients (C)]
    proc logistic data=arthrit descending
         plots(only label)=(leverage dpc);
       class sex (ref=last) treat (ref=first) / param=ref;
       model better = sex treat age;
    run;
INFLOGIS macro
- Specialized version of INFLGLIM macro for logistic regression
- Plots a measure of change in χ² (DIFCHISQ or DIFDEV) vs. predicted probability or leverage
- Bubble symbols show actual influence (C or CBAR)
- Shows standard cutoffs for "large" values
- Flexible labeling of unusual cases
[Figure: Arthritis treatment data. Change in Pearson Chi Square vs. Estimated Probability and vs. Leverage (Hat value); bubble size: influence on coefficients (C)]
Diagnostic plots in R
In R, plotting a glm object gives the "regression quartet"

    arth.mod1 <- glm(Better ~ Age + Sex + Treatment, data=Arthritis,
                     family='binomial')
    plot(arth.mod1)
[Figure: regression quartet; panels Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage]

    library(car)
    influencePlot(arth.mod1)

[Figure: Arthritis data: influencePlot; Studentized Residuals vs. Hat-Values]
The Donner Party

[Figure: Probability Died=1 vs. Age]

- The relation with Age appears quadratic: the youngest and oldest were most likely to perish.
Quadratic model?
- Fit: Pr(Death) ∼ Age + Age² + Male
- Statistical evidence for Age² equivocal: Wald χ²(1) = 2.84, p = 0.09; but LR G²(1) = 4.40, p = 0.03

    ...
    Analysis of Maximum Likelihood Estimates

                 Parameter   Standard         Wald         Pr >
    Variable      Estimate      Error   Chi-Square   Chi-Square
    INTERCPT       -1.7721     0.5673       9.7588       0.0018
    AGE             0.0168     0.0184       0.8355       0.3607
    AGE2           0.00208    0.00123       2.8439       0.0917
    MALE            1.3745     0.5066       7.3617       0.0067
- Visual evidence is persuasive (but the data are thin at older ages)

[Figure: fitted probability of death vs. Age, separate curves for Men and Women]
- Males: odds of death are exp(1.3745) = 3.95 times those of females, controlling for Age, Age²
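The odds multiplier for males and the p-values of the two tests for Age² can be recomputed from the printed statistics. A short sketch in R; the inputs are the rounded values shown above, so the results are only as precise as that rounding.

```r
# Odds multiplier for MALE, from the printed estimate
or.male <- exp(1.3745)
round(or.male, 2)

# Wald test for AGE2: chi-square statistic with 1 df, from the printed table
p.wald <- pchisq(2.8439, df = 1, lower.tail = FALSE)

# Likelihood-ratio test for AGE2, using the rounded G^2 value quoted above
p.lr <- pchisq(4.40, df = 1, lower.tail = FALSE)

# The two tests can disagree in borderline cases such as this one
round(c(wald = p.wald, lr = p.lr), 4)
```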
[Figure: diagnostic plot with labelled cases: Breen, Patrick; Reed, James; Donner, Elizabeth; Donner, Tamsen; Graves, Elizabeth]

- Patrick Breen, James Reed: Older men who survived
- Elizabeth & Tamsen Donner, Elizabeth Graves: Older women who survived

Moral lessons of this story:
- Don't try to cross the Donner Pass in late October; if you do, bring food
- Plots of fitted models show only what is included in the model
- Discrete data often need smoothing (or non-linear terms) to see the pattern
- Always examine model diagnostics, preferably graphic
Summary: Part 4
Logit models
- Analogous to ANOVA models for a binary response
- Equivalent to loglinear model, including interaction of all predictors
- Fitting: SAS: PROC CATMOD, PROC LOGISTIC; R: glm()
- Visualization: plot fitted logits (or probabilities) vs. factors (CATPLOT macro)

Logistic regression
- Analogous to regression models for a binary response
- Coefficients: increment to log odds per unit change in X; exp(β) ∼ multiplier of odds per unit change in X
- Discrete responses: smoothing often useful
- Visualization: plot fitted logits (or probabilities) vs. predictors

Effect plots
- Plot a main effect or interaction in the context of a more complex model
- Shows that effect controlling for (averaged over) all other model effects
- SAS: EFFPLOT macro; R: effects package