
p. 3-3
A linear model approach for binomial data:
  y_x/n_x = Xβ + ε,  ε ~ N(0, σ²I)
Q: when is the approach appropriate?
The pmf shape of the joint distribution of y_x/n_x is similar to
the pdf shape of N(Xβ, σ²I)
Some problems with this approach
  Predicted probability may be > 1 or < 0
  Normal approximation might be too much a stretch when the
  n_i's are not large or p_x ≈ 1 or 0
  Variance of Binomial is not constant
Some of these problems could be corrected by using
transformation and weighting
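A minimal sketch of the first problem, using made-up dose-response proportions: an ordinary least-squares line fitted to y_x/n_x can predict a "probability" greater than 1 just outside the observed range.

```python
# Hypothetical proportions y_x/n_x observed at a few dose levels x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
p = [0.10, 0.30, 0.55, 0.75, 0.95]

# Ordinary least-squares fit of p on x (simple linear regression)
n = len(x)
xbar = sum(x) / n
pbar = sum(p) / n
slope = sum((xi - xbar) * (pi - pbar) for xi, pi in zip(x, p)) / \
        sum((xi - xbar) ** 2 for xi in x)
intercept = pbar - slope * xbar

# Predict at a dose slightly outside the observed range:
# the linear model happily returns a value above 1
pred = intercept + slope * 6.0
print(f"predicted 'probability' at x=6: {pred:.3f}")
```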
p. 3-4
Recall: linear model
  model description 1: Y = Xβ + ε,  ε ~ N(0, σ²I)
  model description 2: Y ~ N(Xβ, σ²I), i.e., Y ~ N(μ_x, Σ_x)
  with μ_x = η_x and Σ_x = σ²I
Q: which description can be generalized to binomial data?
3 components in a generalized linear model (binomial example)
  y_x ~ B(n_x, p_x)
  link function g: g monotone and η_x = g(p_x)
  [for binomial, g: (0, 1) → (−∞, ∞)]
Common choices of link function for binomial data
  Logit: η_x = log(p_x/(1−p_x))
  Probit: η_x = Φ⁻¹(p_x), where Φ is the cdf of Normal
Linear predictor: η_x = Σ_{i=1}^p β_i h_i(X_1, . . . , X_m),  −∞ < η_x < ∞
Generalized Linear Model for Binomial Data
NTHU STAT 5230, 2011 Lecture Notes
made by Shao-Wei Cheng (NTHU)
p. 3-5
Complementary log-log: η_x = log(−log(1−p_x))
Logit is close to the complementary log-log when p_x is small
Logit is close to probit when 0.1 < p_x < 0.9
(exercise: plot & check the difference between the 3 functions)
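The exercise above can be sketched numerically (a plot-free comparison; the grid of p values and the bisection-based probit are illustrative choices, not part of the notes):

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def cloglog(p):
    # complementary log-log link
    return math.log(-math.log(1.0 - p))

def norm_cdf(z):
    # standard normal cdf via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit(p, lo=-10.0, hi=10.0):
    # inverse normal cdf by bisection (keeps the sketch self-contained)
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for p in (0.01, 0.1, 0.5, 0.9, 0.99):
    print(f"p={p:5.2f}  logit={logit(p):7.3f}  "
          f"probit={probit(p):7.3f}  cloglog={cloglog(p):7.3f}")
```

Running it confirms the claims on the slide: logit and complementary log-log nearly coincide for small p, while all three agree at p = 0.5.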
Log-likelihood of the GLM (use logit link as an example):
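The formula itself did not survive extraction; a standard reconstruction for grouped binomial data y_i ~ B(n_i, p_i) with logit link η_i = x_iᵀβ is:

```latex
\ell(\beta) = \sum_{i=1}^{k} \left[ \log\binom{n_i}{y_i}
  + y_i \eta_i - n_i \log\!\left(1 + e^{\eta_i}\right) \right],
\qquad \eta_i = x_i^{\top}\beta .
```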
Estimation of β
  Recall: in linear model, the LS estimator of β is also the MLE
  For GLM, the concept of LS does not work any more, but we can
  still adopt the method of MLE (maximize l(β) as a function of β)
  Usually no explicit formula for the MLE of β
  An algorithm to perform the maximization will be discussed in a
  future lecture
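A minimal sketch of the MLE computation on hypothetical grouped data, maximizing the logit-link log-likelihood numerically with Newton-Raphson (no closed form for β̂ exists):

```python
import math

# Hypothetical grouped binomial data: y successes out of n trials at dose x
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ns = [50, 50, 50, 50, 50]
ys = [5, 12, 24, 38, 46]

def fit_logit(xs, ns, ys, iters=25):
    """Newton-Raphson maximization of the binomial log-likelihood
    with logit link, eta = b0 + b1*x."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # score vector u and Fisher information I for (b0, b1)
        u0 = u1 = i00 = i01 = i11 = 0.0
        for x, n, y in zip(xs, ns, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = n * p * (1.0 - p)          # binomial variance weight
            u0 += y - n * p
            u1 += (y - n * p) * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        b0 += ( i11 * u0 - i01 * u1) / det  # Newton step: I^{-1} u
        b1 += (-i01 * u0 + i00 * u1) / det
    return b0, b1

b0, b1 = fit_logit(xs, ns, ys)
print(f"MLE: b0={b0:.3f}, b1={b1:.3f}")
```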
p. 3-6
Inference
RSS and deviance
Recall: in linear model, RSS plays a critical role in inference
Q: In GLM, what is the concept similar to RSS?
Consider two models
  a larger model L with l parameters and likelihood L_L
  a smaller model S with s parameters and likelihood L_S
  S is nested in L (S ⊂ L)
To test H_0: S v.s. H_1: L\S, likelihood methods suggest the
likelihood ratio statistic: 2 log(L_L(θ̂_L)/L_S(θ̂_S))
Suppose that L is a saturated larger model; the test statistic
then becomes the deviance D, where the smaller model's
likelihood is evaluated at its fitted values
D is called deviance, which plays a role similar to RSS
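The displayed statistic did not survive extraction; for binomial data, the deviance of a model with fitted values ŷ_i = n_i p̂_i works out to the standard form:

```latex
D = 2 \sum_{i=1}^{k} \left[ y_i \log\frac{y_i}{\hat{y}_i}
  + (n_i - y_i) \log\frac{n_i - y_i}{n_i - \hat{y}_i} \right]
```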
p. 3-7
Since the saturated model fits as well as any model can fit, the
deviance D measures how close the (smaller) model comes to
perfection.
Deviance can be treated as a measure of goodness of fit
Suppose that y_i is truly binomial and that the n_i are relatively
large; then, if the (smaller) model is correct, D is approximately
chi-square distributed, and we can use the deviance to test
whether the model is an adequate fit
The chi-square distribution is only an approximation that
becomes more accurate as the n_i increase [often suggest n_i > 5]
Use deviance to compare two models S and L, S nested in L
  Larger model L: deviance D_L and df_L (= k − l)
  Smaller model S: deviance D_S and df_S (= k − s)
To test H_0: S v.s. H_1: L\S, the test statistic is D_S − D_L,
which is asymptotically distributed as chi-square with
df_S − df_L degrees of freedom
In terms of the accuracy of the distribution approximation, the
model-comparison test is better than the goodness-of-fit test
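A minimal sketch of the comparison test; the deviance and df values below are hypothetical stand-ins for fitted-model output, and the 1-df chi-square tail probability is computed via the identity χ²₁ = Z²:

```python
import math

# Hypothetical output from fitting two nested models by MLE
D_S, df_S = 23.4, 8   # smaller model S
D_L, df_L = 4.1, 7    # larger model L

lr = D_S - D_L        # deviance-based test statistic
df = df_S - df_L      # degrees of freedom (here 1)

# For 1 df, chi2_1 is the square of a standard normal, so
# P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x/2))
p_value = math.erfc(math.sqrt(lr / 2.0))
print(f"LR statistic = {lr:.2f} on {df} df, p = {p_value:.4g}")
```

A small p-value leads us to reject the smaller model S in favor of L.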
p. 3-8
(Wald's test) an alternative test for H_0: β_i = 0
  test statistic: z-value (= β̂_i/se(β̂_i))
  Can be generalized to H_0: β_i = c or H_0: β = c
  Asymptotic null distribution: N(0, 1)
In contrast to the normal linear model, these two statistics
(deviance-based and Wald's tests) are not identical
Hauck-Donner effect (see Hauck and Donner, 1977): for
sparse data (i.e., many n_i's = 1 or small), the standard errors
can be overestimated, so the z-value is too small and the
significance of an effect could be missed
Therefore, the deviance-based test is preferred
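A minimal sketch of Wald's z-test; the slope estimate and Fisher information matrix are hypothetical stand-ins for MLE output:

```python
import math

# Hypothetical MLE output for a two-parameter logistic model (b0, b1)
b1_hat = 1.12                       # MLE of the slope
I = [[52.3, 98.7], [98.7, 241.5]]   # Fisher information at the MLE

# se(b1_hat) is the square root of the (1,1) entry of I^{-1} (0-indexed);
# for a 2x2 matrix that entry equals I[0][0]/det(I)
det = I[0][0] * I[1][1] - I[0][1] * I[1][0]
se_b1 = math.sqrt(I[0][0] / det)

z = b1_hat / se_b1                  # asymptotically N(0, 1) under H0
p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
print(f"z = {z:.3f}, p = {p_value:.4g}")
```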
p. 3-9
100(1−α)% confidence interval
Relationship between confidence interval and test
Approach 1 (from Wald's test): β̂_i ± z_{α/2} · se(β̂_i)
Approach 2 (profile likelihood-based method):
  other β_j's, j ≠ i, set to the maximizing values
  (recall: the computation of the C.I. for λ in the Box-Cox method)
The profile likelihood method is generally preferable for the
same Hauck-Donner reason
Similar methods can be generalized to construct confidence
regions for several parameters
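A minimal sketch of Approach 2 on hypothetical grouped data: for each fixed slope b1, the intercept is set to its maximizing value (the profiling step), and the approximate 95% interval keeps the b1 values whose profile log-likelihood is within χ²₁,₀.₉₅/2 ≈ 1.92 of the maximum.

```python
import math

# Hypothetical grouped binomial data (logit link, eta = b0 + b1*x)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ns = [50, 50, 50, 50, 50]
ys = [5, 12, 24, 38, 46]

def loglik(b0, b1):
    # binomial log-likelihood up to a constant
    s = 0.0
    for x, n, y in zip(xs, ns, ys):
        eta = b0 + b1 * x
        s += y * eta - n * math.log(1.0 + math.exp(eta))
    return s

def profile(b1):
    """Profile log-likelihood: maximize over b0 for fixed b1
    (1-D Newton-Raphson on the score in b0)."""
    b0 = 0.0
    for _ in range(50):
        u = w = 0.0
        for x, n, y in zip(xs, ns, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            u += y - n * p
            w += n * p * (1.0 - p)
        b0 += u / w
    return loglik(b0, b1)

grid = [i / 100.0 for i in range(50, 201)]   # candidate b1 values
lmax = max(profile(b1) for b1 in grid)
# keep b1 values not rejected by the likelihood-ratio test at level 0.05
ci = [b1 for b1 in grid if 2.0 * (lmax - profile(b1)) <= 3.84]
print(f"approx 95% profile CI for b1: [{ci[0]:.2f}, {ci[-1]:.2f}]")
```

The grid search keeps the sketch dependency-free; in practice the two interval endpoints are found by root-finding on the profile likelihood-ratio statistic.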
