Professional Documents
Culture Documents
4
Copyright 2003 by the Johns Hopkins Bloomberg School of Public Health Printed in U.S.A.
All rights reserved DOI: 10.1093/aje/kwf215
PRACTICE OF EPIDEMIOLOGY
James A. Hanley1,2, Abdissa Negassa3, Michael D. deB. Edwardes2, and Janet E. Forrester4
1 Department of Epidemiology and Biostatistics, Faculty of Medicine, McGill University, Montreal, Quebec, Canada.
2 Division of Clinical Epidemiology, Royal Victoria Hospital, Montreal, Quebec, Canada.
3 Division of Epidemiology and Biostatistics, Department of Epidemiology and Social Medicine, Albert Einstein College of
Received for publication January 7, 2000; accepted for publication August 7, 2002.
The method of generalized estimating equations (GEE) is often used to analyze longitudinal and other
correlated response data, particularly if responses are binary. However, few descriptions of the method are
accessible to epidemiologists. In this paper, the authors use small worked examples and one real data set,
involving both binary and quantitative response data, to help end-users appreciate the essence of the method.
The examples are simple enough to see the behind-the-scenes calculations and the essential role of weighted
observations, and they allow nonstatisticians to imagine the calculations involved when the GEE method is
applied to more complex multivariate data.
correlation; epidemiologic methods; generalized estimating equation; longitudinal studies; odds ratio; statistics
The generalized estimating equations (GEE) (1, 2) Textbooks all advise researchers not to treat observations
method, an extension of the quasi-likelihood approach (3), is from the same household (or cluster) as if they were inde-
being increasingly used to analyze longitudinal (4) and other pendent and thus not to calculate standard errors using n =
(5) correlated data, especially when they are binary or in the 144 as the sample size. For example, Colton (8, pp. 4143)
form of counts. We are aware of only two articles which try warns against being misled by great masses of observa-
to make the GEE approach more accessible to nonstatisti- tions, which upon closer scrutiny, may often vanish, and he
cians. One focuses on software (6). The other, an excellent uses as an example an n of 800 blood pressure measure-
expository article (5) covering several approaches to corre- ments10 taken each week over an 8-week treatment
lated data, has limited coverage of GEE. Examples in most course, in 10 patients! He stresses that appropriate conclu-
texts and manuals are too extensive, and the treatment too sions regarding the drugs effect rely on subject-to-subject
theoretical, to allow end-users to follow the calculations or variation, so that the sample size of 10 subjects is crucial to
fully appreciate the principles behind them. In this paper, we such analysis. However, few texts explain how one is to
attempt to redress this. To illustrate the ideas, we use the data properly use all 800 (or, in our example, 144) data points, or
shown in table 1. They consist of the age- and sex-standard- how much each observation contributes statistically.
ized heights (and data on the covariates gender and socioeco- Although some articles do discuss how much statistical
nomic status) of 144 children in a sample of 54 randomly information is obtainable from observations on paired organs
selected households in Mexico (7). (9) or individuals in clusters such as classrooms or physicians
Correspondence to Dr. James A. Hanley, Department of Epidemiology and Biostatistics, Faculty of Medicine, McGill University, 1020 Pine
Avenue West, Montreal, Quebec H3A 1A2, Canada (e-mail: james.hanley@mcgill.ca).
TABLE 1. Heights (expressed as number of standard deviations above US age- and sex-specific
norms) of 144 children in a sample of 54 Mexican households*
Am J Epidemiol 2003;157:364375
366 Hanley et al.
TABLE 1. Continued
For the remainder of this section, we will assume that the Although our main example involves physical heights and
s are all equal. statistical weights, a side example is instructive. Assume
that the physical weights of elevator-taking adults vary
Variability of statistics derived from uncorrelated
from person to person by, for example, = 10 kg. Then
observations
elevators of 16 persons each (i.e., n = 16), randomly chosen
from among these, will vary from load to load with a stan-
When the observations are uncorrelated, the off-diagonal dard deviation of (only!) 16 (10) = 40 kg, while the average
elements in the correlation matrix are zero. If each of the n per person in each load of 16 will vary with a standard devi-
weights equals 1/n, then the weighted sum is the mean, y . Its ation of only 10/ 16 = 2.5 kg.
variance (the sum of the diagonal elements in part b of figure
1) is thus Variability of statistics derived from correlated
2
1 2 2 1 2 2 1 2 2 observations
Var [ y ] = --- + --- + + --- = ----- ,
n n n n
In the elevator example, the / n and n laws for
yielding the familiar formula SD[ y] = / n . the variability of the two statistics do not hold if the variable
With equal statistical weights of 1 each, the variance of the of interest on sampled individuals tends to be similar from
simple sum is Var[y] = n 2, so that SD[y] = n . one individual to the other (co-related)for example, if
Am J Epidemiol 2003;157:364375
Generalized Estimating Equations 367
FIGURE 1. Calculating the variance of a weighted sum of three correlated random variables, y1 to y3. Var[w1y1 + w2y2 + w3y3] is a function of
1) the weights, w1 to w3, 2) the standard deviations, 1 to 3, and 3) the matrix of the pairwise correlations, R11 to R33all shown on the left side
of the figure (part a) (R11 = R22 = R33 = 1). The variance of the weighted sum is the sum of the nine products (3 3 = 9) shown on the right (part b).
Am J Epidemiol 2003;157:364375
368 Hanley et al.
One can show that its variance, 2(1 + 2w2 + 2Rw2)/(1 + 2w)2, circumflex or hat over denotes an estimate of it, calcu-
is lowest when w = 1/(1 + R). For example, if R = 0.5, the lated from data). As was the practice in the pre-least-squares
optimal (relative) weights are in the ratio 1:(2/3):(2/3), and the era (14), one can combine the three separate estimating
variance is (3/7)2, smaller than that of the competitors. equations: ysingleton = 0, ysib1 = 0, and ysib2 = 0, using
More generally, suppose the sample of n consists of the weights wsingleton, wsib1, and wsib2, to obtain a single esti-
several sets of children from 1-, 2-, , k-child households, mating equation
with the same and the same pairwise within-household
correlation R in all households, regardless of size. If the w singleton ( y singleton ) + w sib1 ( y sib1 ) + w sib2 ( y sib2 ) =0.
responses are ordered by household, then the n n correla-
wy w
tion matrix consists of several repetitions of various block-
diagonal patterns, as in figure 2. One can show by calculus
In this simple case, = ---------- =
w -------
w
y.
that the optimal weights for combining the responses of indi- In this didactic example, the value of R used to construct
vidual children from households of sizes 1, 2, 3, ..., k are 1, the weights was considered known; in practice, it must be
1/(1 + R), 1/(1 + 2R), ..., 1/(1 + {k 1}R). These values can estimated, along with . The process is illustrated in figure 4,
be obtained by summing the entries in any row or column of using a total of five observations (n = 5) from two clusters.
the inverses of the 1 1, 2 2, ..., k k submatrices in the Beginning with R = 0, one calculates five weights and, from
overall n n block-diagonal matrix used in the GEE equa- them, an estimate of ; from the degree of similarity of the
tions (see next subsection). within-cluster residuals, one obtains a new estimate, r, of R.
With data from paired organs, all clusters are of size k = The cycle is repeated until the estimates stabilizethat is,
2. Rosner and Milton (9) illustrate this idea of effective until convergence is achieved.
sample size using responses of a persons left and right eyes The estimating equation for the parameter has an obvious
to the same treatment: If these have a correlation of 0.54, then form. Equations for multiple regression parametersrepre-
200 eyes, two from each of 100 persons, contribute the senting absolute or relative differences in means, proportions,
statistical equivalent of one-eye contributions from each of and ratesare formed by adapting the (iteratively re)weighted
130 persons (200 1/(1 + 0.54) = 130). The closer the corre- least squares equations used to obtain maximum likelihood
lation is to 1, the closer the effective sample size is to 100. parameter estimates from uncorrelated responses (3).
Estimation by GEE: the EE in GEE Estimation of a proportion (or odds) rather than a mean:
the G in GEE
In the n = 3 example in figures 2 and 3, consisting of just
one household of size k = 1 and one of size k = 2, each y is a Figure 5 shows the GEE estimation of the expected
separate legitimate (unbiased) estimator, , of (the proportion P from 0/3 and 4/5 positive responses in eight
Am J Epidemiol 2003;157:364375
Generalized Estimating Equations 369
subjects in two households. The weights are 1/(1 + 2R) and The sum of the eight weights, 0.53 each for the three
1/(1 + 4R) for the individuals in households of size three and persons in household 1 and 0.36 each for the five persons in
five, respectively. The final estimate of P is p = 0.42, corre- household 2, can be viewed as the effective sample size of
sponding to r = 0.45. It is a weighed average of the eight 0s 3.39. Estimation of logit[P] = log[P/(1 P)] involves the
and 1s, with weights of 1/(1 + 2r) = 0.53 for each of the same core calculations.
three responses in household 1 and 1/(1 + 4r) = 0.36 for each If P is different for different covariate patterns or strata,
of the five responses in household 2; that is, then the unit variance 2 = P(1 P) is no longer homoge-
neous. Nonconstant variances can be allowed for by incorpo-
0 0.53 3 + { 0 0.36 1 + 1 0.36 4 } rating a function of 2 into the weight for each observation
p = ----------------------------------------------------------------------------------------------------------
0.53 3 + 0.36 5 (this is the basis of the iteratively reweighted least squares
algorithm used with the usual logistic regression for uncorre-
lated responses).
1.44 Indeed, using different weights for each of n uncorrelated
= ---------- = 0.42.
3.39 outcomes allows a unified approach to the maximum likeli-
Am J Epidemiol 2003;157:364375
370 Hanley et al.
hood estimation of a family of generalized linear models the model used does not fully specify the distribution of the
(15, 16). Parameters are fitted by minimizing the weighted responses in each cluster.
sum of squared residuals, using functions of the 2s as
weights. For binomial and Poisson responses, where 2 is a
Standard errors: model-based or data-based
function of the mean, weights are reestimated after each iter-
ation. With GLIM software (17), Wacholder (18) illustrated (empirical)?
how the risk difference, risk ratio, and odds ratio are esti- Two versions of the standard error are available for
mated using the identity, log, and logit links, respectively.
accompanying GEE estimates. The difference between them
This unified approach to uncorrelated responses has since
become available in most other statistical packages. GEE can be illustrated using the previously cited estimate, p =
implementations for correlated data use this same unified 0.42, of the parameter P. The model-based standard error
approach but use a quasi-likelihood rather than a full likeli- is based on the estimated (exchangeable) correlation r =
hood approach (3). Since correct specification of the mean 0.45. This in turn implies the effective sample size of 3.39
and variance functions is sufficient for unbiased estimates, (w = 3 0.53 + 5 0.36 = 1.59 + 1.80 = 3.39) shown above
Am J Epidemiol 2003;157:364375
Generalized Estimating Equations 371
and in the last footnote of figure 5. Thus, based on the bino- such instances, within-cluster resampling, coupled with the
mial model, use of a generalized linear model for uncorrelated data (13),
provides more valid confidence intervals than GEE.
SEmodel-based(p) = {p (1 p)/w}1/2
= {0.42 0.58/3.39}1/2 = 0.27. APPLICATION
Figure 6 showsfor the lower socioeconomic status
The empirical or robust standard error uses the actual group in table 1the various estimates of , the average z
variations in the cluster-level statistics, that is, the p1 = 0/3 = score, and p, the proportion of children with z scores less
0 and p2 = 4/5 = 0.8, and the effective sizes of the than 1. The GEE estimate = 1.02, based on an estimated
subsamples r of 0.50, is a weighted average, with heights of children in
households of size 1, 2, 3, ... receiving weights of 1, 0.67,
0.50, ... . Thus, for estimating , the 90 children (in 34 house-
SEempirical(p) = {[1.592(0 0.42)2 holds) constitute an effective sample size of 48.3 unre-
+ 1.802(0.8 0.42)2]/3.392}1/2 = 0.28. lated individuals. The proportion p = 0.542 is obtained
similarly, with weights calculated from r = 0.45.
The model-based and empirical standard errors agree to
Unless data are sparse, the empirical standard error is two decimal places in the case of GEE and differ only
likely to be more trustworthy than the model-based one. slightly (7.0 percent vs. 7.2 percent) in the case of the
Agreement between the model-based and empirical standard proportion p. Of interest is the fact that the empirical stan-
errors suggests that the assumed correlation structure is dard error of 7.2 percent is identical to that calculated from
reasonable. However, the robust variance estimator, also the variance formula for a proportion estimated from a
known as the sandwich estimator, was developed for cluster sample in the classic survey sample textbook (22).
uncorrelated observations, and its theoretical behavior with Figure 7 contrasts the Low and High socioeconomic status
correlated data has only recently received attention (19). groups with respect to , mean height, and P, the proportion
Methods designed to improve on the poor performance in of children with z scores less than 1. We can estimate a
small samples (20) include bias-correction and explicit difference by subtracting the specific estimates, and we can
small-sample adjustments, that is, use of t rather than z (21). estimate its standard error from the rules for the variance of
A second concern has been the case in which the cluster size a difference between two independent estimates. Alterna-
itself is related to the outcome and so is nonignorable. In tively, the difference can be estimated as the coefficient of
Am J Epidemiol 2003;157:364375
372 Hanley et al.
the indicator variable IHigh (1 if high socioeconomic status, 0 riances, and the (7.02 + 8.22)1/2 = 10.8 percent obtained from
if not) in a regression model applied to the combined data. the two separate standard errors are nearly identical.
For height measured quantitatively, the intercept represents In the above examples, groups can be compared directly.
the mean of low socioeconomic status children, and the coef- However, to assess trends in responses over levels of one or
ficient of IHigh represents the L-H difference in means. more quantitative variables measured at a cluster level (here,
GEE estimates of the proportions are shown in the right household level), a regression approach is more practical.
half of figure 7. The proportions are compared using various Since GEE analysis is carried out at the child level, it can
regression forms applied to the combined data. The slight also include covariates, such as illness histories, that differ
discrepancy between the difference of the separately esti- from child to child within a household.
mated group-specific proportions and the difference
obtained directly from the regression model stems from the DISCUSSION
fact that the latter uses a common covariance rather than two
separate covariances. The 11.0 percent standard error of the This orientation focused on correlated data arising from
difference in proportions, calculated using the pooled cova- the relatedness of several individuals in the same cluster,
Am J Epidemiol 2003;157:364375
Generalized Estimating Equations 373
rather than several longitudinal observations in the same in table 2. If all N clusters are sufficiently large, one can fit
individual. We chose examples that 1) could also be handled an unconditional logistic regression model to the data. If
by classical methods and 2) were small enough to hand- clusters are small, one can avoid fitting one nuisance param-
calculate the weights induced by the correlations. These eter per cluster (and the consequent bias in the estimated
weights are used both to generate parameter estimates and to parameter of interest) and instead fit a more economical
calculate standard errors. Although they are nuisance param- conditional logistic regression model, using each cluster as a
eters, the correlations do provide for efficient estimates of set. The appropriate logistic regression model recovers
the primary parameters and for accurate quantification of the common within-cluster ratio of 9, as does the nonregres-
their precision. sion Mantel-Haenszel approach. However, the GEE
The GEE approach differs in a fundamental conceptual approach, with clusters identified as such, yields an odds
way from the techniques included under the rubric of ratio of only 5.4. The 5.4 contrasts the P1 for an individual
random-effects, multilevel, and hierarchical models selected randomly from the population with the P0 for
(e.g., the MIXED and NLMIXED procedures in SAS, MLn another individual selected randomly from the population,
(23, 24), or other software described in the paper by Burton that is, without matching on cluster. In addition, this
et al. (5)). Besides the seeking of more efficient estimators of population averaged measure, from the marginal model
regression parameters, the main benefit of GEE is the (5) used in the GEE approach, is specific to the mix of clus-
production of reasonably accurate standard errors, hence ters studied. In contradistinction to this, the odds ratio of 9
confidence intervals with the correct coverage rates. The contrasts the P1 for an individual with the P0 for another indi-
procedures in the other set of techniques explicitly model vidual from the same cluster.
and estimate the between-cluster variation and incorporate The subtleties of combining ratios, where the rules for
this, and the residual variance, into standard errors. The GEE collapsibility vary with the comparative measure (25),
method does not explicitly model between-cluster variation; have long been recognized; indeed, the example of
instead it focuses on and estimates its counterpart, the combining a 1 percent versus 5 percent contrast in one
within-cluster similarity of the residuals, and then uses this stratum (odds ratio ~ 5) and a 95 percent versus 99 percent
estimated correlation to reestimate the regression parameters contrast in the other (again, odds ratio ~ 5) was used by
and to calculate standard errors. With GEE, the computa- Mantel and Haenszel (26, p. 736). Gail et al. (27) used the
tional complexity is a function of the size of the largest even more extreme example with odds ratios of 9 (as in table
cluster rather than of the number of clustersan advantage, 2) to show how a covariate omitted from a regression anal-
and a source of reliable estimates, when there are many small ysis can lead to attenuated estimates of what the authors call
clusters. a nonlinear comparative parameter (such as the odds ratio
However, because the GEE approach does not contain and the hazard ratio), even ifas in table 2it is balanced
explicit terms for the between-cluster variation, the resulting across the compared levels of the factor.
parameter estimates for the contrast of interest do not have The above extreme examples are quite hypothetical. In
the usual keeping other factors constant interpretation. To practice, with much less variation in P0 across clusters, the
appreciate this, consider the (admittedly extreme) situation discrepancy is usually relatively minor. The discrepancy
Am J Epidemiol 2003;157:364375
374 Hanley et al.
does not arise with absolute differences, since, with balanced 5. Burton P, Gurrin L, Sly P. Extending the simple linear regres-
sample sizes, the difference in an aggregate is the aggregate sion model to account for correlated responses: an introduction
of the within-cluster differences. Table 2 confirms this, to generalized estimating equations and multi-level mixed mod-
showing that the GEE approach, with the identity link, accu- elling. Stat Med 1998;17:126191.
6. Horton NJ, Lipsitz SR. Review of software to fit generalized esti-
rately recovers the common 40 percent risk difference
mating equation regression models. Am Stat 1999;53:1609.
within each cluster. 7. Forrester JE, Scott ME, Bundy DA, et al. Predisposition of indi-
Unfortunately, as currently implemented in most software, viduals and families in Mexico to heavy infection with Ascaris
the GEE approach cannot handle several levels of clustering/ lumbricoides and Trichuris trichiura. Trans R Soc Trop Med
hierarchy, such as households selected from randomly Hyg 1990;84:2726.
selected villages that in turn were selected from selected coun- 8. Colton T. Statistics in medicine. Boston, MA: Little, Brown
ties. For binary responses, it is possible to use alternating and Company, 1974.
logistic regression (28), an extension of GEE, implemented in 9. Rosner B, Milton RC. Significance testing for correlated binary
S-PLUS, to model different correlations at different levels, but outcome data. Biometrics1988;44:50512.
this procedure is not yet available in SPSS, Stata, and SAS 10. Donner A, Klar N. Methods for comparing event rates in inter-
vention studies when the unit of allocation is a cluster. Am J
implementations of GEE. Likewise, unlike multilevel models,
Epidemiol 1994;140:27989.
the GEE approach cannot accommodate both cluster-specific 11. Stansfield SK, Pierre-Louis M, Lerebours G, et al. Vitamin A
intercepts and slopes in longitudinal data. supplementation and increased prevalence of childhood diar-
In our height example, several children within the house- rhoea and acute respiratory infections. Lancet 1993;342:57882.
hold are measured cross-sectionally, that is, just once, each
Am J Epidemiol 2003;157:364375
Downloaded from http://aje.oxfordjournals.org/ by guest on September 12, 2016
Generalized Estimating Equations 375
APPENDIX
Am J Epidemiol 2003;157:364375