
Hw1

6.1.1

The mean weight grows steadily over time in every group. The rate of change appears roughly constant and similar across groups for the first couple of weeks, after which the groups diverge. All groups also appear to start from the same baseline weight, so we assume a common initial weight for all groups.
6.1.2
First few rows in long format
id group week y
1 1 0 57
1 1 1 86
1 1 2 114
1 1 3 139
1 1 4 172
2 1 0 60
2 1 1 93
2 1 2 123


6.1.3
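Since all groups appear to start from the same baseline weight, we consider a linear trend model with a common intercept and group-specific slopes.
So the model is:

$$E(Y_{ij}) = \beta_1 + \beta_2\,\mathrm{week}_{ij} + \beta_3\,\mathrm{week}_{ij}\,G_{2i} + \beta_4\,\mathrm{week}_{ij}\,G_{3i}$$

Here $G_{ki} = 1$ if the ith subject belongs to group $k$ and 0 otherwise ($k = 2, 3$; group 1 is the reference group).
Note that estimation is done using REML, and an unstructured covariance matrix is estimated along with the mean parameters.
Under this model, the required test is:

$$H_0: \beta_3 = \beta_4 = 0 \quad \text{versus} \quad H_1: \beta_3 \neq 0 \ \text{or} \ \beta_4 \neq 0$$

We use Wald's chi-square test, and the p-value << 0.01, so we conclude that the mean weight trends of the groups differ from one another. To conduct the test, a user-defined function WaldTest is defined first. For details please refer to the code in the Appendix.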
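For reference, the Wald statistic for a linear hypothesis of the form $H_0: C\beta = 0$ has the standard form sketched below. The symbols are notation introduced here to mirror the arguments of WaldTest in the Appendix: $C$ is an $r \times p$ contrast matrix with linearly independent rows, $\hat\beta$ is the vector of estimated coefficients, and $\hat V$ is its estimated covariance matrix.

```latex
W = (C\hat\beta)^{\top} \left( C \hat V C^{\top} \right)^{-1} (C\hat\beta),
\qquad
W \overset{\text{approx.}}{\sim} \chi^2_{r} \quad \text{under } H_0 .
```

The degrees of freedom $r$ equal the number of constraints being tested, that is, the number of rows of $C$ (two for the test of equal slopes).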
6.1.4

Estimated Mean Wgt plot

6.1.5
Group              Estimated rate of increase
1 (No additive)    26.21
2 (Thiouracil)     19.11
3 (Thyroxin)       24.12


6.1.6
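This conjecture corresponds to a quadratic model with a negative quadratic coefficient. For simplicity of interpretation we do not center the time variable.
So the new model can be proposed as follows:

$$E(Y_{ij}) = \beta_1 + \beta_2\,\mathrm{week}_{ij} + \beta_3\,\mathrm{week}_{ij}^2 + \beta_4\,\mathrm{week}_{ij}\,G_{2i} + \beta_5\,\mathrm{week}_{ij}^2\,G_{2i} + \beta_6\,\mathrm{week}_{ij}\,G_{3i} + \beta_7\,\mathrm{week}_{ij}^2\,G_{3i}$$

where $G_{ki} = 1$ if the ith subject belongs to group $k$ and 0 otherwise.
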
Test of hypothesis:

.

In this case it is implied that it can either be 0 or negative.
The p-value is 0.0026, so we reject the null hypothesis in favor of the conjecture. Note that if the test had been carried out for group 2 and group 3 only, the p-value would have been 0.522, and we would fail to reject the null hypothesis.
6.1.7
Comparing the results of the last two analyses, it seems that in general a quadratic trend may be necessary for modeling such longitudinal data. For group 1 and group 3, linear trends seem to be good enough.
6.1.8
As far as effectiveness is concerned, adding either additive decreases the estimated rate of weight gain relative to the group with no additive (26.21). With thyroxin the growth remains fairly steady (24.12), while with thiouracil the rate of gain decreases significantly (19.11).


7.1.1

Although boys and girls have different distances throughout, the boys' distance appears to grow faster than the girls' from approximately age 10 onward.

7.1.2
First couple of rows:

id gender Year y
1 F 8 21
2 F 8 21
3 F 8 20.5
4 F 8 23.5
5 F 8 21.5
6 F 8 20

7.1.3
Let us denote

model1: unstructured
model11: heterogeneous compound symmetry
model12: compound symmetry
model21: heterogeneous AR(1)
model22: AR(1)
Note that these models are nested: model1 ⊃ model11 ⊃ model12 and model1 ⊃ model21 ⊃ model22.
Results with the saturated mean model:

        Model df      AIC      BIC    logLik   Test  L.Ratio p-value
model1      1 18 450.0348 496.9279 -207.0174
model11     2 13 447.4236 481.2908 -210.7118 1 vs 2 7.3888   0.1933
The two models are not significantly different, so the more parsimonious model11 is preferred.

        Model df      AIC      BIC    logLik   Test  L.Ratio  p-value
model11     1 13 447.4236 481.2908 -210.7118
model12     2 10 443.4085 469.4602 -211.7043 1 vs 2 1.984932 0.5755
Again the two models are not significantly different, so model12 is preferred.

        Model df      AIC      BIC    logLik   Test  L.Ratio  p-value
model1      1 18 450.0348 496.9279 -207.0174
model21     2 13 458.5028 492.3700 -216.2514 1 vs 2 18.46803 0.0024
Here there is a significant difference between the two, so model21 is not an adequate simplification of model1.
So the best model is model12.
We also do the following comparison, although it is not immediately required.

        Model df      AIC      BIC    logLik   Test  L.Ratio  p-value
model21     1 13 458.5028 492.3700 -216.2514
model22     2 10 454.5472 480.5989 -217.2736 1 vs 2 2.044335 0.5633
Note also that model12 has the minimum AIC (443.41) among the five models.
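As a check on how the entries in these comparisons relate to one another, the following minimal sketch recomputes the likelihood ratio statistic, its p-value, and the AIC for the model1 versus model11 comparison, using only the log-likelihoods and parameter counts reported above. The variable names are illustrative and are not part of the original analysis.

```r
# Values copied from the anova() output reported above
loglik_full    <- -207.0174  # model1 (unstructured), 18 parameters
loglik_reduced <- -210.7118  # model11 (heterogeneous compound symmetry), 13 parameters
df_full    <- 18
df_reduced <- 13

# Likelihood ratio statistic, referred to a chi-square distribution
# with df equal to the difference in the number of parameters
lr_stat <- 2 * (loglik_full - loglik_reduced)
lr_df   <- df_full - df_reduced
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# AIC = -2 * logLik + 2 * (number of parameters)
aic_full <- -2 * loglik_full + 2 * df_full

print(round(c(L.Ratio = lr_stat, df = lr_df, p.value = p_value,
              AIC.model1 = aic_full), 4))
```

The same arithmetic applies to the other pairs; only the log-likelihoods and parameter counts change.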
7.1.4
Coefficients:
                                Value Std.Error   t-value p-value
(Intercept)                 21.181818 0.6915345 30.630168  0.0000
year.f10                     1.045455 0.5992479  1.744611  0.0841
year.f12                     1.909091 0.5992479  3.185812  0.0019
year.f14                     2.909091 0.5992479  4.854570  0.0000
as.factor(gender)M           1.693182 0.8983297  1.884811  0.0624
year.f10:as.factor(gender)M -0.107955 0.7784458 -0.138680  0.8900
year.f12:as.factor(gender)M  0.934659 0.7784458  1.200673  0.2327
year.f14:as.factor(gender)M  1.684659 0.7784458  2.164131  0.0328
From the results, the gender main effect (p = 0.0624) and the interaction terms at ages 10 and 12 are not significant at the 5% level; only the age-14 interaction is (p = 0.0328).
So the patterns appear largely the same for both genders, with some evidence of a difference at age 14.
7.1.5
We can read the estimated means directly from the fitted values. From the R output we get the following (the last column is the fitted value):
      id gender Year    y year.f time   fitted
1.8    1      F    8 21.0      8    1 21.18182
1.10   1      F   10 20.0     10    2 22.22727
12.8  12      M    8 26.0      8    1 22.87500
12.10 12      M   10 25.0     10    2 23.81250
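The same four estimated means can be reproduced by hand from the coefficient table in 7.1.4. This is a minimal sketch, assuming the reference levels are age 8 and female, as the coefficient names suggest. The vector names are illustrative only.

```r
# Coefficients copied from the 7.1.4 table (reference levels: age 8, female)
b <- c(intercept = 21.181818, year10 = 1.045455,
       male = 1.693182, year10_male = -0.107955)

# Each estimated mean is the sum of the coefficients that apply to that cell
est <- c(
  F_age8  = b[["intercept"]],
  F_age10 = b[["intercept"]] + b[["year10"]],
  M_age8  = b[["intercept"]] + b[["male"]],
  M_age10 = b[["intercept"]] + b[["year10"]] + b[["male"]] + b[["year10_male"]]
)
print(round(est, 5))
```

These sums agree with the fitted values in the last column of the output above.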

7.1.6
Coefficients:
                            Value Std.Error   t-value p-value
(Intercept)             17.372727 1.1835071 14.679023  0.0000
Year                     0.479545 0.0934699  5.130481  0.0000
as.factor(gender)M      -1.032102 1.5374208 -0.671321  0.5035
Year:as.factor(gender)M  0.304830 0.1214209  2.510519  0.0136

In this case the gender main effect is not significant (p = 0.5035), but the Year-by-gender interaction is significant at the 5% level (p = 0.0136), which suggests that the slopes differ by gender.
7.1.7


Note that the values on the Y axis are close together, so the two fitted lines are very close to one another and are almost parallel.

7.1.8
For a male at age 8, the estimated mean is (Intercept) + Male + 8*Year + 8*Male:Year (since Male is coded as 1).
The other estimated means can be formulated similarly.
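As an illustration of how these expressions evaluate, the sketch below plugs the coefficients from the 7.1.6 table into the linear-trend formula, with the intercept included and female as the reference level. The helper est_mean is hypothetical and is not part of the original analysis.

```r
# Coefficients copied from the 7.1.6 table (linear trend in Year, female as reference)
coefs <- c(intercept = 17.372727, year = 0.479545,
           male = -1.032102, year_male = 0.304830)

# est_mean is a hypothetical helper: estimated mean distance at a given age.
# male is 1 for boys and 0 for girls.
est_mean <- function(age, male) {
  coefs[["intercept"]] + coefs[["male"]] * male +
    coefs[["year"]] * age + coefs[["year_male"]] * age * male
}

ages <- c(8, 10, 12, 14)
means <- data.frame(age = ages,
                    girls = est_mean(ages, 0),
                    boys  = est_mean(ages, 1))
print(means)
```

For girls the gender and interaction terms drop out, so the mean at age 8 reduces to the intercept plus 8 times the Year coefficient.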

7.1.9
Looking at the p-values in the model summaries, a model with a linear trend appears to be good enough. One can carry out a Wald test to confirm this.
7.1.10
From a similar analysis with the outlier removed (see the Appendix), we would choose the unstructured (unrestricted) covariance model.
                                Value Std.Error   t-value p-value
as.factor(gender)M           1.693182 0.9114718  1.857635  0.0662
year.f10:as.factor(gender)M -0.107955 0.7994953 -0.135028  0.8929
year.f12:as.factor(gender)M  0.459625 0.6917906  0.664399  0.5080
year.f14:as.factor(gender)M  1.684659 0.8741224  1.927258  0.0568
We see a similar pattern here. All the interaction terms and the gender term are non-significant at the 5% level for the categorical year model.
                            Value Std.Error   t-value p-value
(Intercept)             17.488562 1.1192522 15.625221  0.0000
Year                     0.473712 0.0980725  4.830223  0.0000
as.factor(gender)M      -0.870734 1.4558096 -0.598110  0.5511
Year:as.factor(gender)M  0.320955 0.1274410  2.518461  0.0133
For the model with a continuous Year variable, the interaction term is significant at the 5% level (p = 0.0133) but not at the 1% level.
So for continuous year, whether a model without a gender difference in slope is treated as adequate depends on the significance level one demands.
7.1.11
Overall, the mean distances remain fairly similar for both genders and can be adequately modeled by linear trends; that said, these linear trends may differ slightly by gender.
Appendix: R code
######################### Problem 6.1 ################
require(foreign)
tlc <- read.dta("http://www.hsph.harvard.edu/fitzmaur/ala2e/tlc.dta")
ds <- read.dta("http://www.hsph.harvard.edu/fitzmaur/ala2e/rat.dta")
str(ds)

## convert from `wide' format to `long' format
ds.l <- reshape(ds, idvar="id", varying=c("y1", "y2", "y3", "y4","y5"),
v.names="y", timevar="week", time=c(0, 1, 2, 3,4),
direction="long")
with(ds.l,{
interaction.plot(week, group, y,
xlab="Time (in weeks)", ylab="Wgt",
main="Time Plot, with Joined Line Segments",
col=c(1:50), legend=T)
})
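## sort by subject, then week, so that the rows for each rat are contiguous
ds.l <- ds.l[order(ds.l$id, ds.l$week), ]
head(ds.l,10)
summary(ds.l);
library(nlme)
ds.l$week.f <- as.factor(ds.l$week)
## integer position (1..5) of each measurement within a subject, as corSymm requires
ds.l$time <- as.integer(ds.l$week.f)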

fit <- gls(y ~ week+I(group==2):week+I(group==3):week, data = ds.l,
corr = corSymm(, form= ~ time | id),
weights = varIdent(form = ~ 1 | week.f))
summary(fit)
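beta_hat <- coef(fit)
G <- vcov(fit)
## Wald chi-square test of H0: C %*% beta = 0; returns the p-value
WaldTest <- function(C, beta, var) {
  ## allow a single contrast to be passed as a plain vector
  if (is.null(nrow(C))) {
    C <- matrix(C, nrow = 1)
  }
  L <- C %*% beta
  VarL <- C %*% var %*% t(C)
  Test <- t(L) %*% solve(VarL) %*% L
  ## degrees of freedom = number of constraints tested = number of rows of C
  Pvalue <- pchisq(Test, df = nrow(C), lower.tail = FALSE)
  return(Pvalue)
}
## rows select the slope differences for group 2 and group 3
MyC1 <- matrix(c(0,0,1,0, 0,0,0,1), nrow = 2, byrow = TRUE)
Test1Pvalue <- WaldTest(MyC1, beta_hat, G)
ds.est <- cbind(ds.l[, 1:3], fit1 = fitted(fit))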
with(ds.est,{
interaction.plot(week, group, fit1,
xlab="Time (in weeks)", ylab="Estimated Wgt",
main="Time Plot, with Joined Line Segments",
legend=T)
})
grp1=beta_hat[2];
grp2=beta_hat[3]+beta_hat[2];
grp3=beta_hat[4]+beta_hat[2];
fit2 <- gls(y ~
week+I(week^2)+I(group==2):week+I(group==2):I(week^2)+I(group==3):week+I(group==3):I(week^2),
data = ds.l,
corr = corSymm(, form= ~ time | id),
weights = varIdent(form = ~ 1 | week.f))
summary(fit2)
beta_hat=coef(fit2);
G=vcov(fit2);
MyC2= matrix(c(c(0,0,1,0,0,0,0),c(0,0,0,0,0,0,1)),nrow=2,byrow=T);
Test2Pvalue= WaldTest(MyC2,beta_hat,G);
## fitted values from the quadratic model fit2
ds.est=cbind(ds.l[,1:3],fit1=fitted(fit2));
####################### Problem 7.1 ############
ds <- read.dta("http://www.hsph.harvard.edu/fitzmaur/ala2e/dental.dta")
ds.l <- reshape(ds, idvar="id", varying=c("y1", "y2", "y3", "y4"),
v.names="y", timevar="Year", time=c(8, 10, 12, 14),
direction="long")
with(ds.l,{
interaction.plot(Year, gender, y,
xlab="Time (in years)", ylab="Mean distance",
main="Time Plot, with Joined Line Segments",
legend=T)
})

ds.l$year.f <- as.factor(ds.l$Year)
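## sort by subject, then age, so that the rows for each child are contiguous
ds.order=ds.l[order(ds.l$id, ds.l$Year),]
## integer position (1..4) of each measurement within a subject
ds.order$time <- as.integer(ds.order$year.f)
# unstructured covariance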
model1 <- gls(y ~ year.f * as.factor(gender), data = ds.order,
corr = corSymm(, form= ~ time | id),
weights = varIdent(form = ~ 1 | year.f))
# heterogeneous compound symmetry
model11 <- gls(y ~ year.f * as.factor(gender),data=ds.order,corr=corCompSymm(, form= ~ time | id),
weights = varIdent(form = ~ 1 | year.f))
#compound symmetry
model12 <- gls(y ~ year.f * as.factor(gender),data=ds.order,corr=corCompSymm(, form= ~ time| id)
)

# heterogeneous AR1
model21 <- gls(y ~ year.f * as.factor(gender),data=ds.order,corr=corAR1(, form= ~ time| id),
weights = varIdent(form = ~ 1 | year.f))
#AR1
model22 <- gls(y ~ year.f * as.factor(gender),data=ds.order,corr=corAR1(, form= ~ time | id)
)

anova(model1,model11)
anova(model11,model12)
anova(model1,model21)
anova(model21,model22)
summary(model12)
coef(model12);
ds.all=cbind(ds.order,model12$fitted)
fg=subset(ds.all, (Year==8 | Year==10));

model120 <- gls(y ~ Year* as.factor(gender),data=ds.order,corr=corCompSymm(, form= ~ time| id)
)
ds.all=cbind(ds.order,fit1=model120$fitted)
with(ds.all,{
interaction.plot(Year, gender, fit1,
xlab="Time (in Years)", ylab="Estimated Distance",
main="Time Plot, with Joined Line Segments",
legend=T)
})
# with the outlier removed (row "20.12": subject 20 at age 12)
ds.order <- ds.order[rownames(ds.order) != "20.12", ]

# unstructured covariance
model1 <- gls(y ~ year.f * as.factor(gender), data = ds.order,
corr = corSymm(, form= ~ time | id),
weights = varIdent(form = ~ 1 | year.f))
# heterogeneous compound symmetry
model11 <- gls(y ~ year.f * as.factor(gender),data=ds.order,corr=corCompSymm(, form= ~ time | id),
weights = varIdent(form = ~ 1 | year.f))
#compound symmetry
model12 <- gls(y ~ year.f * as.factor(gender),data=ds.order,corr=corCompSymm(, form= ~ time| id)
)

# heterogeneous AR1
model21 <- gls(y ~ year.f * as.factor(gender),data=ds.order,corr=corAR1(, form= ~ time| id),
weights = varIdent(form = ~ 1 | year.f))
#AR1
model22 <- gls(y ~ year.f * as.factor(gender),data=ds.order,corr=corAR1(, form= ~ time | id)
)
anova(model1,model11)
anova(model11,model12)
anova(model1,model12)
anova(model1,model21)
anova(model21,model22)
#unrestricted covariance model is chosen
summary(model1)
model10 <- gls(y ~ Year * as.factor(gender), data = ds.order,
corr = corSymm(, form= ~ time | id),
weights = varIdent(form = ~ 1 | year.f))

summary(model10)
