106 Project 1

Bailey Wang
Professor Melcon
STA106
Introduction:
This experiment tests whether sparrows of different weights are attracted to nests of
different sizes on Kent Island. In particular, it asks whether heavy sparrows are attracted only
to large nests or whether they are also attracted to small or average nests, and more generally
how sparrows of different weights pick their nests. The results will therefore help us understand
the sparrows’ nesting behavior.
The approaches used for testing are: hypothesis testing, confidence intervals,
Tukey-Kramer confidence intervals, pairwise confidence intervals, power, the Shapiro-Wilk test,
the ANOVA F-test, and the Brown-Forsythe test.
Summary:
The boxplot shows a few outliers in the lower tail, which suggests that the data are
skewed. The histogram confirms that the data are left-skewed, meaning that the mean is less
than the median. The normal Q-Q plot supports this, since its tails depart from the reference
line. Therefore, the normality assumption appears to be violated.
According to the plot of errors against fitted values, the spread of the reduced group is
much smaller than that of the control and enlarged groups. The sample standard deviations agree:
the control and enlarged groups (2.42 and 2.10) vary noticeably more than the reduced group
(1.46). The constant variance assumption is therefore questionable and requires a more
objective test.
The overall sample mean is 14.134. The reduced mean is 15.569, the enlarged mean is 13.516,
and the control mean is 13.924. Thus the enlarged and control sample means are close to the
overall mean, while the reduced mean is noticeably higher. Similarly, the enlarged and control
groups’ standard deviations are similar to the overall standard deviation, while the reduced
group’s is not.
          Sample Means  Sample SDs  Sample Sizes
Total     14.13448      2.242596    116
Reduced   15.56923      1.459252    26
Enlarged  13.51556      2.103996    45
Control   13.92444      2.419631    45
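As a quick consistency check on the table above, the overall mean should equal the sample-size-weighted average of the three group means. The sketch below uses only the numbers reported in the table; the vector names are illustrative and not part of the original analysis.

```r
# Group summaries copied from the table (order: reduced, enlarged, control)
group.mean = c(reduced = 15.56923, enlarged = 13.51556, control = 13.92444)
group.n    = c(reduced = 26, enlarged = 45, control = 45)

# Overall mean as the weighted average of the group means
overall.from.groups = sum(group.n * group.mean) / sum(group.n)
overall.from.groups
```

The result should agree with the "Total" row up to rounding of the group means.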
Diagnostics:
We assume that each sparrow chose its nest at random. Assuming nest size was the only
thing manipulated, each sparrow’s choice was most likely unaffected by other sparrows’ choices,
so the observations can be treated as independent. However, nests are located throughout Kent
Island, and if a nest was already taken, that could have affected other sparrows’ choices. The
previous plots also suggest that the normality and equal variance assumptions do not hold.
Despite these violations, we proceed as if those assumptions hold for our testing.
Analysis:
The null hypothesis is that the mean weight of sparrows is the same in every treatment group.
The alternative hypothesis is that at least one of those mean weights is different.
$H_0: \mu_{control} = \mu_{enlarged} = \mu_{reduced}$
$H_A$: At least one $\mu_i$ does not equal the others
There are 5 outliers in the reduced group, 2 outliers in the enlarged group, and 2 outliers
in the control group according to the boxplot. Unfortunately, we cannot determine the causes of
these outliers, so we decided to leave them in the data.
All tests will use $\alpha = 0.05$.
F-test:
           DF   Sum Sq  Mean Sq  F value  P(>F)
Treatment  2    72.74   36.37    8.1288   .0005031
Residuals  113  505.62  4.474
By the ANOVA F-test, we reject H0 and conclude that at least one of the mean weights
of sparrows is different.
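The F statistic and p-value can be recomputed from the sums of squares and degrees of freedom reported in the ANOVA table. This is a minimal sketch using base R's pf(); the variable names are illustrative.

```r
# Sums of squares and degrees of freedom as reported in the ANOVA table
SSTR = 72.74;  df.trt = 2
SSE  = 505.62; df.err = 113

MSTR = SSTR / df.trt   # treatment mean square
MSE  = SSE / df.err    # error mean square
F.stat = MSTR / MSE    # ANOVA F statistic

# Upper-tail probability of the F distribution under H0
p.value = pf(F.stat, df.trt, df.err, lower.tail = FALSE)
c(F.stat = F.stat, p.value = p.value)
```

Since the p-value falls well below 0.05, this recomputation leads to the same decision to reject H0.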
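Tukey: $T^* = q(1-\alpha;\, a,\, n_T - a)/\sqrt{2} = 2.374$
Bonferroni: $B = t(1-\alpha/(2g);\, n_T - a) = 2.4301$
Scheffe: $S = \sqrt{(a-1)\,F(1-\alpha;\, a-1,\, n_T - a)} = 2.4805$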
Comparing the Tukey, Bonferroni, and Scheffe multipliers, Tukey has the smallest value and
therefore gives the narrowest intervals, so we use Tukey as our multiplier.
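The three multipliers can be reproduced with base R quantile functions. The sketch below assumes a = 3 groups, a total sample size of 116, g = 3 pairwise comparisons, and a significance level of 0.05, which match the values used in the R code at the end of this report.

```r
alpha = 0.05
a  = 3      # number of treatment groups
nt = 116    # total sample size
g  = 3      # number of pairwise comparisons

Tuk = qtukey(1 - alpha, a, nt - a) / sqrt(2)        # Tukey multiplier
B   = qt(1 - alpha / (2 * g), nt - a)               # Bonferroni multiplier
S   = sqrt((a - 1) * qf(1 - alpha, a - 1, nt - a))  # Scheffe multiplier

round(c(Tukey = Tuk, Bonferroni = B, Scheffe = S), 4)
```

The smallest multiplier yields the narrowest simultaneous intervals, which is why it is preferred.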
Tukey’s Confidence Interval
                  Diff      Lower     Upper    Decision   Adj p-value
Control-Enlarged  .40889    -.65023   1.468    FTR H0     .630776
Control-Reduced   -1.64479  -2.88236  -.40721  Reject H0  .005765
Enlarged-Reduced  -2.05368  -3.29125  -.81611  Reject H0  .000412
Control vs. reduced: we reject H0 and conclude that the true mean weight of sparrows in
the control group is lower than that in the reduced group by between 0.41 and 2.88.
Control vs. enlarged: we fail to reject H0 and conclude that the mean weights of the
sparrows in the control group and the enlarged group are not significantly different.
Enlarged vs. reduced: we reject H0 and conclude that the true mean weight of sparrows in
the enlarged group is lower than that in the reduced group.
Shapiro-Wilk
H0: The data is normally distributed
Ha: The data is not normally distributed
W = .93236
p-value: $2.761 \times 10^{-5}$
By the Shapiro-Wilk test, we reject H0 and conclude that the errors do not follow a normal
distribution.
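To illustrate how the test behaves on left-skewed data, here is a small sketch that applies shapiro.test() to a simulated sample. The simulated values are hypothetical and are not the sparrow data; the seed and the distribution are arbitrary choices made only for the illustration.

```r
set.seed(106)  # arbitrary seed for reproducibility

# Hypothetical left-skewed sample of the same size as the study (NOT the sparrow data)
sim.weight = 16 - rexp(116, rate = 0.5)

sw = shapiro.test(sim.weight)
sw$statistic  # W; values close to 1 are consistent with normality
sw$p.value    # a small p-value leads us to reject normality
```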
Brown-Forsythe
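$H_0: \sigma^2_{control} = \sigma^2_{enlarged} = \sigma^2_{reduced}$
$H_a$: At least one $\sigma^2_i$ is not equal
$F_{BF}$ = 1.7494
       DF   F value  P(>F)
Group  2    1.7494   .1786
       113
By the Brown-Forsythe test, we fail to reject H0; there is not sufficient evidence that the
group variances differ.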
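The Brown-Forsythe statistic is a one-way ANOVA F statistic computed on the absolute deviations of each observation from its group median. The sketch below computes it by hand in base R on hypothetical simulated data (not the sparrow data), and then recovers the p-value implied by the reported statistic; bf.test is an illustrative helper name, not a function from the original analysis.

```r
# Brown-Forsythe by hand: one-way ANOVA on |y - group median|
bf.test = function(y, group){
  group = factor(group)
  d = abs(y - ave(y, group, FUN = median))  # absolute deviations from group medians
  tab = anova(lm(d ~ group))
  c(F.stat = tab[1, "F value"], p.value = tab[1, "Pr(>F)"])
}

# Hypothetical data with the same group sizes as the study (NOT the sparrow data)
set.seed(106)
sim.group = rep(c("control", "enlarged", "reduced"), times = c(45, 45, 26))
sim.y = rnorm(116, mean = 14, sd = 2)
sim.result = bf.test(sim.y, sim.group)
sim.result

# p-value implied by the reported statistic 1.7494 on (2, 113) degrees of freedom
reported.p = pf(1.7494, 2, 113, lower.tail = FALSE)
reported.p
```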
Power = 0.9547
The power of the test is the probability that H0 is rejected when it is false.
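The power can be approximated from the summary statistics alone by treating the observed group means as the true means and using a non-central F distribution. This sketch mirrors the give.me.power function in the R code at the end of the report, but uses the rounded group means from the summary table, so the result may differ slightly in the last digits.

```r
# Group summaries from the summary table and MSE from the ANOVA table
ybar = c(control = 13.92444, enlarged = 13.51556, reduced = 15.56923)
ni   = c(control = 45, enlarged = 45, reduced = 26)
MSE  = 505.62 / 113
alpha = 0.05

a  = length(ybar)                  # number of groups
nt = sum(ni)                       # total sample size
overall.mean = sum(ni * ybar) / nt

# Non-centrality parameter: between-group sum of squares divided by MSE
ncp = sum(ni * (ybar - overall.mean)^2) / MSE

Fc = qf(1 - alpha, a - 1, nt - a)           # critical value under H0
power = 1 - pf(Fc, a - 1, nt - a, ncp)      # P(reject H0) under the alternative
power
```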
Confidence Intervals:
$(\bar{Y}_{i.} - \bar{Y}_{j.}) \pm T^* \sqrt{MSE\,(1/n_i + 1/n_j)}$
Control: (13.17554,14.67335)
Control vs Enlarged: (-.65022,1.4680)
Control vs Reduced: (-2.8823,-.40721)
The reduced group has the largest true mean sparrow weight of the three groups. The true
mean weight of sparrows in the control group is between .4072 and 2.8823 lower than in the
reduced group. Finally, the control group and the enlarged group are not significantly
different.
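The pairwise intervals above can be reproduced from the group means, group sizes, MSE, and the Tukey multiplier. The sketch below uses the rounded values from the summary and ANOVA tables, so small differences in the last digits are expected; pair.CI is an illustrative helper, not a function from the original analysis.

```r
ybar = c(control = 13.92444, enlarged = 13.51556, reduced = 15.56923)
ni   = c(control = 45, enlarged = 45, reduced = 26)
MSE  = 505.62 / 113
Tuk  = qtukey(0.95, 3, 113) / sqrt(2)   # Tukey multiplier at the 95% level

# Interval for the difference in means between group i and group j
pair.CI = function(i, j){
  est = ybar[[i]] - ybar[[j]]
  SE  = sqrt(MSE * (1 / ni[[i]] + 1 / ni[[j]]))
  c(Estimate = est, Lower = est - Tuk * SE, Upper = est + Tuk * SE)
}

control.vs.enlarged = pair.CI("control", "enlarged")
control.vs.reduced  = pair.CI("control", "reduced")
rbind(control.vs.enlarged, control.vs.reduced)
```

An interval that contains 0 corresponds to failing to reject H0 for that pair.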
Interpretation:
We are 95% confident that the difference in true mean sparrow weight between the control
and reduced groups (control minus reduced) is between -2.8823 and -.40721. We also find, with
95% confidence, no significant difference in true mean weight between the control and
enlarged groups. By the Brown-Forsythe test, there is not sufficient evidence that the variances
are unequal. From the Shapiro-Wilk test, we conclude that the errors do not follow a
normal distribution.
Conclusions:
The ANOVA normality assumption is violated according to the Shapiro-Wilk test, and the
plots make the constant variance assumption questionable even though the Brown-Forsythe test
did not reject it, so our conclusions may not be dependable. With that caveat, the ANOVA F-test
indicates that the true mean weight of sparrows differs for at least one group. Sparrows in the
reduced nest group have a significantly higher true mean weight than those in the control group.
The enlarged group is not significantly different from the control group, so this treatment
shows no detectable effect. The data contain outliers, shown by the boxplot, and a single
outlier identified from the standardized residuals; we are uncertain of the causes of those
outliers. Because the ANOVA assumptions were not fully met, the results should be interpreted
with caution.
R-Code:
data1 = read.csv("~/Desktop/RStudio/Data106/sparrow.csv")
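library("car")     # leveneTest()
library("ggplot2") # ggplot(), qplot()
library("asbio")   # bonfCI(), scheffeCI(), tukeyCI() (assumed source of these functions)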
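# Boxplot of weight by treatment group (named box.plot so it does not mask boxplot())
box.plot = ggplot(data1, aes(y = Weight, x = Treatment)) +
  geom_boxplot() + ylab("Weight") +
  xlab(" ") + coord_flip() +
  ggtitle("Treatment vs Weight")
box.plot
# Histogram of weight, one panel per treatment group (named hist.plot so it does not mask ggplot())
hist.plot = ggplot(data1, aes(x = Weight)) +
  geom_histogram(binwidth = 2, color = "black", fill = "white") +
  facet_grid(Treatment ~ .) +
  ggtitle("Treatment vs Weight")
hist.plot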
meanreduced=mean(data1$Weight[data1$Treatment == "reduced"])
meanenlarged=mean(data1$Weight[data1$Treatment == "enlarged"])
meancontrol=mean(data1$Weight[data1$Treatment == "control"])
sdreduced=sd(data1$Weight[data1$Treatment == "reduced"])
sdenlarged=sd(data1$Weight[data1$Treatment == "enlarged"])
sdcontrol=sd(data1$Weight[data1$Treatment == "control"])
# Confidence interval for a contrast sum(ci * mu_i) using the given multiplier
give.me.CI = function(ybar,ni,ci,MSE,multiplier){
  if(sum(ci) != 0 & sum(ci !=0 ) != 1){
    return("Error - you did not input a valid contrast")
  } else if(length(ci) != length(ni)){
    return("Error - not enough contrasts given")
  } else{
    estimate = sum(ybar*ci)
    SE = sqrt(MSE*sum(ci^2/ni))
    CI = estimate + c(-1,1)*multiplier*SE
    result = c(estimate,CI)
    names(result) = c("Estimate","Lower Bound","Upper Bound")
    return(result)
  }
}
# Power of the ANOVA F-test, treating the observed group means as the true means
give.me.power = function(ybar,ni,MSE,alpha){
  a = length(ybar) # Finds a
  nt = sum(ni) # Finds the overall sample size
  overall.mean = sum(ni*ybar)/nt # Finds the overall mean
  phi = (1/sqrt(MSE))*sqrt( sum(ni*(ybar - overall.mean)^2)/a) # Finds the book's value of phi
  phi.star = a *phi^2 # Finds the value of phi we will use for R
  Fc = qf(1-alpha,a-1,nt-a) # The critical value of F, use in R's function
  power = 1 - pf(Fc, a-1, nt-a, phi.star) # The power, calculated using a non-central F
  return(power)
}
the.model = lm(Weight~Treatment, data = data1)

anova.table = anova(the.model)
data1$ei = the.model$residuals
group.means = by(data1$Weight,data1$Treatment,mean)
group.nis = by(data1$Weight,data1$Treatment,length)
nt = nrow(data1) #Calculates the total sample size
a = length(unique(data1$Treatment)) #Calculates the value of a
SSE = sum(data1$ei^2) #Sums and squares the errors (finds SSE)
MSE = SSE/(nt-a) #Finds MSE
eij.star = the.model$residuals/sqrt(MSE)
alpha = 0.05
t.cutoff= qt(1-alpha/(2*nt), nt-a)
CO.eij = which(abs(eij.star) > t.cutoff)
CO.eij
rij = rstandard(the.model)
CO.rij = which(abs(rij) > t.cutoff)
CO.rij
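# Candidate outlier sets that were considered; only CO3 is removed below
CO1 = which(data1$Weight > 15)
CO2 = which(data1$Weight < 13)
CO3 = which(data1$Treatment=="reduced" & data1$Weight < 14)
CO4 = which(data1$Treatment == "control" & data1$Weight < 9)
CO5 = which(data1$Treatment == "enlarged" & data1$Weight < 9)
Outliers = CO3
# Negative indexing with an empty index vector would drop every row, so guard against it
new.data = if (length(Outliers) > 0) data1[-Outliers,] else data1
new.model = lm(Weight~Treatment,data = new.data)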
g=3
B = qt(1-alpha/(2*g),nt-a)
Tuk = qtukey(1-alpha,a,nt-a)/sqrt(2)
S = sqrt((a-1)*qf(1-alpha, a-1, nt-a))
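# Groups are ordered alphabetically: control, enlarged, reduced
give.me.CI(group.means,group.nis,c(1,0,0),MSE,Tuk)  # control mean
give.me.CI(group.means,group.nis,c(1,-1,0),MSE,Tuk) # control - enlarged
give.me.CI(group.means,group.nis,c(1,0,-1),MSE,Tuk) # control - reduced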
bonfCI(data1$Weight,data1$Treatment, conf.level = 1-alpha)
scheffeCI(data1$Weight,data1$Treatment, conf.level = 1-alpha)
tukeyCI(data1$Weight,data1$Treatment, conf.level = 1-alpha)
the.power = give.me.power(group.means,group.nis,MSE,0.05)
overall.mean = sum(group.means*group.nis)/sum(group.nis)
#gammai = group.means - mean(group.means)
n = as.numeric(table(data1$Treatment))
gammai = group.means - sum(group.means*n)/(sum(n))
qqnorm(new.model$residuals)
qqline(new.model$residuals)
ei = new.model$residuals
the.SWtest = shapiro.test(ei)
the.SWtest
plot(new.model$fitted.values, new.model$residuals, main = "Errors vs. Group Means",
     xlab = "Group Means",ylab = "Errors",pch = 19)
abline(h = 0,col = "purple")
qplot(Treatment, ei, data = new.data) +
  ggtitle("Errors vs. Fitted Values") +
  xlab("Group") + ylab("Errors") +
  geom_hline(yintercept = 0,col = "purple")
boxplot(ei ~ Treatment, data = new.data, main = "Residual vs Treatment")
the.BFtest = leveneTest(ei~ Treatment, data=data1, center=median)

p.val = the.BFtest[[3]][1]
p.val
