Biol/Stat 2250

Exam 2010

Instructors: Julie Horrocks, Beren Robinson

Your Name Student ID

Time Limit:
• 2 hours

Aids permitted:
• Course notes, handouts, your own personal notes, material printed from website (including old
tests and assignments), stats textbooks, scientific calculator

• NO graphing calculators, computers, cell phones, etc.

Directions:
• SHOW YOUR WORK for part marks!
• The exam has 7 pages.
• There are 8 major questions on 6 pages after this one.
• The exam is marked out of 60 [grades per question are shown in square brackets].
• For short and long answer questions, a better answer gets a higher mark.
• If you need more space to organize your thoughts, use the back of the previous page.
• Answers must be given in the spaces provided.

Grades:

Question   Grade
1          /12
2          /4
3          /8
4          /6
5          /1
6          /10
7          /10
8          /9
Total      /60

1. Black wheatears are small birds of Spain and Morocco. Males of the species demonstrate an exaggerated
sexual display by carrying many heavy stones to the nesting cavities. A study was done to determine whether
males that carry heavier stones are healthier. The response variable is a measure of health called tcell, which is
related to strength of the immune system. The explanatory variable, mass, represents the mass of the stone in
grams. A plot of the data and output of a simple linear regression model are shown below. [from Exercise 7.29 in
Statistical Sleuth].

Coefficients:
              Value  Std.Error  t value  Pr(>|t|)
(Intercept)  0.0875     0.0787   1.1121    0.2800
mass         0.0328     0.0106   3.0843    0.0061

Residual standard error: 0.08102 on 19 df
Multiple R-Squared: 0.3336
F-statistic: 9.513 on 1 and 19 df, the p-value is 0.006105

a) [4] Comment on whether the assumptions required for linear regression appear to be satisfied.

b) [3] What conclusions can you draw about the relationship between tcell response and the mass of
stones carried? (address significance, direction and magnitude).

c) [2] Give a formula for a (2-sided) 95% confidence interval for the slope parameter.

d) [1] Is the estimated value for the intercept meaningful for this data set? Explain.

e) [1] What is the estimated regression equation? Please give numerical values for the parameter
estimates and use variable names.

f) [1] What is the predicted value of tcell when mass=7?
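
For checking the arithmetic in parts (c) and (f), the following is a minimal Python sketch, not a model answer. It assumes the usual interval of the form estimate ± t(0.975, df) × SE with the 19 residual df shown in the output. The critical value 2.093 is hardcoded from a standard t table rather than computed. The function and variable names are illustrative only.

```python
# Values copied from the regression output in Question 1.
intercept = 0.0875
slope = 0.0328
se_slope = 0.0106

# Assumption: two-sided 95% critical value of t with 19 df, read from a t table.
t_crit = 2.093


def slope_ci(estimate, se, t_value):
    """Return (lower, upper) for estimate +/- t_value * se."""
    margin = t_value * se
    return estimate - margin, estimate + margin


def predict_tcell(mass):
    """Fitted tcell from the estimated line: intercept + slope * mass."""
    return intercept + slope * mass


lower, upper = slope_ci(slope, se_slope, t_crit)
print(f"95% CI for slope: ({lower:.4f}, {upper:.4f})")
print(f"Predicted tcell at mass = 7: {predict_tcell(7):.4f}")
```

An interval for the slope that excludes zero agrees with the small p-value reported for mass in the output.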

2. A study was designed to test the effects of a pesticide (factor A, with 3 levels of formulation) and aerial
application procedures (factor B, with 2 levels) on spruce budworm abundance in a Canadian forest. A
total of 72 forest sites were randomly selected. Each of the three pesticide formulations (A) was applied
to 24 randomly selected forest sites. For each pesticide formulation, half of the 24 sites were randomly
chosen to be treated by one application method (B) and the remaining half were treated by the other
application method.
a. [1] How many experimental units are there for each level of A: pesticide formulation?

b. [1] How many experimental units are there for each level of B: aerial application?

c. [2] What will be the numerator and denominator DF for testing the interaction of A and B?

The numerator DF are ______, and the denominator DF are ______.

3. Canopy cover (X) and undergrowth density (Y) were recorded at each of n = 27 randomly selected
sites in a forest. Pearson’s correlation coefficient was found to be r = -0.67.
A test will be conducted with
H0: There is no association between canopy cover and undergrowth density, versus
HA: There is an association between canopy cover and undergrowth density.
Determine the strength of the evidence against H0 using the t_data method shown in the course, using the
following steps:

a. [1] Provide the formula for the test statistic:

b. [2] The numeric value for the test statistic is ______ with ______ DF.

c. [2] Assuming that the P-value for the evidence against H0 is P = 0.0034, the biological
conclusion is:

d. [2] What assumptions are required for the above test to be valid? Give at least two assumptions.

e. [1] The proportion of variation in undergrowth density that is related to canopy cover in this
dataset is:
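
As a way to check the numbers in parts (b) and (e), here is a short Python sketch. It assumes that the course's test statistic is the standard t statistic for a correlation, t = r·sqrt(n - 2) / sqrt(1 - r²) with n - 2 DF, and that the proportion of variation in part (e) is r². The function name `correlation_t` is hypothetical.

```python
import math

r = -0.67   # Pearson correlation given in Question 3
n = 27      # number of forest sites


def correlation_t(r, n):
    """t statistic for H0: no association, with n - 2 degrees of freedom."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r**2)
    return t, df


t_data, df = correlation_t(r, n)
r_squared = r**2  # proportion of variation in Y related to X

print(f"t = {t_data:.3f} with {df} DF")
print(f"r^2 = {r_squared:.4f}")
```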

4. Phosphorus is implicated in the invasion of native vegetation by exotic weeds. Clements (1983)
investigated how phosphorus varies with topographic location and soil type, in an area around Sydney,
Australia. Two types of SOIL (shale-derived and sandstone-derived) and four different topographies
(TOPO) (valleys, north-facing slopes, south-facing slopes and hilltops) were examined. There were
three plots in each of the eight combinations of soil type and topography. The response variable was
total phosphorus per plot in ppm.

The complete data set is shown below:

                   Topographies (TOPO)
SOIL        VALLEY   NORTH   SOUTH   HILLTOP
SHALE           98      78     117        83
               172      77      54        12
               185     100      96        14
SANDSTONE       19      27      28        55
                39      49      53        21
                25      24      72        19

a. [5] Fill in the missing values in the following ANOVA table.

Source Df Sum of Squares Mean Square F Value Pr>F
TOPO 969 323 0.0679
SOIL 1788 22.92 0.0002
SOIL*TOPO 0.0135
Residuals 1245 78
Total 5141

b. [1] Does the effect of Topography (TOPO) depend on soil type? Support your answer.
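
The missing ANOVA entries in part (a) follow from the design and from the printed values, and the arithmetic can be sketched in Python as below. The sketch rests on two labeled assumptions. First, the printed 1788 is taken as the SOIL sum of squares (with 1 DF it equals the mean square either way). Second, the component sums of squares are assumed to add up to the printed total, as they do in a balanced design.

```python
# Entries printed in the ANOVA table of Question 4.
# Assumption: 1788 is the SOIL sum of squares.
ss = {"TOPO": 969.0, "SOIL": 1788.0, "Residuals": 1245.0}
ss_total = 5141.0

# Design from the question text: 2 soils x 4 topographies x 3 plots each.
a, b, reps = 2, 4, 3
df = {
    "SOIL": a - 1,
    "TOPO": b - 1,
    "SOIL*TOPO": (a - 1) * (b - 1),
    "Residuals": a * b * (reps - 1),
}
df_total = a * b * reps - 1

# Assumption: the interaction sum of squares is what remains of the total.
ss["SOIL*TOPO"] = ss_total - ss["TOPO"] - ss["SOIL"] - ss["Residuals"]

# Mean square = SS / DF; each F value divides by the residual mean square.
ms = {src: ss[src] / df[src] for src in df}
f_values = {src: ms[src] / ms["Residuals"] for src in df if src != "Residuals"}

for src in ("TOPO", "SOIL", "SOIL*TOPO", "Residuals"):
    f_text = f"{f_values[src]:.2f}" if src in f_values else ""
    print(f"{src:10s} {df[src]:3d} {ss[src]:8.0f} {ms[src]:8.2f} {f_text:>6s}")
print(f"{'Total':10s} {df_total:3d} {ss_total:8.0f}")
```

The table rounds the residual mean square to 78, so F values computed from the unrounded value can differ slightly in the second decimal place from the printed 22.92.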

5. [1] A study estimated the relationship between age (days) and body weight in pigeons from hatching
to molting in a wild population. Five pigeon eggs were randomly sampled from the population and
grown under standardized natural conditions. At hatching (day 0) and every second day thereafter until
molting (day 28), each fledgling was weighed. Simple linear regression was used to assess the
relationship between the body weights of chicks at each time and age (n=75 observations). Identify the
flaw in this analysis.

6. Circle True or False.


a. [1] Contingency table analyses (chi-squared tests) become unreliable when any category’s expected
value is zero.
True False

b. [1] Measuring a covariate after applying the treatment is always as good as measuring the covariate
before applying the treatment in ANCOVA.

True False

c. [1] A completely randomized design (CRD) requires as much work to set up as a randomized
complete block design (RCBD), all else being equal.

True False

d. [1] The purpose of a randomized complete block design is to remove the effects of a categorical
confounding variable (or a continuous confounding variable classified into categories) in the
study.

True False

e. [1] ‘Randomization’ in the randomized complete block design refers to the choice of blocks
used.

True False

f. [1] In regression, interpolation refers to prediction within the range of observed data values.

True False

g. [1] One important purpose of good experimental design in multi-factor studies is to add or include
variation in each factor that is independent of all other factors of interest.

True False

h. [1] A principal goal of both ANCOVA and RCBD ANOVA analyses is to reduce the error sums of
squares in the analysis.

True False

i. [1] The number of possible interactions in a 3 factor analysis is three.

True False

j. [1] Suppose that the 95% confidence interval for a coefficient (β) in a multiple linear regression is
(-2.0, 3.2). From this we can infer that the true coefficient is significantly different from 0.

True False

7. Researchers were interested in determining if hair color was linked to gender in humans. A 2 by 4
contingency table is provided below showing the frequency of individuals from a random sample of
humans by sex and hair color category.
Sex      Black   Brown   Blond   Red   Total
Male        32      43      16     9     100
Female      55      65      64    16     200
Total       87     108      80    25     300

a. [2] Provide the relevant null and alternate hypotheses.

b. [2] Provide the formula and calculate the expected value for the Female Blond category only, if
H0 is true.

c. [1] Provide the formula for determining X²_data.

d. [1] The DF for X²_data is:

e. [2] Provide the formula for a residual, and complete the table of residual values for this data set
if the expected value for the Male Blond category is 26.67

Formula:

Sex      Black   Brown   Blond    Red
Male      0.56    1.17   _____   0.23
Female   -0.39   -0.83    1.46  -0.16

f. [2] The value of X²_data in this case was 8.987 with a P-value = 0.029. Provide your biological
conclusions, and describe the direction of the effect.
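
The calculations behind parts (b) to (e) can be checked with the Python sketch below. It assumes the usual expected count under independence (row total × column total / grand total). It also assumes that the residual is the Pearson residual, (observed - expected) / sqrt(expected). That definition reproduces the residuals printed in the table, but the course may name or define it differently.

```python
import math

# Observed counts from the contingency table in Question 7.
hair = ["Black", "Brown", "Blond", "Red"]
observed = {
    "Male": [32, 43, 16, 9],
    "Female": [55, 65, 64, 16],
}

row_totals = {sex: sum(counts) for sex, counts in observed.items()}
col_totals = [sum(observed[sex][j] for sex in observed) for j in range(len(hair))]
grand_total = sum(row_totals.values())

# Expected count under independence: row total * column total / grand total.
expected = {
    sex: [row_totals[sex] * col_totals[j] / grand_total for j in range(len(hair))]
    for sex in observed
}

# Assumed residual definition (Pearson): (observed - expected) / sqrt(expected).
residuals = {
    sex: [
        (observed[sex][j] - expected[sex][j]) / math.sqrt(expected[sex][j])
        for j in range(len(hair))
    ]
    for sex in observed
}

# The chi-squared statistic is the sum of the squared Pearson residuals.
chi2 = sum(res**2 for sex in residuals for res in residuals[sex])
df = (len(observed) - 1) * (len(hair) - 1)

print(f"Expected Female Blond: {expected['Female'][2]:.2f}")
print(f"Residual Male Blond: {residuals['Male'][2]:.2f}")
print(f"X2 = {chi2:.3f} with {df} DF")
```

The cells with the largest absolute residuals contribute most to the statistic, which is where to look when describing the direction of the effect in part (f).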

8. To determine if UVB radiation inhibits phytoplankton growth, researchers sampled the
ocean column at 2 depths and 17 locations around Antarctica during the austral spring of 1990 (from
Statistical Sleuth, ex1026). The variable INHIBIT represents the percentage of inhibition of normal
phytoplankton production at each site. The variable UVB is a measure of UVB exposure. DEEPV is an
indicator variable constructed by the data analyst that takes the value 1 if the observation was taken
deep in the ocean column, and 0 if the observation was taken at the surface of the ocean. The data are
shown below:

     INHIBIT     UVB   DEPTH     DEEPV
 1       0.0  0.0000   DEEP          1
 2       1.0  0.0000   DEEP          1
 3       6.0  0.0100   DEEP          1
 4       7.0  0.0150   SURFACE       0
 5       7.0  0.0185   SURFACE       0
 6       7.0  0.0335   SURFACE       0
 7       9.0  0.0435   SURFACE       0
 8       9.5  0.0090   DEEP          1
 9      10.0  0.0025   DEEP          1
10      11.0  0.0255   SURFACE       0
11      12.5  0.0280   SURFACE       0
12      14.0  0.0055   DEEP          1
13      20.0  0.0285   DEEP          1
14      21.0  0.0435   SURFACE       0
15      25.0  0.0180   DEEP          1
16      39.0  0.0325   DEEP          1
17      59.0  0.0300   DEEP          1

A General Linear Model was fit to the data and the output is shown below:
INHIBIT ~ UVB * DEEPV
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.967 9.726 0.305 0.7651
UVB 258.936 309.612 0.836 0.4181
DEEPV -1.467 10.538 -0.139 0.8914
UVB:DEEPV 980.039 381.539 2.569 0.0234
Residual standard error: 8.521 on 13 degrees of freedom
Multiple R-squared: 0.7289, Adjusted R-squared: 0.6663
F-statistic: 11.65 on 3 and 13 DF, p-value: 0.0005498

a. [2] Write down an equation for the full regression model.

b. [2] Write down separate equations for the two depths.



c. [2] Make a sketch showing the geometry of this model with UVB on the x axis and INHIBIT on
the y axis. Be sure to label lines and axes.

d. [3] Does UVB affect phytoplankton growth? If so, how?
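
For part (b), the separate lines for the two depths follow from setting DEEPV to 0 or 1 in the fitted model. The Python sketch below shows that arithmetic with the estimates from the output, assuming the model INHIBIT = b0 + b1·UVB + b2·DEEPV + b3·UVB·DEEPV implied by the formula INHIBIT ~ UVB * DEEPV. The helper name `line_for_depth` is hypothetical.

```python
# Coefficient estimates copied from the output for INHIBIT ~ UVB * DEEPV.
coef = {
    "(Intercept)": 2.967,
    "UVB": 258.936,
    "DEEPV": -1.467,
    "UVB:DEEPV": 980.039,
}


def line_for_depth(deepv):
    """Return (intercept, slope) of the fitted INHIBIT-vs-UVB line for DEEPV = 0 or 1."""
    intercept = coef["(Intercept)"] + coef["DEEPV"] * deepv
    slope = coef["UVB"] + coef["UVB:DEEPV"] * deepv
    return intercept, slope


for label, deepv in (("SURFACE", 0), ("DEEP", 1)):
    b0, b1 = line_for_depth(deepv)
    print(f"{label}: INHIBIT = {b0:.3f} + {b1:.3f} * UVB")
```

The two lines differ in both intercept and slope, which is the geometry that part (c) asks to be sketched.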
