PERMUTATION TESTS
Example 1: ANOVA where H0 is that the treatment means are all equal.
The test assumes that the observations in each treatment have the
same variance and the same distributional shape.
If, in fact, the null hypothesis is true, then the observations are not
distinguishable by treatment but are instead from the same distribution
(one shape, mean and variance) and just happen to be randomly
associated with a treatment.
Method Under The Assumptions That The Distributions Are Identical Under
H0 And Sampling Is Random And With Replacement And Treatment
Assignment Is Random:
1) Calculate the test statistic for the hypotheses for the original observed
arrangement of data. This could be a sample correlation, an F-statistic, a
mean square (MS) or some other statistic. Call it $\kappa_0$.
2) Now, randomly rearrange the data among the treatments (shuffle or
permute the data according to the experimental design; see below for
the case of matrices) and calculate the test statistic for the new
arrangement. Call it $\kappa^*_p$.
3) Store the permutation estimate $\kappa^*_p$.
4) Repeat steps 2-3 many times. Call the total number of times you repeat
the permutations P. That is, p = 1, 2, …, P.
5) Compare $\kappa_0$ to the distribution of the permutation estimates $\kappa^*_p$. The
p-value for the test is the proportion of permutation estimates that exceed
the observed statistic:

$$p\text{-value} = \frac{\#\{\,p : \kappa^*_p > \kappa_0\,\}}{P}.$$
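The five steps above can be sketched in R for the ANOVA case. This is a minimal illustration under assumptions, not the course's own code: the data, the treatment labels and the helper name `f_stat` are made up, and the F-statistic plays the role of $\kappa$.

```r
# Permutation test for a one-way layout, using the ANOVA F-statistic.
# The observations and treatment labels are made up for illustration.
set.seed(1)
y     <- c(4.1, 5.0, 4.6, 5.3, 6.2, 6.8, 5.9, 6.5, 7.9, 8.4, 7.6, 8.1)
treat <- factor(rep(c("A", "B", "C"), each = 4))

# Hypothetical helper: F-statistic for a given assignment of data to treatments
f_stat <- function(y, treat) {
  anova(lm(y ~ treat))[1, "F value"]
}

kappa0 <- f_stat(y, treat)                 # step 1: observed statistic

P <- 999                                   # number of permutations
kappa_star <- numeric(P)
for (p in seq_len(P)) {
  y_perm <- sample(y)                      # step 2: shuffle data among treatments
  kappa_star[p] <- f_stat(y_perm, treat)   # step 3: store the permutation estimate
}                                          # step 4: repeated P times

# step 5: proportion of permutation estimates exceeding the observed statistic
p_value <- sum(kappa_star > kappa0) / P
p_value
```

Shuffling `y` while holding `treat` fixed is equivalent to shuffling the labels; either way each observation is randomly re-associated with a treatment.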
Example: The most famous use of permutation tests for ecological problems
is Mantel’s test of similarity of two symmetric matrices. Mantel’s test was
extended to allow more than 2 matrices by Smouse et al. 1986. We’ll look at
the simple case (2 matrices).
Mantel’s test is a test of the correlation between the elements in one matrix
and the corresponding elements in the other matrix, where the elements within
the matrices have been organized in a very specific way (symmetric with zeroes
on the diagonal). The original use was to compare two distance matrices, and
that is still the most common use today.
Question: Are the element-wise pairs, (a, α), (b, β), (c, χ), (d, δ), (e, ε), (f, φ),
correlated? Can we use Pearson’s correlation coefficient to test that?
Now, most of the matrices are not exactly as just shown above. More
specifically, the matrices are usually distance measures where distance is
some metric between the replicates involved in the study. For example,
matrix Y could be the number of genes not in common between sampled
animals in a study and matrix X could be the Euclidean distance between the
locations at which the animals were found.
The distance between a replicate and itself is 0 and the distances are
symmetric in the sense that the distance between F and H is the same as the
distance between H and F. So commonly we have matrices with the structure
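(rows and columns of both matrices are indexed by animals 1, 2, 3)

$$
Y = \begin{bmatrix} 0 & b & c \\ b & 0 & e \\ c & e & 0 \end{bmatrix},
\qquad
X = \begin{bmatrix} 0 & \beta & \chi \\ \beta & 0 & \varepsilon \\ \chi & \varepsilon & 0 \end{bmatrix}
$$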
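Because the matrices are symmetric with a zero diagonal, each pair of animals needs to be counted only once. A small sketch, with made-up values and the illustrative names `Yvector` and `Xvector`, of how the element-wise pairs might be pulled from the lower triangle and correlated:

```r
# Two small symmetric matrices with zero diagonals (made-up values)
Y <- matrix(c(0, 2, 5,
              2, 0, 4,
              5, 4, 0), nrow = 3, byrow = TRUE)
X <- matrix(c(0, 1, 3,
              1, 0, 2,
              3, 2, 0), nrow = 3, byrow = TRUE)

# Each distinct pair of animals appears exactly once below the diagonal
Yvector <- Y[lower.tri(Y)]
Xvector <- X[lower.tri(X)]

cbind(Yvector, Xvector)   # the pairs (b, beta), (c, chi), (e, epsilon)
cor(Yvector, Xvector)     # Pearson correlation of the paired elements
```

The correlation itself is computed as usual; what changes in Mantel's test is how its significance is judged, since the elements of a distance matrix are not independent observations.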
Matrixsize = 13   # number of rows and columns
# X1 matrix: log(distance); symmetric with zeroes on the diagonal
logDist=matrix(c(
0.00, 0.556, 0.607, 0.653, 0.708, 3.097, 3.097, 3.097, 3.097, 3.076, 3.076, 3.076, 3.076,
0.556, 0.00, 0.161, 0.279, 0.398, 3.097, 3.097, 3.097, 3.097, 3.076, 3.076, 3.076, 3.076,
0.607, 0.161, 0.00, 0.161, 0.312, 3.097, 3.097, 3.097, 3.097, 3.076, 3.076, 3.076, 3.076,
0.653, 0.279, 0.161, 0.000, 0.204, 3.097, 3.097, 3.097, 3.097, 3.076, 3.076, 3.076, 3.076,
0.708, 0.398, 0.312, 0.204, 0.000, 3.097, 3.097, 3.097, 3.097, 3.076, 3.076, 3.076, 3.076,
3.097, 3.097, 3.097, 3.097, 3.097, 0.000, 1.959, 1.959, 1.959, 1.820, 1.820, 1.820, 1.820,
3.097, 3.097, 3.097, 3.097, 3.097, 1.959, 0.000, 0.886, 0.896, 1.820, 1.820, 1.820, 1.820,
3.097, 3.097, 3.097, 3.097, 3.097, 1.959, 0.886, 0.000, 0.072, 1.820, 1.820, 1.820, 1.820,
3.097, 3.097, 3.097, 3.097, 3.097, 1.959, 0.896, 0.072, 0.000, 1.820, 1.820, 1.820, 1.820,
3.076, 3.076, 3.076, 3.076, 3.076, 1.820, 1.820, 1.820, 1.820, 0.000, 1.390, 1.405, 1.412,
3.076, 3.076, 3.076, 3.076, 3.076, 1.820, 1.820, 1.820, 1.820, 1.390, 0.000, 0.270, 0.356,
3.076, 3.076, 3.076, 3.076, 3.076, 1.820, 1.820, 1.820, 1.820, 1.405, 0.270, 0.000, 0.149,
3.076, 3.076, 3.076, 3.076, 3.076, 1.820, 1.820, 1.820, 1.820, 1.412, 0.356, 0.149, 0.000),
nrow=Matrixsize, ncol=Matrixsize)
So, permute Y by randomly rearranging the columns and then arranging the
rows to match the random rearrangement of the columns:
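> # A and temp as implied by the output below; temp is one random ordering,
> # such as sample(3) might return
> A <- matrix(c(11, 12, 13, 21, 22, 23, 31, 32, 33), nrow = 3, byrow = TRUE)
> temp <- c(2, 3, 1)
> Aperm <- A[, temp]       # rearrange the columns
> Aperm
     [,1] [,2] [,3]
[1,]   12   13   11
[2,]   22   23   21
[3,]   32   33   31
> Aperm <- Aperm[temp, ]   # rearrange the rows to match
> Aperm
     [,1] [,2] [,3]
[1,]   22   23   21
[2,]   32   33   31
[3,]   12   13   11
> Aperm <- A[temp, temp]   # both at once; preserves the symmetry of the matrix
> Aperm
     [,1] [,2] [,3]
[1,]   22   23   21
[2,]   32   33   31
[3,]   12   13   11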
Then do the permutations and get the resulting set of correlations. Compare
the original correlation against the distribution of correlations obtained
from the permuted matrices.
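Putting the pieces together, a Mantel test might be sketched as below. The function name `mantel_perm` and the simulated matrices are assumptions made for illustration; the p-value is the one-sided proportion of permuted correlations exceeding the observed one.

```r
# Hypothetical helper: Mantel's test for two symmetric matrices Y and X
mantel_perm <- function(Y, X, P = 999) {
  low <- lower.tri(Y)                  # use each pair of replicates once
  r0  <- cor(Y[low], X[low])           # observed correlation
  r_star <- numeric(P)
  for (p in seq_len(P)) {
    temp  <- sample(nrow(Y))           # random ordering of the replicates
    Yperm <- Y[temp, temp]             # permute rows and columns together
    r_star[p] <- cor(Yperm[low], X[low])
  }
  list(r0 = r0, r_star = r_star, p_value = sum(r_star > r0) / P)
}

# Made-up example: 8 locations; Y is built from slightly perturbed locations,
# so the two distance matrices should be strongly correlated
set.seed(42)
pts <- matrix(runif(16), ncol = 2)
X   <- as.matrix(dist(pts))
Y   <- as.matrix(dist(pts + rnorm(16, sd = 0.05)))

res <- mantel_perm(Y, X)
res$r0
res$p_value
```

Calling `hist(res$r_star)` and then `abline(v = res$r0)` shows where the observed correlation falls relative to the permutation distribution.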
[Figure: frequency distribution]
Pearson’s correlation assumes that the relationship, if it exists, is linear.
Is that the case here? Change cor(Jvector,X1vector) to
cor(rank(Jvector),rank(X1vector)), which gives Spearman’s rank correlation,
and rerun the above code.
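As a quick check of what the rank substitution does, the sketch below uses two made-up vectors (not the course data) with a monotone but nonlinear relationship. Correlating the ranks matches R's built-in Spearman option.

```r
# Made-up vectors with a monotone, nonlinear relationship
Xvector <- c(0.2, 0.5, 0.9, 1.4, 2.0, 3.1)
Yvector <- exp(Xvector)

pearson  <- cor(Yvector, Xvector)                       # sensitive to curvature
spearman <- cor(rank(Yvector), rank(Xvector))           # correlation of the ranks
builtin  <- cor(Yvector, Xvector, method = "spearman")  # same quantity

c(pearson = pearson, spearman = spearman, builtin = builtin)
```

The rank version measures monotone association, so it is less affected by curvature in the relationship than Pearson's coefficient.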
Mantel Correlogram
In order to study the structure of the Y matrix (usually the one of interest)
with respect to “distances” in the other matrix, it is of interest to look at the
correlation among values of Y for specific sets of “distances” in X. This is a
case of looking at AUTOcorrelation among subsets of values within a matrix
rather than correlation between two different variables. The correlogram is a
graphic displaying the autocorrelation for those different subsets. For
example, suppose I am interested in the autocorrelation among the
dissimilarities of the copepods as a function of log(distance). The way to do
that is to create a set of non-overlapping distance classes (called lag
distances).
First, I need to create the set of lag distances: (>0 – 1), (1 – 2), and (> 2).
# Lag distance matrix
lagDistMatrix=matrix(c(
0, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3,
1, 0, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3,
1, 1, 0, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3,
1, 1, 1, 0, 1, 3, 3, 3, 3, 3, 3, 3, 3,
1, 1, 1, 1, 0, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 0, 2, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 2, 0, 1, 1, 2, 2, 2, 2,
3, 3, 3, 3, 3, 2, 1, 0, 1, 2, 2, 2, 2,
3, 3, 3, 3, 3, 2, 1, 1, 0, 2, 2, 2, 2,
3, 3, 3, 3, 3, 2, 2, 2, 2, 0, 2, 2, 2,
3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 0, 1, 1,
3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 0, 1,
3, 3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 1, 0),
nrow=Matrixsize, ncol=Matrixsize)
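Rather than typing the lag distance matrix by hand, it could be derived from the log-distance matrix. The sketch below uses a small made-up matrix `d` and assumes the classes are closed on the right, that is (0, 1], (1, 2] and (2, Inf); the source does not say which class a value of exactly 1 or 2 belongs to.

```r
# Small made-up log-distance matrix (symmetric, zero diagonal)
d <- matrix(c(0.00, 0.56, 3.10, 1.82,
              0.56, 0.00, 3.10, 1.82,
              3.10, 3.10, 0.00, 1.96,
              1.82, 1.82, 1.96, 0.00), nrow = 4, byrow = TRUE)

# Code the classes (0, 1], (1, 2], (2, Inf) as 1, 2, 3; the diagonal stays 0
lag <- matrix(0, nrow(d), ncol(d))
off <- d > 0                                    # off-diagonal entries
lag[off] <- as.integer(cut(d[off], breaks = c(0, 1, 2, Inf)))
lag
```

With `cut()`, the default `right = TRUE` gives right-closed intervals; setting `right = FALSE` would move boundary values into the next class up.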
Then, for each lag distance, I need to create another matrix of 0s and 1s,
where a 0 indicates that the distance falls within the lag class and a 1
indicates that it does not. Now perform Mantel’s test on Y and this
indicator matrix. Repeat until all lag classes have been done.
# For example: Lag 1 matrix
lagDistMatrix1 = matrix(c(
0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0),
nrow=Matrixsize, ncol=Matrixsize)
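The indicator matrix for any lag class can also be generated from the lag distance matrix. A minimal sketch, using a made-up 4 x 4 lag matrix and a hypothetical helper named `lag_indicator`:

```r
# Hypothetical helper: 0 where the pair falls in lag class k (and on the
# diagonal), 1 otherwise
lag_indicator <- function(lagMatrix, k) {
  ind <- ifelse(lagMatrix == k, 0, 1)
  diag(ind) <- 0
  ind
}

# Made-up lag distance matrix with classes 1, 2, 3
lag <- matrix(c(0, 1, 3, 2,
                1, 0, 3, 2,
                3, 3, 0, 2,
                2, 2, 2, 0), nrow = 4, byrow = TRUE)

lag1 <- lag_indicator(lag, 1)
lag1

# One indicator matrix per lag class, ready to be paired with Y
indicators <- lapply(1:3, function(k) lag_indicator(lag, k))
```

Each element of `indicators` would then be tested against Y in turn.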
Run Mantel’s test on each lag distance matrix and Y. We obtain the
following results:
Very positive and very negative values indicate that the farther apart the
locations are from one another, the more dissimilar the species composition
(as measured by 1-J).
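The whole correlogram procedure might be sketched as follows. Everything here is an assumption made for illustration: the simulated locations, the use of terciles to define three lag classes, the helper name `mantel_lag`, and the choice of a two-sided permutation p-value.

```r
set.seed(7)
n   <- 10
pts <- matrix(runif(2 * n), ncol = 2)                  # made-up locations
X   <- as.matrix(dist(pts))                            # distances between locations
Y   <- as.matrix(dist(pts + rnorm(2 * n, sd = 0.05)))  # made-up dissimilarities

low <- lower.tri(X)
off <- row(X) != col(X)

# Three lag classes from the terciles of the distances (an arbitrary choice)
breaks   <- unname(quantile(X[low], probs = c(0, 1/3, 2/3, 1)))
lagClass <- matrix(0, n, n)
lagClass[off] <- as.integer(cut(X[off], breaks = breaks, include.lowest = TRUE))

# Mantel correlation of Y with an indicator matrix Z, with a two-sided
# permutation p-value
mantel_lag <- function(Y, Z, P = 499) {
  r0 <- cor(Y[low], Z[low])
  r_star <- replicate(P, {
    temp <- sample(nrow(Y))
    cor(Y[temp, temp][low], Z[low])
  })
  c(r = r0, p = mean(abs(r_star) >= abs(r0)))
}

# One Mantel test per lag class
correlogram <- sapply(1:3, function(k) {
  Z <- ifelse(lagClass == k, 0, 1)   # 0 within lag class k, 1 otherwise
  diag(Z) <- 0
  mantel_lag(Y, Z)
})
colnames(correlogram) <- paste0("lag", 1:3)
correlogram
```

Plotting the `r` row against the lag class gives the correlogram. Because the indicator is 0 within the class, a positive correlation for a class means that pairs inside it tend to have smaller values of Y than pairs outside it.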