Professional Documents
Culture Documents
Chromosome Variation
M. F. Hammer,* T. Karafet,* A. Rasanayagam,* E. T. Wood,* T. K. Altheide,* T. Jenkins,
R. C. Griffiths,\ A. R. Templeton, and S. L. Zegura
*Laboratory of Molecular Systematics and Evolution and Department of Anthropology, University of Arizona; Laboratory
of Human Molecular and Evolutionary Genetics, Institute of Cytology and Genetics, Novosibirsk, Russia; Department of
Human Genetics, University of Witwatersrand, Johannesburg, South Africa; \Mathematics Department, Monash University,
Clayton, Australia; and Department of Biology, Washington University, St. Louis, Missouri
We surveyed nine diallelic polymorphic sites on the Y chromosomes of 1,544 individuals from Africa, Asia, Europe,
Oceania, and the New World. Phylogenetic analyses of these nine sites resulted in a tree for 10 distinct Y haplotypes
with a coalescence time of ;150,000 years. The 10 haplotypes were unevenly distributed among human populations:
5 were restricted to a particular continent, 2 were shared between Africa and Europe, 1 was present only in the Old
World, and 2 were found in all geographic regions surveyed. The ancestral haplotype was limited to African
populations. Random permutation procedures revealed statistically significant patterns of geographical structuring
of this paternal genetic variation. The results of a nested cladistic analysis indicated that these geographical associations arose through a combination of processes, including restricted, recurrent gene flow (isolation by distance)
and range expansions. We inferred that one of the oldest events in the nested cladistic analysis was a range expansion
out of Africa which resulted in the complete replacement of Y chromosomes throughout the Old World, a finding
consistent with many versions of the Out of Africa Replacement Model. A second and more recent range expansion
brought Asian Y chromosomes back to Africa without replacing the indigenous African male gene pool. Thus, the
previously observed high levels of Y chromosomal genetic diversity in Africa may be due in part to bidirectional
population movements. Finally, a comparison of our results with those from nested cladistic analyses of human
mtDNA and b-globin data revealed different patterns of inferences for males and females concerning the relative
roles of population history (range expansions) and population structure (recurrent gene flow), thereby adding a new
sex-specific component to models of human evolution.
Introduction
A holistic understanding of both the biocultural
processes responsible for and the actual trajectory of
human evolution ultimately requires the integration of
data from genetics, paleontology, archaeology, linguistics, and ethnohistory (Cavalli-Sforza, Menozzi, and Piazza 1994). Within genetics, autosomal and haploid systems can offer complementary, but not necessarily identical, glimpses into our evolutionary history (Jorde et al.
1995). It is also possible that maternal (mtDNA) and
paternal (the nonrecombining portion of the Y chromosome) lineages will offer different insights into the
global dispersal of Homo sapiens (Cavalli-Sforza and
Minch 1997). Stoneking (1993) underscored the growing utility of mtDNA data for understanding human maternal evolutionary pathways. Recently, Hammer and
Zegura (1996) also presented an extensive review of the
Y chromosome literature pertaining to human evolutionary studies.
In a subsequent data-oriented paper, Hammer et al.
(1997) proposed that their Y chromosome combination
haplotype distributions were compatible with multiple
human migrations. The high haplotype diversity values
found in sub-Saharan Africa were attributed to great
population antiquity and/or large effective population
size. Although most of their hypothesized dispersal
Key words: Y chromosome haplotypes, human evolution, nested
cladistic test, coalescence times, population structure.
Address for correspondence and reprints: Michael Hammer, Department EEB, Biosciences West, University of Arizona, Tucson, Arizona 85721. E-mail: mhammer@u.arizona.edu.
Mol. Biol. Evol. 15(4):427441. 1998
q 1998 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038
428
Hammer et al.
Table 1
Y Chromosome Haplotype Frequencies (%) in 35 Populations and 8 Geographic Regions (N 5 1,544)
HAPLOTYPE
POPULATION
1A
1B
1C
1D
1E
3G
3A
Sub-Saharan Africans . . . . . . . . . . .
1. Khoisan . . . . . . . . . . . . . . . . .
2. Pygmies . . . . . . . . . . . . . . . . .
3. West Africans . . . . . . . . . . . .
4. East Africans. . . . . . . . . . . . .
5. East Bantus . . . . . . . . . . . . . .
6. West Bantus . . . . . . . . . . . . .
7. Dama . . . . . . . . . . . . . . . . . . .
Europeans . . . . . . . . . . . . . . . . . . . .
8. British . . . . . . . . . . . . . . . . . .
9. Germans . . . . . . . . . . . . . . . .
10. Russians . . . . . . . . . . . . . . . .
11. Italians . . . . . . . . . . . . . . . . . .
12. Greeks . . . . . . . . . . . . . . . . . .
North Asians
13. Komi . . . . . . . . . . . . . . . . . . .
14. West Siberians . . . . . . . . . . .
15. Kets . . . . . . . . . . . . . . . . . . . .
16. Buryats . . . . . . . . . . . . . . . . .
17. Evenks . . . . . . . . . . . . . . . . . .
18. Eskimos . . . . . . . . . . . . . . . . .
Central Asians . . . . . . . . . . . . . . . . .
19. Altai. . . . . . . . . . . . . . . . . . . .
20. Mongolians . . . . . . . . . . . . . .
21. Tibetans . . . . . . . . . . . . . . . . .
East Asians . . . . . . . . . . . . . . . . . . .
22. Koreans . . . . . . . . . . . . . . . . .
23. Japanese. . . . . . . . . . . . . . . . .
24. South Chinese. . . . . . . . . . . .
25. Taiwanese . . . . . . . . . . . . . . .
26. Indonesians . . . . . . . . . . . . . .
27. Southeast Asians. . . . . . . . . .
South Asians . . . . . . . . . . . . . . . . . .
28. Indians . . . . . . . . . . . . . . . . . .
29. Sri Lankans . . . . . . . . . . . . . .
Australasians . . . . . . . . . . . . . . . . . .
30. Papua New Guineans . . . . . .
31. Melanesians. . . . . . . . . . . . . .
32. Australian Ab. People . . . . .
Native Americans . . . . . . . . . . . . . .
33. Tanana . . . . . . . . . . . . . . . . . .
34. Navajo . . . . . . . . . . . . . . . . . .
35. Amerinds. . . . . . . . . . . . . . . .
380
70
38
59
44
95
54
20
217
44
21
20
48
84
235
21
46
12
81
55
20
101
31
43
27
296
29
97
59
20
53
38
155
72
83
116
46
37
33
44
8
18
18
6
31
0
0
0
0
0
5
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
6
20
0
3
0
4
0
5
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
17
17
45
7
2
26
6
5
33
11
38
25
31
46
83
100
74
8
96
98
30
51
32
74
33
82
90
52
98
95
100
95
72
68
76
96
100
95
91
18
75
11
0
2
1
0
0
0
0
7
5
33
73
33
20
35
13
15
0
24
83
3
0
65
11
16
14
0
1
0
2
2
0
0
3
13
17
10
3
0
3
9
82
25
89
100
0*
0
0
0
0
0
2
0
12
9
19
50
2
7
2
0
2
8
1
2
5
20
48
9
4
0*
3
0
0
0
0
0
15
15
15
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
3
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
19
3
2
63
17
7
46
0
5
0
3
0
0
0
0
0
0
0
0
0
0
0
8
0
5
15
11
11
4
10
0*
0
0
0
2
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
5
10
0
9
0
4
4
5
22
7
10
5
29
33
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
57
20
50
66
86
55
78
65
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
NOTE.Frequencies were rounded off to the nearest integer. As a result, some haplotype rows do not sum to 100%. Zeros with an asterisk indicate the presence
of a haplotype at a frequency greater than 0 but less than 0.005.
DNA Extraction
All new DNA samples referred to above were provided by co-authors except the following: 4 West Africans from G. Rappold, 50 Indonesians from M. Stoneking, 56 South Asian Indians from R. Herrera, and 37
Melanesians from J. Martinson. Eleven of the new DNA
samples were isolated from cell lines in the Y Chromosome Consortium Repository, while the others were
extracted from buccal cells as described in Hammer et
al. (1997). All sampling protocols were approved by the
Human Subjects Committee at the University of Arizona.
Genotyping Y Chromosome Markers
Whitfield, Sulston, and Goodfellow (1995) identified three polymorphic nucleotide sites within the SRY
region on the short arm of the Y chromosome (Yp) and
429
Table 2
Nucleotide States at 8 Polymorphic Sites on 21 Human Y Chromosomes
SRY REGION
ORIGIN
OF INDIVIDUAL
Humans
1. European (Italian) . . . . . . . . .
2. Bougainville (Nasioi). . . . . . .
3. South American (Surui) . . . .
4. Namibian (!Kung) . . . . . . . . .
5. Zaire (Mbuti Pygmy). . . . . . .
6. South African (Xhosa). . . . . .
7. South African (Zulu) . . . . . . .
8. South African (Sotho) . . . . . .
9. Namibian (!Kung) . . . . . . . . .
10. Zaire (Mbuti Pygmy). . . . . . .
11. Zaire (Mbuti Pygmy). . . . . . .
12. African American. . . . . . . . . .
13. African American. . . . . . . . . .
14. Australian Ab. Person . . . . . .
15. Australian Ab. Person . . . . . .
16. European (Ashkenazi) . . . . . .
17. European. . . . . . . . . . . . . . . . .
18. Japanese . . . . . . . . . . . . . . . . .
19. Japanese . . . . . . . . . . . . . . . . .
20. Japanese . . . . . . . . . . . . . . . . .
21. Iraqi (Jewish) . . . . . . . . . . . . .
Outgroupb . . . . . . . . . . . . . . . . . . . . .
a
YAP REGION
4064
9138
10831
YAP
PN1
PN2
PN3
DYS257
HTYPEa
G
G
G
G
A
A
A
G
G
A
G
A
A
G
A
G
G
G
G
G
G
G
C
T
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
A
G
G
A
G
G
G
G
A
G
G
G
G
G
G
G
G
G
G
G
G
A
2
2
2
2
1
1
1
2
2
1
2
1
1
2
1
2
2
1
2
1
2
2
C
C
C
C
C
C
T
C
C
T
C
T
T
C
C
C
C
C
C
C
C
C
C
C
C
C
C
C
T
C
C
T
C
T
T
C
T
C
C
C
C
C
C
C
G
G
G
G
G
G
G
G
A
G
G
G
G
G
G
G
G
G
G
G
G
G
A
G
A
G
G
G
G
G
G
G
G
G
G
G
G
G
A
G
G
G
G
Gc
1D
1E
1C
1A
3A
3A
5
1B
2
5
1B
5
5
1B
4
1B
1C
3G
1B
3G
1B
Haplotype (Htype) refers to the combination of states found on each human Y chromosome.
We examined all seven homologous sites within the SRY and YAP regions in a sample of four common chimpanzees (for site 10831, a total of 25 common
chimpanzees was included in the sample), one pygmy chimpanzee, three gorillas, and two orangutans.
c DYS257 STS repeat sequences were obtained from outgroups by cloning and sequencing PCR products from a single male from each of the five hominoid
species. Phylogenetic analysis revealed a G at position 108 of the eight great ape STS repeats most closely related to the human sequences.
b
430
Hammer et al.
clades are now formed. This nesting procedure is repeated until a nesting level is reached such that the next
higher nesting level would result in only a single category spanning the entire original haplotype network.
The resulting nested clades are designated by C-N
where C is the nesting level of the clade and N is
the number of a particular clade at a given nesting level.
Once the haplotype tree has been converted into a
nested statistical design, the geographical data are quantified by forming two distance statistics (Templeton,
Routman, and Phillips 1995): (1) the clade distance, Dc,
which measures the geographical range of a particular
clade; and (2) the nested clade distance, Dn, which measures how a particular clade is geographically distributed
relative to its closest evolutionary sister clades (i.e.,
clades in the same next-higher-level nesting category).
Distance contrasts between older and younger clades are
important in discriminating the potential causes of geographical structuring of the genetic variation (Templeton, Routman, and Phillips 1995). In our analysis, temporal polarity is determined by outgroup comparisons.
The statistical significances of the different distance
measures and the oldyoung (interior-tip) contrasts are
determined by random permutation testing. This procedure simulates the null hypothesis of a random geographical distribution for all clades within a nesting category given the marginal clade frequencies and sample
sizes per locality. There are two major reasons for failing to reject this null hypothesis: (1) the samples are
inadequate to detect geographical structuring even
though it exists, and (2) the population is panmictic over
the sampled area such that any haplotype frequency differences are due only to sampling or drift effects that
do not result in a geographic pattern. As there is no way
of discriminating between these alternatives with the existing data, biological inference is confined to those
cases in which the null hypothesis is rejected.
If the null hypothesis is rejected, the analysis continues by seeking the biological causes of these statistically significant (i.e., nonrandom) haplotypegeography associations in terms of population structure and/or
population history considerations. Templeton, Routman,
and Phillips (1995) consider three major biological factors that can cause a significant spatial/phylogenetic association of haplotype variation: (1) recurrent genetic
drift coupled with restricted gene flow, particularly gene
flow restricted by isolation by distance; (2) past fragmentation events; and (3) population range expansion.
The detailed expected impacts of these types of evolutionary forces and events on the nested patterns of geographical distances are given in Templeton, Routman,
and Phillips (1995), and an empirical validation is given
in Templeton (1998a). In order to make this pattern
evaluation explicit and consistent, a detailed inference
key is provided as an appendix to Templeton, Routman,
and Phillips (1995). The nested clade analysis does not
treat these patterns as mutually exclusive, but instead
searches for multiple overlaying patterns within the
same data set. The aforementioned key also incorporates
the types of pattern artifacts that can arise from inadequate sampling. As a consequence, even though statis-
431
(1)
and m is the total mutation rate per sequence per generation. The mean and standard deviation of the time to
the most recent common ancestor (TMRCA) and the
ages of each of the mutations can be found conditional
on the full information of the pattern of mutations in the
sequences, or equivalently, from the gene tree deduced
from the data assuming an infinitely-many-sites model
of mutation (Griffiths and Tavare 1994). Thus, the approach taken here (Griffiths and Tavare 1994; Harding
et al. 1997) simulates a coalescent process including
time information conditional on a specified haplotype
tree with a given value of u, the only independent parameter in the model. Specifically, an advanced simulation technique was used such that each simulation run
generated an estimated likelihood, TMRCA, and mutational ages. If there are r simulation runs generating
(TMRCA, likelihood) pairs (t1, l1), . . . , (tr, lr), then an
empirical distribution of the TMRCA has (TMRCA,
probability) pairs (ti, pi), where
@O l ,
r
pi 5 li
j51
i 5 1, . . . , r.
(2)
432
Hammer et al.
FIG. 3.Y chromosome haplotype tree based on nine polymorphic markers. Each crossbar represents a single point mutational event.
Circle areas reflect relative, rather than exact, global haplotype frequency proportionalities. Dotted lines represent alternative tree topologies resulting from homoplasy in the data set (see text for explanation). The arrows on the lineages connecting haplotypes 1A, 1B, 1C,
and 1D represent the resolved tree topology after a parsimony analysis
including a 10th mutational character, M9.
433
FIG. 4.Histogram representations of 10 Y chromosome haplotype frequencies for 8 geographic regions. The haplotype frequency for 1B
in Australasians (96%) is not drawn to scale.
434
Hammer et al.
Table 3
Nested Contingency Analysis of Geographic Associations
Cladea
1-1 . . . . . . . . . . . . . . . . . . . .
1-3 . . . . . . . . . . . . . . . . . . . .
1-4 . . . . . . . . . . . . . . . . . . . .
1-5 . . . . . . . . . . . . . . . . . . . .
2-1 . . . . . . . . . . . . . . . . . . . .
2-2 . . . . . . . . . . . . . . . . . . . .
Entire cladogram . . . . . . . .
Permutational
Chi-Square Statistic
7.70
99.00
196.62
94.02
587.00
243.36
1,470.14
Probabilityb
0.042
0.000
0.000
0.000
0.000
0.000
0.000
435
FIG. 6.Results of the nested geographic analysis of the human Y chromosome haplotypes. The nested design is given in figure 5, as are
the haplotype and clade designations. Following the name or number of any given clade are the clade (Dc) and nested clade (Dn) great circle
distances. The oldest clade within the nested group is indicated by shading. The average difference between the oldest versus younger clades
within a nesting category (with haplotype 1A as the root) for both distance measures is given in the row labeled O-Y below a dashed line.
A single superscript S or L designates a significantly small or large distance at the 5% level, respectively, and double letters indicate
significance at the 1% level. Below the O-Y row is the inference key chain, wherein the numbers refer to the sequence of questions in the
interpretative key given in the appendix in Templeton, Routman, and Phillips (1995). Following the sequence of numbers is the answer to the
final question in this interpretive key (yes/no) along with the final biological inference (IBD 5 isolation by distance; RE 5 range expansion).
For any clade level, the inference results are given at the bottom of the preceding column in the rectangle connected to the next higher nesting
level. For the entire cladogram (3-1), the inference result is given at the bottom of the last column, labeled 2-Step Clades.
type (3) was thought to have had an Asian origin. Because haplotypes 4 and 5 evolved from haplotype 3 and
accounted for the majority of African Y chromosomes,
the implication of this hypothesis was that a large portion of African paternal diversity had its roots in Asia.
436
Hammer et al.
Table 4
Main Inferences from Results of Nested Cladistic Analysis
Clade
Haplotypes nested in 1-1 . . . . . . . . . . . . . . . . . . . . . . . .
Haplotypes nested in 1-3 . . . . . . . . . . . . . . . . . . . . . . . .
Haplotypes nested in 1-4 . . . . . . . . . . . . . . . . . . . . . . . .
Haplotypes nested in 1-5 . . . . . . . . . . . . . . . . . . . . . . . .
One-step clades nested in 21. . . . . . . . . . . . . . . . . . . .
One-step clades nested in 2-2 . . . . . . . . . . . . . . . . . . . .
Two-step clades nested in entire cladogram (3-1) . . . .
Inference
Restricted gene flow with isolation by distance within Africa
Range expansion (Asia into Africa)
Restricted gene flow with isolation by distance (Africa and Europe)
Global restricted gene flow with isolation by distance
Range expansion out of Africa
Restricted gene flow with isolation by distance (Old World)
Contiguous range expansion; however, source is ambiguous because of widespread
distribution of oldest clade (2-1)
NOTE.Only clades resulting in the rejection of the null hypothesis are included above.
Although Hammer et al. (1997) compared the geographic distribution of Y chromosome haplotypes with a phylogenetic tree to generate these multiple migration hypotheses, they were unable to test them. Nor were they
able to gain insight into the evolutionary processes responsible for the Old World distribution of Y chromosomes. On the other hand, the present study uses population genetics models to: (1) test for statistically significant geographic structuring of Y haplotypes, (2)
identify the possible causes of significant associations
between haplotypes and geography, and (3) provide estimated dates for a number of human microevolutionary
events.
The major implications of our present analysis for
the structure and evolutionary history of the Y chromosome haplotype tree (figs. 3 and 7) are the following:
(1) haplotype 1 of Hammer et al. (1997) is now subdi-
vided into five (1A1E) haplotypes; (2) 1A is the ancestral haplotype and has an African origin; (3) 1B originated ;110,000 years ago and is the most frequent and
geographically widespread Y chromosome haplotype in
the world; (4) the dispersal of haplotype 1B from Africa
throughout the Old World marked the complete replacement of Eurasian Y chromosomes by African Y chromosomes; (5) 1E is the most recently evolved haplotype
in our tree; (6) both an AG transition and a GA
reversion occurred at nucleotide site SRY10831, with
more than 100,000 years separating these mutational
events; (7) the GA transition responsible for the polymorphism in the DYS257 STS occurred on the lineage
leading to haplotype 1C before the second mutation at
SRY10831; (8) DYS257 identifies a major Eurasian clade
containing haplotypes 1C and 1D; (9) haplotype 1C is
both older and more geographically widespread than
FIG. 7.Scaled coalescent tree for Y chromosome haplotypes showing ages of mutations estimated from the world set of 1,544 samples
when u 5 2.5 (timescale in units of 103 years).
437
FIG. 8.Mean TMRCA (T) and mutational ages for each of nine mutations when u varies from 1.0 to 4.0. The inset shows symbols used
for u values of 1.0, 2.0, 2.5, 3.0, and 4.0.
438
Hammer et al.
439
440
Hammer et al.
441