
Review

Oxytricha as a modern analog of ancient genome evolution
Aaron David Goldman and Laura F. Landweber
Department of Ecology and Evolutionary Biology, Princeton University, Guyot Hall, Princeton, NJ 08544, USA

Several independent lines of evidence suggest that the
modern genetic system was preceded by the RNA
world in which RNA genes encoded RNA catalysts.
Current gaps in our conceptual framework of early genetic systems make it difficult to imagine how a stable
RNA genome may have functioned and how the transition to a DNA genome could have taken place. Here we
use the single-celled ciliate, Oxytricha, as an analog to
some of the genetic and genomic traits that may have
been present in organisms before and during the establishment of a DNA genome. Oxytricha and its close
relatives have a unique genome architecture involving
two differentiated nuclei, one of which encodes the
genome on small, linear nanochromosomes. While its
unique genomic characteristics are relatively modern,
some physiological processes related to the genomes
and nuclei of Oxytricha may exemplify primitive states of
the developing genetic system.
Early genome evolution
The modern genetic system requires the synthesis and
functional orchestration of three distinct biopolymers:
DNA, RNA, and proteins. This complex system was likely
preceded by a stage in which RNA played a central role
both in information storage and as the only genetically encoded catalyst (Figure 1) [1,2]. The early prominence of
RNA is substantiated by its ability to store genetic information, as in mRNA, and to impart catalysis, as demonstrated by the abundance of catalytic RNAs present in
nature and produced in laboratories [3]. The primacy of
functional RNAs in the process of protein translation
(transfer and ribosomal RNAs and other functional RNAs
that modify them), coupled to the ubiquity of those RNAs
across all extant life, suggests that the translation system
emerged from an RNA-catalyzed metabolism [4]. The central role of nucleotide-derived cofactors (such as ATP,
NADH, and CoA) in metabolism is consistent with a scenario in which those functions were previously catalyzed
by ribozymes [5].
The catalytic range of RNA is limited, and a ribozyme-based metabolic system probably remained dependent on
the background chemistry from which it emerged. Protein
translation may have evolved as
a mechanism to bring this crucial chemistry under the
control of genetically-encoded enzymes [6]. Deoxyribonucleotides were probably unavailable until the evolution of
ribonucleotide reductase proteins [7], implying that the
development of the DNA genome was not even possible
until substantial evolution of protein enzymes had taken
place. By this point, the translation system seems to have
reached a moderate level of its modern sophistication and
the range of protein fold architectures encoded by early
genomes had significantly expanded [8].

Corresponding author: Goldman, A.D. (adg@princeton.edu).

The transition from an RNA genome to a DNA genome is
not well understood. Many protein fold architectures seem
to have evolved before this stage, because modern ribonucleotide reductase enzymes fall into three distinct classes
that share no noticeable similarity in amino acid sequence
but appear to be homologous when their active site amino
acids are compared in 3D structure alignments [9]. Although DNA-processing functions are similar across the
tree of life, no ancient core of enzymes can be detected by
sequence comparison (Figure 2) [10]. Six distinct families
of DNA polymerase are known, but only those with specific
functions related to excision repair have a universal taxonomic distribution [11]. Two distinct families of DNA
primase, one bacterial and one archaeal/eukaryotic, are
observed in modern life. A similar phylogenetic pattern is
observed in DNA ligases. This lack of a universal DNA
metabolism may imply that a complete protein-catalyzed
DNA-processing system was not present in the last universal common ancestor (LUCA) or that ancient non-orthologous gene displacements [12], in either the ancestor of
Bacteria or the ancestor of Archaea and Eukarya, erased
the phylogenetic evidence of most DNA processes in LUCA.
Given the current inability to reconstruct early genome-related metabolism through bioinformatics, some
researchers have used features of modern biological systems
as analogs to traits of ancient organisms and their genomes.
For example, it has been argued that viruses provide an
ideal evolutionary platform to acquire a DNA genome in an
RNA world and to distribute this trait to cellular life [13]. A
similar approach compares the notion of early genomes to a
ciliate macronucleus in which genes are encoded on small
linear chromosomes [14]. Here, we expand on the latter idea
and use the remarkable genetic system of the ciliate genus
Oxytricha [15] to improve our understanding of the early
transition from RNA to DNA genomes.
Oxytricha
Oxytricha is a genus of single-celled ciliated protists. Its members
are predatory, mitochondrion-bearing, free-living organisms that inhabit freshwater environments. The Oxytricha lineage
diverged from the lineage leading to Tetrahymena and Paramecium approximately 1 billion years ago [16].

0168-9525/$ see front matter © 2012 Elsevier Ltd. All rights reserved. doi:10.1016/j.tig.2012.03.010 Trends in Genetics, August 2012, Vol. 28, No. 8

[Figure 1 (schematic, three panels): (a) prebiotic reaction networks, functional RNAs, and informational RNAs; (b) functional RNAs, informational RNAs, metabolic networks, and functional proteins; (c) the same components together with a DNA genome.]

Figure 1. The development of the modern genetic system from an RNA-dominated precursor genetic system. (a) The first genetic system probably involved informational
RNAs encoding ribozymes that facilitated the replication of those informational RNAs [1]. Given the narrow catalytic range of ribozymes, this system probably relied on
substantial networks of prebiotic chemistry to provide activated nucleotides [6]. (b) Protein synthesis by translation most likely arose from this RNA-based system [7] and
rapidly developed into a highly processive, high-fidelity system [8]. Consistent with this origin, the translation system is dominated by functional RNAs, including the ribosome itself,
which has a ribozyme active site in its highly conserved core [57,58]. (c) The DNA genome probably arose from an RNA-protein precursor system. Deoxyribonucleotides
seem to have been unavailable until the evolution of the ribonucleotide reductase protein enzymes [7]. Unlike translation, DNA replication and processing are dominated by
protein functions rather than RNA functions, and core DNA-related functions do not appear to be universally conserved [10,11]. In the absence of significant bioinformatic
evidence, the transition from an RNA genome to a DNA genome remains enigmatic.

Oxytricha spp., like most ciliates, have two types of nuclei, a micronucleus and a
macronucleus (reviewed in [17]). The macronucleus is
transcriptionally active during vegetative growth, whereas
the micronucleus is almost always transcriptionally silent.
However, only the micronucleus is exchanged during the
ciliate sexual cycle, after which a new macronucleus and
macronuclear genome are formed from micronuclear DNA.
Although these general traits are common throughout the
phylum Ciliophora, the architectures of the macronuclear
and micronuclear genomes, as well as the process of macronuclear development, differ among ciliate taxa.
The Oxytricha micronuclear genome contains approximately 1 Gb of sequence, whereas the macronuclear genome
contains approximately 50 Mb of sequence, representing a
95% reduction in genome content during development [16].
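As a quick check of the 95% figure, the reduction is one minus the ratio of the two genome sizes. The Python sketch below uses only the approximate sizes given above; the function name `percent_reduction` is illustrative, not from the source.

```python
def percent_reduction(micronuclear_bp: float, macronuclear_bp: float) -> float:
    """Percentage of micronuclear sequence that is absent from the macronuclear genome."""
    return 100.0 * (1.0 - macronuclear_bp / micronuclear_bp)


# Approximate genome sizes quoted in the text.
MIC_BP = 1.0e9   # ~1 Gb micronuclear genome
MAC_BP = 50.0e6  # ~50 Mb macronuclear genome

reduction = percent_reduction(MIC_BP, MAC_BP)
print(f"{reduction:.0f}% of micronuclear sequence is eliminated")
```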

In addition, thousands of micronuclear genes are scrambled with respect to their macronuclear counterparts, with
segments of micronuclear genes present in a permuted or
inverted order relative to their order in the macronucleus
(Figure 3) (reviewed in [18]). Following sexual exchange of
haploid micronuclei, the macronuclear genome assembles
from dispersed segments of micronuclear DNA through a
process of genome rearrangement that is guided by macronuclear RNA templates (Figure 3) [19]. It is likely that
these RNA templates represent a transient cache of the
entire macronuclear genome during this developmental
stage.

[Figure 2: universal phylogenetic tree of archaeal, eukaryotic, and bacterial taxa, annotated with the distribution of ribonucleotide reductase classes (I, II, III), DNA polymerase families, DNA ligases (ATP- and NAD-dependent), and DNA primases (polymerase- and helicase-associated).]

Figure 2. A phylogenetic distribution of key enzymes involved in DNA synthesis. Unlike the protein translation system, very few features of DNA synthesis and processing are
universally conserved. Ribonucleotide reductase is an enzyme required to produce deoxyribonucleotides from ribonucleotides. It is found in three distinct classes, I, II, and III,
although ancient homology between them can be inferred from structural and mechanistic similarity. Six distinct families of DNA polymerases are known. None of the four
standard DNA polymerase families (A, B, C, and D) has a universal taxonomic distribution. DNA polymerase families X and Y are universally distributed, but impart functions
that are related to excision repair rather than DNA replication. The DNA polymerase X family catalyzes non-template-dependent DNA synthesis, while the DNA polymerase Y
family polymerizes short segments across lesions. Bacteria use an NAD+-dependent DNA ligase that is unrelated to the ATP-dependent DNA ligase used by Eukarya and
Archaea. Similarly, Bacteria use a helicase-associated DNA primase, whereas Archaea and Eukarya use a DNA polymerase α-associated DNA primase. The lack of a universally
distributed set of enzymes involved in DNA synthesis suggests that modern pathways were still in the process of forming during the time of the last universal common ancestor
(LUCA). Alternatively, DNA-related pathways may simply be more evolutionarily malleable than, for example, translation pathways, and this property would obscure their
ancient phylogenetic signatures. The universal phylogenetic tree was previously generated in [59] and is based on 31 universal gene sequences from 191 genomes. The tree
image was produced using the Interactive Tree of Life web server [60]. Clades representing groups of 25-40% similarity were collapsed to conserve space. The taxonomic
distribution of ribonucleotide reductase enzymes was identified from the RNR database [61]. Taxonomic distributions of DNA polymerase families, DNA ligases, and DNA
primases are extrapolated from [10,11] and are not resolved finely enough to illustrate horizontal gene transfer. Ciliates are members of the Alveolata.

The roles of RNA may surpass those of DNA in regulating the information in the genome of Oxytricha at three
levels. At the first level, RNA transcripts of complete
nanochromosomes from the previous generation can program the pattern of DNA rearrangements during macronuclear development [19]. The microinjection of synthetic
RNA molecules into Oxytricha cells can introduce an
alternative order of micronuclear DNA segments in the
resulting progeny [18,19]. These new DNA rearrangement
patterns can transfer to the sexual offspring of those
progeny and even to their progeny's progeny. Given that
the micronuclear DNA remains unchanged, the inheritance of altered rearrangement patterns in Oxytricha
appears to be a transgenerational RNA-mediated epigenetic phenomenon.

[Figure 3 (schematic): (a) old macronucleus, with dsDNA nanochromosomes transcribed into ssRNA; center, micronuclear meiosis and micronuclear chromosomes; (c) developing macronucleus; (d) micronuclear DNA segments 1-4 reordered in the developing macronucleus to form macronuclear nanochromosomes; new micronucleus.]

Figure 3. A model for development of the Oxytricha macronuclear genome following conjugation. During conjugation (center), the micronucleus undergoes
meiosis to produce four haploid nuclei, two of which exchange between partnering
cells to form a new diploid micronucleus. During this process, the old macronucleus
degrades and a new macronucleus differentiates from one copy of the new
micronucleus [22]. The outer panels depict the process of macronuclear genome
development by DNA rearrangement. (a) Conjugation triggers transcription of old
macronuclear chromosomes into RNA. (b) The old macronucleus is
dismantled, while the RNA transcripts of the chromosomes are retained and
transported to the developing macronucleus. (c) The micronucleus replicates by
mitosis and one micronucleus undergoes DNA amplification to produce material for
the macronuclear genome. (d) Segments of micronuclear DNA (numbered 1-4) are
reorganized using the macronuclear transcripts as a template for RNA-guided DNA
rearrangement (including inversion of segment 3). Red bars indicate telomeres at the
ends of nanochromosomes. Orange rectangles indicate deleted micronuclear DNA
that separates DNA segments retained in the macronucleus.

At the second level, point substitutions can also transfer
from the RNA template to the macronuclear DNA [19],
particularly near regions where junctions form between
macronuclear segments. These point substitutions can also
transfer to the sexual progeny and even to their progeny's progeny.
Given that the micronuclear DNA does not share these
point substitutions [19], this observation implicates a role
for RNA-templated DNA repair [20] in DNA rearrangement. These somatically acquired point mutations represent another level at which epigenetically-inherited RNA
molecules instruct the sequence and interpretation of the
DNA genome.
At the third level, the RNA macronuclear genome cache
also appears to be responsible for determining the copy
number of macronuclear chromosomes. Artificially increasing or decreasing the available levels of RNA chromosome templates by microinjection or RNAi, respectively,
leads to a relative increase or decrease in the copy number
of the corresponding DNA molecules in the next generation. This effect also lasts at least two sexual generations
[21], demonstrating a further example of RNA-mediated
transgenerational epigenetic inheritance in Oxytricha.
Apart from their unique sequence features, ciliate micronuclear genomes have a typical eukaryotic structure.
Their genome architecture is in the form of large chromosomes with telomeres and centromeres, and micronuclei
reproduce via mitosis during cell division. During the
sexual cycle the diploid genome undergoes meiosis to
produce haploid gametes, one of which is retained and
the other of which passes to the mating partner
(Figure 3) [22,23].
The macronucleus is very different. The macronuclear
genome contains on the order of 20 million small DNA
chromosomes, or nanochromosomes, most of which encode
a single protein-coding gene or functional RNA. In fact, the
lack of a centromere has led some to argue that the term
chromosome is inappropriate for macronuclear DNA [16].
The extraordinary number of DNA molecules in the macronucleus results from approximately 20 000 unique
nanochromosomes averaging roughly 1000 copies per macronucleus. Their average length is approximately 2.7 kb
[24]. These unusual properties of the Oxytricha macronuclear genome and macronucleus, and the powerful role of
RNA in sculpting these genomes, offer a compelling system
within which to consider possible transitions from simple
RNA genomes to complex DNA genomes.
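The figures in this paragraph are mutually consistent, and multiplying them out makes that explicit: about 20,000 distinct nanochromosomes at about 1,000 copies each gives the ~20 million molecules mentioned above, and 20,000 × 2.7 kb gives roughly 54 Mb of unique sequence, in line with the ~50 Mb macronuclear genome size quoted earlier. This is back-of-the-envelope arithmetic on the rounded values only; the constant names are illustrative.

```python
# Approximate values quoted in the text for the Oxytricha macronucleus.
UNIQUE_NANOCHROMOSOMES = 20_000  # distinct nanochromosomes
MEAN_COPIES = 1_000              # average copies of each nanochromosome
MEAN_LENGTH_BP = 2_700           # average nanochromosome length (~2.7 kb)

# Total number of DNA molecules in one macronucleus.
total_molecules = UNIQUE_NANOCHROMOSOMES * MEAN_COPIES

# Unique (non-redundant) sequence, in megabases.
unique_sequence_mb = UNIQUE_NANOCHROMOSOMES * MEAN_LENGTH_BP / 1e6

print(f"{total_molecules:,} DNA molecules per macronucleus")
print(f"~{unique_sequence_mb:.0f} Mb of unique macronuclear sequence")
```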
Oxytricha and early genome replication
Small, single-gene chromosomes, such as those in the
Oxytricha macronucleus, represent one of the simplest
possible states of a genome and thus were probably predecessors to more complex genome architectures. A genome of small linear chromosomes would have presented
less of a challenge to primitive polymerases [14], which
probably copied nucleic acids with low fidelity and were
unable to process long sequences. The nature of these
primitive DNA polymerases is unknown. None of the four
families of standard DNA polymerases has a universal
distribution [11], although the sliding clamp function of
the DnaN subunit of DNA polymerase III and the 5′-to-3′ exonuclease
function of the Pol1-A polymerase (both as characterized in E. coli) appear to have
been present in LUCA [25,26]. Three subunits of DNA-dependent RNA polymerases appear to be universal as
well [25]. Structural and functional comparisons of DNA-dependent RNA polymerases suggest that they may share
a multi-subunit ancestor with proofreading capabilities
that was present in LUCA [27].
It is generally assumed that the RNA-only stage in the
development of the genetic system would have required an
RNA-dependent RNA polymerase ribozyme to have replicated the genome. Although no such enzyme has been
found in extant biology, several have been produced synthetically through laboratory evolution techniques [24,28-30]. So far, all of these ribozymes are over a hundred
nucleotides long and exhibit very tight constraints on
sequence space, making it difficult to imagine how similar
ribozymes could have evolved de novo in an RNA world
scenario. In addition, even the most capable of these
laboratory-generated polymerase ribozymes is not able
to sustain the processivity required to replicate RNA
molecules of its own size or larger.
During the process of Oxytricha genome rearrangement,
segments of DNA from the micronuclear genome assemble
according to RNA templates of the macronuclear genome
(Figure 3). This process represents a unique scenario in
extant biology in which a complete copy of a genome is
produced, not by polymerizing a complementary strand
one nucleotide at a time, but by recycling DNA polymers
from a precursor genome. It is likely that these pieces of
micronuclear DNA ligate together after assembling on the
complementary RNA template, although there is also evidence that gaps or errors between the DNA segments are
repaired by the activity of an RNA-dependent DNA polymerase [19].
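The logic of template-guided assembly can be sketched as a toy string problem. This is an illustration under strong simplifying assumptions of ours, not the molecular mechanism: the RNA template is written in DNA letters, segments are exact, non-overlapping substrings (real segments are flanked by short repeated pointer sequences), and an inverted segment is recognized as the reverse complement of part of the template. The sequences and the helpers `revcomp` and `unscramble` are hypothetical; the scrambled order with segment 3 inverted loosely follows Figure 3.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq[::-1].translate(str.maketrans("ACGT", "TGCA"))


def unscramble(segments: list[str], template: str) -> str:
    """Reorder (and re-invert) segments to match a template; toy model only."""
    placed = []
    for seg in segments:
        oriented = seg
        pos = template.find(seg)
        if pos == -1:                 # not found as-is: try the inverted orientation
            oriented = revcomp(seg)
            pos = template.find(oriented)
        if pos == -1:
            raise ValueError(f"segment {seg!r} has no match in the template")
        placed.append((pos, oriented))
    # Joining k ordered segments corresponds to k - 1 ligation events.
    return "".join(s for _, s in sorted(placed))


# Hypothetical macronuclear template: segments 1-4 in order.
template = "ATGGCA" + "TTCAGG" + "CCTTGA" + "GGAACT"
# Hypothetical micronuclear order 2, 1, 4, 3, with segment 3 inverted.
micronuclear = ["TTCAGG", "ATGGCA", "GGAACT", revcomp("CCTTGA")]

assembled = unscramble(micronuclear, template)
print(assembled == template)  # True
```

Note that the only "catalytic" events in this picture are the joins between neighbouring segments; no segment is copied nucleotide by nucleotide.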
A similar mode of replication would have conferred
several benefits to early life and perhaps created a viable
selection regime in which polymerases with high fidelity
and processivity might have evolved. In contrast to ribozyme polymerases, several ribozyme ligases are present in
modern organisms [3] and more have been synthesized by
directed evolution [31,32]. Polymerases are in fact a specialized kind of ligase in which one of the ligated partners is
a single nucleotide [31]. It follows, then, that the central
challenge to a polymerase is not the catalytic step of
ligation, but the ability to perform that step repeatedly
over the full length of a gene-sized molecule, a limitation
that is borne out by the difficulty of producing a highly
processive ribozyme polymerase [24,29,30].
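The argument can be made quantitative with a deliberately simple model, which is our assumption rather than anything measured: treat each catalytic step as succeeding independently with probability p, so completing n steps has probability p^n. Copying a 2.7 kb nanochromosome nucleotide by nucleotide takes about 2,700 steps, whereas joining pre-existing segments needs only one ligation per junction. The segment length (100 nt) and per-step success probability (0.999) below are hypothetical.

```python
import math


def polymerase_steps(length_nt: int) -> int:
    """Catalytic steps to copy a strand one nucleotide at a time (one per position)."""
    return length_nt


def assembly_steps(length_nt: int, segment_nt: int) -> int:
    """Ligations needed to join template-aligned segments of length `segment_nt`."""
    n_segments = math.ceil(length_nt / segment_nt)
    return n_segments - 1  # k segments need k - 1 joins


def p_complete(steps: int, p_step: float) -> float:
    """Probability that all `steps` independent catalytic events succeed."""
    return p_step ** steps


LENGTH_NT = 2_700  # ~average Oxytricha nanochromosome length
SEGMENT_NT = 100   # hypothetical length of pre-existing segments
P_STEP = 0.999     # hypothetical per-step success probability

for label, steps in (("nucleotide-by-nucleotide", polymerase_steps(LENGTH_NT)),
                     ("segment assembly", assembly_steps(LENGTH_NT, SEGMENT_NT))):
    print(f"{label}: {steps} steps, P(complete) = {p_complete(steps, P_STEP):.3f}")
```

With these made-up parameters, segment assembly (26 ligations) completes about 97% of the time, whereas nucleotide-by-nucleotide copying (2,700 steps) completes only about 7% of the time. This illustrates why processivity, not the chemistry of a single ligation, is the bottleneck.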
If early nanochromosomes replicated in an Oxytricha-like fashion, the number of catalytic ligation steps would be
much smaller than that in a complete polymerase-dependent replication. The source of these DNA segments in a
primitive system is not clear. If the GC content were very
high or very low, the sequence complexity of the nanochromosomes would also be low, and short abiotically synthesized segments with random sequences [33,34] might
provide enough matches to the template to permit assembly of most of the genome from these small, modular pieces
[35]. The need to fill or repair small gaps between segments
would create a selective environment for the evolution of a
weakly processive polymerase into the ancestor of a modern, highly processive polymerase. This model of early
genome replication is consistent with the observation that
the only universally conserved DNA polymerase families
are involved in excision repair (Figure 2). Once a high-fidelity, high-processivity polymerase became available,
genome replication could move towards its current polymerase-dependent form and longer chromosome lengths
would be possible.
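The role of base composition in this scenario can be illustrated with a toy calculation. It assumes independent sites with equal G and C frequencies and equal A and T frequencies (a simplification, since real sequences are not random). Skewing the GC content lowers the Shannon entropy per site and raises the chance that a random short segment matches a given stretch of template. The 8-nt segment length and the helper names are hypothetical.

```python
import math


def base_probs(gc: float) -> list[float]:
    """Frequencies of G, C, A, T for GC fraction `gc` (independent sites assumed)."""
    return [gc / 2, gc / 2, (1 - gc) / 2, (1 - gc) / 2]


def entropy_bits(gc: float) -> float:
    """Shannon entropy per site; 2 bits is the maximum (equal base frequencies)."""
    return -sum(p * math.log2(p) for p in base_probs(gc) if p > 0)


def p_kmer_match(gc: float, k: int) -> float:
    """Chance that a random k-mer matches a random template k-mer of the same composition."""
    per_site = sum(p * p for p in base_probs(gc))
    return per_site ** k


for gc in (0.5, 0.9):
    print(f"GC={gc:.0%}: H={entropy_bits(gc):.2f} bits/site, "
          f"P(8-mer match)={p_kmer_match(gc, 8):.2e}")
```

At 50% GC, the entropy is the maximal 2 bits per site and an 8-mer match has probability 0.25^8 (about 1.5 × 10^-5). At 90% GC, the entropy falls to about 1.47 bits and the match probability rises roughly fifty-fold.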
Oxytricha and early cell division
In most eukaryotes, cell division is orchestrated by the
complex process of mitosis, wherein duplicate chromosomes segregate evenly between the dividing cells. The
process is controlled by dynamic motor complexes that pull
chromosomes along organized microtubules [36,37]. Functionally analogous but non-homologous processes are
thought to take place in Bacteria [38-40] and Archaea
[41,42]. It is difficult to imagine that such a complex system
was present in early life forms. Early cell division probably
involved uncontrolled membrane division with chromosomes segregating at random.


Similarly, the Oxytricha macronucleus does not divide
by way of mitosis. The approximately 20 million nanochromosome molecules probably present an overwhelming
challenge to organized mitotic segregation. Although amitosis in Oxytricha is microtubule-dependent [43,44], these
microtubules appear to control membrane division rather
than chromosome segregation. Macronuclear nanochromosomes lack centromeres, the chromosomal sites at which kinetochores assemble and spindle microtubules would normally attach [16,45]. As a result, the
segregation of DNA between macronuclei is unpredictable
and often uneven [16,46]. Amitotic division of macronuclei
seems to have arisen early in the ciliates [47], although
previous phylogenies have predicted three independent
origins of amitosis in ciliates, with one origin in the common lineage of the genera Oxytricha and Euplotes [48].
It is possible that the high chromosome copy-numbers
observed in Oxytricha, and to an equal or lesser extent in
other ciliates, are related to the imprecise segregation of
chromosomes during amitotic division [49]. If a single
chromosome is duplicated and the two copies are allowed
to segregate randomly to one of the two daughter cells,
then the probability of losing that gene in one of the
daughter cells is 0.5. With a greater number of copies, an approximately even segregation of the chromosomes
between daughter cells becomes statistically very likely. This
feature of amitosis in Oxytricha may be similar to the
division of primitive cells, which would have also benefited
from carrying chromosomes in high copy-numbers to safeguard against uneven segregation.
Oxytricha and early genome stability
A single common ancestor of all life is the most statistically
satisfying explanation for common traits observed in modern organisms [50]. This explanation, however, does not
distinguish between a single organism and a community
of organisms with highly pervasive lateral gene transfer
[14]. Even if we assume the former scenario, the complexity
of a single LUCA organism may have been generated in part
by lateral gene transfer within a heterogeneous population
of organisms [26]. If early genomes did indeed resemble
Oxytricha macronuclear genomes, then the nature of the
RNA-mediated gene transfer observed in Oxytricha [19,23]
may also help describe the sort of communal inheritance
that preceded the predominantly vertical inheritance of
modern organisms.
The nanochromosome structure of the macronuclear
genome and its regeneration through RNA-template-directed DNA unscrambling provide a form of lateral gene
transfer that differs from mechanisms described in any
other organism. Unlike conventional conjugation in bacteria or sexual reproduction in eukaryotes, an RNA-driven
epigenetic mode of inheritance does not require the introduction of new genes, but instead new alleles can spread
via conversion of existing ones (through RNA-guided mechanisms). Allele frequencies can be increased or decreased
by the introduction of foreign nucleic acids, and these
acquired traits are passed on to subsequent generations.
These phenomena are similar to horizontal gene transfer, in that somatic DNA or RNA variants provide an
external source of genetic variation. But the nanochromosome structure of the macronuclear genome and its capacity to receive new alleles during the process of DNA
rearrangement make the Oxytricha macronucleus uniquely
permissive to somatically acquired genetic change. Nevertheless, an epigenetic system such as that of Oxytricha is
also robust to such perturbations because the high copy-number of original alleles will initially act as a buffer
against sequence change, restricting the spread of deleterious somatic alterations. Perhaps early genomes with
structures similar to the Oxytricha macronucleus would
also be permissive to genetic acquisitions, but stable
against their deleterious effects.
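A toy simulation can illustrate this buffering argument. It assumes, as our simplification rather than a model from the source, that each copy of a locus is replicated once per amitotic division and that the followed daughter receives a random half of the molecules. Under this scheme, the expected variant frequency does not change from one division to the next. Copy number sets the starting frequency of a single altered copy (1/N) and shrinks the random fluctuations around it. `follow_variant` is a hypothetical helper.

```python
import random


def follow_variant(n_copies: int, variant_copies: int, divisions: int,
                   seed: int = 0) -> list[float]:
    """Variant allele frequency along one lineage of amitotic divisions (toy model).

    Assumptions: every copy is replicated once per division, and the followed
    daughter receives a random half of the 2 * n_copies molecules.
    """
    if not 0 <= variant_copies <= n_copies:
        raise ValueError("variant_copies must be between 0 and n_copies")
    rng = random.Random(seed)
    variant = variant_copies
    freqs = [variant / n_copies]
    for _ in range(divisions):
        # After replication: 2 * variant variant copies, the rest original copies.
        pool = [1] * (2 * variant) + [0] * (2 * (n_copies - variant))
        variant = sum(rng.sample(pool, n_copies))  # daughter's random half
        freqs.append(variant / n_copies)
    return freqs


trajectory = follow_variant(n_copies=1_000, variant_copies=1, divisions=50, seed=1)
print(f"start {trajectory[0]:.3f}, after 50 divisions {trajectory[-1]:.3f}")
```

Varying `n_copies` shows the design point. One altered copy among 1,000 starts at 0.1%, and fluctuations per division scale roughly with 1/N, so a somatic change has to persist for many divisions before it can dominate.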
Oxytricha and early organismal identity
The genetic openness that existed during the transition to
modern life was probably also prone to invasions by selfish
replicators that may have easily infiltrated and taken
advantage of emerging organismal replicating systems
[51]. This effect is generally modeled through self-propagating metabolism-like networks, or hypercycles. These
replicating entities may be parasitic if they either receive
replication support from the host system without conferring a reciprocal benefit, or shortcut the host system in
some deleterious way. Vesicles can barricade replicating
systems against selfish entities if they provide a mechanism of blocking the entry of external replicators [52].
Selfish replicators can also be eventually incorporated into
the metabolism of the host system, balancing their deleterious effects with beneficial ones [53].
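The parasite problem can be illustrated with the elementary hypercycle replicator equations, extended with one hypothetical parasite. Each member i grows at rate k_i times the concentration of its catalytic helper, minus a common dilution term phi that keeps the total concentration fixed. The parasite draws help from member 0, as member 1 does, but catalyzes nothing in return. The rate constants and starting values are invented, and the model is well mixed, with no vesicles or other compartments.

```python
def replicator_step(x: list[float], k: list[float], helper: list[int],
                    dt: float) -> list[float]:
    """One Euler step of dx_i/dt = x_i * (k_i * x_helper[i] - phi).

    phi = sum_i k_i * x_i * x_helper[i] is the dilution flux that keeps sum(x) constant.
    """
    growth = [k[i] * x[helper[i]] for i in range(len(x))]
    phi = sum(xi * gi for xi, gi in zip(x, growth))
    return [xi + dt * xi * (gi - phi) for xi, gi in zip(x, growth)]


# Hypothetical 3-member hypercycle 0 -> 1 -> 2 -> 0 (member i is helped by helper[i]),
# plus a parasite (index 3) helped by member 0 but helping no one in return.
helper = [2, 0, 1, 0]
k = [1.0, 1.0, 1.0, 1.5]          # hypothetical rate constants
start = [0.33, 0.33, 0.33, 0.01]  # hypothetical starting frequencies (sum to 1)

x = start
for _ in range(20_000):           # 200 time units with dt = 0.01
    x = replicator_step(x, k, helper, dt=0.01)

print("final frequencies:", [round(v, 3) for v in x])
print("parasite / member 1 ratio grew:", x[3] / x[1] > start[3] / start[1])
```

Because the parasite and member 1 share the same helper, the logarithm of their ratio increases at rate 0.5 times the concentration of member 0. The parasite therefore gains at member 1's expense whenever member 0 is present, and a decline of member 1 can propagate around the cycle. Compartmentalization, the vesicle barrier described above, is not represented in this sketch.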
Although the dynamics of nuclear dimorphism in Oxytricha do not resemble a hypercycle, the scrambling of the
micronucleus and its rearrangement to form the macronuclear genome illustrate the properties of stable systems
that host selfish replicators. The unique genomic traits of
Oxytricha seem to be both caused by and assisted by an
invasion of DNA transposons (typically regarded as selfish
genetic agents). The micronucleus hosts thousands of
transposons, which probably contributed to the scrambling
of its genome, either through actual transposition or via
ectopic recombination between transposons of the same
family. Unlike domesticated transposases in other eukaryotes, micronuclear transposons display evidence of purifying selection acting on their encoded proteins [23,54] and
may still be active outside the control of the host cell. The
presence of active transposons in the micronucleus may
have provided the selective pressure for acquisition of a
template-directed genome unscrambling system as part of
macronuclear development [55] as a mechanism for promoting the long-term stability of the genome and robustness to perturbations.
Recent discoveries reveal that micronuclear transposons play a surprisingly direct role in both macronuclear
development and genome rearrangement [23]. Micronucleus-limited transposase genes are expressed during macronuclear development, but silent during vegetative growth.
The experimental silencing of these transposases by RNAi
results in aberrant unscrambling patterns in the macronuclear genome, suggesting that transposons play an active role in genome rearrangement. It is possible that
the nanochromosome templates are composed of RNA to
protect the developing macronucleus from the integration
of active transposons. In this regard, Oxytricha seems to
have avoided the deleterious effects of internal transposon
activity through template-directed genome rearrangement
that, itself, employs the transposon proteins. Thus, the
properties of nuclear dimorphism and template-directed
macronuclear development in Oxytricha demonstrate the
principles of spatial separation and metabolic incorporation that are thought to make early replicating systems
resistant to selfish replicators.
Concluding remarks
Here, we have discussed the nuclear dimorphism and
genome structures of Oxytricha to demonstrate several
plausible dynamics of early genetic systems during the
transition to modern genomes. Oxytricha is not by any
means a living fossil, given that its phylum, Ciliophora, is
both eukaryotic and not particularly deep-branching. However, by analogy we have used Oxytricha to introduce
several new hypotheses about early genomes. We invoke
the process of template-directed genome rearrangement in
Oxytricha to model an evolutionary landscape in which
protein polymerases could evolve gradually from ligases.
We have also argued that the dynamics of Oxytricha
amitotic macronuclear division suggest that unmanaged
cell division in early life could be viable if hereditary
molecules were present in high copy-numbers. Finally,
we employed observations of lateral gene transfer and
active transposon mediation in Oxytricha to improve our
understanding of the consequences of genome instability
for early life. Although the particular genomic traits that
we discuss are unique to Oxytricha and closely related
genera, we encourage the further exploration of extant
organisms, particularly those with atypical genetic systems [56], to help elucidate features of early cellular life.
Acknowledgments
We thank members of the Landweber laboratory for critical discussions of
this work. This work was supported by a National Aeronautics and Space
Administration Postdoctoral Program fellowship to A.D.G. and by
National Institutes of Health grant GM59708 and National Science
Foundation grant 0923810 to L.F.L.

References
1 Gilbert, W. (1986) The RNA world. Nature 319, 618
2 Gesteland, R. and Atkins, J.F., eds (1993) The RNA World, Cold
Spring Harbor Laboratory Press
3 Landweber, L. et al. (1998) Ribozyme engineering and early evolution. Bioscience 48, 94-103
4 Fox, G. (2010) Origin and evolution of the ribosome. Cold Spring Harb. Perspect. Biol. 2, a003483
5 White, H. (1976) Coenzymes as fossils of an earlier metabolic state. J. Mol. Evol. 7, 101-104
6 Goldman, A. et al. (2012) Evolution of the protein repertoire. In Encyclopedia of Molecular Cell Biology and Molecular Medicine (Meyers, R.A., ed.), Wiley-VCH
7 Freeland, S. et al. (1999) Do proteins predate DNA? Science 286, 690-692
8 Goldman, A. et al. (2010) The evolution and functional repertoire of translation proteins following the origin of life. Biol. Direct 5, 15
9 Torrents, E. et al. (2002) Ribonucleotide reductases: divergent evolution of an ancient enzyme. J. Mol. Evol. 55, 138-152
10 Forterre, P. (2002) The origin of DNA genomes and DNA replication proteins. Curr. Opin. Microbiol. 5, 525-532
11 Filée, J. et al. (2002) Evolution of DNA polymerase families: evidences for multiple gene exchange between cellular and viral proteins. J. Mol. Evol. 54, 763-773
12 Koonin, E. (2003) Comparative genomics, minimal gene-sets and the last universal common ancestor. Nat. Rev. Microbiol. 1, 127-136
13 Forterre, P. (2006) Three RNA cells for ribosomal lineages and three DNA viruses to replicate their genomes: a hypothesis for the origin of cellular domain. Proc. Natl. Acad. Sci. U.S.A. 103, 3669-3674
14 Woese, C. (1998) The universal ancestor. Proc. Natl. Acad. Sci. U.S.A. 95, 6854-6859
15 Zoller, S. et al. (2012) Characterization and taxonomic validity of the ciliate Oxytricha trifallax (Class Spirotrichea) based on multiple gene sequences: limitations in identifying genera solely by morphology. Protist DOI: 10.1016/j.protis.2011.12.006
16 Prescott, D. (1994) The DNA of ciliated protozoa. Microbiol. Rev. 58, 233-267
17 Prescott, D. (2000) Genome gymnastics: unique modes of DNA evolution and processing in ciliates. Nat. Rev. Genet. 1, 191-198
18 Nowacki, M. et al. (2011) RNA-mediated epigenetic programming of genome rearrangements. Annu. Rev. Genomics Hum. Genet. 12, 367-389
19 Nowacki, M. et al. (2008) RNA-mediated epigenetic programming of a genome-rearrangement pathway. Nature 451, 153-158
20 Storici, F. et al. (2007) RNA-templated DNA repair. Nature 447, 338-341
21 Nowacki, M. et al. (2010) RNA-mediated epigenetic regulation of DNA copy number. Proc. Natl. Acad. Sci. U.S.A. 107, 22140-22144
22 Nowacki, M. and Landweber, L.F. (2009) Epigenetic inheritance in ciliates. Curr. Opin. Microbiol. 12, 638-643
23 Nowacki, M. et al. (2009) A functional role for transposases in a large eukaryotic genome. Science 324, 935-938
24 Green, R. and Szostak, J.W. (1992) Selection of a ribozyme that functions as a superior template in a self-copying reaction. Science 258, 1910-1915
25 Harris, J. et al. (2003) The genetic core of the universal ancestor. Genome Res. 13, 407-412
26 Becerra, A. et al. (2007) The very early stages of biological evolution and the nature of the last common ancestor of the three major cell domains. Annu. Rev. Ecol. Evol. Syst. 38, 361-379
27 Poole, A. and Logan, D.T. (2005) Modern mRNA proofreading and repair: clues that the last universal common ancestor possessed an RNA genome? Mol. Biol. Evol. 22, 1444-1455
28 Doudna, J. et al. (1991) A multisubunit ribozyme that is a catalyst of and template for complementary strand RNA synthesis. Science 251, 1605-1608
29 Johnston, W. et al. (2001) RNA-catalyzed RNA polymerization:
accurate and general RNA-templated primer extension. Science 292,
13191325
30 Wochner, A. et al. (2011) Ribozyme-catalyzed transcription of an active
ribozyme. Science 332, 209212
31 Bartel, D. and Szostak, J.W. (1993) Isolation of new ribozymes from a
large pool of random sequences. Science 261, 14111418
32 Landweber, L. and Pokrovskaya, I.D. (1999) Emergence of a dual
catalytic RNA with metal specific cleavage and ligase activities: the
spandrels of RNA evolution. Proc. Natl. Acad. Sci. U.S.A. 96, 173178
33 Huang, W. and Ferris, J.P. (2006) One-step, regioselective synthesis of
up to 50-mers of RNA oligomers by montmorillonite catalysis. J. Am.
Chem. Soc. 128, 89148919
34 Aldersley, M. et al. (2009) RNA synthesis by mineral catalysis. Orig.
Life Evol. Biosph. 39, 200
35 Kotler, L. et al. (1993) DNA sequencing: modular primers assembled
from a library of hexamers or pentamers. Proc. Natl. Acad. Sci. U.S.A.
90, 42414245

Trends in Genetics August 2012, Vol. 28, No. 8

36 Sharp, D. et al. (2000) Microtubule motors in mitosis. Nature 407, 41-47
37 Maiato, H. et al. (2004) The dynamic kinetochore-microtubule interface. J. Cell Sci. 117, 5461-5477
38 Fogel, M. and Waldor, M.K. (2006) A dynamic, mitotic-like mechanism for bacterial chromosome segregation. Genes Dev. 20, 3269-3282
39 Ptacin, J. et al. (2010) A spindle-like apparatus guides bacterial chromosome segregation. Nat. Cell Biol. 12, 791-798
40 Draper, G. and Gober, J.W. (2002) Bacterial chromosome segregation. Annu. Rev. Microbiol. 56, 567-597
41 Lundgren, M. and Bernander, R. (2007) Genome-wide transcription map of an archaeal cell cycle. Proc. Natl. Acad. Sci. U.S.A. 104, 2939-2944
42 Cortez, D. et al. (2010) Evidence for a Xer/dif system for chromosome resolution in Archaea. PLoS Genet. 6, e1001166
43 Tucker, B. et al. (1980) Microtubules and control of macronuclear amitosis in Paramecium. J. Cell Sci. 44, 135-151
44 Kushida, Y. et al. (2010) Amitosis requires gamma-tubulin-mediated microtubule assembly in Tetrahymena thermophila. Cytoskeleton 68, 89-96
45 Jung, S. et al. (2011) Exploiting Oxytricha trifallax nanochromosomes to screen for non-coding RNA genes. Nucleic Acids Res. 39, 7529-7547
46 Witt, P. (1977) Unequal distribution of DNA in the macronuclear division of the ciliate Euplotes eurystomus. Chromosoma 60, 59-67
47 Katz, L. (2001) Evolution of nuclear dualism in ciliates: a reanalysis in light of recent molecular data. Int. J. System. Evol. Microbiol. 51, 1587-1592
48 Orias, E. (1991) Evolution of amitosis of the ciliate macro-nucleus: gain of the capacity to divide. J. Protozool. 38, 217-221
49 Duerra, H. et al. (2004) Modeling senescence in hypotrichous ciliates. Protist 155, 45-52
50 Theobald, D. (2010) A formal test of the theory of universal common ancestry. Nature 465, 219-222
51 Smith, S. (2003) Nucleoprotein assemblies. Encycl. Nanosci. Nanotech. X, 1-10
52 Eigen, M. et al. (1981) The origin of genetic information. Sci. Am. 244, 88-92
53 Konnyu, B. et al. (2008) Prebiotic replicase evolution in a surface-bound metabolic system: parasites as a source of adaptive evolution. BMC Evol. Biol. 8, 267
54 Doak, T. et al. (1994) A proposed superfamily of transposase genes: transposon-like elements in ciliated protozoa and a common D35E motif. Proc. Natl. Acad. Sci. U.S.A. 91, 942-946
55 Klobutcher, L. and Herrick, G. (1997) Developmental genome reorganization in ciliated protozoa: the transposon link. Prog. Nucleic Acid Res. Mol. Biol. 56, 1-62
56 Reyes-Prieto, F. et al. (2012) Coenzymes, viruses and the RNA world. Biochimie DOI: 10.1016/j.biochi.2012.01.004
57 Cech, T. (2000) The ribosome is a ribozyme. Science 289, 878-879
58 Hsiao, C. et al. (2009) Peeling the onion: ribosomes are ancient molecular fossils. Mol. Biol. Evol. 26, 2415-2425
59 Ciccarelli, F.D. et al. (2006) Toward automatic reconstruction of a highly resolved tree of life. Science 311, 1283-1287
60 Letunic, I. and Bork, P. (2011) Interactive Tree Of Life v2: online annotation and display of phylogenetic trees made easy. Nucleic Acids Res. 39, W475-W478
61 Lundin, D. et al. (2009) RNRdb, a curated database of the universal enzyme family ribonucleotide reductase, reveals a high level of misannotation in sequences deposited to Genbank. BMC Genomics 10, 589

Review

Replication timing and its emergence from stochastic processes
John Bechhoefer1 and Nicholas Rhind2
1 Department of Physics, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada
2 Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01605, USA

The temporal organization of DNA replication has puzzled cell biologists since before the mechanism of replication was understood. The realization that replication
timing correlates with important features, such as transcription, chromatin structure and genome evolution,
and is misregulated in cancer and aging has only deepened the fascination. Many ideas about replication timing have been proposed, but most have been short on
mechanistic detail. However, recent work has begun to
elucidate basic principles of replication timing. In particular, mathematical modeling of replication kinetics in
several systems has shown that the reproducible replication timing patterns seen in population studies can be
explained by stochastic origin firing at the single-cell
level. This work suggests that replication timing need
not be controlled by a hierarchical mechanism that
imposes replication timing from a central regulator,
but may instead result from simple rules that affect individual origins.
Replication origins: correlated or independent?
The duplication of the genome of a cell by DNA replication
is an essential step in the cell cycle. In bacteria, the overall
situation is straightforward, in that DNA replication initiates at a single, well-defined location in the genome (e.g.
the oriC site in Escherichia coli) and terminates at a
second, well-defined region (ter in E. coli) [1]. Eukaryotic
organisms, with 10 to 1000 times more DNA and with 10 to 100 times slower replication forks, depend on the firing of
multiple origins of replication along the DNA. These origins are defined by a two-step process [2]. Licensing, the
first step, occurs in G1 phase, when the origin recognition
complex (ORC) binds to chromatin and, with the aid of
Cdc6 and Cdt1, loads onto the DNA head-to-head pairs of
the barrel-shaped heterohexameric MCM complex, the
catalytic core of the replicative helicase [3,4]. Each pair
of MCM complexes is a potential origin of DNA replication.
Initiation (or origin firing), the second step, occurs in S
phase, when a pair of MCMs is activated via a complex
process involving numerous proteins, including recruitment of Sld2, Sld3, the GINS complex and Cdc45, as well
as the phosphorylation of various components by the CDK
and DDK replication kinases [5]. The regulation of the
spatial binding of the ORC and the temporal activation of MCMs largely determines the kinetics of replication during S phase, which is referred to as the replication program.

Corresponding author: Bechhoefer, J. (johnb@sfu.ca).
Keywords: DNA replication timing; stochastic models; replication initiation; ORC; MCM

How replication programs are regulated is an active, and sometimes controversial, area of research. Although
the specific mechanisms that control timing are still obscure, recent work has revealed basic principles that appear to apply to eukaryotic replication in general. In
particular, mathematical modeling of genome-wide replication timing data shows that replication timing can be
explained by stochastic mechanisms. The significance of
this conclusion is that it explains the regulation of replication timing in terms of simple rules that affect the individual probabilities of origin firing. In such models, replication
timing is controlled by changing the firing rate of individual origins, instead of by directly regulating the time at
which origins fire. Although this distinction may seem
semantic, it is important because it recasts black-box
mechanisms of global replication timing in terms of biochemically plausible effects on individual origins.
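In the simplest version of this idea, an origin that fires with a constant rate k (a probability per unit time, applied while the origin is still unreplicated) has an exponentially distributed firing time with mean 1/k. The sketch below assumes exactly that memoryless process, with made-up rates for two hypothetical origins and no passive replication, to show how regulating a rate shifts the average firing time without fixing the firing time in any individual cell.

```python
import random
import statistics


def sample_firing_times(rate, n_cells, rng):
    """Draw one would-be firing time (min) per cell for an origin with a
    constant firing rate (per min). Assumes a memoryless (exponential)
    process and ignores passive replication by neighboring forks."""
    return [rng.expovariate(rate) for _ in range(n_cells)]


def summarize(rates, n_cells=10_000, seed=1):
    """Return {origin name: (mean firing time, s.d. of firing time)}."""
    rng = random.Random(seed)
    summary = {}
    for name, rate in rates.items():
        times = sample_firing_times(rate, n_cells, rng)
        summary[name] = (statistics.mean(times), statistics.stdev(times))
    return summary


# Hypothetical firing rates (per min) for an efficient and an inefficient origin.
rates = {"efficient": 0.2, "inefficient": 0.05}
for name, (mean_t, sd_t) in summarize(rates).items():
    print(f"{name}: mean firing time {mean_t:.1f} min, s.d. {sd_t:.1f} min")
```

Only the rates differ, yet the efficient origin fires about four times earlier on average. Within single cells, though, the spread of firing times is as large as the mean, and with these rates the inefficient origin fires first in about 20% of cells (0.05 / (0.2 + 0.05)).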
Over the past decade, two views about replication timing mechanisms have been developed. In the first, origin
firing is a stochastic event that is (largely) independent of
the replication state of neighboring origins. In particular, it
has been postulated that there is an initiation function
I(x,t) that describes the rate of initiation, per time and per
length of unreplicated DNA, of a site x along the genome at
time t after the beginning of S phase [6,7] (Figure 1; Box 1).
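Because I(x,t) is defined per length of unreplicated DNA, initiation can only occur at sites that no fork has yet reached. Writing f(x,t) for the fraction of cells in which site x has replicated by time t (Box 1), the expected number of initiations per cell in a small window dx dt, and the expected total over an S phase of duration T, follow as:

$$
dN(x,t) = I(x,t)\,\bigl[1 - f(x,t)\bigr]\,dx\,dt,
\qquad
N_{\text{total}} = \int_0^{T}\!\!\int_{\text{genome}} I(x,t)\,\bigl[1 - f(x,t)\bigr]\,dx\,dt .
$$

The factor 1 - f(x,t) is why a site that is usually replicated passively contributes few initiations, whatever its intrinsic rate.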
This type of origin firing can manifest in at least three
different ways, depending on the experimental model considered. In species such as budding yeast, in which replication initiates at well-defined loci, the function I(x,t) forms
a discrete spike at each replication origin [8] (Figure 1a). At
the other end of the spectrum, amphibian embryos lack
origin specificity, and DNA replication can initiate anywhere along the genome [6]. In an intermediate case,
mammalian somatic cells can display clusters of origins
or broad initiation zones that are not homogeneously
distributed throughout the genome [9-11] (Figure 1b).
Each of these three cases is discussed in detail below.
We refer to the hypothesis of a locally determined initiation
rate as the independent origin hypothesis because it is
distinguished by the feature that origins fire independently of the firing of neighboring origins. The attraction of
the independent origin hypothesis is its simplicity: one
does not need to postulate biological mechanisms that
would cause correlated initiations. The potential weakness of the independent origin hypothesis is that, if too simple, it may fail to describe experiments accurately or may require implausible coincidences of parameters to fit the data.

0168-9525/$ - see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.tig.2012.03.011 Trends in Genetics, August 2012, Vol. 28, No. 8

[Figure 1 image. Panels (a) and (b): budding yeast; panel (c): yeast (left) and a metazoan (right). Plotted quantities: replication fraction f and initiation rate I (number per kb per unit time), against genome position or time.]

Figure 1. Replication fractions and initiation rates. (a,b) The relation between replication fractions f and initiation rates I, as illustrated for budding yeast. (a) Spatially resolved data, averaged over an asynchronous cell population. (b) Time course data, averaged over the genome. (c) Illustration of typical replication timing data for budding yeast (left) and a metazoan organism (right). The top-left image shows the replication fraction f(x,t), as it might be inferred from a microarray timing experiment with several time points of data from synchronized cell populations. Black represents low replication levels and white represents high replication levels. Averaging the replication fraction over the genome gives the curve f(t), depicted to the left of the f(x,t) image, which goes from 0 to 1. Averaging the replication fraction over time, as in an experiment on asynchronous cell populations, gives the curve above the f(x,t) image. The bottom-right group shows the inferred I(x,t) image, as well as the averaged curves I(t) and I(x). Note that, in budding yeast, replication origins are well localized, as indicated by the spikes in the function I(x). [When viewed or printed at low resolution, not all spikes in I(x,t) may be visible.] The right-hand groups illustrate similar concepts for a typical metazoan organism. The main difference is that origins are not well localized, so that the function I(x) has broad features, representing zones where initiations are more or less likely to occur.
In a second scenario, the initiation of an origin, although still stochastic, is linked to the state of the genome
in its vicinity. For example, observations of origin clustering [12-14] have led several authors to hypothesize that
the presence of a replication fork can increase the firing
rate of nearby origins, as in the next-in-line model
[15] and the domino-cascade model [16,17]. We refer to
this second scenario in general as the correlated origin
hypothesis.
Previously, there was considerable debate as to whether
replication was stochastic and whether origins were independent. At present, it is generally accepted that all models of replication are stochastic at the level of molecular interactions. It is important to note that stochastic models do not
require that origins all fire with the same probability, nor is
stochastic firing incompatible with the existence of late-firing origins [18].
However, there is evidence in some cases for correlation in
origin initiation activity. As a result, the current picture is
an intermediate one that mixes both stochastic elements
and mechanisms for correlations in origin initiation [15,19].
Still, differences remain concerning what is essential and
what is incidental in the above picture and what kind of
underlying mechanisms are likely to be important in controlling the replication program. In this review, we argue
that, for simpler cases, such as unicellular yeasts and
the embryonic cells of some multicellular animals, recent
experiments and modeling efforts have shown that much of
the available replication data may be understood in terms of
the simpler independent origin hypothesis and that correlations probably play a minor role in the replication program.
Replication in the somatic cells of metazoan organisms is
more complex, and we outline recent efforts in this area.
Replication in yeast
The past few years have marked a turning point in the
understanding of replication in yeast. First came a series
of high-resolution combing and microarray experiments
(Box 2). For example, high-resolution timing data from
synchronized populations of wild-type and clb5Δ Saccharomyces cerevisiae show clear average timing patterns [20].
These measurements, as described in Box 2, amounted to
measurements of f(x,t), with spatial information resolved to
a few kilobases and temporal information resolved to 5 min.
At around the same time, DNA combing studies in budding
and fission yeast showed that initiations at the single-molecule scale are stochastic, with different sets of origins chosen
in each cell cycle [21,22]. Indeed, in budding yeast, it is now
clear that there are as many as 700 potential origin sites, of
which only approximately 200 are used in any given cycle.
In parallel work, the rate of origin firing in budding and
fission yeast was shown to be regulated by competition for
limiting activators, such as the Cdc45 initiation factor and
the DDK initiation kinase [23-26]. Competition for limiting activators provides an explanation for why origin firing
is less efficient than might be possible. The stochastic
interaction between origins and diffusible activators also
provides a mechanism for stochastic firing of origins.
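The competition idea can be illustrated with a deliberately minimal model. The sketch below assumes, hypothetically, that in each cell cycle a fixed number of activator molecules is shared among the potential origins, each activator allowing exactly one origin to fire, with origins chosen without replacement in proportion to an invented "affinity"; timing, fork movement and passive replication are ignored. The figures of about 700 potential origins and about 200 used per cycle come from the budding yeast numbers quoted above.

```python
import heapq
import random


def fire_origins(affinities, n_activators, rng):
    """Return the set of origins that capture a limiting activator in one
    cell cycle. Toy assumption: origins are drawn without replacement with
    probability proportional to a hypothetical affinity, using the
    Efraimidis-Spirakis key u**(1/w) for weighted sampling."""
    keys = [rng.random() ** (1.0 / w) for w in affinities]
    return set(heapq.nlargest(n_activators, range(len(affinities)), key=keys.__getitem__))


def firing_efficiencies(affinities, n_activators, n_cycles, seed=0):
    """Fraction of simulated cycles in which each origin fired, plus the
    set of fired origins for every cycle."""
    rng = random.Random(seed)
    counts = [0] * len(affinities)
    fired_sets = []
    for _ in range(n_cycles):
        fired = fire_origins(affinities, n_activators, rng)
        fired_sets.append(fired)
        for i in fired:
            counts[i] += 1
    return [c / n_cycles for c in counts], fired_sets


# ~700 potential origins, enough activator for ~200 to fire per cycle;
# the affinities are invented for illustration.
rng = random.Random(42)
affinities = [rng.uniform(0.5, 2.0) for _ in range(700)]
eff, fired_sets = firing_efficiencies(affinities, n_activators=200, n_cycles=500)
print(f"mean firing efficiency: {sum(eff) / len(eff):.3f}")
print("origins used in both cycle 1 and cycle 2:", len(fired_sets[0] & fired_sets[1]))
```

Because every cycle fires exactly 200 origins, the mean efficiency is fixed at 200/700 (about 0.29); what changes from cycle to cycle is which origins fire, and origins with higher affinity fire in a larger fraction of cycles.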
The stochasticity of individual origins turns out to be an
important effect. In contrast to earlier models, in which the
firing of specific origins was envisaged to be limited to
narrow windows of S phase, it is now clear that the width of
the firing-time distribution for an individual origin can be a
substantial fraction of S phase. Indeed, models that fail to
incorporate the width of the timing distribution fail to
reproduce many of the experimental details adequately
[27]. By contrast, stochastic models that take into account
the width of the firing-time distribution can successfully fit
the microarray data [8,28,29]. Several notable insights and
results come from these analyses: first, it is possible to
generate models with independent initiation scenarios
[initiation rate I(x,t) and constant fork velocity v] that lead
to good fits of the data. This result shows that the independent origin hypothesis suffices to explain microarray data on replication timing in yeast. Second, the intrinsic parameters characterizing each origin have values that are independent of their neighbors, again suggesting that the
initiation of each origin is an independent stochastic event
[8]. Studies in fission yeast have also led to the conclusion
that local initiation models suffice to explain the available
experimental data [30,31]. However, several biologically
different scenarios can lead to similar overall timing patterns [32], and more complicated mechanisms, such as
trans-acting regulators of origin activity and chromosome
structure, can affect origin timing [33,34]. Clearly, further
iterations of modeling and experiment will be needed to
come to a final picture.

Box 1. f and I: mathematical functions that describe replication kinetics
DNA replication kinetics can be described using two related but distinct mathematical functions: the replication fraction f and the initiation rate I. The first, f, is a complete description of replication kinetics and can be directly determined from experimental data (Box 2). The second, I, describes only the kinetics of origin initiation and cannot be directly measured; it must be inferred from f. However, if fork rates are assumed to be nearly constant, as is frequently done in models of replication kinetics, then I is sufficient to completely determine f. Both f and I can be defined for every spatial point (x) in the genome and every time point (t) in S phase, to give f(x,t) and I(x,t) (Figure 1c, main text).
It is often useful to consider the time-averaged functions, f(x) and I(x) (Figure 1a, main text). f(x) can be thought of as a measure of the average replication time of each point in the genome (higher values indicate earlier replication) and is generally measured on asynchronous populations of cells. It is closely related to the median replication time t_rep at a site that is inferred from time course data on synchronized cell populations. The peaks in f(x) represent the origins, and taller peaks indicate origins that fire, on average, earlier in S phase. I(x) represents the average initiation rate of each point in the genome. In yeast, where origins are well defined, I(x) = 0 for most of the genome and forms spikes over the origins, with taller spikes reflecting a higher average probability of origin firing (Figure 1a,c). In metazoans, origins appear to be more diffuse, and thus so is I(x) (Figure 1c). It is important to realize that the height of the peaks in I(x) (i.e. the average firing probability of an origin) cannot be directly inferred from the height of the peaks in f(x), because f(x) reflects both passive replication and active firing of each origin; I(x) can only be extracted by mathematical modeling of f(x).
It can also be useful to consider the spatially averaged functions, f(t) and I(t) (Figure 1b, main text). The replication fraction f(t) is generally sigmoidal, as cells go from unreplicated in G1 to replicated in G2. The exact shape of the sigmoid depends on the details of the replication program, such as the distribution of origins and the shape of I(t). As discussed in the main text, I(t) has been proposed to generally increase for most of S phase and then decline in late S phase.
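Box 1 notes that, when forks move at a nearly constant velocity, I determines f. A minimal sketch of that forward calculation, under the simplest independent-origin assumptions (each origin has its own constant firing rate from the start of S phase; forks move at a constant speed v in both directions and never stall), uses the fact that site x is still unreplicated at time t only if no origin fired early enough for its fork to have reached x. Passive replication needs no separate bookkeeping here: a fork that passively replicates origin j reaches any site x no later than a fork from j itself could have. The origin positions, rates and v are hypothetical numbers.

```python
import math


def replication_fraction(x, t, origins, v):
    """f(x, t): fraction of cells in which site x (kb) has replicated by
    time t (min). origins is a list of (position_kb, rate_per_min) pairs for
    independent origins; v is an assumed constant fork speed (kb/min).

    Origin i can only have replicated x by time t if it fired before
    t - |x - x_i| / v; with a constant rate k_i the probability that it
    has NOT done so is exp(-k_i * max(0, t - |x - x_i| / v))."""
    log_unreplicated = 0.0
    for x_i, k_i in origins:
        window = t - abs(x - x_i) / v
        if window > 0:
            log_unreplicated -= k_i * window
    # Independence: the probabilities that x is still unreplicated multiply.
    return 1.0 - math.exp(log_unreplicated)


# Hypothetical chromosome segment: a weak origin between two stronger ones.
origins = [(0.0, 0.10), (40.0, 0.02), (90.0, 0.08)]
v = 2.0  # assumed fork speed, kb/min
for x in (0.0, 20.0, 40.0, 65.0, 90.0):
    profile = [round(replication_fraction(x, t, origins, v), 2) for t in (5, 10, 20, 40)]
    print(f"x = {x:5.1f} kb: f at 5, 10, 20, 40 min = {profile}")
```

At x = 40 kb the weak central origin accounts for only part of the replicated fraction; by 40 min most of it comes from forks launched at the stronger flanking origins, which is the mixing of passive and active replication that keeps I(x) from being read directly off f(x).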
Replication in embryos
Embryonic cells in metazoans represent an interesting
intermediate case of complexity. On the one hand, they
have the full amount of DNA of somatic cells. On the other
hand, they undergo a rapid, simplified cell cycle that is
largely transcriptionally silent, which removes one major
source of complication in the replication of somatic cells. In
vitro studies of Xenopus cell-free extracts have been especially detailed and fruitful [13,35-37] and have led to
associated modeling efforts [6,7,19,38,39]. The replication
program in Xenopus embryos is relatively simple and much
faster than in somatic cells. In particular, there are no fixed origin sites, presumably because the lack of transcription and the more uniform chromatin structure allow the ORC to load MCM anywhere in the genome [40,41]. Although
variation in initiation rates and, hence, replication timing
does occur at the megabase scale [37], modeling efforts to
date have focused on understanding the temporal variation of the initiation rate, I(t), averaged over the genome.
The main conclusion is that the initiation rate increases
over most of S phase, before decreasing to zero near the end
of it. This variation of initiation rates over S phase is
significant because it leads to a relatively narrow distribution of lengths for S phase, which, because of the stochasticity of origin placement and initiation time, varies with
each cell cycle [42]. In embryos, it is particularly important
that there be little variation in genome duplication time, as
the cell cycle lacks checkpoints that can delay the start of
mitosis if replication is not complete. In Xenopus embryos,
for example, the typical S phase duration is 20 min and
that of mitosis is 5 min, all within a 25-min cell cycle [43].
Thus, variations in S-phase duration of more than 5 min can be
lethal for a cell. Such variations are proposed to be suppressed by the increase of the initiation rate I(t) during S phase. It
has even been postulated that the increasing form of I(t) is
a universal characteristic of eukaryotic replication [44].
Preliminary assessment of replication data from S. cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster and Homo sapiens supports this scenario, although
better data and more extensive analysis are required. The
initial increase of initiation rate I(t) has been attributed to
competition for a limiting factor required for replication
fork function [35,38] or origin firing (e.g. the DDK replication kinase [23]), whereas the decrease of I(t) at the end of S
phase has been variously attributed to a fork-dependent
control mechanism [38] or to increasing diffusion search
times for the limiting factor to find its target [39].
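The claim that an increasing I(t) keeps S-phase duration reliable can be explored with a toy Monte Carlo model. The sketch below assumes a uniform 1D lattice (each site standing for a fixed length of DNA), forks that advance one site per time step in each direction, and initiation only at unreplicated sites with a probability set by a hypothetical I(t). The two rate profiles and all numbers are illustrative choices, picked so that the two profiles give roughly comparable mean durations; they are not fitted to Xenopus data and this is not the authors' model.

```python
import numpy as np


def s_phase_durations(init_prob, n_sites=2000, n_trials=100, max_steps=5000, seed=0):
    """Monte Carlo S-phase durations (in time steps) on a 1D lattice.

    Toy assumptions: each step, every fork moves one site outward, then
    every still-unreplicated site initiates with probability
    init_prob(step), uniformly along the lattice as in an embryo without
    fixed origins. Trials that do not finish within max_steps are dropped."""
    rng = np.random.default_rng(seed)
    durations = []
    for _ in range(n_trials):
        replicated = np.zeros(n_sites, dtype=bool)
        for step in range(1, max_steps + 1):
            # Fork progression: each replicated site replicates its neighbors.
            grown = replicated.copy()
            grown[1:] |= replicated[:-1]
            grown[:-1] |= replicated[1:]
            # New initiations can only occur on DNA that is still unreplicated.
            grown |= ~grown & (rng.random(n_sites) < init_prob(step))
            replicated = grown
            if replicated.all():
                durations.append(step)
                break
    return np.array(durations)


def constant_rate(step):
    return 1e-4  # hypothetical initiation probability per site per step


def increasing_rate(step):
    return 1.5e-6 * step  # hypothetical I(t) rising linearly with time


for name, profile in (("constant I(t)", constant_rate), ("increasing I(t)", increasing_rate)):
    d = s_phase_durations(profile)
    print(f"{name}: mean {d.mean():.0f} steps, coefficient of variation {d.std() / d.mean():.3f}")
```

Comparing the printed coefficients of variation (standard deviation divided by mean) is one way to check whether, under these assumptions, the increasing profile yields a narrower spread of S-phase durations; the result depends on the lattice size and rates chosen, so it is a qualitative probe rather than a test of the published models.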
Replication in metazoan somatic cells
The replication of DNA in metazoan germline and somatic
cells is more complicated than in embryonic cells. Replication in somatic cells can take up to 100 times longer than in
embryonic cells [45], and this increase in replication time is
not spread equally across the genome. Instead, different
regions of the genome replicate at characteristic times
during the elongated S phase, and the replication timing
of a locus correlates with several other important chromosomal characteristics. The best-established correlation is
between late replication and constitutive heterochromatin,
the repetitive, transcriptionally inactive regions of the
genome that remain condensed throughout the cell cycle
[46]. Conversely, gene-rich, transcriptionally active
regions of the genome tend to replicate earlier in S phase
[47]. The correlation between the transcriptional activity of
individual genes and their replication timing is not strong
[48]. However, when averaged over large groups of neighboring genes, transcriptional activity correlates well with
replication timing [49,50]. An even more remarkable correlation is seen between chromosome interaction maps and
replication timing [51,52]. The contiguous regions of the
genome that replicate with similar timing are referred to
as replication domains. The correlations between the average transcriptional activity, chromatin interactions and replication timing of replication domains have led to qualitative models in which the chromosome accessibility of a domain affects its replication timing [53].

Box 2. Experimental techniques for analyzing DNA replication timing
The recent gains in our understanding of replication timing are built on experimental advances that have greatly increased the quality and quantity of data available. Defined patterns of DNA replication were first observed in fiber autoradiography studies of tritiated thymidine incorporation in bacterial and mammalian cells [12,79]. By pulse-labeling cells in vivo with tritiated thymidine and then stretching the labeled DNA on a photosensitive film, it was possible to map replication patterns (which regions have replicated and which have not) at a given time. A significant technical improvement was the substitution of thymidine analogs, such as BrdU, that could be detected by fluorescence using an optical microscope [80,81]. Molecular combing, which stretches DNA more controllably, improved the latter technique by allowing positions on an image of a stretched fiber to be associated more reliably with genomic positions, and by simplifying the identification of individual fibers, whether taken anonymously from the genome [82,83] or from an identified genomic location [54,84]. In parallel with fiber-based techniques, live-cell imaging has also yielded much valuable information. Although the size of origins and even their separations are well below the resolution of conventional light microscopy, clever techniques can yield spatial and temporal information. For example, specific sites can be labeled with fusion proteins whose intensity doubles after replication, an event that can readily be observed [85]. In the future, live single-molecule studies based on flow and optical or magnetic tweezers [86], nano-engineered capillaries [87,88] and other molecular-scale structures may lead to even greater insights, especially into local mechanisms at the fork and initiation sites.
A second set of techniques provides information about the fraction of cells in a population that has replicated at a particular location x and time t. This fraction of replicated cells can be described by the function f(x,t), if replication kinetics throughout S phase are measured, or simply as f(x), if measurements are performed on asynchronous cell populations (Figure 1, main text). Such measurements originally used microarrays [89,90], with one approach based on local changes in copy number during replication. In a population of unreplicated cells, a baseline intensity is measured at each locus [f(x) = 0]. After all cells have replicated, the measured intensity at each locus should be double [f(x) = 1]. During replication, intermediate levels of replication are detected as intermediate intensity levels [0 < f(x) < 1]. For example, if half of the cells in the population have replicated at a location x, then f(x) = 0.5. More recently, direct sequencing to determine local DNA copy number has given similar information with fewer artifacts [91,92]. Initial studies used multiple time points in cultures of synchronized cells to directly measure f(x,t) [89,93], and this approach is still the state of the art in yeast [20,64]. However, comparable results can be derived by sorting asynchronous cells of any type into G1 and S populations [90].
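The copy-number logic described in Box 2 amounts to a simple conversion from a locus's signal in an S-phase sample, relative to an unreplicated (G1) reference, to f(x): a ratio of 1 means f = 0 and a ratio of 2 means f = 1. The sketch assumes the two signals have already been normalized so that unreplicated DNA gives a ratio of exactly 1 (in practice that normalization is itself a modeling step), and the counts are invented.

```python
import numpy as np


def replication_fraction_from_copy_number(s_signal, g1_signal):
    """Estimate f(x) from per-locus signal (array intensity or read counts)
    in an S-phase sample relative to an unreplicated G1 reference.

    Assumes both inputs are already normalized so that unreplicated DNA has
    a ratio of 1. Ratio 1 -> f = 0, ratio 2 -> f = 1; values are clipped to
    [0, 1] to absorb measurement noise."""
    ratio = np.asarray(s_signal, dtype=float) / np.asarray(g1_signal, dtype=float)
    return np.clip(ratio - 1.0, 0.0, 1.0)


# Hypothetical normalized signals at five loci.
g1 = [100, 100, 120, 80, 100]
s = [100, 150, 240, 84, 205]
print(replication_fraction_from_copy_number(s, g1))
```

The last locus reads slightly above a doubling (ratio 2.05), so clipping reports it as fully replicated rather than as a physically meaningless f greater than 1.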
Although replication domains replicate with reproducible timing, origin usage within domains is heterogeneous from cell to cell because origins fire stochastically [10,54,55]. As in yeast, origin firing in metazoans appears to be regulated by limiting activators. Mammalian Cdc45 is substoichiometric relative to ORC and MCM, and increasing Cdc45 levels increases the rate of origin firing [56]. Moreover, modulating the levels of the CDK replication kinase affects the efficiency of origin firing [57-59]. An additional reason for the heterogeneity of origin firing in metazoans is that metazoan origins are not well-defined loci; at least in some cases, MCM seems to be loaded heterogeneously throughout a region [60-62], which can be thought of as a cluster of many inefficient origins or as a diffuse initiation zone.
Mechanisms for timing
Although replication timing appears to be uniform and
well coordinated at the population level, this average
behavior hides heterogeneous replication kinetics in individual cells. This apparent conflict between heterogeneity
at the single-cell level and organization at the population
level is resolved by observing that the average of the
heterogeneous single-cell data recapitulates the results
from ensemble studies [22]. This observation has led to
models in which the average replication time of a locus is a
function of the firing probability of individual origins,
regardless of whether those probabilities are independent
or coordinated (Box 3). Such models predict a correlation
between the probability and timing of origin firing, a
correlation seen in budding yeast [8,28]. Furthermore,
recent budding yeast studies have shown that, in most
cases in which the length of S phase is significantly increased, the relative timing program is maintained [63,64];
that is, the overall ordering of replication timing of different regions is preserved, even as the scale of timing is
altered. Such a result would be expected if S phase length
changes because the initiation rates have been altered
globally (Naama Barkai, personal communication). As
discussed above, initiation rates are thought to be regulated by competition among origins for limiting activation
factors. One recently proposed model makes the case both
theoretically and experimentally that the limiting factor is a
protein associated with active replication forks [65]. The
Cdc45 protein, which is required to activate the MCM helicase complex, is one such candidate [56]. Alternatively,
factors such as DDK, which phosphorylates and activates
MCM, have been seen to be rate limiting in fission yeast [23].
The competition for limiting activators explains why
origins fire stochastically but not why some origins fire
with higher probability than others. One obvious explanation for differing probabilities of origin firing is the effect of
chromatin structure on the accessibility of origins to initiation factors [53]. In the context of competition between
origins for limiting activators, it is natural to imagine that
chromatin structure affects that competition, allowing
euchromatic origins greater access to activators and so
higher firing probabilities. This possibility fits well with
the strong correlation observed between heterochromatin
and late replication [46]. Another possibility that we have
recently proposed is based on the observation that multiple
MCMs are loaded at each origin [8,60]. In this model, each
MCM loaded has a low probability of firing; however,
because multiple MCMs are loaded at each origin, origins
that have more MCMs loaded will have a higher aggregate
firing probability. Thus, the probability of origin firing is
set in part by the number of MCMs loaded at a given origin
site. The probability of origin firing can then be subsequently altered by chromatin context. For example, a
recent study has shown that Rif1, which affects telomere
chromatin structure, also binds to chromosome arms and
alters origin initiation rates at these sites, perhaps by
altering the loading of Cdc45, which is required for
MCM helicase activation [33].
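The multiple-MCM argument can be illustrated with a short probability calculation. This sketch rests on a simplification that the cited studies do not claim: each loaded MCM fires independently with the same small probability p. Under that assumption, an origin carrying n loaded MCMs fires with probability 1 − (1 − p)^n. The function name and the values of p and n are hypothetical.

```python
def origin_firing_probability(p_mcm: float, n_mcm: int) -> float:
    """Probability that at least one of n_mcm loaded MCMs fires.

    Hypothetical simplification: every MCM has the same firing
    probability p_mcm and the MCMs act independently, so the origin
    fails to fire only if all n_mcm of them fail.
    """
    if not 0.0 <= p_mcm <= 1.0:
        raise ValueError("p_mcm must lie in [0, 1]")
    if n_mcm < 0:
        raise ValueError("n_mcm must be non-negative")
    return 1.0 - (1.0 - p_mcm) ** n_mcm


if __name__ == "__main__":
    p = 0.05  # hypothetical per-MCM firing probability
    for n in (1, 5, 10, 20):
        prob = origin_firing_probability(p, n)
        print(f"{n:2d} MCMs loaded -> origin firing probability {prob:.2f}")
```

In this sketch, an origin with more loaded MCMs has a higher aggregate firing probability. A chromatin effect such as the one attributed to Rif1 could, hypothetically, be represented as a change in p or in the effective number of MCMs.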
Box 3. Theoretical techniques for analyzing DNA replication timing
Although determining the firing time of an origin would seem
straightforward, particularly for the relatively simple yeast genome,
the heterogeneous nature of origin firing and the passive replication
of origins by forks from neighboring origins mean that the
distribution of origin firing times cannot be directly inferred from
its average replication time [94]. Therefore, rigorous analysis of
replication timing patterns has relied on more sophisticated
analytical tools. One of the most straightforward and widespread
methods is computer simulation [6,27,28,30,38]. An advantage of
simulation is that, with modest computer resources (especially if
simulations keep track of only positions of forks and origins rather
than use a lattice for each point on the genome [95]), one can
recreate in silico not only the ideal experimental scenario envisaged,
but also any relevant experimental details. For example, it is
straightforward to include the effects of asynchrony in the cell
population, finite microscope resolution, labeling artifacts, and the
like [96]. Once the artifacts and the replication scenario are chosen
correctly, the simulation can reproduce, within statistical error, the
data from any given scenario.
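The following is a minimal sketch of such a simulation. As suggested above, it stores only origin positions and firing times rather than a lattice. Its assumptions are illustrative and are not taken from any of the cited models:

- origins fire independently with constant hazard rates;
- forks move at a fixed speed v in both directions and stop when they meet;
- population asynchrony is represented as a uniform random delay of each cell's S-phase start.

All function names, positions, rates and speeds are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def replication_times(origins: Sequence[float], fire_times: Sequence[float],
                      positions: Sequence[float], v: float) -> List[float]:
    """Time at which each locus is replicated in a single cell.

    Each origin sends forks both ways at speed v; a locus at x is copied
    by the first fork to arrive, at min_i (t_i + |x - x_i| / v). An origin
    scheduled to fire after it has been passively replicated can never
    supply this minimum, so passive replication needs no special handling.
    """
    return [min(t + abs(x - xo) / v for xo, t in zip(origins, fire_times))
            for x in positions]


def fired_origins(origins: Sequence[float], fire_times: Sequence[float],
                  v: float) -> List[int]:
    """Indices of origins that initiated before a fork from elsewhere reached them."""
    fired = []
    for i, (xi, ti) in enumerate(zip(origins, fire_times)):
        passive = min((tj + abs(xi - xj) / v
                       for j, (xj, tj) in enumerate(zip(origins, fire_times)) if j != i),
                      default=math.inf)
        if ti < passive:
            fired.append(i)
    return fired


def simulate_population(origins: Sequence[float], rates: Sequence[float],
                        positions: Sequence[float], v: float, n_cells: int,
                        start_jitter: float = 0.0,
                        seed: int = 0) -> Tuple[List[float], List[float]]:
    """Population-average replication time per locus and firing efficiency per origin.

    Illustrative assumptions: independent origins with constant hazard
    rates (at least one rate positive), and asynchrony modeled as a uniform
    random delay in [0, start_jitter] of each cell's S-phase start.
    """
    rng = random.Random(seed)
    mean_t = [0.0] * len(positions)
    fired_counts = [0] * len(origins)
    for _ in range(n_cells):
        fire_times = [rng.expovariate(k) if k > 0 else math.inf for k in rates]
        offset = rng.uniform(0.0, start_jitter)
        for j, t in enumerate(replication_times(origins, fire_times, positions, v)):
            mean_t[j] += (t + offset) / n_cells
        for i in fired_origins(origins, fire_times, v):
            fired_counts[i] += 1
    return mean_t, [c / n_cells for c in fired_counts]


if __name__ == "__main__":
    origins = [20.0, 50.0, 80.0]   # hypothetical origin positions (kb)
    rates = [0.2, 0.02, 0.1]       # hypothetical firing rates (per min)
    positions = [float(x) for x in range(0, 101, 10)]
    mean_t, eff = simulate_population(origins, rates, positions, v=1.5, n_cells=2000)
    print("mean replication time (min):", [round(t, 1) for t in mean_t])
    print("origin efficiencies:", [round(e, 2) for e in eff])
```

The main design choice is that a locus's replication time is taken as the earliest fork arrival over all scheduled origin firings. This handles passive replication without stepping forks forward in time. Because each realization is cheap, averaging over many cells yields both a mean replication-time profile and a per-origin efficiency, which are distinct quantities.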
The main disadvantage of simulations is that, to analyze experimental data, one must first choose both the appropriate type of
replication scenario to simulate and a way to incorporate experimental details, and then determine the appropriate parameters to
use. In situations in which origin firing is not uniformly distributed,
each origin will be characterized by several parameters, and so the
simulation may depend on hundreds or even thousands of
parameters, depending on the type of organism. Curve-fitting
techniques, which amount to a search in the space of parameters,
require simulating a large number of scenarios. Analytical models,
which can be used to directly calculate replication profiles instead of
needing to simulate replication step by step, are one way to get
around such obstacles. Analytical models may be evaluated faster
than simulations. The difficulties are that one must be able to
determine an appropriate model and be able to solve it. Thus,
beginning with [6], a variety of analytical models have been
proposed [8,39,42,94,97]. Because models based on independent
origins are simpler than ones that allow correlated initiations, most
of the above work has assumed such a scenario. Nonetheless, some
analysis of correlated initiations has been done, as discussed in the
main text.
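As an example of the analytical route, consider the simplest case:

- a spatially uniform initiation rate I(t) per unit length per unit time;
- independent initiations;
- a constant fork speed v.

In this one-dimensional nucleation-and-growth (Kolmogorov–Johnson–Mehl–Avrami, KJMA) picture, the replicated fraction is f(t) = 1 − exp[−2v ∫₀ᵗ I(t′)(t − t′) dt′]. The sketch below evaluates that expression numerically. The linear form of I(t) and the parameter values are hypothetical, and a spatially varying I(x,t) would require more elaborate formulas.

```python
import math
from typing import Callable


def replicated_fraction(t: float, initiation_rate: Callable[[float], float],
                        v: float, n_steps: int = 2000) -> float:
    """Replicated fraction f(t) for a spatially uniform initiation rate I(t).

    Evaluates f(t) = 1 - exp(-2 v * integral_0^t I(t') (t - t') dt')
    with the trapezoidal rule. Units: I per length per time, v length per time.
    """
    if t <= 0.0:
        return 0.0
    h = t / n_steps
    integral = 0.0
    for k in range(n_steps + 1):
        tp = k * h
        weight = 0.5 if k in (0, n_steps) else 1.0
        integral += weight * initiation_rate(tp) * (t - tp)
    integral *= h
    return 1.0 - math.exp(-2.0 * v * integral)


def linear_rate(tp: float) -> float:
    """Hypothetical initiation rate that rises linearly during S phase."""
    return 1e-4 * tp


if __name__ == "__main__":
    v = 1.5  # hypothetical fork speed (kb/min)
    for t in (10.0, 20.0, 30.0, 40.0):
        # For linear_rate the closed form is 1 - exp(-v * 1e-4 * t**3 / 3).
        closed_form = 1.0 - math.exp(-v * 1e-4 * t ** 3 / 3.0)
        print(t, round(replicated_fraction(t, linear_rate, v), 3), round(closed_form, 3))
```

A single call replaces an ensemble of stochastic runs, which is the speed advantage described above. When I(t) has a closed-form integral, as here, the numerical and analytical columns should agree.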

A scenario comprising stochastically firing origins with
different firing probabilities naturally leads to a reproducible replication-timing program [66]. Origins with high
firing probabilities will be more likely to fire in early S
phase and so will have early average replication times. In
general, low-probability origins would be unlikely to fire
efficiently even in late S phase. However, if the firing rate,
I(t), increases during S phase, as described above, even low-probability origins, if not passively replicated, will have a
high probability of firing late in S phase, leading to efficient
replication of late-replicating regions [18]. Here, we distinguish between I(t), which describes the timing program,
and the underlying biological mechanisms, which would
explain why I(t) has the form observed. This description of
origin timing applies not only to the individual origins of
simpler genomes, such as budding yeast, but also to the
complicated replication domains of metazoan genomes. In
the latter case, euchromatic replication domains of high-probability origins reproducibly replicate earlier than do
domains of lower-probability origins, but heterochromatic
domains, which harbor the lowest-probability origins,
nonetheless replicate efficiently in late S phase. Thus,
the order in which various domains of metazoan genomes
replicate may be a secondary consequence of the effect of
their chromatin structure on the firing probabilities of
their origins. This possibility is consistent with the strong
correlation between chromatin interactions and replication timing [52].
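A deliberately simple, hypothetical model makes this argument concrete. Ignore passive replication and let all origins share the same linearly increasing program, with origin i firing at rate h_i(t) = k_i·t. Then:

- the probability that origin i has fired by time t is 1 − exp(−k_i t²/2);
- its mean firing time is √(π/(2k_i)).

The functions, the values of k_i and the S-phase length below are illustrative assumptions, not fitted quantities.

```python
import math


def prob_fired_by(t: float, k: float) -> float:
    """P(origin has fired by time t) for a firing hazard h(t) = k * t.

    Hypothetical model: the firing rate grows linearly through S phase
    and passive replication by neighbouring forks is ignored.
    """
    return 1.0 - math.exp(-0.5 * k * t * t)


def mean_firing_time(k: float) -> float:
    """Mean firing time: integral over t >= 0 of exp(-k t^2 / 2) = sqrt(pi / (2 k))."""
    return math.sqrt(math.pi / (2.0 * k))


if __name__ == "__main__":
    s_phase = 40.0                   # hypothetical S-phase length (min)
    for k in (2e-2, 5e-3, 2e-3):     # hypothetical high-, mid- and low-probability origins
        print(f"k={k:g}: mean firing time {mean_firing_time(k):5.1f} min, "
              f"P(fired by mid S) {prob_fired_by(s_phase / 2, k):.2f}, "
              f"P(fired by end of S) {prob_fired_by(s_phase, k):.2f}")
```

The cumulative hazard in this sketch grows as t², so three times as much of it accumulates in the second half of S phase as in the first. An origin that is unlikely to fire early can therefore still be likely to fire before S phase ends. Meanwhile, origins with larger k_i keep earlier mean firing times, so the timing order stays reproducible.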
Correlated origin initiations
Although much of observed replication timing can be
explained in terms of a picture of independent initiations,
there is also evidence for correlations in initiation. For
example, DNA fiber studies observe clusters of nearby
origins that initiated at approximately the same time
[12,13]. One plausible mechanism for origin clustering is
that the polymerases and other proteins responsible for
replication are localized within the nucleus in small foci
known as replication factories [67,68]. As a consequence, if
the DNA is tethered to a location in the cell nucleus while
replicating, it may loop around and find another set of
replication machinery in the same factory. Such looping
could increase the firing probability of origins
located approximately 10 kb from an active fork and decrease it for closer origins [19].
Another line of argument suggesting the possibility of
correlated initiation lies in an observation of small biases
in the DNA base sequence near certain regions. It has been
shown that if a region of the genome is repeatedly replicated by a polymerase on the leading strand, mutations
will eventually lead to strand compositional asymmetries
(an excess of G over C and T over A) [69]. Indeed, a large
proportion of known origins for H. sapiens have been found
by looking for signatures of compositional skew [70]. Early
replicating regions are then marked by an abrupt jump in
the local skew. Because adjacent early replicating regions
are separated by approximately 1 Mb and because the
average distance between origins is approximately
100 kb, there must be multiple initiations between each
early region. To explain the observation that the compositional skew varies linearly between compositional discontinuities associated with origins, it was postulated that a
wave of correlated initiations occurs, which leads to a
domino [16,17,71] or next-in-line model [15]. It is not
clear whether a looping mechanism [19] can explain such
effects, whether some more complicated form of coupling
between initiation and fork progression is required, or
whether the difference in chromatin structure between
early- and late-replicating regions can account for these
observations. This last possibility would avoid the need to
invoke coordinated origin firing. In support of this idea, a
recent single-molecule replication kinetics analysis of the
mouse Igh locus is consistent with a stochastic model that
lacks any origin coordination [11] (Paolo Norio, personal
communication).
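Strand compositional asymmetry is commonly quantified as a skew computed in windows along the sequence. The sketch below assumes the usual definitions, GC skew = (G − C)/(G + C) and TA skew = (T − A)/(T + A), with non-overlapping windows of a hypothetical size. It flags sign changes of the skew as candidate jumps, because the leading strand switches direction at an origin and the skew of a given strand is then expected to flip. This is a toy illustration with hypothetical function names, not the detection method of [70].

```python
from typing import List, Tuple


def window_skews(seq: str, window: int) -> List[Tuple[int, float, float]]:
    """(start, GC skew, TA skew) for consecutive non-overlapping windows.

    GC skew = (G - C) / (G + C); TA skew = (T - A) / (T + A).
    A trailing partial window is ignored.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, window):
        w = seq[start:start + window]
        g, c, t, a = (w.count(base) for base in "GCTA")
        gc_skew = (g - c) / (g + c) if g + c else 0.0
        ta_skew = (t - a) / (t + a) if t + a else 0.0
        result.append((start, gc_skew, ta_skew))
    return result


def sign_changes(values: List[float]) -> List[int]:
    """Indices i at which values[i - 1] and values[i] have opposite signs."""
    return [i for i in range(1, len(values)) if values[i - 1] * values[i] < 0]


if __name__ == "__main__":
    # Synthetic sequence: C/A excess on the left, G/T excess on the right.
    toy = "CCAAG" * 200 + "GGTTC" * 200
    skews = window_skews(toy, 100)
    jumps = sign_changes([gc for _, gc, _ in skews])
    print("candidate skew jump at window start:", [skews[i][0] for i in jumps])
```

On real sequences the skew is noisy and the jumps sit on slowly varying trends. A bare sign test like this one would therefore need smoothing or a more robust change-point criterion.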
In addition to temporal ordering of origin initiation,
some models include spatial correlations in the positioning
of origins. Recently, it was proposed that the clustering of
initiated origins observed in Xenopus embryos and, to a
lesser extent, in yeast, may speed up the overall
completion of S phase [72]. A shorter S phase is particularly helpful in Xenopus embryos, as it prevents the mitotic
catastrophe discussed above. Clustering several inefficient
origins together can lead to a group that is collectively
efficient in that at least one of the origins is likely to
fire early. Although the periodic distribution of such groups
of origins would be an efficient way to replicate the genome,
mechanisms that could achieve this global order are not
clear at present.
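A simple calculation shows why clustering helps. The sketch assumes that origins fire independently with constant rates k_i; this is a hypothetical simplification, not the model of [72]. Under that assumption, the waiting time until the first origin in a cluster fires is exponential with rate Σk_i. A cluster of n origins with rate k therefore fires, on average, n times sooner than one of them alone. The sketch computes this analytically and checks it by Monte Carlo. All rates, function names and the early-S-phase cutoff are illustrative.

```python
import math
import random
from typing import Sequence


def prob_cluster_fired_by(t: float, rates: Sequence[float]) -> float:
    """P(at least one origin in the cluster has fired by time t).

    Assumes independent origins with constant hazards, so the first
    firing in the cluster has a hazard equal to the sum of the rates.
    """
    return 1.0 - math.exp(-t * sum(rates))


def mean_first_firing(rates: Sequence[float]) -> float:
    """Mean waiting time until the first origin in the cluster fires."""
    return 1.0 / sum(rates)


def simulated_mean_first_firing(rates: Sequence[float], n_trials: int = 100_000,
                                seed: int = 1) -> float:
    """Monte Carlo check: average of the earliest of independent exponential firing times."""
    rng = random.Random(seed)
    total = sum(min(rng.expovariate(k) for k in rates) for _ in range(n_trials))
    return total / n_trials


if __name__ == "__main__":
    single = [0.05]          # hypothetical inefficient origin (rate per min)
    cluster = [0.05] * 4     # hypothetical cluster of four such origins
    t_early = 5.0            # hypothetical "early S phase" cutoff (min)
    for name, rates in (("single", single), ("cluster", cluster)):
        print(name,
              round(prob_cluster_fired_by(t_early, rates), 2),
              round(mean_first_firing(rates), 1),
              round(simulated_mean_first_firing(rates), 1))
```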

Concluding remarks
The hypothesis that replication is largely controlled by the
local rate of initiation has received wide support from
recent experiments and analyses. Models based on local
replication rates I(x,t) have successfully described the
replication process in budding and fission yeast, in Xenopus embryos and in the Igh locus of mouse pro-B cells
[6,8,11,28,30,38] (Paolo Norio, personal communication). A
limiting factor in this work is that each of the above
analyses involved a long-term collaboration between experimental biologists and modeling laboratories (the latter
from a variety of fields, including physics, engineering and
computer science). To broaden the use of quantitative
analyses of replication and to analyze the growing number
of data sets, it is important that the software and analysis
procedures be usable by non-specialists. The recent derivation of inversion formulas (A. Baker, PhD thesis, ENS
de Lyon, France, 2011) that give I(x,t) directly from data on
the local average replication fraction f(x,t), obtainable from
microarray or deep sequencing studies on synchronized
cell populations, is a first step in that direction.
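The special case of a spatially uniform initiation rate and constant fork speed v shows the idea. In the one-dimensional KJMA model, write g(t) = −ln[1 − f(t)], with g(t) = 2v ∫₀ᵗ I(t′)(t − t′) dt′. Differentiating twice gives g″(t) = 2v I(t), so I(t) = g″(t)/(2v).

The sketch below applies a finite-difference version of this relation to synthetic, noise-free data. It only illustrates the idea; it is not the inhomogeneous I(x,t) formula of the thesis cited above. The function name and parameters are hypothetical, and real data would need smoothing before being differentiated twice.

```python
import math
from typing import List, Sequence


def invert_uniform(times: Sequence[float], f: Sequence[float], v: float) -> List[float]:
    """Estimate I(t) at the interior sample times from the replicated fraction f(t).

    Spatially uniform special case only: g(t) = -ln(1 - f(t)) satisfies
    g''(t) = 2 v I(t), so I is recovered from a central second difference.
    Assumes a uniform time grid, 0 <= f < 1, and noise-free data.
    """
    if len(times) != len(f) or len(times) < 3:
        raise ValueError("need at least three matched samples")
    dt = times[1] - times[0]
    g = [-math.log(1.0 - fi) for fi in f]
    return [(g[i + 1] - 2.0 * g[i] + g[i - 1]) / (dt * dt) / (2.0 * v)
            for i in range(1, len(g) - 1)]


if __name__ == "__main__":
    v, slope = 1.5, 1e-4   # hypothetical fork speed and I(t) = slope * t
    times = [0.5 * k for k in range(81)]
    # Forward model for this I(t): f(t) = 1 - exp(-v * slope * t**3 / 3).
    f = [1.0 - math.exp(-v * slope * t ** 3 / 3.0) for t in times]
    estimate = invert_uniform(times, f, v)
    for t, est in list(zip(times[1:-1], estimate))[::20]:
        print(f"t={t:5.1f}  true I={slope * t:.2e}  recovered I={est:.2e}")
```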
A second research direction is a more precise understanding of the relation between the replication program,
as described above, and the effects of DNA damage, with its
concomitant activation of DNA repair mechanisms. For
example, one consequence of damage that stalls replication
forks is the activation of additional origins, which now have
more time to initiate [73,74], an effect that is straightforward to simulate [75] and model analytically [76]. The
modeling of fork stalls predicts that there is a critical
density of stalled forks (approximately one per replicon),
above which there is a global delay in S phase and below
which the effects are minor and localized. Interestingly,
this threshold density matches the observed stall densities
in fragile zones and in cells with activated oncogenes [76].
However, DNA damage can also induce checkpoints that
inhibit subsequent origin firing [77], complicating the
overall effect of DNA damage on replication timing. A
related topic is the interrelation between mutation rates
and events in S phase. Although formal models to handle
such situations are beginning to be developed [69], more
work is needed to understand observations, such as the
link between mutation rate and S phase timing [78].
Although the independent origin hypothesis is attractive in its simplicity and so far remarkably successful in its
application, there is evidence for correlated initiations in
somatic metazoan cells. Some of the correlation can be explained as a straightforward consequence of the physical constraints imposed by clustered polymerases. In such a view, the
primary method of controlling timing in S phase remains
the local modulation of overall initiation rates, and the
correlations in the initiation of neighboring origins are
produced by the geometrical effects of loops induced by
replication factories. Whether such mechanisms suffice or
whether a more complicated control mechanism is at play
is at present unclear. Time will tell.

Acknowledgments

JB has been supported by grants from NSERC (Canada) and the Human
Frontiers Science Program. NR has been supported by NIH grant
GM098815 and an American Cancer Society Research Scholar Grant.

References

1 Baker, T.A. and Wickner, S.H. (1992) Genetics and enzymology of DNA replication in Escherichia coli. Annu. Rev. Genet. 26, 447–477
2 Masai, H. et al. (2010) Eukaryotic chromosome DNA replication: where, when, and how? Annu. Rev. Biochem. 79, 89–130
3 Remus, D. et al. (2009) Concerted loading of Mcm2-7 double hexamers around DNA during DNA replication origin licensing. Cell 139, 719–730
4 Evrin, C. et al. (2009) A double-hexameric MCM2-7 complex is loaded onto origin DNA during licensing of eukaryotic DNA replication. Proc. Natl. Acad. Sci. U.S.A. 106, 20240–20245
5 Labib, K. (2010) How do Cdc7 and cyclin-dependent kinases trigger the initiation of chromosome replication in eukaryotic cells? Genes Dev. 24, 1208–1219
6 Herrick, J. et al. (2002) Kinetic model of DNA replication in eukaryotic organisms. J. Mol. Biol. 320, 741–750
7 Jun, S. and Bechhoefer, J. (2005) Nucleation and growth in one dimension. II. Application to DNA replication kinetics. Phys. Rev. E 71, 011909
8 Yang, S.C. et al. (2010) Modeling genome-wide replication kinetics reveals a mechanism for regulation of replication timing. Mol. Syst. Biol. 6, 404
9 Hamlin, J.L. et al. (2008) A revisionist replicon model for higher eukaryotic genomes. J. Cell. Biochem. 105, 321–329
10 Norio, P. et al. (2005) Progressive activation of DNA replication initiation in large domains of the immunoglobulin heavy chain locus during B cell development. Mol. Cell 20, 575–587
11 Gauthier, M.G. et al. (2012) Modeling inhomogeneous DNA replication kinetics. PLoS ONE 7, e32053
12 Huberman, J.A. and Riggs, A.D. (1968) On the mechanism of DNA replication in mammalian chromosomes. J. Mol. Biol. 32, 327–341
13 Blow, J.J. et al. (2001) Replication origins in Xenopus egg extract are 5–15 kilobases apart and are activated in clusters that fire at different times. J. Cell Biol. 152, 15–25
14 Pasero, P. et al. (2002) Single-molecule analysis reveals clustering and epigenetic regulation of replication origins at the yeast rDNA locus. Genes Dev. 16, 2479–2484
15 Shaw, A. et al. (2010) S-phase progression in mammalian cells: modelling the influence of nuclear organization. Chromosome Res. 18, 163–178
16 Audit, B. et al. (2009) Open chromatin encoded in DNA sequence is the signature of master replication origins in human cells. Nucleic Acids Res. 37, 6064–6075
17 Guilbaud, G. et al. (2011) Evidence for sequential and increasing activation of replication origins along replication timing gradients in the human genome. PLoS Comput. Biol. 7, e1002322
18 Rhind, N. et al. (2010) Reconciling stochastic origin firing with defined replication timing. Chromosome Res. 18, 35–43
19 Jun, S. et al. (2004) Persistence length of chromatin determines origin spacing in Xenopus early-embryo DNA replication: quantitative comparisons between theory and experiment. Cell Cycle 3, 223–229
20 McCune, H.J. et al. (2008) The temporal program of chromosome replication: genomewide replication in clb5Δ Saccharomyces cerevisiae. Genetics 180, 1833–1847
21 Patel, P.K. et al. (2006) DNA replication origins fire stochastically in fission yeast. Mol. Biol. Cell 17, 308–316
22 Czajkowsky, D.M. et al. (2008) DNA combing reveals intrinsic temporal disorder in the replication of yeast chromosome VI. J. Mol. Biol. 375, 12–19
23 Patel, P.K. et al. (2008) The Hsk1(Cdc7) replication kinase regulates origin efficiency. Mol. Biol. Cell 19, 5550–5558
24 Mantiero, D. et al. (2011) Limiting replication initiation factors execute the temporal programme of origin firing in budding yeast. EMBO J. 30, 4805–4814
25 Wu, P.Y. and Nurse, P. (2009) Establishing the program of origin firing during S phase in fission yeast. Cell 136, 852–864
26 Tanaka, S. et al. (2011) Origin association of sld3, sld7, and cdc45 proteins is a key step for determination of origin-firing timing. Curr. Biol. 21, 2055–2063
27 Spiesser, T.W. et al. (2009) A model for the spatiotemporal organization of DNA replication in Saccharomyces cerevisiae. Mol. Genet. Genomics 282, 25–35
28 de Moura, A.P. et al. (2010) Mathematical modelling of whole chromosome replication. Nucleic Acids Res. 38, 5623–5633
29 Luo, H. et al. (2010) Genome-wide estimation of firing efficiencies of origins of DNA replication from time-course copy number variation data. BMC Bioinform. 11, 247
30 Lygeros, J. et al. (2008) Stochastic hybrid modeling of DNA replication across a complete genome. Proc. Natl. Acad. Sci. U.S.A. 105, 12295–12300
31 Koutroumpas, K. and Lygeros, J. (2011) Modeling and analysis of DNA replication. Automatica 47, 1156–1164
32 Raghuraman, M.K. and Brewer, B.J. (2010) Molecular analysis of the replication program in unicellular model organisms. Chromosome Res. 18, 19–34
33 Hayano, M. et al. (2011) Mrc1 marks early-firing origins and coordinates timing and efficiency of initiation in fission yeast. Mol. Cell. Biol. 31, 2380–2391
34 Knott, S.R. et al. (2012) Forkhead transcription factors establish origin timing and long-range clustering in S. cerevisiae. Cell 148, 99–111
35 Herrick, J. et al. (2000) Replication fork density increases during DNA synthesis in X. laevis egg extracts. J. Mol. Biol. 300, 1133–1142
36 Lucas, I. et al. (2000) Mechanisms ensuring rapid and complete DNA replication despite random initiation in Xenopus early embryos. J. Mol. Biol. 296, 769–786
37 Labit, H. et al. (2008) DNA replication timing is deterministic at the level of chromosomal domains but stochastic at the level of replicons in Xenopus egg extracts. Nucleic Acids Res. 36, 5623–5634
38 Goldar, A. et al. (2008) A dynamic stochastic model for DNA replication initiation in early embryos. PLoS ONE 3, e2919
39 Gauthier, M.G. and Bechhoefer, J. (2009) Control of DNA replication by anomalous reaction–diffusion kinetics. Phys. Rev. Lett. 102, 158104
40 Harland, R.M. and Laskey, R.A. (1980) Regulated replication of DNA microinjected into eggs of Xenopus laevis. Cell 21, 761–771
41 Hyrien, O. and Mechali, M. (1993) Chromosomal replication initiates and terminates at random sequences but at regular intervals in the ribosomal DNA of Xenopus early embryos. EMBO J. 12, 4511–4520
42 Yang, S.C. and Bechhoefer, J. (2008) How Xenopus laevis embryos replicate reliably: investigating the random-completion problem. Phys. Rev. E 78, 041917
43 Graham, C.F. (1966) The regulation of DNA synthesis and mitosis in multinucleate frog eggs. J. Cell Sci. 1, 363–374
44 Goldar, A. et al. (2009) Universal temporal profile of replication origin activation in eukaryotes. PLoS ONE 4, e5899
45 Blumenthal, A.B. et al. (1974) The units of DNA replication in Drosophila melanogaster chromosomes. Cold Spring Harb. Symp. Quant. Biol. 38, 205–223
46 Lima-de-Faria, A. and Jaworska, H. (1968) Late DNA synthesis in heterochromatin. Nature 217, 138–142
47 Gilbert, N. et al. (2004) Chromatin architecture of the human genome: gene-rich domains are enriched in open chromatin fibers. Cell 118, 555–566
48 Schwaiger, M. and Schubeler, D. (2006) A question of timing: emerging links between transcription and replication. Curr. Opin. Genet. Dev. 16, 177–183
49 MacAlpine, D.M. et al. (2004) Coordination of replication and transcription along a Drosophila chromosome. Genes Dev. 18, 3094–3105
50 Hiratani, I. et al. (2009) Replication timing and transcriptional control: beyond cause and effect: part II. Curr. Opin. Genet. Dev. 19, 142–149
51 Lieberman-Aiden, E. et al. (2009) Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293
52 Ryba, T. et al. (2010) Evolutionarily conserved replication timing profiles predict long-range chromatin interactions and distinguish closely related cell types. Genome Res. 20, 761–770
53 Hayashi, M.T. and Masukata, H. (2011) Regulation of DNA replication by chromatin structures: accessibility and recruitment. Chromosoma 120, 39–46
54 Lebofsky, R. et al. (2006) DNA replication origin interference increases the spacing between initiation events in human cells. Mol. Biol. Cell 17, 5337–5345
55 Cayrou, C. et al. (2011) Genome-scale analysis of metazoan replication origins reveals their organization in specific but flexible sites defined by conserved features. Genome Res. 21, 1438–1449
56 Wong, P.G. et al. (2011) Cdc45 limits replicon usage from a low density of preRCs in mammalian cells. PLoS ONE 6, e17533
57 Krasinska, L. et al. (2008) Cdk1 and Cdk2 activity levels determine the efficiency of replication origin firing in Xenopus. EMBO J. 27, 758–769
58 Katsuno, Y. et al. (2009) Cyclin A-Cdk1 regulates the origin firing program in mammalian cells. Proc. Natl. Acad. Sci. U.S.A. 106, 3184–3189
59 Thomson, A.M. et al. (2010) Replication factory activation can be decoupled from the replication timing program by modulating Cdk levels. J. Cell Biol. 188, 209–221
60 Edwards, M.C. et al. (2002) MCM2-7 complexes bind chromatin in a distributed pattern surrounding the origin recognition complex in Xenopus egg extracts. J. Biol. Chem. 277, 33049–33057
61 Dijkwel, P.A. et al. (2002) Initiation sites are distributed at frequent intervals in the Chinese hamster dihydrofolate reductase origin of replication but are used with very different efficiencies. Mol. Cell. Biol. 22, 3053–3065
62 Harvey, K.J. and Newport, J. (2003) CpG methylation of DNA restricts prereplication complex assembly in Xenopus egg extracts. Mol. Cell. Biol. 23, 6769–6779
63 Koren, A. et al. (2010) MRC1-dependent scaling of the budding yeast DNA replication timing program. Genome Res. 20, 781–790
64 Alvino, G.M. et al. (2007) Replication in hydroxyurea: it's a matter of time. Mol. Cell. Biol. 27, 6396–6406
65 Ma, E. et al. (2012) Do replication forks control late origin firing in Saccharomyces cerevisiae? Nucleic Acids Res. 40, 2010–2019
66 Rhind, N. (2006) DNA replication timing: random thoughts about origin firing. Nat. Cell Biol. 8, 1313–1316
67 Hozak, P. and Cook, P.R. (1994) Replication factories. Trends Cell Biol. 4, 48–52
68 Baddeley, D. et al. (2010) Measurement of replication structures at the nanometer scale using super-resolution light microscopy. Nucleic Acids Res. 38, e8
69 Chen, C.L. et al. (2011) Replication-associated mutational asymmetry in the human genome. Mol. Biol. Evol. 28, 2327–2337
70 Touchon, M. et al. (2005) Replication-associated strand asymmetries in mammalian genomes: toward detection of replication origins. Proc. Natl. Acad. Sci. U.S.A. 102, 9836–9841
71 Chagin, V.O. et al. (2010) Organization of DNA replication. Cold Spring Harb. Perspect. Biol. 2, a000737
72 Karschau, J. et al. (2012) Optimal placement of origins for DNA replication. Phys. Rev. Lett. 108, 058101
73 Ge, X.Q. et al. (2007) Dormant origins licensed by excess Mcm2-7 are required for human cells to survive replicative stress. Genes Dev. 21, 3331–3341
74 Blow, J.J. et al. (2011) How dormant origins promote complete genome replication. Trends Biochem. Sci. 36, 405–414
75 Blow, J.J. and Ge, X.Q. (2009) A model for DNA replication showing how dormant origins safeguard against replication fork failure. EMBO Rep. 10, 406–412
76 Gauthier, M.G. et al. (2010) Defects and DNA replication. Phys. Rev. Lett. 104, 218104
77 Sancar, A. et al. (2004) Molecular mechanisms of mammalian DNA repair and the DNA damage checkpoints. Annu. Rev. Biochem. 73, 39–85
78 Herrick, J. (2011) Genetic variation and DNA replication timing, or why is there late replicating DNA? Evolution 65, 3031–3047
79 Cairns, J. (1963) The bacterial chromosome and its manner of replication as seen by autoradiography. J. Mol. Biol. 6, 208–213
80 Gratzner, H.G. (1982) Monoclonal antibody to 5-bromo- and 5-iododeoxyuridine: a new reagent for detection of DNA replication. Science 218, 474–475
81 Jackson, D.A. and Pombo, A. (1998) Replicon clusters are stable units of chromosome structure: evidence that nuclear organization contributes to the efficient activation and propagation of S phase in human cells. J. Cell Biol. 140, 1285–1295
82 Bensimon, A. et al. (1994) Alignment and sensitive detection of DNA by a moving interface. Science 265, 2096–2098
83 Michalet, X. et al. (1997) Dynamic molecular combing: stretching the whole human genome for high-resolution studies. Science 277, 1518–1523
84 Norio, P. and Schildkraut, C.L. (2001) Visualization of DNA replication on individual Epstein–Barr virus episomes. Science 294, 2361–2364
85 Kitamura, E. et al. (2006) Live-cell imaging reveals replication of individual replicons in eukaryotic replication factories. Cell 125, 1297–1308
86 van Oijen, A.M. and Loparo, J.J. (2010) Single-molecule studies of the replisome. Annu. Rev. Biophys. 39, 429–448
87 Riehn, R. et al. (2005) Restriction mapping in nanofluidic devices. Proc. Natl. Acad. Sci. U.S.A. 102, 10012–10016
88 Sidorova, J.M. et al. (2009) Microfluidic-assisted analysis of replicating DNA molecules. Nat. Protoc. 4, 849–861
89 Raghuraman, M.K. et al. (2001) Replication dynamics of the yeast genome. Science 294, 115–121
90 Woodfine, K. et al. (2004) Replication timing of the human genome. Hum. Mol. Genet. 13, 191–202
91 Desprat, R. et al. (2009) Predictable dynamic program of timing of DNA replication in human cells. Genome Res. 19, 2288–2299
92 Chen, C.L. et al. (2010) Impact of replication timing on non-CpG and CpG substitution rates in mammalian genomes. Genome Res. 20, 447–457
93 Yabuki, N. et al. (2002) Mapping of early firing origins on a replication profile of budding yeast. Genes Cells 7, 781–789
94 Retkute, R. et al. (2011) Dynamics of DNA replication in yeast. Phys. Rev. Lett. 107, 068103
95 Jun, S. et al. (2005) Nucleation and growth in one dimension. I. The generalized Kolmogorov–Johnson–Mehl–Avrami model. Phys. Rev. E 71, 011908
96 Yang, S.C. et al. (2009) Computational methods to study kinetics of DNA replication. Methods Mol. Biol. 521, 555–573
97 Brummer, A. et al. (2010) Mathematical modelling of DNA replication reveals a trade-off between coherence of origin activation and robustness against rereplication. PLoS Comput. Biol. 6, e1000783

Review

Human limb abnormalities caused by disruption of hedgehog signaling
Eve Anderson, Silvia Peluso, Laura A. Lettice and Robert E. Hill
MRC Human Genetics Unit at the MRC Institute of Genetics and Molecular Medicine, University of Edinburgh,
Edinburgh, EH4 2XU, UK

Human hands and feet contain bones of a particular size
and shape arranged in a precise pattern. The secreted
factor sonic hedgehog (SHH) acts through the conserved hedgehog (Hh) signaling pathway to regulate
the digital pattern in the limbs of tetrapods (i.e. land-based vertebrates). Genetic analysis is now uncovering
a remarkable set of pathogenetic mutations that alter
the Hh pathway, thus compromising both digit number
and identity. Several of these are regulatory mutations
that have the surprising attribute of misdirecting expression of Hh ligands to ectopic sites in the developing
limb buds. In addition, other mutations affect a fundamental structural property of the embryonic cell that is
essential to Hh signaling. In this review, we focus on the
role that the Hh pathway plays in limb development, and
how the many human genetic defects in this pathway
are providing clues to the mechanisms that regulate
limb development.
Human limb abnormalities that affect digit number
Structural abnormalities of the hands and feet are frequent birth defects, several of which have known genetic
causes. These defects may affect just the limbs or may be
part of a complex syndrome affecting several organs. Mammalian limb-bud development is based on a highly conserved pentadactyl pattern for the digits in the hands and
feet [1], and deviation from five digits can be informative
for clinicians and developmental biologists. Too many
digits, or polydactyly (Glossary), is the most frequently
observed congenital hand malformation, with a prevalence
of approximately two per 1000 live births [2]. Depending on
the anatomical location of the extra digits, polydactyly is
classified as preaxial (on the side of the thumb and big toe)
or postaxial (the opposite side). The genetic contribution to
polydactyly was recently surveyed [3] and a remarkable
number of individual clinical classifications (80) that include polydactyly have been assigned to 99 different genes.
Corresponding author: Hill, R.E. (bob.hill@igmm.ed.ac.uk).
Keywords: limb development; sonic hedgehog; limb abnormalities; polydactyly; cilia.

Mechanism that polarizes the limb
During development, digit number and identity are regulated by a mechanism that initially polarizes the limb bud and
then specifies digit identity and regulates growth. The
complementary expression of the transcription factors
GLI3, a zinc finger-containing DNA-binding protein, in
the anterior half and HAND2, a member of the basic
helix–loop–helix family of DNA-binding proteins, in the
posterior half of the limb [4] is the first molecular indication that the early limb is polarized (Figure 1). This then
predisposes the posterior margin of the limb bud to express
the Shh gene, which is the crucial step in regulating spatial
variation along the anteroposterior (AP) axis of the early
limb bud. The Shh gene is expressed at the posterior
margin of the limb in a region that was defined in transplantation experiments during the 1960s as the zone of
polarizing activity (ZPA) [5]. These experiments showed
that chick embryonic limb tissue transplanted from the
posterior to the anterior limb-bud margin secreted a factor,
now known to be SHH, that induced the generation of extra
digits.
SHH acts via the Hh signaling pathway, which is remarkably conserved from flies to mammals [6]. Much of
what is known about the pathway initially came from
analysis in Drosophila, which has only a single Hh gene;
in mice, three homologs exist [desert hedgehog (Dhh),
Indian hedgehog (Ihh) and Shh] (Box 1 and Figure 2).

Glossary
Acheiropodia: an autosomal recessive disorder that results in severe truncations of the arms and legs, with absence of the distal extremities.
Acrocapitofemoral dysplasia: a rare recessive condition characterized mainly
by short limbs, dwarfism and cone-shaped epiphyses at the joints, mainly in
the hands and hips.
Apical ectodermal ridge (AER): a specialized ectodermal structure that forms
along the distal edge of the limb bud and acts as a major signaling center
through the FGFs.
Brachydactyly: a condition that affects the length of the digits, making the
fingers and toes appear shorter.
Craniosynostosis Philadelphia type: craniosynostosis is a condition in which
one or more of the bony primordia of the infant skull prematurely ossifies, thus
changing the growth pattern of the skull. Philadelphia type has associated
syndactyly of the hands and feet.
Preaxial and postaxial polydactyly: polydactyly means additional digits, and
pre- and postaxial refer to the side of the hand or foot on which the extra digit
appears. Preaxial is the thumb and big toe side, whereas postaxial is the
opposite side.
Syndactyly: a condition in which two or more digits are fused together.
Syndromic: a syndromic condition is characterized by having several
recognizable clinical features that occur together and are associated for
diagnosis. A nonsyndromic condition has a single clinical feature.
Triphalangeal thumb: whereas each finger has three phalanges (the small
bones of the digits), the thumb only has two. In this condition, the thumb has
an extra phalanx and often has the appearance of a finger.
Zone of polarizing activity (ZPA): an area of mesenchymal cells located along
the posterior margin of the limb bud that produces SHH. SHH patterns the early
limb bud along the AP axis, specifying digit identity and the number of digits
that will form.
ZPA regulatory sequence (ZRS): an approximately 800-bp cis-regulatory
sequence that is necessary and sufficient for the limb specific expression of
the Shh gene.

0168-9525/$ see front matter 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.tig.2012.03.012 Trends in Genetics, August 2012, Vol. 28, No. 8

Review

Trends in Genetics August 2012, Vol. 28, No. 8

[Figure 1 schematic, panels (a) to (c): limb buds labeled with AER (FGFs), GLI3, GLI3R, GLI3A, HAND2, SHH, 5′ HOXD, ETV4/ETV5 and ETS1/GABPα.]

Figure 1. Expression of genes that polarize the limb and regulate sonic hedgehog (Shh) expression in the zone of polarizing activity (ZPA). The earliest limb bud (a) is
polarized by the expression of GLI3 in the anterior (A) and by HAND2 (which downregulates GLI3) in the posterior (P). Expression of the 5′ Hoxd and Shh genes follows
Hand2 expression, and Shh is upregulated by HAND2 in the ZPA. Once SHH is produced (b), it maintains the expression of Hand2 and the 5′ Hoxd genes in a regulatory loop.
The gradient of GLI3A is shown below. (c) Distal production of ETV4/ETV5 and ETS1/GABPα in overlapping patterns. ETV4/ETV5 ensure that ectopic expression does not
occur in the wild-type limb, whereas ETS1/GABPα determine the position of the Shh expression boundary. Abbreviations: AER, apical ectodermal ridge; FGFs, fibroblast
growth factors 4, 8, 9 and 17.

SHH signaling regulates the proteolytic processing of
members of the GLI (after glioma) family of proteins,
one of which, GLI3, is of particular interest early in limb
development (Figure 1). GLI3 is expressed across the limb
bud; however, in the posterior of the limb bud, where SHH
concentrations are high, GLI3 is present in the full-length
activator form, GLI3A; by contrast, in the anterior, where
SHH is low or undetectable, GLI3 is proteolytically processed into the repressor form, GLI3R [7]. The relative
concentration of GLI3A:GLI3R across the developing limb
bud specifies the differences between the fingers. The most
distinctive digit, the thumb, develops from the region of the
limb bud that has the highest concentration of GLI3R and
no detectable SHH activity. In addition, SHH and GLI3
function together to constrain the number of digits produced, thus ensuring pentadactyly [8,9]. In mice, the absence of both SHH and GLI3 (Shh−/−;Gli3−/−) gives rise to
a polydactylous paw with six to 11 unspecified digits. This indicates
that limb buds have an intrinsic capacity to produce digit
primordia and that this process is unregulated in the
absence of both SHH and GLI3.

Box 1. Conservation of the Hh pathway
The Hh gene and much of the Hh signaling pathway are conserved
from flies to mice [6] (Figure 2, main text). In Drosophila, one Hh
gene has been identified, whereas in mice three homologs exist:
Dhh, Ihh and Shh. In signaling cells, Hh is synthesized, cleaved and
lipid modified before being secreted [73]. In responding cells, Hh
binds to the Patched (Ptc) receptor, relieving Ptc inhibition of the
seven-pass transmembrane protein Smoothened (Smo) and activating the downstream pathway [74].
In Drosophila, the transcriptional effector of Hh signaling is called
Cubitus interruptus (Ci) and exists in two forms: a full-length activator
protein and a truncated repressor protein generated by proteolytic
processing [75]. The processing of Ci is blocked by Hh, which also
serves to increase activity of the activator form [76]. In mammals,
members of the GLI protein family are homologs of Ci. There are three
members of the Gli family in vertebrates. Gli1 acts as a transcriptional
activator, whereas Gli2 and Gli3 exist in two forms: a full-length
activator form and a truncated transcriptional repressor [6].
Phosphorylation of Ci or Gli allows binding of the ubiquitin ligase
Slimb (Drosophila) or β-TrCP (mammals) and subsequent polyubiquitination and proteasome-mediated processing to their repressor
forms [77]. Meanwhile, the activity of both CiA and GliA can be
inhibited by Suppressor of fused [SUFU (mouse) or Su(Fu) (flies)] [78].
The proteins Fused (Fu) and Costal 2 (Cos2) play an important role
in Hh signaling in Drosophila [79,80]. By contrast, knockdown of Fu in mouse
cells does not disrupt Gli signaling. The Cos2 homologs in
mammals, Kif7 and Kif27, as well as Cos2 from Drosophila itself,
have been shown to regulate GLI in mammalian cells, suggesting a
conserved regulatory interaction [81].
Several models have been proposed to explain SHH
activity [10]; a recent model suggests that SHH [11,12]
integrates two different activities to regulate early limb-bud development. SHH initially acts as a morphogen to
specify digit identity at the earliest stages of limb development. Subsequently, it exhibits mitogenic activity that
ensures the production of a sufficient number of cells to
generate the normal complement of digits. Together, these
two activities of SHH are responsible for specifying the
identity of each digit and, as the limb bud expands, the
position within the limb bud at which each forms. This is
observed as a progressive formation of the digits, such that
there is a stereotypical order in which the digits appear.
For example, in the mouse, digit 4 appears first in the limb
bud, followed in rapid succession by digits 2, 5
and then 3 (digit 1 appears to be the last to form). If cellular
expansion in the limb bud is reduced by attenuating SHH
activity, the digits are lost in the reverse order, with digit 3
being the first to disappear.
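The stereotypical order described above, and its reversal when SHH activity is attenuated, can be made explicit with a short sketch. The text names digit 3 as the first to disappear, which implies that digit 1 is not part of the reversed sequence. The sketch treats this as an assumption and leaves digit 1 out. The names `FORMATION_ORDER`, `LOSS_ORDER` and `digits_remaining` are illustrative only.

```python
# Mouse digit formation order as described in the text: digit 4 first,
# then 2, 5 and 3. Digit 1 (last to form) is left out here, on the
# assumption that it is not part of the SHH-dependent sequence.
FORMATION_ORDER = [4, 2, 5, 3]

# When SHH-driven expansion is attenuated, digits are lost in reverse order.
LOSS_ORDER = list(reversed(FORMATION_ORDER))


def digits_remaining(n_lost: int) -> list[int]:
    """Return the SHH-dependent digits still formed after n_lost are lost."""
    if not 0 <= n_lost <= len(FORMATION_ORDER):
        raise ValueError("n_lost must be between 0 and 4")
    # Keep the earliest-forming digits; the latest-forming are lost first.
    kept = FORMATION_ORDER[: len(FORMATION_ORDER) - n_lost]
    return sorted(kept)


if __name__ == "__main__":
    print(LOSS_ORDER)  # [3, 5, 2, 4]
    for n in range(len(FORMATION_ORDER) + 1):
        print(n, digits_remaining(n))
```

With one digit lost, the sketch returns [2, 4, 5]. This is consistent with digit 3 being the first to disappear.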
Limb polarity and digit specification
Attempts to understand the genetic basis of preaxial polydactyly led to the identification of the cis-regulatory element responsible for controlling expression of Shh in the
posterior part of the limb [13]. This 750- to 800-bp enhancer
sequence is both necessary and sufficient for regulating the
spatial and temporal activity of Shh, which in turn defines
the ZPA; it was therefore called the ZPA regulatory sequence (ZRS) (Figure 3). The ZRS is highly conserved in all
vertebrates with opposing appendages, including fish. In
addition, the ZRS is located inside an intron of the limb
region 1 homolog (Lmbr1) gene, which has no known role in
limb development, and it operates over a long distance to
activate the Shh promoter (800 kb to 1 Mb away) in mice and
humans.

[Figure 2 schematic: (a) Drosophila, without and with HH signaling; (b) vertebrate, without and with SHH signaling.]

Figure 2. Conservation of the Hh signaling pathway. (a) Schematic representation of key components of the Drosophila HH signaling pathway in the absence (top) or
presence (bottom) of HH. In the absence of ligand, Patched (PTC) inhibits Smoothened (SMO), which is held in intracellular vesicles (yellow ovals). A complex of proteins,
including cubitus interruptus (CI), Costal2 (COS2) and several kinases, is established. Phosphorylation of CI establishes recognition signals for SLIMB, leading to partial
degradation of CI by the proteasome and formation of the repressor form (CI-R). CI-R then translocates to the nucleus, where it represses transcription of HH targets.
Binding of secreted HH to PTC blocks PTC activity and releases SMO from inhibition. SMO moves to the plasma membrane, where phosphorylation allows interaction with
COS2. Subsequent phosphorylation of COS2 by FU leads to release of unphosphorylated, full-length CI, which can translocate to the nucleus, where it promotes
transcriptional activation. (b) Schematic representation of key conserved components of the vertebrate HH signaling pathway. The cilium (which is absent in Drosophila) is
represented by the central axoneme and the centrosome and basal bodies (gray). In the absence of SHH ligand, PTCH1 inhibits SMO, which is held in intracellular vesicles.
GLI3 is kept in the primary cilium in a complex with KIF7 and SUFU. Phosphorylation of GLI3 by kinases allows its recognition by β-TRCP and leads to partial degradation by
the proteasome, resulting in the formation of the repressor molecule. Activation by SHH relieves the inhibition of SMO by PTCH1. SMO becomes phosphorylated by GRK2,
binds to β-ARRESTIN and KIF3A, and is trafficked to the cilium. This relieves the inhibitory effect of SUFU and allows the full-length GLI3 to translocate to the nucleus and
activate target genes. Homologous genes in Drosophila and vertebrates are colored similarly.
It is still an open question how the ZRS directs Shh
expression to the ZPA. It is known that the expression of
Shh depends on the initial establishment of AP polarity,
and targeted mutations of the Hand2 gene show a role for
this gene in the early determinative process that functions
upstream of Shh expression [14]. However, a low basal level
of Shh expression is established in the absence of HAND2
[14]. Therefore, initiation of Shh expression may rely on
additional signals, and one of these may emanate from a
specialized ectoderm that resides at the proximal border of
the apical ectodermal ridge (AER; Figure 1), operating
through the T box-containing transcription factor TBX2
[15]. As the limb bud emerges, the Hoxd gene cluster is
activated and its expression becomes confined to the posterior mesenchyme. Genetic analysis has shown that the 5′ HOXD factors
are essential for activation of Shh expression in the ZPA [16]
and, in agreement with this, the 5′ HOXD proteins (specifically HOXD10 and HOXD13) may bind to the ZRS [17]. In addition,
the regulatory function of HAND2 is mediated by direct
binding near or at the ZRS, possibly as part of an activating
protein complex with HOXD13 (and other 5′ HOXD proteins)
[14]. Because the ZRS is located a long distance from its
target gene, a mechanism is needed to convert transcription factor
binding at the ZRS into expression of Shh. Accordingly, chromosome architecture changes specifically in the expressing
cells within the limb bud, and two events are observed to
occur. First, a chromosomal looping mechanism brings the
ZRS close to the Shh promoter, where they interact. Second,
the Shh locus moves out of its chromosomal territory; further genetic analysis suggests that this is the event that
relates directly to Shh activation [18].
[Figure 3 schematic: (a) ZRS point mutations (labeled positions 105 to 769, including the Werner mesomelic syndrome site), ETS and ETV binding sites, the wild-type Shh locus (Lmbr1, Rnf32, ZRS, Shh; 800 kb) and a ZRS duplication; (b) chromosomal breakpoints on human chromosome 7 (q22.1, q36.3), with the wild-type and inverted Shh loci and a limb enhancer.]

Figure 3. Mutations and chromosomal lesions in the Shh locus responsible for limb abnormalities. (a) Shh gene and the upstream regulatory region (including enhancers
shown as pink boxes). The ZRS cis-activator is shown as a gray box inside the Lmbr1 gene, enlarged above to show the number and position of the point mutations that
cause preaxial polydactyly. The positions [30] of the human (black), mouse (red), cat (green) and chicken (gray)
mutations are shown above the enlarged ZRS. The positions of the ETS (ETS1/GABPα) (green ovals) and ETV (ETV4/ETV5) (blue ovals) binding sites identified by biochemical
and chromatin immunoprecipitation methods [17] are shown below the ZRS. The positions of the Werner mesomelic syndrome mutations [29] are highlighted in blue. Below
the wild-type Shh locus is a representative intrachromosomal duplication that results in triphalangeal thumb-polysyndactyly syndrome (TPTPS). These duplications vary in
size, such that the duplicated ZRSs can reside at various distances from each other. (b) Approximate positions of the breakpoints of the intrachromosomal inversion on
human chromosome 7. Below the chromosome is a representation of the position of the Shh gene before and after the inversion, showing that Shh is now regulated by
another limb enhancer, a process called enhancer adoption [41].

Once Shh expression is initiated, members of the ETS
transcription factor family act to establish the boundary of
expression (Figure 1). Five ETS binding sites have been
identified in the ZRS (Figure 3), at which two ETS factors,
ETS1 and GABPα, were shown to bind [19]. Occupancy by
these factors at multiple sites within the ZRS is required to
set the appropriate boundary of expression in the posterior
mesenchyme. Two other ETS factors, the closely related
ETV4 and ETV5, act redundantly to oppose ETS1/GABPα
activation. Limb buds that are deficient in both ETV4 and
ETV5 ectopically express Shh in a domain of mesenchyme
at the anterior margin of the limb [20,21], indicating that
their normal role is to restrict Shh expression to the
posterior region of the limb. Although ETV4/5 bind to
two sites within the ZRS, binding at one of these sites
is sufficient to regulate Shh negatively in the anterior
domain [19]. In addition to this regulatory role, ETV4
and ETV5 were shown to modulate the activity of two
other transcription factors, TWIST1 and HAND2, which
regulate Shh expression. A fine balance is proposed between
TWIST1, an inhibitor of Shh expression in the anterior of
the limb, and the positive regulator HAND2. An ETV4/5–TWIST1
complex is important in promoting the inhibitory activity of TWIST1
in the ectopic domain, perhaps by inhibiting formation of the TWIST1–HAND2 dimer [22], which acts as
an activator.
Finally, fibroblast growth factor (FGF) signaling is central to Shh expression, both as a positive and as a negative
regulator (Figure 1). FGFs 4, 8, 9 and 17 are expressed
in the AER [23] and mediate limb-bud outgrowth
and maintenance of Shh expression in the ZPA. In addition, FGFs regulate the production of ETV4 and ETV5 and
are thus responsible for preventing ectopic Shh expression at the anterior margin [20,21,23].
Limb deformities due to aberrant Shh expression
Regulatory mutations in the ZRS cause misexpression of
Shh and are associated with limb malformations [24]. The
limb defects that result from a mutant ZRS fall into
different clinical classifications that show a robust genotype–phenotype correlation but comprise an overlapping
spectrum of digit abnormalities. These are preaxial polydactyly type II (PPD2, MIM# 174500), which includes
isolated triphalangeal thumb; triphalangeal thumb-polysyndactyly syndrome (TPTPS); syndactyly type IV
(SD4, MIM# 186200); and Werner mesomelic syndrome
(WMS) [13,25–32] (Figure 4). It has been suggested that
this group of limb defects should be collectively referred to
as ZRS-associated syndromes [29].
PPD2 is characterized by a triphalangeal thumb
(Figure 4), sometimes leading to the appearance of a five-fingered hand, and, in some cases, may be accompanied by
additional digits. Fifteen single-point mutations in the
human ZRS have been identified that are associated with
this limb abnormality (Figure 3). Extra toes have also been
frequently observed in other species, including mice
[13,33,34], cats [35] and chickens [36–38], and these abnormalities are associated with seven more point mutations in the ZRS. A mutation in polydactylous dogs was
found in a conserved domain upstream of the ZRS, called
the pre-ZRS; however, it is not clear how this domain
regulates Shh expression [39]. Much of the understanding
of the molecular mechanism underlying PPD2 comes from
studies in mice. A single nucleotide change in the sequence
of the ZRS is sufficient to generate ectopic production of
Shh, such that it is anomalously expressed at the opposite,
anterior margin of the limb bud [30,35,40]. Ectopic Shh
expression presumably produces an additional ZPA and,
consequently, affects the GLI3R:GLI3A ratio, leading to
respecification of the developing anterior digits. The phenotypic outcome is seen in some cases as the transformation of the thumb into a fifth finger, often accompanied by the
production of additional digits.

[Figure 4 images: (a) hand skeletons labeled normal hand, isolated triphalangeal thumb, preaxial polydactyly type 2, postaxial polydactyly type A and triphalangeal thumb-polysyndactyly; (b) Werner mesomelic syndrome; (c) acheiropodia.]

Figure 4. Representative phenotypes for each of the limb abnormalities caused by misexpression of the Shh gene. (a) Types of digit abnormality of the hands caused by
misexpression of the Shh gene. Bones are represented along the top, and each digit is numbered; the triphalangeal thumb is labeled T, and digits that cannot be accurately
identified are labeled with an asterisk. Below are pictures of hands of patients with the various disorders [26,28,87,88]. Werner mesomelic syndrome [29], shown in (b), is associated
with short-limb dwarfism. The X-ray shows tibial hypoplasia in the right leg (the white arrow indicates the end of the femur). A patient with acheiropodia [44] is shown in
(c), exhibiting the severe limb truncations that characterize this abnormality.
Mechanisms that give rise to anomalous expression of
Shh are being investigated. The two ETS factors that
regulate the Shh expression boundary play a central role
in generating polydactyly in two different families [19]. In
these families, ZRS point mutations were shown to give
rise to new, additional ETS1/GABPα binding sites, leading
to upregulation of ZRS activity in the posterior margin of
the limb bud, setting a wider boundary of expression and
causing ectopic expression at the anterior margin. Because
both Ets1 and Gabpa genes are expressed at the anterior
margin (in mice) and the ZRS is primed for expression in
this ectopic region [18], the additional binding sites are
sufficient to override the inhibition of Shh expression and
cause ectopic expression. Another point mutation that
changes transcription factor binding to the ZRS was
reported for a polydactylous mouse designated DZ [34].
In this case, the point mutation introduced a higher-affinity
binding site in the ZRS recognized by the nuclear factor
hnRNP U, which was postulated to mediate the interaction between the cis-regulator and the 5′ end of the Shh
gene.
WMS (Figure 4) is an autosomal dominant disorder with
preaxial polydactyly of the hands and feet that also shows
the additional, distinctive feature of
dwarfism [13,29]. This condition appears to be at the severe
end of the phenotypic spectrum of ZRS mutations. The
short stature is the result of tibial hypoplasia (i.e. a very
small or absent tibia). The molecular basis for this disorder
is also a point mutation, but at a specific position of the ZRS, nucleotide 404 (either a G>A or G>C change) (Figure 3).
Again, this mutation is likely to alter transcription factor binding,
and this alteration probably causes the phenotype. Mouse transgenic analysis of a ZRS carrying the G>A mutation
suggests that expression in the
ectopic domain occurs at a high level and extends broadly
along the anterior limb-bud margin [35]. This level of
ectopic SHH production may disrupt specification of the
tibia and affect chondrogenesis.
Recently, the genetic basis of a severe form of polysyndactyly (extra digits with fusions of digits, particularly of
the hands) was reported. Haas-type polysyndactyly (syndactyly type IV)
and TPTPS [27–29] (Figure 4) show a
consistent association with intrachromosomal duplications involving the genomic region that contains the
ZRS, leading to a tandem duplication of the ZRS (or
triplication in one patient) (Figure 3). The molecular
mechanism that gives rise to this limb phenotype is not
known; however, it is reasonable to speculate that ectopic
expression of SHH in the anterior margin of the limb bud is
responsible for the polydactyly. The role that SHH expression plays in the syndactyly phenotype in patients with
either Haas-type polysyndactyly or TPTPS is less clear;
however, an isolated case of a patient with a distinct form
of syndactyly was recently reported that may shed some
light on this process. This patient had fusions of all fingers
and toes along the entire length of each digit, which was
shown to involve misregulation of the SHH gene in the
limb but did not involve the ZRS [41]. Chromosomal
analysis revealed that this patient had an intrachromosomal inversion (Figure 3) with one breakpoint upstream
of the SHH gene, such that SHH came under the influence
of a different enhancer located at the other breakpoint,
freeing it from the influence of the ZRS and other regulators. This event was termed enhancer adoption. In mouse
transgenics, this new enhancer was shown to drive expression broadly in the limb, extending to later developmental stages and persisting in the interdigital
mesenchyme. Further transgenic studies showed that,
when this enhancer was placed upstream of the mouse Shh gene,
expression was directed to the interdigital cells at a later
than normal stage in development. Syndactyly is probably
the result of the rescue of the interdigital tissue from cell
death due to this abnormal expression of Shh. This also
suggests that the ZRS duplications that cause TPTPS
similarly affect the temporal expression of SHH in the
interdigital regions of the limb.
Another chromosomal inversion with a breakpoint between Shh and the ZRS was previously reported in mice for
the Dsh (short digits) mutation [42]. Shh is ectopically
activated in the cartilage of early digit primordia of the
Dsh heterozygous embryo, representing another example
of spatial and temporal misregulation due to a chromosomal rearrangement. However, in this case, it was postulated that the misexpression of Shh was the result of the
removal of a repressor, which enabled additional expression
to occur in the early developing digits.
One other potential regulatory mutation of the SHH
gene was uncovered in the genetic analysis of a condition
called acheiropodia (MIM# 200500), a rare, recessive condition in which the hands and feet are lacking. This
phenotype is similar to that seen in mice lacking ZRS
activity [43]. However, the genetic lesion reported for these
families [44] is a 46-kb deletion located approximately
30 kb upstream of the ZRS. These genetic data suggest that a second,
limb-specific regulatory component exists within the deleted DNA, the role of which may be to modify ZRS activity.
Taken together, these examples illustrate several different mechanisms that can alter the regulation of Shh,
with a significant impact on the developing embryo. These
range from simple nucleotide changes in the ZRS that
cause ectopic expression to duplications that, more surprisingly, appear to affect both spatial and temporal expression. These mutational mechanisms appear to
affect only Shh limb expression, as there is no evidence that
expression is misdirected outside the limb bud. Finally,
acheiropodia deletions appear to result in a lack of Shh
limb expression due to removal of an element that modifies
ZRS activity.
Ihh misregulation also causes limb abnormalities
Another Hh family member, Ihh, is expressed in the
cartilage of the developing long bones in the limb. Here,
Ihh is expressed within the growth plate, where it is
responsible for regulation of chondrocyte proliferation
and differentiation [45]. Ihh is not expressed at early
limb-bud stages when Shh is expressed in the posterior
mesenchyme, suggesting that Ihh has a role distinct from
that of Shh. Despite this, Ihh and Shh operate through similar
signaling pathways, including regulation of the conserved
target GLI [46].
In humans, loss-of-function mutations in the IHH gene
result in the autosomal recessive condition acrocapitofemoral dysplasia (MIM# 607778) [47], whereas gain-of-function mutations of IHH result in brachydactyly type
A1 (MIM# 112500) [48,49]. Evidence suggests that the Ihh
gene has a regulatory landscape similar to that of Shh and that
Ihh is also under long-range regulatory control. This has
been highlighted through analysis of three families with
syndactyly type 1 (including some family members with
polydactyly) and craniosynostosis Philadelphia type
(MIM# 601222). This condition was found to map to a
single locus at 2q35. Further analysis revealed that all
three families carried distinct microduplications, but all
shared the same 9-kb region located within the intron of a
gene 40 kb upstream of IHH. This shared region contains a
putative distant regulator of IHH and represents a situation
similar to the duplication of the ZRS in cases of
TPTPS [50].
Disruption of the long-range regulation of Ihh is also
considered to be the cause of the polydactyly phenotype
seen in the Doublefoot (Dbf) mouse mutant. Dbf is an
autosomal dominant mutation that results in extreme
polydactyly of all four limbs, with six to nine digits
on each paw that are triphalangeal and arise preaxially
[51,52]. Ihh is expressed ectopically within the mutant
limb bud across the AP axis, disrupting normal SHH
activity and overriding the Shh expression usually driven by
the ZRS. A 600-kb deletion starting approximately 50 kb
upstream of Ihh underlies the Dbf phenotype. This region
is expected to contain a cis-acting regulatory element,
which could be a repressor of Ihh expression that is removed by the deletion or, alternatively, a cryptic enhancer
that is normally located beyond the deleted region
and is moved into an activating position [53].
Gli3 mutants affect Hh signaling
The zinc finger-containing transcription factor GLI3 is the
ultimate target of Shh signaling in the early limb bud [54].
Heterozygous mutations in the GLI3 gene cause Greig
cephalopolysyndactyly syndrome (GCPS, MIM# 175700)
and Pallister–Hall syndrome (PHS, MIM# 146510), both
of which include polydactyly in the spectrum of disorders
[55,56]. In addition, in rare cases, GLI3 mutations cause
nonsyndromic polydactyly (MIM# 174700). The PHS and
GCPS phenotypes are clinically distinct and, as with the
Shh regulatory mutants, there is a robust genotype–phenotype correlation [57]. The polydactyly in PHS is
central or insertional, whereas GCPS exhibits
pre- or postaxial polydactyly (most commonly preaxial of the
feet and postaxial of the hands) with variable syndactyly
[58]. Truncating mutations in the middle third of the GLI3
gene cause PHS, whereas large deletions or truncating
mutations in the amino- or carboxy-terminal third of the
gene cause GCPS. PHS mutations are predicted to be dominant mutations in which the truncated protein ends near the
proteolytic cleavage site and so constitutively produces a repressor protein with activity similar to that of GLI3R. This would
skew the balance of the activator and repressor forms of
GLI3, resulting in an anteriorizing effect on the limb bud.
GCPS mutations are predicted to be null mutations, and the
phenotype results from haploinsufficiency, suggesting
that absolute amounts of GLI3R and GLI3A are required
for development. Mouse mutations that represent Gli3 loss
of function (Gli3Xt and Gli3pdn) and a mutation (Gli3Δ699)
that causes a PHS-like truncation of the protein near the
proteolytic cleavage site support the notion that GCPS and
PHS are clinically distinct [59,60]. Mouse studies suggest
that GLI3 has an Shh-independent activity in early limb-bud
stages [60], acting to restrict HAND2 expression (Figure 1);
however, it is not clear what role this independent activity
plays in heterozygous human GLI3 mutations.
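The genotype–phenotype rule stated above can be summarized as a crude classifier over the position of a truncating mutation. This is a deliberate simplification and not a clinical tool. It assumes sharp boundaries at one-third and two-thirds of the protein length, whereas the text does not give exact boundaries. The protein length used in the example (900 residues) is a hypothetical round number, not the actual length of GLI3. The names `Phenotype` and `predicted_phenotype` are illustrative.

```python
from enum import Enum


class Phenotype(Enum):
    GCPS = "Greig cephalopolysyndactyly syndrome (predicted null allele)"
    PHS = "Pallister-Hall syndrome (constitutive repressor-like truncation)"


def predicted_phenotype(truncation_pos: int, protein_length: int,
                        large_deletion: bool = False) -> Phenotype:
    """Apply a simplified 'thirds' rule to a GLI3 mutation (illustrative only).

    truncation_pos: residue at which the truncated protein ends (1-based).
    protein_length: full-length protein size; a hypothetical input here.
    large_deletion: True for large deletions, which the text groups with GCPS.
    """
    if large_deletion:
        return Phenotype.GCPS  # loss of one allele -> haploinsufficiency
    if not 1 <= truncation_pos <= protein_length:
        raise ValueError("truncation_pos must lie within the protein")
    third = protein_length / 3
    if third < truncation_pos <= 2 * third:
        # Middle third: in the text's account, the protein ends near the
        # cleavage site and behaves like a constitutive GLI3R.
        return Phenotype.PHS
    # Amino- or carboxy-terminal third: treated as a null allele.
    return Phenotype.GCPS


if __name__ == "__main__":
    for pos in (150, 450, 800):
        print(pos, predicted_phenotype(pos, protein_length=900).name)
    print("deletion", predicted_phenotype(1, 900, large_deletion=True).name)
```

For the hypothetical 900-residue protein, truncations at residues 150 and 800 fall in the terminal thirds and are classed as GCPS-like. A truncation at residue 450 is classed as PHS-like.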
Cilia, the Hh pathway and limb patterning
Transduction of the Hh signal to the GLI proteins is a
multistep process (Box 2 and Figure 2) and, over the past
decade, it has become clear that there is a connection
between the complex steps of the Hh pathway and a unique
structural component of the cell, the cilium (reviewed in [61–63]). Although cilia have several signaling roles, it has
been suggested [64] that, in early development, primary
cilia in vertebrates are dedicated to Hh signal transduction. The phenotypes caused by loss of cilia-associated
proteins are syndromic, and not all patients show limb
abnormalities, which suggests that cilia play an active role
in mediating Hh signaling and do not simply serve as a
compartment in which pathway components are concentrated.
The primary cilium is a small organelle that projects
from the surface of the cell. It comprises a central structure
of microtubules, called the axoneme, along which particles are
transported to maintain and extend the cilium. This intraflagellar transport
(IFT) mechanism transports molecules from the base to the
tip of the cilium. Evidence suggests that components of the
IFT machinery are involved in the Hh signaling pathway.
The GLI proteins GLI2 and GLI3 are localized at the cilium
tip and are trafficked along the axoneme in response to Hh
signaling [65]. Thus, mutations in proteins involved
in the trafficking process often have phenotypes reminiscent of Hh signaling defects (Figure 2).
Several congenital human disorders, called ciliopathies,
result from recessive mutations in genes that have a role in
the cilia or the basal body [66]. Ciliopathies are a heterogeneous group of diseases presenting with a broad spectrum of clinical phenotypes, including pre- and postaxial
polydactyly. For example, Joubert syndrome (MIM#
213300), MeckelGruber syndrome (MIM# 249000), and
BardetBiedl syndrome (BBS, MIM# 209900) are all associated with polydactyly. Joubert syndrome can result from
mutations in at least ten genes and is characterized by a
specific brain malformation with additional pathologies. A

Review
Box 2. Vertebrate Hh signaling in the cilium
The main difference between mammalian and Drosophila Hh
signaling is the central role played by cilia in mammals but not in
flies [6] (Figure 2, main text). Drosophila lacking cilia develop almost
normally, indicating that cilia are not required for Drosophila Hh
signaling [82]. In vertebrates, several steps from recognition of SHH
to the processing of GLI1-3 (here referred to as GLI) in the limb
involve the cilia and IFT [83]. The cilium is maintained and extended
by transport of particles along the axoneme (reviewed in [6062]).
The transport of molecules toward the cilia tip, via IFT, is called
anterograde trafficking (kinesin motor driven) and down the
axoneme toward the base of the cilia is referred to as retrograde
trafficking (dynein driven) [62,63].
Signal transduction takes place in the cilia, where PTCH1 is
located in the absence of the ligand and represses the function of
smoothened (SMO), which resides in the repressed state in
cytoplasmic vesicles [84]. Upon activation by SHH, PTCH1 is
internalized and SMO is phosphorylated by a G protein-coupled
receptor kinase (GRK2). This phosphorylation promotes SMO
binding to b-arrestin and Kif3a, a requirement for the trafficking of
SMO into the cilium, where it activates GLI.
Full-length GLIs are present in the cilia in a complex with the
anterograde IFT kinesin motor KIF7 [68]. SUFU promotes the
truncation of GLI into the repressor form (GLIR) and the retrograde
IFT-dynein motor enables GLIR to reach the nucleus. Activation of
SMO relieves the inhibition that SUFU exerts and promotes the
activator form of GLI (GLIA) [85,86]. This process is promoted by
KIF7, which may also block the function of SUFU. GLIA reaches the
nucleus and activates the transcription of Hh targets genes, which
include PTCH.
In the absence of SHH signaling, the processing of GLIs requires
regulated proteolysis by the large multiprotein proteasome complex. The GLIs are sequentially phosphorylated by kinases, producing a phosphopeptide domain that is recognized by β-TrCP, which
recruits an SCF E3 ubiquitin ligase complex. Ubiquitination targets
GLI3 to the proteasome and initiates a limited degradation process,
allowing GLIR to be transported to the nucleus, where it inhibits
transcription [6].

A recent report highlights a mutation in the KIF7 gene [67], an ortholog of the Drosophila kinesin-encoding gene Costal2 (Cos2), which is involved in Hh signaling (Figure 2).
Reduction of KIF7 leads to a decrease in the number of cells
displaying primary cilia and misregulation of GLI. Alteration in the GLI3R:GLI3A ratio (as seen in GCPS) may be
responsible for the polydactyly. In mice, KIF7 was shown to
be a core regulator of SHH signaling and a putative ciliary
motor protein [68–70]. Interestingly, Cos2 differs in that it
has lost its kinesin motor function; this is in accord with the
observation that Drosophila do not use cilia for developmental signaling.
Mutations that broadly affect cilia structure and function probably disrupt GLI3 processing, leading to polydactyly. BBS is a multisystem disorder that results from
mutations in any one of 16 different genes and limb defects
usually appear as postaxial polydactyly. BBS is primarily a
disease of the basal body [71], a microtubule-based, modified centriole located at the base of the axoneme that serves
as a nucleation site for the growth of the axoneme microtubules. Thus, cilia assembly (a complex process requiring
hundreds of proteins), SHH signaling and GLI3 processing
are tightly intertwined. It seems probable that polydactyly in ciliopathies arises for various reasons. Clearly, some
of the disease-causing mutations block important steps in
the transduction of the SHH signal. However, other defects
may be more general and act to disrupt cilia architecture,

Trends in Genetics August 2012, Vol. 28, No. 8

thus inhibiting the signaling process [72]. Both routes would disrupt GLI3 processing, affecting the GLI3R:GLI3A ratio and creating digit abnormalities, some phenocopying GCPS or PHS.
Concluding remarks
Several mutational mechanisms alter Hh signaling at
different points in the pathway, often affecting the
developing limb bud. Regulatory mutations affecting
Shh expression play a central role in generating preaxial
polydactyly. In some cases, regulatory mutations affecting
expression of the closely related molecule IHH override
normal developmental processes and adversely affect the
developing limb. It appears that these and an increasingly
large number of other mutations ultimately disrupt proteolytic processing of GLI3, the prime target of Hh signaling. These other mutations include those that directly
affect the structure of GLI3 and those that affect the
cilia, complex cellular structures with a major role in Hh signaling. The large number of distinct
clinical manifestations that include the polydactyly phenotype (at least 80 have been described) [3] reflects
the hundreds of genes involved in Hh signaling, especially those that affect ciliogenesis, which together present a considerable target for pathogenic mutations. It is clear
that further genetic analysis of limb patterning will provide insights not only into developmental biology but also into the basic biology of the cell.
References
1 Abbasi, A.A. (2011) Evolution of vertebrate appendicular structures: insight from genetic and palaeontological data. Dev. Dyn. 240, 1005–1016
2 Sun, G. et al. (2011) Twelve-year prevalence of common neonatal congenital malformations in Zhejiang Province, China. World J. Pediatr. 7, 331–336
3 Biesecker, L.G. (2011) Polydactyly: how many disorders and how many genes? 2010 update. Dev. Dyn. 240, 931–942
4 te Welscher, P. et al. (2002) Mutual genetic antagonism involving GLI3 and dHAND prepatterns the vertebrate limb bud mesenchyme prior to SHH signaling. Genes Dev. 16, 421–426
5 Saunders, J.W. and Gasseling, M.T. (1968) Ectodermal–mesenchymal interactions in the origin of limb symmetry. In Epithelial–Mesenchymal Interactions (Fleischmeyer, R. and Billingham, R.E., eds), pp. 78–97, Williams & Wilkins
6 Wilson, C.W. and Chuang, P.T. (2010) Mechanism and evolution of cytosolic Hedgehog signal transduction. Development 137, 2079–2094
7 Wang, B. et al. (2000) Hedgehog-regulated processing of GLI3 produces an anterior/posterior repressor gradient in the developing vertebrate limb. Cell 100, 423–434
8 Litingtung, Y. et al. (2002) Shh and Gli3 are dispensable for limb skeleton formation but regulate digit number and identity. Nature 418, 979–983
9 te Welscher, P. et al. (2002) Progression of vertebrate limb development through SHH-mediated counteraction of GLI3. Science 298, 827–830
10 Towers, M. and Tickle, C. (2009) Growing models of vertebrate limb development. Development 136, 179–190
11 Zhu, J. et al. (2008) Uncoupling Sonic hedgehog control of pattern and expansion of the developing limb bud. Dev. Cell 14, 624–632
12 Towers, M. et al. (2008) Integration of growth and specification in chick wing digit-patterning. Nature 452, 882–886
13 Lettice, L.A. et al. (2003) A long-range Shh enhancer regulates expression in the developing limb and fin and is associated with preaxial polydactyly. Hum. Mol. Genet. 12, 1725–1735
14 Galli, A. et al. (2010) Distinct roles of Hand2 in initiating polarity and posterior Shh expression during the onset of mouse limb bud development. PLoS Genet. 6, e1000901
15 Nissim, S. et al. (2007) Characterization of a novel ectodermal signaling center regulating Tbx2 and Shh in the vertebrate limb. Dev. Biol. 304, 9–21
16 Tarchini, B. and Duboule, D. (2006) Control of Hoxd genes' colinearity during early limb development. Dev. Cell 10, 93–103
17 Capellini, T.D. et al. (2006) Pbx1/Pbx2 requirement for distal limb patterning is mediated by the hierarchical control of Hox gene spatial distribution and Shh expression. Development 133, 2263–2273
18 Amano, T. et al. (2009) Chromosomal dynamics at the Shh locus: limb bud-specific differential regulation of competence and active transcription. Dev. Cell 16, 47–57
19 Lettice, L.A. et al. (2012) Opposing functions of the ETS factor family define Shh spatial expression in limb buds and underlie polydactyly. Dev. Cell 22, 459–467
20 Mao, J. et al. (2009) Fgf-dependent Etv4/5 activity is required for posterior restriction of Sonic Hedgehog and promoting outgrowth of the vertebrate limb. Dev. Cell 16, 600–606
21 Zhang, Z. et al. (2009) FGF-regulated Etv genes are essential for repressing Shh expression in mouse limb buds. Dev. Cell 16, 607–613
22 Zhang, Z. et al. (2010) Preaxial polydactyly: interactions among ETV, TWIST1 and HAND2 control anterior-posterior patterning of the limb. Development 137, 3417–3426
23 Fernandez-Teran, M. and Ros, M.A. (2008) The apical ectodermal ridge: morphological aspects and signaling pathways. Int. J. Dev. Biol. 52, 857–871
24 Hill, R.E. (2007) How to make a zone of polarizing activity: insights into limb development via the abnormality preaxial polydactyly. Dev. Growth Differ. 49, 439–448
25 Albuisson, J. et al. (2011) Identification of two novel mutations in Shh long-range regulator associated with familial pre-axial polydactyly. Clin. Genet. 79, 371–377
26 Gurnett, C.A. et al. (2007) Two novel point mutations in the long-range SHH enhancer in three families with triphalangeal thumb and preaxial polydactyly. Am. J. Med. Genet. A 143, 27–32
27 Klopocki, E. et al. (2008) A microduplication of the long range SHH limb regulator (ZRS) is associated with triphalangeal thumb–polysyndactyly syndrome. J. Med. Genet. 45, 370–375
28 Sun, M. et al. (2008) Triphalangeal thumb–polysyndactyly syndrome and syndactyly type IV are caused by genomic duplications involving the long range, limb-specific SHH enhancer. J. Med. Genet. 45, 589–595
29 Wieczorek, D. et al. (2010) A specific mutation in the distant sonic hedgehog (SHH) cis-regulator (ZRS) causes Werner mesomelic syndrome (WMS) while complete ZRS duplications underlie Haas type polysyndactyly and preaxial polydactyly (PPD) with or without triphalangeal thumb. Hum. Mutat. 31, 81–89
30 Furniss, D. et al. (2008) A variant in the sonic hedgehog regulatory sequence (ZRS) is associated with triphalangeal thumb and deregulates expression in the developing limb. Hum. Mol. Genet. 17, 2417–2423
31 Farooq, M. et al. (2010) Preaxial polydactyly/triphalangeal thumb is associated with changed transcription factor-binding affinity in a family with a novel point mutation in the long-range cis-regulatory element ZRS. Eur. J. Hum. Genet. 18, 733–736
32 Semerci, C.N. et al. (2009) Homozygous feature of isolated triphalangeal thumb-preaxial polydactyly linked to 7q36: no phenotypic difference between homozygotes and heterozygotes. Clin. Genet. 76, 85–90
33 Masuya, H. et al. (2007) A series of ENU-induced single-base substitutions in a long-range cis-element altering Sonic hedgehog expression in the developing mouse limb bud. Genomics 89, 207–214
34 Zhao, J. et al. (2009) HnRNP U mediates the long-range regulation of Shh expression during limb development. Hum. Mol. Genet. 18, 3090–3097
35 Lettice, L.A. et al. (2008) Point mutations in a distant sonic hedgehog cis-regulator generate a variable regulatory output responsible for preaxial polydactyly. Hum. Mol. Genet. 17, 978–985
36 Dunn, I.C. et al. (2011) The chicken polydactyly (Po) locus causes allelic imbalance and ectopic expression of Shh during limb development. Dev. Dyn. 240, 1163–1172
37 Maas, S.A. et al. (2011) Identification of spontaneous mutations within the long-range limb-specific Sonic hedgehog enhancer (ZRS) that alter Sonic hedgehog expression in the chicken limb mutants oligozeugodactyly and Silkie breed. Dev. Dyn. 240, 1212–1222
38 Dorshorst, B. et al. (2010) Genomic regions associated with dermal hyperpigmentation, polydactyly and other morphological traits in the Silkie chicken. J. Hered. 101, 339–350
39 Park, K. et al. (2008) Canine polydactyl mutations with heterogeneous origin in the conserved intronic sequence of LMBR1. Genetics 179, 2163–2172
40 Maas, S.A. and Fallon, J.F. (2005) Single base pair change in the long-range Sonic hedgehog limb-specific enhancer is a genetic basis for preaxial polydactyly. Dev. Dyn. 232, 345–348
41 Lettice, L.A. et al. (2011) Enhancer-adoption as a mechanism of human developmental disease. Hum. Mutat. 32, 1492–1499
42 Niedermaier, M. et al. (2005) An inversion involving the mouse Shh locus results in brachydactyly through dysregulation of Shh expression. J. Clin. Invest. 115, 900–909
43 Sagai, T. et al. (2005) Elimination of a long-range cis-regulatory module causes complete loss of limb-specific Shh expression and truncation of the mouse limb. Development 132, 797–803
44 Ianakiev, P. et al. (2001) Acheiropodia is caused by a genomic deletion in C7orf2, the human orthologue of the Lmbr1 gene. Am. J. Hum. Genet. 68, 38–45
45 Kronenberg, H.M. (2003) Developmental regulation of the growth plate. Nature 423, 332–336
46 Koziel, L. et al. (2005) GLI3 acts as a repressor downstream of Ihh in regulating two distinct steps of chondrocyte differentiation. Development 132, 5249–5260
47 Hellemans, J. et al. (2003) Homozygous mutations in IHH cause acrocapitofemoral dysplasia, an autosomal recessive disorder with cone-shaped epiphyses in hands and hips. Am. J. Hum. Genet. 72, 1040–1046
48 Guo, S. et al. (2010) Missense mutations in IHH impair Indian Hedgehog signaling in C3H10T1/2 cells: implications for brachydactyly type A1, and new targets for Hedgehog signaling. Cell. Mol. Biol. Lett. 15, 153–176
49 Gao, B. et al. (2001) Mutations in IHH, encoding Indian hedgehog, cause brachydactyly type A-1. Nat. Genet. 28, 386–388
50 Klopocki, E. et al. (2011) Copy-number variations involving the IHH locus are associated with syndactyly and craniosynostosis. Am. J. Hum. Genet. 88, 70–75
51 Yang, Y. et al. (1998) Evidence that preaxial polydactyly in the Doublefoot mutant is due to ectopic Indian Hedgehog signaling. Development 125, 3123–3132
52 Hayes, C. et al. (1998) Sonic hedgehog is not required for polarising activity in the Doublefoot mutant mouse limb bud. Development 125, 351–357
53 Babbs, C. et al. (2008) Polydactyly in the mouse mutant Doublefoot involves altered GLI3 processing and is caused by a large deletion in cis to Indian hedgehog. Mech. Dev. 125, 517–526
54 Hui, C.C. and Angers, S. (2011) GLI proteins in development and disease. Annu. Rev. Cell Dev. Biol. 27, 513–537
55 Shin, S.H. et al. (1999) GLI3 mutations in human disorders mimic Drosophila cubitus interruptus protein functions and localization. Proc. Natl Acad. Sci. U.S.A. 96, 2880–2884
56 Johnston, J.J. et al. (2010) Molecular analysis expands the spectrum of phenotypes associated with GLI3 mutations. Hum. Mutat. 31, 1142–1154
57 Naruse, I. et al. (2010) Birth defects caused by mutations in human GLI3 and mouse Gli3 genes. Congenit. Anom. 50, 1–7
58 Biesecker, L.G. (2008) The Greig cephalopolysyndactyly syndrome. Orphanet J. Rare Dis. 3, 10
59 Hill, P. et al. (2007) The molecular basis of Pallister Hall associated polydactyly. Hum. Mol. Genet. 16, 2089–2096
60 Hill, P. et al. (2009) A SHH-independent regulation of Gli3 is a significant determinant of anteroposterior patterning of the limb bud. Dev. Biol. 328, 506–516
61 Bettencourt-Dias, M. et al. (2011) Centrosomes and cilia in human disease. Trends Genet. 27, 307–315
62 Gerdes, J.M. et al. (2009) The vertebrate primary cilium in development, homeostasis, and disease. Cell 137, 32–45
63 Wong, S.Y. and Reiter, J.F. (2008) The primary cilium at the crossroads of mammalian hedgehog signaling. Curr. Top. Dev. Biol. 85, 225–260
64 Goetz, S.C. and Anderson, K.V. (2010) The primary cilium: a signalling centre during vertebrate development. Nat. Rev. Genet. 11, 331–344
65 Haycraft, C.J. et al. (2005) GLI2 and GLI3 localize to cilia and require the intraflagellar transport protein polaris for processing and function. PLoS Genet. 1, e53
66 Hildebrandt, F. et al. (2011) Ciliopathies. N. Engl. J. Med. 364, 1533–1543
67 Dafinger, C. et al. (2011) Mutations in KIF7 link Joubert syndrome with Sonic Hedgehog signaling and microtubule dynamics. J. Clin. Invest. 121, 2662–2667
68 Liem, K.F., Jr et al. (2009) Mouse Kif7/Costal2 is a cilia-associated protein that regulates Sonic hedgehog signaling. Proc. Natl Acad. Sci. U.S.A. 106, 13377–13382
69 Endoh-Yamagami, S. et al. (2009) The mammalian Cos2 homolog Kif7 plays an essential role in modulating Hh signal transduction during development. Curr. Biol. 19, 1320–1326
70 Cheung, H.O. et al. (2009) The kinesin protein KIF7 is a critical regulator of Gli transcription factors in mammalian hedgehog signaling. Sci. Signal. 2, ra29
71 Zaghloul, N.A. and Katsanis, N. (2009) Mechanistic insights into Bardet–Biedl syndrome, a model ciliopathy. J. Clin. Invest. 119, 428–437
72 Ocbina, P.J. et al. (2011) Complex interactions between genes controlling trafficking in primary cilia. Nat. Genet. 43, 547–553
73 Gallet, A. (2011) Hedgehog morphogen: from secretion to reception. Trends Cell Biol. 21, 238–246
74 Murone, M. et al. (1999) Sonic hedgehog signaling by the patched–smoothened receptor complex. Curr. Biol. 9, 76–84
75 Aza-Blanc, P. et al. (1997) Proteolysis that is inhibited by hedgehog targets Cubitus interruptus protein to the nucleus and converts it to a repressor. Cell 89, 1043–1053
76 Hooper, J.E. and Scott, M.P. (2005) Communicating with Hedgehogs. Nat. Rev. Mol. Cell Biol. 6, 306–317
77 Jiang, J. (2006) Regulation of Hh/Gli signaling by dual ubiquitin pathways. Cell Cycle 5, 2457–2463
78 Ruel, L. and Therond, P.P. (2009) Variations in Hedgehog signaling: divergence and perpetuation in SUFU regulation of Gli. Genes Dev. 23, 1843–1848
79 Hooper, J.E. (2003) Smoothened translates Hedgehog levels into distinct responses. Development 130, 3951–3963
80 Jia, J. et al. (2003) Smoothened transduces Hedgehog signal by physically interacting with Costal2/Fused complex through its C-terminal tail. Genes Dev. 17, 2709–2720
81 Marks, S.A. and Kalderon, D. (2011) Regulation of mammalian Gli proteins by Costal 2 and PKA in Drosophila reveals Hedgehog pathway conservation. Development 138, 2533–2542
82 Basto, R. et al. (2006) Flies without centrioles. Cell 125, 1375–1386
83 Oh, E.C. and Katsanis, N. (2012) Cilia in vertebrate development and disease. Development 139, 443–448
84 Lum, L. and Beachy, P.A. (2004) The Hedgehog response network: sensors, switches, and routers. Science 304, 1755–1759
85 Chen, M.H. et al. (2009) Cilium-independent regulation of Gli protein function by Sufu in Hedgehog signaling is evolutionarily conserved. Genes Dev. 23, 1910–1928
86 Humke, E.W. et al. (2010) The output of Hedgehog signaling is controlled by the dynamic association between Suppressor of Fused and the Gli proteins. Genes Dev. 24, 670–682
87 Heus, H.C. et al. (1999) A physical and transcriptional map of the preaxial polydactyly locus on chromosome 7q36. Genomics 57, 342–351
88 Radhakrishna, U. et al. (1999) The phenotypic spectrum of GLI3 morphopathies includes autosomal dominant preaxial polydactyly type-IV and postaxial polydactyly type-A/B; no phenotype prediction from the position of GLI3 mutations. Am. J. Hum. Genet. 65, 645–655

Review

Regulation of chromatin structure by long noncoding RNAs: focus on natural antisense transcripts
Marco Magistri¹, Mohammad Ali Faghihi¹, Georges St Laurent III² and Claes Wahlestedt¹
¹ Department of Psychiatry and Behavioral Sciences, and Center for Therapeutic Innovation, University of Miami Miller School of Medicine, Miami, FL 33136, USA
² St Laurent Institute, Cambridge, MA 02139, USA

In the decade following the publication of the human genome sequence, noncoding RNAs (ncRNAs) have reshaped our
understanding of the broad landscape of genome regulation. During this period, natural antisense transcripts
(NATs), which are transcribed from the opposite strand
of either protein or non-protein coding genes, have
vaulted to prominence. Recent findings have shown that
NATs can exert their regulatory functions by acting as
epigenetic regulators of gene expression and chromatin
remodeling. Here, we review recent work on the mechanisms of epigenetic modifications by NATs and their
emerging role as master regulators of chromatin states.
Unlike other long ncRNAs, antisense RNAs usually regulate their counterpart sense mRNA in cis by bridging
epigenetic effectors and regulatory complexes at specific
genomic loci. Understanding the broad range of effects
of NATs will shed light on the complex mechanisms that
regulate chromatin remodeling and gene expression in
development and disease.
Chromatin and ncRNAs: coupling structure and dynamic information
Histone octamer proteins and the 146 bp of DNA tightly
wrapped around them form the nucleosome, the structural and
functional core of eukaryotic chromatin. Specific combinations of DNA modifications and histone post-translational modifications
lead to diverse changes in chromatin states and
distinct functional genomic outputs [1,2]. DNA methylation is perhaps the best-characterized chemical modification of DNA that impacts chromatin structure and
function. In mammalian cells, DNA methylation occurs
on cytosine residues in CpG dinucleotides and correlates
with transcriptional repression. Many promoter regions have a
high density of CpG dinucleotides, whose methylation
state strongly influences the transcriptional activity of the gene.
Chromatin structure and function are also regulated by
post-translational modifications of histone proteins.
Histone-modifying enzymes are protein complexes that
dynamically recognize (read), add (write), remove (erase)
or replace various chromatin modifications. Examples
of writers include EZH2, the catalytic subunit of
polycomb repressive complex 2 (PRC2), which is responsible
for the trimethylation of histone H3 at lysine 27
(H3K27me3), and G9a, the histone methyltransferase
(HMT) that catalyzes the di- or trimethylation of histone
H3 at lysine 9 (H3K9me2/3) [2,3]. Erasers, such as the
demethylase LSD1, specifically remove particular histone
marks [4]. Readers function as interpreters and include
effector proteins that recognize specific histone marks and
transduce this information into a genomic response [5–7].
Writers, erasers and readers must act in concert, with
tightly coordinated activities, to produce an integrated
regulatory effect. Recent discoveries of frequent interactions
between ncRNAs and chromatin strongly suggest pivotal
roles for ncRNAs in orchestrating the function of these
protein complexes. How chromatin-modifying enzymes
specifically recognize and bind to their target loci
remains unclear. One tempting hypothesis is that local
transcription of low-abundance ncRNAs might be the key
event in the locus-specific recruitment of different reader,
eraser and writer complexes.

Corresponding author: Wahlestedt, C. (cwahlestedt@med.miami.edu).
Keywords: antisense RNA; epigenetics; transcriptome; chromatin; ncRNAs; NATs.
Dynamic transcriptional regulation at the level of chromatin
The classic division of chromatin into two opposing states,
gene-rich euchromatin versus the silenced, tightly packed
heterochromatin, has been challenged by recent discoveries suggesting the existence of different chromatin states in
various organisms, including humans [8–13]. The two-state
chromatin model assumed that the chromatin structure was
essentially an on/off switch whereby a gene was either active
or repressed, without any intermediate states. By contrast,
a dynamic chromatin state varies between these extremes
and represents an integration of information derived
from an intricate network of histone-modifying enzymes,
chromatin-binding proteins, transcription factors and
chromatin-associated RNA transcripts [14,15].
Globally, RNA, which is an integral structural component of chromatin, is required for the maintenance of
compact chromatin fibers [16]. RNA has also been shown
to be involved in the maintenance of higher-order chromatin structure at pericentric heterochromatin in mouse cells
[17], highlighting the important contribution of RNA to the
regulation of chromatin structure and function. Recently, a

0168-9525/$ – see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.tig.2012.03.013 Trends in Genetics, August 2012, Vol. 28, No. 8

genome-wide next-generation RNA sequencing approach
was used to identify the RNA content of chromatin in
human fibroblasts [18]. Surprisingly, more than 70% of
the sequencing reads aligned with intergenic and intronic
regions of the human genome. Functional experiments on a
small number of chromatin RNA transcripts imply an
interaction with chromatin-modifying enzymes, which
raises the possibility of a functional role of these transcripts in chromatin regulation [18].
Further support for the notion that RNA regulates
chromatin comes from a small but growing number of
antisense transcripts [19,20] and other long ncRNAs
[21–24] that interact with epigenetic effectors to orchestrate chromatin remodeling and epigenetic changes during
development and disease. Cell type-specific ncRNAs interact with ubiquitously expressed regulatory proteins to
form RNA–protein complexes that can interact with histones, DNA, other RNAs, and other chromatin-modifying
complexes, to dynamically coordinate changes in gene
expression programs (reviewed in [25]). RNA motifs composed of primary sequence information coupled to highly
diverse secondary structure elements underlie the complexity and dynamic nature of these interactions. The
combination of structural and regulatory elements of the
chromatin contributes to the acquisition of a specific chromatin state and is key to understanding the mechanisms
governing the organization of the human genome and the
regulation of gene expression.
Natural antisense transcripts (NATs)
A substantial fraction of the mammalian genome is transcribed in the form of non-protein-coding RNAs [26–29]
that have important regulatory functions in development,
differentiation [30–32] and human diseases [19,33–35].
Although there is no unequivocal classification of
non-protein-coding transcripts found in the mammalian
genome, ncRNAs can be roughly divided on the basis of size
into short ncRNAs (less than 200 nt in length) and long
ncRNAs (lncRNAs) [36,37]. Short ncRNAs include miRNAs, piRNAs, endogenous siRNAs and snoRNAs, which
have been extensively reviewed elsewhere [38–40] and
therefore will not be discussed here. lncRNAs are a heterogeneous group of RNAs transcribed from intergenic [41] or
intragenic regions [42], which vary in length from 200 nt to
over 100 kb [37]. NATs are a class of lncRNA molecules [43]
that are transcribed from the opposite DNA strand of other
RNA transcripts with which they share sequence complementarity [26,44–46]. Antisense RNAs could potentially
exert a regulatory function on their corresponding sense
mRNA at different levels [47]. NAT regulatory mechanisms fall into four main categories: mechanisms related
to transcription (including epigenetic interactions), RNA–DNA interactions, RNA–RNA interactions in the nucleus
and RNA–RNA interactions in the cytoplasm [48]. Among
these four mechanisms, RNA-mediated epigenetic modification has received an increasing amount of experimental
support. Antisense transcripts can provide a scaffold for
effector proteins to interact with DNA and chromatin in a
locus-specific manner.
NATs: cis-acting epigenetic silencers
Unlike transcription factors, many histone-modifying
enzymes lack specific DNA-binding domains [15]. Based
on this important observation, it has been postulated that
ncRNAs might interact with ubiquitously expressed histone-modifying enzymes, thereby providing the required level of
binding specificity (Figure 1).
In mammalian cells, dosage compensation offered
the first characterized examples of antisense lncRNA-mediated chromatin remodeling and gene silencing [49].

[Figure 1 schematic; labels: NAT, mRNA]

Figure 1. Epigenetic regulation induced by NATs. NATs regulate the epigenetic landscape of the genomic loci from which they are transcribed (cis regulation). A specific secondary structure permits the NAT to interact with different chromatin-modifying enzymes (green, red and purple shapes), thereby coordinating their action and directing specific epigenetic modifications of the nearby chromatin (green and red flags). Locus specificity may be achieved through sequence-specific interactions between the NAT and the DNA.

One of the two mammalian female X chromosomes is
inactivated via an RNA-based mechanism in which the
antisense ncRNA Xist, expressed from the X chromosome to be inactivated,
mediates the recruitment of polycomb repressive complex
2 (PRC2) that in turn catalyzes the heterochromatinization
of the entire X chromosome [21,49].
A similar mechanism of RNA-based epigenetic regulation of gene expression was found to silence various
imprinted mammalian alleles. Most imprinted mammalian genes are organized in clusters [50], and the presence of
NATs is a common feature of these loci [26,51,52]. For
example, Air is an imprinted, paternally expressed
lncRNA transcribed from the second intron of the mouse
insulin-like growth factor 2 receptor (Igf2r) gene [53]. In
mouse placenta, expression of Air induces the epigenetic
silencing of both the paternal allele of Igf2r, from which Air
is expressed, and neighboring upstream genes. Although
the transcription unit of Air only overlaps with Igf2r, Air
recognizes and binds to the promoter regions of its neighboring genes. The molecular mechanisms underlying these
interactions have not been clarified and might rely on a
specific secondary structure adopted by Air or on the
involvement of mediator proteins. The Air ncRNA interaction with the promoter of upstream genes in the cluster
results in the recruitment of the HMT G9a, which generates a repressive chromatin state [54]. The ability of Air to
silence non-overlapping genes in cis is reminiscent of Xist-induced X-chromosome inactivation. In the case of Xist,
epigenetic silencing spreads through the entire X chromosome, in contrast to the case of imprinted genes where
epigenetic silencing spreads only to a significant portion of
the locus. The extent of the spread of epigenetic silencing
may be related to the presence of insulator elements in the
DNA sequence and their association with the CCCTC-binding factor (CTCF) [55], a multifunctional protein that
enables insulator function and facilitates higher-order
chromatin interactions [56].
Another interesting example of imprinting regulation is
the antisense ncRNA transcript Kcnq1ot1, which is transcribed from intron 10 of the imprinted gene Kcnq1 [57].
This paternally expressed NAT silences Kcnq1 in cis, as
well as neighboring genes on the paternal chromosome, by
controlling chromatin and DNA modifications at that locus
[58]. Kcnq1ot1 mediates the allele-specific deposition of the
repressive histone marks H3K27me3 and H3K9me3 by
direct interaction with the PRC2 components Ezh2 and
Suz12 and with the H3K9-specific HMT G9a [58,59]. Similar
to the situation with Air, the epigenetic changes caused by
Kcnq1ot1 occur outside the sequence boundary of this
lncRNA, emanating bidirectionally from the Kcnq1 locus.
Some of the imprinted genes in this cluster, although
silenced, lack Kcnq1ot1 enrichment [60].
Based on these examples, cis-acting NATs may remain
linked to their transcription loci but exert their regulatory
function on the neighboring genes via the recruitment of
different proteins and the organization of higher-order
chromatin structures. The presence or absence of insulator
elements may influence the extent of chromatin alterations in each locus [61]. In this hypothetical scenario, the
antisense transcript acts as a scaffold for the recruitment
of chromatin-modifying enzymes, initiating events that
expand in both directions to the entire chromosome, as in
the case of X-chromosome inactivation, or to the entire
imprinted cluster. In this model, the recruitment of chromatin-modifying complexes is dependent on antisense RNA
expression, whereas the expansion of these effects depends
on the subsequent involvement of DNA insulator elements.
Taken together, these imprinting studies imply that a
large portion of NATs could exert their regulatory role by
binding to chromatin enzymes and recruiting them in cis to
their targets. In favor of this hypothesis, RNA immunoprecipitation (RIP) experiments targeting Ezh2, coupled
with directional RNA sequencing (RIP-seq), revealed that
the PRC2 complex associates with almost 10 000 RNAs in
mouse embryonic stem cells (mESCs) [62]. Almost 3000 of
these RNAs are NATs, and around 1000 are bidirectional
transcripts. Interestingly, some NATs linked to disease loci,
such as Hspa1a-AS, Bgn-AS, Foxn2-AS and Malat1-AS, were found to
co-immunoprecipitate with Ezh2 [62], suggesting that ncRNAs target the PRC2 complex to chromatin. Unfortunately, in this study RIP-sequencing data were
not integrated with ChIP-sequencing data, and the
authors did not investigate the possible overlap between
the genomic localization of PRC2 and the immunoprecipitated RNA transcripts. Nevertheless, the presence of NATs
associated with PRC2 suggests the importance of these
RNA transcripts in mediating the recruitment of chromatin-modifying complexes.
Accumulating evidence implies that the interaction of
NATs with EZH2 and other HMTs is more common than
previously believed, contributing to the epigenetic regulation of many autosomal loci. In addition to the finding that
lncRNAs interact with histone-modifying enzymes, they
have also been shown to play a role in DNA methylation.
ANRIL is a NAT that overlaps with the INK4b/ARF/INK4a locus [63]. This locus encodes two cyclin-dependent
kinase inhibitors, p15INK4b and p16INK4a, and a regulator of the p53 pathway, ARF [64]. The ANRIL transcript
also overlaps with several polymorphisms identified in
genome-wide association studies (GWAS) that are associated with
increased risk of cardiovascular disease and diabetes
[65]. An initial study showed that ANRIL expression
inversely correlates with p15INK4b expression in acute
lymphoblastic leukemia and acute myeloid leukemia, and
that ANRIL mediates the silencing of
the tumor suppressor gene p15INK4b via DNA methylation and heterochromatin formation in a Dicer-independent manner, which rules out the involvement of
endogenous small RNAs in the process [20]. Later, it
was shown that ANRIL, EZH2 and the PRC1 component
CBX7 are upregulated in several prostate cancer tissue
specimens, inversely correlating with the expression of
p16INK4a [19]. Moreover, ANRIL physically associates
with CBX7 and colocalizes with EZH2 and CBX7 at the
promoter region of p16INK4a in prostate cancer cells.
Thus, the NAT ANRIL participates in the silencing of
two important tumor-suppressor genes via two distinct mechanisms (DNA methylation at p15INK4b and polycomb-mediated repression at p16INK4a), and alterations of these regulatory
circuits have been found in different types of cancer.
Evidence of a functional interaction between NATs and
PRC2 also comes from a study of the cyclin-dependent kinase
inhibitor p21, another important tumor-suppressor gene.
Review
Bidirectional transcription at the p21 locus generates an
antisense transcript and p21 mRNA. The p21 NAT
represses p21 expression in a process involving the deposition
of the repressive histone mark H3K27me3 [66]. This mechanism is AGO1-independent, further excluding the involvement of endogenous small RNA mediators in the
process. Together, these findings suggest that, depending on the cellular context, imbalanced expression of NATs can result in the silencing or
activation of partner protein-coding genes, providing a
potential mechanism to explain the aberrant
upregulation or silencing of cancer-related genes.
Among body tissues, the brain expresses a particularly
high abundance of ncRNAs [67]. Discovered in the developing mouse forebrain, the NAT Evf2 is transcribed from
the ultraconserved Dlx5/6 region encoding the homeodomain transcription factors DLX5 and DLX6 [68]. Evf2
forms a complex with the DLX-2 homeodomain protein
to function as a transcriptional coactivator that increases
Dlx5/6 enhancer activity [68]. Recently, studies of an Evf2
loss-of-function mouse revealed more complex regulatory
functions of this NAT in the development of GABAergic
interneurons [69]. Through antisense interference, Evf2
negatively regulates the expression of Dlx6 mRNA. Moreover, Evf2 silences Dlx5 by recruiting
DLX and the methyl-CpG-binding protein 2 (MECP2) to the
enhancer region [69]. Evf2 mutant mice have reduced
numbers of GABAergic interneurons in the dentate gyrus
of the early postnatal hippocampus and reduced synaptic
inhibition in the adult hippocampus [69]. This study highlights the importance of NATs in regulating gene expression during neuronal maturation and raises the possibility
of a broader role for antisense transcripts in central
nervous system development.
Recent studies have shown that repeat expansion diseases are often
characterized by bidirectional transcription across the repeat region [70]. Spinocerebellar ataxia type 7
(SCA7) is a neurological disorder associated with expansion of a CAG repeat, encoding polyglutamine, in the ataxin-7 gene
(ATXN7) [71]. SCAANT1 is a 1.4-kb NAT overlapping
the ATXN7 gene whose transcription is promoted by CTCF
binding to target sites flanking the CAG repeat region [72].
SCAANT1 expression is associated with an increased level
of the repressive H3K27me3 mark and a decreased level of
the activating histone H3 acetylation mark at the ATXN7
promoter. Pathological expansion of the CAG repeat is
accompanied by reduced expression of SCAANT1 ncRNA
and increased expression of ATXN7 mRNA, showing an
inverse relationship between the NAT and its partner
sense transcript [72]. This study reveals a
NAT-based mechanism that may contribute to
SCA7 pathogenesis.
Because NATs can silence gene expression in cis, they are
attractive therapeutic targets for achieving gene-specific upregulation. It has recently been shown that
brain-derived neurotrophic factor (BDNF) is under the
epigenetic control of an antisense transcript, BDNF-AS
[73]. Depletion of BDNF-AS alters chromatin marks at
the BDNF locus and upregulates BDNF expression in a locus-specific manner. This study also described NAT-mediated endogenous
suppression of glial cell line-derived neurotrophic factor
(GDNF) and ephrin B2 receptor (EPHB2), suggesting that
antisense RNA-mediated transcriptional suppression is a
frequent phenomenon [73]. Considering how frequently
NATs are transcribed, these examples may represent only the tip of the iceberg, and NAT-mediated
epigenetic regulation may be far more
common than previously imagined.

Trends in Genetics August 2012, Vol. 28, No. 8
NATs: cis-acting epigenetic activators
The first observation that lncRNAs are involved in epigenetic gene activation came from dosage compensation
studies in Drosophila, in which males carry one X chromosome and females two; this imbalance is compensated by
a twofold upregulation of genes on the single male X
chromosome [74]. Two lncRNAs, roX1 and roX2, play a
fundamental role in correctly targeting the Dosage
Compensation Complex to many different binding sites on
the male X chromosome, which results in transcriptional
upregulation. These and other examples provide accumulating evidence of a central role for NATs in the epigenetic
activation of specific loci genome-wide and offer
insight into the biological language of lncRNAs [75].
Following these initial findings in Drosophila, several
examples of ncRNAs involved in epigenetic activation in vertebrates have been
reported. For instance, an ncRNA expression-profiling study
of mESC differentiation identified several ncRNAs associated with important mESC protein-coding genes [30]. Among
these ncRNAs, two concordantly upregulated NATs colocalized with their sense mRNA partners during a specific step
of mESC differentiation. These NATs, named Evx1as and
Hoxb5/6as, are transcribed from the opposite DNA strand
of Evx1 and Hoxb5/6, respectively [30]. Using RNA-ChIP,
the authors found that these NATs immunoprecipitate with
H3K4me3, indicating that they associate with chromatin carrying this
transcriptional activation mark [30]. Furthermore, RNA-IP experiments showed that Evx1as
and Hoxb5/6as interact with MLL1, the mammalian trithorax protein responsible for H3K4 trimethylation at the promoters of
several developmental genes [30]. These findings raise the
possibility that these NATs are involved in the epigenetic
activation of their mRNA partners during differentiation.
In another example of epigenetic activation, the chromatin-associated ncRNA transcript termed Intergenic10,
located 3′ to FANK1 in the opposite orientation, overlaps with the protein-coding gene ADAM12 [18].
The expression of Intergenic10 correlates positively with
the expression of the neighboring protein-coding genes.
siRNA depletion of Intergenic10 resulted in the concordant
downregulation of ADAM12 and FANK1 and a decrease in
the levels of the active chromatin mark H3K4me2 in the
promoter regions of the downregulated genes [18]. Thus, NATs
may bind chromatin-modifying enzymes and recruit them in cis
to establish a locus-specific, transcriptionally active chromatin state.
Taken together, these observations show that chromatin-associated ncRNAs can act in cis as chromatin remodelers
to regulate the expression of neighboring genes either positively or negatively.
LncRNAs: trans-acting chromatin remodelers
Controversy still exists regarding the functional significance of the many long and short ncRNA transcripts that are
pervasively transcribed in the human genome, particularly those originating near the transcriptional start sites (TSSs) of many active genes. However, the
cell-, tissue- and developmental stage-specific transcription of
lncRNAs argues against the simplistic assumption that
they arise from transcriptional noise. Moreover, depletion
of these ncRNAs often has functional consequences. Aside from NATs, the human genome produces
many other classes of lncRNAs. For example, analysis
of chromatin signatures revealed a family of over 1000
highly conserved lncRNAs, termed large intergenic noncoding RNAs (lincRNAs), which include sense and antisense
members with many potential regulatory functions [41].
RNA-IP of the PRC2 component
EZH2 followed by hybridization to a custom exon-tiling
array for 900 human lincRNAs showed that almost 30% of
expressed lincRNAs physically interact with PRC2 [76].
The co-immunoprecipitation of these lincRNAs with EZH2 strongly
suggests that they function through
the PRC2 pathway. The catalog of lincRNAs encoded in the
human genome, as well as our understanding of their roles
in mediating the function of chromatin-modifying complexes, is rapidly expanding.
Unlike most NATs, lincRNAs exert their regulatory
roles in trans to alter chromatin structure and gene expression
at distant loci. HOTAIR is a lincRNA encoded in antisense
orientation in the HOX-C cluster on chromosome 12 that is
necessary for the correct expression of the HOX-D cluster
of genes on chromosome 2 [23]. HOTAIR associates with
the PRC2 complex to silence the HOX-D gene cluster and maintain it
as a large heterochromatin domain. Genomic
regions flanking HOX-D contain high levels of H3K27me3
and low levels of H3K4me2/3 [77]. It was shown in several
cellular systems that HOTAIR acts as a modular scaffold
for the recruitment of both PRC2 and LSD1, the catalytic
subunit of the CoREST/REST repressor complex, which in
turn coordinate H3K27 trimethylation and H3K4me2/3 demethylation, respectively, in trans at many
different genomic target regions [78]. Interestingly, elevated
HOTAIR expression in primary breast tumors is a
powerful predictor of metastasis and poor prognosis [35].
Inhibition of HOTAIR expression in cancer cells reduces
invasiveness and metastatic potential, consistent with its
physiological role in specifying the chromatin states of
fibroblasts during development [35].
A loss-of-function study in mESCs functionally
characterized a large number of lincRNAs [32], showing that
lincRNAs maintain the pluripotent state and
repress lineage programs in mESCs via trans-acting
mechanisms of global gene expression regulation. mESC
lincRNAs associate with 12 different chromatin complexes
involved in different aspects of epigenetic regulation, such
as writers (Tip60/P400, Prc2, Setd8, Eset, Suv39), readers
(Prc1, Cbx1, Cbx3) and erasers (Jarid1b, Jarid1c, Hdac1)
[32]. Seventy-four lincRNAs associate with at least one of
these complexes, and several lincRNAs associate with multiple
functionally related chromatin complexes [32]. Because
lincRNAs physically associate with multiple chromatin-regulatory proteins, they may serve as scaffolds that
bridge similar complexes into larger functional
units.

Similar to NATs, lncRNAs can be involved in the
epigenetic activation of specific loci. HOTTIP is a spliced,
polyadenylated lncRNA transcribed in the opposite orientation from the 5′ end of the HOXA locus [79]. HOTTIP
knockdown in fibroblasts and chick embryos resulted in
decreased HOXA expression, affecting a region extending 40 kb
downstream from the 5′ end of the HOXA locus. The magnitude of this
decrease depends on the distance from the HOTTIP
gene: genes closer to HOTTIP show a greater decrease in
expression [79]. These changes in gene expression
correlated with a broad loss of H3K4me3 and H3K4me2
across the affected region. RIP experiments demonstrated
direct binding of HOTTIP to WDR5, a component of
the core complex responsible for H3K4 methylation [79].
Ectopically expressed HOTTIP does not induce the expression of 5′ HOXA genes in fibroblast cells, implying a cis
mechanism of action for HOTTIP. Artificial recruitment of
HOTTIP RNA upstream of a silent GAL4 promoter can
boost transcription in the presence of WDR5, supporting
a cis effect of the HOTTIP transcript in the proximity of
its target genes [79].
Mechanisms of lncRNA interactions with chromatin and
chromatin-modifying enzymes
The ability of lncRNAs to function as scaffolds for the
recruitment of different yet functionally related enzymes
and to confer locus specificity on these enzymes raises two
immediate questions: what mediates the interactions
between ncRNAs and specific chromatin enzymes, and
what molecular rules govern them?
One of the first hints of a mechanism governing ncRNA–enzyme interactions came from studies of X-chromosome inactivation. It was shown that a
ncRNA termed Repeat A (RepA) directly binds to EZH2 and
functions in the recruitment of PRC2 to the X chromosome
[21]. RepA is a 1.6-kb ncRNA transcribed within Xist and is
composed of 7.5 tandem repeat sequences that fold into two
conserved stem–loop structures crucial for EZH2 binding
[21]. These initial findings were subsequently supported by
an independent study showing that short RNAs 50–200 nt
in length are transcribed from the 5′ end of polycomb target
genes [80]. Interestingly, these short RNAs have stem–loop
structures similar to RepA and are able to bind the PRC2
component SUZ12 [80]. Similarly, the antisense transcript Kcnq1ot1
contains a conserved RNA repeat that was shown to be necessary for the epigenetic silencing of imprinted genes [60].
These studies imply that lncRNAs adopt specific secondary structures that offer different docking sites for different
enzymes.
In large part, how NATs bind to target genes to guide
chromatin-modifying enzymes to specific loci remains unexplained (Figure 2). Two recently developed methods for
profiling the genome-wide occupancy of lncRNAs have
allowed high-throughput identification of RNA–DNA and
RNA–protein interactions [81,82]. These new techniques
are promising tools for exploring the mechanisms governing
ncRNA–chromatin interactions, as shown by the informative analysis of
a few known lncRNAs (roX2, TERC and HOTAIR)
[82]. Interestingly, specific consensus DNA sequences were
observed among the DNA binding sites of both roX2 and TERC,
suggesting that specific DNA motifs might be important for the recruitment of
these and other lncRNAs to their target genomic loci.
HOTAIR binding sites contain a GA-rich polypurine motif,
reminiscent of mammalian Polycomb response elements. It
is notable that although the HOTAIR binding sites overlap
with PRC2 and H3K27me3 chromatin regions, they are
restricted to small regions of a few hundred bp, raising the
possibility that HOTAIR nucleates PRC2 binding and
H3K27me3 spreading [82]. These data, together with
the discovery that HOTAIR binding to its genomic targets
does not require EZH2, indicate that ncRNAs are
required for specific recognition of DNA sequences as well
as for recruitment of polycomb proteins, which in turn modify
the neighboring chromatin. This study showed that
locus-specific interaction between ncRNAs and chromatin
takes place independently of ncRNA–enzyme interaction and revealed specific RNA-targeting motifs among ncRNA target sites. These motifs may
represent binding sites for structural elements within the
ncRNA, in the case of direct RNA–DNA interaction, or may
serve as binding sites for mediator proteins that
recruit HOTAIR.

Figure 2. Molecular mechanisms of NATs and chromatin interactions. Two types
of interactions are necessary for any ncRNA-induced chromatin modification to
take place: between an antisense RNA molecule and a chromatin-modifying
enzyme (CME), and between either a CME and DNA or antisense RNA and DNA.
The second type of interaction is necessary to confer sequence specificity to the
chromatin modifications. Each one of these interactions (RNA–protein, RNA–DNA
or DNA–protein) can take place either through sequence motifs (digital
Watson–Crick base pairing) or through RNA secondary structure. NATs function as intermediates
that target CMEs to locus-specific regions of the genome. The molecular
mechanisms governing the interaction between NATs and chromatin remain
poorly characterized. Here, we propose three possible scenarios for
this interaction. (a) Specific binding of antisense RNA both to a CME and to
a DNA region through a unique secondary structure. (b) A sequence motif
dictates the interaction between the antisense RNA molecule and the target DNA.
In this model, antisense RNA binds specifically to CMEs and to a particular DNA
region. (c) Nonspecific binding of antisense RNA to a DNA sequence. In this model,
local antisense transcription leads to a specific chromatin modification; the
specificity comes from the antisense RNA promoter and from the fact
that transcription leads to particular modifications. NATs do not physically
associate with the chromatin; instead, locus specificity is achieved by nascent
NATs that are recognized by chromatin-modifying enzymes.
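The GA-rich polypurine motif reported at HOTAIR binding sites [82] can be made concrete with a simple sequence scan. The Python sketch below is a hypothetical illustration, not the motif-discovery procedure used in that study: it slides a fixed-length window along a DNA sequence and reports regions in which the fraction of purines (A or G) meets a threshold. The function name `purine_rich_windows`, the 20-bp default window and the 90% cut-off are illustrative assumptions.

```python
from typing import List, Tuple

PURINES = frozenset("AG")  # adenine and guanine


def purine_rich_windows(seq: str, window: int = 20,
                        min_frac: float = 0.9) -> List[Tuple[int, int]]:
    """Return merged half-open intervals [start, end) of `seq` covered by
    sliding windows whose purine (A/G) fraction is at least `min_frac`.

    Hypothetical screen: window size and threshold are illustrative only.
    """
    seq = seq.upper()
    n = len(seq)
    if window <= 0 or n < window:
        return []

    # Purine count in the first window; updated incrementally as it slides.
    count = sum(1 for base in seq[:window] if base in PURINES)
    hits = []
    for start in range(n - window + 1):
        if start > 0:
            if seq[start - 1] in PURINES:           # base leaving the window
                count -= 1
            if seq[start + window - 1] in PURINES:  # base entering the window
                count += 1
        if count / window >= min_frac:
            hits.append((start, start + window))

    # Merge overlapping or adjacent hit windows into contiguous regions.
    merged: List[Tuple[int, int]] = []
    for s, e in hits:
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


if __name__ == "__main__":
    # A GA-repeat stretch flanked by pyrimidines.
    print(purine_rich_windows("TTTTT" + "GA" * 12 + "CCCCC", window=10))
```

For this toy sequence the script prints `[(4, 30)]`: the GA repeat occupies positions 5–28, and the reported region extends one base into each flank because a 10-bp window containing 9 purines still meets the 90% threshold. A real analysis would rely on a dedicated motif-discovery method with a statistical background model rather than a fixed cut-off.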

Concluding remarks
Although the examples of NAT and lncRNA mechanisms
described above suggest a broad continuum of function for
ncRNAs in epigenetic regulation, the exact roles and mechanisms of most of these molecules remain largely unknown. NATs have emerged as powerful transducers of
biological information, primarily due to their ability to
bridge the interaction between proteins and DNA [83].
The information content and structural features of these
ncRNAs collectively establish a dynamic interface with
other macromolecules [83], thus facilitating the formation
and modulation of ribonucleoprotein complexes crucial for
epigenetic signaling. These unique features permit NATs
and other lncRNAs to function as scaffolds to regulate
epigenetic mechanisms within the cell. The key to future
studies of lncRNAs will be to successfully integrate the
layers of knowledge gained from multiple genomic, transcriptomic, proteomic and epigenomic approaches to create
a multidimensional understanding of NATs within the
existing cellular framework [84].
Acknowledgments
The authors would like to thank Dr Chiara Pastori and Roya Pedram
Fatemi for helpful discussions and critical reading of the manuscript. The
research on long ncRNAs in C.W.'s laboratory is supported in large part by
grants from the U.S. National Institutes of Health (5R01NS063974 and
5R01MH084880). M.M.'s postdoctoral studies are supported by a fellowship
from the Swiss National Science Foundation (PBGEP3-136151).

References
1 Chi, P. et al. (2010) Covalent histone modifications – miswritten, misinterpreted and mis-erased in human cancers. Nat. Rev. Cancer 10, 457–469
2 Daniel, J.A. et al. (2005) Effector proteins for methylated histones: an expanding family. Cell Cycle 4, 919–926
3 Cao, R. and Zhang, Y. (2004) The functions of E(Z)/EZH2-mediated methylation of lysine 27 in histone H3. Curr. Opin. Genet. Dev. 14, 155–164
4 Shi, Y. et al. (2004) Histone demethylation mediated by the nuclear amine oxidase homolog LSD1. Cell 119, 941–953
5 Maurer-Stroh, S. et al. (2003) The Tudor domain Royal Family: Tudor, plant Agenet, Chromo, PWWP and MBT domains. Trends Biochem. Sci. 28, 69–74
6 Mellor, J. (2006) It takes a PHD to read the histone code. Cell 126, 22–24
7 Kouzarides, T. (2007) Chromatin modifications and their function. Cell 128, 693–705
8 van Steensel, B. (2011) Chromatin: constructing the big picture. EMBO J. 30, 1885–1895
9 Schubeler, D. (2010) Chromatin in multicolor. Cell 143, 183–184
10 Filion, G.J. et al. (2010) Systematic protein location mapping reveals five principal chromatin types in Drosophila cells. Cell 143, 212–224
11 Kharchenko, P.V. et al. (2011) Comprehensive analysis of the chromatin landscape in Drosophila melanogaster. Nature 471, 480–485
12 Ernst, J. and Kellis, M. (2010) Discovery and characterization of chromatin states for systematic annotation of the human genome. Nat. Biotechnol. 28, 817–825
13 Gerstein, M.B. et al. (2010) Integrative analysis of the Caenorhabditis elegans genome by the modENCODE project. Science 330, 1775–1787
14 Bonasio, R. et al. (2010) Molecular signals of epigenetic states. Science 330, 612–616
15 Bernstein, E. and Allis, C.D. (2005) RNA meets chromatin. Genes Dev. 19, 1635–1655
16 Rodriguez-Campos, A. and Azorin, F. (2007) RNA is an integral component of chromatin that contributes to its structural organization. PLoS ONE 2, e1182
17 Maison, C. et al. (2002) Higher-order structure in pericentric heterochromatin involves a distinct pattern of histone modification and an RNA component. Nat. Genet. 30, 329–334
18 Mondal, T. et al. (2010) Characterization of the RNA content of chromatin. Genome Res. 20, 899–907
19 Yap, K.L. et al. (2010) Molecular interplay of the noncoding RNA ANRIL and methylated histone H3 lysine 27 by polycomb CBX7 in transcriptional silencing of INK4a. Mol. Cell 38, 662–674
20 Yu, W. et al. (2008) Epigenetic silencing of tumour suppressor gene p15 by its antisense RNA. Nature 451, 202–206
21 Zhao, J. et al. (2008) Polycomb proteins targeted by a short repeat RNA to the mouse X chromosome. Science 322, 750–756
22 Martianov, I. et al. (2007) Repression of the human dihydrofolate reductase gene by a non-coding interfering transcript. Nature 445, 666–670
23 Rinn, J.L. et al. (2007) Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129, 1311–1323
24 Bierhoff, H. et al. (2010) Noncoding transcripts in sense and antisense orientation regulate the epigenetic state of ribosomal RNA genes. Cold Spring Harb. Symp. Quant. Biol. 75, 357–364
25 Wang, K.C. and Chang, H.Y. (2011) Molecular mechanisms of long noncoding RNAs. Mol. Cell 43, 904–914
26 Katayama, S. et al. (2005) Antisense transcription in the mammalian transcriptome. Science 309, 1564–1566
27 Carninci, P. et al. (2005) The transcriptional landscape of the mammalian genome. Science 309, 1559–1563
28 Birney, E. et al. (2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799–816
29 Clark, M.B. et al. (2011) The reality of pervasive transcription. PLoS Biol. 9, e1000625; discussion e1001102
30 Dinger, M.E. et al. (2008) Long noncoding RNAs in mouse embryonic stem cell pluripotency and differentiation. Genome Res. 18, 1433–1445
31 Ahfeldt, T. et al. (2012) Programming human pluripotent stem cells into white and brown adipocytes. Nat. Cell Biol. 14, 209–219
32 Guttman, M. et al. (2011) lincRNAs act in the circuitry controlling pluripotency and differentiation. Nature 477, 295–300
33 Ji, P. et al. (2003) MALAT-1, a novel noncoding RNA, and thymosin beta4 predict metastasis and survival in early-stage non-small cell lung cancer. Oncogene 22, 8031–8041
34 Faghihi, M.A. et al. (2008) Expression of a noncoding RNA is elevated in Alzheimer's disease and drives rapid feed-forward regulation of beta-secretase. Nat. Med. 14, 723–730
35 Gupta, R.A. et al. (2010) Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature 464, 1071–1076
36 Wright, M.W. and Bruford, E.A. (2011) Naming junk: human non-protein coding RNA (ncRNA) gene nomenclature. Hum. Genomics 5, 90–98
37 Kapranov, P. et al. (2007) RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science 316, 1484–1488
38 Ghildiyal, M. and Zamore, P.D. (2009) Small silencing RNAs: an expanding universe. Nat. Rev. Genet. 10, 94–108
39 Malone, C.D. and Hannon, G.J. (2009) Small RNAs as guardians of the genome. Cell 136, 656–668
40 Carthew, R.W. and Sontheimer, E.J. (2009) Origins and mechanisms of miRNAs and siRNAs. Cell 136, 642–655

41 Guttman, M. et al. (2009) Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223–227
42 Nakaya, H.I. et al. (2007) Genome mapping and expression analyses of human intronic noncoding RNAs reveal tissue-specific patterns and enrichment in genes related to regulation of transcription. Genome Biol. 8, R43
43 Sun, M. et al. (2006) Evidence for variation in abundance of antisense transcripts between multicellular animals but no relationship between antisense transcription and organismic complexity. Genome Res. 16, 922–933
44 Kiyosawa, H. et al. (2003) Antisense transcripts with FANTOM2 clone set and their implications for gene regulation. Genome Res. 13, 1324–1334
45 Chen, J. et al. (2005) Human antisense genes have unusually short introns: evidence for selection for rapid transcription. Trends Genet. 21, 203–207
46 Chen, J. et al. (2005) Genome-wide analysis of coordinate expression and evolution of human cis-encoded sense–antisense transcripts. Trends Genet. 21, 326–329
47 Lapidot, M. and Pilpel, Y. (2006) Genome-wide natural antisense transcription: coupling its regulation to its different regulatory mechanisms. EMBO Rep. 7, 1216–1222
48 Faghihi, M.A. and Wahlestedt, C. (2009) Regulatory roles of natural antisense transcripts. Nat. Rev. Mol. Cell Biol. 10, 637–643
49 Lee, J.T. et al. (1999) Tsix, a gene antisense to Xist at the X-inactivation centre. Nat. Genet. 21, 400–404
50 Verona, R.I. et al. (2003) Genomic imprinting: intricacies of epigenetic regulation in clusters. Annu. Rev. Cell Dev. Biol. 19, 237–259
51 Mohammad, F. et al. (2009) Epigenetics of imprinted long noncoding RNAs. Epigenetics 4, 277–286
52 Wan, L.B. and Bartolomei, M.S. (2008) Regulation of imprinting in clusters: noncoding RNAs versus insulators. Adv. Genet. 61, 207–223
53 Sleutels, F. et al. (2002) The non-coding Air RNA is required for silencing autosomal imprinted genes. Nature 415, 810–813
54 Nagano, T. et al. (2008) The Air noncoding RNA epigenetically silences transcription by targeting G9a to chromatin. Science 322, 1717–1720
55 Kim, T.H. et al. (2007) Analysis of the vertebrate insulator protein CTCF-binding sites in the human genome. Cell 128, 1231–1245
56 Gaszner, M. and Felsenfeld, G. (2006) Insulators: exploiting transcriptional and epigenetic mechanisms. Nat. Rev. Genet. 7, 703–713
57 Smilinich, N.J. et al. (1999) A maternally methylated CpG island in KvLQT1 is associated with an antisense paternal transcript and loss of imprinting in Beckwith–Wiedemann syndrome. Proc. Natl. Acad. Sci. U.S.A. 96, 8064–8069
58 Pandey, R.R. et al. (2008) Kcnq1ot1 antisense noncoding RNA mediates lineage-specific transcriptional silencing through chromatin-level regulation. Mol. Cell 32, 232–246
59 Terranova, R. et al. (2008) Polycomb group proteins Ezh2 and Rnf2 direct genomic contraction and imprinted repression in early mouse embryos. Dev. Cell 15, 668–679
60 Kanduri, C. (2011) Kcnq1ot1: a chromatin regulatory RNA. Semin. Cell Dev. Biol. 22, 343–350
61 Ghirlando, R. et al. (2012) Chromatin domains, insulators, and the regulation of gene expression. Biochim. Biophys. Acta (http://dx.doi.org/10.1016/j.bbagrm.2012.01.016)
62 Zhao, J. et al. (2010) Genome-wide identification of polycomb-associated RNAs by RIP-seq. Mol. Cell 40, 939–953
63 Pasmant, E. et al. (2007) Characterization of a germ-line deletion, including the entire INK4/ARF locus, in a melanoma-neural system tumor family: identification of ANRIL, an antisense noncoding RNA whose expression coclusters with ARF. Cancer Res. 67, 3963–3969
64 Popov, N. and Gil, J. (2010) Epigenetic regulation of the INK4b–ARF–INK4a locus: in sickness and in health. Epigenetics 5, 685–690
65 Pasmant, E. et al. (2011) ANRIL, a long, noncoding RNA, is an unexpected major hotspot in GWAS. FASEB J. 25, 444–448
66 Morris, K.V. et al. (2008) Bidirectional transcription directs both transcriptional gene activation and suppression in human cells. PLoS Genet. 4, e1000258
67 Qureshi, I.A. et al. (2010) Long non-coding RNAs in nervous system function and disease. Brain Res. 1338, 20–35
68 Feng, J. et al. (2006) The Evf-2 noncoding RNA is transcribed from the Dlx-5/6 ultraconserved region and functions as a Dlx-2 transcriptional coactivator. Genes Dev. 20, 1470–1484
69 Bond, A.M. et al. (2009) Balanced gene regulation by an embryonic brain ncRNA is critical for adult hippocampal GABA circuitry. Nat. Neurosci. 12, 1020–1027
70 Batra, R. et al. (2010) Partners in crime: bidirectional transcription in unstable microsatellite disease. Hum. Mol. Genet. 19, R77–R82
71 Martin, J-J. (2012) Spinocerebellar ataxia type 7. In Handbook of Clinical Neurology (Vol. 103; Ataxic Disorders) (Subramony, S.H. and Durr, A., eds), pp. 475–491, Elsevier
72 Sopher, B.L. et al. (2011) CTCF regulates ataxin-7 expression through promotion of a convergently transcribed, antisense noncoding RNA. Neuron 70, 1071–1084
73 Modarresi, F. et al. (2012) Inhibition of natural antisense transcripts in vivo results in gene-specific transcriptional upregulation. Nat. Biotechnol. (http://dx.doi.org/10.1038/nbt.2158)
74 Straub, T. and Becker, P.B. (2011) Transcription modulation chromosome-wide: universal features and principles of dosage compensation in worms and flies. Curr. Opin. Genet. Dev. 21, 147–153
75 Ilik, I. and Akhtar, A. (2009) roX RNAs: non-coding regulators of the male X chromosome in flies. RNA Biol. 6, 113–121

76 Khalil, A.M. et al. (2009) Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc. Natl. Acad. Sci. U.S.A. 106, 11667–11672
77 Fanti, L. et al. (2008) The trithorax group and Pc group proteins are differentially involved in heterochromatin formation in Drosophila. Chromosoma 117, 25–39
78 Tsai, M.C. et al. (2010) Long noncoding RNA as modular scaffold of histone modification complexes. Science 329, 689–693
79 Wang, K.C. et al. (2011) A long noncoding RNA maintains active chromatin to coordinate homeotic gene expression. Nature 472, 120–124
80 Kanhere, A. et al. (2010) Short RNAs are transcribed from repressed polycomb target genes and interact with polycomb repressive complex 2. Mol. Cell 38, 675–688
81 Simon, M.D. et al. (2011) The genomic binding sites of a noncoding RNA. Proc. Natl. Acad. Sci. U.S.A. 108, 20497–20502
82 Chu, C. et al. (2011) Genomic maps of long noncoding RNA occupancy reveal principles of RNA-chromatin interactions. Mol. Cell 44, 667–678
83 St Laurent, G., 3rd and Wahlestedt, C. (2007) Noncoding RNAs: couplers of analog and digital information in nervous system function? Trends Neurosci. 30, 612–621
84 Hawkins, R.D. et al. (2010) Next-generation genomics: an integrative approach. Nat. Rev. Genet. 11, 476–486

Review

Genetic basis of blood pressure and hypertension
Sandosh Padmanabhan¹, Christopher Newton-Cheh² and Anna F. Dominiczak¹
¹ Institute of Cardiovascular and Medical Sciences, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow G12 8TA, UK
² Harvard Medical School, Massachusetts General Hospital, Broad Institute of Harvard University and Massachusetts Institute of Technology, Boston, MA 02114, USA

Blood pressure (BP) is a complex trait regulated by an
intricate network of physiological pathways involving
extracellular fluid volume homeostasis, cardiac contractility and vascular tone through renal, neural or endocrine
systems. Untreated high BP, or hypertension (HTN), is
associated with increased mortality, and thus a better
understanding of the pathophysiological and genetic
underpinnings of BP regulation will have a major impact
on public health. However, identifying genes that contribute to BP and HTN has proved challenging. In this review
we describe our current understanding of the genetic
architecture of BP and HTN, which has advanced rapidly over
the past five years, primarily owing to genome-wide association studies (GWAS) and continuing progress in
uncovering rare gene mutations, epigenetic markers and
regulatory pathways involved in the physiology of BP. We
also look ahead to future studies characterizing novel
pathways that affect BP and HTN and discuss strategies
for translating current findings to the clinic.
The complexity of BP and HTN
BP is a quantitative trait that is approximately normally distributed in the general population. In adults there is a continuous, incremental risk of cardiovascular disease, stroke and renal disease associated with higher BP. HTN is defined by a cut-off at the upper end of the BP distribution at which the benefits of action (i.e., therapeutic intervention) exceed those of inaction [1]. Based on this definition, there are over 1 billion people with HTN worldwide, and the World Health Organization suggests this will rise to 1.5 billion by 2020 [2]. The high prevalence of HTN and its consequent adverse economic impact on individuals and populations highlight the importance of primary prevention of HTN. Thus, there is a pressing need for a greater understanding of the pathophysiological and genetic underpinnings of BP regulation and dysregulation. Studies have demonstrated that BP is a heritable trait, with estimates of heritability ranging from 31% to 68% [3,4]. The BP/HTN phenotype poses unique challenges for genetic dissection that have made progress slow (Box 1). BP levels are determined by cardiac output and peripheral vascular resistance, and these in turn are regulated by a complex network of interacting physiological pathways involving extracellular fluid volume homeostasis, cardiac contractility and vascular tone through renal, neural or endocrine systems. Perturbations in any of these pathways can arise from environmental factors (for example, salt intake), genetic factors or a combination of both, and can result in high or low BP. Rare monogenic BP syndromes are characterized by a major gene defect affecting a single pathway, commonly one involving renal electrolyte balance (Box 2). The phenotypic heterogeneity is further complicated by intra-individual BP variability caused by a large number of factors, including measurement technique, instrument error and patient factors such as anxiety and activity level [5]. All these genotypic and phenotypic complexities have resulted in both false-positive and false-negative findings in the past.
The search for BP genes initially focused on genome-wide linkage studies, which were successful in uncovering genes for monogenic forms of high and low BP but largely unsuccessful in explaining the polygenic BP phenotype. The recent successes of GWAS [6–12] are a testament to greater rigor in phenotypic characterization and statistical design. The limitations and potential of GWAS in the dissection of HTN are highlighted in a recent debate [13,14].
Corresponding author: Dominiczak, A.F. (Anna.Dominiczak@glasgow.ac.uk).
Keywords: hypertension; blood pressure; genetics; sodium; artery; kidney.
In this review we describe the explosion in our understanding of the genetic architecture of BP and HTN that has occurred over the past five years and the continuing progress in uncovering new gene mutations causing rare inherited forms of HTN and hypotension. We review the road ahead, highlighting the novel pathways identified, both common and rare, and discuss future strategies for uncovering novel mechanisms and for clinical translation. Table 1 summarizes the genomic and functional context of all the monogenic and GWAS BP/HTN loci discovered to date.
Common variants
GWAS use dense sets of single nucleotide polymorphisms (SNPs; usually 500 000–1 000 000) and rely on linkage disequilibrium (LD), that is, correlation patterns between typed (or imputed) SNPs and functional variants. This means the identified SNPs are usually proxies for ungenotyped functional variants. Although these SNPs show unequivocal association with BP, the functional dissection of these signals is not straightforward. Associations detected by GWAS between BP/HTN and SNPs, with 50 kb flanking segments, are shown in Figure 1 (GWAS SNPs were selected from studies with sample sizes greater than 20 000, and SNPs attaining P < 5 × 10^-8 were considered significantly associated). Many of the SNPs from GWAS that attained genome-wide significance also show similarly strong associations with other traits (pleiotropy; Figure 1a). For example, rs13333226 shows independent association with HTN and chronic kidney disease [11,15,16]; rs3184504 shows significant association with chronic kidney disease, celiac disease, type 1 diabetes, coronary artery disease, cholesterol, hemoglobin, retinal vascular caliber, plasma eosinophil count and rheumatoid arthritis [6,9,16–24]; rs1799945 is also associated with serum iron concentration and hemoglobin levels in addition to its association with blood

0168-9525/$ - see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.tig.2012.04.001 Trends in Genetics, August 2012, Vol. 28, No. 8

Box 1. Phenotypic complexity
Variation in extracellular fluid volume, the contractile state of the heart and vascular tone contributes to variation in BP level. Other determinants of BP include age, weight, ethnicity and diet. Systolic BP (SBP) increases linearly from age 30 to 84 years, together with mean arterial pressure [a weighted average of SBP and diastolic BP (DBP)], but DBP increases linearly only up to age 50–60, after which it begins to decline, with a steep increase in pulse pressure (the difference between SBP and DBP) [52,53]. The late decline of DBP after age 60 and the continuous rise in SBP reflect the increased large-artery stiffness of older age. The odds of progression to HTN increase by 20–30% for every 5% gain in body weight.
At all ages, HTN is more common in African Americans than in whites; in all ethnic and racial groups it is more common in those with lower socioeconomic status. Interestingly, HTN is more prevalent in the African-American population in the USA than in either Afro-Caribbean or native black African populations [54–56]. In some societies, BP shows only a small age-related increase; this may be related in part to an agrarian lifestyle, the high-potassium, low-sodium diet of the hunter-gatherer, a more rural lifestyle and lower food consumption [57–60]. From an evolutionary perspective, essential HTN is a disease of civilization, with its abundance of processed foods and long lifespans, and could be an undesirable pleiotropic effect of a genotype that may have optimized fitness in an ancient environment [61]. The rates of HTN and sodium sensitivity are generally higher in individuals carrying the ancestral alleles of sodium-conserving genes. These alleles show strong latitudinal clines, being more prevalent in African populations and less so in northern regions [62–64].

Box 2. Qualitative or quantitative phenotype
From a genetic perspective, whether BP is considered as a quantitative trait or a dichotomous disease phenotype has major implications for studies of genetic causation, and this was recognized very early. In the 1950s a technical controversy about the unimodal or bimodal distribution of BP led to the famous Platt–Pickering debate [65], which is a useful platform for understanding the assumptions that have driven genetic research on BP/HTN so far (Figure I). Platt measured BP in normotensive and hypertensive probands and their relatives and found a bimodal distribution of BP values (Figure Ia). This led him to argue that HTN was a simple mendelian disease caused by a single dominant genetic mutation. By contrast, Pickering studied BP distributions from the second to eighth decades in first-degree relatives of normotensive and hypertensive probands and instead found a unimodal distribution of BP (Figure Ib). Pickering concluded that the continuous Gaussian distribution of BP values indicated that BP was inherited as a graded character and is hence a polygenic, non-mendelian trait [65,66]. Thus, according to Platt, hypertensive individuals were a distinct sub-population, whereas according to Pickering, hypertension was only the upper portion of a continuous distribution curve of BP. The debate dragged on through the 1960s with much resistance to accepting Pickering's quantitative model; this only changed with mounting evidence from epidemiological studies indicating that high BP was a risk factor for cardiovascular disease, and from intervention trials of antihypertensive therapy, all showing similar benefits of reducing BP. Further support comes from large-scale GWAS for BP that have mapped common variants with small effects at 29 loci [6,8–11,67]. However, Platt's model cannot be discounted entirely because there are rare mendelian forms of HTN and hypotension that are caused by highly penetrant rare genetic variants with large effects [40]. Pickering's concession at the end of the long debate aptly summarizes the current understanding of HTN genetics: "I never denied the possibility that there may be a group in what we now call essential hypertension characterized by single-gene inheritance."

[Figure I: panels (a) Platt and (b) Pickering, each plotting population frequency against blood pressure with hypertension marked; panel labels include "Gene mutation absent", "Gene mutation present", "Gene (0-n)" and "Environment (0-n)".]

Figure I. The Platt–Pickering debate about the quantitative or qualitative nature of HTN. (a) Platt argued that HTN occurred in a discrete subpopulation and was caused by a single, heritable genetic mutation. (b) Pickering suggested that there was a range of BP levels in the population, and that there was no clear dividing line between hypertension and normotension; instead, HTN represented the end of a continuum and was therefore polygenic in origin.
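The contrast between the two models can be made concrete with a toy calculation. This is a minimal sketch and every parameter value in it (baseline BP, the size of the single-gene shift, carrier and allele frequencies, per-allele effect, residual SD) is hypothetical, not taken from Platt's or Pickering's data. A Platt-style model is a two-component mixture of carriers and non-carriers of one large-effect mutation; a Pickering-style model sums many small additive allele effects plus environmental noise. Counting local maxima of each density on a BP grid shows the first to be bimodal and the second unimodal.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def platt_density(x, base=120.0, shift=40.0, carrier_freq=0.3, sd=8.0):
    """Hypothetical Platt-style model: one dominant mutation adds a large fixed BP shift."""
    return ((1 - carrier_freq) * normal_pdf(x, base, sd)
            + carrier_freq * normal_pdf(x, base + shift, sd))

def pickering_density(x, base=100.0, n_alleles=200, allele_freq=0.5, effect=0.4, sd=8.0):
    """Hypothetical Pickering-style model: BP rises by `effect` per risk allele carried.

    The number of risk alleles k is binomial(n_alleles, allele_freq);
    environmental noise with standard deviation `sd` is added on top.
    """
    total = 0.0
    for k in range(n_alleles + 1):
        weight = math.comb(n_alleles, k) * allele_freq ** k * (1 - allele_freq) ** (n_alleles - k)
        total += weight * normal_pdf(x, base + effect * k, sd)
    return total

def count_modes(density, lo=60.0, hi=220.0, step=0.5):
    """Count local maxima of a density evaluated on an evenly spaced BP grid (mmHg)."""
    n_points = int(round((hi - lo) / step)) + 1
    ys = [density(lo + i * step) for i in range(n_points)]
    return sum(1 for i in range(1, n_points - 1) if ys[i - 1] < ys[i] >= ys[i + 1])

print("Platt-style modes:", count_modes(platt_density))
print("Pickering-style modes:", count_modes(pickering_density))
```

Shrinking `shift` toward the residual SD makes the two Platt-style peaks merge (for two equal-sized normal groups with equal variance, the mixture is bimodal only when the means are more than two SDs apart), so a single major gene need not produce a visibly bimodal distribution.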


Table 1. Genetic loci associated with monogenic BP syndromes and identified through GWAS^a
CHR

GWAS^b

Monogenic syndrome

1p36.13

Gene/nearest
gene
CLCNKB

1p36.13

SDHB

1p36.2

MTHFR
(NPPA, NPPB)

1q23.3

SDHC

1q42.2

AGT

2q36.2

CUL3

3p25.3

VHL

3q22.1

ULK4

CHARGE, GBPG, ICBP

3q26.2

GBPG, ICBP

4q21.2

MECOM
(MDS1)
FGF5

4q31.2

NR3C2

5p15.3

SDHA

5p13.3

NPR3

5q31.2

KLHL3

6p22.2

HFE

7p22

Pathway

Notes

Bartter syndrome, type 3


OMIM #607364

Renal electrolyte
balance

Paragangliomas 4
OMIM #115310

Sympathetic system

Autosomal recessive
Impaired chloride reabsorption in
the thick ascending loop of Henle
leads to impaired sodium
reabsorption
Low/normal BP
Multiple catecholamine-secreting
head and neck paragangliomas and
retroperitoneal
pheochromocytomas
Methylene-tetrahydrofolate
reductase; has been associated
with changes in plasma
homocysteine levels and preeclampsia. Atrial natriuretic and
brain natriuretic peptides genes
have been associated with
hypertension
Tumors or extra-adrenal
paraganglia-associated
pheochromocytoma
The cleaved products angiotensin I,
angiotensin II and angiotensin III
are known regulators of BP and
sodium homeostasis
Modulation of renal salt, K+ and H+
handling in response to
physiological challenge
Autosomal dominant
Associated with retinal, cerebellar,
and spinal hemangioblastoma,
renal cell carcinoma (RCC),
pheochromocytoma, and
pancreatic tumors
Serine-threonine kinase of
unknown function
Myelodysplasia syndrome
protein 1
Fibroblast growth factor 5;
stimulates cell growth and
proliferation and is associated with
angiogenesis
Autosomal dominant
Missense mutation (S810L) in the
mineralocorticoid receptor
Low-renin, low-aldosterone,
hypokalemia

CHARGE, GBPG,
AGEN, ICBP,
Gene-centric

Paragangliomas 3
OMIM #605373

Renal electrolyte
balance

Sympathetic system

Gene-centric

Pseudohypoaldosteronism
type IIE
OMIM *603136
von Hippel–Lindau syndrome
OMIM #193300

Renal electrolyte
balance,
vascular function
Renal electrolyte
balance
Sympathetic system

GBPG, AGEN, ICBP

Hypertension exacerbation
in pregnancy
OMIM #605115

Renal electrolyte
balance

Pseudohypoaldosteronism
type I
OMIM #177735

Renal electrolyte
balance

Autosomal dominant
Renal unresponsiveness to
mineralocorticoids

Paragangliomas 5
OMIM #614165

Sympathetic system

Tumors or extra-adrenal
paraganglia-associated
pheochromocytoma
Natriuretic peptide clearance
receptor
Modulation of renal salt, K+ and H+
handling in response to
physiological challenge
Autosomal recessive
Iron metabolism
Autosomal dominant
Hyperaldosteronism due to
adrenocortical hyperplasia not
suppressed by dexamethasone

AGEN, ICBP,
Gene-centric
Pseudohypoaldosteronism
type IID
OMIM #614495
Hemochromatosis
OMIM #235200
Familial
hyperaldosteronism type 2
OMIM #605635

Renal electrolyte
balance
Renal electrolyte
balance

ICBP, Gene-centric
Steroid/aldosterone
synthesis


Table 1 (Continued)
Gene/nearest
gene
NOS3

Monogenic syndrome

GWAS^b

Pathway

Notes

Pregnancy-induced
hypertension
OMIM +163729

HYPERGENES,
Gene-centric

Endothelial function

8q24.3

CYP11B1,
CYP11B2

8q24.3

CYP11B2

8q24.3

CYP11B1

Familial
hyperaldosteronism type 1
Glucocorticoid-remediable
aldosteronism (GRA)
OMIM #103900
Corticosterone
methyloxidase
II deficiency
OMIM #61060
Steroid 11β-hydroxylase
deficiency
OMIM #202010

10p12.3

CACNB2

10q11.2

RET

Multiple endocrine
neoplasia type IIA
OMIM #171400

10q24.3

CYP17A1

17α-hydroxylase and/or
17,20-lyase deficiency
OMIM *609300

11p15.1

PLEKHA7

CHARGE, ICBP

11p15.2

SOX6

Gene-centric

Transcription

11p15.5

LSP1/TNNT3

Gene-centric

?Endothelial
function

11q12.2

SDHAF2

Paragangliomas 2
OMIM #601650

Sympathetic
system

11q23.1

SDHD

Paragangliomas 1
OMIM #16800

Sympathetic
system

11q24.3

KCNJ1

Bartter syndrome,
antenatal, type 2
OMIM #241200
Hypertension with
Brachydactyly
Bilginturan syndrome
OMIM %112410
Pseudohypoaldosteronism
type IIC
Gordon's syndrome
OMIM #614492

Renal electrolyte
balance

Nitric oxide plays an important role
in the maintenance of
cardiovascular and renal
homeostasis
Autosomal dominant
Chimeric gene
Plasma and urinary aldosterone
responsive to ACTH;
dexamethasone suppressible
within 48 h
Autosomal recessive
Enzymatic defect results in
decreased aldosterone and salt-wasting
Enzyme dysfunction leads to
increased levels of MR-activating
hormones
Subunit of voltage-gated calcium
channel expressed in heart
Autosomal dominant
Associated with multiple endocrine
neoplasms, including medullary
thyroid carcinoma,
pheochromocytoma, and
parathyroid adenomas
Cytochrome P450 enzyme
mediating the first step in
mineralocorticoid and
glucocorticoid synthesis. Enzyme
dysfunction leads to increased
levels of MR activating hormones.
Also involved in sex steroid
synthesis
Pleckstrin-homology domain-containing family member
expressed in zona adherens of
epithelial cells
Required for normal development
of the central nervous system,
chondrogenesis, and maintenance
of cardiac and skeletal muscle cells
Expressed in leukocytes and
endothelial cells. Involved in
signaling, regulating the
cytoskeletal architecture and
neutrophil migration
Tumors or extra-adrenal
paraganglia-associated
pheochromocytoma
Tumors or extra-adrenal
paraganglia associated
pheochromocytoma
Reduced potassium recycling leads
to impaired sodium
reabsorption
Inversion, deletion, and reinsertion
at 12p12.2 to p11.2
No specific biochemical findings

CHR
7q36.1

12p12.2

12p12.3

WNK1

12q21.3

ATP2B1


Steroid/aldosterone
synthesis

Steroid/aldosterone
synthesis

Steroid/aldosterone
synthesis
CHARGE, ICBP

?Vascular/cardiac
function
Sympathetic
system

CHARGE, GBPG,
AGEN-BP, ICBP

Steroid/aldosterone
synthesis

Renal electrolyte
balance

CHARGE, GBPG, AGEN,
ICBP, Gene-centric

?Vascular function

Autosomal dominant
Gain-of-function mutations in
WNK1
Low plasma renin, normal or
elevated K+
Encodes plasma membrane
calcium- or calmodulin-dependent
ATPase expressed in endothelium


Table 1 (Continued)
CHR

Monogenic syndrome

GWAS^b

Pathway

Notes

CHARGE, GBPG, ICBP

?Endothelial
function

Also known as lymphocyte-specific
adaptor protein (LNK), may
regulate hematopoietic progenitors
and inflammatory signaling
pathways in endothelium
T-box genes involved in regulation
of developmental processes
Homozygous or compound
heterozygous mutation in the
sodium-potassium-chloride
cotransporter-2 gene
Cytoplasmic tyrosine kinase
involved in angiotensin II-dependent vascular smooth muscle
cell contraction
Autosomal dominant
Constitutive activation of epithelial
sodium
transporter, ENaC. Low plasma
renin, low or normal K+; negligible
urinary aldosterone
Uromodulin; Tamm–Horsfall
protein. Specifically expressed in
the thick ascending limb of the loop
of Henle where 25% of sodium
reabsorption in the kidney occurs
Low BP
Loss-of-function mutation leads to
lower sodium reabsorption
Autosomal recessive
Increased plasma ACTH and
secretory rates of all corticosteroids
Zinc-finger protein 652
Autosomal dominant
Loss-of-function mutations in
WNK4
Low plasma renin, normal or
elevated K+
GNAS encodes the α subunit of the
G protein mediating β-receptor
signal transduction
EDN3 encodes endothelin 3, the
precursor for the ligand of the
endothelin B receptor

12q24.1

Gene/nearest
gene
SH2B3

12q24.2

TBX5-TBX3

15q21.1

SLC12A1

15q24.1

CSK

16p12.2

SCNN1B,
SCNN1G

16p12.3

UMOD

16q13

SLC12A3

Gitelman syndrome
OMIM #263800

Renal electrolyte
balance

16q22.1

HSD11B2

Apparent mineralocorticoid
excess
OMIM #218030

Steroid/aldosterone
synthesis

17q21.3
17q21.3

ZNF652
WNK4

20q13

GNAS-EDN3

CHARGE, ICBP
Bartter syndrome,
antenatal, type 1
OMIM #601678

Renal electrolyte
balance

CHARGE, GBPG,
AGEN-BP, ICBP

Liddle syndrome
OMIM #177200

Vascular function

Renal electrolyte
balance

BP-Extremes

?Renal electrolyte
balance
?Renal function

GBPG, AGEN-BP, ICBP


Renal electrolyte
balance

Pseudohypoaldosteronism
type IIB
Gordon's syndrome
OMIM #614491
GBPG, ICBP

Vascular function

^a The key genes at each locus are shown with their known or potential role in BP regulation. The grey shaded rows indicate genes implicated in monogenic syndromes of high/low BP. The Pathway column is color-coded according to the pathway involved.

^b GWAS studies: AGEN [7], BP-Extremes [10], CHARGE [8], GBPG [9], Gene-centric [6], HYPERGENES [11], and ICBP [5].

pressure [6,25,26]; and proxies for rs1004467 show genome-wide significant associations with coronary artery disease, schizophrenia, intracranial aneurysm and parkinsonism [6,8–10,24,27–30]. This illustrates the challenges ahead when attempting to design studies to functionally dissect these signals. Figure 1a also shows genes associated with monogenic syndromes in Online Mendelian Inheritance in Man (OMIM) that occur in the GWAS-related DNA segments shown. The only genes that are known to be associated with monogenic forms of high BP and have also been identified by GWAS are cytochrome P450, family 17, subfamily A, polypeptide 1 (CYP17A1) and nitric oxide synthase 3 (NOS3).
Even once a SNP associated with HTN has been identified, it is difficult to identify the gene involved. For example, Figure 1b shows the genes within 50 kb on either side of each BP GWAS SNP; among these, only a few genes (NPPA, NOS3 and UMOD) have been clearly linked to the GWAS SNP [11,12,31]. Furthermore, 50 kb is by no means the limit of a SNP's zone of influence: the risk genes implicated by GWAS SNPs may lie anywhere within the region of LD around the SNP, or even further away, because SNPs can influence the regulation of remote genes. GWAS loci are often rich in copy-number variants, insertion/deletion variants (Figure 1b), microRNA targets and transcription factor binding sites (Figure 1c) that may influence the genotype–phenotype association and offer another avenue for molecular and functional experiments to elucidate the causal pathways or, more simply, to identify which risk gene the GWAS SNP implicates.
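The ±50 kb window used for Figure 1b amounts to a simple interval-overlap query. The sketch below assumes a hypothetical in-memory list of gene coordinates (gene names and positions are invented for illustration); a real analysis would read annotation for a specific assembly such as GRCh37/hg19 and would still need to consider LD and long-range regulation, which a fixed window ignores.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # inclusive genomic coordinate
    end: int    # inclusive genomic coordinate

def distance_to_gene(pos, gene):
    """Base pairs from a position to the nearest gene boundary (0 if inside the gene)."""
    if gene.start <= pos <= gene.end:
        return 0
    return min(abs(pos - gene.start), abs(pos - gene.end))

def genes_near_snp(chrom, pos, genes, flank=50_000):
    """Names of genes overlapping [pos - flank, pos + flank] on the same chromosome, nearest first."""
    lo, hi = pos - flank, pos + flank
    hits = [g for g in genes if g.chrom == chrom and g.start <= hi and g.end >= lo]
    return [g.name for g in sorted(hits, key=lambda g: distance_to_gene(pos, g))]

# Hypothetical annotation and SNP position, for illustration only.
genes = [
    Gene("GENE_A", "chr1", 1_000_000, 1_010_000),
    Gene("GENE_B", "chr1", 1_055_000, 1_070_000),
    Gene("GENE_C", "chr1", 1_200_000, 1_250_000),
    Gene("GENE_D", "chr2", 1_000_000, 1_010_000),
]
print(genes_near_snp("chr1", 1_005_000, genes))
```

Changing `flank` shows how sensitive the candidate list is to the arbitrary window size, particularly in gene-rich regions.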

[Figure 1: Circos plots, panels (a), (b) and (c)]

Figure 1. Phenotypic, genetic and regulatory context of GWAS signals for blood pressure and hypertension. (a) Phenotypic landscape of GWAS signals in BP/HTN GWAS. The strongest SNPs for BP and HTN show very little overlap with genes involved in monogenic BP syndromes. Only CYP17A1 and NOS3 are associated with monogenic BP syndromes and occur within 50 kb of BP GWAS SNPs. The strongest BP GWAS SNPs and their proxies are not associated exclusively with BP phenotypes but show pleiotropy with non-BP traits; these associations may point to plausible underlying pathways (for example, UMOD and its association with HTN and kidney function) or to novel common pathways, or may be independent associations. The rings from outer to inner represent: (1) chromosomal segments with GWAS SNPs (including 50 kb flanking regions); (2) GWAS SNPs; (3) black markers on the chromosomal segments indicating the locations of proxies (r² > 0.8) for the index SNP in each region; (4) genes implicated in monogenic syndromes in OMIM that are present in the chromosomal regions; (5) non-BP phenotypes that showed genome-wide significance within these loci. (b) Genetic landscape of GWAS signals in BP/HTN GWAS. Only a few genes (NPPA, NOS3, UMOD) have been clearly linked to the strongest GWAS SNP, whereas many of the SNPs lie in gene-rich regions, highlighting the challenges ahead in fine-mapping and identifying the causative gene or variant. It is very likely that some GWAS SNPs influence the regulation of distant genes outside the 50 kb regions shown in this figure. Furthermore, the GWAS loci are rich in copy-number variants and insertion/deletion variants that will need to be considered in the functional dissection of GWAS signals. The rings from outer to inner represent 1–5 as in (a); (6) shows structural variations present within the chromosomal regions. (c) Regulatory landscape of GWAS signals in BP/HTN GWAS. Bioinformatic analysis of the BP GWAS SNP loci shows microRNA targets, conserved transcription factor binding sites and epigenetic loci that may influence the genotype–phenotype association and offer another avenue for the design of molecular and functional experiments to elucidate the causal pathways. The SNP positions are indicated by red bars on the chromosome and are the same SNPs as shown in (a) and (b). The rings from outer to inner represent: (1) microRNA targets and associated genes; (2) chromosomal segments with GWAS SNPs (including 50 kb flanking regions); (3) transcription factor binding sites conserved in the human/mouse/rat alignment in the chromosomal regions, from the TFBS Conserved (tfbsConsSites) track of the UCSC Browser, showing transcription factors with score > 800; (4) DNase hypersensitive areas assayed in a large collection of cell types; (5) predicted CpG islands. The height of each line indicates the length of the segment. Panels (a–c) were generated using Circos [68] with the Feb. 2009 (GRCh37/hg19) assembly data from the UCSC Genome Browser (http://genome.ucsc.edu/) [69].

Finally, the collective effect of all BP loci identified through GWAS explains only a small fraction (2%) of BP heritability. Thus, as for other common traits, BP shares the missing heritability conundrum [32], and efforts are now directed toward identifying additional common variants of small effect and rare variants of greater effect. Although GWAS use SNPs selected to provide genome-wide coverage, they provide limited coverage of genes with plausible biological relevance (candidate genes), particularly for lower-frequency genetic variants (such as those with minor allele frequencies of 1–5%). A large-scale gene-centric analysis of BP in over 80 000 individuals, using a customized gene array enriched with common and low-frequency variants in 2100 candidate cardiovascular genes reflecting a wide variety of biological pathways, identified NPR3, HFE, NOS3, SOX6, LSP1/TNNT3, MTHFR, AGT and ATP2B1, with some overlap with large GWAS meta-analyses [7]. Among the single candidate genes studied, NPPA-NPPB [31] and SCNN1G [33] showed evidence of association with replication, but only the former showed strong concordant signals in GWAS.
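Why many genome-wide significant loci can still explain little heritability follows from a standard quantitative-genetics result: under additivity and Hardy–Weinberg equilibrium, a biallelic SNP with allele frequency p and per-allele effect β contributes 2p(1 − p)β² to the phenotypic variance. The sketch below applies this with hypothetical inputs (30 independent loci, p = 0.3, β = 0.5 mmHg, a BP standard deviation of 15 mmHg and a heritability of 0.4); these are illustrative assumptions, not estimates from the studies reviewed here.

```python
def snp_variance(freq, beta):
    """Additive variance from one biallelic SNP: 2p(1-p)beta^2 (assumes Hardy-Weinberg equilibrium)."""
    return 2 * freq * (1 - freq) * beta ** 2

def share_of_heritability(snps, phenotype_sd, heritability):
    """Fraction of the heritable variance captured by independent, additive SNPs.

    snps: iterable of (allele_frequency, per_allele_effect) pairs, effects in phenotype units.
    """
    captured = sum(snp_variance(p, b) for p, b in snps)
    heritable_variance = heritability * phenotype_sd ** 2
    return captured / heritable_variance

# Hypothetical example: 30 loci with p = 0.3 and 0.5 mmHg per allele.
loci = [(0.3, 0.5)] * 30
print(f"{share_of_heritability(loci, phenotype_sd=15.0, heritability=0.4):.1%}")
```

With these made-up numbers the 30 loci capture about 3.5% of the heritable variance, showing how sub-mmHg per-allele effects leave most heritability unexplained even when each locus is robustly detected.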

One striking result of the BP GWAS is that genes from highly plausible pathways are not represented near the identified SNPs (Figure 1). We used the GRAIL text-mining algorithm (Gene Relationships Across Implicated Loci [34]) to search for connectivity between genes near the associated SNPs, based on existing literature published before 2006 (i.e., before the explosion of GWAS publications). As Figure 2 shows, 14 of the 41 BP GWAS loci contained genes with significant relatedness, defined by the degree of similarity of the text describing them in article abstracts, implying that these connected genes are involved in a common cellular process or pathway. These regions of GRAIL connectivity show the expected connection between NPPA/B and NPR3 but, when the GWAS SNPs lie in gene-rich regions, also reveal connections that point to specific novel genes for follow-up studies. This is highlighted by rs805303, which lies in a very gene-rich locus: a connection between NOTCH4 at this locus and Jagged 1 (JAG1, at a different locus tagged by rs1327235) highlights the NOTCH signaling pathway, which has been shown to be important in the developing cardiovascular system and in congenital human cardiovascular diseases. Their connection with BP regulation is not intuitive, but this association should prioritize this pathway and these genes for functional dissection.
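GRAIL scores relatedness between genes at different loci from the similarity of the literature describing them. The sketch below shows only that core idea, using word-count vectors and cosine similarity over toy descriptions invented for illustration, with placeholder gene names. GRAIL's published method is more elaborate (it draws on a large corpus of abstracts and assigns statistical significance across loci), so this should not be read as its implementation.

```python
import math
import re
from collections import Counter
from itertools import combinations

STOPWORDS = {"a", "and", "in", "of", "the", "to"}

def term_vector(text):
    """Count lower-case words in a description, ignoring a few common stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def cosine(u, v):
    """Cosine similarity between two word-count vectors (0 if either is empty)."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def related_pairs(descriptions, threshold):
    """Gene pairs whose descriptions have cosine similarity >= threshold."""
    vectors = {gene: term_vector(text) for gene, text in descriptions.items()}
    pairs = []
    for a, b in combinations(sorted(vectors), 2):
        score = cosine(vectors[a], vectors[b])
        if score >= threshold:
            pairs.append((a, b, round(score, 3)))
    return pairs

# Toy descriptions invented for illustration; gene names are placeholders.
descriptions = {
    "GENE_A": "notch receptor signaling in vascular development",
    "GENE_B": "jagged ligand activates notch signaling in cardiovascular development",
    "GENE_C": "zinc finger protein expressed in lymphocytes",
}
print(related_pairs(descriptions, threshold=0.3))
```

Only the two "notch signaling" descriptions clear the threshold here; with real abstracts, shared vocabulary of this kind is what lets a literature-based method link genes at separate loci.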
Figure 2. Representation of the connections between 41 BP GWAS SNPs and their corresponding genes using the GRAIL literature-based text-mining algorithm (Gene Relationships Across Implicated Loci [34]). This searches for connectivity between genes near the associated SNPs, based on existing literature (we selected literature published before 2006, i.e., before the surge of GWAS publications). The thickness of the red lines indicates the strength of the literature-based connectivity between the genes. This type of analysis supports known interactions but also suggests new connections that are worth following up in future studies.

[Figure 3: frequency (0.00–0.12) plotted against number of BP-increasing alleles (30–50) for hypercontrols (age >50 years, BP <120/80, no prevalent CVD or incident CVD during 10-year follow-up) and a hypertensive population (age <63 years, BP >160/100).]

Figure 3. For the prediction of complex diseases, genotypes at multiple SNPs are often combined into scores (for example, scores calculated according to the number of risk alleles carried). The frequency distribution of the number of BP-increasing alleles carried in the general population would be approximately normal because each allele is inherited independently. The frequency distributions of the number of BP-increasing alleles at 35 GWAS SNPs in populations selected from the extremes of the BP distribution (top 9% and bottom 2%) [11] show a large overlap of scores, and the majority of individuals from both phenotypic extremes lie in the middle of the distribution. This illustrates the fallacy of using risk scores from GWAS SNPs to identify individuals at high risk for hypertension. Abbreviation: CVD, cardiovascular disease.

Despite the increasing pace of discovery of variants associated with BP and HTN, the limited predictive utility of these variants, either singly or as part of a composite risk score, is striking. The number of BP-increasing alleles an individual carries, summed over SNPs with broadly similar allele frequencies, is approximately normally distributed in the population because each SNP is inherited independently; hence the number of individuals expected to carry all the risk alleles is vanishingly small. As an example, Figure 3 shows the probability density functions of the number of BP-increasing alleles (from 35 genome-wide significant GWAS SNPs) in hypercontrols and extreme hypertensive cases from the BP-extremes case–control cohort [11], illustrating the substantial overlap of cases and controls by genetic risk score despite the extreme phenotypic ascertainment. Using genetic risk scores constructed from up to 13 BP GWAS SNPs, a longitudinal study showed that individuals with the highest combined risk score had significantly higher diastolic BP at the age of nine years, and that the effect persisted from childhood through adulthood [35]. Genetic risk scores that included many SNPs below genome-wide significance explained more of the variance than scores based only on highly significant SNPs, in both adults and children (1.2–1.7% in adults and 0.8–1.4% in children) [36].
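The overlap seen in Figure 3 can be reproduced qualitatively with a small simulation of an unweighted risk score, the count of risk alleles across SNPs. This sketch assumes 35 independent SNPs with a risk-allele frequency of 0.50 in controls and a hypothetical 0.55 in cases; these frequencies, the sample sizes and the random seeds are illustrative assumptions, not values from the BP-extremes cohort [11].

```python
import random
import statistics

def risk_score(genotypes):
    """Unweighted genetic risk score: total risk alleles (each genotype is 0, 1 or 2)."""
    return sum(genotypes)

def simulate_scores(allele_freqs, n_people, seed):
    """Risk scores for people whose two alleles at each SNP are drawn independently."""
    rng = random.Random(seed)
    return [
        risk_score([(rng.random() < p) + (rng.random() < p) for p in allele_freqs])
        for _ in range(n_people)
    ]

def fraction_below(scores, cutoff):
    """Share of scores strictly below a cutoff."""
    return sum(s < cutoff for s in scores) / len(scores)

controls = simulate_scores([0.50] * 35, n_people=2000, seed=1)
cases = simulate_scores([0.55] * 35, n_people=2000, seed=2)
control_p95 = statistics.quantiles(controls, n=20)[-1]  # 95th percentile of control scores

print("mean risk alleles, controls:", statistics.mean(controls))
print("mean risk alleles, cases:", statistics.mean(cases))
print("cases scoring below the control 95th percentile:", fraction_below(cases, control_p95))
```

In this simulation the case mean is several alleles higher, yet most simulated cases still score below the top 5% of controls, which is the pattern that limits the predictive use of such scores.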
Novel pathways uncovered by GWAS
Highly correlated SNPs (r² > 0.9) in the 5′ end of UMOD have been independently identified in large GWAS of BP extremes and kidney function [11,16]. The UMOD gene [expressed primarily in the thick ascending limb (TAL) of the loop of Henle] encodes the Tamm–Horsfall protein [THP/uromodulin (UMOD)], an extracellular protein anchored by a glycosylphosphatidylinositol (GPI) group at the luminal face of tubular epithelia and released into the urine by proteolytic cleavage. It is the most abundant protein of tubular origin in the urine. In the HTN study, the minor G allele of rs13333226 at the 5′ end of the UMOD gene was associated with a lower risk of HTN [OR (95% CI): 0.87 (0.84–0.91); P = 3.6 × 10^-11], 0.49 mmHg lower SBP (P = 2.6 × 10^-5), 0.30 mmHg lower DBP (P = 1.5 × 10^-5), increased estimated glomerular filtration rate (eGFR; 3.6 ml/min per minor allele, P = 0.012), reduced urinary UMOD excretion and lower fractional excretion of endogenous lithium. In addition, the genotype association between rs13333226 and urinary UMOD excretion was more pronounced with low salt intake and blunted with high salt intake, indicating a possible gene–environment interaction [11]. Adjustment for eGFR in the HTN GWAS did not alter the association between rs13333226 and HTN. Mutations in UMOD cause medullary cystic kidney disease type 2 (MCKD2), familial juvenile hyperuricemic nephropathy (FJHN) and glomerulocystic kidney disease (GCKD), but these lead to HTN only in the later stages of renal failure. A single mechanism that could explain all the observations involving the minor G allele of rs13333226 is a decreased sensitivity of the macula densa to luminal Cl−. This decreased sensitivity may be mediated either through the reduced UMOD excretion associated with the G allele or through other mechanisms. Under this model, decreased macula densa sensitivity activates tubuloglomerular feedback and increases GFR, explaining the increased proximal tubular Na+ reabsorption (reflected in the lower fractional excretion of endogenous lithium). The lifetime effect of the elevated GFR would explain the reduced BP and potentially the age-related effect of the variant. Other mechanisms are possible; for example, ROMK function is activated by UMOD, and thus reduced ROMK activity might explain renal salt wasting in THP knockout mice and in patients with Bartter syndrome [37].
Rare variants
Genes involved in monogenic hypertension are summarized in Table 1. Pseudohypoaldosteronism type II (Gordons syndrome; familial hyperkalemia; OMIM #145260),
an autosomal dominant form of HTN associated with
hyperkalemia, non-anion gap metabolic acidosis and increased salt reabsorption by the kidney, is caused by either
gain-of-function mutations in WNK1 or loss-of-function
mutations in WNK4. Recently, exome sequencing has been
used to identify mutations in kelch-like 3 (KLHL3) or
cullin3 (CUL3) in pseudohypoaldosteronism II (PHAII)
patients from 41 unrelated families [38].
Conversely, mutations that reduce salt retention, such
as those associated with Bartter (SLC12A1, KCNJ1,
CLCNKB, BSND, CaSR, ClCK-A) and Gitelman
(SLC12A3) syndromes, tend to lower BP and protect
against the development of HTN [39,40]. Resequencing
three candidate genes (SLC12A3, SLC12A1, KCNJ1) involved in Bartter or Gitelman syndromes in the Framingham Heart Study population identified 30 distinct
potentially deleterious rare mutations present in 49 subjects. In the heterozygous state, these variants were associated with 5.7 mmHg lower BP at age 40 and 9.0 mmHg lower BP at age 60, and in aggregate reduced the risk of HTN by 60% at age 60 [39]. This was the first indication that rare variants can produce clinically significant BP reduction in the general population, and it supports the rare-variant common-disease hypothesis [41]. There are
currently exome sequencing projects involving individuals
at BP extremes to replicate this finding and discover more
rare variants with a large effect on blood pressure. The
hypothesis for these studies is that an abundance of low-frequency variants of large effect will explain most of the missing heritability of BP. Although the
clinical applications of these findings will be limited given
the very low frequency of these variants in the population,
these studies should uncover novel pathways and provide a
deeper understanding of the genetic architecture of blood
pressure.
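The Framingham resequencing result above is an aggregate finding. Individually rare variants are pooled: a subject counts as a carrier if they carry any qualifying variant, and carriers are then compared with non-carriers. A minimal Python sketch of this collapsing idea follows. The function name `burden_difference`, the variant IDs and the BP values are hypothetical illustrations, not data or code from [39]. The sketch also omits the covariate adjustment (age, sex, relatedness) that a real analysis would need.

```python
from statistics import mean


def burden_difference(carried, bp, qualifying):
    """Mean BP of non-carriers minus mean BP of carriers (hypothetical helper).

    carried:    dict mapping subject -> set of rare-variant IDs carried
    bp:         dict mapping subject -> blood pressure in mmHg
    qualifying: set of variant IDs judged potentially deleterious

    Collapsing step: a subject is a carrier if it carries ANY qualifying
    variant; every other subject with a BP value is a non-carrier.
    """
    carriers = [s for s in bp if carried.get(s, set()) & qualifying]
    non_carriers = [s for s in bp if not (carried.get(s, set()) & qualifying)]
    if not carriers or not non_carriers:
        raise ValueError("need at least one carrier and one non-carrier")
    return mean(bp[s] for s in non_carriers) - mean(bp[s] for s in carriers)


# Hypothetical toy data, not taken from any study.
carried = {"s1": {"v1"}, "s2": set(), "s3": {"v2"}, "s4": set(), "s5": {"v9"}}
bp = {"s1": 120.0, "s2": 130.0, "s3": 118.0, "s4": 134.0, "s5": 126.0}
qualifying = {"v1", "v2"}  # v9 treated as benign, so s5 is a non-carrier

print(burden_difference(carried, bp, qualifying))  # prints 11.0
```

Collapsing trades allele-level resolution for statistical power. No single variant in such a set is common enough to be tested reliably on its own.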
Epigenetics
Not all features of gene regulation are encoded in genes or
contained in the DNA sequence. MicroRNAs (miRs), histone modifications and DNA methylation have all been
investigated with regard to their role in BP gene regulation. The potential role of miRs in vascular smooth-muscle
biology and blood pressure is just beginning to be appreciated. Mice lacking miR-143 and miR-145 develop significant reductions in BP resulting from modulation of actin
dynamics [42]. Intrarenal expression of miR-200a, miR-200b, miR-141, miR-429, miR-205 and miR-192 was found to be increased in hypertensive nephrosclerosis, and the degree of upregulation correlated with disease severity. Significant correlations between these miR species and both proteinuria and GFR suggest a dose-response relationship between intrarenal miR expression and the severity of hypertensive nephrosclerosis [43]. Renin gene expression appears to be regulated by miR-181a and miR-663 [44]. The identification of these miRs may lead to the
elucidation of pathways involved in HTN causation and
novel therapeutics. Recently, an observational study
showed that human cytomegalovirus (HCMV) seropositivity and titers are positively associated with essential hypertension independently of other HTN risk factors [45].
The HCMV-encoded miR hcmv-miR-UL112 was highly
expressed in hypertensive patients, pointing to a potentially novel pathway involved in HTN. This is supported by an animal study showing that infection with mouse cytomegalovirus alone can elevate blood pressure in mice [46]. Although the human finding is observational, it highlights the prospect of many pathways and risk factors that converge on the final common BP phenotype, and it may have implications for the discovery of new treatments.
Recently, renal sympathetic denervation has shown
considerable promise in treating refractory HTN [47].
The sympathetic innervation of the kidney is implicated in the pathogenesis of HTN. It increases plasma renin activity, which leads to sodium and water retention, and it reduces renal blood flow (RBF). The procedure involves radiofrequency ablation of the renal sympathetic nerves and has produced marked reductions in BP, but the underlying mechanism is unclear. Recently, histone modification has been shown to play an important role in the
epigenetic modulation of WNK4 transcription in the development of salt-sensitive HTN. Isoproterenol-induced transcriptional suppression of WNK4 was shown to be mediated via inhibition of histone deacetylase 8 (HDAC8) activity at the WNK4 promoter [48]. Reduced WNK4 in turn can stimulate the thiazide-sensitive Na+-Cl- cotransporter (NCC/SLC12A3), implying that sympathetic nerve activity can increase BP partly by activating NCC. The evidence that isoproterenol induces transcriptional suppression of WNK4 and leads to activation of NCC offers an opportunity to combine genomics, epigenomics and NCC detection in urinary exosomes to test the salt-sympathetic system-BP axis [48,49].

Trends in Genetics August 2012, Vol. 28, No. 8
There are also indications of epigenetic regulation through an interaction between disruptor of telomeric silencing alternative splice variant a (Dot1a) and ALL-1 fused gene from chromosome 9 (Af9). Together they form a nuclear repressor complex that targets histone H3 Lys-79 methylation in the promoter region of the sodium channel gene SCNN1A and suppresses its transcriptional activity [50]. Aldosterone
can disrupt this nuclear complex, which results in histone
H3 Lys-79 hypomethylation at specific subregions and
derepression of the SCNN1A promoter. This adds a novel
epigenetic dimension to the complex transcriptional and
post-transcriptional regulation of the epithelial sodium
channel by aldosterone.
Concluding remarks
Unraveling the genetic basis of BP regulation and HTN has
been more difficult than might be suggested by their high
heritability, but progress in cataloging common variants using GWAS is comparable to that for other common traits. Ongoing studies include GWAS meta-analysis of BP extremes and exome sequencing of BP extremes to identify more sequence variants associated with BP. It has been estimated that further increasing the GWAS sample size will identify 116 common variants for BP with effect sizes similar to those found already, but these will collectively explain only about 2.2% of the phenotypic variance [6].
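The small share of variance follows from how an additive biallelic SNP contributes to a quantitative trait. Under Hardy-Weinberg equilibrium, a variant with minor allele frequency p and per-allele effect β (in trait units) adds 2p(1 - p)β² to the trait variance. Dividing the summed contributions by the total phenotypic variance gives the fraction explained. The Python sketch below applies this standard approximation and assumes additivity and no linkage disequilibrium between loci. The 100 loci, their frequency and effect size, and the 20 mmHg standard deviation are hypothetical values, not the estimates behind [6].

```python
def variance_explained(variants, phenotypic_variance):
    """Fraction of phenotypic variance explained by independent additive SNPs.

    variants: iterable of (maf, beta) pairs, beta in trait units per allele.
    Assumes Hardy-Weinberg equilibrium, additivity and no linkage
    disequilibrium between the variants, so their contributions simply add.
    """
    total = 0.0
    for maf, beta in variants:
        if not 0.0 < maf <= 0.5:
            raise ValueError("minor allele frequency must be in (0, 0.5]")
        total += 2.0 * maf * (1.0 - maf) * beta ** 2  # variance from one SNP
    return total / phenotypic_variance


# Hypothetical: 100 loci, each with MAF 0.3 and +0.5 mmHg per allele,
# against an assumed systolic BP standard deviation of 20 mmHg.
hypothetical = [(0.3, 0.5)] * 100
share = variance_explained(hypothetical, phenotypic_variance=20.0 ** 2)
print(f"{share:.1%}")  # prints 2.6%
```

Because β enters squared, even many loci of roughly half a mmHg each add up to only a few per cent of the variance.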
Important issues for future studies of BP as a quantitative trait include two improvements. The first is modeling BP more accurately in subjects on antihypertensive treatment, by taking into account the number of drugs, drug dosage and compliance metrics. The second is making use of longitudinal BP data, for example long-term averages and BP variability (both visit-to-visit and 24 h intra-individual variability). Novel strategies and orthogonal study designs are needed to discover causal and clinically useful genetic markers efficiently. This would require a move away from studying BP as a pure quantitative trait in ever-larger cohorts. Instead, detailed studies would examine subjects selected on informative intermediate traits derived from the extensive interventional studies of high BP. SNPs near the genes encoding uromodulin and natriuretic peptides show allelic association with urinary uromodulin and plasma natriuretic peptides, respectively [11,31], and offer the opportunity for salt-intervention trials to dissect the underlying mechanisms further. Randomized clinical trials with stored DNA samples offer a readily available resource. They can be used not only to study drug response but also to dissect pathways of HTN, based on interindividual differences in response to drugs that target specific pathways.
The limited predictive utility of the common variants that have emerged from most GWAS suggests that better predictive models will require orthogonal (i.e., uncorrelated) genetic variants associated with new pathways, as has been suggested for biomarkers [51]. The current despondency over poor prediction probably reflects the early discovery of low-hanging fruit that may be more correlated with known pathways. The next level of discovery will be more challenging. The molecular and functional dissection of the novel variants will require more detailed low-throughput science, in contrast to the high-throughput screening methods applied so far.
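The case for orthogonal variants can be made concrete with the standard coefficient of determination for a linear model with two standardized predictors:

R² = (r1² + r2² - 2·r1·r2·r12) / (1 - r12²)

Here r1 and r2 are each predictor's correlation with the outcome and r12 is their correlation with each other. The Python sketch below uses hypothetical correlations, not values from any GWAS or biomarker study. It shows how much a second marker adds to a model that already contains a first one.

```python
def r_squared_two_predictors(r1, r2, r12):
    """R^2 of an ordinary least-squares fit with two standardized predictors.

    r1, r2: correlation of each predictor with the outcome
    r12:    correlation between the two predictors (must satisfy |r12| < 1)
    """
    if abs(r12) >= 1.0:
        raise ValueError("predictors must not be perfectly collinear")
    return (r1 ** 2 + r2 ** 2 - 2.0 * r1 * r2 * r12) / (1.0 - r12 ** 2)


existing = 0.2 ** 2  # hypothetical marker already in the model: R^2 = 0.04
for r12 in (0.0, 0.5, 0.9):
    # Gain from adding a second marker that is equally predictive on its own.
    gain = r_squared_two_predictors(0.2, 0.2, r12) - existing
    print(f"r12 = {r12:.1f}: added R^2 = {gain:.4f}")
```

With these numbers the added R² falls from about 0.040 for an uncorrelated marker to about 0.013 at r12 = 0.5 and about 0.002 at r12 = 0.9. A marker that largely re-measures a known pathway adds little prediction, even if it is individually as strong as the first.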
References
1 Evans, J.G. and Rose, G. (1971) Hypertension. Br. Med. Bull. 27, 37-42
2 Kearney, P.M. et al. (2005) Global burden of hypertension: analysis of worldwide data. Lancet 365, 217-223
3 Hottenga, J.J. et al. (2005) Heritability and stability of resting blood pressure. Twin Res. Hum. Genet. 8, 499-508
4 Kupper, N. et al. (2005) Heritability of daytime ambulatory blood pressure in an extended twin design. Hypertension 45, 80-85
5 Padmanabhan, S. et al. (2008) Hypertension and genome-wide association studies: combining high fidelity phenotyping and hypercontrols. J. Hypertens. 26, 1275-1281
6 Ehret, G.B. et al. (2011) Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature 478, 103-109
7 Johnson, T. et al. (2011) Blood pressure loci identified with a gene-centric array. Am. J. Hum. Genet. 89, 688-700
8 Kato, N. et al. (2011) Meta-analysis of genome-wide association studies identifies common variants associated with blood pressure variation in East Asians. Nat. Genet. 43, 531-538
9 Levy, D. et al. (2009) Genome-wide association study of blood pressure and hypertension. Nat. Genet. 41, 677-687
10 Newton-Cheh, C. et al. (2009) Genome-wide association study identifies eight loci associated with blood pressure. Nat. Genet. 41, 666-676
11 Padmanabhan, S. et al. (2010) Genome-wide association study of blood pressure extremes identifies variant near UMOD associated with hypertension. PLoS Genet. 6, e1001177
12 Salvi, E. et al. (2012) Genomewide association study using a high-density single nucleotide polymorphism array and case-control design identifies a novel essential hypertension susceptibility locus in the promoter region of endothelial NO synthase. Hypertension 59, 248-255
13 Dominiczak, A.F. and Munroe, P.B. (2010) Genome-wide association studies will unlock the genetic basis of hypertension: pro side of the argument. Hypertension 56, 1017-1020
14 Kurtz, T.W. (2010) Genome-wide association studies will unlock the genetic basis of hypertension: con side of the argument. Hypertension 56, 1021-1025
15 Gudbjartsson, D.F. et al. (2010) Association of variants at UMOD with chronic kidney disease and kidney stones - role of age and comorbid diseases. PLoS Genet. 6, e1001039
16 Kottgen, A. et al. (2009) Multiple loci associated with indices of renal function and chronic kidney disease. Nat. Genet. 41, 712-717
17 Gudbjartsson, D.F. et al. (2009) Sequence variants affecting eosinophil numbers associate with asthma and myocardial infarction. Nat. Genet. 41, 342-347
18 Schunkert, H. et al. (2011) Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nat. Genet. 43, 333-338
19 Stahl, E.A. et al. (2010) Genome-wide association study meta-analysis identifies seven new rheumatoid arthritis risk loci. Nat. Genet. 42, 508-514
20 Dubois, P.C. et al. (2010) Multiple common variants for celiac disease influencing immune gene expression. Nat. Genet. 42, 295-302
21 Ganesh, S.K. et al. (2009) Multiple loci influence erythrocyte phenotypes in the CHARGE Consortium. Nat. Genet. 41, 1191-1198
22 Ikram, M.K. et al. (2010) Four novel Loci (19q13, 6q24, 12q24, and 5q14) influence the microcirculation in vivo. PLoS Genet. 6, e1001184
23 Teslovich, T.M. et al. (2010) Biological, clinical and population relevance of 95 loci for blood lipids. Nature 466, 707-713
24 Wain, L.V. et al. (2011) Genome-wide association study identifies six new loci influencing pulse pressure and mean arterial pressure. Nat. Genet. 43, 1005-1011
25 Pichler, I. et al. (2011) Identification of a common variant in the TFR2 gene implicated in the physiological regulation of serum iron levels. Hum. Mol. Genet. 20, 1232-1240
26 Chambers, J.C. et al. (2009) Genome-wide association study identifies variants in TMPRSS6 associated with hemoglobin levels. Nat. Genet. 41, 1170-1172
27 Coronary Artery Disease (C4D) Genetics Consortium (2011) A genome-wide association study in Europeans and South Asians identifies five new loci for coronary artery disease. Nat. Genet. 43, 339-344
28 Ripke, S. et al. (2011) Genome-wide association study identifies five new schizophrenia loci. Nat. Genet. 43, 969-976
29 Simon-Sanchez, J. et al. (2009) Genome-wide association study reveals genetic risk underlying Parkinson's disease. Nat. Genet. 41, 1308-1312
30 Yasuno, K. et al. (2010) Genome-wide association study of intracranial aneurysm identifies three new risk loci. Nat. Genet. 42, 420-425
31 Newton-Cheh, C. et al. (2009) Association of common variants in NPPA and NPPB with circulating natriuretic peptides and blood pressure. Nat. Genet. 41, 348-353
32 Manolio, T.A. et al. (2009) Finding the missing heritability of complex diseases. Nature 461, 747-753
33 Busst, C.J. et al. (2011) The epithelial sodium channel gamma-subunit gene and blood pressure: family based association, renal gene expression, and physiological analyses. Hypertension 58, 1073-1078
34 Raychaudhuri, S. et al. (2009) Identifying relationships among genomic disease regions: predicting genes at pathogenic SNP associations and rare deletions. PLoS Genet. 5, e1000534
35 Oikonen, M. et al. (2011) Genetic variants and blood pressure in a population-based cohort: the Cardiovascular Risk in Young Finns study. Hypertension 58, 1079-1085
36 Taal, H.R. et al. (2012) Genome-wide profiling of blood pressure in adults and children. Hypertension 59, 241-247
37 Renigunta, A. et al. (2011) Tamm-Horsfall glycoprotein interacts with renal outer medullary potassium channel ROMK2 and regulates its function. J. Biol. Chem. 286, 2224-2235
38 Boyden, L.M. et al. (2012) Mutations in kelch-like 3 and cullin 3 cause hypertension and electrolyte abnormalities. Nature 482, 98-102
39 Ji, W. et al. (2008) Rare independent mutations in renal salt handling genes contribute to blood pressure variation. Nat. Genet. 40, 592-599
40 Lifton, R.P. et al. (2001) Molecular mechanisms of human hypertension. Cell 104, 545-556
41 Eyre-Walker, A. (2010) Evolution in health and medicine Sackler colloquium: genetic architecture of a complex trait and its implications for fitness and genome-wide association studies. Proc. Natl. Acad. Sci. U.S.A. 107 (Suppl. 1), 1752-1756
42 Xin, M. et al. (2009) MicroRNAs miR-143 and miR-145 modulate cytoskeletal dynamics and responsiveness of smooth muscle cells to injury. Genes Dev. 23, 2166-2178
43 Wang, G. et al. (2010) Intrarenal expression of miRNAs in patients with hypertensive nephrosclerosis. Am. J. Hypertens. 23, 78-84
44 Marques, F.Z. et al. (2011) Gene expression profiling reveals renin mRNA overexpression in human hypertensive kidneys and a role for microRNAs. Hypertension 58, 1093-1098
45 Li, S. et al. (2011) Signature microRNA expression profile of essential hypertension and its novel link to human cytomegalovirus infection. Circulation 124, 175-184
46 Cheng, J. et al. (2009) Cytomegalovirus infection causes an increase of arterial blood pressure. PLoS Pathog. 5, e1000427
47 Esler, M.D. et al. (2010) Renal sympathetic denervation in patients with treatment-resistant hypertension (The Symplicity HTN-2 Trial): a randomised controlled trial. Lancet 376, 1903-1909
48 Mu, S. et al. (2011) Epigenetic modulation of the renal beta-adrenergic WNK4 pathway in salt-sensitive hypertension. Nat. Med. 17, 573-580
49 Ellison, D.H. and Brooks, V.L. (2011) Renal nerves, WNK4, glucocorticoids, and salt transport. Cell Metab. 13, 619-620
50 Zhang, D. et al. (2009) Epigenetics and the control of epithelial sodium channel expression in collecting duct. Kidney Int. 75, 260-267
51 Gerszten, R.E. and Wang, T.J. (2008) The search for new cardiovascular biomarkers. Nature 451, 949-952
52 Leitschuh, M. et al. (1991) High-normal blood pressure progression to hypertension in the Framingham Heart Study. Hypertension 17, 22-27
53 Franklin, S.S. et al. (1997) Hemodynamic patterns of age-related changes in blood pressure. The Framingham Heart Study. Circulation 96, 308-315
54 Burt, V.L. et al. (1995) Prevalence of hypertension in the US adult population. Results from the Third National Health and Nutrition Examination Survey, 1988-1991. Hypertension 25, 305-313
55 Kaminer, B. and Lutz, W.P. (1960) Blood pressure in Bushmen of the Kalahari Desert. Circulation 22, 289-295
56 Truswell, A.S. et al. (1972) Blood pressures of !Kung bushmen in Northern Botswana. Am. Heart J. 84, 5-12
57 Poulter, N.R. et al. (1990) The Kenyan Luo migration study: observations on the initiation of a rise in blood pressure. BMJ 300, 967-972
58 Crews, D.E. and Mancilha-Carvalho, J.J. (1993) Correlates of blood pressure in Yanomami Indians of northwestern Brazil. Ethn. Dis. 3, 362-371
59 Carvalho, J.J. et al. (1989) Blood pressure in four remote populations in the INTERSALT Study. Hypertension 14, 238-246
60 Laville, M. et al. (1994) Epidemiological profile of hypertensive disease and renal risk factors in black Africa. J. Hypertens. 12, 839-843
61 Neel, J.V. (1962) Diabetes mellitus: a thrifty genotype rendered detrimental by progress? Am. J. Hum. Genet. 14, 353-362
62 Nakajima, T. et al. (2004) Natural selection and population history in the human angiotensinogen gene (AGT): 736 complete AGT sequences in chromosomes from around the world. Am. J. Hum. Genet. 74, 898-916
63 Weder, A.B. (2007) Evolution and hypertension. Hypertension 49, 260-265
64 Young, J.H. et al. (2005) Differential susceptibility to hypertension is due to selection during the out-of-Africa expansion. PLoS Genet. 1, e82
65 Pickering, G.W. (1955) The genetic factor in essential hypertension. Ann. Intern. Med. 43, 457-464
66 Oldham, P.D. et al. (1960) The nature of essential hypertension. Lancet 1, 1085-1093
67 Adeyemo, A. et al. (2009) A genome-wide association study of hypertension and blood pressure in African Americans. PLoS Genet. 5, e1000564
68 Krzywinski, M. et al. (2009) Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639-1645
69 Kent, W.J. et al. (2002) The human genome browser at UCSC. Genome Res. 12, 996-1006

August 2012 Volume 28, Number 8 pp. 361-418

Editor
Rhiannon Macrae
Executive Editor
Feng Chen
Journal Manager
Basil Nyaku
Journal Administrators
Ria Otten and Patrick Scheffmann
Advisory Editorial Board
K.V. Anderson, New York, USA
A. Clark, Ithaca, USA
G. Fink, Cambridge, USA
W.J. Gehring, Basel, Switzerland
D. Goldstein, Durham, USA
L. Guarente, Cambridge, USA
Y. Hayashizaki, Yokohama, Japan
S. Henikoff, Seattle, USA
J. Hodgkin, Oxford, UK
H.R. Horvitz, Cambridge, USA
L. Hurst, Bath, UK
M. Justice, Houston, USA
E. Koonin, Bethesda, USA
E. Meyerowitz, Pasadena, USA
S. Moreno, Salamanca, Spain
C. Scazzocchio, Orsay, France
J. Smith, Cambridge, UK
M. Takeichi, Kobe, Japan
D. Tautz, Plön, Germany
O. Voinnet, Strasbourg, France
Editorial Enquiries
Trends in Genetics
Cell Press
600 Technology Square, 5th floor
Cambridge MA 02139, USA
Tel: +1 617 397 2818
Fax: +1 617 397 2810
E-mail: tig@cell.com

Letter
361 Is "forward" the same as "plus"? ...and other adventures in SNP allele nomenclature
Sarah C. Nelson, Kimberly F. Doheny, Cathy C. Laurie and Daniel B. Mirel

Reviews
364 Human limb abnormalities caused by disruption of hedgehog signaling
Eve Anderson, Silvia Peluso, Laura A. Lettice and Robert E. Hill
374 Replication timing and its emergence from stochastic processes
John Bechhoefer and Nicholas Rhind
382 Oxytricha as a modern analog of ancient genome evolution
Aaron David Goldman and Laura F. Landweber
389 Regulation of chromatin structure by long noncoding RNAs: focus on natural antisense transcripts
Marco Magistri, Mohammad Ali Faghihi, Georges St Laurent III and Claes Wahlestedt
397 Genetic basis of blood pressure and hypertension
Sandosh Padmanabhan, Christopher Newton-Cheh and Anna F. Dominiczak
409 Mechanisms of transcriptional precision in animal development
Mounia Lagha, Jacques P. Bothma and Michael Levine

Erratum
417 Corrigendum: Human evolutionary genomics: ethical and interpretive issues. [Trends in Genetics 28 (2012) 137-145]
Joseph J. Vitti, Mildred K. Cho, Sarah A. Tishkoff and Pardis C. Sabeti

Cover: During conjugation, members of the ciliate genus Oxytricha inherit a genome that looks like typical eukaryotic chromatin but is replete with fragmented and scrambled genes. The subsequent developmental process produces a rearranged somatic genome containing on the order of twenty million of the shortest known telomere-bearing chromosomes. On pages 382-388, Aaron Goldman and Laura Landweber describe recent progress toward understanding Oxytricha's genomic dimorphism and discuss its various implications for our understanding of ancient genome evolution and early life. The cover shows an SEM image of Oxytricha, false-colored with Photoshop, courtesy of Bob Hammersmith.

Review

Mechanisms of transcriptional precision in animal development
Mounia Lagha1, Jacques P. Bothma2 and Michael Levine1

1 Center for Integrative Genomics, Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
2 Biophysics Graduate Group, Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA

We review recently identified mechanisms of transcriptional control that ensure reliable and reproducible patterns of gene expression in natural populations of developing embryos. They do so despite inherent fluctuations in gene regulatory processes, variations in genetic backgrounds and exposure to diverse environmental conditions. These mechanisms are not responsible for switching genes on and off. Instead, they fine-tune gene expression and ensure regulatory precision. Several such mechanisms are discussed, including redundant binding sites within transcriptional enhancers, shadow enhancers, and poised enhancers and promoters, as well as redundant gene interactions within regulatory networks. We propose that such regulatory mechanisms contribute to population fitness and fine-tune the spatial and temporal control of gene expression.
Transcriptional precision
The basic mechanisms for switching genes on and off during development were intensively studied in the 1980s and 1990s. The enhancer was shown to play a key role in integrating complex regulatory information to generate cell-specific patterns of gene expression [1]. In natural populations, however, enhancer-promoter interactions can be affected by changes in temperature and by variations in genetic background, yet the developmental program remains unperturbed. What is the basis for this stability in developmental programming?
Our central premise is that the mechanisms used to
provide stability in gene expression in natural populations also produce greater precision in developmental
patterning mechanisms. By transcriptional precision we
refer to the formation of sharp borders of gene expression, the exact timing of gene activation, coordinate
expression of groups of genes within a developing tissue,
and homogeneous expression of a given gene across a field
of coordinately developing cells. The advent of whole-genome technologies and improved imaging methods has
provided recent insights into more subtle aspects of
differential gene activity, namely the reproducible
deployment of developmental programs in natural
populations.

Corresponding author: Levine, M. (mlevine@berkeley.edu).


Keywords: enhancer; paused polymerase; pioneer factors; robustness; gene regulatory networks

Redundant genetic interactions


Genetic analysis of Drosophila embryogenesis led to a
conceptual breakthrough in our understanding of animal
development [2]. The subdivision of the embryo into a
series of body segments was first envisaged to be a regulatory cascade or genetic pathway, with maternal determinants, such as Bicoid, that establish sequential patterns of
gap gene expression, pair-rule stripes and ultimately,
segment-polarity stripes of gene expression (e.g. [3]). This
view of a sequential pathway gave way to one of gene
networks, whereby both maternal and zygotic activators
and repressors interact with complex enhancers to produce
localized stripes of gene expression [4]. In recent years,
gene networks have been visualized as complex wiring diagrams [5].
Such networks often contain seemingly redundant
interactions. For example, two related transcription factors are sometimes seen to activate the expression of downstream target genes in the same cells at the same time. Removal of one of these regulatory genes often fails to produce an obvious or fully penetrant phenotype. Nonetheless, that gene might augment population fitness, which
is what natural selection ultimately acts on.

Glossary
Canalization: a measure of the ability of a population to produce the same
phenotype regardless of fluctuations in its environment, genotype or other
sources of variability. Our use of the term robustness conveys the same
essential meaning.
Enhancer: the predominant regulatory DNA for controlling gene expression. It
has the defining property of driving reporter expression in transgenic assays
from a heterologous promoter.
Gene regulatory network: interacting genes and their associated regulatory
DNAs that are responsible for a specific developmental process such as the
specification of gut or muscle.
Paused polymerase: RNA Pol II that has initiated transcription, but arrests after
producing a small nascent RNA of 30-50 nt. The Pol II is ready to go but needs
additional regulators to undergo elongation.
Pioneer factor: a specialized TF (sequence-specific) that binds to nucleosomal
DNA and prepares enhancers for rapid and timely deployment.
Poising: preparing genes for rapid and timely transcription. This can be
achieved by priming the promoter, the enhancer, or both.
Redundancy: two genes are considered to be redundant if they play similar
functions and are able to replace one another. This can be extended to a
genetic interaction or enhancers or binding sites within enhancers. However,
we do not believe in true redundancy. Instead, genes or regulatory DNAs might
appear to possess redundant, or overlapping, activities in the laboratory, but
not in natural populations subject to stress.
Shadow enhancer: an enhancer that is sometimes located far from the gene it
regulates. The term shadow is a metaphor which reflects that, historically,
these distal enhancers tended to be discovered after the proximal/primary
enhancer and in unexpected locations such as in the introns of neighboring
genes.

0168-9525/$ - see front matter. Published by Elsevier Ltd. doi:10.1016/j.tig.2012.03.006 Trends in Genetics, August 2012, Vol. 28, No. 8

[Figure 1 panels: (a) Wildtype: skn-1, med-1/2, end-3, end-1 and elt-2, leading to intestinal differentiation; (b) End-3 mutant: skn-1, med-1/2, end-1 (variable) and elt-2 (variable), leading to variable intestinal differentiation.]

Figure 1. Redundant interactions in gene regulatory networks. Summary of the genetic cascade governing intestinal cell specification in C. elegans (see ref. [6]). (a) Wildtype network. skn-1 is maternally deposited and, in concert with other maternal and zygotic factors, activates the expression of transcription factors end-3 and end-1, both of
which activate elt-2, the key regulator of intestine differentiation. (b) In end-3 mutants, end-1 can compensate and intestine differentiation is essentially normal. However,
end-1 expression becomes significantly more variable, resulting in erratic expression of elt-2 and abnormal intestine differentiation in some individuals.

Redundant interactions in gene regulatory networks have been suggested to provide stability and precision in metazoan development [5]. An illustrative example is seen for gut specification in Caenorhabditis elegans [6]. The
intestine is composed of 20 cells that arise from a single
progenitor in the early embryo. Intestinal identity is specified by a simple regulatory network, beginning with the
maternal deposition of skn-1 transcripts and culminating
in the expression of elt-2, which activates hundreds of
target genes required for gut differentiation (Figure 1a).
The activation of elt-2 depends on two related transcription factors, end-1 and end-3, which function in a largely
redundant fashion. The consequences of disrupting either
end-1 or end-3 gene activity have been examined, and
evidence was obtained for increased noise in gut specification from measurements of single mRNAs in individual
embryos [6]. In particular, end-3 mutants show variability
in both the timing and levels of elt-2 expression (Figure 1b),
which might explain why 5% of end-3 mutants lack intestinal cells. Similarly, the overlapping activities of two
T-box transcription factors, tbx-8 and tbx-9, appear to
buffer stochastic variations in muscle differentiation [7].
These results suggest that redundant gene interactions
within developmental networks can stabilize gene expression in natural populations. Such redundancy might also
play a key role in ensuring transcriptional precision. That
is, the combination of end-1 plus end-3 might ensure the
precise timing and exact levels of elt-2 expression. There
are numerous examples of potential redundancies in gene
networks (e.g. [5]). Are these required for developmental
patterning, or do they represent a means to stabilize
complex processes despite genetic and environmental variations? These are not mutually exclusive concepts.
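The noise-buffering idea can be illustrated with a toy Monte Carlo model. It is a hypothetical sketch, not the measurements or model of [6], and rests on three assumptions. First, each activator's contribution to elt-2 input carries independent lognormal noise. Second, elt-2 activation fails in an embryo when the summed input falls below a threshold. Third, in the single-activator case the remaining factor compensates fully on average by doubling its mean output. The threshold, noise level and trial count are arbitrary illustrative choices.

```python
import random


def failure_rate(n_activators, threshold, sigma, trials, seed):
    """Fraction of simulated embryos whose summed activator input is below threshold.

    Toy assumptions: each activator contributes scale * exp(N(0, sigma)) with
    independent noise, and scale = 2 / n_activators keeps the mean total input
    equal whether one or two activators are present (full compensation).
    """
    rng = random.Random(seed)
    scale = 2.0 / n_activators
    failures = 0
    for _ in range(trials):
        total = sum(scale * rng.lognormvariate(0.0, sigma) for _ in range(n_activators))
        if total < threshold:
            failures += 1
    return failures / trials


wild_type = failure_rate(2, threshold=1.0, sigma=0.5, trials=100_000, seed=1)
single = failure_rate(1, threshold=1.0, sigma=0.5, trials=100_000, seed=1)
print(f"two activators: {wild_type:.3f}, one activator: {single:.3f}")
```

With these settings the single-activator failure rate comes out several-fold higher, because independent noise in two inputs partly cancels when the inputs are summed. The model ignores the increased variability of end-1 expression reported in end-3 mutants. Adding that variability (a larger sigma in the single-activator case) would widen the gap further.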
Intra-enhancer redundancy
A typical developmental enhancer is several hundred bp in
length and contains multiple binding sites for two or more
sequence-specific transcription factors (reviewed in [1]). Some of the binding sites appear to be redundant, in that
mutations in a subset of the sites do not qualitatively alter
the expression patterns produced by the modified enhancers (e.g. [8]). What is the purpose of these extra sites,
which are often highly conserved? Evidence is gathering
that in some cases they ensure robustness or stability in
response to genetic and environmental variation. A recent
analysis of the eve stripe 2 enhancer provides a particularly
compelling example [9] (Figure 2).
The full-length eve stripe 2 enhancer is over 700 bp in
length and contains several binding sites for each of four
key regulators: Bicoid, Hunchback, Kruppel, and Giant
[10,11]. It produces a robust and authentic stripe 2 pattern
when attached to a reporter gene and expressed in transgenic Drosophila embryos. Removal of 200 bp from the 3′
end of the enhancer, which contains several TF binding
sites, diminishes the levels of expression, but the resulting
500 bp minimal enhancer produces an essentially normal
pattern of expression [11]. BAC transgenesis and genetic
complementation assays have been used to examine the
contributions of the minimal enhancer and 3′ extension
[9].
Removal of the 500 bp minimal enhancer from a rescuing BAC transgene results in lethality due to a severely
diminished stripe 2 pattern. Homozygous eve mutant embryos
carrying this BAC fail to hatch due to defects in the first
thoracic segment. Interestingly, removal of the 200 bp 3′
extension does not cause lethality under optimal culture
conditions, and the viability of these flies is comparable to
that of wild-type flies. These results suggest that the
minimal 500 bp eve stripe 2 enhancer is sufficient for
segmentation, at least in the absence of environmental
stress. However, there is a breakdown in the function of the
minimal enhancer at elevated temperatures and in sensitized genetic backgrounds. Thus, binding sites in the
3′ extension are not redundant under all conditions. Instead, they appear to ensure reliable expression of eve stripe 2 under stress (Figure 2). This is likely to be a general mechanism of robustness or canalization in development ([9,12,13]; see Glossary for a definition of canalization). So-called redundant binding sites in developmental enhancers are probably used in natural populations to cope with variability.

[Figure 2: eve stripe 2 enhancer. Panels: (a) eve BAC with the minimal enhancer plus the extension (redundant binding sites): normal; (b) eve BAC with no minimal stripe 2 enhancer: non-viable; (c) eve BAC with no extension: normal at optimal conditions, reduced viability under stress.]

Figure 2. Importance of redundant binding sites for robustness. (a) Diagram of a BAC transgene containing the entire eve locus, including 5′ and 3′ stripe enhancers. Only the stripe 2 regulatory region is shown. The full-length enhancer contains both the minimal 500 bp enhancer (green) and 200 bp 3′ extension (blue). The yellow ovals represent a subset of the TF binding sites in the stripe 2 regulatory DNA. (b) Removal of the minimal eve stripe 2 enhancer results in lethality, and embryos die with defects in the thorax (derived from the region of stripe 2 expression). (c) Removal of the 3′ extension does not impair embryogenesis under optimal culturing conditions, and normal adult flies are obtained. However, under genetic stress, only 5% of the flies survive. Thus, redundant binding sites in the 3′ extension are required for robustness.
Shadow enhancers
A related mechanism for ensuring robustness is the use of
multiple enhancers for a single pattern of gene expression. A
variety of recently developed whole-genome assays (Box 1)
permit the systematic identification of developmental
enhancers (e.g. [14–16]). Such approaches suggest that
many of the crucial developmental patterning genes in
Drosophila are regulated by multiple enhancers that direct
extensively overlapping patterns of gene expression and
employ a similar regulatory logic (e.g. [17]). The newly
identified enhancers are sometimes termed shadow enhancers because they map to more remote locations than the
classical or primary enhancers situated close to the gene
[18,19]. Several examples are discussed below.
The shavenbaby locus (also known as ovo) is important
for the specification of dorsal hairs in the cuticle of embryos
and larvae [20]. It is regulated by a complex array of
enhancers with extensively overlapping activities. It is
possible to remove some of these enhancers and still obtain
essentially normal cuticle patterns at optimal temperatures. However, these patterns are disrupted when the
embryos are grown at either low (15 °C) or elevated (30 °C)
temperatures. Moreover, embryos carrying the full set of enhancers are resilient to
genetic changes, such as reductions in the levels of Wingless, whereas embryos lacking the shavenbaby shadow enhancers produce abnormal cuticles under the same conditions. Thus, the shadow enhancers
ensure reliable expression when embryos are subjected to
genetic and environmental variation.

[Figure 3 schematic. (a) Primary and shadow enhancers sharing the same regulatory logic. (b) One enhancer alone: 10% failure; primary and shadow enhancers together: 1% failure.]

Figure 3. Model for enhancer synergy. (a) Schematic showing that the primary and shadow enhancers (green boxes) possess the same regulatory logic (TF binding sites are
illustrated by colored circles). (b) To activate transcription, an enhancer loops to its cognate promoter. This interaction has a typical failure rate of 10%. In the presence of
two enhancers regulating the same gene at the same time (primary and shadow), the combined failure rate is 1% (10% × 10% = 1%). This assumes that the two enhancers
work independently of one another.

Box 1. Whole-genome identification of enhancers

During the past 10 years a variety of post-genome methods have
been devised for the systematic identification of enhancers
(which can lie 5′ or 3′ of the gene or within the transcription
unit). Transgenic assays are required to confirm their identities.
Putative enhancers are attached to a minimal promoter and reporter
gene, and introduced (via injection or electroporation) into a
developing embryo. Either stable or transient transgenic embryos
are assayed for reporter gene expression. Below we provide a brief
review of some post-genome methods for identifying putative
enhancers.
Computational methods: enhancers often contain a high density of
transcription factor binding sites, typically one for every 30–50 bp
across the length of the enhancer (200–300 bp or more). Algorithms
have been developed for identifying high-density clusters of putative
binding sites [58,59]. These methods work, but typically only 10–30%
of hits represent authentic enhancers when tested in transgenic
embryos.
ChIP-Seq: permits the genome-wide identification of binding sites
for sequence-specific transcription factors, or histone modifications
(e.g. [40,41]). ChIP-Seq using antibodies against early Drosophila
patterning determinants (e.g. Dorsal, Twist and Snail) led to the
identification of shadow enhancers for a number of genes engaged in
dorsal–ventral patterning [17]. In some systems it has been possible
to identify active enhancers on a genome-wide scale for a given tissue
by identifying particular histone modifications, or the enzymes
responsible for these modifications (e.g. [16]).
Chromosome conformation capture (3C) assays: these assays can identify the
sequences in a genome that interact with specific promoters. They rely
on the stabilization of transient loops of distal enhancers to target
promoters using formaldehyde cross-linking, similar to the chromatin
cross-linking used for ChIP-Seq assays. 4C (chromosome conformation capture-on-chip) methods were used to identify multiple and
overlapping enhancers for the regulation of Hoxd genes in the mouse
limb bud [25]. 3C and 4C assays provide an estimate of the overall
interactions that occur in vivo but do not reveal the dynamics of these
long-range interactions.
MNase-Seq and FAIRE assays: micrococcal nuclease (MNase)
induces double-strand breaks within nucleosome linker regions
and single-strand nicks within the nucleosome and can be used to
identify nucleosome-free regions. In some cases, these regions
coincide with poised enhancers due to the binding of pioneer
transcription factors (e.g. [60]). FAIRE (formaldehyde-assisted isolation of regulatory elements) also identifies nucleosome-free regions,
or open chromatin [61].
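The cluster-finding idea in the computational methods paragraph of Box 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the algorithms of [58,59]: exact motif strings stand in for scored binding-site models, and the motif names, the 300 bp window, the 50 bp step and the six-site threshold are hypothetical parameters chosen only to echo the site density and enhancer length quoted in Box 1.

```python
import re
from typing import Dict, List, Tuple

# Assumption: exact-match motif strings stand in for the scored binding-site
# models that real cluster-finding tools use. All names are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_sites(seq: str, motifs: Dict[str, str]) -> List[Tuple[int, str]]:
    """Return sorted (start, factor) pairs for every match on either strand."""
    seq = seq.upper()
    sites = set()
    for factor, motif in motifs.items():
        forward = motif.upper()
        rev_comp = forward.translate(COMPLEMENT)[::-1]
        for pattern in {forward, rev_comp}:  # a set, so palindromes count once
            # The zero-width lookahead also reports overlapping matches.
            for match in re.finditer(f"(?={pattern})", seq):
                sites.add((match.start(), factor))
    return sorted(sites)


def dense_windows(sites: List[Tuple[int, str]], seq_len: int, window: int = 300,
                  step: int = 50, min_sites: int = 6) -> List[Tuple[int, int, int]]:
    """Slide a fixed window along the sequence; keep windows with many sites."""
    hits = []
    for start in range(0, max(seq_len - window, 0) + 1, step):
        n = sum(1 for pos, _ in sites if start <= pos < start + window)
        if n >= min_sites:
            hits.append((start, start + window, n))
    return hits


def merge_windows(hits: List[Tuple[int, int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping dense windows into candidate enhancer intervals."""
    merged: List[List[int]] = []
    for start, end, _ in hits:
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


if __name__ == "__main__":
    # Toy sequence: an 80 bp cluster of eight sites inside site-free flanks.
    motifs = {"factor_A": "GGGCGG", "factor_B": "CATATG"}  # hypothetical motifs
    seq = "T" * 400 + ("GGGCGG" + "TTTT" + "CATATG" + "TTTT") * 4 + "T" * 400
    sites = find_sites(seq, motifs)
    print(len(sites))                                     # 8
    print(merge_windows(dense_windows(sites, len(seq))))  # [(200, 700)]
```

Searching both the motif and its reverse complement covers both DNA strands, and merging overlapping windows turns many redundant window hits into one candidate interval, which would then still need the transgenic test described in Box 1.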
A similar situation is seen for the regulation of snail,
which encodes a zinc finger transcription factor that establishes the boundary between the presumptive mesoderm
and neurogenic ectoderm [12]. The snail gene is regulated
by a proximal enhancer located near the transcription start
site, as well as by a recently identified shadow enhancer
located 5 kb upstream of the start site within the first
intron of a neighboring gene. Quantitative imaging assays
and genetic complementation experiments suggest that
the two enhancers ensure reliable and uniform activation
of snail expression in embryos containing only one maternal dose of Dorsal, or when subjected to high temperatures
(30 °C) [12,21]. Removal of either enhancer, particularly
the distal shadow enhancer [21], causes defects in gastrulation under adverse conditions.
Shadow enhancers have also been implicated in vertebrate developmental processes. For example, the neurogenic regulatory gene, ATOH7 (Math5), is essential for the
development of the mammalian retina [22]. A genetic
disease causing blindness at birth (nonsyndromic congenital retinal nonattachment) results from the deletion of a
remote shadow enhancer located more than 20 kb away
from the ATOH7 transcription unit [23]. The shadow
enhancer directs a very similar spatiotemporal pattern
of gene expression as the primary proximal enhancer in
the developing retina of a mouse. This result suggests that
the primary enhancer alone cannot sustain sufficient levels
of ATOH7 expression for normal development in the absence of the shadow enhancer. Thus, the two enhancers
seem to be redundant in terms of the location and timing of
the expression patterns they direct, but both are required
to reinforce ATOH7 expression and achieve correct levels of
expression during crucial stages of eye development.
There are additional examples of multiple enhancers for
key vertebrate patterning genes. For example, deletion of a
limb enhancer of the paired-box homeodomain transcription factor Prx has no obvious effect on Prx expression
levels or on limb development in mice [24], suggesting the
existence of additional, shadow enhancers. More recently,
4C assays (Box 1) identified multiple putative enhancers
for Hoxd13 expression within a distal gene desert that
contains known regulatory elements, GCR and Prox [25].
Deletions of GCR and Prox have little effect on Hoxd13
expression in digits, thereby suggesting the occurrence of
redundant regulatory elements. Indeed, complete abolition
of Hoxd13 expression in digits is achieved only when the
gene desert, together with the GCR and Prox regions, is
completely deleted (830 kb deletion).
The preceding examples suggest that multiple enhancers represent a simple means for improving the reliability
of gene expression. The underlying mechanism is uncertain, but multiple enhancers might increase the probability of gene activation at any given time during critical windows of
development and make it more robust to perturbation.
For example, if a typical enhancer fails to loop and
engage its target promoter 10% of the time, and if the proximal
and distal enhancers function more or less independently
of one another, then there is a combined failure rate of only
1% (e.g. [12]). That is, two enhancers function in an inherently multiplicative manner to activate gene expression
(Figure 3). Such a mechanism also provides robustness.
For example, if the failure rate of each individual enhancer
increases to 30% due to stress, then the combined failure
rate is only 9%.
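The arithmetic behind the 1% and 9% figures can be checked with a short Python sketch. It makes the same assumption as the text, that the two enhancers fail independently; the function names and the Monte Carlo cross-check are illustrative additions, not part of the analysis in [12].

```python
import random
from math import prod
from typing import Sequence


def combined_failure(failure_rates: Sequence[float]) -> float:
    """Probability that every enhancer fails, assuming independent failures."""
    return prod(failure_rates)


def simulate_failure(failure_rates: Sequence[float], trials: int = 200_000,
                     seed: int = 0) -> float:
    """Monte Carlo check: fraction of trials in which all enhancers fail."""
    rng = random.Random(seed)
    failures = sum(
        all(rng.random() < p for p in failure_rates) for _ in range(trials)
    )
    return failures / trials


if __name__ == "__main__":
    for label, rates in [("optimal", [0.10, 0.10]), ("stress", [0.30, 0.30])]:
        print(f"{label}: one enhancer {rates[0]:.0%}, "
              f"two enhancers {combined_failure(rates):.0%}, "
              f"simulated {simulate_failure(rates):.2%}")
```

Under stress the single-enhancer failure rate triples (10% to 30%) while the two-enhancer failure rate rises ninefold (1% to 9%), yet 9% is still below the 10% failure rate of a single enhancer under optimal conditions. If the two enhancers' failures were positively correlated, for example through a shared limiting input, the true combined rate would exceed this product.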
An alternative explanation is that multiple enhancers
ensure high levels of expression above a minimal threshold
required for genetic function (as suggested in the case of
ATOH7 regulation). In reality, multiple enhancers could be
important both for the reliable activation of gene expression and for maintaining high levels of expression. We still
do not understand the details of how an enhancer switches
on a gene and affects levels of expression, and therefore
this is very much an open question. The source of shadow
enhancers is uncertain, but it has been proposed that they
might arise from cryptic duplication events [18].
Rendering genes poised for activation
Timing is crucial in development, and recent studies have
identified several mechanisms that ensure faithful activation of gene expression upon receipt of key inducing signals.

[Figure 4 schematic. Enhancer and promoter with the transcription start site (+1); poised enhancer and poised promoter. Key: pioneer TF (e.g. FoxA, Zelda?), Pol II (ser-5P), DSIF and Nelf, short nascent mRNA (30 nt), nucleosome-free paused promoter, nucleosomes carrying multiple chromatin marks.]

Figure 4. Summary of mechanisms of transcriptional priming. Gene transcription depends on enhancers (blue) and promoters (purple). The transcription start site (TSS) is
indicated by an arrow labeled +1. The promoter can be primed or poised for transcription by the recruitment of Pol II before gene expression. This promoter pausing
generates a small mRNA (around 30–50 nt) and then elongation is blocked by the binding of negative elongation factors such as Nelf and DSIF. The enhancer can be
prepared for activation by the binding of pioneer factors (represented by gray boxes), by recruitment of Pol II, or by the modification of the chromatin landscape
(positioned nucleosomes and associated histone marks). These three features at enhancers may be linked, but for simplicity we illustrate them sequentially. Nucleosomes
are represented by hexagons and histone marks with colored flags. A simplified scheme of a paused promoter is represented in the gray box.

We consider mechanisms that optimize induction of distal
enhancers and the core promoter (Figure 4). In some cases,
both are primed for efficient activation.
Paused promoters
Many metazoan genes contain paused RNA polymerase II
(Pol II) prior to their activation [26–28]. This paused Pol II
is an active form of the enzyme that halts 30–50 bp
downstream of the +1 transcription start site (Figure 4;
Box 2). It is present in 30% of all genes in embryonic stem
cells and about 15% of genes in the early Drosophila
embryo [26,29,30]. Its purpose is uncertain, but many
developmental patterning genes contain paused Pol II. It
has been suggested that it fosters rapid and synchronous
activation of gene expression [31]. The idea is that regulating Pol II release, rather than recruitment, permits
rapid induction of gene expression. This hypothesis has
been explored using detailed mathematical modeling of
transcription [32], but it remains to be tested experimentally.

Box 2. Methods for identifying paused promoters

Many developmental patterning genes contain paused Pol II before
their activation during Drosophila embryogenesis (reviewed in [27]).
There is also evidence that a significant number of inactive or weakly
expressed genes contain paused Pol II in mammalian tissues,
including embryonic stem cells. Several different methods have been
used to identify paused genes, as summarized below.
Pol II ChIP-Seq assays: the simplest method is the genome-wide
identification of Pol II binding. This is typically done with a mixture of
antibodies recognizing different isoforms of Pol II (e.g. nonphosphorylated, ser-5P, ser-2P). Active genes contain Pol II signals extending
across the length of their transcription units. Inactive genes
fall into two classes: those completely lacking Pol II and those
containing Pol II near the +1 transcription start site (e.g. [26]). These
latter genes can be regarded as stalled or provisionally paused.
However, it is unclear whether Pol II has engaged the DNA template
and undergone promoter escape, or if the signals detected in the
promoter region represent an equilibrium of unstable Pol II associating with and dissociating from the template. Additional methods are
required to determine whether Pol II is truly paused, that is, activated
polymerase containing a capped nascent transcript and arresting
30–50 bp downstream of +1.
Permanganate protection assays: stably paused Pol II is associated
with a transcription bubble of 20 bp due to the local denaturation
of the double helix by the active polymerase. It is possible to detect
the bubble by the modification of exposed, single-stranded thymidine
residues with potassium permanganate. This method has been used
to identify transcription bubbles for a number of genes containing
stalled Pol II in Drosophila embryos and cultured S2 cells [62].
Direct sequencing: small nuclear RNAs containing 5′ caps are
isolated, cloned, and then subjected to deep sequencing [63]. This
method identified +34 as a common site of paused Pol II, with the DPE
(downstream promoter element) or PB (pause button) motifs being
the last nucleotides transcribed before arrest. A significant fraction of
paused genes contain GAGA, INR, and DPE/PB motifs within or near
their core promoters.
GRO-Seq assays: this has emerged as the method of choice for the
systematic identification of paused Pol II [30,64]. However, it is not for
the faint of heart. The method is a whole-genome nuclear run-on assay.
Nuclei are harvested from embryos, tissues, or cultured cells, and
treated with Sarkosyl to block de novo binding of Pol II. A modified
nucleotide (e.g. bromouridine) is added along with a mixture of ATP
and other agents to permit the elongation of pre-existing polymerases
already engaged on DNA templates. These polymerases are allowed to
extend 50–100 nucleotides; the RNAs are then isolated using anti-bromouridine antibodies and subjected to deep sequencing. The resulting
sequence information provides the exact locations of paused Pol II.
A nonexclusive alternative view is that paused Pol II is
involved in recruiting chromatin-modifying enzymes that
expedite transcription. For example, the chromatin landscape of the Hsp70 locus (the prototypic paused gene in
Drosophila) is rapidly altered following heat shock, through
a mechanism independent of transcription [33]. This rapid
change is key to the effective activation of Hsp70 expression
upon heat shock. Moreover, there is an inverse correlation
between paused Pol II and positioned nucleosomes at the
core promoter [34,35]. An increase in positioned nucleosomes has been observed upon destabilization of paused
Pol II (e.g. NelfE knockdown in S2 cells) [34]. Conversely,
diminished levels of the Polycomb repressor (in esc mutant
embryos) correlate with augmented levels of paused Pol II
[35]. It would appear that the promoter regions of developmentally regulated genes contain either paused Pol II or
positioned nucleosomes, but the basis for this regulatory
switch is uncertain.
These studies raise the possibility that paused Pol II
might prepare genes for activation by establishing an
open configuration at the promoter. However, this possibility has not yet been critically tested.
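The release-versus-recruitment argument for rapid, synchronous activation discussed under Paused promoters can be illustrated with a toy Monte Carlo model in Python. This is not the model of [32]: it simply assumes that each step is a single exponentially distributed waiting time, that recruitment is slower than release (the rates are hypothetical), and that a paused promoter skips recruitment entirely.

```python
import random
import statistics
from typing import Dict, List

# Hypothetical rates (per minute) for a deliberately simple toy model:
# recruiting Pol II is slow, releasing an already-paused Pol II is fast.
RECRUIT_RATE = 0.2
RELEASE_RATE = 1.0


def activation_times(paused: bool, nuclei: int = 20_000,
                     seed: int = 1) -> List[float]:
    """Time for each nucleus to begin productive elongation after induction.

    Paused promoters already carry Pol II, so only the release step remains;
    non-paused promoters must first recruit Pol II and then release it.
    """
    rng = random.Random(seed)
    times = []
    for _ in range(nuclei):
        t = rng.expovariate(RELEASE_RATE)
        if not paused:
            t += rng.expovariate(RECRUIT_RATE)
        times.append(t)
    return times


def summarize(times: List[float], window: float = 3.0) -> Dict[str, float]:
    """Mean, spread, and fraction of nuclei activated within `window` minutes."""
    return {
        "mean": statistics.fmean(times),
        "sd": statistics.pstdev(times),
        "fraction_by_window": sum(t <= window for t in times) / len(times),
    }


if __name__ == "__main__":
    for paused in (True, False):
        print("paused" if paused else "not paused",
              summarize(activation_times(paused)))
```

With these assumed rates the analytic expectations are a mean of 1 min (SD 1 min) with pausing versus 6 min (SD about 5.1 min) without, and roughly 95% versus 33% of nuclei activated within 3 min. If recruitment were faster than release, the extra step would cost little, so in this toy model the benefit of pausing depends on recruitment being rate-limiting.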
Poised enhancers
There is also evidence that enhancers can be prepared for
rapid deployment before gene activation (Figure 4). For
example, the forkhead transcription factor FoxA binds to
the Albumin enhancer in the primitive endoderm of mouse
embryos, where the enhancer is still inactive (reviewed in [36]). FoxA is an
example of a pioneer factor [37]; it binds to inactive
enhancers and renders them poised for rapid induction
upon the appearance of key activators, such as those
mediating cell signaling.
To bind inactive enhancers, pioneer factors have the
defining property of binding to nucleosomal DNA and
compact chromatin, and remain bound even during mitosis. Since the initial discovery of FoxA and GATA factors as
pioneer factors in the liver differentiation program, additional examples have been described [38,39].
Zelda is a maternal zinc finger transcription factor that
is essential for the activation of 100 genes 2–3 h after
fertilization during Drosophila embryogenesis (the maternal-to-zygotic
transition) [40–42]. It binds to the enhancer
regions of many or most developmental control genes
before their activation. Disrupting Zelda binding sites
can delay the onset of expression, or cause sporadic patterns of activation [40,41]. Thus, Zelda renders developmental enhancers poised for activation by maternal
determinants such as Bicoid and Dorsal, and may function
as a pioneer factor. It might also help ensure reliable
patterns of gene activation in natural populations under
stress, but this idea has not yet been tested.
The mechanisms by which pioneer factors prepare
enhancers for efficient activation are not known. It has
been suggested that they can displace nucleosomes and
thereby render adjacent binding sites available for occupancy [36,38]. A nonexclusive possibility is that pioneer
factors recruit chromatin-modifying enzymes that mark
enhancers for rapid deployment. For example, liver and
pancreas enhancers exhibit active chromatin
modifications in the mouse foregut endoderm, where they
are not yet active [36]. This suggests pre-patterning of the
enhancers in progenitor tissues before their induction in
the liver and pancreas. The P300 histone acetyltransferase
and the EZH2 histone methyltransferase have been implicated in these modifications [43]. It is conceivable that such
modifications are not strictly required for gene expression,
but might improve the precision and stability of gene
expression in natural populations.
More recently it has been suggested that histone modifications and Pol II help to prime distal enhancers [44]
(Figure 4). In this study, whole-genome ChIP-Seq assays
were performed on isolated tissues obtained from staged
Drosophila embryos. The timing of gene expression correlated with Pol II binding and two types of chromatin marks
in enhancers. Pol II occupancy at enhancers is counterintuitive, but multiple studies, in human ES cells [45] and
mice [45,46], suggest that enhancers can be bound by Pol II
and are sometimes transcribed. Additional members of the
general transcription machinery, such as the TBP-associated factor TAF3 [47], are also seen at particular enhancers. It was suggested that these factors might foster
looping interactions between distal enhancers and promoters, but it is currently unclear how Pol II and associated
factors might render enhancers poised for activation. It is
possible that they are recruited to enhancers by pioneer
TFs, but this idea awaits further studies.
When stochastic expression is purposeful
Many developmental patterning genes in Drosophila contain paused Pol II, shadow enhancers, or both. We have
discussed how these mechanisms might foster the precision and stability of gene expression in development. However, there are examples of developmental control genes
that exhibit sporadic or stochastic patterns of expression.
Some might exhibit such expression because there is no
selective pressure for them to be expressed in a precise and
synchronous manner. However, there are cases where
stochastic expression is used as a purposeful strategy for
generating regulatory diversity among the cells of a population [48]. One of the most striking examples is seen in the
eye of the adult fly [49–51].
Box 3. Outstanding questions
• How do multiple enhancers provide precision in gene expression:
do they increase the levels or probability of expression?
• Are genes with multiple enhancers more or less evolvable? Do
shadow enhancers increase the probability of evolving novel gene
activities?
• How do pioneer factors prime enhancers?
• How does paused Pol II prime the promoter?
• When are imprecise, stochastic modes of gene activation
advantageous in development?

Color vision depends on the differential expression of
rhodopsin-3 (Rh3) and Rh4 in the R7 photoreceptor cells
and the differential expression of Rh5 and Rh6 in the R8
photoreceptor cells. These differential patterns depend on
stochastic expression of spineless, which encodes a bHLH-PAS transcription factor that activates Rh4 in R7 [52].
Approximately 70% of the ommatidia express spineless,
but the patterns of activation differ among adult flies.
When spineless is expressed, Rh4 is activated in R7; if
not, Rh3 is expressed instead. The identity of these distinct
classes of R7 cells dictates the identities of the underlying
R8 cells. When spineless and Rh4 are expressed in R7, then
Rh6 is expressed in the associated R8 cell. Conversely,
when spineless is absent and Rh3 is expressed in R7, then
Rh5 is expressed in the associated R8 cell. Thus, diverse
patterns of rhodopsin expression are achieved by the stochastic expression of spineless. The underlying mechanism
is uncertain.
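The logic of the spineless switch described above, a stochastic choice in R7 followed by a deterministic, coupled choice in R8, can be captured in a short Python simulation. The 70% figure comes from the text; treating each ommatidium as an independent draw is an assumption (the underlying mechanism is uncertain), and the function names are hypothetical.

```python
import random
from collections import Counter
from typing import List, Tuple

P_SPINELESS = 0.7  # fraction of ommatidia expressing spineless, from the text


def ommatidium(rng: random.Random, p: float = P_SPINELESS) -> Tuple[str, str]:
    """Return the (R7, R8) rhodopsin pair for one ommatidium.

    Assumption: spineless is switched on in R7 by an independent random draw
    per ommatidium; the real mechanism is unknown.
    """
    if rng.random() < p:
        return ("Rh4", "Rh6")  # spineless on: Rh4 in R7, which dictates Rh6 in R8
    return ("Rh3", "Rh5")      # spineless off: Rh3 in R7, which dictates Rh5 in R8


def retina(n: int, seed: int) -> List[Tuple[str, str]]:
    """Simulate n ommatidia; different seeds stand in for different flies."""
    rng = random.Random(seed)
    return [ommatidium(rng) for _ in range(n)]


if __name__ == "__main__":
    for fly in (1, 2):
        eye = retina(800, seed=fly)
        print(f"fly {fly}: {dict(Counter(eye))}; "
              f"first ten R7 types: {[r7 for r7, _ in eye[:10]]}")
```

Across simulated flies the overall ratio stays near 70:30, but which ommatidia are Rh4/Rh6 and which are Rh3/Rh5 differs from eye to eye, and no ommatidium pairs Rh4 with Rh5 or Rh3 with Rh6.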
There are other examples of the importance of stochastic expression in the control of developmental genes. Notably, Nanog, one of the key determinants of pluripotent
stem cells, exhibits stochastic expression in cultured ES
cells and in early mouse embryos [53,54]. There is a
correlation between elevated levels of Nanog expression
and self-renewal of pluripotent stem cells in culture
[55,56]. By contrast, low levels correlate with a propensity
for the cells to differentiate.
Concluding remarks
The preceding examples are probably exceptional. We
believe that most regulatory genes are primed for rapid
and precise deployment during development. Several
mechanisms were discussed, including redundancies in
gene networks and developmental enhancers, shadow
enhancers, and primed promoters and enhancers (via
paused Pol II and pioneer TFs). There is little doubt that
additional mechanisms await discovery (Box 3).
There is something of a chicken-and-egg issue that we
have skirted. Namely, what is the source of these mechanisms of developmental precision? It is conceivable that
they arose from the demands of natural populations,
namely, to stabilize complex developmental processes in
response to inherent (genetic) and extrinsic (environmental) fluctuations. Alternatively, they might have arisen
from the demands of the embryo, to produce timely and
dynamic on/off patterns of gene expression underlying cell
specification processes. These are not mutually exclusive
concepts. A regulatory mechanism selected to provide
stability in natural populations (e.g. shadow enhancer)
might be incorporated into the core patterning process to
produce sharper borders of gene expression [12] or homogeneous patterns of activation [57]. Conversely, a mechanism selected for developmental precision (e.g. paused Pol
II) might foster robustness of expression in natural populations. We suggest that the dynamic interplay between
the demands of natural populations and the embryo has
produced the exquisite patterning processes that underlie
animal development.
References
1 Levine, M. (2010) Transcriptional enhancers in animal development and
evolution. Curr. Biol. 20, R754–R763
2 Nusslein-Volhard, C. and Wieschaus, E. (1980) Mutations affecting
segment number and polarity in Drosophila. Nature 287, 795–801
3 Nusslein-Volhard, C. and Roth, S. (1989) Axis determination in insect
embryos. Ciba Found. Symp. 144, 37–55
4 Ip, Y.T. et al. (1992) The bicoid and dorsal morphogens use a similar
strategy to make stripes in the Drosophila embryo. J. Cell Sci. 16
(Suppl.), 33–38
5 Davidson, E.H. (2009) Network design principles from the sea urchin
embryo. Curr. Opin. Genet. Dev. 19, 535–540
6 Raj, A. et al. (2010) Variability in gene expression underlies incomplete
penetrance. Nature 463, 913–918
7 Burga, A. et al. (2011) Predicting mutation outcome from early
stochastic variation in genetic interaction partners. Nature 480, 250–253
8 Arnosti, D.N. et al. (1996) The eve stripe 2 enhancer employs multiple
modes of transcriptional synergy. Development 122, 205–214
9 Ludwig, M.Z. et al. (2011) Consequences of eukaryotic enhancer
architecture for gene expression dynamics, development, and fitness.
PLoS Genet. 7, e1002364
10 Stanojevic, D. et al. (1991) Regulation of a segmentation stripe by
overlapping activators and repressors in the Drosophila embryo.
Science 254, 1385–1387
11 Small, S. et al. (1992) Regulation of even-skipped stripe 2 in the
Drosophila embryo. EMBO J. 11, 4047–4057
12 Perry, M.W. et al. (2010) Shadow enhancers foster robustness of
Drosophila gastrulation. Curr. Biol. 20, 1562–1567
13 Waddington, C.H. (1942) Canalization of development and the
inheritance of acquired characters. Nature 150, 563–565
14 Zinzen, R.P. et al. (2009) Combinatorial binding predicts spatiotemporal cis-regulatory activity. Nature 462, 65–70
15 He, Q. et al. (2011) High conservation of transcription factor binding
and evidence for combinatorial regulation across six Drosophila
species. Nat. Genet. 43, 414–420
16 May, D. et al. (2011) Large-scale discovery of enhancers from human
heart tissue. Nat. Genet. 44, 89–93
17 Zeitlinger, J. et al. (2007) Whole-genome ChIP-chip analysis of Dorsal,
Twist, and Snail suggests integration of diverse patterning processes in
the Drosophila embryo. Genes Dev. 21, 385–390
18 Hong, J.W. et al. (2008) Shadow enhancers as a source of evolutionary
novelty. Science 321, 1314
19 Barolo, S. (2011) Shadow enhancers: frequently asked questions about
distributed cis-regulatory information and enhancer redundancy.
BioEssays 34, 135–141
20 Frankel, N. et al. (2010) Phenotypic robustness conferred by apparently
redundant transcriptional enhancers. Nature 466, 490–493
21 Dunipace, L. et al. (2011) Complex interactions between cis-regulatory
modules in native conformation are critical for Drosophila snail
expression. Development 138, 4075–4084
22 Riesenberg, A.N. et al. (2009) Rbpj cell autonomous regulation of
retinal ganglion cell and cone photoreceptor fates in the mouse
retina. J. Neurosci. 29, 12865–12877
23 Ghiasvand, N.M. et al. (2011) Deletion of a remote enhancer near
ATOH7 disrupts retinal neurogenesis, causing NCRNA disease. Nat.
Neurosci. 14, 578–586
24 Cretekos, C.J. et al. (2008) Regulatory divergence modifies limb length
between mammals. Genes Dev. 22, 141–151
25 Montavon, T. et al. (2011) A regulatory archipelago controls Hox genes
transcription in digits. Cell 147, 1132–1145
26 Zeitlinger, J. et al. (2007) RNA polymerase stalling at developmental
control genes in the Drosophila melanogaster embryo. Nat. Genet. 39,
1512–1516
27 Levine, M. (2011) Paused RNA polymerase II as a developmental
checkpoint. Cell 145, 502–511
28 Li, J. and Gilmour, D.S. (2011) Promoter proximal pausing and the
control of gene expression. Curr. Opin. Genet. Dev. 21, 231–235
29 Guenther, M.G. et al. (2007) A chromatin landmark and transcription
initiation at most promoters in human cells. Cell 130, 77–88
30 Min, I.M. et al. (2011) Regulating RNA polymerase pausing and
transcription elongation in embryonic stem cells. Genes Dev. 25, 742–754
31 Boettiger, A.N. and Levine, M. (2009) Synchronous and stochastic
patterns of gene activation in the Drosophila embryo. Science 325,
471–473
32 Boettiger, A.N. et al. (2011) Transcriptional regulation: effects of
promoter proximal pausing on speed, synchrony and reliability.
PLoS Comput. Biol. 7, e1001136
415

Review
33 Petesch, S.J. and Lis, J.T. (2008) Rapid, transcription-independent loss
of nucleosomes over a large chromatin domain at Hsp70 loci. Cell 134,
7484
34 Gilchrist, D.A. et al. (2010) Pausing of RNA polymerase II disrupts
DNA-specified nucleosome organization to enable precise gene
regulation. Cell 143, 540551
35 Chopra, V.S. et al. (2011) The Polycomb group mutant esc leads to
augmented levels of paused Pol II in the Drosophila embryo. Mol. Cell.
42, 837844
36 Zaret, K.S. and Carroll, J.S. (2011) Pioneer transcription factors:
establishing competence for gene expression. Gene Dev. 25, 22272241
37 Watts, J.A. et al. (2011) Study of FoxA pioneer factor at silent genes
reveals Rfx-repressed enhancer at Cdx2 and a potential indicator of
esophageal adenocarcinoma development. PLoS Genet. 7, e1002277
38 Magnani, L. et al. (2011) Pioneer factors: directing transcriptional
regulators within the chromatin environment. Trends Genet. 27,
465474
39 Fakhouri, T.H.I. et al. (2010) Dynamic chromatin organization during
foregut development mediated by the organ selector gene pha-4/FoxA.
PLoS Genet. 6, e1001060
40 Liang, H.L. et al. (2008) The zinc-finger protein Zelda is a key activator
of the early zygotic genome in Drosophila. Nature 456, 400403
41 Nien, C.Y. et al. (2011) Temporal coordination of gene networks by
Zelda in the early Drosophila embryo. PLoS Genet. 7, e1002339
42 Harrison, M.M. et al. (2011) Zelda Binding in the early Drosophila
melanogaster embryo marks regions subsequently activated at the
maternal-to-zygotic transition. PLoS Genet. 7, e1002266
43 Xu, C.R. et al. (2011) Chromatin prepattern and histone modifiers in a
fate choice for liver and pancreas. Science 332, 963966
44 Bonn, S. et al. (2012) Tissue-specific analysis of chromatin state
identifies temporal signatures of enhancer activity during embryonic
development. Nat. Genet. 44, 148156
45 Rada-Iglesias, A. et al. (2011) A unique chromatin signature uncovers
early developmental enhancers in humans. Nature 470, 279283
46 De Santa, F. et al. (2010) A large fraction of extragenic RNA pol II
transcription sites overlap enhancers. PLoS Biol. 8, e1000384
47 Liu, Z. et al. (2011) Control of embryonic stem cell lineage commitment
by core promoter factor, TAF3. Cell 146, 720731
48 Eldar, A. and Elowitz, M.B. (2010) Functional roles for noise in genetic
circuits. Nature 467, 167173
49 Vasiliauskas, D. et al. (2011) Feedback from rhodopsin controls
rhodopsin exclusion in Drosophila photoreceptors. Nature 479, 108112

416

Trends in Genetics August 2012, Vol. 28, No. 8

50 Johnston, R.J. et al. (2011) Interlocked feedforward loops control celltype-specific rhodopsin expression in the Drosophila eye. Cell 145, 956
968
51 Jukam, D. and Desplan, C. (2010) Binary fate decisions in
differentiating neurons. Curr. Opin. Neurobiol. 20, 613
52 Wernet, M.F. et al. (2006) Stochastic spineless expression creates the
retinal mosaic for colour vision. Nature 440, 174180
53 Dietrich, J.E. and Hiiragi, T. (2007) Stochastic patterning in the mouse
pre-implantation embryo. Development 134, 4219–4231
54 Silva, J. and Smith, A. (2008) Capturing pluripotency. Cell 132, 532–536
55 Kalmar, T. et al. (2009) Regulated fluctuations in nanog expression
mediate cell fate decisions in embryonic stem cells. PLoS Biol. 7,
e1000149
56 Glauche, I. et al. (2010) Nanog variability and pluripotency regulation
of embryonic stem cells: insights from a mathematical model analysis.
PLoS ONE 5, e11238
57 Perry, M.W. et al. (2011) Multiple enhancers ensure precision of gap
gene-expression patterns in the Drosophila embryo. Proc. Natl. Acad.
Sci. U.S.A. 108, 13570–13575
58 Berman, B.P. et al. (2002) Exploiting transcription factor binding site
clustering to identify cis-regulatory modules involved in pattern
formation in the Drosophila genome. Proc. Natl. Acad. Sci. U.S.A.
99, 757–762
59 Markstein, M. et al. (2002) Genome-wide analysis of clustered Dorsal
binding sites identifies putative target genes in the Drosophila embryo.
Proc. Natl. Acad. Sci. U.S.A. 99, 763–768
60 Valouev, A. et al. (2011) Determinants of nucleosome organization in
primary human cells. Nature 474, 516–520
61 Giresi, P.G. and Lieb, J.D. (2009) Isolation of active regulatory
elements from eukaryotic chromatin using FAIRE (formaldehyde
assisted isolation of regulatory elements). Methods 48, 233–239
62 Gilmour, D.S. and Fan, R. (2009) Detecting transcriptionally engaged
RNA polymerase in eukaryotic cells with permanganate genomic
footprinting. Methods 48, 368–374
63 Nechaev, S. et al. (2010) Global analysis of short RNAs reveals
widespread promoter-proximal stalling and arrest of Pol II in
Drosophila. Science 327, 335–338
64 Core, L.J. et al. (2008) Nascent RNA sequencing reveals widespread
pausing and divergent initiation at human promoters. Science 322,
1845–1848

Letter

Is forward the same as plus?... and other adventures in SNP allele nomenclature
Sarah C. Nelson¹, Kimberly F. Doheny², Cathy C. Laurie¹ and Daniel B. Mirel³
¹ Genetics Coordinating Center, Department of Biostatistics, University of Washington, Seattle, WA, USA
² Center for Inherited Disease Research, Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
³ Broad Institute (Massachusetts Institute of Technology/Harvard), Cambridge, MA, USA

In the accelerating and expanding field of research on genetic variation, it has become standard practice to work with a combination of datasets generated by multiple research groups at different times and by different methods. Synthesizing these data is important for genotype imputation, meta-analysis, and other applications, but may be difficult because alleles are typically observed and recorded on only one of the two DNA strands in genotyping and sequencing experiments. Different nomenclatures have arisen to designate strand orientation when reporting single nucleotide polymorphism (SNP) genotypes, but they are neither widely understood nor uniformly applied. Here we define the most common allele strand orientation nomenclatures and provide guidance in achieving strand consistency.
The majority of SNPs are strand unambiguous, such that genotypes called on different strands are readily identifiable (e.g., A/G alleles on one strand are T/C alleles on the opposite strand). Determining strand orientation is more complicated at strand-ambiguous SNPs, whose alleles are symmetrical across strands (A/T and C/G): an A/T SNP reads as T/A on the opposite strand, so the allele pair alone does not reveal which strand was reported. It is assumed that all researchers, as a minimum for consistency, report the two alleles of a biallelic SNP on the same strand. The ambiguity arises from the choice of strand and from how that strand is defined. Generally, SNP alleles are reported for a single strand designated in one of four strand naming conventions: probe/target, plus/minus, TOP/BOT, and forward/reverse, each defined in the sections below.
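The strand-ambiguity test described above reduces to complementing an allele pair and asking whether the pair is unchanged. The following is a minimal Python sketch, assuming alleles are given as single nucleotide letters; the function names are hypothetical and not part of any genotyping tool.

```python
# Watson-Crick base pairing, used to read an allele on the opposite strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_alleles(alleles):
    """Return each allele as it would be read on the opposite DNA strand."""
    return tuple(COMPLEMENT[allele.upper()] for allele in alleles)


def is_strand_ambiguous(allele1, allele2):
    """True for A/T and C/G SNPs, whose allele pair is unchanged by complementing."""
    pair = {allele1.upper(), allele2.upper()}
    return pair == set(complement_alleles(pair))


print(complement_alleles(("A", "G")))  # ('T', 'C')
print(is_strand_ambiguous("A", "G"))   # False: A/G reads as T/C on the other strand
print(is_strand_ambiguous("C", "G"))   # True: C/G reads as G/C on the other strand
```

Comparing sets rather than ordered pairs is deliberate: the order in which two alleles are listed carries no strand information.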
Corresponding author: Nelson, S.C. (sarahcn@uw.edu).
Keywords: allele; strand translation; genotype; nomenclature; genome-wide association study; meta-analysis.

Probe/target
When SNPs are assayed with a site-specific probe, one of the two strands corresponds to (i.e., is collinear with) the probe sequence itself, and the other to the complementary genomic target sequence that flanks or spans the SNP site. The probe strand is sometimes called the design strand (in reference to assay design). Although the specifics vary between platforms, alternative alleles at a SNP site are often initially represented using the generic letter codes A and B. In the following, "allele A" and "allele B" refer to these generic designations and not to the nucleotide adenine. In Illumina annotation, each SNP is defined with design allele nucleotides, which occur on the same strand as the probe sequence; the order in which the alternative alleles are given specifies the generic A and B allele designations [1]. To illustrate, for a SNP defined as [T/G], allele A is T and allele B is G. In Affymetrix allele-specific hybridization technology, the letter codes A and B are assigned differently and could therefore occur on either the probe or target strand [2].
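The Illumina rule just described (first listed design nucleotide is allele A, second is allele B) can be sketched in a few lines of Python. This assumes the SNP field is written exactly as "[X/Y]" with two nucleotides; the function names are hypothetical, and the sketch does not apply to Affymetrix, whose A/B assignment differs [2].

```python
import re

# An Illumina-style SNP field: two design-strand nucleotides in brackets, e.g. "[T/G]".
SNP_FIELD = re.compile(r"\[([ACGT])/([ACGT])\]")


def design_alleles_to_ab(snp_field):
    """Map a SNP field such as "[T/G]" to generic A/B codes (first listed = allele A)."""
    match = SNP_FIELD.fullmatch(snp_field.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized SNP field: {snp_field!r}")
    return {"A": match.group(1), "B": match.group(2)}


def ab_genotype_to_design(genotype, snp_field):
    """Translate a generic A/B genotype (e.g. "AB") into design-strand nucleotides."""
    codes = design_alleles_to_ab(snp_field)
    return "".join(codes[code] for code in genotype.upper())


print(design_alleles_to_ab("[T/G]"))         # {'A': 'T', 'B': 'G'}
print(ab_genotype_to_design("AB", "[T/G]"))  # TG
print(ab_genotype_to_design("BB", "[T/G]"))  # GG
```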
Plus (+)/minus (−)
In all human reference chromosomes, as for other eukaryotes [3], the plus (+) strand is defined as the strand with its 5′ end at the tip of the short arm [4,5] (Genome Reference Consortium, personal communication, March 27, 2012). SNP alleles reported on the same strand as the (+) strand are called plus alleles and those on the (−) strand are called minus alleles. Providing SNP alleles on the plus genomic strand is the convention in publicly available SNP datasets such as the HapMap (www.hapmap.org) and 1000 Genomes Projects (www.1000genomes.org).
Although the plus/minus designation is anchored at
the telomeres of each chromosome, the orientation of
intervening sequences may change between genome
builds as gaps are filled in and sequences are refined.
Thus, when reporting alleles on the plus or minus strand, one must specify the genome build. The fluid nature of plus/minus orientation has partly motivated the development of alternative
nomenclatures.
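Whether a reported sequence lies on the plus or minus strand can be decided by matching its flanks against the plus-strand reference of a named build. Below is a minimal Python sketch of that idea. The reference string is a hypothetical toy snippet, not real GRCh37 sequence; real use would load the full sequence for the stated build, and a regular-expression scan of a whole chromosome is slow. Function names are also hypothetical.

```python
import re

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "N": "N"}


def reverse_complement(seq):
    """Read a sequence 5'->3' on the opposite strand."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def strand_of_flanks(left, right, plus_reference):
    """Return "+" or "-" for the strand on which left[SNP]right lies, else None."""
    left, right, ref = left.upper(), right.upper(), plus_reference.upper()
    # "." stands in for the SNP base, so either allele can match the reference.
    on_plus = re.search(re.escape(left) + "." + re.escape(right), ref)
    on_minus = re.search(
        re.escape(reverse_complement(right)) + "." + re.escape(reverse_complement(left)), ref
    )
    if on_plus and not on_minus:
        return "+"
    if on_minus and not on_plus:
        return "-"
    return None  # no unique match: flanks absent, or found on both strands


def alleles_on_plus(alleles, strand):
    """Report alleles on the plus strand, complementing minus-strand calls."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if strand == "+":
        return tuple(a.upper() for a in alleles)
    return tuple(COMPLEMENT[a.upper()] for a in alleles)


# Hypothetical plus-strand reference snippet containing CATCCC[A/C]TGCACA.
toy_plus = "GGGCATCCCATGCACATTT"
print(strand_of_flanks("CATCCC", "TGCACA", toy_plus))                      # +
print(strand_of_flanks("CATCCC", "TGCACA", reverse_complement(toy_plus)))  # -
print(alleles_on_plus(("T", "G"), "-"))                                    # ('A', 'C')
```

Returning None when the flanks match both strands, rather than guessing, reflects the point above: a plus/minus call is only as good as the reference build it was matched against.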
Illumina TOP/BOT strand
The TOP/BOT strand naming convention, developed by
Illumina and subsequently adopted by dbSNP, has been
thoroughly defined elsewhere [1]. In brief, Illumina strand designation is determined either by the SNP's alternative nucleotides or, for strand-ambiguous SNPs, by its flanking sequence. For unambiguous SNPs, the TOP strand is defined as the one on which one of the two alleles is an A (adenine). That adenine is designated generically as allele A, whereas the alternative allele on the TOP strand is designated as allele B. For ambiguous SNPs, the strand designation and allele A/B assignments are determined by the flanking sequence in a similar manner.
This strand definition is local to a SNP in that alleles
reported on the TOP strand for two neighboring SNPs
may be on different physical strands of DNA [6]. Furthermore, the TOP/BOT strand definition is intended to be
independent of any genome build or design strand. Another key feature of this naming system is that allele A for
a TOP strand probe is the base pair complement of allele
A for a BOT strand probe, such that the generic A/B
genotype coding remains consistent regardless of which
strand is probe or target. This nomenclature offers relative stability in the face of changing human genome
assemblies and SNP databases.
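For strand-unambiguous SNPs, the TOP/BOT rule above can be sketched directly from the allele pair. The following Python sketch covers only that case: it returns None for A/T and C/G SNPs and does not reproduce the flanking-sequence rule of Illumina's technical note [1]. The function name is hypothetical. The BOT assignment follows from the property stated above: allele A on a BOT strand is the complement of allele A on the TOP strand, that is, T.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def illumina_top_bot(allele1, allele2):
    """Classify a strand-unambiguous SNP as TOP or BOT and assign generic A/B codes.

    Returns (strand, allele_A, allele_B) for the strand on which the alleles are
    given, or None for A/T and C/G SNPs (flanking-sequence rule not implemented).
    """
    pair = {allele1.upper(), allele2.upper()}
    if len(pair) != 2 or not pair <= set(COMPLEMENT):
        raise ValueError(f"expected two distinct nucleotides, got {allele1}/{allele2}")
    if pair in ({"A", "T"}, {"C", "G"}):
        return None
    # Exactly one of A or T is present here. The TOP strand carries the A; on the
    # BOT strand that base reads as its complement, T, which is then allele A.
    allele_a = "A" if "A" in pair else "T"
    strand = "TOP" if allele_a == "A" else "BOT"
    (allele_b,) = pair - {allele_a}
    return strand, allele_a, allele_b


print(illumina_top_bot("A", "C"))  # ('TOP', 'A', 'C')
print(illumina_top_bot("T", "G"))  # ('BOT', 'T', 'G'), the design alleles of rs216614 in Box 1
print(illumina_top_bot("C", "G"))  # None
```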

Box 1. An example of allele conversion using Illumina annotation


Here we use Illumina-provided annotation for an example SNP
(rs216614) in Table I to derive a set of allele call conversions in Table
II. In Table I, SNP gives alternative alleles on the probe sequence
strand, IlmnStrand gives the TOP/BOT status of the probe sequence
strand, TopGenomicSeq gives the sequence surrounding the SNP
on the TOP strand, RefStrand gives the plus/minus status of the
probe sequence strand, and IlmnID encodes the correspondence
between TOP/BOT and forward/reverse (dbSNP) strands. The design alleles (on the probe sequence strand) are given directly by
SNP = [T/G] and, following the Illumina convention, the first
nucleotide corresponds to allele A and the second to allele B. The
TOP strand alleles are given in brackets in TopGenomicSeq. The
B_R in IlmnID specifies that the dbSNP reverse strand corresponds to the BOT strand, and hence the dbSNP forward strand to the TOP strand. The corresponding SNP assay is depicted in Figure I.

[Figure I: simplified schematic of the rs216614 SNP assay, showing the probe and target sequences on the (+) and (−) strands]

Figure I. A simplified schematic of the SNP probe, where the probe sequence is
in blue and the target sequence in black text. The design alleles (T or G) are the
fluorescently labeled nucleotides recruited to the allele probe in this two-color
primer-extension assay. Adapted from materials available on the Illumina
website (www.illumina.org).

Table I. Excerpt from Illumina HumanOmni1-Quad_v1-0_C annotation file (build 37)

IlmnID                       Name      IlmnStrand  SNP    TopGenomicSeq            RefStrand
rs216614-131_B_R_1865662557  rs216614  BOT         [T/G]  ...CATCCC[A/C]TGCACA...  −

Table II. rs216614 allele-mapping table

AB  TOP  Design  Forward  Plus
A   A    T       A        A
B   C    G       C        C

Forward/reverse
The dbSNP resource of the US National Center for Biotechnology Information (NCBI) contains detailed information for each SNP in its database. Each refSNP (or rs) entry consists of one or more submitted SNP (or ss) records, each submitted by an individual laboratory. Each dbSNP record shows a flanking DNA sequence, which is simply taken from the submission with the longest flanking sequence [6,7]. SNP alleles reported on the same strand as this exemplar sequence in dbSNP are called forward alleles. Conversely, alleles on the opposite strand are called reverse alleles. Note that the dbSNP meaning of forward is easily confused with the (+) genomic strand, which has been referred to as the forward strand by the HapMap project [8,9].
Achieving strand consistency
The most basic level of strand consistency requires only that genotypes are reported on the same DNA strand across datasets. At strand-unambiguous SNPs, discrepant nucleotides are sufficient to identify strand inconsistencies (e.g., A/C in one dataset and T/G in another). However, harmonizing strand-ambiguous SNPs requires converting allele calls to a specific strand, according to one of the strand naming conventions described above. Given a nucleotide sequence with a SNP and its flanking bases (e.g., CATCCC[A/C]TGCACA), one can determine whether the strand of that sequence is (i) plus or minus, by sequence matching with the genomic reference sequence; (ii) TOP or BOT, from the SNP itself or its flanking sequence [1]; and (iii) forward or reverse, from the ss sequence record in dbSNP. Determination of probe or target strand requires additional information about assay design. In practice, genotyping assay vendors generally supply annotations that can be used to make strand conversions. Box 1 gives an example of how to interpret Illumina annotation to create a table of allele call conversions. Figure I shows a simplified schematic of the genotyping probe at this example SNP. However, SNP annotations are not infallible and further checks on strand consistency are useful. Commonly used checks are comparisons of minor allele frequency and patterns of linkage disequilibrium between the datasets to be harmonized [10,11].
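The Box 1 conversion can be sketched as a small function that turns the four annotation fields into the columns of Table II. This is a minimal Python sketch under stated assumptions: the IlmnID is assumed to end in _<T|B>_<F|R>_<number>, as in the single Box 1 example; RefStrand is passed as minus, which is what Table II's Plus column implies given design alleles T/G; and the frequency and linkage-disequilibrium checks mentioned above are not attempted. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(alleles):
    """Read an (allele A, allele B) pair on the opposite strand, keeping A/B order."""
    return tuple(COMPLEMENT[a] for a in alleles)


def conversion_table(ilmn_id, ilmn_strand, snp_field, ref_strand):
    """Derive TOP, Design, Forward and Plus allele pairs from Illumina annotation fields."""
    if ilmn_strand not in ("TOP", "BOT") or ref_strand not in ("+", "-"):
        raise ValueError("IlmnStrand must be TOP/BOT and RefStrand must be +/-")
    # Design alleles as written on the probe strand; the first listed is allele A.
    design = tuple(snp_field.strip().strip("[]").upper().split("/"))
    # Allele A on TOP is the complement of allele A on BOT, so a flip keeps A/B order.
    top = design if ilmn_strand == "TOP" else complement(design)
    # Assumed IlmnID layout: ..._<T|B>_<F|R>_<number>, e.g. rs216614-131_B_R_1865662557.
    parts = ilmn_id.split("_")
    tb_code, fr_code = parts[-3], parts[-2]
    # T_F or B_R means the TOP strand is the dbSNP forward strand.
    top_is_forward = (tb_code, fr_code) in {("T", "F"), ("B", "R")}
    forward = top if top_is_forward else complement(top)
    # RefStrand is the plus/minus status of the probe (design) strand.
    plus = design if ref_strand == "+" else complement(design)
    return {"AB": ("A", "B"), "TOP": top, "Design": design, "Forward": forward, "Plus": plus}


table = conversion_table("rs216614-131_B_R_1865662557", "BOT", "[T/G]", "-")
for column, (allele_a, allele_b) in table.items():
    print(f"{column:8}{allele_a} {allele_b}")
```

Run on the rs216614 fields, the loop prints one line per column of Table II (AB A B, TOP A C, Design T G, Forward A C, Plus A C). Keeping the A/B order through every complement is what lets a single generic genotype call be translated into any of the strand conventions.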
Our intent is not to advocate one allele nomenclature
above all others because the universal adoption of one
naming system is both unlikely and unnecessary. Instead,
our aim is to explain the different nomenclatures and the
need for precise documentation of allele designations for
each dataset. Increased understanding and documentation
will facilitate continued data sharing and collaboration
within the genetics research community.
Acknowledgments
This work was supported in part by the following National Institutes of
Health grants: GENEVA Coordinating Center (U01 HG004446);
GARNET Coordinating Center (U01 HG005157); Center for Inherited
Disease Research (U01 HG004438, NIH contract numbers
HHSN268200782096C and HHSN268201100011I); and Broad Center
for Genotyping and Analysis (U01HG04424).

References
1 Illumina Inc. (2006) TOP/BOT strand and A/B allele (Technical Note). http://www.illumina.com/documents/products/technotes/technote_topbot.pdf
2 Affymetrix Inc. (2012) Affymetrix genotyping glossary. http://www.affymetrix.com/support/help/genotyping_glossary/index.affx
3 Cherry, J.M. et al. (1998) SGD: Saccharomyces genome database. Nucleic Acids Res. 26, 73–79
4 Dunham, I. et al. (1999) The DNA sequence of human chromosome 22. Nature 402, 489–495

5 Cartwright, R.A. and Graur, D. (2011) The multiple personalities of
Watson and Crick strands. Biol. Direct 6, 7
6 National Center for Biotechnology Information (2005) Sequence formatting in dbSNP reports. http://www.ncbi.nlm.nih.gov/books/NBK44414
7 Kitts, A.K. and Sherry, S. (2002) The single nucleotide polymorphism database (dbSNP) of nucleotide sequence variation. In The NCBI Handbook (McEntyre, J. and Ostell, J., eds), National Center for Biotechnology Information (Chap. 5). http://www.ncbi.nlm.nih.gov/books/NBK21101/
8 Frazer, K.A. et al. (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851–861


9 Altshuler, D.M. et al. (2010) Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58
10 Browning, S.R. (2009–2011) Strand-switching utility for BEAGLE. http://faculty.washington.edu/sguy/beagle/strand_switching/strand_switching.html
11 Howie, B. and Marchini, J. (2009–2012) IMPUTE2 strand alignment options. http://mathgen.stats.ox.ac.uk/impute/strand_alignment_options.html
0168-9525/$ see front matter © 2012 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.tig.2012.05.002 Trends in Genetics, August 2012,
Vol. 28, No. 8


Erratum

Corrigendum: Human evolutionary genomics: ethical and interpretive issues
[Trends in Genetics 28 (2012) 137–145]

Joseph J. Vitti¹,², Mildred K. Cho³, Sarah A. Tishkoff⁴ and Pardis C. Sabeti¹,²
¹ Broad Institute of MIT and Harvard, Cambridge, Massachusetts
² Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts
³ Stanford Center for Biomedical Ethics, Stanford University, Palo Alto, California
⁴ Departments of Genetics and Biology, University of Pennsylvania, Philadelphia, Pennsylvania

In Figure 1 the genes involved in pigmentation are shown as SLC24A5, SLC42A2. They should be SLC24A5, SLC45A2. Similarly, in the legend for Figure 1, line 8, it reads:
"In European populations, genes that affect skin pigmentation (SLC24A5 and SLC42A2) have undergone positive selection."
It should read:
"In European populations, genes that affect skin pigmentation (SLC24A5 and SLC45A2) have undergone positive selection."
We apologize to the readers of this article for this error.

0168-9525/$ see front matter © 2012 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.tig.2012.05.003 Trends in Genetics, August 2012, Vol. 28, No. 8

DOI of original article: 10.1016/j.tig.2011.12.001.


Corresponding author: Vitti, J.J. (vitti@broadinstitute.org).
