0168-9525/$ – see front matter © 2012 Elsevier Ltd. All rights reserved. doi:10.1016/j.tig.2012.03.010 Trends in Genetics, August 2012, Vol. 28, No. 8
Review
[Figure 1 image, panels (a)–(c): informational RNAs and functional RNAs; in later panels also metabolic networks, functional proteins and the DNA genome.]
Figure 1. The development of the modern genetic system from an RNA-dominated precursor genetic system. (a) The first genetic system probably involved informational RNAs encoding ribozymes which facilitated the replication of those informational RNAs [1]. Given the narrow catalytic range of ribozymes, this system probably relied on substantial networks of prebiotic chemistry to provide activated nucleotides [6]. (b) Protein synthesis by translation most likely arose from this RNA-based system [7] and rapidly developed into a highly processive, high-fidelity system [8]. Appropriately, the translation system is dominated by functional RNAs, including the ribosome itself, which has a ribozyme active site in its highly conserved core [57,58]. (c) The DNA genome probably arose from an RNA–protein precursor system. Deoxyribonucleotides seem to have been unavailable until the evolution of the ribonucleotide reductase protein enzymes [7]. Unlike translation, DNA replication and processing are dominated by protein functions rather than RNA functions, and core DNA-related functions do not appear to be universally conserved [10,11]. In the absence of significant bioinformatic evidence, the transition from an RNA genome to a DNA genome remains enigmatic.
In addition, thousands of micronuclear genes are scrambled with respect to their macronuclear counterparts, with
segments of micronuclear genes present in a permuted or
inverted order relative to their order in the macronucleus
(Figure 3) (reviewed in [18]). Following sexual exchange of
haploid micronuclei, the macronuclear genome assembles
from dispersed segments of micronuclear DNA through a
process of genome rearrangement that is guided by macronuclear RNA templates (Figure 3) [19]. It is likely that
these RNA templates represent a transient cache of the
entire macronuclear genome during this developmental
stage.
The roles of RNA may surpass those of DNA in regulating the information in the genome of Oxytricha at three
levels. At the first level, RNA transcripts of complete
[Figure 2 image: taxonomic distribution of ribonucleotide reductase classes, DNA polymerase families, DNA ligases (ATP- and NAD-dependent) and DNA primases (polymerase- and helicase-associated) mapped onto a universal tree of Archaea, Eukarya and Bacteria.]
Figure 2. A phylogenetic distribution of key enzymes involved in DNA synthesis. Unlike the protein translation system, very few features of DNA synthesis and processing are universally conserved. Ribonucleotide reductase is an enzyme required to produce deoxyribonucleotides from ribonucleotides. It is found in three distinct classes, I, II, and III, although ancient homology between them can be inferred from structural and mechanistic similarity. Six distinct families of DNA polymerases are known. None of the four standard DNA polymerase families (A, B, C, and D) has a universal taxonomic distribution. DNA polymerase families X and Y are universally distributed, but their functions relate to excision repair rather than DNA replication. The DNA polymerase X family catalyzes non-template-dependent DNA synthesis, whereas the DNA polymerase Y family polymerizes short segments across lesions. Bacteria use an NAD+-dependent DNA ligase that is unrelated to the ATP-dependent DNA ligase used by Eukarya and Archaea. Similarly, Bacteria use a helicase-associated DNA primase, whereas Archaea and Eukarya use a DNA polymerase α-associated DNA primase. The lack of a universally distributed set of enzymes involved in DNA synthesis suggests that modern pathways were still forming at the time of the last universal common ancestor (LUCA). Alternatively, DNA-related pathways may simply be more evolutionarily malleable than, for example, translation pathways, a property that would obscure their ancient phylogenetic signatures. The universal phylogenetic tree was previously generated in [59] and is based on 31 universal gene sequences from 191 genomes. The tree image was produced using the Interactive Tree of Life web server [60]. Clades representing groups of 25–40% similarity were collapsed to conserve space. The taxonomic distribution of ribonucleotide reductase enzymes was identified from the RNR database [61]. Taxonomic distributions of DNA polymerase families, DNA ligases, and DNA primases are extrapolated from [10,11] and are not at a resolution capable of illustrating horizontal gene transfer. Ciliates are members of the Alveolata.
nanochromosomes from the previous generation can program the pattern of DNA rearrangements during macronuclear development [19]. The microinjection of synthetic
RNA molecules into Oxytricha cells can introduce an
[Figure 3 image, panels (a)–(d): old macronucleus (dsDNA, ssRNA, transcription); micronuclear meiosis; micronuclear chromosomes; macronuclear nanochromosomes; developing macronucleus with numbered segments 1–4; new micronucleus.]
the micronuclear DNA remains unchanged, the inheritance of altered rearrangement patterns in Oxytricha
appears to be a transgenerational RNA-mediated epigenetic phenomenon.
At the second level, point substitutions can also transfer
from the RNA template to the macronuclear DNA [19],
particularly near regions where junctions form between
macronuclear segments. These point substitutions can also
transfer to the sexual progeny and to their progeny's progeny.
Given that the micronuclear DNA does not share these
point substitutions [19], this observation implicates a role
for RNA-templated DNA repair [20] in DNA rearrangement. These somatically acquired point mutations represent another level at which epigenetically inherited RNA
molecules instruct the sequence and interpretation of the
DNA genome.
At the third level, the RNA macronuclear genome cache
also appears to be responsible for determining the copy
number of macronuclear chromosomes. Artificially increasing or decreasing the available levels of RNA chromosome templates by microinjection or RNAi, respectively,
leads to a relative increase or decrease in the copy number
of the corresponding DNA molecules in the next generation. This effect also lasts at least two sexual generations
[21], demonstrating a further example of RNA-mediated
transgenerational epigenetic inheritance in Oxytricha.
Apart from their unique sequence features, ciliate micronuclear genomes have a normal eukaryotic structure.
Their genome architecture takes the form of large chromosomes with telomeres and a centromere, and micronuclei
divide by mitosis during cell division. During the
sexual cycle the diploid genome undergoes meiosis to
produce haploid gametes, one of which is retained and
the other of which passes to the mating partner
(Figure 3) [22,23].
The macronucleus is very different. The macronuclear
genome contains on the order of 20 million small DNA
chromosomes, or nanochromosomes, most of which encode
a single protein-coding gene or functional RNA. In fact, the
lack of a centromere has led some to argue that the term
chromosome is inappropriate for macronuclear DNA [16].
The extraordinary number of DNA molecules in the macronucleus results from approximately 20 000 unique
nanochromosomes averaging roughly 1000 copies per macronucleus. Their average length is approximately 2.7 kb
[24]. These unusual properties of the Oxytricha macronuclear genome and macronucleus, and the powerful role of
RNA in sculpting these genomes, offer a compelling system
within which to consider possible transitions from simple
RNA genomes to complex DNA genomes.
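The macronuclear figures quoted above can be cross-checked by simple multiplication. The sketch below treats the approximate values in the text (about 20 000 unique nanochromosomes, about 1000 copies each, about 2.7 kb average length) as exact averages, which is an assumption; the constant and function names are illustrative only.

```python
# Back-of-the-envelope totals for the Oxytricha macronucleus, treating the
# approximate averages quoted in the text as exact (an assumption).
UNIQUE_NANOCHROMOSOMES = 20_000   # distinct nanochromosome sequences
MEAN_COPIES = 1_000               # average copies of each per macronucleus
MEAN_LENGTH_BP = 2_700            # average nanochromosome length (2.7 kb)


def macronuclear_totals(unique: int, copies: int, length_bp: int) -> dict:
    """Return the molecule count and DNA content implied by the averages."""
    molecules = unique * copies             # total DNA molecules
    unique_content_bp = unique * length_bp  # one copy of each nanochromosome
    total_content_bp = molecules * length_bp
    return {
        "molecules": molecules,
        "unique_content_bp": unique_content_bp,
        "total_content_bp": total_content_bp,
    }


totals = macronuclear_totals(UNIQUE_NANOCHROMOSOMES, MEAN_COPIES, MEAN_LENGTH_BP)
print(f"{totals['molecules']:,} molecules")                          # 20,000,000 molecules
print(f"{totals['unique_content_bp'] / 1e6:.0f} Mbp unique sequence")  # 54 Mbp unique sequence
print(f"{totals['total_content_bp'] / 1e9:.0f} Gbp total DNA")         # 54 Gbp total DNA
```

The product of the first two values reproduces the figure of 20 million molecules given in the text; the content estimates are only as good as the averages fed in.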
Oxytricha and early genome replication
Small, single-gene chromosomes, such as those in the
Oxytricha macronucleus, represent one of the simplest
possible states of a genome and thus were probably predecessors to more complex genome architectures. A genome of small linear chromosomes would have presented
less of a challenge to primitive polymerases [14], which
probably copied nucleic acids with low fidelity and were
unable to process long sequences. The nature of these
primitive DNA polymerases is unknown. None of the four
families of standard DNA polymerases has a universal
distribution [11], although the sliding clamp function of
DnaN, the β clamp of E. coli DNA polymerase III, and the 5′→3′ exonuclease
function of the Pol1-A polymerase in E. coli appear to have
been present in LUCA [25,26]. Three subunits of DNA-dependent RNA polymerases appear to be universal as
well [25]. Structural and functional comparisons of DNA-dependent RNA polymerases suggest that they may share
a multi-subunit ancestor with proofreading capabilities
that was present in LUCA [27].
It is generally assumed that the RNA-only stage in the
development of the genetic system would have required an
RNA-dependent RNA polymerase ribozyme to replicate the genome. Although no such enzyme has been
found in extant biology, several have been produced synthetically through laboratory evolution techniques [24,28–30]. So far, all of these ribozymes are over a hundred
nucleotides long and exhibit very tight constraints on
sequence space, making it difficult to imagine how similar
ribozymes could have evolved de novo in an RNA world
scenario. In addition, even the most capable of these
laboratory-generated polymerase ribozymes is not able
to sustain the processivity required to replicate RNA
molecules of its own size or larger.
During the process of Oxytricha genome rearrangement,
segments of DNA from the micronuclear genome assemble
according to RNA templates of the macronuclear genome
(Figure 3). This process represents a unique scenario in
extant biology in which a complete copy of a genome is
produced, not by polymerizing a complementary strand
one nucleotide at a time, but by recycling DNA polymers
from a precursor genome. It is likely that these pieces of
micronuclear DNA ligate together after assembling on the
complementary RNA template, although there is also evidence that gaps or errors between the DNA segments are
repaired by the activity of an RNA-dependent DNA polymerase [19].
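The unscrambling step can be caricatured as a string problem. In the toy sketch below, which is not the biochemical mechanism, exact string matching stands in for base pairing, the RNA template is written in the same sense as the macronuclear strand it encodes, each segment is assumed to match the template in only one place, and the function names are hypothetical. Segments are located and oriented on the template (inverted pieces are recognized by their reverse complement), sorted back into order and joined; any bases left uncovered are copied from the template, standing in for the RNA-dependent gap repair mentioned above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement of a DNA string written in A/C/G/T."""
    return dna.translate(COMPLEMENT)[::-1]


def assemble(template_rna: str, segments: list[str]) -> tuple[str, int]:
    """Order, orient and join DNA segments along an RNA template (toy model).

    Returns the assembled DNA and the number of bases copied from the
    template into gaps (a stand-in for RNA-templated gap repair).
    """
    template = template_rna.replace("U", "T")  # compare in the DNA alphabet
    placed = []
    for seg in segments:
        pos = template.find(seg)
        if pos == -1:  # no match as given: try the inverted orientation
            seg = reverse_complement(seg)
            pos = template.find(seg)
        if pos == -1:
            raise ValueError("segment matches the template in neither orientation")
        placed.append((pos, seg))
    placed.sort()  # undo the permutation by sorting on template position

    assembled, repaired = "", 0
    for pos, seg in placed:
        if pos < len(assembled):
            raise ValueError(f"segments overlap at template position {pos}")
        gap = template[len(assembled):pos]  # bases that no segment supplied
        repaired += len(gap)
        assembled += gap + seg  # fill the gap, then "ligate" the segment
    tail = template[len(assembled):]
    return assembled + tail, repaired + len(tail)


# Target ATGAAC|GGTTCA|CCGTAG supplied scrambled: permuted, with the middle
# segment inverted and the first segment one base short.
template = "AUGAACGGUUCACCGUAG"
segments = ["CCGTAG", reverse_complement("GGTTCA"), "ATGAA"]
dna, repaired = assemble(template, segments)
print(dna, repaired)  # ATGAACGGTTCACCGTAG 1
```

Real templates pair with the complementary strand, and repeats or mismatches would complicate placement; the sketch only shows why a template suffices to undo both permutation and inversion.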
A similar mode of replication would have conferred
several benefits to early life and perhaps created a viable
selection regime in which polymerases with high fidelity
and processivity might have evolved. In contrast to ribozyme polymerases, several ribozyme ligases are present in
modern organisms [3] and more have been synthesized by
directed evolution [31,32]. Polymerases are in fact a specialized kind of ligase in which one of the ligated partners is
a single nucleotide [31]. It follows, then, that the central
challenge to a polymerase is not the catalytic step of
ligation, but the ability to perform that step repeatedly
over the full length of a gene-sized molecule, a limitation
that is borne out by the difficulty of producing a highly
processive ribozyme polymerase [24,29,30].
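The argument that repetition, not catalysis, is the bottleneck can be made quantitative with the simplest possible model, assumed here rather than taken from the text: every catalytic step succeeds independently with the same probability, and a copy is finished only if all steps succeed. Under that assumption the chance of completing a copy decays geometrically with the number of steps, which is why replacing one step per nucleotide with a few ligations of pre-made pieces helps so much. All numbers below are hypothetical.

```python
# Hypothetical numbers chosen only to illustrate the scaling.
LENGTH_NT = 1_000   # length of a gene-sized molecule to be copied
SEGMENT_NT = 10     # length of each pre-made segment in the assembly route
P_STEP = 0.99       # chance that any single catalytic step succeeds


def completion_probability(steps: int, p_step: float) -> float:
    """Chance that every one of `steps` independent steps succeeds."""
    return p_step ** steps


# Polymerization: one joining step per nucleotide added.
polymerase_steps = LENGTH_NT
# Assembly: m pre-made segments need only m - 1 ligations.
ligation_steps = LENGTH_NT // SEGMENT_NT - 1

p_polymerase = completion_probability(polymerase_steps, P_STEP)
p_ligation = completion_probability(ligation_steps, P_STEP)
print(f"{p_polymerase:.1e} {p_ligation:.2f}")  # 4.3e-05 0.37
```

Even a 99%-reliable catalyst almost never finishes a 1000-step copy, whereas the same catalyst completes a 99-ligation assembly in roughly a third of attempts.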
If early nanochromosomes replicated in an Oxytricha-like fashion, the number of catalytic ligation steps would be
much smaller than that in a complete polymerase-dependent replication. The source of these DNA segments in a
primitive system is not clear. Perhaps if the GC content were very
high or very low, the sequence complexity of the nanochromosomes would also be low, and short abiotically synthesized segments with random sequences [33,34] would
provide enough matches to the template to permit assembly of most of the genome from these small, modular pieces
[35]. The need to fill or repair small gaps between segments
would create a selective environment for the evolution of a
weakly processive polymerase into the ancestor of a modern, highly processive polymerase. This model of early
genome replication is consistent with the observation that
the only universally conserved DNA polymerase families
are involved in excision repair (Figure 2). Once a high-fidelity, high-processivity polymerase became available,
genome replication could move towards its current polymerase-dependent form and longer chromosome lengths
would be possible.
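The intuition that a strongly biased base composition makes random short pieces more likely to match a template can be checked with a short calculation, under an assumption not stated in the text: both the template and the abiotic segments are drawn independently, base by base, from the same composition. The chance that two random bases agree is then the sum of the squared base frequencies, and a k-mer match needs k such agreements. The 6-mer length and the 10% GC composition are hypothetical choices.

```python
def match_probability(base_freqs: dict, k: int) -> float:
    """Chance that a random k-mer equals a random template k-mer.

    Both sequences are assumed to be drawn base by base, independently,
    from the same composition `base_freqs`.
    """
    per_site = sum(p * p for p in base_freqs.values())  # same base at one site
    return per_site ** k


UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # 50% GC
LOW_GC = {"A": 0.45, "C": 0.05, "G": 0.05, "T": 0.45}   # 10% GC (hypothetical)

p_uniform = match_probability(UNIFORM, 6)
p_low_gc = match_probability(LOW_GC, 6)
print(f"{p_uniform:.2e} {p_low_gc:.2e}")  # 2.44e-04 4.75e-03
```

For these choices the low-GC case gives roughly a twentyfold higher match probability per 6-mer, and the advantage grows with k because the per-site gain is raised to the power k.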
Oxytricha and early cell division
In most eukaryotes, cell division is orchestrated by the
complex process of mitosis, wherein duplicate chromosomes segregate evenly between the dividing cells. The
process is controlled by dynamic motor complexes that pull
chromosomes along organized microtubules [36,37]. Functionally analogous but non-homologous processes are
thought to take place in Bacteria [38–40] and Archaea
[41,42]. It is difficult to imagine that such a complex system
was present in early life forms. Early cell division probably
external source of genetic variation. But the nanochromosome structure of the macronuclear genome and its capacity to receive new alleles during the process of DNA
rearrangement make the Oxytricha macronucleus uniquely
permissive to somatically acquired genetic change. Nevertheless, an epigenetic system such as that of Oxytricha is
also robust to such perturbations because the high copy number of original alleles will initially act as a buffer
against sequence change, restricting the spread of deleterious somatic alterations. Perhaps early genomes with
structures similar to the Oxytricha macronucleus would
also be permissive to genetic acquisitions, but stable
against their deleterious effects.
Oxytricha and early organismal identity
The genetic openness that existed during the transition to
modern life was probably also prone to invasions by selfish
replicators that may have easily infiltrated and taken
advantage of emerging organismal replicating systems
[51]. This effect is generally modeled through self-propagating metabolism-like networks, or hypercycles. These
replicating entities may be parasitic if they either receive
replication support from the host system without conferring a reciprocal benefit, or shortcut the host system in
some deleterious way. Vesicles can barricade replicating
systems against selfish entities if they provide a mechanism of blocking the entry of external replicators [52].
Selfish replicators can also be eventually incorporated into
the metabolism of the host system, balancing their deleterious effects with beneficial ones [53].
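The hypercycle framing can be made concrete with the classic Eigen–Schuster replicator equations, dx_i/dt = x_i(k_i x_{i-1} − Φ), in which each member's replication is catalyzed by the previous member and Φ is a dilution term that holds the total concentration constant. The sketch below adds one parasite that, like member 0, is catalyzed by the last member but supports no one in return. The review does not specify any model, so this formulation, the forward-Euler integration and all rate constants are assumptions for illustration. Because member 0 and the parasite draw on the same catalyst, their ratio grows at rate (k_p − k_0)x_{n−1} whenever k_p > k_0, so the parasite steadily gains on the member it competes with.

```python
def hypercycle_step(x, k, parasite, k_p, dt):
    """One forward-Euler step of an n-member hypercycle with one parasite.

    Member i is catalyzed by member i-1 (member 0 by the last member).
    The parasite is also catalyzed by the last member but catalyzes nothing.
    Concentrations are relative and sum to 1.
    """
    n = len(x)
    growth = [k[i] * x[i] * x[i - 1] for i in range(n)]  # x[-1] closes the cycle
    growth_p = k_p * parasite * x[n - 1]
    phi = sum(growth) + growth_p  # dilution flux that keeps the total at 1
    new_x = [x[i] + dt * (growth[i] - x[i] * phi) for i in range(n)]
    new_p = parasite + dt * (growth_p - parasite * phi)
    return new_x, new_p


# Hypothetical rates: the parasite exploits the last member twice as
# effectively as member 0 does.
k = [1.0, 1.0, 1.0]
k_p = 2.0
x = [0.33, 0.33, 0.33]
parasite = 0.01

start_ratio = parasite / x[0]
for _ in range(5_000):  # t = 50 with dt = 0.01
    x, parasite = hypercycle_step(x, k, parasite, k_p, dt=0.01)

print("members:", [round(v, 3) for v in x], "parasite:", round(parasite, 3))
print("parasite gained on member 0:", parasite / x[0] > start_ratio)
```

Enclosing the members in a vesicle that excludes the parasite corresponds, in this picture, to simply dropping the parasite term.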
Although the dynamics of nuclear dimorphism in Oxytricha do not resemble a hypercycle, the scrambling of the
micronucleus and its rearrangement to form the macronuclear genome illustrate the properties of stable systems
that host selfish replicators. The unique genomic traits of
Oxytricha seem to be both caused by and assisted by an
invasion of DNA transposons (typically regarded as selfish
genetic agents). The micronucleus hosts thousands of
transposons, which probably contributed to the scrambling
of its genome, either through actual transposition or via
ectopic recombination between transposons of the same
family. Unlike domesticated transposases in other eukaryotes, micronuclear transposons display evidence of purifying selection acting on their encoded proteins [23,54] and
may still be active outside the control of the host cell. The
presence of active transposons in the micronucleus may
have provided the selective pressure for acquisition of a
template-directed genome unscrambling system as part of
macronuclear development [55] as a mechanism for promoting the long-term stability of the genome and robustness to perturbations.
Recent discoveries reveal that micronuclear transposons play a surprisingly direct role in both macronuclear
development and genome rearrangement [23]. Micronucleus-limited transposase genes are expressed during macronuclear development, but silent during vegetative growth.
The experimental silencing of these transposases by RNAi
results in aberrant unscrambling patterns in the macronuclear genome, suggesting that transposons play an active role in genome rearrangement. It is possible that
the nanochromosome templates are composed of RNA to
References
1 Gilbert, W. (1986) The RNA world. Nature 319, 618
2 Gesteland, R. and Atkins, J.F., eds (1993) The RNA World, Cold Spring Harbor Laboratory Press
3 Landweber, L. et al. (1998) Ribozyme engineering and early evolution. Bioscience 48, 94–103
4 Fox, G. (2010) Origin and evolution of the ribosome. Cold Spring Harb. Perspect. Biol. 2, a003483
5 White, H. (1976) Coenzymes as fossils of an earlier metabolic state. J. Mol. Evol. 7, 101–104
6 Goldman, A. et al. (2012) Evolution of the protein repertoire. In Encyclopedia of Molecular Cell Biology and Molecular Medicine (Meyers, R.A., ed.), Wiley-VCH
7 Freeland, S. et al. (1999) Do proteins predate DNA? Science 286, 690–692
8 Goldman, A. et al. (2010) The evolution and functional repertoire of translation proteins following the origin of life. Biol. Direct 5, 15
9 Torrents, E. et al. (2002) Ribonucleotide reductases: divergent evolution of an ancient enzyme. J. Mol. Evol. 55, 138–152
10 Forterre, P. (2002) The origin of DNA genomes and DNA replication proteins. Curr. Opin. Microbiol. 5, 525–532
11 Filee, J. et al. (2002) Evolution of DNA polymerase families: evidences for multiple gene exchange between cellular and viral proteins. J. Mol. Evol. 54, 763–773
12 Koonin, E. (2003) Comparative genomics, minimal gene-sets and the last universal common ancestor. Nat. Rev. Microbiol. 1, 127–136
13 Forterre, P. (2006) Three RNA cells for ribosomal lineages and three DNA viruses to replicate their genomes: a hypothesis for the origin of cellular domain. Proc. Natl. Acad. Sci. U.S.A. 103, 3669–3674
14 Woese, C. (1998) The universal ancestor. Proc. Natl. Acad. Sci. U.S.A. 95, 6854–6859
15 Zoller, S. et al. (2012) Characterization and taxonomic validity of the ciliate Oxytricha trifallax (Class Spirotrichea) based on multiple gene sequences: limitations in identifying genera solely by morphology. Protist DOI: 10.1016/j.protis.2011.12.006
16 Prescott, D. (1994) The DNA of ciliated protozoa. Microbiol. Rev. 58, 233–267
17 Prescott, D. (2000) Genome gymnastics: unique modes of DNA evolution and processing in ciliates. Nat. Rev. Genet. 1, 191–198
18 Nowacki, M. et al. (2011) RNA-mediated epigenetic programming of genome rearrangements. Annu. Rev. Genomics Hum. Genet. 12, 367–389
19 Nowacki, M. et al. (2008) RNA-mediated epigenetic programming of a genome-rearrangement pathway. Nature 451, 153–158
20 Storici, F. et al. (2007) RNA-templated DNA repair. Nature 447, 338–341
21 Nowacki, M. et al. (2010) RNA-mediated epigenetic regulation of DNA copy number. Proc. Natl. Acad. Sci. U.S.A. 107, 22140–22144
22 Nowacki, M. and Landweber, L.F. (2009) Epigenetic inheritance in ciliates. Curr. Opin. Microbiol. 12, 638–643
23 Nowacki, M. et al. (2009) A functional role for transposases in a large eukaryotic genome. Science 324, 935–938
24 Green, R. and Szostak, J.W. (1992) Selection of a ribozyme that functions as a superior template in a self-copying reaction. Science 258, 1910–1915
25 Harris, J. et al. (2003) The genetic core of the universal ancestor. Genome Res. 13, 407–412
26 Becerra, A. et al. (2007) The very early stages of biological evolution and the nature of the last common ancestor of the three major cell domains. Annu. Rev. Ecol. Evol. Syst. 38, 361–379
27 Poole, A. and Logan, D.T. (2005) Modern mRNA proofreading and repair: clues that the last universal common ancestor possessed an RNA genome? Mol. Biol. Evol. 22, 1444–1455
28 Doudna, J. et al. (1991) A multisubunit ribozyme that is a catalyst of and template for complementary strand RNA synthesis. Science 251, 1605–1608
29 Johnston, W. et al. (2001) RNA-catalyzed RNA polymerization: accurate and general RNA-templated primer extension. Science 292, 1319–1325
30 Wochner, A. et al. (2011) Ribozyme-catalyzed transcription of an active ribozyme. Science 332, 209–212
31 Bartel, D. and Szostak, J.W. (1993) Isolation of new ribozymes from a large pool of random sequences. Science 261, 1411–1418
32 Landweber, L. and Pokrovskaya, I.D. (1999) Emergence of a dual catalytic RNA with metal specific cleavage and ligase activities: the spandrels of RNA evolution. Proc. Natl. Acad. Sci. U.S.A. 96, 173–178
33 Huang, W. and Ferris, J.P. (2006) One-step, regioselective synthesis of up to 50-mers of RNA oligomers by montmorillonite catalysis. J. Am. Chem. Soc. 128, 8914–8919
34 Aldersley, M. et al. (2009) RNA synthesis by mineral catalysis. Orig. Life Evol. Biosph. 39, 200
35 Kotler, L. et al. (1993) DNA sequencing: modular primers assembled from a library of hexamers or pentamers. Proc. Natl. Acad. Sci. U.S.A. 90, 4241–4245
Department of Physics, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada
Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01605, USA
The temporal organization of DNA replication has puzzled cell biologists since before the mechanism of replication was understood. The realization that replication
timing correlates with important features, such as transcription, chromatin structure and genome evolution,
and is misregulated in cancer and aging has only deepened the fascination. Many ideas about replication timing have been proposed, but most have been short on
mechanistic detail. However, recent work has begun to
elucidate basic principles of replication timing. In particular, mathematical modeling of replication kinetics in
several systems has shown that the reproducible replication timing patterns seen in population studies can be
explained by stochastic origin firing at the single-cell
level. This work suggests that replication timing need
not be controlled by a hierarchical mechanism that
imposes replication timing from a central regulator,
but instead results from simple rules that affect individual origins.
Replication origins: correlated or independent?
The duplication of the genome of a cell by DNA replication
is an essential step in the cell cycle. In bacteria, the overall
situation is straightforward, in that DNA replication initiates at a single, well-defined location in the genome (e.g.
the oriC site in Escherichia coli) and terminates at a
second, well-defined region (ter in E. coli) [1]. Eukaryotic
organisms, with 10–1000 times more DNA and with 10–100 times slower replication forks, depend on the firing of
multiple origins of replication along the DNA. These origins are defined by a two-step process [2]. Licensing, the
first step, occurs in G1 phase, when the origin recognition
complex (ORC) binds to chromatin and, with the aid of
Cdc6 and Cdt1, loads onto the DNA head-to-head pairs of
the barrel-shaped heterohexameric MCM complex, the
catalytic core of the replicative helicase [3,4]. Each pair
of MCM complexes is a potential origin of DNA replication.
Initiation (or origin firing), the second step, occurs in S
phase, when a pair of MCMs is activated via a complex
process involving numerous proteins, including recruitment of Sld2, Sld3, the GINS complex and Cdc45, as well
as the phosphorylation of various components by the CDK
and DDK replication kinases [5]. The regulation of the
spatial binding of the ORC and the temporal activation
Corresponding author: Bechhoefer, J. (johnb@sfu.ca).
Keywords: DNA replication timing; stochastic models; replication initiation; ORC; MCM
0168-9525/$ – see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.tig.2012.03.011 Trends in Genetics, August 2012, Vol. 28, No. 8
[Figure 1 image, panels (a)–(c): replication fraction f and initiation rate I (# per kb per unit time) plotted against genome position, f(x) and I(x), and against time, f(t) and I(t), with f(x,t) and I(x,t) maps for yeast and a metazoan.]
Figure 1. Replication fractions and initiation rates. (a,b) The relation between
replication fractions f and initiation rates I, as illustrated for budding yeast. (a)
Spatially resolved data, averaged over an asynchronous cell population. (b) Time
course data, averaged over the genome. (c) Illustration of typical replication timing
data for budding yeast (left) and a metazoan organism (right). Top-left image
shows the replication fraction f(x,t), as it might be inferred from a microarray
timing experiment with several time points of data from synchronized cell
populations. Black represents low-replication levels and white represents high-replication levels. Averaging the replication fraction over the genome gives the
curve f(t), depicted to the left of the f(x,t) image, which goes from 0 to 1. Averaging
the replication fraction over time, as in an experiment on asynchronous cell
populations, gives the curve above the f(x,t) image. The bottom-right group shows
the inferred I(x,t) image, as well as the averaged curves I(t) and I(x). Note that, in
budding yeast, replication origins are well localized, as indicated by the spikes in
the function I(x). [When viewed or printed at low resolution, not all spikes in I(x,t)
may be visible.] The right-hand groups illustrate similar concepts for a typical
metazoan organism. The main difference is that origins are not well localized, so
that the function I(x) has broad features, representing zones where initiations are
more or less likely to occur.
of replication are stochastic at the level of molecular interactions. It is important to note that stochastic models do not
require that origins all fire with the same probability, nor is
stochastic firing incompatible with late-firing origins [18].
However, there is evidence in some cases for correlation in
origin initiation activity. As a result, the current picture is
an intermediate one that mixes both stochastic elements
and mechanisms for correlations in origin initiation [15,19].
Still, differences remain concerning what is essential and
what is incidental in the above picture and what kind of
underlying mechanisms are likely to be important in controlling the replication program. In this review, we argue
that, for the simpler cases such as unicellular yeast and for
the embryonic cells of some multicellular animals, recent
experiments and modeling efforts have shown that much of
the available replication data may be understood in terms of
the simpler independent origin hypothesis and that correlations probably play a minor role in the replication program.
Replication in the somatic cells of metazoan organisms is
more complex, and we outline recent efforts in this area.
Replication in yeast
The past few years have marked a turning point in the
understanding of replication in yeast. First came a series
of high-resolution combing and microarray experiments
(Box 2). For example, high-resolution timing data of
synchronized populations of wild-type and clb5Δ Saccharomyces cerevisiae show clear average timing patterns [20].
As mentioned in Box 2, these data amount to
measurements of f(x,t), with spatial information resolved to
a few kilobases and temporal information resolved to 5 min.
At around the same time, DNA combing studies in budding
and fission yeast showed that initiations at the single-molecule scale are stochastic, with different sets of origins chosen
in each cell cycle [21,22]. Indeed, in budding yeast, it is now
clear that there are as many as 700 potential origin sites, of
which only approximately 200 are used in any given cycle.
In parallel work, the rate of origin firing in budding and
fission yeast was shown to be regulated by competition for
limiting activators, such as the Cdc45 initiation factor and
the DDK initiation kinase [23–26]. Competition for limiting activators provides an explanation for why origin firing
is less efficient than might be possible. The stochastic
interaction between origins and diffusible activators also
provides a mechanism for stochastic firing of origins.
The stochasticity of individual origins turns out to be an
important effect. In contrast to earlier models, in which the
firing of specific origins was envisaged to be limited to
narrow windows of S phase, it is now clear that the width of
the firing-time distribution for an individual origin can be a
substantial fraction of S phase. Indeed, models that fail to
incorporate the width of the timing distribution fail to
reproduce many of the experimental details adequately
[27]. By contrast, stochastic models that take into account
the width of the firing-time distribution can successfully fit
the microarray data [8,28,29]. Several notable insights and
results come from these analyses: first, it is possible to
generate models with independent initiation scenarios
[initiation rate I(x,t) and constant fork velocity v] that lead
to good fits of the data. This result shows that the independent origin hypothesis suffices to explain microarray data
Box 1. f and I: mathematical functions that describe
replication kinetics
DNA replication kinetics can be described using two related but
distinct mathematical functions: the replication fraction f and the
initiation rate I. The first, f, is a complete description of replication
kinetics and can be directly determined from experimental data (Box
2). The second, I, only describes the kinetics of origin initiation and
cannot be directly measured; it must be inferred from f. However, if
fork rates are assumed to be nearly constant, as is frequently done
in models of replication kinetics, then I is sufficient to completely
determine f. Both f and I can be defined for every spatial point (x) in
the genome and every time point (t) in S phase, to give f(x,t) and
I(x,t) (Figure 1c, main text).
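The statement that I determines f once fork speed is fixed can be made explicit. A standard choice in the replication-kinetics literature is the Kolmogorov–Johnson–Mehl–Avrami (KJMA) formula: a site is still unreplicated at time t only if no origin fired at an earlier time s within distance v(t − s) on either side of it, which gives f(t) = 1 − exp(−2v ∫₀ᵗ I(s)(t − s) ds). The sketch assumes the spatially averaged case (uniform I, no chromosome ends, constant fork speed v); the two rate functions and all numbers are hypothetical.

```python
import math


def replication_fraction(t, initiation_rate, v, n_steps=2_000):
    """KJMA estimate of the replicated fraction f(t).

    Assumes a spatially uniform initiation rate I(t) (initiations per kb per
    min), independent origins and a constant bidirectional fork speed v
    (kb/min):  f(t) = 1 - exp(-2 v * integral_0^t I(s) (t - s) ds).
    """
    if t <= 0:
        return 0.0
    h = t / n_steps
    integral = 0.0
    for j in range(n_steps + 1):  # trapezoidal rule on [0, t]
        s = j * h
        weight = 0.5 if j in (0, n_steps) else 1.0
        integral += weight * initiation_rate(s) * (t - s)
    integral *= h
    return 1.0 - math.exp(-2.0 * v * integral)


V = 1.0  # hypothetical fork speed (kb/min)


def constant_rate(s):
    return 2e-3  # hypothetical constant I(t)


def ramp_rate(s):
    return 1e-4 * s  # hypothetical I(t) rising through S phase


for t in (10, 20, 30, 40):
    print(t, round(replication_fraction(t, constant_rate, V), 3),
          round(replication_fraction(t, ramp_rate, V), 3))
```

For constant I the integral is I t²/2, so f(t) = 1 − exp(−I v t²), which the numerical result matches; both rate functions give the generally sigmoidal f(t) described in this box.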
It is often useful to consider the time-averaged functions, f(x) and
I(x) (Figure 1a, main text). f(x) can be thought of as the average
replication time of each point in the genome and is generally
measured on asynchronous populations of cells. It is closely related
to the median replication time t_rep at a site that is inferred from time
course data on synchronized cell populations. The peaks in f(x)
represent the origins, and taller peaks indicate origins that fire, on
average, earlier in S phase. I(x) represents the average initiation rate
of each point in the genome. In yeast, where origins are well
defined, I(x) = 0 for most of the genome and forms spikes over the
origins, with taller spikes reflecting a higher average probability of
origin firing (Figure 1a,c). In metazoans, origins appear to be more
diffuse, and thus so is I(x) (Figure 1c). It is important to realize that
the height of the peaks in I(x) (i.e. the average firing probability of
an origin) cannot be directly inferred from the height of the peaks in
f(x), because f(x) convolves both passive replication and active firing
of each origin; I(x) can only be extracted by mathematical modeling
of f(x).
It can also be useful to consider the spatially averaged functions,
f(t) and I(t) (Figure 1b, main text). The replication fraction f(t) is
generally sigmoidal, as cells go from unreplicated in G1 to
replicated in G2. The exact shape of the sigmoid depends on the
details of the replication program, such as the distribution of
origins and the shape of I(t). As discussed in the main text, I(t) has
been proposed to generally increase for most of S phase and then
decline in late S phase.
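The sigmoidal shape of f(t) can be reproduced in the simplest spatially uniform case. Assuming independent initiations, a constant fork speed v and a long homogeneous genome, an initiation at time t' has replicated a stretch of length 2v(t - t') by time t, so f(t) = 1 - exp(-integral from 0 to t of I(t') 2v (t - t') dt'). For an initiation rate that rises linearly, I(t) = a t (an illustrative choice that ignores the late-S-phase decline), this gives f(t) = 1 - exp(-a v t^3 / 3). The Python sketch below evaluates the integral numerically for any I(t); the function name is hypothetical.

```python
import numpy as np


def spatially_averaged_f(I_of_t, v, t):
    """f(t) for a spatially uniform initiation rate I(t) on a long genome.

    Sketch assuming independent initiations and constant fork speed v:
    f(t) = 1 - exp(-integral_0^t I(t') * 2 v (t - t') dt').
    t must start at 0 and be increasing; I_of_t accepts an array of times.
    """
    f = np.empty(len(t))
    for k in range(len(t)):
        tp = t[: k + 1]
        # Rate of initiations at t' times the length each has replicated by t[k]
        g = I_of_t(tp) * 2.0 * v * (t[k] - tp)
        # Trapezoidal rule over [0, t[k]]
        s = np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(tp))
        f[k] = 1.0 - np.exp(-s)
    return f
```

For example, with a = 0.002 and v = 1 (arbitrary units), the numerical result follows 1 - exp(-a v t^3 / 3): it stays near 0 early, rises steeply mid-way and saturates near 1, the sigmoid described in the text.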
on replication timing in yeast. Second, the intrinsic parameters characterizing each origin have values that are independent of their neighbors, again suggesting that the
initiation of each origin is an independent stochastic event
[8]. Studies in fission yeast have also led to the conclusion
that local initiation models suffice to explain the available
experimental data [30,31]. However, several biologically
different scenarios can lead to similar overall timing patterns [32], and more complicated mechanisms, such as
trans-acting regulators of origin activity and chromosome
structure, can affect origin timing [33,34]. Clearly, further
iterations of modeling and experiment will be needed to
come to a final picture.
Replication in embryos
Embryonic cells in metazoans represent an interesting
intermediate case of complexity. On the one hand, they
have the full amount of DNA of somatic cells. On the other
hand, they undergo a rapid, simplified cell cycle that is
largely transcriptionally silent, which removes one major
source of complication in the replication of somatic cells. In
vitro studies of Xenopus cell-free extracts have been especially detailed and fruitful [13,35–37] and have led to
associated modeling efforts [6,7,19,38,39]. The replication
program in Xenopus embryos is relatively simple and much
faster than in somatic cells. In particular, there are no fixed
Box 2. Experimental techniques for analyzing DNA replication timing
The recent gains in our understanding of replication timing are built
on experimental advances that have greatly increased the quality
and quantity of data available. Defined patterns of DNA replication
were first observed in fiber autoradiography studies of tritiated
thymidine incorporation in bacterial and mammalian cells [12,79].
By in vivo pulse-labeling cells with tritiated thymidine and then
stretching the labeled DNA on a photosensitive film, it was possible
to map replication patterns (which regions have replicated and
which have not) at a given time. A significant technical improvement
was the substitution of fluorescently labeled thymine analogs, such
as BrdU, that could be observed using an optical microscope [80,81].
Molecular combing, which stretches DNA more controllably,
improved the latter technique by allowing one to more reliably
associate positions on an image of a stretched fiber with genomic
positions and by simplifying the identification of individual fibers
taken anonymously from the genome [82,83] or with the genome
location identified [54,84]. In parallel with fiber-based techniques,
live-cell imaging has also yielded much valuable information.
Although the size of origins and even their separations are well
below the resolution of conventional light microscopy, clever
techniques can yield spatial and temporal information. For example,
specific sites can be labeled with fusion proteins whose intensity
doubles after replication, an event that can readily be observed [85].
In the future, live single-molecule studies based on flow and
optical or magnetic tweezers [86], nano-engineered capillaries
[87,88] and other molecular-scale structures may lead to even
greater insights, especially into local mechanisms at the fork and
initiation sites.
A second set of techniques provides information about the
fraction of cells in a population that has replicated at a particular
location x and time t. This fraction of replicated cells can be
described by the function f(x,t), if replication kinetics throughout S
phase are measured, or simply as f(x), if measurements are
performed on asynchronous cell populations (Figure 1, main text).
Such measurements originally used microarrays [89,90], with one
approach based on local changes in copy number during replication. In a population of unreplicated cells, a baseline intensity is
measured at each locus [f(x) = 0]. After all cells have replicated, the
measured intensity at each locus should be double [f(x) = 1]. During
replication, intermediate levels of replication are detected as
intermediate intensity levels [0<f(x)<1]. For example, if half of the
cells in the population have replicated at a location x, then f(x) = 0.5.
More recently, direct sequencing to determine local DNA copy
number has given similar information with fewer artifacts [91,92].
Initial studies used multiple time points in cultures of synchronized
cells to directly measure f(x,t) [89,93], and this approach is still the
state of the art in yeast [20,64]. However, comparable results can be
derived by sorting asynchronous cells of any type into G1 and S
populations [90].
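The copy-number logic described above reduces to a simple per-locus conversion. The Python sketch below assumes, hypothetically, that the replicating and unreplicated signals have already been scaled per cell, so that an unreplicated locus gives a ratio of 1 and a fully replicated locus a ratio of 2; real microarray or sequencing data need a normalization step that is not shown here. The function name is illustrative.

```python
import numpy as np


def replication_fraction_from_copy_number(signal, baseline):
    """Estimate f at each locus from copy-number signals.

    signal:   per-locus signal from the replicating population
    baseline: per-locus signal from unreplicated (G1) cells
    Assumes per-cell scaling, so ratio 1 means f = 0 and ratio 2 means f = 1.
    """
    ratio = np.asarray(signal, dtype=float) / np.asarray(baseline, dtype=float)
    # Map ratio 1 -> 0 and ratio 2 -> 1, clipping measurement noise outside [0, 1]
    return np.clip(ratio - 1.0, 0.0, 1.0)
```

A locus whose signal is 1.5 times its baseline is assigned f = 0.5, matching the example in the text of half the cells having replicated that location.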
Box 3. Theoretical techniques for analyzing DNA replication timing
Although determining the firing time of an origin would seem
straightforward, particularly for the relatively simple yeast genome,
the heterogeneous nature of origin firing and the passive replication
of origins by forks from neighboring origins mean that the
distribution of origin firing times cannot be directly inferred from
its average replication time [94]. Therefore, rigorous analysis of
replication timing patterns has relied on more sophisticated
analytical tools. One of the most straightforward and widespread
methods is computer simulation [6,27,28,30,38]. An advantage of
simulation is that, with modest computer resources (especially if
simulations keep track of only the positions of forks and origins rather
than using a lattice for each point on the genome [95]), one can
recreate in silico not only the ideal experimental scenario envisaged,
but also any relevant experimental details. For example, it is
straightforward to include the effects of asynchrony in the cell
population, finite microscope resolution, labeling artifacts, and the
like [96]. Once the artifacts and the replication scenario are chosen
correctly, the simulation can reproduce, within statistical error, the
data from any given scenario.
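As an illustration of the simulation approach, the Python sketch below estimates f(x,t) by Monte Carlo for origins at fixed positions that fire independently. The firing law (an exponential waiting time with a per-origin rate), the parameter names and the function name are hypothetical choices for this sketch, not those of the cited simulators. With a constant fork speed v, the replication time of a site in one cell is simply the earliest fork arrival over all origins. An origin that is passively replicated before its own firing time cannot change that minimum (by the triangle inequality, the passing fork always arrives first), so it does not need to be removed explicitly.

```python
import numpy as np


def simulate_f(origins, rates, v, x, t, n_cells=2000, seed=0):
    """Monte Carlo estimate of f(x, t) for independently firing origins.

    Hypothetical model: origin i fires after an exponential waiting time
    with rate rates[i]; forks move at constant speed v in both directions.
    x and t are 1D arrays; returns an array of shape (len(x), len(t)).
    """
    rng = np.random.default_rng(seed)
    origins = np.asarray(origins, dtype=float)
    scale = 1.0 / np.asarray(rates, dtype=float)
    # Firing time of every origin in every simulated cell: (n_cells, n_origins)
    t_fire = rng.exponential(scale, size=(n_cells, origins.size))
    # Fork arrival time at each site from each origin: (n_cells, n_origins, n_x)
    arrival = t_fire[:, :, None] + np.abs(x[None, None, :] - origins[None, :, None]) / v
    # Replication time of each site in each cell: earliest arrival, (n_cells, n_x)
    t_rep = arrival.min(axis=1)
    # Fraction of cells in which site x has replicated by time t: (n_x, n_t)
    return (t_rep[:, :, None] <= t[None, None, :]).mean(axis=0)
```

For a single origin at x0 with rate lam, the estimate should match 1 - exp(-lam (t - |x - x0|/v)) for t > |x - x0|/v, up to sampling noise that shrinks as n_cells grows. Experimental artifacts such as population asynchrony could be layered on by perturbing t_rep before the final comparison.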
The main disadvantage of simulations is that to analyze experimental data, one must first determine both the appropriate type of
replication scenario to simulate and ways to incorporate experimental details and then determine the appropriate parameters to
use. In situations in which origin firing is not uniformly distributed,
each origin will be characterized by several parameters, and so the
simulation may depend on hundreds or even thousands of
parameters, depending on the type of organism. Curve-fitting
techniques, which amount to a search in the space of parameters,
require simulating a large number of scenarios. Analytical models,
which can be used to directly calculate replication profiles instead of
needing to simulate replication step by step, are one way to get
around such obstacles. Analytical models may be evaluated faster
than simulations. The difficulties are that one must be able to
determine an appropriate model and be able to solve it. Thus,
beginning with [6], a variety of analytical models have been
proposed [8,39,42,94,97]. Because models based on independent
origins are simpler than ones that allow correlated initiations, most
of the above work has assumed such a scenario. Nonetheless, some
analysis of correlated initiations has been done, as discussed in the
main text.
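For independent origins at fixed positions and a constant fork speed v, an analytical model can be written down directly. A site x is still unreplicated at time t only if no origin i has fired early enough for its fork to arrive, so f(x,t) = 1 - product over i of [1 - F_i(t - |x - x_i|/v)], where F_i is the cumulative firing-time distribution of origin i. The Python sketch below evaluates this product for a hypothetical exponential firing law, F_i(tau) = 1 - exp(-rate_i tau); the origin positions, rates and function name are illustrative parameters, not values from the cited models.

```python
import numpy as np


def analytic_f(origins, rates, v, x, t):
    """f(x, t) = 1 - prod_i [1 - F_i(t - |x - x_i| / v)] for independent origins.

    Sketch assuming constant fork speed v and a hypothetical exponential
    firing law F_i(tau) = 1 - exp(-rates[i] * tau) for tau > 0 (0 otherwise).
    Returns an array of shape (len(x), len(t)).
    """
    x_i = np.asarray(origins, dtype=float)[:, None, None]   # (n_origins, 1, 1)
    lam = np.asarray(rates, dtype=float)[:, None, None]
    # Latest firing time that still lets origin i's fork reach x by time t
    tau = t[None, None, :] - np.abs(x[None, :, None] - x_i) / v
    # Probability that origin i has not fired by then: 1 - F_i(tau)
    p_not_fired = np.exp(-lam * np.clip(tau, 0.0, None))
    # Independence: the site is unreplicated only if no origin got there in time
    return 1.0 - p_not_fired.prod(axis=0)
```

Because no sampling is involved, a whole profile comes from one array expression, which is what makes fitting hundreds of origin parameters to data practical compared with repeated stochastic simulation.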
Acknowledgments
Concluding remarks
The hypothesis that replication is largely controlled by the
local rate of initiation has received wide support from
recent experiments and analyses. Models based on local
replication rates I(x,t) have successfully described the
replication process in budding and fission yeast, in Xenopus embryos and in the Igh locus of mouse pro-B cells
[6,8,11,28,30,38] (Paolo Norio, personal communication). A
limiting factor in this work is that each of the above
analyses involved a long-term collaboration between experimental biologists and modeling laboratories (the latter
from a variety of fields, including physics, engineering and
computer science). To broaden the use of quantitative
analyses of replication and to analyze the growing number
of data sets, it is important that the software and analysis
procedures be usable by non-specialists. The recent derivation of inversion formulas (A. Baker, PhD thesis, ENS
de Lyon, France, 2011) that give I(x,t) directly from data on
the local average replication fraction f(x,t) obtainable from
microarray or deep sequencing studies on synchronized
cell populations is a first step in that direction.
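Under the assumptions used by most of the models above (independent initiations, constant fork speed v, and no initiations before S phase starts), one way to see why a direct inversion exists is sketched here. It is an illustrative derivation, not necessarily the form given in the cited thesis.

```latex
% Assumptions: independent initiations at rate I(x,t) per unit unreplicated
% length, constant fork speed v, and I = 0 before S phase starts (t = 0).
\begin{align}
s(x,t) &\equiv -\ln\left[1 - f(x,t)\right]
        = \int_{0}^{t}\!\int_{-\infty}^{\infty} I(x',t')\,
          \Theta\!\left(t - t' - \frac{|x - x'|}{v}\right)\mathrm{d}x'\,\mathrm{d}t' .
\end{align}
% (1/2v) Theta(t - |x|/v) is the retarded Green's function of the 1D wave
% operator \partial_t^2 - v^2 \partial_x^2, so applying that operator to s gives
\begin{align}
I(x,t) = \frac{1}{2v}\left(\frac{\partial^2}{\partial t^2}
         - v^2\frac{\partial^2}{\partial x^2}\right) s(x,t)
       = -\frac{v}{2}\left(\frac{1}{v^2}\frac{\partial^2}{\partial t^2}
         - \frac{\partial^2}{\partial x^2}\right)\ln\left[1 - f(x,t)\right].
\end{align}
```

Because the inversion takes second derivatives of measured data, noise is strongly amplified, so in practice f(x,t) would have to be smoothed or fit before such a formula is applied.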
A second research direction is a more precise understanding of the relation between the replication program,
as described above, and the effects of DNA damage, with its
concomitant activation of DNA repair mechanisms. For
example, one consequence of damage that stalls replication
forks is the activation of additional origins, which now have
more time to initiate [73,74], an effect that is straightforward to simulate [75] and model analytically [76]. The
modeling of fork stalls predicts that there is a critical
density of stalled forks (approximately one per replicon),
above which there is a global delay in S phase and below
which the effects are minor and localized. Interestingly,
this threshold density matches the observed stall densities
in fragile zones and in cells with activated oncogenes [76].
However, DNA damage can also induce checkpoints that
inhibit subsequent origin firing [77], complicating the
overall effect of DNA damage on replication timing. A
related topic is the interrelation between mutation rates
and events in S phase. Although formal models to handle
such situations are beginning to be developed [69], more
work is needed to understand observations, such as the
link between mutation rate and S phase timing [78].
Although the independent origin hypothesis is attractive in its simplicity and so far remarkably successful in its
application, there is evidence for correlated initiations in
somatic metazoan cells. Some of the correlation can be explained as a straightforward consequence of the physical constraints imposed by clustering polymerases. In such a view, the
primary method of controlling timing in S phase remains
the local modulation of overall initiation rates, and the
correlations in the initiation of neighboring origins are
produced by the geometrical effects of loops induced by
replication factories. Whether such mechanisms suffice or
whether a more complicated control mechanism is at play
is at present unclear. Time will tell.
References
JB has been supported by grants from NSERC (Canada) and the Human
Frontiers Science Program. NR has been supported by NIH grant
GM098815 and an American Cancer Society Research Scholar Grant.
1 Baker, T.A. and Wickner, S.H. (1992) Genetics and enzymology of DNA replication in Escherichia coli. Annu. Rev. Genet. 26, 447–477
2 Masai, H. et al. (2010) Eukaryotic chromosome DNA replication: where, when, and how? Annu. Rev. Biochem. 79, 89–130
3 Remus, D. et al. (2009) Concerted loading of Mcm2-7 double hexamers around DNA during DNA replication origin licensing. Cell 139, 719–730
4 Evrin, C. et al. (2009) A double-hexameric MCM2-7 complex is loaded onto origin DNA during licensing of eukaryotic DNA replication. Proc. Natl. Acad. Sci. U.S.A. 106, 20240–20245
5 Labib, K. (2010) How do Cdc7 and cyclin-dependent kinases trigger the initiation of chromosome replication in eukaryotic cells? Genes Dev. 24, 1208–1219
6 Herrick, J. et al. (2002) Kinetic model of DNA replication in eukaryotic organisms. J. Mol. Biol. 320, 741–750
7 Jun, S. and Bechhoefer, J. (2005) Nucleation and growth in one dimension. II. Application to DNA replication kinetics. Phys. Rev. E 71, 011909
8 Yang, S.C. et al. (2010) Modeling genome-wide replication kinetics reveals a mechanism for regulation of replication timing. Mol. Syst. Biol. 6, 404
9 Hamlin, J.L. et al. (2008) A revisionist replicon model for higher eukaryotic genomes. J. Cell. Biochem. 105, 321–329
10 Norio, P. et al. (2005) Progressive activation of DNA replication initiation in large domains of the immunoglobulin heavy chain locus during B cell development. Mol. Cell 20, 575–587
11 Gauthier, M.G. et al. (2012) Modeling inhomogeneous DNA replication kinetics. PLoS ONE 7, e32053
12 Huberman, J.A. and Riggs, A.D. (1968) On the mechanism of DNA replication in mammalian chromosomes. J. Mol. Biol. 32, 327–341
13 Blow, J.J. et al. (2001) Replication origins in Xenopus egg extract are 5–15 kilobases apart and are activated in clusters that fire at different times. J. Cell Biol. 152, 15–25
14 Pasero, P. et al. (2002) Single-molecule analysis reveals clustering and epigenetic regulation of replication origins at the yeast rDNA locus. Genes Dev. 16, 2479–2484
15 Shaw, A. et al. (2010) S-phase progression in mammalian cells: modelling the influence of nuclear organization. Chromosome Res. 18, 163–178
16 Audit, B. et al. (2009) Open chromatin encoded in DNA sequence is the signature of master replication origins in human cells. Nucleic Acids Res. 37, 6064–6075
17 Guilbaud, G. et al. (2011) Evidence for sequential and increasing activation of replication origins along replication timing gradients in the human genome. PLoS Comput. Biol. 7, e1002322
18 Rhind, N. et al. (2010) Reconciling stochastic origin firing with defined replication timing. Chromosome Res. 18, 35–43
19 Jun, S. et al. (2004) Persistence length of chromatin determines origin spacing in Xenopus early-embryo DNA replication: quantitative comparisons between theory and experiment. Cell Cycle 3, 223–229
20 McCune, H.J. et al. (2008) The temporal program of chromosome replication: genomewide replication in clb5Δ Saccharomyces cerevisiae. Genetics 180, 1833–1847
21 Patel, P.K. et al. (2006) DNA replication origins fire stochastically in fission yeast. Mol. Biol. Cell 17, 308–316
22 Czajkowsky, D.M. et al. (2008) DNA combing reveals intrinsic temporal disorder in the replication of yeast chromosome VI. J. Mol. Biol. 375, 12–19
23 Patel, P.K. et al. (2008) The Hsk1(Cdc7) replication kinase regulates origin efficiency. Mol. Biol. Cell 19, 5550–5558
24 Mantiero, D. et al. (2011) Limiting replication initiation factors execute the temporal programme of origin firing in budding yeast. EMBO J. 30, 4805–4814
25 Wu, P.Y. and Nurse, P. (2009) Establishing the program of origin firing during S phase in fission yeast. Cell 136, 852–864
26 Tanaka, S. et al. (2011) Origin association of sld3, sld7, and cdc45 proteins is a key step for determination of origin-firing timing. Curr. Biol. 21, 2055–2063
27 Spiesser, T.W. et al. (2009) A model for the spatiotemporal organization of DNA replication in Saccharomyces cerevisiae. Mol. Genet. Genomics 282, 25–35
28 de Moura, A.P. et al. (2010) Mathematical modelling of whole chromosome replication. Nucleic Acids Res. 38, 5623–5633
29 Luo, H. et al. (2010) Genome-wide estimation of firing efficiencies of origins of DNA replication from time-course copy number variation data. BMC Bioinform. 11, 247
30 Lygeros, J. et al. (2008) Stochastic hybrid modeling of DNA replication across a complete genome. Proc. Natl. Acad. Sci. U.S.A. 105, 12295–12300
31 Koutroumpas, K. and Lygeros, J. (2011) Modeling and analysis of DNA replication. Automatica 47, 1156–1164
32 Raghuraman, M.K. and Brewer, B.J. (2010) Molecular analysis of the replication program in unicellular model organisms. Chromosome Res. 18, 19–34
33 Hayano, M. et al. (2011) Mrc1 marks early-firing origins and coordinates timing and efficiency of initiation in fission yeast. Mol. Cell. Biol. 31, 2380–2391
34 Knott, S.R. et al. (2012) Forkhead transcription factors establish origin timing and long-range clustering in S. cerevisiae. Cell 148, 99–111
35 Herrick, J. et al. (2000) Replication fork density increases during DNA synthesis in X. laevis egg extracts. J. Mol. Biol. 300, 1133–1142
36 Lucas, I. et al. (2000) Mechanisms ensuring rapid and complete DNA replication despite random initiation in Xenopus early embryos. J. Mol. Biol. 296, 769–786
37 Labit, H. et al. (2008) DNA replication timing is deterministic at the level of chromosomal domains but stochastic at the level of replicons in Xenopus egg extracts. Nucleic Acids Res. 36, 5623–5634
38 Goldar, A. et al. (2008) A dynamic stochastic model for DNA replication initiation in early embryos. PLoS ONE 3, e2919
39 Gauthier, M.G. and Bechhoefer, J. (2009) Control of DNA replication by anomalous reaction–diffusion kinetics. Phys. Rev. Lett. 102, 158104
40 Harland, R.M. and Laskey, R.A. (1980) Regulated replication of DNA microinjected into eggs of Xenopus laevis. Cell 21, 761–771
41 Hyrien, O. and Méchali, M. (1993) Chromosomal replication initiates and terminates at random sequences but at regular intervals in the ribosomal DNA of Xenopus early embryos. EMBO J. 12, 4511–4520
42 Yang, S.C. and Bechhoefer, J. (2008) How Xenopus laevis embryos replicate reliably: investigating the random-completion problem. Phys. Rev. E 78, 041917
43 Graham, C.F. (1966) The regulation of DNA synthesis and mitosis in multinucleate frog eggs. J. Cell Sci. 1, 363–374
44 Goldar, A. et al. (2009) Universal temporal profile of replication origin activation in eukaryotes. PLoS ONE 4, e5899
45 Blumenthal, A.B. et al. (1974) The units of DNA replication in Drosophila melanogaster chromosomes. Cold Spring Harb. Symp. Quant. Biol. 38, 205–223
46 Lima-de-Faria, A. and Jaworska, H. (1968) Late DNA synthesis in heterochromatin. Nature 217, 138–142
47 Gilbert, N. et al. (2004) Chromatin architecture of the human genome: gene-rich domains are enriched in open chromatin fibers. Cell 118, 555–566
48 Schwaiger, M. and Schübeler, D. (2006) A question of timing: emerging links between transcription and replication. Curr. Opin. Genet. Dev. 16, 177–183
49 MacAlpine, D.M. et al. (2004) Coordination of replication and transcription along a Drosophila chromosome. Genes Dev. 18, 3094–3105
50 Hiratani, I. et al. (2009) Replication timing and transcriptional control: beyond cause and effect: part II. Curr. Opin. Genet. Dev. 19, 142–149
51 Lieberman-Aiden, E. et al. (2009) Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293
52 Ryba, T. et al. (2010) Evolutionarily conserved replication timing profiles predict long-range chromatin interactions and distinguish closely related cell types. Genome Res. 20, 761–770
53 Hayashi, M.T. and Masukata, H. (2011) Regulation of DNA replication by chromatin structures: accessibility and recruitment. Chromosoma 120, 39–46
54 Lebofsky, R. et al. (2006) DNA replication origin interference increases the spacing between initiation events in human cells. Mol. Biol. Cell 17, 5337–5345
84 Norio, P. and Schildkraut, C.L. (2001) Visualization of DNA replication on individual Epstein–Barr virus episomes. Science 294, 2361–2364
85 Kitamura, E. et al. (2006) Live-cell imaging reveals replication of individual replicons in eukaryotic replication factories. Cell 125, 1297–1308
86 van Oijen, A.M. and Loparo, J.J. (2010) Single-molecule studies of the replisome. Annu. Rev. Biophys. 39, 429–448
87 Riehn, R. et al. (2005) Restriction mapping in nanofluidic devices. Proc. Natl. Acad. Sci. U.S.A. 102, 10012–10016
88 Sidorova, J.M. et al. (2009) Microfluidic-assisted analysis of replicating DNA molecules. Nat. Protoc. 4, 849–861
89 Raghuraman, M.K. et al. (2001) Replication dynamics of the yeast genome. Science 294, 115–121
90 Woodfine, K. et al. (2004) Replication timing of the human genome. Hum. Mol. Genet. 13, 191–202
Glossary
Acheiropodia: an autosomal recessive disorder that results in severe truncations of the arms and legs, such that the distal extremities are absent.
Acrocapitofemoral dysplasia: a rare recessive condition characterized mainly
by short limbs, dwarfism and cone-shaped epiphyses at the joints, particularly in
the hands and hips.
Apical ectodermal ridge (AER): a specialized ectodermal structure that forms
along the distal edge of the limb bud and acts as a major signaling center
through the FGFs.
Brachydactyly: a condition that affects the length of the digits, making the
fingers and toes appear shorter.
Craniosynostosis Philadelphia type: craniosynostosis is a condition in which
one or more of the bony primordia of the infant skull prematurely ossify, thus
changing the growth pattern of the skull. The Philadelphia type is associated with
syndactyly of the hands and feet.
Preaxial and postaxial polydactyly: polydactyly means additional digits, and
pre- and postaxial refer to the side of the hand or foot on which the extra digit
appears. Preaxial is the thumb and big toe side, whereas postaxial is the
opposite side.
Syndactyly: a condition in which two or more digits are fused together.
Syndromic: a syndromic condition is characterized by having several
recognizable clinical features that occur together and are associated for
diagnosis. A nonsyndromic condition has a single clinical feature.
Triphalangeal thumb: whereas each finger has three phalanges (the small
bones of the digits), the thumb only has two. In this condition, the thumb has
an extra phalanx and often has the appearance of a finger.
Zone of polarizing activity (ZPA): an area of mesenchymal cells located along
the posterior margin of the limb bud that produces SHH. SHH patterns the early
limb bud along the AP axis, specifying digit identity and the number of digits
that will form.
ZPA regulatory sequence (ZRS): an approximately 800-bp cis-regulatory
sequence that is necessary and sufficient for the limb-specific expression of
the Shh gene.
0168-9525/$ – see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.tig.2012.03.012 Trends in Genetics, August 2012, Vol. 28, No. 8
Review
[Figure 1 panels (a)–(c): limb-bud schematics labeled AER (FGFs), GLI3R, GLI3A, HAND2, SHH, 5′ HOXD, ETV4/ETV5 and ETS1/GABPα]
Figure 1. Expression of genes that polarize the limb and regulate sonic hedgehog (Shh) expression in the zone of polarizing activity (ZPA). The earliest limb bud (a) is
polarized by the expression of GLI3 in the anterior (A) and by HAND2 (which downregulates GLI3) in the posterior (P). The expression of the 5′ Hoxd and Shh genes follows
Hand2 expression, and Shh is upregulated by HAND2 in the ZPA. Once SHH is produced (b), it maintains the expression of Hand2 and the 5′ Hoxd genes in a regulatory loop.
The gradient of GLI3A is shown below. (c) Distal production of ETV4/ETV5 and ETS1/GABPα in overlapping patterns. ETV4/ETV5 ensures that ectopic expression does not
occur in the wild-type limb, whereas ETS1/GABPα determines the position of the Shh expression boundary. Abbreviations: AER, apical ectodermal ridge; FGFs, fibroblast
growth factors 4, 8, 9 and 17.
[Figure 2 panels: (a) Drosophila pathway without and with HH signaling (HH, PTC, SMO, FU, COS2, SUFU, Ci, CI-R, CI-A, SLIMB, kinases, proteasome); (b) vertebrate pathway without and with SHH signaling (SHH, PTCH1, SMO, KIF7, SUFU, GRK2, β-ARRESTIN, KIF3A, β-TRCP, GLI3, GLI3-R, GLI3-A, kinases, proteasome)]
Figure 2. Conservation of the Hh signaling pathway. (a) Schematic representation of key components of the Drosophila HH signaling pathway in the absence (top) or
presence (bottom) of HH. In the absence of ligand, Patched (PTC) inhibits Smoothened (SMO), which is held in intracellular vesicles (yellow ovals). A complex of proteins,
including cubitus interruptus (CI), Costal2 (COS2) and several kinases, is established. Phosphorylation of CI establishes recognition signals for SLIMB, leading to partial
degradation of CI by the proteasome and formation of the repressor form (CI-R). CI-R then translocates to the nucleus, where it represses transcription of HH targets.
Binding of secreted HH to PTC blocks PTC activity and releases SMO from inhibition. SMO moves to the plasma membrane, where phosphorylation allows interaction with
COS2. Subsequent phosphorylation of COS2 by FU leads to release of unphosphorylated, full-length CI, which can translocate to the nucleus, where it promotes
transcriptional activation. (b) Schematic representation of key conserved components of the vertebrate HH signaling pathway. The cilium (which is absent in Drosophila) is
represented by the central axoneme and the centrosome and basal bodies (gray). In the absence of SHH ligand, PTCH1 inhibits SMO, which is held in intracellular vesicles.
GLI3 is kept in the primary cilium in a complex with KIF7 and SUFU. Phosphorylation of GLI3 by kinases allows its recognition by β-TRCP and leads to partial degradation by
the proteasome, resulting in the formation of the repressor molecule. Activation by SHH relieves the inhibition of SMO by PTCH1. SMO becomes phosphorylated by GRK2,
binds to β-ARRESTIN and KIF3A, and is trafficked to the cilium. This relieves the inhibitory effect of SUFU and allows full-length GLI3 to translocate to the nucleus and
activate target genes. Homologous genes in Drosophila and vertebrates are colored similarly.
[Figure 3 panels: (a) the Shh locus with Rnf32, Lmbr1 and the ZRS (800 kb label), an enlarged ZRS showing point-mutation positions including the Werner mesomelic syndrome sites, and a ZRS duplication; (b) human chromosome 7 (bands q22.1 and q36.3 marked) with the inversion breakpoint, shown relative to Shh, Rnf32, the ZRS and Lmbr1]
Figure 3. Mutations and chromosomal lesions in the Shh locus responsible for limb abnormalities. (a) The Shh gene and its upstream regulatory region (including enhancers
shown as pink boxes). The ZRS cis-activator is shown as a gray box inside the Lmbr1 gene, enlarged above to show the number and position of the point mutations that
cause preaxial polydactyly. Some of the other Shh enhancers are shown in pink. The positions [30] of the human (black), mouse (red), cat (green) and chicken (gray)
mutations are shown above the enlarged ZRS. The positions of the ETS (ETS1/GABPα) (green ovals) and ETV (ETV4/ETV5) (blue ovals) binding sites identified by biochemical
and chromatin immunoprecipitation methods [17] are shown below the ZRS. The positions of the Werner mesomelic syndrome mutations [29] are highlighted in blue. Below
the wild-type Shh locus is a representative intrachromosomal duplication that results in triphalangeal thumb–polysyndactyly (TPTPS). These duplications can be of various
sizes, such that the duplicated ZRSs can lie at various distances from each other. (b) Approximate positions of the breakpoints of the intrachromosomal inversion on
human chromosome 7. Below the chromosome is a representation of the position of the Shh gene before and after the inversion, showing that Shh is now regulated by
another limb enhancer, a process called 'enhancer adoption' [41].
different clinical classifications that show a robust genotype–phenotype correlation but comprise an overlapping
spectrum of digit abnormalities. These are preaxial polydactyly type II (PPD2, MIM# 174500), which includes
isolated triphalangeal thumb; triphalangeal thumb–polysyndactyly syndrome (TPTPS); syndactyly type IV
(SD4, MIM# 186200); and Werner mesomelic syndrome
(WMS) [13,25–32] (Figure 4). It has been suggested that
this group of limb defects should be collectively referred to
as ZRS-associated syndromes [29].
PPD2 is characterized by a triphalangeal thumb
(Figure 4), sometimes giving the appearance of a five-fingered hand, and in some cases it may be accompanied by
additional digits. Fifteen single-point mutations in the
human ZRS have been identified that are associated with
this limb abnormality (Figure 3). Extra toes have also been
frequently observed in other species, including mice
[13,33,34], cats [35] and chickens [36–38], and these abnormalities are associated with seven more point mutations in the ZRS. A mutation in polydactylous dogs was
found in a conserved domain upstream of the ZRS, called
the pre-ZRS; however, it is not clear how this domain
regulates Shh expression [39]. Much of the understanding
of the molecular mechanism underlying PPD2 comes from
studies in mice. A single nucleotide change in the sequence
of the ZRS is sufficient to generate ectopic production of
Shh, such that it is anomalously expressed at the opposite,
anterior margin of the limb bud [30,35,40]. Ectopic Shh
expression presumably produces an additional ZPA and,
consequently, affects the GLI3R:GLI3A ratio, leading to
respecification of the developing anterior digits. The phenotypic outcome is seen in some cases as the transformation of the thumb into a fifth finger, often accompanied by the
production of additional digits.
[Figure 4 panels: (a) normal hand, isolated triphalangeal thumb, preaxial polydactyly type 2, postaxial polydactyly type A and triphalangeal thumb–polysyndactyly; (b) Werner mesomelic syndrome; (c) acheiropodia]
Figure 4. Representative phenotypes for each of the limb abnormalities caused by misexpression of the Shh gene. (a) Types of digit abnormality of the hands caused by
misexpression of the Shh gene. Bones are represented along the top, and each digit is numbered; the triphalangeal thumb is labeled T, and digits that cannot be accurately
identified are labeled with an asterisk. Below are pictures of hands of patients with the various disorders [26,28,87,88]. Werner mesomelic syndrome [29] in (b) is associated
with short-limb dwarfism. The X-ray shows the tibial hypoplasia in the right leg (the white arrow indicates the end of the femur). A patient with acheiropodia [44] is shown in
(c), exhibiting the severe limb truncations that characterize this abnormality.
Mechanisms that give rise to anomalous expression of
Shh are being investigated. The two ETS factors that
regulate the SHH expression boundary play a central role
in generating polydactyly in two different families [19]. In
these families, ZRS point mutations were shown to give
rise to new, additional ETS1/GABPα binding sites, leading
to the upregulation of the ZRS in the posterior margin of
the limb bud, setting a wider boundary of expression and
causing ectopic expression at the anterior margin. Because
both Ets1 and Gabpa genes are expressed at the anterior
margin (in mice) and the ZRS is primed for expression in
this ectopic region [18], the additional binding sites are
sufficient to override the inhibition of Shh expression and
cause ectopic expression. Another point mutation that
changes transcription factor binding to the ZRS was
reported for a polydactylous mouse designated DZ [34].
In this case, the point mutation introduced a higher affinity
binding site in the ZRS recognized by the nuclear factor
HnRNP U, which was postulated to mediate the interaction between the cis-regulator and the 5′ end of the Shh
gene.
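The idea that a single base change can create a new transcription factor binding site can be illustrated with a simple motif scan. The Python sketch below searches both strands for the ETS core motif GGA(A/T) and reports cores present in a mutant but not in the wild type. The sequences are hypothetical toy examples, not ZRS sequence, and a 4-bp core is far less specific than the ETS1/GABPα sites characterized experimentally in [19]; the function names are illustrative.

```python
import re

# ETS core motif GGA(A/T); the lookahead allows overlapping matches.
ETS_CORE = re.compile(r"(?=(GGA[AT]))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def ets_core_sites(seq):
    """Return sorted (start, strand) pairs for GGA(A/T) cores on either strand.

    Starts are 0-based forward-strand coordinates of the 4-bp core.
    """
    seq = seq.upper()
    n = len(seq)
    hits = {(m.start(), "+") for m in ETS_CORE.finditer(seq)}
    rc = seq.translate(COMPLEMENT)[::-1]
    # A core starting at p on the reverse complement covers forward bases n-p-4 .. n-p-1
    hits |= {(n - m.start() - 4, "-") for m in ETS_CORE.finditer(rc)}
    return sorted(hits)


def gained_sites(wild_type, mutant):
    """Cores present in the mutant but absent from the wild type (equal lengths assumed)."""
    return sorted(set(ets_core_sites(mutant)) - set(ets_core_sites(wild_type)))
```

For the toy pair "TTAGCAGGCATGCA" (wild type) and "TTAGCAGGAATGCA" (one C>A change), gained_sites reports a single new forward-strand core starting at position 6.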
WMS (Figure 4) is an autosomal dominant disorder with
preaxial polydactyly of the hands and feet that also shows
the additional, distinctive characteristic of associated
dwarfism [13,29]. This condition appears to be at the severe
end of the phenotypic spectrum of ZRS mutations. The
short stature is the result of tibial hypoplasia (i.e. very
small or absent tibia). The molecular basis for this disorder
is also a point mutation, but at a specific position, nucleotide 404 (either a G>A or G>C change) (Figure 3), of the
ZRS. Again, this mutation is likely to have an effect on
transcription factor binding that is causative of the phenotype. Analysis of ZRS activity carrying the G>A mutation
by mouse transgenesis suggests that expression in the
ectopic domain occurs at a high level and extends broadly
along the anterior limb-bud margin [35]. This level of
ectopic SHH production may disrupt specification of the
tibia and affect chondrogenesis.
Recently, the genetic basis of a severe form of polysyndactyly (extra digits with fusion of digits, particularly of
the hands) was reported. Haas-type polysyndactyly
(syndactyly type IV) and TPTPS [27–29] (Figure 4) show a
consistent association with intrachromosomal duplications involving the genomic region that contains the
ZRS, leading to a tandem duplication of the ZRS (or
triplication in one patient) (Figure 3). The molecular
mechanism that gives rise to this limb phenotype is not
Ihh misregulation also causes limb abnormalities
Another Hh signaling factor, Ihh, is expressed in the
cartilage of the developing long bones in the limb. Here,
Ihh is expressed within the growth plate, where it is
responsible for regulation of chondrocyte proliferation
and differentiation [45]. Ihh is not expressed at the early
limb-bud stages when Shh is expressed in the posterior
mesenchyme, suggesting that Ihh has a role distinct from
that of Shh. Nevertheless, IHH and SHH operate through similar
signaling pathways, including regulation of the conserved
GLI transcription factors [46].
In humans, loss-of-function mutations in the IHH gene
result in the autosomal recessive condition acrocapitofemoral dysplasia (MIM# 607778) [47], while gain-of-function mutations of IHH result in brachydactyly type
A1 (MIM# 112500) [48,49]. Evidence suggests that the Ihh
gene has a regulatory landscape similar to that of Shh and that
Ihh is also under long-range regulatory control. This has
been highlighted through analysis of three families with
syndactyly type 1 (including some family members with
polydactyly) and craniosynostosis Philadelphia type
(MIM# 601222). This condition was found to map to a
single locus at 2q35. Further analysis revealed that all
three families contained distinct microduplications, but all
shared the same 9-kb region located within the intron of a
gene 40 kb upstream of IHH. This shared region contains a
putative distant regulator of IHH, a situation similar to
the ZRS duplications in TPTPS [50].
Disruption of the long-range regulation of Ihh is also
considered to be the cause of the polydactyly phenotype
seen in the Doublefoot (Dbf) mouse mutant. Dbf is an
autosomal dominant mutation that results in extreme
polydactyly of all four limbs, containing six to nine digits
on each paw that are triphalangeal and arise preaxially
[51,52]. Ihh is expressed ectopically within the mutant
limb bud across the AP axis, disrupting normal SHH
activity and overriding Shh expression usually driven by
the ZRS. A 600-kb deletion starting approximately 50 kb
upstream of Ihh underlies the Dbf phenotype. This region
is expected to contain a cis-acting regulatory element:
either a repressor of Ihh expression that is removed by the deletion or, alternatively, a cryptic enhancer
normally located beyond the deleted region that the deletion
moves into an activating position [53].
Gli3 mutants affect Hh signaling
The zinc finger-containing transcription factor GLI3 is the
ultimate target for Shh signaling in the early limb bud [54].
Heterozygous mutations in the GLI3 gene cause Greig
cephalopolysyndactyly syndrome (GCPS: MIM# 175700)
and Pallister–Hall syndrome (PHS: MIM# 146510), both
of which include polydactyly in the spectrum of disorders
[55,56]. In addition, in rare cases, GLI3 mutations cause
nonsyndromic polydactyly (MIM# 174700). The PHS and
GCPS phenotypes are clinically distinct and, as with the
Shh regulatory mutants, there is a robust genotype–phenotype correlation [57]. PHS is characterized by
central or insertional polydactyly, whereas GCPS exhibits
pre- or postaxial polydactyly (most commonly preaxial of the
feet and postaxial of the hands) with variable syndactyly
Box 2. Vertebrate Hh signaling in the cilium
The main difference between mammalian and Drosophila Hh
signaling is the central role played by cilia in mammals but not in
flies [6] (Figure 2, main text). Drosophila lacking cilia develop almost
normally, indicating that cilia are not required for Drosophila Hh
signaling [82]. In vertebrates, several steps from recognition of SHH
to the processing of GLI1–3 (here referred to as GLI) in the limb
involve the cilia and intraflagellar transport (IFT) [83]. The cilium is
maintained and extended by transport of particles along the axoneme
(reviewed in [60–62]). Transport of molecules toward the ciliary tip
via IFT is called anterograde trafficking (driven by kinesin motors),
and transport down the axoneme toward the ciliary base is called
retrograde trafficking (driven by dynein) [62,63].
Signal transduction takes place in the cilia, where PTCH1 is
located in the absence of the ligand and represses the function of
smoothened (SMO), which resides in the repressed state in
cytoplasmic vesicles [84]. Upon SHH binding, PTCH1 is
internalized and SMO is phosphorylated by G protein-coupled
receptor kinase 2 (GRK2). This phosphorylation promotes SMO
binding to β-arrestin and Kif3a, which is required for the trafficking of
SMO into the cilium, where it activates GLI.
Full-length GLIs are present in the cilia in a complex with the
anterograde IFT kinesin motor KIF7 [68]. SUFU promotes the
truncation of GLI into the repressor form (GLIR) and the retrograde
IFT-dynein motor enables GLIR to reach the nucleus. Activation of
SMO relieves the inhibition that SUFU exerts and promotes the
activator form of GLI (GLIA) [85,86]. This process is promoted by
KIF7, which may also block the function of SUFU. GLIA reaches the
nucleus and activates the transcription of Hh target genes, which
include PTCH.
In the absence of SHH signaling, the processing of GLIs requires
regulated proteolysis by the large multiprotein proteasome complex. The GLIs are sequentially phosphorylated by kinases, producing a phosphopeptide domain that is recognized by β-TrCP, which
recruits an SCF E3 ubiquitin ligase complex. Ubiquitination targets
GLI3 to the proteasome and initiates a limited degradation process,
allowing GLIR to be transported to the nucleus, where it inhibits
transcription [6].
15 Nissim, S. et al. (2007) Characterization of a novel ectodermal signaling center regulating Tbx2 and Shh in the vertebrate limb. Dev. Biol. 304, 9–21
16 Tarchini, B. and Duboule, D. (2006) Control of Hoxd genes' colinearity during early limb development. Dev. Cell 10, 93–103
17 Capellini, T.D. et al. (2006) Pbx1/Pbx2 requirement for distal limb patterning is mediated by the hierarchical control of Hox gene spatial distribution and Shh expression. Development 133, 2263–2273
18 Amano, T. et al. (2009) Chromosomal dynamics at the Shh locus: limb bud-specific differential regulation of competence and active transcription. Dev. Cell 16, 47–57
19 Lettice, L.A. et al. (2012) Opposing functions of the ETS factor family define Shh spatial expression in limb buds and underlie polydactyly. Dev. Cell 22, 459–467
20 Mao, J. et al. (2009) Fgf-dependent Etv4/5 activity is required for posterior restriction of Sonic Hedgehog and promoting outgrowth of the vertebrate limb. Dev. Cell 16, 600–606
21 Zhang, Z. et al. (2009) FGF-regulated Etv genes are essential for repressing Shh expression in mouse limb buds. Dev. Cell 16, 607–613
22 Zhang, Z. et al. (2010) Preaxial polydactyly: interactions among ETV, TWIST1 and HAND2 control anterior-posterior patterning of the limb. Development 137, 3417–3426
23 Fernandez-Teran, M. and Ros, M.A. (2008) The apical ectodermal ridge: morphological aspects and signaling pathways. Int. J. Dev. Biol. 52, 857–871
24 Hill, R.E. (2007) How to make a zone of polarizing activity: insights into limb development via the abnormality preaxial polydactyly. Dev. Growth Differ. 49, 439–448
25 Albuisson, J. et al. (2011) Identification of two novel mutations in Shh long-range regulator associated with familial pre-axial polydactyly. Clin. Genet. 79, 371–377
26 Gurnett, C.A. et al. (2007) Two novel point mutations in the long-range SHH enhancer in three families with triphalangeal thumb and preaxial polydactyly. Am. J. Med. Genet. A 143, 27–32
27 Klopocki, E. et al. (2008) A microduplication of the long range SHH limb regulator (ZRS) is associated with triphalangeal thumb-polysyndactyly syndrome. J. Med. Genet. 45, 370–375
28 Sun, M. et al. (2008) Triphalangeal thumb-polysyndactyly syndrome and syndactyly type IV are caused by genomic duplications involving the long range, limb-specific SHH enhancer. J. Med. Genet. 45, 589–595
29 Wieczorek, D. et al. (2010) A specific mutation in the distant sonic hedgehog (SHH) cis-regulator (ZRS) causes Werner mesomelic syndrome (WMS) while complete ZRS duplications underlie Haas type polysyndactyly and preaxial polydactyly (PPD) with or without triphalangeal thumb. Hum. Mutat. 31, 81–89
30 Furniss, D. et al. (2008) A variant in the sonic hedgehog regulatory sequence (ZRS) is associated with triphalangeal thumb and deregulates expression in the developing limb. Hum. Mol. Genet. 17, 2417–2423
31 Farooq, M. et al. (2010) Preaxial polydactyly/triphalangeal thumb is associated with changed transcription factor-binding affinity in a family with a novel point mutation in the long-range cis-regulatory element ZRS. Eur. J. Hum. Genet. 18, 733–736
32 Semerci, C.N. et al. (2009) Homozygous feature of isolated triphalangeal thumb-preaxial polydactyly linked to 7q36: no phenotypic difference between homozygotes and heterozygotes. Clin. Genet. 76, 85–90
33 Masuya, H. et al. (2007) A series of ENU-induced single-base substitutions in a long-range cis-element altering Sonic hedgehog expression in the developing mouse limb bud. Genomics 89, 207–214
34 Zhao, J. et al. (2009) HnRNP U mediates the long-range regulation of Shh expression during limb development. Hum. Mol. Genet. 18, 3090–3097
35 Lettice, L.A. et al. (2008) Point mutations in a distant sonic hedgehog cis-regulator generate a variable regulatory output responsible for preaxial polydactyly. Hum. Mol. Genet. 17, 978–985
36 Dunn, I.C. et al. (2011) The chicken polydactyly (Po) locus causes allelic imbalance and ectopic expression of Shh during limb development. Dev. Dyn. 240, 1163–1172
37 Maas, S.A. et al. (2011) Identification of spontaneous mutations within the long-range limb-specific Sonic hedgehog enhancer (ZRS) that alter Sonic hedgehog expression in the chicken limb mutants oligozeugodactyly and Silkie breed. Dev. Dyn. 240, 1212–1222
65 Haycraft, C.J. et al. (2005) GLI2 and GLI3 localize to cilia and require the intraflagellar transport protein polaris for processing and function. PLoS Genet. 1, e53
66 Hildebrandt, F. et al. (2011) Ciliopathies. N. Engl. J. Med. 364, 1533–1543
67 Dafinger, C. et al. (2011) Mutations in KIF7 link Joubert syndrome with Sonic Hedgehog signaling and microtubule dynamics. J. Clin. Invest. 121, 2662–2667
68 Liem, K.F., Jr et al. (2009) Mouse Kif7/Costal2 is a cilia-associated protein that regulates Sonic hedgehog signaling. Proc. Natl Acad. Sci. U.S.A. 106, 13377–13382
69 Endoh-Yamagami, S. et al. (2009) The mammalian Cos2 homolog Kif7 plays an essential role in modulating Hh signal transduction during development. Curr. Biol. 19, 1320–1326
70 Cheung, H.O. et al. (2009) The kinesin protein KIF7 is a critical regulator of Gli transcription factors in mammalian hedgehog signaling. Sci. Signal. 2, ra29
71 Zaghloul, N.A. and Katsanis, N. (2009) Mechanistic insights into Bardet–Biedl syndrome, a model ciliopathy. J. Clin. Invest. 119, 428–437
72 Ocbina, P.J. et al. (2011) Complex interactions between genes controlling trafficking in primary cilia. Nat. Genet. 43, 547–553
73 Gallet, A. (2011) Hedgehog morphogen: from secretion to reception. Trends Cell Biol. 21, 238–246
74 Murone, M. et al. (1999) Sonic hedgehog signaling by the patched–smoothened receptor complex. Curr. Biol. 9, 76–84
75 Aza-Blanc, P. et al. (1997) Proteolysis that is inhibited by hedgehog targets Cubitus interruptus protein to the nucleus and converts it to a repressor. Cell 89, 1043–1053
76 Hooper, J.E. and Scott, M.P. (2005) Communicating with Hedgehogs. Nat. Rev. Mol. Cell Biol. 6, 306–317
Department of Psychiatry and Behavioral Sciences, and Center for Therapeutic Innovation, University of Miami Miller School of Medicine, Miami, FL 33136, USA
² St Laurent Institute, Cambridge, MA 02139, USA
0168-9525/$ – see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.tig.2012.03.013 Trends in Genetics, August 2012, Vol. 28, No. 8
Figure 1. Epigenetic regulation induced by NATs. NATs regulate the epigenetic landscape of genomic loci from which they are transcribed (cis regulation). A specific
secondary structure permits the NAT to interact with different chromatin-modifying enzymes (green, red and purple shapes), thereby coordinating their action and directing
specific epigenetic modifications of the nearby chromatin (green and red flags). Locus specificity may be achieved through sequence-specific interactions between the NAT
and the DNA.
In mammalian females, one of the two X chromosomes is
inactivated via an RNA-based mechanism in which the
antisense ncRNA Xist, expressed from the X chromosome,
mediates the recruitment of polycomb repressive complex
2 (PRC2), which in turn catalyzes the heterochromatinization
of the entire X chromosome [21,49].
A similar mechanism of RNA-based epigenetic regulation of gene expression was found to silence various
imprinted mammalian alleles. Most imprinted mammalian genes associate in clusters [50], and the presence of
NATs is a common feature of these loci [26,51,52]. For
example, Air is an imprinted, paternally expressed
lncRNA transcribed from the second intron of the mouse
insulin-like growth factor 2 receptor (Igf2r) gene [53]. In
mouse placenta, expression of Air induces the epigenetic
silencing of both the paternal allele of Igf2r, from which Air
is expressed, and neighboring upstream genes. Although
the transcription unit of Air overlaps only with Igf2r, Air
recognizes and binds to the promoter regions of its neighboring genes. The molecular mechanisms underlying these
interactions have not been clarified and might rely on a
specific secondary structure adopted by Air or on the
involvement of mediator proteins. The interaction of Air ncRNA with the promoters of upstream genes in the cluster
results in the recruitment of the HMT G9a, which generates a repressive chromatin state [54]. The ability of Air to
silence non-overlapping genes in cis is reminiscent of Xist-induced X-chromosome inactivation. In the case of Xist,
however, epigenetic silencing spreads through the entire X chromosome, whereas for imprinted genes
it spreads only over a limited, albeit significant, portion of
the locus. The extent of the spread of epigenetic silencing
may be related to the presence of insulator elements in the
DNA sequence and their association with the CCCTC-binding factor (CTCF) [55], a multifunctional protein that
enables insulator function and facilitates higher-order
chromatin interactions [56].
Another interesting example of imprinting regulation is
the antisense ncRNA transcript Kcnq1ot1, which is transcribed from intron 10 of the imprinted gene Kcnq1 [57].
This paternally expressed NAT silences Kcnq1 in cis, as
well as neighboring genes on the paternal chromosome, by
controlling chromatin and DNA modifications at that locus
[58]. Kcnq1ot1 mediates the allele-specific deposition of the
repressive histone marks H3K27me3 and H3K9me3 by
directly interacting with the PRC2 components Ezh2 and
Suz12 and with the H3K9-specific HMT G9a [58,59]. As
with Air, the epigenetic changes caused by
Kcnq1ot1 occur outside the sequence boundary of this
lncRNA, emanating bidirectionally from the Kcnq1 locus.
Some of the imprinted genes in this cluster, although
silenced, lack Kcnq1ot1 enrichment [60].
These examples suggest that cis-acting NATs may remain
tethered to their sites of transcription while exerting their regulatory
function on neighboring genes via the recruitment of
different proteins and the organization of higher-order
chromatin structures. The presence or absence of insulator
elements may influence the extent of chromatin alterations at each locus [61]. In this hypothetical scenario, the
antisense transcript acts as a scaffold for the recruitment
of chromatin-modifying enzymes, initiating events that
Bidirectional transcription at the p21 locus generates an
antisense transcript and p21 mRNA. The p21 NAT
represses p21 mRNA in a process involving the deposition
of the repressive histone mark H3K27me3 [66]. This mechanism is AGO1-independent, further excluding involvement of endogenous small RNA mediators in the
process. Thus, depending on the cellular context, imbalanced expression of NATs can result in the silencing or
activation of partner protein-coding genes, providing an
interesting potential mechanism to explain the aberrant
upregulation or silencing of cancer-related genes.
Among the different tissues of the body, the brain expresses
ncRNAs in high abundance [67]. Discovered in the developing mouse forebrain, the NAT Evf2 is transcribed from
the ultra-conserved Dlx5/6 region encoding the homeodomain transcription factors DLX5 and DLX6 [68]. Evf2
forms a complex with the DLX-2 homeodomain protein
to function as a transcriptional coactivator that increases
Dlx5/6 enhancer activity [68]. Recently, studies of an Evf2
loss-of-function mouse revealed more complex regulatory
functions of this NAT in the development of GABAergic
interneurons [69]. Through antisense interference, Evf2
negatively regulates the expression of Dlx6 mRNA. Moreover, Evf2 exerts a silencing effect on Dlx5 by recruiting
DLX and the methyl CpG binding protein 2 (MECP2) to the
enhancer region [69]. Mutant Evf2 mice have reduced
numbers of GABAergic interneurons in the dentate gyrus
of the early postnatal hippocampus and reduced synaptic
inhibition in the adult hippocampus [69]. This study highlights the importance of NATs in regulating gene expression during neuronal maturation and raises the possibility
of a broader role for antisense transcripts in central
nervous system development.
Recent studies have shown that repeat expansion diseases are often
characterized by bidirectional transcription overlapping the repeat region [70]. Spinocerebellar ataxia type 7
(SCA7) is a neurological disorder associated with a polyglutamine-encoding CAG repeat expansion in the ataxin-7 gene
(ATXN7) [71]. SCAANT1 is a 1.4-kb NAT overlapping
the ATXN7 gene that is actively transcribed upon CTCF
binding to target sites flanking the CAG repeat region [72].
SCAANT1 expression is associated with an increased level
of the repressive H3K27me3 mark and a decreased level of
the activating histone H3 acetylation mark at the ATXN7
promoter. Pathological expansion of the CAG repeat is
accompanied by reduced expression of SCAANT1 ncRNA
and increased expression of ATXN7 mRNA, showing an
inverse relationship between the NAT and its partner
sense transcript [72]. This study reveals an interesting
NAT-based mechanism that is potentially involved in
SCA7 pathogenesis.
NATs can silence gene expression in cis, making them
attractive therapeutic targets to achieve specific upregulation of gene expression. It has recently been shown that
brain-derived neurotrophic factor (BDNF) is under the
epigenetic control of an antisense transcript, BDNF-AS
[73]. Depletion of BDNF-AS can alter chromatin marks at
the BDNF locus and upregulate locus-specific gene expression. This study also described NAT-mediated endogenous
gene suppression of glia-derived neurotrophic factor
(GDNF) and ephrin B2 receptor (EPHB2), suggesting that
pervasively transcribed in the human genome, and particularly those originating in the proximity of the transcriptional start sites (TSSs) of many active genes. However,
cell-, tissue- and developmental stage-specific transcription of
lncRNAs argues against the simplistic assumption that
these arise from transcriptional noise. Moreover, removal
of these ncRNAs often correlates with functional consequences. Aside from NATs, the human genome produces
many other classes of lncRNAs. For example, the analysis
of chromatin signatures revealed a family of over 1000
highly conserved lncRNAs, termed large intergenic noncoding RNAs (lincRNAs), which include sense and antisense
members with many potential regulatory functions [41].
RNA immunoprecipitation (RIP) of the PRC2 component
EZH2, followed by hybridization to a custom exon-tiling
array covering 900 human lincRNAs, showed that almost 30% of
expressed lincRNAs physically interact with PRC2 [76].
The co-immunoprecipitation of lncRNAs with EZH2 strongly
suggests that these transcripts function through
the PRC2 pathway. The catalog of lincRNAs encoded in the
human genome, as well as our understanding of their roles
in mediating the function of chromatin-modifying complexes, is rapidly expanding.
Unlike most NATs, lincRNAs exert their regulatory
roles in trans to alter chromatin shape and gene expression
at distant loci. HOTAIR is a lincRNA encoded in antisense
orientation in the HOX-C cluster on chromosome 12 that is
necessary for the correct expression of the HOX-D cluster
of genes on chromosome 2 [23]. HOTAIR associates with
PRC2 to silence and maintain a large heterochromatic domain
in the HOX-D gene cluster. Genomic
regions flanking HOX-D contain high levels of H3K27me3
and low levels of H3K4me2/3 [77]. It was shown in several
cellular systems that HOTAIR acts as a modular scaffold
for the recruitment of both PRC2 and LSD1, the catalytic
subunit of the CoREST/REST repressor complex; these in
turn coordinate H3K27 trimethylation and H3K4me2/3 demethylation, respectively, in trans at many
different target genomic regions [78]. Interestingly, altered
HOTAIR expression in primary breast tumors is a
powerful predictor of metastasis and poor prognosis [35].
Inhibition of HOTAIR expression in cancer cells reduces
invasiveness and metastatic potential, consistent with its
physiological function in dictating the chromatin states of
fibroblasts during development [35].
A loss-of-function study in mESCs functionally
characterized a large number of lincRNAs [32]. It was
shown that lincRNAs maintain the pluripotent state and
repress lineage programs in mESCs via trans-acting
mechanisms of global gene expression regulation. mESC
lincRNAs associate with 12 different chromatin complexes
involved in different aspects of epigenetic regulation, such
as writers (Tip60/P400, Prc2, Setd8, Eset, Suv39), readers
(Prc1, Cbx1, Cbx3) and erasers (Jarid1b, Jarid1c, Hdac1)
[32]. Seventy-four lincRNAs associate with at least one of
these complexes, and several lincRNAs associate with
functionally related chromatin complexes [32]. Because
lincRNAs physically associate with multiple chromatin-regulatory proteins, they may serve as scaffolds to
bridge similar complexes into larger functional
units.
[Figure: panels (a)–(c), each depicting a sense mRNA and an antisense RNA.]
sequences have been observed, thus suggesting that specific DNA motifs might be important for the recruitment of
these and other lncRNAs to their target genomic loci.
HOTAIR binding sites contain a GA-rich polypurine motif,
reminiscent of mammalian Polycomb response elements.
Notably, although HOTAIR binding sites overlap
with PRC2-bound and H3K27me3-marked chromatin regions, they are
restricted to small regions of a few hundred bp, raising the
possibility that HOTAIR nucleates PRC2 binding and
H3K27me3 spreading [82]. These data, together with
Concluding remarks
Although the examples of NAT and lncRNA mechanisms
described above suggest a broad continuum of function for
ncRNAs in epigenetic regulation, the exact roles and mechanisms of most of these molecules remain largely unknown. NATs have emerged as powerful transducers of
biological information, primarily due to their ability to
bridge the interaction between proteins and DNA [83].
The information content and structural features of these
ncRNAs collectively establish a dynamic interface with
other macromolecules [83], thus facilitating the formation
and modulation of ribonucleoprotein complexes crucial for
epigenetic signaling. These unique features permit NATs
and other lncRNAs to function as scaffolds to regulate
epigenetic mechanisms within the cell. The key to future
studies of lncRNAs will be to successfully integrate the
layers of knowledge gained from multiple genomic, transcriptomic, proteomic and epigenomic approaches to create
a multidimensional understanding of NATs within the
existing cellular framework [84].
Acknowledgments
The authors would like to thank Dr Chiara Pastori and Roya Pedram
Fatemi for helpful discussions and critical reading of the manuscript. The
research on long ncRNAs in C.W.'s laboratory is supported in large part by
grants from the U.S. National Institutes of Health (5R01NS063974 and
5R01MH084880). M.M.'s postdoctoral studies are supported by a fellowship
from the Swiss National Science Foundation (PBGEP3-136151).
References
1 Chi, P. et al. (2010) Covalent histone modifications – miswritten, misinterpreted and mis-erased in human cancers. Nat. Rev. Cancer 10, 457–469
2 Daniel, J.A. et al. (2005) Effector proteins for methylated histones: an expanding family. Cell Cycle 4, 919–926
3 Cao, R. and Zhang, Y. (2004) The functions of E(Z)/EZH2-mediated methylation of lysine 27 in histone H3. Curr. Opin. Genet. Dev. 14, 155–164
4 Shi, Y. et al. (2004) Histone demethylation mediated by the nuclear amine oxidase homolog LSD1. Cell 119, 941–953
5 Maurer-Stroh, S. et al. (2003) The Tudor domain Royal Family: Tudor, plant Agenet, Chromo, PWWP and MBT domains. Trends Biochem. Sci. 28, 69–74
6 Mellor, J. (2006) It takes a PHD to read the histone code. Cell 126, 22–24
7 Kouzarides, T. (2007) Chromatin modifications and their function. Cell 128, 693–705
8 van Steensel, B. (2011) Chromatin: constructing the big picture. EMBO J. 30, 1885–1895
9 Schübeler, D. (2010) Chromatin in multicolor. Cell 143, 183–184
10 Filion, G.J. et al. (2010) Systematic protein location mapping reveals five principal chromatin types in Drosophila cells. Cell 143, 212–224
11 Kharchenko, P.V. et al. (2011) Comprehensive analysis of the chromatin landscape in Drosophila melanogaster. Nature 471, 480–485
12 Ernst, J. and Kellis, M. (2010) Discovery and characterization of chromatin states for systematic annotation of the human genome. Nat. Biotechnol. 28, 817–825
13 Gerstein, M.B. et al. (2010) Integrative analysis of the Caenorhabditis elegans genome by the modENCODE project. Science 330, 1775–1787
14 Bonasio, R. et al. (2010) Molecular signals of epigenetic states. Science 330, 612–616
15 Bernstein, E. and Allis, C.D. (2005) RNA meets chromatin. Genes Dev. 19, 1635–1655
16 Rodriguez-Campos, A. and Azorin, F. (2007) RNA is an integral component of chromatin that contributes to its structural organization. PLoS ONE 2, e1182
17 Maison, C. et al. (2002) Higher-order structure in pericentric heterochromatin involves a distinct pattern of histone modification and an RNA component. Nat. Genet. 30, 329–334
18 Mondal, T. et al. (2010) Characterization of the RNA content of chromatin. Genome Res. 20, 899–907
19 Yap, K.L. et al. (2010) Molecular interplay of the noncoding RNA ANRIL and methylated histone H3 lysine 27 by polycomb CBX7 in transcriptional silencing of INK4a. Mol. Cell 38, 662–674
20 Yu, W. et al. (2008) Epigenetic silencing of tumour suppressor gene p15 by its antisense RNA. Nature 451, 202–206
21 Zhao, J. et al. (2008) Polycomb proteins targeted by a short repeat RNA to the mouse X chromosome. Science 322, 750–756
22 Martianov, I. et al. (2007) Repression of the human dihydrofolate reductase gene by a non-coding interfering transcript. Nature 445, 666–670
23 Rinn, J.L. et al. (2007) Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129, 1311–1323
24 Bierhoff, H. et al. (2010) Noncoding transcripts in sense and antisense orientation regulate the epigenetic state of ribosomal RNA genes. Cold Spring Harb. Symp. Quant. Biol. 75, 357–364
25 Wang, K.C. and Chang, H.Y. (2011) Molecular mechanisms of long noncoding RNAs. Mol. Cell 43, 904–914
26 Katayama, S. et al. (2005) Antisense transcription in the mammalian transcriptome. Science 309, 1564–1566
27 Carninci, P. et al. (2005) The transcriptional landscape of the mammalian genome. Science 309, 1559–1563
28 Birney, E. et al. (2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799–816
29 Clark, M.B. et al. (2011) The reality of pervasive transcription. PLoS Biol. 9, e1000625; discussion e1001102
30 Dinger, M.E. et al. (2008) Long noncoding RNAs in mouse embryonic stem cell pluripotency and differentiation. Genome Res. 18, 1433–1445
31 Ahfeldt, T. et al. (2012) Programming human pluripotent stem cells into white and brown adipocytes. Nat. Cell Biol. 14, 209–219
32 Guttman, M. et al. (2011) lincRNAs act in the circuitry controlling pluripotency and differentiation. Nature 477, 295–300
33 Ji, P. et al. (2003) MALAT-1, a novel noncoding RNA, and thymosin beta4 predict metastasis and survival in early-stage non-small cell lung cancer. Oncogene 22, 8031–8041
34 Faghihi, M.A. et al. (2008) Expression of a noncoding RNA is elevated in Alzheimer's disease and drives rapid feed-forward regulation of beta-secretase. Nat. Med. 14, 723–730
35 Gupta, R.A. et al. (2010) Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature 464, 1071–1076
36 Wright, M.W. and Bruford, E.A. (2011) Naming junk: human non-protein coding RNA (ncRNA) gene nomenclature. Hum. Genomics 5, 90–98
37 Kapranov, P. et al. (2007) RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science 316, 1484–1488
38 Ghildiyal, M. and Zamore, P.D. (2009) Small silencing RNAs: an expanding universe. Nat. Rev. Genet. 10, 94–108
39 Malone, C.D. and Hannon, G.J. (2009) Small RNAs as guardians of the genome. Cell 136, 656–668
40 Carthew, R.W. and Sontheimer, E.J. (2009) Origins and mechanisms of miRNAs and siRNAs. Cell 136, 642–655
68 Feng, J. et al. (2006) The Evf-2 noncoding RNA is transcribed from the Dlx-5/6 ultraconserved region and functions as a Dlx-2 transcriptional coactivator. Genes Dev. 20, 1470–1484
69 Bond, A.M. et al. (2009) Balanced gene regulation by an embryonic brain ncRNA is critical for adult hippocampal GABA circuitry. Nat. Neurosci. 12, 1020–1027
70 Batra, R. et al. (2010) Partners in crime: bidirectional transcription in unstable microsatellite disease. Hum. Mol. Genet. 19, R77–R82
71 Martin, J-J. (2012) Spinocerebellar ataxia type 7. In Handbook of Clinical Neurology (Vol. 103; Ataxic Disorders) (Subramony, S.H. and Durr, A., eds), pp. 475–491, Elsevier
72 Sopher, B.L. et al. (2011) CTCF regulates ataxin-7 expression through promotion of a convergently transcribed, antisense noncoding RNA. Neuron 70, 1071–1084
73 Modarresi, F. et al. (2012) Inhibition of natural antisense transcripts in vivo results in gene-specific transcriptional upregulation. Nat. Biotechnol. (http://dx.doi.org/10.1038/nbt.2158)
74 Straub, T. and Becker, P.B. (2011) Transcription modulation chromosome-wide: universal features and principles of dosage compensation in worms and flies. Curr. Opin. Genet. Dev. 21, 147–153
75 Ilik, I. and Akhtar, A. (2009) roX RNAs: non-coding regulators of the male X chromosome in flies. RNA Biol. 6, 113–121
Institute of Cardiovascular and Medical Sciences, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow G12 8TA, UK
² Harvard Medical School, Massachusetts General Hospital, Broad Institute of Harvard University and Massachusetts Institute of Technology, Boston, MA 02114, USA
0168-9525/$ – see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.tig.2012.04.001 Trends in Genetics, August 2012, Vol. 28, No. 8
the African-American population in the USA than in either Afro-Caribbean or native black African populations [54–56]. In some
societies, BP shows only a small age-related increase, which may be
related in part to an agrarian lifestyle, the high-potassium,
low-sodium diet of hunter-gatherers, a more rural lifestyle and
lower food consumption [57–60]. From an evolutionary perspective,
essential HTN is a disease of civilization, with its abundance of
processed foods and long lifespans, and could be an undesirable
pleiotropic effect of a genotype that may have optimized fitness in an
ancient environment [61]. Rates of HTN and sodium sensitivity are
generally higher in individuals carrying the ancestral alleles of
sodium-conserving genes. These genes show strong latitudinal clines, with
the ancestral sodium-conserving alleles more prevalent in African
populations and less prevalent in northern regions [62–64].
[Figure I: two panels, (a) Platt and (b) Pickering, each plotting population frequency against blood pressure with the hypertension range marked; panel annotations read 'Gene mutation absent', 'Gene (0-n)' and 'Environment (0-n)'.]
Figure I. The Platt–Pickering debate about the quantitative or qualitative nature of HTN. (a) Platt argued that HTN occurred in a discrete subpopulation and was caused by a single, heritable genetic mutation. (b) Pickering suggested that there was a range of BP levels in the population, and that there was no clear dividing line between hypertension and normotension; instead, HTN represented the end of a continuum and was therefore polygenic in origin.
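The contrast between the two positions can be illustrated with a small simulation. This sketch does not come from either author. The function names and every parameter (carrier frequency, effect sizes, number of loci, noise) are hypothetical values chosen only to make the shapes of the two distributions visible. Under these assumptions, a single major-effect mutation gives a bimodal BP distribution with a trough between the modes. Summing many small additive allele effects plus environmental noise instead gives one continuous, approximately normal distribution.

```python
import random
import statistics


def simulate_platt(n, carrier_freq=0.2, base=120.0, effect=40.0, sd=8.0, seed=1):
    """Platt-style model (hypothetical parameters): one major-effect
    mutation shifts the BP of carriers upward by `effect` mmHg."""
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        carrier = rng.random() < carrier_freq
        values.append(base + (effect if carrier else 0.0) + rng.gauss(0.0, sd))
    return values


def simulate_pickering(n, n_loci=50, allele_freq=0.5, effect=0.8,
                       base=100.0, env_sd=8.0, seed=1):
    """Pickering-style model (hypothetical parameters): BP is the sum of
    many small additive allele effects plus an environmental term."""
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        # Each locus carries 0, 1 or 2 BP-raising alleles.
        alleles = sum((rng.random() < allele_freq) + (rng.random() < allele_freq)
                      for _ in range(n_loci))
        values.append(base + effect * alleles + rng.gauss(0.0, env_sd))
    return values


def histogram(values, lo, hi, width):
    """Counts per fixed-width bin over [lo, hi); values outside are ignored."""
    counts = [0] * int((hi - lo) / width)
    for v in values:
        i = int((v - lo) // width)
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts


if __name__ == "__main__":
    platt = simulate_platt(10_000)
    pickering = simulate_pickering(10_000)
    print("Platt mean:", round(statistics.mean(platt), 1))
    print("Pickering mean:", round(statistics.mean(pickering), 1))
    # 5 mmHg bins from 80 to 200 mmHg; indices 8, 12, 16 start at 120, 140, 160 mmHg.
    for label, data in (("Platt", platt), ("Pickering", pickering)):
        h = histogram(data, 80, 200, 5)
        print(label, "bins at 120/140/160 mmHg:", h[8], h[12], h[16])
```

With these settings, the Platt histogram dips at 140 mmHg between a large normotensive peak and a smaller carrier peak, whereas the Pickering histogram peaks there. The choice of parameters determines how clear the contrast is, so the sketch illustrates the logic of the debate rather than either model's real-world fit.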
Table 1. Genetic loci associated with monogenic BP syndromes and identified through GWASᵃ
CHR
GWASᵇ
Monogenic syndrome
1p36.13
Gene/nearest
gene
CLCNKB
1p36.13
SDHB
1p36.2
MTHFR
(NPPA, NPPB)
1q23.3
SDHC
1q42.2
AGT
2q36.2
CUL3
3p25.3
VHL
3q22.1
ULK4
3q26.2
GBPG, ICBP
4q21.2
MECOM
(MDS1)
FGF5
4q31.2
NR3C2
5p15.3
SDHA
5p13.3
NPR3
5q31.2
KLHL3
6p22.2
HFE
7p22
Pathway
Notes
Renal electrolyte
balance
Paragangliomas 4
OMIM #115310
Sympathetic system
Autosomal recessive
Impaired chloride reabsorption in
the thick ascending loop of Henle
leads to impaired sodium
reabsorption
Low/normal BP
Multiple catecholamine-secreting
head and neck paragangliomas and
retroperitoneal
pheochromocytomas
Methylene-tetrahydrofolate
reductase; has been associated
with changes in plasma
homocysteine levels and preeclampsia. Atrial natriuretic and
brain natriuretic peptide genes
have been associated with
hypertension
Tumors or extra-adrenal
paraganglia-associated
pheochromocytoma
The cleaved products angiotensin I,
angiotensin II and angiotensin III
are known regulators of BP and
sodium homeostasis
Modulation of renal salt, K+ and H+
handling in response to
physiological challenge
Autosomal dominant
Associated with retinal, cerebellar,
and spinal hemangioblastoma,
renal cell carcinoma (RCC),
pheochromocytoma, and
pancreatic tumors
Serine-threonine kinase of
unknown function
Myelodysplasia syndrome
protein 1
Fibroblast growth factor 5;
stimulates cell growth and
proliferation and is associated with
angiogenesis
Autosomal dominant
Missense mutation (S810L) in the
mineralocorticoid receptor
Low-renin, low-aldosterone,
hypokalemia
CHARGE, GBPG,
AGEN, ICBP,
Gene-centric
Paragangliomas 3
OMIM #605373
Renal electrolyte
balance
Sympathetic system
Gene-centric
Pseudohypoaldosteronism
type IIE
OMIM *603136
von Hippel–Lindau syndrome
OMIM #193300
Renal electrolyte
balance,
vascular function
Renal electrolyte
balance
Sympathetic system
Hypertension exacerbation
in pregnancy
OMIM #605115
Renal electrolyte
balance
Pseudohypoaldosteronism
type I
OMIM #177735
Renal electrolyte
balance
Autosomal dominant
Renal unresponsiveness to
mineralocorticoids
Paragangliomas 5
OMIM #614165
Sympathetic system
Tumors or extra-adrenal
paraganglia-associated
pheochromocytoma
Natriuretic peptide clearance
receptor
Modulation of renal salt, K+ and H+
handling in response to
physiological challenge
Autosomal recessive
Iron metabolism
Autosomal dominant
Hyperaldosteronism due to
adrenocortical hyperplasia not
suppressed by dexamethasone
AGEN, ICBP,
Gene-centric
Pseudohypoaldosteronism
type IID
OMIM #614495
Hemochromatosis
OMIM #235200
Familial
hyperaldosteronism type 2
OMIM #605635
Renal electrolyte
balance
Renal electrolyte
balance
ICBP, Gene-centric
Steroid/aldosterone
synthesis
Table 1 (Continued )
Gene/nearest
gene
NOS3
Monogenic syndrome
GWAS b
Pathway
Notes
Pregnancy-induced
hypertension
OMIM +163729
HYPERGENES,
Gene-centric
Endothelial function
8q24.3
CYP11B1,
CYP11B2
8q24.3
CYP11B2
8q24.3
CYP11B1
Familial
hyperaldosteronism type 1
Glucocorticoid-remediable
aldosteronism (GRA)
OMIM #103900
Corticosterone
methyloxidase
II deficiency
OMIM #61060
Steroid 11β-hydroxylase
deficiency
OMIM #202010
10p12.3
CACNB2
10q11.2
RET
Multiple endocrine
neoplasia type IIA
OMIM #171400
10q24.3
CYP17A1
17α-hydroxylase and/or
17,20-lyase deficiency
OMIM *609300
11p15.1
PLEKHA7
CHARGE, ICBP
11p15.2
SOX6
Gene-centric
Transcription
11p15.5
LSP1/TNNT3
Gene-centric
?Endothelial
function
11q12.2
SDHAF2
Paragangliomas 2
OMIM #601650
Sympathetic
system
11q23.1
SDHD
Paragangliomas 1
OMIM #16800
Sympathetic
system
11q24.3
KCNJ1
Bartter syndrome,
antenatal, type 2
OMIM #241200
Hypertension with
Brachydactyly
Bilginturan syndrome
OMIM %112410
Pseudohypoaldosteronism
type IIC
Gordon's syndrome
OMIM #614492
Renal electrolyte
balance
CHR
7q36.1
12p12.2
12p12.3
WNK1
12q21.3
ATP2B1
Steroid/aldosterone
synthesis
Steroid/aldosterone
synthesis
Steroid/aldosterone
synthesis
CHARGE, ICBP
?Vascular/cardiac
function
Sympathetic
system
CHARGE, GBPG,
AGEN-BP, ICBP
Steroid/aldosterone
synthesis
Renal electrolyte
balance
?Vascular function
Autosomal dominant
Gain-of-function mutations in
WNK1
Low plasma renin, normal or
elevated K+
Encodes plasma membrane
calcium- or calmodulin-dependent
ATPase expressed in endothelium
Table 1 (Continued )
CHR
Monogenic syndrome
GWAS b
Pathway
Notes
?Endothelial
function
12q24.1
Gene/nearest
gene
SH2B3
12q24.2
TBX5–TBX3
15q21.1
SLC12A1
15q24.1
CSK
16p12.2
SCNN1B,
SCNN1G
16p12.3
UMOD
16q13
SLC12A3
Gitelman syndrome
OMIM #263800
Renal electrolyte
balance
16q22.1
HSD11B2
Apparent mineralocorticoid
excess
OMIM #218030
Steroid/aldosterone
synthesis
17q21.3
17q21.3
ZNF652
WNK4
20q13
GNAS–EDN3
CHARGE, ICBP
Bartter syndrome,
antenatal, type 1
OMIM #601678
Renal electrolyte
balance
CHARGE, GBPG,
AGEN-BP, ICBP
Liddle syndrome
OMIM #177200
Vascular function
Renal electrolyte
balance
BP-Extremes
?Renal electrolyte
balance
?Renal function
Pseudohypoaldosteronism
type IIB
Gordon's syndrome
OMIM #614491
GBPG, ICBP
Vascular function
ᵃThe key genes at each locus are shown with their known or potential role in BP regulation. The grey shaded rows indicate genes implicated in monogenic syndromes of high/low BP. The Pathway column is color-coded according to the pathway involved.
ᵇGWAS studies: AGEN [8], BP-Extremes [11], CHARGE [9], GBPG [10], Gene-centric [7], HYPERGENES [12], and ICBP [6].
pressure [6,25,26]; and proxies for rs1004467 show genome-wide significant associations with coronary artery disease, schizophrenia, intracranial aneurysm and parkinsonism [6,8–10,24,27–30]. This illustrates the challenges ahead when designing studies to functionally dissect these signals. Figure 1a also shows genes from Online Mendelian Inheritance in Man (OMIM) that are associated with monogenic syndromes and lie within the GWAS-related DNA segments shown. The only genes known to be associated with monogenic forms of high blood pressure that have also been identified by GWAS are cytochrome P450, family 17, subfamily A, polypeptide 1 (CYP17A1) and nitric oxide synthase 3 (NOS3).
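Assigning genes to a GWAS signal, as in Figure 1, starts with a simple interval query: which annotated genes overlap a window of ±50 kb around the index SNP? The sketch below is a minimal version under stated assumptions. It uses 1-based, inclusive gene coordinates on a single genome assembly. The `Gene` record and `genes_near_snp` helper are hypothetical names, and the gene names and positions in the example are placeholders rather than real annotations. A gene that overlaps the window is only a candidate, because the causal gene may lie outside it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def genes_near_snp(chrom, pos, genes, flank=50_000):
    """Names of genes on `chrom` whose span overlaps [pos - flank, pos + flank]."""
    lo, hi = pos - flank, pos + flank
    return [g.name for g in genes
            if g.chrom == chrom and g.start <= hi and g.end >= lo]


if __name__ == "__main__":
    # Placeholder annotation: names and coordinates are invented for illustration.
    annotation = [
        Gene("GENE_A", "10", 1_000_000, 1_020_000),
        Gene("GENE_B", "10", 1_080_000, 1_090_000),
        Gene("GENE_C", "10", 1_200_000, 1_210_000),
        Gene("GENE_D", "11", 1_000_000, 1_010_000),
    ]
    print(genes_near_snp("10", 1_050_000, annotation))  # ['GENE_A', 'GENE_B']
```

A linear scan is enough for a few dozen loci. Genome-wide annotation sets would normally be indexed by chromosome and sorted by position, or queried with an interval tree.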
Even once a SNP associated with HTN has been identified, it is difficult to pinpoint the gene involved. For
example, Figure 1b shows the genes within 50 kb on
Figure 1. Phenotypic, genetic and regulatory context of GWAS signals for blood pressure and hypertension. (a) Phenotypic landscape of GWAS signals in BP/HTN GWAS.
The strongest SNPs for BP and HTN also show very little overlap with genes involved in monogenic BP syndromes. Only CYP17A1 and NOS3 are associated with
monogenic BP syndromes and occur within 50 kb of BP GWAS SNPs. The strongest BP GWAS SNPs and their proxies are not associated exclusively with BP phenotypes
but show pleiotropy with non-BP traits that may point to plausible underlying pathways (for example, UMOD and its association with HTN and kidney function) or to novel common pathways, or may be independent associations. The rings from outer to inner represent: (1) chromosomal segments with GWAS SNPs (including 50 kb flanking region); (2) GWAS SNPs; (3) black markers on chromosomal segments indicating the locations of SNP proxies (r² > 0.8) for the index SNP in the region; (4) genes implicated in monogenic syndromes from OMIM present in the chromosomal regions; (5) non-BP phenotypes that showed genome-wide significance within these loci. (b) Genetic landscape of GWAS signals in BP/HTN GWAS. Only a few genes (NPPA, NOS3, UMOD) have been clearly linked to the strongest GWAS SNP, whereas many of the SNPs lie in gene-rich regions, highlighting the challenges ahead in fine-mapping and identifying the causative gene/variant. It is very likely that some GWAS SNPs influence the regulation of distant genes outside the 50 kb regions shown in this figure. Furthermore, the GWAS loci are also rich in copy-number variants and insertion/deletion variants that will need to be considered in the functional dissection of GWAS signals. The rings from outer to inner represent 1–5 as in (a); (6) shows structural variations present within the chromosomal regions. (c) Regulatory landscape of GWAS signals in BP/HTN GWAS. Bioinformatic analysis of GWAS BP SNP loci shows microRNA targets, conserved transcription factor binding sites, and epigenetic loci that may influence the genotype–phenotype association and offer another avenue for the design of molecular and functional experiments to elucidate the causal pathways. The SNP positions are indicated by red bars on the chromosome and are the same SNPs as shown in (a) and (b). The rings from outer to inner represent: (1) microRNA targets and associated genes; (2) chromosomal segments with GWAS SNPs (including 50 kb flanking region); (3) transcription factor binding sites conserved in the human/mouse/rat alignment in the chromosomal regions, using TFBS Conserved (tfbsConsSites) in the UCSC Browser and showing transcription factors with score > 800; (4) DNase hypersensitive areas assayed in a large collection of cell types; (5) predicted CpG islands. The height of the line indicates the length of the segment; (a–c) were generated using Circos [68] with the Feb 2009 (GRCh37/hg19) assembly data from the UCSC Genome Browser (http://genome.ucsc.edu/) [69].
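Figure 1 marks as proxies the SNPs with r² > 0.8 to the index SNP. For two biallelic SNPs with allele frequencies p_A and p_B and haplotype frequency p_AB, the standard linkage disequilibrium measures are D = p_AB − p_A·p_B and r² = D² / [p_A(1 − p_A) p_B(1 − p_B)]. The sketch below computes r² directly. The function names are hypothetical and the frequencies in the example are invented. In practice r² is estimated from phased reference-panel haplotypes rather than supplied by hand.

```python
def ld_r2(p_ab, p_a, p_b):
    """Squared correlation r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A (SNP 1) and allele B (SNP 2)
    p_a, p_b: frequencies of allele A and allele B
    """
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


def is_proxy(p_ab, p_a, p_b, threshold=0.8):
    """True when the SNP pair exceeds the r^2 proxy threshold used in Figure 1."""
    return ld_r2(p_ab, p_a, p_b) > threshold


if __name__ == "__main__":
    # Hypothetical haplotype frequencies for two SNPs with allele frequency 0.5.
    print(round(ld_r2(0.48, 0.5, 0.5), 4))  # 0.8464: would count as a proxy
    print(round(ld_r2(0.45, 0.5, 0.5), 4))  # 0.64: would not
```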
One striking result of the BP GWAS is that genes from highly plausible pathways are not represented near the identified SNPs (Figure 1). We used the GRAIL text-mining algorithm (Gene Relationships Across Implicated Loci [34]) to search for connectivity between genes near the associated SNPs, based on literature published before 2006 (that is, before the explosion of GWAS publications). Of the 41 BP GWAS loci, 14 contained underlying genes with significant relatedness (Figure 2), defined by the degree of similarity in the text describing them within article abstracts; this implies that the connected genes are involved in a common cellular process or pathway. These regions of GRAIL connectivity show the expected connection between NPPA/B and NPR3. When the GWAS SNPs lie in gene-rich regions, however, they also reveal connections that point to specific novel genes for follow-up studies. This is highlighted by rs805303, which lies in a very gene-rich locus and where a connection between NOTCH4
Figure 2. Representation of the connections between 41 BP GWAS SNPs and their corresponding genes using the GRAIL literature-based text-mining algorithm (Gene Relationships Across Implicated Loci [34]). This searches for connectivity between genes near the associated SNPs, based on existing literature (we selected literature published before 2006, before the surge of GWAS publications). The thickness of the red lines indicates the strength of the literature-based connectivity between the genes. This type of analysis supports known interactions but also suggests new connections that are worth following up in future studies.
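GRAIL scores relatedness from the similarity of the text describing each gene. The toy sketch below conveys only that core idea. It represents each gene's description as a bag-of-words vector and compares vectors by cosine similarity. It is not GRAIL's actual scoring procedure, which is more elaborate and includes a statistical assessment of significance. The gene names and one-line descriptions are hypothetical stand-ins for real abstracts.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts over lowercased alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse count vectors (0 = no shared terms)."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical one-line gene descriptions, not real abstracts.
    descriptions = {
        "GENE_X": "natriuretic peptide hormone regulating sodium excretion",
        "GENE_Y": "clearance receptor for natriuretic peptide hormone",
        "GENE_Z": "iron storage and iron metabolism",
    }
    vecs = {g: term_vector(d) for g, d in descriptions.items()}
    for a, b in (("GENE_X", "GENE_Y"), ("GENE_X", "GENE_Z")):
        print(a, b, round(cosine_similarity(vecs[a], vecs[b]), 3))
```

Real text-mining pipelines usually down-weight common words (for example with inverse document frequency). Without that step, words such as "and" or "for" can inflate similarity between unrelated genes.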
[Figure 3 plot: frequency (0.00–0.12) against number of BP-increasing alleles (30–50) for hypercontrols (age > 50 years; BP < 120/80; no prevalent CVD or incident CVD during 10-year follow-up) and a hypertensive population (age < 63 years; BP > 160/100).]
Figure 3. For the prediction of complex diseases, genotypes at multiple SNPs are often combined into scores (for example, scores calculated from the number of risk alleles carried). The number of BP-increasing alleles carried in the general population would be approximately normally distributed because each allele is inherited independently. The frequency distributions of the number of BP-increasing alleles at 35 GWAS SNPs in populations selected from the extremes of the BP distribution (top 9% and bottom 2%) [11] show a large overlap of scores, and most individuals from both phenotypic extremes lie in the middle of the distribution. This illustrates the fallacy of using risk scores from GWAS SNPs to identify individuals at high risk of hypertension. Abbreviation: CVD, cardiovascular disease.
(from 35 genome-wide significant GWAS SNPs) in hypercontrols and the extreme hypertensive cases are shown in Figure 3, illustrating the substantial overlap of cases and controls by genetic risk score despite the extreme phenotypic ascertainment. Using genetic risk scores constructed from up to 13 GWAS BP SNPs, a longitudinal study showed that individuals with the highest combined risk score had significantly higher diastolic BP at the age of nine years, and that the effect persisted from childhood into adulthood [35]. Genetic risk scores that included many non-genome-wide significant SNPs explained more of the variance than scores based only on highly significant SNPs, in both adults and children (1.2–1.7% in adults and 0.8–1.4% in children) [36].
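The unweighted score behind Figure 3 counts BP-increasing alleles (0, 1 or 2 per SNP). A common alternative weights each allele by its estimated effect size. The sketch below computes both. It also gives the expected mean and variance of the unweighted count, assuming Hardy–Weinberg equilibrium and no linkage disequilibrium between SNPs. Under those assumptions each SNP contributes a Binomial(2, p) count, so the total has mean Σ2p and variance Σ2p(1 − p). This sum of many independent counts is why the population distribution is close to normal. The function names, the 35 allele frequencies of 0.5 and the weights are hypothetical.

```python
import random
import statistics


def risk_score(genotypes, weights=None):
    """Genetic risk score from risk-allele counts (0, 1 or 2 per SNP).

    Unweighted: total number of risk alleles carried.
    Weighted: sum of count x per-allele effect size (e.g. mmHg per allele).
    """
    if weights is None:
        return sum(genotypes)
    if len(weights) != len(genotypes):
        raise ValueError("need exactly one weight per SNP")
    return sum(w * g for w, g in zip(weights, genotypes))


def expected_count_moments(freqs):
    """Mean and variance of the unweighted count, assuming Hardy-Weinberg
    equilibrium and independent SNPs (each SNP ~ Binomial(2, p))."""
    mean = sum(2 * p for p in freqs)
    variance = sum(2 * p * (1 - p) for p in freqs)
    return mean, variance


def simulate_genotypes(freqs, rng):
    """One individual's risk-allele counts, drawing two alleles per SNP."""
    return [(rng.random() < p) + (rng.random() < p) for p in freqs]


if __name__ == "__main__":
    freqs = [0.5] * 35  # hypothetical: 35 SNPs, risk-allele frequency 0.5
    rng = random.Random(0)
    scores = [risk_score(simulate_genotypes(freqs, rng)) for _ in range(20_000)]
    print("expected mean, variance:", expected_count_moments(freqs))
    print("simulated mean, variance:",
          round(statistics.mean(scores), 2), round(statistics.variance(scores), 2))
```

The simulated moments should match the analytical ones closely. The sketch shows only the construction of the score. It does not model how much BP variance the score explains, which the studies above estimated empirically.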
Novel pathways uncovered by GWAS
Highly correlated SNPs (r² > 0.9) at the 5′ end of UMOD have been independently identified in large GWAS of blood pressure extremes and kidney function [11,16]. The UMOD gene, expressed primarily in the thick ascending limb (TAL) of the loop of Henle, encodes the Tamm–Horsfall protein [THP/uromodulin (UMOD)]. This extracellular protein is attached to the luminal face of tubular epithelia by a glycosylphosphatidylinositol (GPI) anchor and is released into the urine by proteolytic cleavage. It is the most abundant tubular protein in urine. In the HTN study, the minor G allele of rs13333226 at the 5′ end of the UMOD gene is associated with a lower risk of HTN [OR (95% CI): 0.87 (0.84–0.91); P = 3.6 × 10⁻¹¹], 0.49 mmHg lower SBP (P = 2.6 × 10⁻⁵) and 0.30 mmHg lower DBP (P = 1.5 × 10⁻⁵), increased estimated glomerular filtration rate (eGFR) (3.6 ml/min per minor allele, P = 0.012), reduced
explain most of the missing heritability of BP. Although the
clinical applications of these findings will be limited given
the very low frequency of these variants in the population,
these studies should uncover novel pathways and provide a
deeper understanding of the genetic architecture of blood
pressure.
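Association results such as the UMOD estimate reported above [OR 0.87, 95% CI 0.84–0.91] can be sanity-checked by back-calculating an approximate P-value from the confidence interval. The sketch below assumes the log odds ratio is normally distributed, so the CI is symmetric on the log scale. Then SE = [ln(upper) − ln(lower)] / (2 × 1.96), z = ln(OR) / SE, and the two-sided P = 2Φ(−|z|). The published CI is rounded to two decimals, and the reported P may come from a different test or meta-analysis model. This approximation therefore will not reproduce the published P exactly and is only a consistency check. The function name is hypothetical.

```python
import math

Z_975 = 1.959963984540054  # standard normal quantile for a 95% two-sided CI


def p_from_or_ci(odds_ratio, lower, upper, z_crit=Z_975):
    """Approximate (SE of log OR, z, two-sided P) from an odds ratio and its CI.

    Assumes the CI was computed as exp(log OR +/- z_crit * SE).
    """
    if not 0 < lower <= odds_ratio <= upper or lower == upper:
        raise ValueError("expected 0 < lower <= OR <= upper with lower < upper")
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * Phi(-|z|)
    return se, z, p


if __name__ == "__main__":
    se, z, p = p_from_or_ci(0.87, 0.84, 0.91)
    print(f"SE(log OR) = {se:.4f}, z = {z:.2f}, P = {p:.1e}")
```

With these inputs z is about −6.8, far beyond the conventional genome-wide threshold of P < 5 × 10⁻⁸. This is consistent with the reported association even though the exact P differs.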
Epigenetics
Not all features of gene regulation are encoded in genes or
contained in the DNA sequence. MicroRNAs (miRs), histone modifications and DNA methylation have all been
investigated with regard to their role in BP gene regulation. The potential role of miRs in vascular smooth-muscle
biology and blood pressure is just beginning to be appreciated. Mice lacking miR-143 and miR-145 develop significant reductions in BP resulting from modulation of actin dynamics [42]. Intrarenal expression of miR-200a, miR-200b, miR-141, miR-429, miR-205 and miR-192 was found to be increased in hypertensive nephrosclerosis, and the degree of upregulation correlated with disease severity. The levels of these miR species correlate significantly with both proteinuria and GFR, suggesting a dose–response relationship between intrarenal miR expression and the severity of hypertensive nephrosclerosis [43]. Renin gene expression appears to be regulated by miR-181a and miR-663 [44]. The identification of these miRs may lead to the elucidation of pathways involved in HTN causation and to novel therapeutics. Recently, an observational study showed that human cytomegalovirus (HCMV) seropositivity and titers are positively associated with essential hypertension independently of other HTN risk factors [45]. The HCMV-encoded miR hcmv-miR-UL112 was highly expressed in hypertensive patients, pointing to a potentially novel pathway involved in HTN. This is supported by an animal study showing that infection with mouse cytomegalovirus alone can elevate blood pressure in mice [46]. Although this is an observational finding, it highlights the prospect of an abundance of pathways and risk factors that lead to the final common BP phenotype, and it may have implications for the discovery of new treatments.
Recently, renal sympathetic denervation has shown considerable promise in treating refractory HTN [47]. The sympathetic innervation of the kidney is implicated in the pathogenesis of HTN: it increases plasma renin activity, which leads to sodium and water retention, and it reduces renal blood flow (RBF). The procedure involves radiofrequency ablation of the renal sympathetic nerves and has produced remarkable reductions in BP, but the underlying mechanism is unclear. Histone modification has also been shown to play an important role in the epigenetic modulation of WNK4 transcription in the development of salt-sensitive HTN. Isoproterenol-induced transcriptional suppression of WNK4 was shown to be mediated via inhibition of histone deacetylase 8 (HDAC8) activity at the WNK4 promoter [48]. Reduced WNK4 expression can in turn stimulate the thiazide-sensitive Na+–Cl− cotransporter (NCC/SLC12A3), implying that sympathetic nerve activity can increase BP partly by activating NCC. The evidence that isoproterenol induces transcriptional suppression of WNK4 and leads to activation of NCC offers an opportunity to combine genomics, epigenomics and NCC detection in
because the molecular and functional dissection of the
novel variants will require more detailed low-throughput
science in contrast to the high-throughput screening
methods applied so far.
References
1 Evans, J.G. and Rose, G. (1971) Hypertension. Br. Med. Bull. 27, 37–42
2 Kearney, P.M. et al. (2005) Global burden of hypertension: analysis of worldwide data. Lancet 365, 217–223
3 Hottenga, J.J. et al. (2005) Heritability and stability of resting blood pressure. Twin Res. Hum. Genet. 8, 499–508
4 Kupper, N. et al. (2005) Heritability of daytime ambulatory blood pressure in an extended twin design. Hypertension 45, 80–85
5 Padmanabhan, S. et al. (2008) Hypertension and genome-wide association studies: combining high fidelity phenotyping and hypercontrols. J. Hypertens. 26, 1275–1281
6 Ehret, G.B. et al. (2011) Genetic variants in novel pathways influence blood pressure and cardiovascular disease risk. Nature 478, 103–109
7 Johnson, T. et al. (2011) Blood pressure loci identified with a gene-centric array. Am. J. Hum. Genet. 89, 688–700
8 Kato, N. et al. (2011) Meta-analysis of genome-wide association studies identifies common variants associated with blood pressure variation in East Asians. Nat. Genet. 43, 531–538
9 Levy, D. et al. (2009) Genome-wide association study of blood pressure and hypertension. Nat. Genet. 41, 677–687
10 Newton-Cheh, C. et al. (2009) Genome-wide association study identifies eight loci associated with blood pressure. Nat. Genet. 41, 666–676
11 Padmanabhan, S. et al. (2010) Genome-wide association study of blood pressure extremes identifies variant near UMOD associated with hypertension. PLoS Genet. 6, e1001177
12 Salvi, E. et al. (2012) Genomewide association study using a high-density single nucleotide polymorphism array and case–control design identifies a novel essential hypertension susceptibility locus in the promoter region of endothelial NO synthase. Hypertension 59, 248–255
13 Dominiczak, A.F. and Munroe, P.B. (2010) Genome-wide association studies will unlock the genetic basis of hypertension: pro side of the argument. Hypertension 56, 1017–1020
14 Kurtz, T.W. (2010) Genome-wide association studies will unlock the genetic basis of hypertension: con side of the argument. Hypertension 56, 1021–1025
15 Gudbjartsson, D.F. et al. (2010) Association of variants at UMOD with chronic kidney disease and kidney stones-role of age and comorbid diseases. PLoS Genet. 6, e1001039
16 Kottgen, A. et al. (2009) Multiple loci associated with indices of renal function and chronic kidney disease. Nat. Genet. 41, 712–717
17 Gudbjartsson, D.F. et al. (2009) Sequence variants affecting eosinophil numbers associate with asthma and myocardial infarction. Nat. Genet. 41, 342–347
18 Schunkert, H. et al. (2011) Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nat. Genet. 43, 333–338
19 Stahl, E.A. et al. (2010) Genome-wide association study meta-analysis identifies seven new rheumatoid arthritis risk loci. Nat. Genet. 42, 508–514
20 Dubois, P.C. et al. (2010) Multiple common variants for celiac disease influencing immune gene expression. Nat. Genet. 42, 295–302
21 Ganesh, S.K. et al. (2009) Multiple loci influence erythrocyte phenotypes in the CHARGE Consortium. Nat. Genet. 41, 1191–1198
22 Ikram, M.K. et al. (2010) Four novel Loci (19q13, 6q24, 12q24, and 5q14) influence the microcirculation in vivo. PLoS Genet. 6, e1001184
23 Teslovich, T.M. et al. (2010) Biological, clinical and population relevance of 95 loci for blood lipids. Nature 466, 707–713
24 Wain, L.V. et al. (2011) Genome-wide association study identifies six new loci influencing pulse pressure and mean arterial pressure. Nat. Genet. 43, 1005–1011
25 Pichler, I. et al. (2011) Identification of a common variant in the TFR2 gene implicated in the physiological regulation of serum iron levels. Hum. Mol. Genet. 20, 1232–1240
26 Chambers, J.C. et al. (2009) Genome-wide association study identifies variants in TMPRSS6 associated with hemoglobin levels. Nat. Genet. 41, 1170–1172
27 Coronary Artery Disease (C4D) Genetics Consortium (2011) A genome-wide association study in Europeans and South Asians identifies five new loci for coronary artery disease. Nat. Genet. 43, 339–344
28 Ripke, S. et al. (2011) Genome-wide association study identifies five new schizophrenia loci. Nat. Genet. 43, 969–976
29 Simon-Sanchez, J. et al. (2009) Genome-wide association study reveals genetic risk underlying Parkinson's disease. Nat. Genet. 41, 1308–1312
30 Yasuno, K. et al. (2010) Genome-wide association study of intracranial aneurysm identifies three new risk loci. Nat. Genet. 42, 420–425
31 Newton-Cheh, C. et al. (2009) Association of common variants in NPPA and NPPB with circulating natriuretic peptides and blood pressure. Nat. Genet. 41, 348–353
32 Manolio, T.A. et al. (2009) Finding the missing heritability of complex diseases. Nature 461, 747–753
33 Busst, C.J. et al. (2011) The epithelial sodium channel gamma-subunit gene and blood pressure: family based association, renal gene expression, and physiological analyses. Hypertension 58, 1073–1078
34 Raychaudhuri, S. et al. (2009) Identifying relationships among genomic disease regions: predicting genes at pathogenic SNP associations and rare deletions. PLoS Genet. 5, e1000534
35 Oikonen, M. et al. (2011) Genetic variants and blood pressure in a population-based cohort: the Cardiovascular Risk in Young Finns study. Hypertension 58, 1079–1085
36 Taal, H.R. et al. (2012) Genome-wide profiling of blood pressure in adults and children. Hypertension 59, 241–247
37 Renigunta, A. et al. (2011) Tamm–Horsfall glycoprotein interacts with renal outer medullary potassium channel ROMK2 and regulates its function. J. Biol. Chem. 286, 2224–2235
38 Boyden, L.M. et al. (2012) Mutations in kelch-like 3 and cullin 3 cause hypertension and electrolyte abnormalities. Nature 482, 98–102
39 Ji, W. et al. (2008) Rare independent mutations in renal salt handling genes contribute to blood pressure variation. Nat. Genet. 40, 592–599
40 Lifton, R.P. et al. (2001) Molecular mechanisms of human hypertension. Cell 104, 545–556
41 Eyre-Walker, A. (2010) Evolution in health and medicine Sackler colloquium: genetic architecture of a complex trait and its implications for fitness and genome-wide association studies. Proc. Natl. Acad. Sci. U.S.A. 107 (Suppl. 1), 1752–1756
42 Xin, M. et al. (2009) MicroRNAs miR-143 and miR-145 modulate cytoskeletal dynamics and responsiveness of smooth muscle cells to injury. Genes Dev. 23, 2166–2178
43 Wang, G. et al. (2010) Intrarenal expression of miRNAs in patients with hypertensive nephrosclerosis. Am. J. Hypertens. 23, 78–84
44 Marques, F.Z. et al. (2011) Gene expression profiling reveals renin mRNA overexpression in human hypertensive kidneys and a role for microRNAs. Hypertension 58, 1093–1098
45 Li, S. et al. (2011) Signature microRNA expression profile of essential hypertension and its novel link to human cytomegalovirus infection. Circulation 124, 175–184
46 Cheng, J. et al. (2009) Cytomegalovirus infection causes an increase of arterial blood pressure. PLoS Pathog. 5, e1000427
47 Esler, M.D. et al. (2010) Renal sympathetic denervation in patients with treatment-resistant hypertension (The Symplicity HTN-2 Trial): a randomised controlled trial. Lancet 376, 1903–1909
48 Mu, S. et al. (2011) Epigenetic modulation of the renal beta-adrenergic WNK4 pathway in salt-sensitive hypertension. Nat. Med. 17, 573–580
49 Ellison, D.H. and Brooks, V.L. (2011) Renal nerves, WNK4, glucocorticoids, and salt transport. Cell Metab. 13, 619–620
50 Zhang, D. et al. (2009) Epigenetics and the control of epithelial sodium channel expression in collecting duct. Kidney Int. 75, 260–267
51 Gerszten, R.E. and Wang, T.J. (2008) The search for new cardiovascular biomarkers. Nature 451, 949–952
52 Leitschuh, M. et al. (1991) High-normal blood pressure progression to hypertension in the Framingham Heart Study. Hypertension 17, 22–27
53 Franklin, S.S. et al. (1997) Hemodynamic patterns of age-related changes in blood pressure. The Framingham Heart Study. Circulation 96, 308–315
54 Burt, V.L. et al. (1995) Prevalence of hypertension in the US adult population. Results from the Third National Health and Nutrition Examination Survey, 1988–1991. Hypertension 25, 305–313
55 Kaminer, B. and Lutz, W.P. (1960) Blood pressure in Bushmen of the Kalahari Desert. Circulation 22, 289–295
56 Truswell, A.S. et al. (1972) Blood pressures of Kung bushmen in Northern Botswana. Am. Heart J. 84, 5–12
57 Poulter, N.R. et al. (1990) The Kenyan Luo migration study: observations on the initiation of a rise in blood pressure. BMJ 300, 967–972
58 Crews, D.E. and Mancilha-Carvalho, J.J. (1993) Correlates of blood pressure in Yanomami Indians of northwestern Brazil. Ethn. Dis. 3, 362–371
59 Carvalho, J.J. et al. (1989) Blood pressure in four remote populations in the INTERSALT Study. Hypertension 14, 238–246
60 Laville, M. et al. (1994) Epidemiological profile of hypertensive disease and renal risk factors in black Africa. J. Hypertens. 12, 839–843
61 Neel, J.V. (1962) Diabetes mellitus: a thrifty genotype rendered detrimental by progress? Am. J. Hum. Genet. 14, 353–362
62 Nakajima, T. et al. (2004) Natural selection and population history in the human angiotensinogen gene (AGT): 736 complete AGT sequences in chromosomes from around the world. Am. J. Hum. Genet. 74, 898–916
63 Weder, A.B. (2007) Evolution and hypertension. Hypertension 49, 260–265
64 Young, J.H. et al. (2005) Differential susceptibility to hypertension is due to selection during the out-of-Africa expansion. PLoS Genet. 1, e82
65 Pickering, G.W. (1955) The genetic factor in essential hypertension. Ann. Intern. Med. 43, 457–464
66 Oldham, P.D. et al. (1960) The nature of essential hypertension. Lancet 1, 1085–1093
67 Adeyemo, A. et al. (2009) A genome-wide association study of hypertension and blood pressure in African Americans. PLoS Genet. 5, e1000564
68 Krzywinski, M. et al. (2009) Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645
69 Kent, W.J. et al. (2002) The human genome browser at UCSC. Genome Res. 12, 996–1006
Editor
Rhiannon Macrae
Executive Editor
Feng Chen
Letter
361
Journal Manager
Basil Nyaku
Journal Administrators
Ria Otten and Patrick Scheffmann
Advisory Editorial Board
K.V. Anderson, New York, USA
A. Clark, Ithaca, USA
G. Fink, Cambridge, USA
W.J. Gehring, Basel, Switzerland
D. Goldstein, Durham, USA
L. Guarente, Cambridge, USA
Y. Hayashizaki, Yokohama, Japan
S. Henikoff, Seattle, USA
J. Hodgkin, Oxford, UK
H.R. Horvitz, Cambridge, USA
L. Hurst, Bath, UK
M. Justice, Houston, USA
E. Koonin, Bethesda, USA
E. Meyerowitz, Pasadena, USA
S. Moreno, Salamanca, Spain
C. Scazzocchio, Orsay, France
J. Smith, Cambridge, UK
M. Takeichi, Kobe, Japan
D. Tautz, Plön, Germany
O. Voinnet, Strasbourg, France
Editorial Enquiries
Trends in Genetics
Cell Press
Reviews
364
374
382
389
397
Sandosh Padmanabhan,
Christopher Newton-Cheh and
Anna F. Dominiczak
409
Erratum
417
Cover: During conjugation, members of the ciliate genus Oxytricha inherit a genome that looks like typical eukaryotic
chromatin but is replete with fragmented and scrambled genes. The subsequent developmental process produces
a rearranged somatic genome containing on the order of twenty million of the shortest known telomere-bearing
chromosomes. On pages 382–388, Aaron Goldman and Laura Landweber describe recent progress toward understanding
Oxytricha's genomic dimorphism and discuss its various implications for our understanding of ancient genome evolution and
early life. The cover shows an SEM image of Oxytricha, false-colored with Photoshop, courtesy of Bob Hammersmith.
Review
Mechanisms of transcriptional
precision in animal development
Mounia Lagha¹, Jacques P. Bothma² and Michael Levine¹
¹ Center for Integrative Genomics, Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
² Biophysics Graduate Group, Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
We review recently identified mechanisms of transcriptional control that ensure reliable and reproducible patterns of gene expression in natural populations of
developing embryos, despite inherent fluctuations in
gene regulatory processes, variations in genetic backgrounds and exposure to diverse environmental conditions. These mechanisms are not responsible for
switching genes on and off. Instead, they control the
fine-tuning of gene expression and ensure regulatory
precision. Several such mechanisms are discussed, including redundant binding sites within transcriptional
enhancers, shadow enhancers, and poised enhancers
and promoters, as well as the role of redundant gene
interactions within regulatory networks. We propose that such regulatory mechanisms confer fitness on populations and fine-tune the spatial and temporal control of gene expression.
Transcriptional precision
The basic mechanisms for switching genes on and off
during development were intensively studied in the
1980s and 1990s. The enhancer was shown to play a key
role in integrating complex regulatory information to generate cell-specific patterns of gene expression [1]. In natural populations, however, enhancer–promoter interactions can be affected by changes in temperature and by variation in genetic background, yet the developmental program remains unperturbed. What is the basis for this stability
in developmental programming?
Our central premise is that the mechanisms that stabilize gene expression in natural populations also produce greater precision in developmental patterning. By transcriptional precision we mean the formation of sharp borders of gene expression, the exact timing of gene activation, the coordinate expression of groups of genes within a developing tissue, and the homogeneous expression of a given gene across a field of coordinately developing cells. The advent of whole-genome technologies and improved imaging methods has provided recent insights into more subtle aspects of differential gene activity, namely the reproducible deployment of developmental programs in natural populations.
Glossary
Canalization: a measure of the ability of a population to produce the same
phenotype regardless of fluctuations in its environment, genotype or other
sources of variability. Our use of the term robustness conveys the same
essential meaning.
Enhancer: the predominant regulatory DNA for controlling gene expression. It
has the defining property of driving reporter expression in transgenic assays
from a heterologous promoter.
Gene regulatory network: interacting genes and their associated regulatory
DNAs that are responsible for a specific developmental process such as the
specification of gut or muscle.
Paused polymerase: RNA Pol II that has initiated transcription but arrests after producing a small nascent RNA of 30-50 nt. The paused Pol II is ready to go but needs additional regulators to undergo elongation.
Pioneer factor: a specialized, sequence-specific transcription factor (TF) that binds to nucleosomal DNA and prepares enhancers for rapid and timely deployment.
Poising: preparing genes for rapid and timely transcription. This can be
achieved by priming the promoter, the enhancer, or both.
Redundancy: two genes are considered to be redundant if they perform similar functions and are able to replace one another. The term can also be applied to genetic interactions, to enhancers, or to binding sites within enhancers. However, we do not believe in true redundancy. Instead, genes or regulatory DNAs might appear to possess redundant, or overlapping, activities in the laboratory, but not in natural populations subject to stress.
Shadow enhancer: an enhancer that is sometimes located far from the gene it regulates. The term 'shadow' is a metaphor reflecting that, historically, these distal enhancers tended to be discovered after the proximal, primary enhancer and in unexpected locations, such as the introns of neighboring genes.
0168-9525/$ see front matter . Published by Elsevier Ltd. doi:10.1016/j.tig.2012.03.006 Trends in Genetics, August 2012, Vol. 28, No. 8
[Figure 1 image: intestinal specification network in (a) wildtype and (b) end-3 mutant embryos: skn-1 → med-1/2 → end-3 and end-1 → elt-2 → intestinal differentiation; in (b), end-1 and elt-2 expression is variable.]
Figure 1. Redundant interactions in gene regulatory networks. Summary of the genetic cascade governing intestinal cell specification in C. elegans (see ref. [6]). (a) Wildtype network. skn-1 is maternally deposited and, in concert with other maternal and zygotic factors, activates the expression of transcription factors end-3 and end-1, both of
which activate elt-2, the key regulator of intestine differentiation. (b) In end-3 mutants, end-1 can compensate and intestine differentiation is essentially normal. However,
end-1 expression becomes significantly more variable, resulting in erratic expression of elt-2 and abnormal intestine differentiation in some individuals.
[Figure 2 image: the eve stripe 2 regulatory region within the eve BAC, showing the minimal enhancer and its extension. (a) Intact BAC: normal. (b) BAC lacking the minimal stripe 2 enhancer: non-viable. (c) BAC lacking the extension: reduced viability under stress.]
Figure 2. Importance of redundant binding sites for robustness. (a) Diagram of a BAC transgene containing the entire eve locus, including the 5′ and 3′ stripe enhancers. Only the stripe 2 regulatory region is shown. The full-length enhancer contains both the minimal 500 bp enhancer (green) and a 200 bp 3′ extension (blue). The yellow ovals represent a subset of the TF binding sites in the stripe 2 regulatory DNA. (b) Removal of the minimal eve stripe 2 enhancer results in lethality, and embryos die with defects in the thorax (derived from the region of stripe 2 expression). (c) Removal of the 3′ extension does not impair embryogenesis under optimal culturing conditions, and normal adult flies are obtained. However, under genetic stress, only 5% of the flies survive. Thus, redundant binding sites in the 3′ extension are required for robustness.
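[Figure 3 image: (a) primary and shadow enhancers with the same binding-site logic; (b) a single enhancer with a 10% failure rate, and two enhancers with 10% failure each giving a 1% combined failure rate.]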
Figure 3. Model for enhancer synergy. (a) Schematic showing that the primary and shadow enhancers (green boxes) possess the same regulatory logic (TF binding sites are illustrated by colored circles). (b) To activate transcription, an enhancer loops to its cognate promoter. Suppose this interaction fails 10% of the time. When two enhancers (primary and shadow) regulate the same gene at the same time, the combined failure rate is 1% (10% × 10% = 1%). This assumes that the two enhancers work independently of one another.
limb enhancer of the paired-box homeodomain transcription factor Prx has no obvious effect on Prx expression
levels or on limb development in mice [24], suggesting the
existence of additional, shadow enhancers. More recently,
4C assays (Box 1) identified multiple putative enhancers
for Hoxd13 expression within a distal gene desert that
contains known regulatory elements, GCR and Prox [25].
Deletions of GCR and Prox have little effect on Hoxd13 expression in digits, suggesting the presence of redundant regulatory elements. Indeed, Hoxd13 expression in digits is completely abolished only when the gene desert, together with the GCR and Prox regions, is deleted (830 kb deletion).
The preceding examples suggest that multiple enhancers represent a simple means for improving the reliability of gene expression. The underlying mechanism is uncertain, but multiple enhancers might increase the probability of gene activation at any given time during critical windows of development and make activation more robust to perturbation. For example, if a typical enhancer fails to loop to and engage its target promoter 10% of the time, and if the proximal and distal enhancers function more or less independently of one another, then the combined failure rate is only 1% (e.g. [12]). That is, the failure probabilities of the two enhancers multiply (Figure 3). Such a mechanism also provides robustness. For example, if the failure rate of each individual enhancer increases to 30% due to stress, the combined failure rate is only 9%.
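The 1% and 9% figures follow from multiplying independent failure probabilities, which a short script can confirm. This is a minimal sketch assuming, as the text does, that each enhancer fails independently; the function names `combined_failure` and `simulate_failure` are hypothetical, and the Monte Carlo part only cross-checks the closed-form product.

```python
import math
import random


def combined_failure(failure_rates):
    """Probability that all enhancers fail to engage the promoter.

    Assumes the enhancers act independently, so the joint failure
    probability is the product of the individual failure probabilities.
    """
    return math.prod(failure_rates)


def simulate_failure(failure_rates, trials=100_000, seed=0):
    """Monte Carlo check of combined_failure under the same independence assumption."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        # Gene activation fails only if every enhancer independently fails.
        if all(rng.random() < p for p in failure_rates):
            failures += 1
    return failures / trials


if __name__ == "__main__":
    scenarios = {
        "one enhancer, optimal": [0.10],
        "two enhancers, optimal": [0.10, 0.10],
        "one enhancer, stress": [0.30],
        "two enhancers, stress": [0.30, 0.30],
    }
    for label, rates in scenarios.items():
        print(f"{label}: exact {combined_failure(rates):.1%}, "
              f"simulated {simulate_failure(rates):.1%}")
```

The exact values are 10%, 1%, 30% and 9%, with simulated values close to them. If the two enhancers' failures were positively correlated (for example, both compromised by the same limiting activator), the real combined failure rate would be higher than the product.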
An alternative explanation is that multiple enhancers
ensure high levels of expression above a minimal threshold
required for genetic function (as suggested in the case of
ATOH7 regulation). In reality, multiple enhancers could be
important both for the reliable activation of gene expression and for maintaining high levels of expression. We still
do not understand the details of how an enhancer switches
on a gene and affects levels of expression, and therefore
this is very much an open question. The source of shadow
enhancers is uncertain, but it has been proposed that they
might arise from cryptic duplication events [18].
Rendering genes poised for activation
Timing is crucial in development, and recent studies have
identified several mechanisms that ensure faithful activation of gene expression upon receipt of key inducing signals.
[Figure 4 image: a poised enhancer and a poised promoter upstream of the first exon (+1). Key: enhancer, promoter, exon, nucleosome, Pol II (ser-5P), DSIF and Nelf, short nascent RNA (30 nt); the gray inset shows a nucleosome-free paused promoter.]
Figure 4. Summary of mechanisms of transcriptional priming. Gene transcription depends on enhancers (blue) and promoters (purple). The transcription start site (TSS) is indicated by an arrow labeled +1. The promoter can be primed, or poised, for transcription by the recruitment of Pol II before gene expression. Paused Pol II generates a short nascent RNA (around 30-50 nt), and further elongation is blocked by the binding of negative elongation factors such as Nelf and DSIF. The enhancer can be prepared for activation by the binding of pioneer factors (represented by gray boxes), by recruitment of Pol II, or by the modification of the chromatin landscape (positioned nucleosomes and associated histone marks). These three features at enhancers may be linked, but for simplicity we illustrate them sequentially. Nucleosomes are represented by hexagons and histone marks by colored flags. A simplified scheme of a paused promoter is shown in the gray box.
activation of gene expression [31]. The idea is that regulating Pol II release, rather than recruitment, permits
rapid induction of gene expression. This hypothesis has
been explored using detailed mathematical modeling of
transcription [32], but it still remains to be tested experimentally.
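The release-versus-recruitment argument can be made concrete with a toy kinetic sketch. This is not the model of [32]: activation is treated as a chain of sequential first-order steps with hypothetical rates (per minute), and a gene carrying pre-loaded, paused Pol II only has to complete the final release step after induction. The step names and rate values are illustrative assumptions.

```python
import math
import random
import statistics

# Hypothetical first-order rates (per minute) for the steps that remain after an
# inducing signal arrives. Values are illustrative, not measured.
RECRUITMENT_REQUIRED = {"recruitment": 0.5, "initiation": 1.0, "pause release": 2.0}
PAUSED_POL_II = {"pause release": 2.0}  # Pol II already engaged and paused


def latency_moments(rates):
    """Mean and SD of time to productive elongation.

    The latency is a sum of independent exponential waiting times, so
    mean = sum(1/k) and variance = sum(1/k**2).
    """
    mean = sum(1.0 / k for k in rates)
    sd = math.sqrt(sum(1.0 / k**2 for k in rates))
    return mean, sd


def sample_latencies(rates, n=50_000, seed=1):
    """Simulate n induction events by drawing one exponential waiting time per step."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(k) for k in rates) for _ in range(n)]


if __name__ == "__main__":
    for label, steps in [("recruitment required", RECRUITMENT_REQUIRED),
                         ("paused Pol II", PAUSED_POL_II)]:
        rates = list(steps.values())
        mean, sd = latency_moments(rates)
        sim = sample_latencies(rates)
        print(f"{label}: mean {mean:.2f} min (simulated {statistics.mean(sim):.2f}), "
              f"SD {sd:.2f} min (simulated {statistics.stdev(sim):.2f})")
```

With these made-up rates, skipping recruitment and initiation lowers the mean latency from 3.5 to 0.5 min and also narrows its absolute spread (SD about 2.29 versus 0.5 min), which is one way, in this toy model, that regulating release rather than recruitment could favor rapid induction.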
A nonexclusive alternative view is that paused Pol II is
involved in recruiting chromatin-modifying enzymes that
expedite transcription. For example, the chromatin landscape of the Hsp70 locus (the prototypic paused gene in
Drosophila) is rapidly altered following heat shock, through
a mechanism independent of transcription [33]. This rapid
change is key to the effective activation of Hsp70 expression
upon heat shock. Moreover, there is an inverse correlation
between paused Pol II and positioned nucleosomes at the
core promoter [34,35]. An increase in positioned nucleosomes has been observed upon destabilization of paused
Pol II (e.g. NelfE knockdown in S2 cells) [34]. Conversely, diminished levels of the Polycomb repressor (in esc mutant embryos) correlate with augmented levels of paused Pol II [35]. It would appear that the promoter regions of developmentally regulated genes contain either paused Pol II or
positioned nucleosomes, but the basis for this regulatory
switch is uncertain.
These studies raise the possibility that paused Pol II
might prepare genes for activation by establishing an
open configuration at the promoter. However, this possibility has not yet been critically tested.
Poised enhancers
There is also evidence that enhancers can be prepared for
rapid deployment before gene activation (Figure 4). For
example, the forkhead transcription factor FoxA binds to the Albumin enhancer in the primitive endoderm of mouse embryos, where the enhancer is still inactive (reviewed in [36]). FoxA is an
example of a pioneer factor [37]; it binds to inactive
enhancers and renders them poised for rapid induction
upon the appearance of key activators, such as those
mediating cell signaling.
Pioneer factors have the defining property of binding to nucleosomal DNA and compact chromatin, which allows them to engage inactive enhancers, and they can remain bound even during mitosis. Since the initial discovery of FoxA and GATA factors as pioneer factors in the liver differentiation program, additional examples have been described [38,39].
Zelda is a maternal zinc finger transcription factor that is essential for the activation of 100 genes 2-3 h after fertilization during Drosophila embryogenesis (the maternal-to-zygotic transition) [40-42]. It binds to the enhancer
regions of many or most developmental control genes
before their activation. Disrupting Zelda binding sites
can delay the onset of expression, or cause sporadic patterns of activation [40,41]. Thus, Zelda renders developmental enhancers poised for activation by maternal
determinants such as Bicoid and Dorsal, and may function
as a pioneer factor. It might also help ensure reliable
patterns of gene activation in natural populations under
stress, but this idea has not yet been tested.
The mechanisms by which pioneer factors prepare
enhancers for efficient activation are not known. It has
been suggested that they can displace nucleosomes and
thereby render adjacent binding sites available for occupancy [36,38]. A nonexclusive possibility is that pioneer
factors recruit chromatin-modifying enzymes that mark
enhancers for rapid deployment. For example, liver and pancreas enhancers exhibit active chromatin modifications in the mouse foregut endoderm, where they are still inactive [36]. This suggests pre-patterning of the
enhancers in progenitor tissues before their induction in
the liver and pancreas. The P300 histone acetyltransferase
and the EZH2 histone methyltransferase have been implicated in these modifications [43]. It is conceivable that such
modifications are not strictly required for gene expression,
but might improve the precision and stability of gene
expression in natural populations.
More recently it has been suggested that histone modifications and Pol II help to prime distal enhancers [44]
(Figure 4). In this study, whole-genome ChIP-seq assays were performed on isolated tissues obtained from staged Drosophila embryos. The timing of gene expression correlated with Pol II binding and two types of chromatin marks in enhancers. Pol II occupancy at enhancers is counterintuitive, but multiple studies, in human ES cells [45] and mice [45,46], suggest that enhancers can be bound by Pol II and are sometimes transcribed. Additional members of the general transcription machinery, such as the TATA binding protein-associated factor TAF3 [47], are also seen at particular enhancers. It was suggested that these factors might foster
looping interactions between distal enhancers and promoters, but it is currently unclear how Pol II and associated
factors might render enhancers poised for activation. It is
possible that they are recruited to enhancers by pioneer
TFs, but this idea awaits further studies.
When stochastic expression is purposeful
Many developmental patterning genes in Drosophila contain paused Pol II, shadow enhancers, or both. We have
discussed how these mechanisms might foster the precision and stability of gene expression in development. However, there are examples of developmental control genes
that exhibit sporadic or stochastic patterns of expression.
Some might exhibit such expression because there is no
selective pressure for them to be expressed in a precise and
synchronous manner. However, there are cases where
stochastic expression is used as a purposeful strategy for
generating regulatory diversity among the cells of a population [48]. One of the most striking examples is seen in the
eye of the adult fly [49-51].
Color vision depends on the differential expression of
rhodopsin-3 (Rh3) and Rh4 in the R7 photoreceptor cells
and the differential expression of Rh5 and Rh6 in the R8
photoreceptor cells. These differential patterns depend on
stochastic expression of spineless, which encodes a bHLH-PAS transcription factor (the Drosophila homolog of the aryl hydrocarbon receptor) that activates Rh4 in R7 [52].
Approximately 70% of the ommatidia express spineless,
but the patterns of activation differ among adult flies.
When spineless is expressed, Rh4 is activated in R7; if
not, Rh3 is expressed instead. The identity of these distinct
classes of R7 cells dictates the identities of the underlying
R8 cells. When spineless and Rh4 are expressed in R7, then
Rh6 is expressed in the associated R8 cell. Conversely,
when spineless is absent and Rh3 is expressed in R7, then
Box 3. Outstanding questions
How do multiple enhancers provide precision in gene expression: do they increase the levels or probability of expression?
Are genes with multiple enhancers more or less evolvable? Do shadow enhancers increase the probability of evolving novel gene activities?
How do pioneer factors prime enhancers?
How does paused Pol II prime the promoter?
When are imprecise, stochastic modes of gene activation advantageous in development?
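The stochastic spineless choice in R7 photoreceptors, and the R7-to-R8 coupling described in the main text, can be mimicked by a simple simulation. The sketch assumes that each ommatidium switches spineless on independently with probability 0.7 (the approximate fraction given in the text), that an adult eye has roughly 800 ommatidia (an approximate figure not taken from the text), and uses the established pairings of Rh4 with Rh6 and of Rh3 with Rh5; the helper names are hypothetical.

```python
import random
from collections import Counter

P_SPINELESS = 0.70   # approximate fraction of ommatidia expressing spineless (from the text)
N_OMMATIDIA = 800    # assumption: rough number of ommatidia per adult eye


def ommatidium(rng, p_spineless=P_SPINELESS):
    """Return the (R7, R8) rhodopsin pair for one ommatidium.

    Assumes each R7 cell turns spineless on independently of its neighbors.
    """
    if rng.random() < p_spineless:
        # spineless on: Rh4 in R7, and the associated R8 expresses Rh6.
        return ("Rh4", "Rh6")
    # spineless off: Rh3 in R7, paired with Rh5 in R8.
    return ("Rh3", "Rh5")


def eye_mosaic(seed=None, n=N_OMMATIDIA, p_spineless=P_SPINELESS):
    """Simulate one eye as a list of (R7, R8) rhodopsin pairs."""
    rng = random.Random(seed)
    return [ommatidium(rng, p_spineless) for _ in range(n)]


if __name__ == "__main__":
    for fly in range(3):
        counts = Counter(r7 for r7, _ in eye_mosaic(seed=fly))
        share = counts["Rh4"] / N_OMMATIDIA
        print(f"fly {fly}: {counts['Rh4']} Rh4/Rh6, {counts['Rh3']} Rh3/Rh5 ({share:.0%} Rh4)")
```

Each simulated fly gets a different random arrangement of ommatidial types but roughly the same 70:30 ratio, which is the sense in which the retinal mosaic is stochastic at the level of individual cells yet reproducible at the level of the population.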
33 Petesch, S.J. and Lis, J.T. (2008) Rapid, transcription-independent loss of nucleosomes over a large chromatin domain at Hsp70 loci. Cell 134, 74-84
34 Gilchrist, D.A. et al. (2010) Pausing of RNA polymerase II disrupts DNA-specified nucleosome organization to enable precise gene regulation. Cell 143, 540-551
35 Chopra, V.S. et al. (2011) The Polycomb group mutant esc leads to augmented levels of paused Pol II in the Drosophila embryo. Mol. Cell 42, 837-844
36 Zaret, K.S. and Carroll, J.S. (2011) Pioneer transcription factors: establishing competence for gene expression. Genes Dev. 25, 2227-2241
37 Watts, J.A. et al. (2011) Study of FoxA pioneer factor at silent genes reveals Rfx-repressed enhancer at Cdx2 and a potential indicator of esophageal adenocarcinoma development. PLoS Genet. 7, e1002277
38 Magnani, L. et al. (2011) Pioneer factors: directing transcriptional regulators within the chromatin environment. Trends Genet. 27, 465-474
39 Fakhouri, T.H.I. et al. (2010) Dynamic chromatin organization during foregut development mediated by the organ selector gene pha-4/FoxA. PLoS Genet. 6, e1001060
40 Liang, H.L. et al. (2008) The zinc-finger protein Zelda is a key activator of the early zygotic genome in Drosophila. Nature 456, 400-403
41 Nien, C.Y. et al. (2011) Temporal coordination of gene networks by Zelda in the early Drosophila embryo. PLoS Genet. 7, e1002339
42 Harrison, M.M. et al. (2011) Zelda binding in the early Drosophila melanogaster embryo marks regions subsequently activated at the maternal-to-zygotic transition. PLoS Genet. 7, e1002266
43 Xu, C.R. et al. (2011) Chromatin prepattern and histone modifiers in a fate choice for liver and pancreas. Science 332, 963-966
44 Bonn, S. et al. (2012) Tissue-specific analysis of chromatin state identifies temporal signatures of enhancer activity during embryonic development. Nat. Genet. 44, 148-156
45 Rada-Iglesias, A. et al. (2011) A unique chromatin signature uncovers early developmental enhancers in humans. Nature 470, 279-283
46 De Santa, F. et al. (2010) A large fraction of extragenic RNA pol II transcription sites overlap enhancers. PLoS Biol. 8, e1000384
47 Liu, Z. et al. (2011) Control of embryonic stem cell lineage commitment by core promoter factor, TAF3. Cell 146, 720-731
48 Eldar, A. and Elowitz, M.B. (2010) Functional roles for noise in genetic circuits. Nature 467, 167-173
49 Vasiliauskas, D. et al. (2011) Feedback from rhodopsin controls rhodopsin exclusion in Drosophila photoreceptors. Nature 479, 108-112
50 Johnston, R.J. et al. (2011) Interlocked feedforward loops control cell-type-specific rhodopsin expression in the Drosophila eye. Cell 145, 956-968
51 Jukam, D. and Desplan, C. (2010) Binary fate decisions in differentiating neurons. Curr. Opin. Neurobiol. 20, 6-13
52 Wernet, M.F. et al. (2006) Stochastic spineless expression creates the retinal mosaic for colour vision. Nature 440, 174-180
53 Dietrich, J.E. and Hiiragi, T. (2007) Stochastic patterning in the mouse pre-implantation embryo. Development 134, 4219-4231
54 Silva, J. and Smith, A. (2008) Capturing pluripotency. Cell 132, 532-536
55 Kalmar, T. et al. (2009) Regulated fluctuations in nanog expression mediate cell fate decisions in embryonic stem cells. PLoS Biol. 7, e1000149
56 Glauche, I. et al. (2010) Nanog variability and pluripotency regulation of embryonic stem cells: insights from a mathematical model analysis. PLoS ONE 5, e11238
57 Perry, M.W. et al. (2011) Multiple enhancers ensure precision of gap gene-expression patterns in the Drosophila embryo. Proc. Natl. Acad. Sci. U.S.A. 108, 13570-13575
58 Berman, B.P. et al. (2002) Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proc. Natl. Acad. Sci. U.S.A. 99, 757-762
59 Markstein, M. et al. (2002) Genome-wide analysis of clustered Dorsal binding sites identifies putative target genes in the Drosophila embryo. Proc. Natl. Acad. Sci. U.S.A. 99, 763-768
60 Valouev, A. et al. (2011) Determinants of nucleosome organization in primary human cells. Nature 474, 516-520
61 Giresi, P.G. and Lieb, J.D. (2009) Isolation of active regulatory elements from eukaryotic chromatin using FAIRE (formaldehyde assisted isolation of regulatory elements). Methods 48, 233-239
62 Gilmour, D.S. and Fan, R. (2009) Detecting transcriptionally engaged RNA polymerase in eukaryotic cells with permanganate genomic footprinting. Methods 48, 368-374
63 Nechaev, S. et al. (2010) Global analysis of short RNAs reveals widespread promoter-proximal stalling and arrest of Pol II in Drosophila. Science 327, 335-338
64 Core, L.J. et al. (2008) Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science 322, 1845-1848
Letter
1 Genetics Coordinating Center, Department of Biostatistics, University of Washington, Seattle, WA, USA
2 Center for Inherited Disease Research, Institute of Genetic Medicine, Johns Hopkins University, Baltimore, MD, USA
3 Broad Institute (Massachusetts Institute of Technology/Harvard), Cambridge, MA, USA
[Figure I image: SNP probe hybridized to its target sequence, shown for the (+) and (−) strands, with the design alleles T and G at the extension site.]
Figure I. A simplified schematic of the SNP probe, where the probe sequence is
in blue and the target sequence in black text. The design alleles (T or G) are the
fluorescently labeled nucleotides recruited to the allele probe in this two-color
primer-extension assay. Adapted from materials available on the Illumina
website (www.illumina.org).
[Table: strand annotations for rs216614]
Name: rs216614
IlmnStrand: BOT
SNP: [T/G]
TopGenomicSeq: ...CATCCC[A/C]TGCACA...
RefStrand
TOP: alleles A, C
Design: alleles T, G
Forward/reverse
The dbSNP resource of the US National Center for Biotechnology Information (NCBI) contains detailed information for each SNP in its database. Each refSNP (or rs) entry consists of one or more submitted SNP (or ss) records, each submitted by an individual laboratory. Each dbSNP record shows a flanking DNA sequence, which is simply taken from the submission with the longest flanking sequence [6,7]. SNP alleles reported on the same strand as this exemplar sequence in dbSNP are called forward alleles. Conversely, alleles on the opposite strand are called reverse alleles. Note that the dbSNP meaning of forward is easily confused with the (+) genomic strand, which has been referred to as the forward strand by the HapMap project [8,9].
Achieving strand consistency
The most basic level of strand consistency requires only
that genotypes are reported on the same DNA strand
across datasets. At strand-unambiguous SNPs, discrepant nucleotides are sufficient to identify strand inconsistencies (e.g., A/C in one dataset and T/G in another). However, at strand-ambiguous SNPs (A/T or C/G), complementing both alleles returns the same pair, so the nucleotides alone cannot reveal which strand was reported. Harmonizing strand-ambiguous SNPs therefore requires converting allele calls to a specific strand, according to one of the strand naming conventions described above. Given a nucleotide sequence with a SNP and its flanking bases (e.g.,
CATCCC[A/C]TGCACA) one can determine whether the
strand of that sequence is (i) plus or minus, by sequence
matching with the genomic reference sequence; (ii) TOP or
BOT, from the SNP itself or its flanking sequence [1]; and
(iii) forward or reverse, from the ss sequence record in
dbSNP. Determination of probe or target strand requires
additional information about assay design. In practice,
genotyping assay vendors generally supply annotations
[Table, continued: for rs216614, Forward alleles A, C; Plus alleles A, C]
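The nucleotide-level strand checks described in this Letter can be sketched as follows. This is a minimal illustration, not a vendor tool: the function names are hypothetical, alleles are assumed to be single bases of a biallelic SNP, and `illumina_top_bot` implements only the strand-unambiguous part of the TOP/BOT rule summarized in the Illumina technical note [1] (a pair containing A is TOP, a pair containing T is BOT). Strand-ambiguous A/T and C/G SNPs, which require walking the flanking sequence, are returned as undetermined.

```python
# Single-base complements for the four DNA nucleotides.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_alleles(alleles):
    """Return the allele set as read from the opposite strand."""
    return frozenset(COMPLEMENT[a] for a in alleles)


def is_strand_ambiguous(alleles):
    """True for A/T and C/G SNPs, whose allele pair is identical on both strands."""
    alleles = frozenset(alleles)
    return alleles == complement_alleles(alleles)


def strand_check(alleles_1, alleles_2):
    """Compare one biallelic SNP's alleles as reported by two datasets.

    Returns "same", "flipped" (reported on opposite strands), "mismatch", or
    "ambiguous" when the nucleotides alone cannot reveal the strand.
    """
    a, b = frozenset(alleles_1), frozenset(alleles_2)
    if is_strand_ambiguous(a):
        # A/T vs A/T (or C/G vs C/G) looks identical whichever strand was used.
        return "ambiguous" if a == b else "mismatch"
    if a == b:
        return "same"
    if complement_alleles(a) == b:
        return "flipped"
    return "mismatch"


def illumina_top_bot(alleles):
    """TOP/BOT designation for strand-unambiguous SNPs only.

    Every unambiguous pair (A/C, A/G, T/C, T/G) contains exactly one of A or T:
    pairs containing A are TOP, pairs containing T are BOT. Returns None for
    A/T and C/G SNPs, which need the flanking-sequence walk (not implemented).
    """
    alleles = frozenset(alleles)
    if len(alleles) != 2 or not alleles.issubset(COMPLEMENT):
        raise ValueError(f"expected two distinct nucleotides, got {sorted(alleles)}")
    if is_strand_ambiguous(alleles):
        return None
    return "TOP" if "A" in alleles else "BOT"


if __name__ == "__main__":
    # rs216614: TOP alleles A/C, design (BOT) alleles T/G.
    print(strand_check("AC", "TG"))   # flipped
    print(illumina_top_bot("AC"))     # TOP
    print(illumina_top_bot("TG"))     # BOT
    print(illumina_top_bot("AT"))     # None (strand-ambiguous)
```

For rs216614, comparing the TOP alleles A/C with the design alleles T/G reports a strand flip, consistent with the annotation IlmnStrand BOT and SNP [T/G]. Returning "ambiguous" rather than guessing for A/T and C/G SNPs reflects the point above: such SNPs can only be harmonized with extra information such as flanking sequence or a reference-strand annotation.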
References
1 Illumina Inc. (2006) TOP/BOT strand and A/B allele (Technical Note). http://www.illumina.com/documents/products/technotes/technote_topbot.pdf
2 Affymetrix Inc. (2012) Affymetrix genotyping glossary. http://www.affymetrix.com/support/help/genotyping_glossary/index.affx
3 Cherry, J.M. et al. (1998) SGD: Saccharomyces genome database. Nucleic Acids Res. 26, 73-79
4 Dunham, I. et al. (1999) The DNA sequence of human chromosome 22. Nature 402, 489-495
5 Cartwright, R.A. and Graur, D. (2011) The multiple personalities of Watson and Crick strands. Biol. Direct 6, 7
6 National Center for Biotechnology Information (2005) Sequence formatting in dbSNP reports. http://www.ncbi.nlm.nih.gov/books/NBK44414
7 Kitts, A.K. and Sherry, S. (2002) The single nucleotide polymorphism database (dbSNP) of nucleotide sequence variation. In The NCBI Handbook (McEntyre, J. and Ostell, J., eds), National Center for Biotechnology Information (Chap. 5). http://www.ncbi.nlm.nih.gov/books/NBK21101/
8 Frazer, K.A. et al. (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851-861
Erratum
It should read:
In European populations, genes that affect skin pigmentation (SLC24A5 and SLC45A2) have undergone positive selection.
We apologize to the readers of this article for this error.
0168-9525/$ - see front matter © 2012 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.tig.2012.05.003 Trends in Genetics, August 2012, Vol. 28, No. 8