

II. COMPOSITION AND STRUCTURE OF DNA AND CHROMOSOMES


II. 1. Composition of DNA
DNA (deoxyribonucleic acid), as a chemical molecule, was discovered in 1869 by Friedrich Miescher (a Swiss), who was at the time working in the laboratory of Hoppe-Seyler in Tübingen (Germany). DNA was in fact discovered as an annoying contaminant in the purification of proteins. In 1889 it was fully purified and found to be an acid, hence the name nucleic acid. Because DNA contains a lot of phosphate, it was first considered to be a cellular storage material for phosphate. Later on, Albrecht Kossel (another collaborator of Hoppe-Seyler) discovered that DNA contains four bases: two purines (adenine and guanine) and two pyrimidines (thymine and cytosine). Bases are heterocyclic rings made of carbon and nitrogen atoms. Slightly later Kossel also discovered the sugar moiety, the deoxyribose. With this, all the constituents of DNA were known: four different bases, phosphate and deoxyribose. In 1910 Kossel received the Nobel Prize for his discoveries. Soon thereafter, Phoebus Levene discovered that there are two kinds of nucleic acids, which he called yeast nucleic acid [now known as RNA (ribonucleic acid)] and DNA. At that time it was believed that RNA was present only in plants and DNA only in animals. This is not correct: both types occur in all living organisms. Viruses are the exception, since they generally carry only one of the two, DNA or RNA (either molecule can be single- or double-stranded). One known exception to this rule is the Cytomegalovirus, which carries one DNA molecule and four RNA molecules. The Cytomegalovirus is a kind of herpes virus that can infect the nervous system of children or of adults with a deficient immune system. Erwin Chargaff discovered that DNA does not contain equal amounts of the four bases and that the quantities differ depending on the source of the DNA. The G+C content can vary from as low as 26% to as high as 74% for DNA from different species.
Most importantly, however, Chargaff found that in every double-stranded DNA molecule the molar amount of adenine is always equal to that of thymine, and the molar amount of guanine is equal to that of cytosine. This relationship is now known as Chargaff's rule. This observation, combined with the X-ray diffraction patterns of DNA fibres obtained by Rosalind Franklin in 1951, was exploited by James Watson and Francis Crick to propose the double-helical structure of double-stranded DNA, published in Nature in 1953. Watson, Crick and Maurice Wilkins (a colleague of Rosalind Franklin, who had died in the meantime) received the Nobel Prize for this structure in 1962. The double helix provided the structural basis for the gene, which had remained a very abstract notion until then. What Mendel had supposed could now be explained in chemical terms. The double-helix model was also very important for the development of models for DNA replication, because it could explain how two daughter cells each obtain an exact copy of the genetic material of the original cell. This is explained by the semiconservative mode of DNA replication, which was confirmed by the experiment of Meselson and Stahl in 1958 (see further, Chapter III).
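Chargaff's rule can be checked on any duplex with a few lines of code. The following is a minimal sketch, assuming strict A-T and G-C pairing between the two strands (described in II. 3); the function names and the example sequence are illustrative and not taken from the course text.

```python
from collections import Counter

# Watson-Crick partners (assumed: no modified or ambiguous bases)
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand):
    """Return the complementary strand, written 5' -> 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def duplex_composition(strand):
    """Count each base over both strands of the duplex."""
    return Counter(strand.upper()) + Counter(reverse_complement(strand))


def gc_percent(strand):
    """G+C content of the duplex, in percent."""
    counts = duplex_composition(strand)
    total = sum(counts.values())
    return 100.0 * (counts["G"] + counts["C"]) / total


if __name__ == "__main__":
    top = "ATGCGGATTTACCGA"  # arbitrary example sequence
    counts = duplex_composition(top)
    print(dict(counts))
    print("A == T:", counts["A"] == counts["T"])
    print("G == C:", counts["G"] == counts["C"])
    print("%GC:", round(gc_percent(top), 1))
```

Whatever single strand is supplied, the duplex totals satisfy A = T and G = C, while the G+C percentage remains free to vary with the sequence, which is the pattern Chargaff observed across species.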

II. 2. Organization of the genetic material in chromosomes


Within the cell, DNA is associated with proteins. Each DNA molecule together with its associated protein molecules is called a chromosome. This holds for all kinds of organisms: viruses, prokaryotes and eukaryotes. The various proteins in chromatin perform a number of essential functions:
- (i) numerous DNA-binding proteins catalyse and regulate vital cellular processes such as DNA replication, transcription, DNA modification (methylation), repair and recombination;
- (ii) DNA-binding proteins compact the DNA;
- (iii) DNA-binding proteins protect the DNA from degradation (naked DNA is degraded much more rapidly by nucleases and is more sensitive to oxidative damage).

Prokaryotes and eukaryotes show important differences in the organization of their genetic material. Prokaryotes (Bacteria and Archaea) have no nucleus. Their DNA is not surrounded by a nuclear membrane and appears as a granular structure associated with the membrane, the nucleoid. Only approximately 3% of the nucleoid is DNA. In most cases the genetic material of a prokaryotic cell consists of one single circular DNA molecule, ranging from 1.55 Mbp (1,550 kbp; Picrophilus torridus, an acidophilic archaeon, currently the smallest genome of a free-living organism, i.e. not a parasite) to approximately 9.1 Mbp (Myxococcus xanthus, a δ-proteobacterium). Symbionts and parasites such as Mycoplasma and Nanoarchaeum have an even smaller genome of only 0.5 Mbp. When prokaryotic cells are dividing rapidly, portions of the chromosome in the process of replication are present in two or sometimes even four copies. Streptomyces lividans has a single linear chromosome. Some bacteria have more than one chromosome: for example, Sinorhizobium meliloti has three circular chromosomes and Agrobacterium tumefaciens has four chromosomes (3 circular + 1 linear). Prokaryotes also frequently carry one or more smaller independent circular DNAs called plasmids or episomes.
Plasmids do not integrate into the main chromosome, whereas episomes can either reside in the cell as independent molecules or integrate into the main chromosome. Plasmids and episomes are generally not essential for bacterial growth. They carry genes that confer desirable traits on the bacteria (such as antibiotic resistance), allow the transfer of genetic information from one cell to another by conjugation (like the F episome or fertility factor of Escherichia coli), or carry virulence factors.

The Escherichia coli genome is approximately 4.7 Mbp long. Its sequence was completely determined in 1997. The dimensions of an E. coli cell are only 1.5 µm x 2 to 6 µm. The DNA therefore has to be strongly compacted to fit into the cell (as an open circle it would have a diameter of about 430 µm, corresponding to a contour length of more than 1 mm). This compaction is achieved by small basic proteins that bind to the DNA, mostly in a non-specific (sequence-independent) manner. They organize the genome into compacted loops such that the genome is divided into domains (a dynamic organization in about 400 independent supercoiled domains of 10 kb each for the E. coli chromosome). The proteins that help in the compaction of bacterial and archaeal genomes are frequently referred to as histone-like proteins [IHF (Integration Host Factor), H-NS (histone-like nucleoid structuring protein), HU (heat-unstable nucleoid protein), FIS (factor for inversion stimulation), etc.]. This name is, however, somewhat misleading since these proteins are totally different from the eukaryotic histones. The compacted E. coli nucleoid occupies approximately 15% of the cellular volume. Some archaea, the euryarchaeota, have real equivalents of the eukaryotic histone heterotetramer (H3-H4)2 that show the same characteristic histone fold. The crenarchaeota have no equivalents of the eukaryotic histones. They compact their DNA with small basic proteins, in a way more similar to bacterial compaction.

Eukaryotes are characterized by a nuclear membrane that surrounds their genetic material. The nucleus is an organelle with a diameter of several µm and is mostly visible in the light microscope. The membrane has some 3,000 to 4,000 pores of 9 nm diameter, which allow the passage of macromolecules of up to 60,000 Dalton (Da), and it contains numerous proteins involved in the active transport of small molecules and macromolecules into (proteins, cDNA) and out of (mature mRNA) the nucleus. DNA replication and transcription take place in the nucleus, but protein synthesis occurs in the cytoplasm. Transcription and translation of mRNA therefore take place in different cellular compartments and are uncoupled in space and in time. The nuclear membrane plays a very important role in the transport of RNA (inside → outside) and proteins (outside → inside). The genetic material of eukaryotes is organized in several chromosomes. Eukaryotic chromosomes are linear molecules, unlike the vast majority of prokaryotic genomes, which are circular; the mitochondrial and chloroplast DNAs of eukaryotes are, however, circular. Chromosome means colored body. This reflects the fact that chromosomes were first discovered in the light microscope by using staining techniques. They are mostly invisible except at the moment of cell division, when they are compacted and condensed. In eukaryotes the DNA is tightly associated with basic proteins, the histones, which ensure compaction by wrapping the DNA around histone octamers (see below, Chapter II. 6.).
These condensed structures are called nucleosomes; one nucleosome contains about 200 bp. The succession of nucleosomes forms a fibrous structure called chromatin (10 nm fiber). Chromatin can be further condensed by folding and bending to form a 30 nm fiber, which in turn is arranged as loops around a proteinaceous scaffold to form a chromosome. The majority of eukaryotic cells are diploid. They contain two copies of each chromosome (except the sex chromosomes). The two copies are called homologs; one is derived from each parent. A subset of eukaryotic cells is either haploid or polyploid. Haploid cells contain a single copy of each chromosome and are, for instance, involved in sexual reproduction: eggs and spermatozoids are haploid cells. Yeast is also a haploid organism for most of its life cycle. Polyploid cells have more than two copies of each chromosome. Polyploidy is mainly associated with plant cells, but some other organisms also maintain the majority of their adult cells in a polyploid state. In extreme cases the number of copies can be as high as 100 or 1,000. This type of chromosome amplification allows the cell to generate larger amounts of mRNA and thus of protein. For example, megakaryocytes are specialized polyploid cells (about 128 sets of chromosomes) that produce thousands of platelets, which lack chromosomes but are essential components of human blood (about 200,000 platelets/µl of blood). The segregation of large numbers of chromosomes is very difficult; polyploid cells have therefore almost always stopped dividing. The diploid human genome (the ensemble of all the chromosomes) would be about 2 m long without compaction. The complete genetic information is stored in 46 chromosomes with a total length of only 200 µm. This indicates a compaction of about 10,000-fold. Compaction by several orders of magnitude is a must because the nucleus of a human cell is only 10-15 µm in diameter.
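The orders of magnitude in this section can be reproduced with simple arithmetic. The sketch below assumes the B-form rise of 3.32 Å per bp quoted in II. 4 and the figure of about 200 bp per nucleosome; the function names are illustrative only.

```python
RISE_NM_PER_BP = 0.332     # B-form rise per base pair (3.32 Å, see II. 4)
BP_PER_NUCLEOSOME = 200    # approximate DNA per nucleosome, as quoted above


def contour_length_m(n_bp, rise_nm=RISE_NM_PER_BP):
    """Length in metres of fully extended B-form DNA of n_bp base pairs."""
    return n_bp * rise_nm * 1e-9


def compaction_ratio(extended_m, packed_m):
    """Fold compaction: extended length divided by packed length."""
    return extended_m / packed_m


def nucleosome_count(n_bp, bp_per_nucleosome=BP_PER_NUCLEOSOME):
    """Approximate number of nucleosomes needed to package n_bp."""
    return n_bp / bp_per_nucleosome


if __name__ == "__main__":
    ecoli_bp = 4.7e6  # E. coli genome size used in the text
    print(f"E. coli contour length: {contour_length_m(ecoli_bp) * 1e3:.2f} mm")
    # 2 m of extended DNA packed into 200 µm of chromosomes
    print(f"Human compaction: {compaction_ratio(2.0, 200e-6):,.0f}-fold")
    print(f"Nucleosomes per Mbp: {nucleosome_count(1e6):,.0f}")
```

With these inputs the E. coli chromosome comes out at somewhat more than 1 mm of extended DNA, and the human figures give the 10,000-fold compaction stated above.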

The total amount of DNA in the haploid genome is called the C-value. There is an enormous variation in the range of C-values of different organisms: from < 10^6 bp for a mycoplasma to > 10^11 bp for some plants and amphibians. It appears that the minimal amount of DNA required for a member of the different evolutionary phyla (the smallest genome size found for a member of each group) increases from prokaryotes to eukaryotes. The DNA content of a nucleus and the number of chromosomes can vary largely from one organism to another:
- Giardia (protozoan): 12 Mbp, 4 chromosomes
- Saccharomyces cerevisiae (baker's yeast): 12 Mbp, 16 chromosomes (haploid)
- Schizosaccharomyces pombe (fission yeast): 12 Mbp, 3 chromosomes (haploid)
- Arabidopsis thaliana (weed): 125 Mbp, 5 pairs (diploid)
- Drosophila melanogaster (fruit fly): 180 Mbp, 4 pairs (diploid)
- Homo sapiens: 4,800 Mbp, 23 pairs (diploid)

The smallest human chromosome contains about 10-fold more bp than the E. coli genome. The haploid form of the human genome has about 200-fold more DNA/cell than yeast, but only a few more chromosomes. The DNA content of chromosomes can therefore vary widely. It appears that genome size is roughly correlated with the complexity of the organism. Prokaryotic cells typically have genomes of less than 10 Mbp, and many Archaea have rather small genomes of between 2 and 3 Mbp. The genomes of single-cell eukaryotes are typically smaller than 50 Mbp (more complex protozoans can have genomes of up to 200 Mbp). Multicellular eukaryotes have even larger genomes, up to more than 100,000 Mbp. Nevertheless, organisms of apparently similar complexity can have very different genome sizes. A fruit fly has a genome 25-fold smaller than that of a locust, and rice has a genome that is approximately 40-fold smaller than that of wheat. This lack of correlation between genome size and genetic complexity is referred to as the C-value paradox. It is presently not understood why natural selection allows this variation and whether it has evolutionary consequences. The fruit fly Drosophila melanogaster (genome = 180 Mbp) has about 15,000 genes that code for proteins (about 83 genes/Mbp). A human cell (4,800 Mbp) has about 20,000 such genes (9.3 genes/Mbp). Only approximately 3% of the human genome is used as information coding for proteins. The remaining 97% consists of introns, repetitive DNA, snRNA genes (small nuclear RNA), etc., and is essentially used to regulate the expression of the other 3%. Clearly, the gene density (number of genes/Mbp of DNA) can vary largely, and it appears that more complex organisms have a much lower gene density. Different organisms therefore use the gene-encoding potential of DNA with varying efficiencies. The highest gene densities are found in viruses. In some instances they use both strands of a given DNA region to encode overlapping genes.
In bacteria overlapping genes are quite rare but the gene density is still high, about 1,000 genes/Mbp (on average 1 gene per kbp). The gene density of S. cerevisiae is about 500 genes/Mbp (about half of the gene density of prokaryotes). The human genome, however, has a gene density that is still about 50-fold lower (9.3 genes/Mbp). Two factors contribute to the reduction in gene density:
- (i) an increase in gene size due to the presence of introns;
- (ii) an increase in the length of intergenic sequences.

These intergenic regions consist of repetitive DNA: microsatellites (< 13 bp in length, highly repetitive DNA with many thousands of copies per genome, frequently organized as long tandem repeats), genome-wide repeats (> 100 bp up to > 1 kb) and pseudogenes. Although there are numerous classes of repeats, their common feature is that they are all forms of transposable elements (or remnants thereof, i.e. inactive forms of transposable elements). The percentage of repeated DNA varies widely. Prokaryotes have nearly no repetitive DNA sequences; in lower eukaryotes it represents about 20% of the genome; in animal cells it may represent up to 50% of the genome (about 40% in humans); and in some plant cells and amphibians it can even represent up to 80%, so that the nonrepetitive DNA is reduced to a minority. Although it is common to refer to repeated DNA as junk DNA, the stable maintenance of these sequences over hundreds to thousands of generations suggests that intergenic DNA confers a positive value or selective advantage on the host organism.

We have seen above that there exist linear and circular chromosomes. Each form poses specific challenges that must be overcome for the maintenance and replication of the genome. The replication of a circular molecule inevitably results in two circular daughter molecules that are catenated. Their separation requires the action of special enzymes, the topoisomerases (see below: II. 5. 3.). The replication of linear DNA molecules poses specific problems of priming at the ends of the molecule (see Chapter III. 9. 4.). The extremities of the linear eukaryotic chromosomes contain special sequences, the telomeres. In between lies the centromere. The centromere plays an important role in the distribution of the sister chromatids to the daughter cells after DNA replication, during cell division.
The telomeres (and a special reverse transcriptase called telomerase) play an extremely important role in the replication of the linear eukaryotic DNA molecules. In nature most DNA is negatively supercoiled. In bacteria this supercoiling is introduced by the action of topoisomerases; in eukaryotes (and to some extent in the euryarchaeota) negative supercoiling is generated by the wrapping of the DNA around the histones.
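The gene densities quoted earlier in this section can be compared directly. The following is a small sketch that uses only the approximate densities stated in the text (1,000, 500 and 9.3 genes/Mbp); the dictionary keys and function names are illustrative.

```python
# Approximate gene densities (genes per Mbp) as quoted in this section
GENE_DENSITY_PER_MBP = {
    "bacteria": 1000.0,
    "S. cerevisiae": 500.0,
    "human": 9.3,
}


def fold_lower(reference, other, table=GENE_DENSITY_PER_MBP):
    """How many times lower the gene density of `other` is than `reference`."""
    return table[reference] / table[other]


def mean_gene_spacing_kbp(density_per_mbp):
    """Average amount of DNA per gene, in kbp (1 Mbp = 1,000 kbp)."""
    return 1000.0 / density_per_mbp


if __name__ == "__main__":
    for name, density in GENE_DENSITY_PER_MBP.items():
        print(f"{name}: one gene per {mean_gene_spacing_kbp(density):.1f} kbp")
    print(f"yeast vs human: {fold_lower('S. cerevisiae', 'human'):.0f}-fold")
```

The ratio between yeast and human comes out close to the 50-fold difference mentioned above, and the human figure corresponds to roughly one gene per 100 kbp, which illustrates how much of the genome is taken up by introns and intergenic sequences.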

II. 3. Structure of DNA


One of the most important features of DNA is that it is usually composed of two polynucleotide chains twisted around each other in the form of a double helix. Each chain is composed of a deoxyribose-phosphate backbone. A 3'-5' phosphodiester bond is formed between two successive sugar moieties. It links the C3' atom of one sugar molecule to the C5' atom of the next sugar molecule (primes are used to designate the positions of atoms in the sugar molecules, to avoid confusion with the positions of C-atoms in the base rings). One single phosphoryl group is therefore linked by two ester bonds to two sugar molecules. Phosphodiester linkages create the repeating sugar-phosphate backbone of the polynucleotide chain, which is a regular feature of DNA. The phosphodiester linkages impose the polarity of the DNA chain. At the 5'-end of the DNA chain the sugar molecule has a free phosphate; at the 3'-end it has a free hydroxyl group. In contrast, the order of the bases is irregular. This irregularity (and the way it is exploited in the flow of genetic information) is the basis for the enormous information content of DNA molecules. The bases are attached to the sugar moieties by a glycosidic bond formed between the C1' atom of the sugar and the N1 atom of a pyrimidine residue or the N9 atom of a purine residue. The two chains run antiparallel (5' → 3' and 3' → 5') and their sequences are complementary. Base-pairing between complementary purine and pyrimidine residues is the key to the complementarity of the strands: A-T and G-C. Specific pairing is based on the complementarity of both the shape and the hydrogen-bonding properties of complementary bases: 2 hydrogen bonds for an A-T pair, 3 for a G-C pair. Thus a purine (larger molecule, double-ring structure) is always paired with a pyrimidine (smaller, single-ring structure) and vice versa. This strict "Watson-Crick" base pairing is a prerequisite for the maintenance of a constant width (diameter of 20 Å, i.e. 2 nm) of the double-stranded DNA molecule. Indeed, the four base pairs have exactly the same geometry, and there is an approximately twofold axis of symmetry that relates the two sugars, so that all four base pairs can be accommodated within the same arrangement without any significant distortion of the overall structure of the DNA. Watson-Crick base pairing requires that the bases are in their preferred tautomeric states, that is with keto (C=O) and exocyclic amino (NH2) groups, and not in the enol (C-OH) and imino (N-H) forms. The transient occurrence of these alternative tautomeric forms of the bases may result in the introduction of mismatches upon DNA replication (see Chapter IV, Mutagenesis and DNA repair). The complementarity between the sequences of the two strands also gives DNA its extremely important self-encoding character, which is the basis of genetic inheritance (see Chapter III, DNA replication). In addition, the base pairs stack nicely on top of each other between, and on the inside of, the two helical sugar-phosphate backbones. The double helix can be viewed as a spiral staircase in which the bases form the treads. The stability of the double-stranded DNA molecule is mainly generated by the energy of the hydrogen bonds uniting complementary bases in strands of opposite polarity and by the hydrophobic interactions of stacked bases (roughly perpendicular to the helical axis).
When the strands come together in a double-stranded molecule, the H2O molecules are displaced from the bases. This creates disorder and therefore increases the entropy, thus stabilizing the DNA. The bases are flat and relatively water-insoluble molecules. Electron cloud interactions (π-π) between bases contribute significantly to the stability of the DNA helix. The stacking energy is different for every combination of successive bases and is minimal for the TpA step (which is therefore frequently found in regions where the DNA helix must melt to allow initiation of DNA replication or transcription). As a result of the winding of one strand around the other, the DNA molecule is a long extended polymer with two grooves that differ in size. The existence of a minor groove (12 Å across) and a major groove (22 Å across) is a consequence of the geometry of the base pairs. The angle at which the two sugars protrude from the bp (the angle between the glycosidic bonds) is about 120° for the narrow angle and 240° for the wide angle. Since the bases stack on top of each other with a rotation of approximately 34° between two successive bases in one strand (average of 10.5 bp per turn for B-form DNA), this generates a minor groove and a major groove that spiral around the molecule. The edges of each base pair are exposed in the major and minor grooves, creating a pattern of hydrogen bond donors and acceptors and of van der Waals surfaces that characterizes the specific bp. If we look at the distribution of these groups in the two grooves, it is clear that there are more potential bonding groups in the major than in the minor groove, and that it will be easier for an interacting molecule (mostly a protein) to unambiguously recognize a specific combination in the major groove than in the minor groove.
This is extremely important for the formation of specific protein-DNA complexes, which play an essential role in nearly all vital cellular processes involving DNA transactions (replication, transcription, modification, repair, recombination, segregation, ...). For a protein to unambiguously read the identity of a bp, at least two hydrogen bonds have to be made. In the minor groove an A-T pair cannot be differentiated from a T-A pair: upon switching, the N3 acceptor of adenine and the C2 carbonyl acceptor of thymine simply exchange positions. Similarly, a G-C pair can hardly be distinguished from a C-G pair. In the major groove, switching does not create similar combinations of hydrogen-bonding groups. Furthermore, the hydrophobic C5-methyl (-CH3) of thymine provides an additional sequence recognition element that protrudes into the major groove.
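The difference between the two grooves can be made explicit by writing the chemical groups exposed by each base pair as a short code read across the groove. The sketch below uses a common textbook shorthand that is an assumption here, not notation from this course: A = hydrogen-bond acceptor, D = donor, H = non-polar hydrogen, M = methyl group.

```python
# Groups exposed at the edge of each base pair, read across the groove.
# Shorthand (assumed textbook convention): A = acceptor, D = donor,
# H = non-polar hydrogen, M = methyl.
MAJOR_GROOVE = {"A:T": "ADAM", "T:A": "MADA", "G:C": "AADH", "C:G": "HDAA"}
MINOR_GROOVE = {"A:T": "AHA", "T:A": "AHA", "G:C": "ADA", "C:G": "ADA"}


def all_distinguishable(groove):
    """True if every base pair shows a unique pattern in this groove."""
    return len(set(groove.values())) == len(groove)


def ambiguous_groups(groove):
    """Return lists of base pairs that share the same pattern."""
    by_pattern = {}
    for pair, pattern in groove.items():
        by_pattern.setdefault(pattern, []).append(pair)
    return [pairs for pairs in by_pattern.values() if len(pairs) > 1]


if __name__ == "__main__":
    print("major groove unambiguous:", all_distinguishable(MAJOR_GROOVE))
    print("minor groove unambiguous:", all_distinguishable(MINOR_GROOVE))
    print("minor groove ambiguities:", ambiguous_groups(MINOR_GROOVE))
```

The minor-groove codes read the same in both directions, so flipping a pair leaves the pattern unchanged, whereas each of the four major-groove codes is unique. The methyl group (M) appears only in the major groove, in line with the role of the thymine C5-methyl described above.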

II. 4. DNA exists in different forms


B-form DNA

In vivo most of the DNA is supposed to exist in the B-form. B-form DNA is a regular, right-handed helix (diameter of 20 Å) in which the turns run clockwise when looked at along the helical axis. It has on average 10.5 bp per turn and a rise of 3.32 Å (pitch = 33.2 Å). The pentose C2' is in the endo-configuration and the bases are in the anti-conformation. The bases lie approximately flat and perpendicular to the helical axis (tilted by only 1.2°). These parameters can, however, be locally influenced by particular sequences (such as stretches of A's or alternating G and C residues) and by environmental factors (such as the presence of Mg2+ ions and the degree of hydration).

A-form DNA

A-form DNA is a less hydrated form of dsDNA. It is also characteristic of double-stranded RNA, of DNA-RNA hybrids and of double-stranded DNA stretches in some DNA-protein complexes. The 2'-OH group of the ribose in RNA prevents the B-form from being adopted. A-DNA is shorter and wider than B-form DNA (for the same number of bp). The bases do not lie flat as in B-form DNA, but are tilted (+19°) with respect to the helical axis. A-form DNA has more bp per turn (pitch = 24.6 Å); it is therefore more compact and is underwound (11 bp/turn; rise = 2.3 Å). The major groove of A-form DNA is narrower and deeper, and the phosphates overhang it. The major groove is therefore less accessible than in the B-form. In contrast, the minor groove is superficial, flat and broad. The molecular basis for the distinction between A- and B-form DNA lies in the flexibility of the sugar ring. In the A-form the sugars are in the C3'-endo configuration. Because of this change in the sugar pucker, the distance between two adjacent phosphate residues along the chain is reduced by approximately 1 Å (from 7 Å in the B-form to 5.9 Å in the A-form). The sugar molecules therefore constitute an elastic element in the DNA backbone that permits conformational changes.

Z-form DNA

Z-form DNA is the only left-handed form of DNA. It has about 12 bp/turn and a rise of 3.8 Å/bp (pitch = 45.6 Å); it therefore has the least twisted structure and is underwound. Z-DNA is skinny. The plane of the bases is slightly tilted with respect to the helical axis (-9°). Its name refers to the zig-zag structure of the backbone. Z-DNA has only one groove, with a higher density of negative charges than the grooves of B-form DNA. The minor groove is narrower and very deep. The equivalent of the major groove in B-form DNA is no longer a groove, but a convex surface. The Z-DNA form is most easily adopted in polymers with a sequence of alternating purine and pyrimidine nucleotides, especially -GCGCGCGC-. The molecular basis of Z-DNA is the syn-conformation (rotation around the glycosidic bond) of the guanines and the C3'-endo conformation of their sugar moieties (C2'-endo for the cytosine residues). This generates a dinucleotide repeat instead of the mononucleotide repeat of B-form DNA. Z-DNA is stabilized by negative supercoiling in vivo. It appears to be associated with actively transcribed regions (a moving polymerase generates positive supercoiling ahead of it and negative supercoiling behind it).

Other DNA forms

DNA can adopt still other forms, such as triple-stranded DNA, which is an important intermediate, generated by the action of the RecA protein, in the process of homologous DNA recombination. The cruciform structure uses the complementary pairing of inverted repeat sequences within a single strand. It is also a structure that is widely found in nature and serves as a specific signal in crucial cellular processes. A similar structure is adopted in four-way junctions, typical intermediates in the resolution of recombining molecules (Holliday junctions).
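The helical parameters of the three forms can be put side by side to compare the rotation per base pair and the length of a given stretch of DNA. This is a minimal sketch that uses the bp/turn and rise values quoted in this section; the class and attribute names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HelixForm:
    name: str
    handedness: str       # "right" or "left"
    bp_per_turn: float    # value quoted in the text
    rise_angstrom: float  # axial rise per bp, value quoted in the text

    def rotation_per_bp_deg(self):
        """Magnitude of the helical rotation between successive bp."""
        return 360.0 / self.bp_per_turn

    def length_angstrom(self, n_bp):
        """Axial length of a stretch of n_bp base pairs."""
        return n_bp * self.rise_angstrom


FORMS = [
    HelixForm("B", "right", 10.5, 3.32),
    HelixForm("A", "right", 11.0, 2.3),
    HelixForm("Z", "left", 12.0, 3.8),
]

if __name__ == "__main__":
    for form in FORMS:
        print(
            f"{form.name}-form ({form.handedness}-handed): "
            f"{form.rotation_per_bp_deg():.1f} deg/bp, "
            f"100 bp = {form.length_angstrom(100):.0f} Å"
        )
```

For the same number of base pairs the A-form is the shortest and the Z-form the longest, in agreement with the descriptions of A-DNA as shorter and wider and of Z-DNA as skinny.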

II. 5. DNA Topology


II. 5. 1. Negative and Positive Supercoiling

All kinds of dynamic processes, like DNA replication and transcription, affect the number of times that the strands are twisted around each other. As the ends of a linear molecule are free, such molecules can freely rotate to accommodate changes in the number of times that the two chains of the double helix twist about each other. However, if the ends are covalently joined, as in a circular molecule, or anchored to the membrane, then the absolute number of times the chains twist around each other cannot change. Such a closed molecule is said to be topologically constrained. Even the very large, linear eukaryotic chromosomes are topologically constrained, due to their extreme length, their entrainment in chromatin (see below) and their interactions with other cellular components. Therefore, in nature all DNA molecules are topologically constrained. Despite these constraints, DNA is involved in all kinds of dynamic transactions (replication, transcription, repair, recombination, ...) that involve local strand separation and hence have an impact on the topology. The topological state of a circular DNA molecule can be described by the linking number (Lk). It is the sum of two geometric components: the twist (Tw) and the writhe (Wr). The twist is simply the number of helical turns of one strand around the other. If we consider a covalently closed circular DNA molecule (cccDNA) lying flat on a plane, then the linking number is fully composed of twist. The helical crossovers (twist) in a right-handed helix are defined as positive, such that the linking number of DNA has a positive value. However, cccDNA generally does not lie flat on a plane. Instead, it is usually torsionally stressed such that the long axis of the double-stranded helix crosses over itself, often repeatedly, in three-dimensional space (think of a telephone cord that has been overtwisted). This crossing of the helix over itself is called the writhe.
The writhe can take two forms. One form is the interwound or plectonemic writhe, in which the long axis is twisted around itself. Positive plectonemic writhe corresponds to a left-handed superhelix. The other form of writhe is a spiral or toroid, in which the long axis is wound in a cylindrical manner, as often occurs when DNA wraps around protein (see below, nucleosomes). For toroidal writhe, left-handed wrapping induces negative superhelicity (whereas the opposite is true for interwound or plectonemic writhe). The left-handed toroidal wrapping of DNA around the nucleosome reduces the linking number of the associated DNA. The assembly of many nucleosomes on topologically constrained DNA requires the presence of a topoisomerase that can remove positive supercoils. Without the help of a topoisomerase, for every nucleosome formed, the unbound part of the DNA would have to accommodate an equivalent increase in linking number (in topologically constrained DNA the linking number is fixed in the absence of topoisomerase activity).

The writhing number (Wr) is the total number of interwound and/or spiral writhes in cccDNA. Interwound writhe and spiral writhe are topologically equivalent to each other and are readily interconvertible. Also twist and writhe are interconvertible. A cccDNA molecule can undergo distortions that convert some of its twist to writhe or vice versa without the breakage of any covalent bond. The only constraint is that the sum of the twisting number (Tw) and of the writhing number (Wr) must remain equal to the linking number (Lk). (A change in Lk can only be obtained by breaking of at least one phosphodiester bond, see below: topoisomerases).

Lk = Tw + Wr
A cccDNA molecule that has no supercoiling is said to be relaxed. Its linking number is Lk0 and corresponds to the total number of bp of that molecule divided by the number of bp per turn. The extent of supercoiling of a molecule corresponds to the difference between the linking number and the Lk0 of that molecule.

ΔLk = Lk - Lk0
If Lk < Lk0 and ΔLk < 0 (negative), then the DNA is said to be negatively supercoiled. Conversely, if Lk > Lk0 and ΔLk > 0 (positive), then the DNA is positively supercoiled. Because ΔLk and Lk0 are dependent on the length of the molecule, it is convenient to refer to a normalized measure of supercoiling, the so-called superhelical density (σ):

σ = ΔLk/Lk0
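The three relations Lk = Tw + Wr, ΔLk = Lk - Lk0 and σ = ΔLk/Lk0 can be worked through numerically. The following is a minimal sketch, assuming 10.5 bp per turn for relaxed B-form DNA; the 5,250 bp plasmid and the linking number of 470 are hypothetical values chosen so that the numbers come out round.

```python
BP_PER_TURN = 10.5  # relaxed B-form DNA, as quoted in the text


def relaxed_linking_number(n_bp, bp_per_turn=BP_PER_TURN):
    """Lk0: number of bp divided by the number of bp per helical turn."""
    return n_bp / bp_per_turn


def delta_lk(lk, lk0):
    """Linking difference: ΔLk = Lk - Lk0."""
    return lk - lk0


def superhelical_density(lk, lk0):
    """Normalized supercoiling: σ = ΔLk / Lk0."""
    return delta_lk(lk, lk0) / lk0


def writhe(lk, tw):
    """Wr follows from Lk = Tw + Wr once Lk and Tw are known."""
    return lk - tw


if __name__ == "__main__":
    n_bp = 5250  # hypothetical plasmid
    lk0 = relaxed_linking_number(n_bp)
    lk = 470     # hypothetical linking number of the extracted plasmid
    print("Lk0   =", lk0)
    print("ΔLk   =", delta_lk(lk, lk0))
    print("sigma =", superhelical_density(lk, lk0))
    # If the twist stays at its relaxed value, all of ΔLk appears as writhe
    print("Wr    =", writhe(lk, tw=lk0))
```

Because σ is normalized by Lk0, two plasmids of different length with the same σ carry the same density of supercoils, which is why σ rather than ΔLk is used to compare molecules.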
Circular DNA molecules (plasmids, episomes) extracted from bacteria and eukaryotes are usually negatively supercoiled, with σ values of about -0.06 (about one negative supercoil per 200 bp, which corresponds to -9 kcal/mol). Negative supercoiling can be thought of as a means of storing free energy that aids processes requiring local strand separation, such as DNA replication and transcription. Because Lk = Tw + Wr, negative supercoils can be converted into untwisting of the double-stranded helix. Thus strand separation is more easily accomplished in negatively supercoiled DNA than in relaxed DNA.

In nature, DNA is generally negatively supercoiled, except in hyperthermophiles (microorganisms with an optimal growth temperature of 80 °C or above). These have relaxed to positively supercoiled DNA. The positive supercoils can be thought of as a means to store free energy that helps to protect the DNA from denaturing at high temperatures. Insofar as positive supercoils (overwound DNA) can be converted into more twist, strand separation of positively supercoiled DNA will require more energy than that of negatively supercoiled DNA.

II. 5. 2. Topoisomers can be separated by gel electrophoresis and ultracentrifugation
Relaxed and supercoiled DNA have a different density (weight/volume). Therefore, plasmid DNA (cccDNA) can be separated from genomic DNA (linear, because it is broken during the extraction procedure) by centrifugation in a CsCl gradient. Upon ultracentrifugation (50,000 rpm) the Cs+ and Cl- ions are pushed towards the bottom of the tube by the centrifugal force and finally form a gradient with the highest concentration at the bottom and the lowest concentration at the top of the tube. The DNA molecules are also pushed towards the bottom, but the increasing salt concentration exerts an opposite force. Finally, the DNA will float at the position in the gradient where its density equals the density of the CsCl solution. Thus, plasmid DNA will float closer to the bottom and the linearized genomic DNA closer to the top. This strategy was routinely used in the past for plasmid DNA preparation and purification. Today it has been replaced by extraction kits based on another principle (rapid denaturation/renaturation and binding to a resin), but the technique is still used in applications that require very pure plasmid DNA (without contamination with genomic DNA).

cccDNA molecules of the same length but of different linking number are called topoisomers. Even though they have the same molecular weight, they can be separated by electrophoresis through an agarose or polyacrylamide gel. The basis for this separation is that the greater the writhe, the more compact the shape of the cccDNA will be. The more compact the molecule is (at least up to a certain limit), the more easily it will migrate through the gel matrix. Thus a relaxed molecule will migrate more slowly than a supercoiled topoisomer of the same circular DNA molecule.

Ethidium is a large, flat, multi-ringed cation. Its flat shape enables ethidium to intercalate between the stacked base pairs of DNA. Because it fluoresces when exposed to UV light and because its fluorescence increases dramatically after intercalation, ethidium bromide (EtBr) is used as a stain to visualize DNA (with caution, because EtBr is mutagenic and likely oncogenic!). When an ethidium ion intercalates between two bp it causes the DNA to unwind by 26°. Therefore, the normal rotation per bp is reduced from 36° to 10° at that position. In other words, ethidium decreases the twist of DNA.
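The unwinding arithmetic can be sketched as follows, using Lk = Tw + Wr with Lk held constant in cccDNA. The 26° per intercalated ion comes from the text; the simplification that all supercoiling of the starting molecule is present as writhe, and the example value Wr = -24, are assumptions made only for illustration.

```python
# Minimal sketch: effect of ethidium intercalation on twist and writhe of cccDNA.
# ASSUMPTIONS: Lk is constant (closed circle) and the initial supercoiling is
# treated as pure writhe; real molecules partition Delta Lk between Tw and Wr.

UNWINDING_DEG_PER_ION = 26.0  # unwinding per intercalated ethidium (from the text)


def twist_change(n_ions: float) -> float:
    """Change in twist (in helical turns) caused by n intercalated ions."""
    return -n_ions * UNWINDING_DEG_PER_ION / 360.0


def writhe_after_binding(initial_writhe: float, n_ions: float) -> float:
    """Since Lk = Tw + Wr is fixed, a decrease in Tw is offset by an increase in Wr."""
    return initial_writhe - twist_change(n_ions)


def ions_to_relax(initial_writhe: float) -> float:
    """Number of bound ions at which the writhe of a negatively supercoiled circle is zero."""
    return -initial_writhe * 360.0 / UNWINDING_DEG_PER_ION


if __name__ == "__main__":
    wr0 = -24.0  # hypothetical negatively supercoiled plasmid
    n = ions_to_relax(wr0)
    print(round(n, 1), writhe_after_binding(wr0, n))
```

Binding more ions than `ions_to_relax` returns gives a positive writhe in this model, which corresponds to the positive supercoiling seen at high ethidium concentrations.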
The linking number of the DNA does not change (if it is cccDNA), but the twist decreases by 26° for each ethidium ion that has intercalated. This decrease in twist must be compensated for by a corresponding increase in writhe (Lk = Tw + Wr). If the DNA was originally negatively supercoiled, then the binding of ethidium will relax the DNA and, consequently, it will migrate more slowly. If enough ethidium is added, the negative supercoiling will become zero, and if still more is added, the DNA might even become positively supercoiled.

II. 5. 3. Topoisomerases
As described above, the linking number is an invariant property of a DNA molecule that is topologically constrained. It can only be changed by introducing a break (interruption of the sugar-phosphate backbone) in at least one strand of a double-stranded molecule, passing one strand through the break and resealing the ends (restoration of the phosphodiester bond). This can be performed by a special class of enzymes, the topoisomerases, which are able to introduce transient single-stranded or double-stranded breaks and to religate the ends after modification of the Lk.

Topoisomerases play an important role in vital processes such as DNA replication and transcription that generate tension in the DNA. Indeed, the unwinding of the two strands of a duplex molecule ahead of the moving replication fork or transcription bubble generates positive supercoiling. This is due to the fact that upon unwinding the linking number has to be conserved (ΔLk = 0) in an increasingly smaller number of bp. This accumulation of tension is incompatible with the further progress of the replication fork or the transcription bubble. Topoisomerase activity is, therefore, required to eliminate this tension.

Topoisomerases are classified in two major types: Type I and Type II. Both types can remove supercoils from DNA, but only some special type II topoisomerases (like gyrase) can introduce negative supercoils.

Type I enzymes make transient single-stranded breaks and change the linking number in steps of one. They do not consume ATP and are monomeric enzymes. E. coli topoisomerase A (TopA or ω protein) is a typical type I topoisomerase (the first topoisomerase that was discovered and the prototype of Type IA enzymes). It can relax negatively but not positively supercoiled DNA. Type I enzymes can do this without the help of other enzymes or high-energy molecules because they use a covalent-intermediate mechanism. The energy of the phosphodiester bond, which is cleaved by the attack of a hydroxyl group from a tyrosine residue in the active site of the topoisomerase, is used to covalently link the tyrosine to the DNA (phospho-tyrosine linkage). This bond conserves the energy of the phosphodiester bond that was cleaved. The other end of the broken DNA strand terminates with a free OH-group. This end is also tightly held by the enzyme. The DNA can be resealed simply by reversing the original reaction: the OH-group of the broken DNA end attacks the phospho-tyrosine bond, thereby reforming the original phosphodiester bond and releasing the topoisomerase, which can then perform another reaction cycle.

Correct functioning of a topoisomerase requires that the three steps (DNA cleavage, strand passage and strand rejoining) occur in a highly coordinated manner. To initiate a relaxation cycle the topoisomerase binds to a locally melted DNA duplex. Local separation of the strands is favored in highly negatively supercoiled DNA (see above); therefore negatively supercoiled DNA is a better substrate for relaxation.
One of the strands will then bind in a cleft in the enzyme that places the DNA near the tyrosine residue of the active site. The success of the reaction (proceeding to the reversal) requires that the other end of the cleaved DNA is also tightly bound by the enzyme. After the cleavage, the topoisomerase undergoes a conformational change to open up a gap in the cleaved strand, with the enzyme bridging the gap. The uncleaved strand then passes through the gap and binds to a DNA-binding site in an internal "donut-shaped" hole in the enzyme. After the strand passage, a second conformational change in the topoisomerase-DNA complex brings the cleaved ends back together. Rejoining of the strands can then occur by attack of the free OH-end of the DNA on the phospho-tyrosine bond. The DNA molecule is then identical to the original one except that the linking number has been increased by one unit (ΔLk = +1).

Type I enzymes are further divided into Type IA and Type IB enzymes on the basis of the polarity of strand cleavage. Type IA enzymes make a covalent bond between the 5'-phosphate end of the nick and the tyrosine residue and generate a free 3'-OH end. Type IB enzymes covalently link the tyrosine residue to the 3'-phosphate and generate a free 5'-OH end.

Type IA enzymes are present in all three domains of life (bacteria, archaea and eukaryotes); a typical example is the ω protein of E. coli. Type IA topoisomerases bind to single-stranded regions (facilitated by negative supercoiling that can be converted into untwisting); therefore they can only remove negative supercoiling (but not positive supercoiling). They use a mechanism of enzyme-bridged strand passage and the Lk changes in steps of 1.

Type IB enzymes are ubiquitous in eukaryotes and can bind double-stranded DNA regions and remove both negative and positive supercoils. Type IB enzymes use a controlled rotation mechanism for the strand passage.
They constitute the major topoisomerase activity in eukaryotes to relax positively supercoiled DNA generated during DNA replication and transcription (in bacteria this is done by Type II enzymes).
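The substrate rules for the two Type I subfamilies can be summarized in a small bookkeeping model. This is only a sketch of the rules stated above (steps of one; Type IA acts on negative supercoils only, Type IB on both); the class and function names are hypothetical and no enzyme kinetics are modeled.

```python
# Minimal sketch: bookkeeping of Delta Lk during relaxation by Type I topoisomerases.
# The names TopoType, TYPE_IA, TYPE_IB and relax are illustrative, not standard.
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TopoType:
    name: str
    step: int                # change in Lk per catalytic cycle
    relaxes_negative: bool   # can act on negatively supercoiled DNA
    relaxes_positive: bool   # can act on positively supercoiled DNA


TYPE_IA = TopoType("IA", step=1, relaxes_negative=True, relaxes_positive=False)
TYPE_IB = TopoType("IB", step=1, relaxes_negative=True, relaxes_positive=True)


def relax(delta_lk: int, enzyme: TopoType) -> Tuple[int, int]:
    """Return (final Delta Lk, number of cycles) after relaxing as far as the enzyme allows."""
    cycles = 0
    while delta_lk != 0:
        if delta_lk < 0 and enzyme.relaxes_negative:
            delta_lk += enzyme.step   # each cycle raises Lk by one unit
        elif delta_lk > 0 and enzyme.relaxes_positive:
            delta_lk -= enzyme.step   # each cycle lowers Lk by one unit
        else:
            break                     # substrate not accepted by this enzyme
        cycles += 1
    return delta_lk, cycles


if __name__ == "__main__":
    print(relax(-5, TYPE_IA))  # negative supercoils are removed one at a time
    print(relax(3, TYPE_IA))   # positive supercoils are left untouched
    print(relax(3, TYPE_IB))   # Type IB relaxes positive supercoils as well
```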

Type II enzymes make transient double-stranded breaks and change the linking number in steps of two. They can generally act on both negatively and positively supercoiled DNA. Type II enzymes require the energy of ATP hydrolysis for their action. This energy is not required for the cutting and sealing reactions, but to promote the topological changes in the protein-DNA complex required for enzyme turnover. Type II enzymes are multimeric enzymes [homodimeric (eukaryotes) or heterotetrameric (bacteria)] with a dyad symmetry. They all contain an ATPase domain and a DNA-binding/cleavage domain. The two domains can be present on the same polypeptide chain, as in eukaryotic enzymes (and in eukaryotic viruses that have their own topoisomerase), or on separate polypeptide chains, as in bacterial enzymes. Each domain is present in 2 copies (homodimers in eukaryotes and heterotetramers in bacteria).

Type II enzymes can relax both negatively and positively supercoiled DNA, and they can catenate and decatenate. Therefore, they play an essential role in all cells for the segregation of chromosomes after replication and before cell division. Two subunits with their active-site tyrosine residues are required to cleave both strands of a duplex (the gate or G-segment), thus generating a gap through which the transported segment (T-segment) is passed; finally the gap is sealed again.

Both prokaryotes and eukaryotes have type I and type II enzymes that can remove supercoils from DNA. However, only prokaryotes have a special type II enzyme, gyrase, that can introduce negative supercoils. Gyrase is responsible for the negative supercoiling of prokaryotic chromosomes. It can introduce negative supercoiling due to the wrapping (in the positive sense) of the DNA around the C-terminal domain of GyrA. The sign inversion that occurs upon passage of the T-segment through the gate finally results in the introduction of a negative supercoil.
Gyrase introduces a staggered cut in which the DNA extremities have a free 3'-OH group and a 4-nucleotide-long 5' single-stranded extension that is covalently bound to the enzyme (conservation of energy). E. coli gyrase is a heterotetrameric enzyme (A2B2) of about 400 kDa. The A-subunit is responsible for the cleavage and resealing; the B-subunit has the ATPase activity. Gyrase is sensitive to the antibiotics nalidixic acid, which acts on the A-subunit, and coumermycin, which blocks the B-subunit. Gyrase is also the target of the CcdB toxin protein of the CcdA-CcdB addiction system of the E. coli fertility factor (F-episome).

Reverse gyrase. Hyperthermophiles, bacteria and archaea with an optimal growth temperature of 80 °C or above, have slightly relaxed to positively supercoiled DNA and a special enzyme, reverse gyrase. Reverse gyrase is a type I topoisomerase (and not a type II, as gyrase is) that introduces positive supercoils at the expense of the energy of ATP hydrolysis. It combines two activities (which can be present in one polypeptide chain or on different subunits, depending on the organism): a helicase activity and a topoisomerase activity of Type I. Hyperthermophiles also have Type II topoisomerase activity, but this enzyme can only remove, not introduce, negative supercoils. It can unknot and decatenate (see below) and is therefore supposed to play an important role in the segregation of daughter chromosomes after replication (see below). It is the combination of the reverse gyrase and type II activities that results in positively supercoiled chromosomes in hyperthermophiles. Therefore, the DNA of hyperthermophiles has a higher linking number (Lk) than that of mesophiles. This might be important to protect the DNA of these organisms against melting at high temperatures.
In hyperthermophilic bacteria such as Thermotoga that have a combination of gyrase and reverse gyrase, the DNA is negatively supercoiled, indicating that the action of gyrase is dominant.

Topoisomerases can also catenate, decatenate (disentangle) and unknot DNA molecules. These reactions are also important to maintain a proper DNA structure in the cell. Circular DNA molecules are said to be catenated if they are linked together like two rings of a chain. Catenated molecules are generally generated upon replication of a circular DNA molecule. These have to be separated to allow their segregation to different daughter cells.

Therefore, the decatenation activity of topoisomerases is of crucial physiological importance. Decatenation requires the passage of the two strands of one molecule through a double-stranded break in the other molecule. Therefore, this is performed by type II topoisomerases. They are essential cellular components. However, if one double-stranded molecule already carries a single-stranded break (nick), then a type I enzyme may also unlink the catenated molecules.

Although eukaryotic chromosomes are linear molecules, their extreme length will also create topological problems. During a round of replication the two double-stranded DNA molecules (sister chromatids) will often become entangled, and these sites of entanglement block the separation of the chromosomes during mitosis.

Occasionally, a DNA molecule can also become knotted. This is particularly so in some site-specific DNA recombination reactions. These knots can be removed by type II topoisomerases. Also, if the molecule already has a single-stranded break (a nick, as may temporarily exist in reaction intermediates), a type I enzyme can do the job.

II. 6. Structure of eukaryotic chromosomes


Different levels of eukaryotic chromosome structure and organization can be observed in the microscope. Long before it was clear that chromosomes are the source of genetic information in the cell, their movements and changes during cell division were already well analyzed; the compact nature of condensed mitotic chromosomes allows their easy observation in the light microscope. In the electron microscope two other states of chromatin could be readily observed: the 30-nm and the 10-nm fibers. Indeed, chromosomal DNA in the interphase is much less compact. The 30-nm fiber appears as a structure folded into large loops reaching out of a protein scaffold. The 10-nm fiber is the least compact form of chromatin. It resembles a regular series of "beads on a string". The beads are the nucleosomes. They are the building blocks of eukaryotic chromosomes.

II. 6. 1. Structure of the nucleosome
A nucleosome is composed of a core of eight histone proteins and the DNA wrapped around this core in a left-handed manner. This wrapping introduces negative supercoiling (toroidal supercoiling) in the eukaryotic DNA. It is the positively charged N-terminal tails of the histones and the way they protrude from the histone core that impose the direction of the wrapping. The DNA between the nucleosomes is called linker DNA. By assembling into nucleosomes the DNA is compacted about 6-fold (much less than the 10,000-fold observed in chromosomes). The DNA that is tightly associated with the histones is called the core DNA. It is about 147 bp long and is wound 1.65 times around the outside of the histone octamer, like thread around a spool. This is so in all eukaryotic cells. In contrast, the length of the linker DNA is variable (in S. cerevisiae it is on average 13-18 bp, in humans 38-53 bp long). In any cell there are also stretches of DNA that are not packaged into nucleosomes. These are typically regions that are being replicated or transcribed.
These sites are then typically associated with non-histone proteins that are either involved in these processes or regulate them. The accessibility of the DNA and chromatin remodeling are important aspects of eukaryotic transcription and transcription regulation.

Histones are small, positively charged proteins (they contain more than 20% lysine and arginine residues). They are by far the most abundant class of DNA-associated proteins. Eukaryotic cells generally contain five different abundant histones: H1, H2A, H2B, H3 and H4. The histones H2A, H2B, H3 and H4 are the core histones. They are 11 to 15 kDa proteins that form the octameric disc-shaped core around which the DNA is wrapped. Histone H1 (20 kDa) binds to the linker DNA and is referred to as the linker histone. H1 is half as abundant as the core histones (which are present in equal amounts).

The core histones assemble in an ordered fashion only in the presence of DNA. In solution they form intermediate assemblies. A conserved region, called the histone-fold domain (a very characteristic fold), mediates the assembly of the histone-only intermediates. The histone fold is composed of three α-helical regions separated by two short unstructured loops. The histone fold mediates the formation of head-to-tail heterodimers of specific pairs of histones. First H3 and H4 form heterodimers that then assemble into a tetramer (H3-H4)2. H2A and H2B form heterodimers (not tetramers). The further ordered assembly of the nucleosome involves the association of these building blocks with DNA. First the (H3-H4)2 tetramer binds to DNA. Then two H2A-H2B dimers join the (H3-H4)2 tetramer-DNA complex to form the final nucleosome. The assembly of nucleosomes on replicating DNA requires the CAF-I complex.

Each of the four core histones has an N-terminal extension, called the tail. The tails lack a defined structure and are accessible within the intact nucleosome (treatment with trypsin, which specifically cleaves proteins after positively charged residues, readily removes the N-terminal tails). These tails are the sites of extensive modifications that alter the function of individual nucleosomes. These modifications mainly include phosphorylation of serine residues and acetylation and methylation of lysine and arginine residues. They alter the interaction of the DNA with the histone core, play a major role in the modification of chromatin structure and therefore affect processes such as transcription initiation (see Eukaryotic transcription).

Fourteen distinct sites of contact can be observed between the histones and the core nucleosomal DNA, one for each time the minor groove of the DNA faces the histone octamer.
This generates about 140 hydrogen bonds between the histone proteins and DNA. The majority of the hydrogen bonds are between the proteins and the oxygen atoms in the phosphodiester backbone near the minor groove of the DNA (non-sequence-specific interactions). Only 7 hydrogen bonds are between the protein side chains and the base-specific groups in the minor groove (none of these is with elements that distinguish between a G-C and an A-T bp). This high number of hydrogen bonds (most protein-DNA interactions involve only about 20 hydrogen bonds) provides the driving force to bend the DNA around the histone core. The highly positive charge of the histone proteins also serves to mask the negative charge of the phosphates, which the bend forces into unfavorably close proximity on its inside, and facilitates the close juxtaposition of the two adjacent DNA helices in the 1.65-times wrapped structure.

The organization of DNA into nucleosomes is dynamic. Three forms of mobility can be observed:
(i) sliding of the histone octamer along the DNA
(ii) complete transfer of the histone octamer from one molecule to another
(iii) more subtle remodeling of the protein-DNA interactions within the nucleosome.

II. 6. 2. Higher-order chromatin structure: binding of H1 stabilizes 30-nm fibers

The next step in the packaging of DNA, once the nucleosomes are formed, is the binding of the linker histone H1. The basic protein H1 makes contacts with two distinct regions: it binds the linker DNA and the DNA helix in the middle of the nucleosome-bound DNA, thereby further tightening the histone-DNA association and bringing these two regions into close proximity. Therefore, H1 binding increases the length of the DNA wrapped tightly around the histone octamer.

There are two models for the 30-nm fiber. In the solenoid model, the nucleosomal DNA forms a superhelix containing 6 nucleosomes per turn. In this model, the flat surfaces on either face of the histone octamer disc are adjacent to each other and the DNA surface of the nucleosomes forms the outside, accessible surface of the superhelix. The linker DNA is buried in the center of the superhelix, but it never passes through the axis of the fiber. It circles around the central axis as the DNA moves from one nucleosome to the next. An alternative model for the 30-nm fiber is the zig-zag model. In this model, the linker DNA is required to pass through the central axis of the fiber in a relatively straight form. Longer linker DNA will favor this conformation. Because the average length of the linker DNA varies between different species, the form of the 30-nm fiber may not always be the same.

It has been observed that core histones lacking the N-terminal tails are incapable of forming 30-nm fibers. Therefore, the N-terminal tails are absolutely required and their most likely role is to stabilize the 30-nm fiber structure by interacting with adjacent nucleosomes.

The formation of the 30-nm fiber results in a 40-fold compaction of the linear length of the DNA (still insufficient). Additional folding of the 30-nm fiber is required to fit 1-2 meters of DNA in a nucleus of about 10⁻⁵ m (10 µm) across.
The exact nature of this folding is still unclear, but it appears that the 30-nm fiber forms loops of 40-90 kb that are held together at their bases by a proteinaceous structure, the nuclear scaffold. Topoisomerase II (Topo II, a type II topoisomerase) and the SMC proteins (Structural Maintenance of Chromosomes) are abundant components of the nuclear scaffold. These proteins are key components of the machinery that condenses and holds daughter chromosomes together after chromosome duplication.
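The packing ratios quoted in this section can be checked with simple arithmetic. The sketch below is illustrative only: the rise of 0.34 nm per bp for B-DNA and the round figure of 6 x 10^9 bp for a diploid human genome are assumptions that do not come from this section, while the 147 bp core, the linker lengths and the 6-, 40- and 10,000-fold compaction factors are the values given above.

```python
# Minimal sketch: orders of magnitude of DNA packaging in a eukaryotic nucleus.
# ASSUMPTIONS: 0.34 nm rise per bp (B-DNA) and a 6e9 bp diploid human genome.

RISE_NM_PER_BP = 0.34              # assumed axial rise per base pair
CORE_BP = 147                      # core DNA per nucleosome (from the text)
HUMAN_DIPLOID_BP = 6_000_000_000   # assumed round figure


def contour_length_m(n_bp: int) -> float:
    """Length of naked, fully extended DNA in meters."""
    return n_bp * RISE_NM_PER_BP * 1e-9


def compacted_length_m(n_bp: int, fold: float) -> float:
    """Length after compaction by the given packing ratio."""
    return contour_length_m(n_bp) / fold


def nucleosome_count(n_bp: int, linker_bp: int, core_bp: int = CORE_BP) -> int:
    """Number of nucleosome repeats (core + linker) that fit in n_bp."""
    return n_bp // (core_bp + linker_bp)


if __name__ == "__main__":
    print(f"naked DNA:          {contour_length_m(HUMAN_DIPLOID_BP):.2f} m")
    for label, fold in [("10-nm fiber", 6), ("30-nm fiber", 40), ("mitotic", 10_000)]:
        print(f"{label:<19} {compacted_length_m(HUMAN_DIPLOID_BP, fold):.2e} m")
    # Human linker DNA of 38-53 bp gives a repeat of 185-200 bp
    print(nucleosome_count(HUMAN_DIPLOID_BP, 53), nucleosome_count(HUMAN_DIPLOID_BP, 38))
```

Under these assumptions the naked DNA is about 2 m long, and even a 40-fold compaction leaves several centimeters of fiber, far more than the 10 µm diameter of the nucleus, which is why additional folding is needed.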

II. 7. Molecular biology techniques based on DNA-DNA and DNA-RNA complementarity


II. 7. 1. Southern blotting (named after its inventor Edward Southern)
A cloned gene (or part of a gene) or a PCR fragment can be used as a probe for finding segments of (genomic) DNA that have the same or a very similar sequence, to verify the construction of a knock-out mutant, etc. The genomic DNA is extracted and digested with a restriction enzyme that will cut it into defined populations of segments of specific size. These fragments can be separated into groups of fragments of the same length by (agarose) gel electrophoresis. DNA is negatively charged and will migrate toward the positive pole. The electric field causes the molecules to move through the pores of the gel in a wormlike fashion at speeds inversely proportional to their size:

v = E·Q·h² / (f·L²)

where v = velocity, E = electric field, Q = effective charge, h = end-to-end distance, f = friction coefficient and L = contour length.

After fractionation, the gel is soaked in alkali to denature the double-stranded DNA and then blotted onto a piece of porous, positively charged membrane, where the fragments stay in the same relative positions. Once the alkali treatment has separated the DNA strands and the DNA is linked to the membrane, the membrane is placed in a hybridization buffer containing the probe. The single-stranded probe will find and bind to its complementary DNA sequence. Unbound probe is removed by washing steps.

The probe must be labeled such that it can be readily located on the membrane once it has bound to its complementary target sequence. Probes can be labeled with radioactive atoms (for example by adding a 32P atom at the 5'-end of DNA with the enzyme polynucleotide kinase and γ-32P-ATP as substrate) or by incorporation of a radioactive or fluorescent precursor during in vitro DNA synthesis. Radioactive probes can be detected by autoradiography (X-ray sensitive film) or by photomultipliers that emit light in response to the excitation by the beta particles emitted from 32P, 33P and 35S. Fluorescent probes can be detected by irradiation with UV light of the appropriate wavelength and monitoring the longer wavelength that is emitted in response.

II. 7. 2. Northern blotting
A procedure similar to Southern blotting can be applied for detecting a particular mRNA in a population of RNAs. Because mRNA molecules are relatively short (generally < 5 kb) there is no need to digest them; they can be directly separated by gel electrophoresis, transferred to a positively charged membrane and probed with a DNA probe of choice. This technique can provide information on the amount of the particular mRNA and its size.
Therefore, it is one of several techniques that allow the quantitative analysis of promoter strength (amount of mRNA produced) and of the potential use of alternative promoters (monocistronic and polycistronic mRNA in prokaryotes) and of alternative splicing (in eukaryotes) (see further: Transcription).

II. 7. 3. DNA-DNA hybridization in chemotaxonomy
Two DNA molecules are expected to hybridize to one another in proportion to the similarities in their gene sequences. Genomic hybridization measures the degree of sequence similarity in two DNAs and is useful for differentiating very closely related organisms where rRNA sequencing may fail to be definitive. DNA isolated from one organism is sheared to relatively small size, radiolabeled with 32P or 3H (or labeled with a non-radioactive technique), heated to denature it, and mixed with an excess of unlabeled DNA prepared in the same way from a second organism. The DNA mixture is then cooled to allow it to reanneal, and double-stranded DNA is separated from the remaining unhybridized single-stranded DNA. Then, the amount of radioactivity in the hybridized DNA fraction is determined and compared with the control (2x DNA from the same organism), which is taken as 100%.

DNA:DNA hybridization is a very sensitive technique for revealing subtle differences in the genes of two organisms and is therefore useful for differentiating closely related organisms. There is no fixed convention as to how much hybridization between two DNAs is necessary to assign two organisms to the same taxonomic rank. However, hybridization values of 70% or greater are recommended for considering two isolates to be of the same species. By contrast, values of at least 25% are required to argue that two organisms should reside in the same genus. More distantly related organisms like Clostridium (gram-positive) and Salmonella (gram-negative) will only show background-level hybridization (≤10%).

II. 7. 4. FISH (fluorescent in situ hybridization)
In the FISH technique, a DNA or RNA probe is labeled by attaching a fluorescent dye and used to hybridize to a complementary nucleic acid in a mixture. Probes can be general or specific: universal ribosomal RNA probes can be generated that will hybridize to conserved sequences in the rRNA of all organisms, regardless of the domain. By contrast, specific probes can be designed that will only hybridize to rRNA of Archaea, or Bacteria, or Eucarya, because of the unique signatures found in their rRNA sequences. Even major groups within a domain, such as genera or families, can be targeted with specific probes. FISH is therefore widely used in microbial ecology and clinical diagnostics. The binding of a fluorescent probe to its target can be seen microscopically. By treating cells with the appropriate reagents, membranes become permeable and allow the penetration of the probe. FISH can be applied directly to cells in culture or in the natural environment and therefore offers a rapid method for assessing the composition of microbial communities directly by microscopy.
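All the techniques in this section rest on the same principle: a single-stranded probe binds where the target carries its complementary sequence. The sketch below illustrates that principle for DNA only, with perfect matches; it is a simplification, since real hybridization tolerates some mismatches depending on the stringency of the conditions. The function names and the sequences are hypothetical examples.

```python
# Minimal sketch: locating the sites where a DNA probe can base-pair with a target strand.
# ASSUMPTIONS: DNA alphabet only (A, C, G, T) and perfect complementarity.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the strand that pairs with seq, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_probe_sites(target: str, probe: str) -> list:
    """0-based positions on the target strand where the probe can hybridize perfectly."""
    site = reverse_complement(probe)   # the target must contain this sequence
    target = target.upper()
    positions = []
    start = target.find(site)
    while start != -1:
        positions.append(start)
        start = target.find(site, start + 1)  # also report overlapping sites
    return positions


if __name__ == "__main__":
    target = "AAGAATCAAGAATC"   # hypothetical single-stranded target
    probe = "GATTC"             # hypothetical labeled probe
    print(reverse_complement(probe))
    print(find_probe_sites(target, probe))
```

A group-specific probe corresponds to a probe whose complementary site occurs only in the targets of that group, while a universal probe is complementary to a site conserved in all targets.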

II. 7. 5. Microarrays and ChIP-chip (see Transcription).
