
Molecular Marker and Its Application to Genome Mapping and Molecular Breeding

Binying Fu
Institute of Crop Sciences, The Chinese Academy of Agricultural Sciences, Beijing 100081, China

Nov-14-2012

Definition of Biological Marker


Biological markers can be anything that distinguishes one individual or population from another.
- Can be phenotypic
  - Color: yellow vs. white, etc.
  - Texture: smooth vs. rough, etc.
  - Shape: round vs. irregular, etc.
- Can be a biochemical or genetic difference

Phenotypic Markers

http://cgil.uoguelph.ca/QTL/Fig2_3.htm

Weaknesses: unstable, and limited in number and polymorphism

Cytological Marker
Any distinct and heritable feature of chromosome structure that can be used to follow (usually by microscopy) that chromosome or chromosome region in breeding experiments.

Weaknesses: may have side effects and require special techniques

Biochemical Marker-Isozyme and Protein

Weaknesses: limited in number, expression varies in space and time, and special techniques are needed, such as starch gel electrophoresis with special staining

Characteristics of Ideal Markers


- Polymorphism
- Stability, no influence from the environment
- Wide dispersion through the genome
- Simplicity of observation
- Low cost
- Mendelian heritability
- Co-dominance
- Reproducibility
- Portability between species

Molecular Markers
Definition:
- A molecular selection technique of DNA signposts which allows the identification of differences in the nucleotide sequences of the DNA in different individuals.
- Or: any genetic element (locus, allele, DNA sequence or chromosome feature) which can be readily detected by phenotypic, cytological or molecular techniques, and used to follow a chromosome or chromosomal segment during genetic analysis. (Also: DNA marker.)
- Agriculture: a tool which allows crop geneticists and breeders to locate on a plant chromosome the genes for a trait of interest. It is considered more efficient than conventional breeding, as it has the potential to greatly reduce development times and substitutes laboratory selection for much of the fieldwork. MAS or MDB!
Molecular, or DNA-based, markers have become increasingly important in plant breeding because of their features:
- Phenotypic stability (not affected by environment)
- Useful polymorphism
- Ease of development

Where does the molecular marker come from?


- Mutation = heritable (at the cell level) change in DNA sequence, regardless of whether the change produces any detectable effect on a gene product.
- Mutations are the source of new variation (polymorphism) upon which natural selection works.
- Inherited mutations that are dispersed through a population can become polymorphisms.
- Polymorphism = presence in the same population of two or more alternative forms of a DNA sequence, with the most common allele having a frequency of 99% or less.
- Any two individuals have a polymorphic difference every 1,000-10,000 base pairs.
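The 99% rule in the definition above can be written as a small check. This is only a sketch: the function name `is_polymorphic`, the dictionary input format, and the example counts are hypothetical and not taken from the slides.

```python
def is_polymorphic(allele_counts, threshold=0.99):
    """Return True if the most common allele has a frequency of `threshold` or less.

    allele_counts maps an allele label to the number of times it was observed
    in the population sample (hypothetical input format).
    """
    total = sum(allele_counts.values())
    if total == 0:
        raise ValueError("no alleles observed")
    most_common_freq = max(allele_counts.values()) / total
    return most_common_freq <= threshold


# Made-up counts: the first locus has a major allele at 99.5%, the second at 90%.
print(is_polymorphic({"A": 995, "G": 5}))
print(is_polymorphic({"A": 900, "G": 100}))
```

A locus with a rare variant below 1% would be counted as a mutation present in the population rather than as a polymorphism under this definition.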

Comparison of Mutation Frequencies


Class of Mutation     Mechanism                   Frequency                          Example
Genome mutation       Chromosome missegregation   10^-2/cell division                Aneuploidy
Chromosome mutation   Chromosome rearrangement    6 x 10^-4/cell division            Translocation
Gene mutation         Base-pair mutation          10^-10/base pair/cell division;    Point mutation
                                                  10^-5 to 10^-6/locus/generation

Humans have ~10^9 base pairs per haploid genome; therefore each person will have 1-100 new mutations.
1 in 20 people will have a new gene mutation.
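The step from the per-base rate to "1-100 new mutations" can be made explicit with a little arithmetic. The sketch below uses the rate and genome size quoted above; the number of cell divisions per generation (10 to 1000) is an assumption chosen here to show how the quoted range can arise, and it is not stated in the slides.

```python
MUTATION_RATE = 1e-10   # per base pair per cell division (value quoted above)
GENOME_SIZE = 1e9       # base pairs per haploid genome (order of magnitude quoted above)


def new_mutations(cell_divisions):
    """Expected number of new point mutations per haploid genome."""
    return MUTATION_RATE * GENOME_SIZE * cell_divisions


# ASSUMPTION: 10 to 1000 cell divisions per generation (not from the slides).
for divisions in (10, 1000):
    print(f"{divisions} divisions: about {new_mutations(divisions):.0f} new mutation(s)")
```

Under these assumptions each cell division contributes about 0.1 new mutation per haploid genome, so the total scales linearly with the number of divisions.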

Types of Mutations (1)


Nucleotide Substitutions Altering Coding Sequence
- Missense mutations (amino acid substitution)
- Nonsense mutations (premature stop codon)

Types of Mutations (2)


Nucleotide Substitutions Altering Gene Expression
- RNA processing mutations (destruction of splice sites, cap sites, poly A sites, or creation of cryptic sites)
- Regulatory mutations (promoter mutations)

Types of Mutations (3)


Deletions and Insertions (InDels)
- Insertion or deletion of a small number of bases
  - If the number of bases involved is not a multiple of 3, causes a frameshift
  - If the number of bases involved is a multiple of 3, causes loss or gain of codons
- Larger deletions, inversions, and duplications
  - Can create gene syndromes
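The multiple-of-3 rule for small InDels in a coding sequence can be sketched as follows. The function name `indel_effect` and its return strings are hypothetical, and the sketch assumes the InDel lies entirely within an open reading frame.

```python
def indel_effect(n_bases):
    """Classify a small coding-sequence insertion or deletion by its length."""
    if n_bases <= 0:
        raise ValueError("n_bases must be a positive number of bases")
    if n_bases % 3 == 0:
        # Whole codons are gained or lost; the downstream reading frame is preserved.
        return f"in-frame: gain or loss of {n_bases // 3} codon(s)"
    # Any other length shifts the reading frame downstream of the InDel.
    return "frameshift"


for n in (1, 2, 3, 6):
    print(n, indel_effect(n))
```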

Recombination-Generated Duplications, Deletions, Insertions

[Figure: recombination-generated duplication, insertion, and inversion]

Brief Summary
The term MARKER is usually used for LOCUS MARKER. Each gene has a particular place along the chromosome, called its LOCUS. Due to mutations, a gene can exist in several mutually exclusive forms called ALLELES (or allelic forms). All allelic forms of a gene occur at the same locus on homologous chromosomes. When the allelic forms at one locus are identical, the genotype is called HOMOZYGOUS (at this locus), whereas different allelic forms constitute a HETEROZYGOTE. In diploid organisms, the GENOTYPE is constituted by the two allelic forms on the homologous chromosomes. Thus, MOLECULAR MARKERS are all locus markers related to DNA (sometimes biochemical or morphological markers are included).

Molecular Markers Classes


First Generation: 1980s
- Based on DNA-DNA hybridization, such as RFLP
Second Generation: 1990s
- Based on PCR:
  - Using random primers: RAPD, DAF, ISSR
  - Using specific primers: SSR, SCAR, STS
- Based on PCR and restriction cutting: AFLP, CAPS
Third Generation: recently
- Based on DNA point mutations (SNP), which can be detected by SSCP, DASH, DNA chip, sequencing, etc.

The Evolution of Markers


[Figure: timeline of marker technologies. Figure labels: Protein-Scene, DNA-Hybridization-Scene (Pre-PCR), OLIGO-Scene (Gene-Specific PCR), Genomic Era, Automation, Hallmark event.]
- Morphological variants (pre-1950s)
- Gel electrophoresis (1950s)
- Allozymes (1960s)
- Restriction (1968) and Southern blotting (1975)
- RFLPs (1980)
- PCR (1986)
- Microsatellites (SSRs, 1989)
- RAPDs (1990), SCARs
- CAPS (1993)
- AFLPs (1996)
- AFLPs on automated sequencers (1998)
- AFLPs on microarrays (2000)
- cDNA sequencing (cSSR), SSCPs
- SNPs, SNPs on chips
- Complete genomic sequence, high-throughput marker analysis

DNA Markers
- Simple Sequence Repeats (SSR)
- Single Nucleotide Polymorphisms (SNP)
- Single Feature Polymorphisms (SFP)

Microsatellites
What are microsatellites?
- Simple sequence repeats (SSRs) or microsatellites are tandemly repeated mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide motifs.
- SSR length polymorphisms are caused by differences in the number of repeats.
- SSR loci are individually amplified by PCR using pairs of oligonucleotide primers specific to unique DNA sequences flanking the SSR sequence.
Example
Mononucleotide SSR (A)11:     AAAAAAAAAAA
Dinucleotide SSR (GT)6:       GTGTGTGTGTGT
Trinucleotide SSR (CTG)4:     CTGCTGCTGCTG
Tetranucleotide SSR (ACTC)4:  ACTCACTCACTCACTC
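The motif examples above can be located in a sequence with a regular expression. This is a minimal sketch for perfect (uninterrupted) repeats only; the function name `find_ssrs`, the default of at least 4 repeats, and the test sequence are illustrative assumptions, not part of the original slides.

```python
import re


def find_ssrs(seq, min_repeats=4, max_motif=6):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats.

    start is a 0-based index into seq. Motif lengths 1..max_motif are scanned.
    """
    hits = []
    for k in range(1, max_motif + 1):
        # A k-base motif followed by at least (min_repeats - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are just a shorter unit repeated (e.g. "AA" is "A" x 2).
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


# Hypothetical sequence containing (GT)6 and (CTG)4 from the examples above.
sequence = "TCA" + "GT" * 6 + "ACCA" + "CTG" * 4 + "TA"
print(find_ssrs(sequence))  # [(3, 'GT', 6), (19, 'CTG', 4)]
```

Primer design on the unique flanking sequence, which is what makes an SSR locus-specific, is outside the scope of this sketch.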

Microsatellites
Features of SSR Markers

- SSRs tend to be highly polymorphic.
- SSRs are highly abundant and randomly dispersed throughout most genomes.
- Most SSR markers are co-dominant and locus specific.
- Genotyping throughput is high and can be automated.

Microsatellites
Where are microsatellites found?
The majority are in non-coding regions.

Microsatellites
Repeat Motifs
- AC repeats tend to be more abundant than other di-nucleotide repeat motifs in animals.
- The most abundant di-nucleotide repeat motifs in plants, in descending order, are AT, AG, and AC.
- Because AT repeats self-anneal, AT-enrichment methods have not been developed.
- Typically, SSRs are developed for di-, tri-, and tetra-nucleotide repeat motifs.
- CA and GA have been widely used in plants.
- SSR markers have been developed for a variety of tri- and tetra-nucleotide repeats in plants.
- Tetra-nucleotide repeats have the potential to be very highly polymorphic.

SSR-Containing Sequences from BAC-ends

[Pie chart: SSRs by repeat-unit length (2 bp, 3 bp, 4 bp, 5-6 bp); slice labels 76%, 21%, 3%]
SSR-containing sequences in BAC ends: 1% in corn and 0.6% in soybean. Among these, most are dinucleotide repeats.

Trinucleotide Repeats in Soy BAC-end Sequences

[Pie chart: trinucleotide repeat motifs AAT, AAC, AAG, ATG, ATC, AGG, ACT, CCT, CGT, ACC, CTG; slice labels 48%, 25%, 15%, 5%]
In the soybean genome, most of the trinucleotide repeats in BAC-end sequences are AAT repeats; one quarter of them are AAC repeats.

Simple sequence repeats (SSRs). SSRs are particularly useful for developing genetic markers. They are believed to vary through DNA replication slippage, and are related to genetic instability. In Table 2, we describe SSR content for two sectors, n = 6 to 11 units and n > 11 units, to emphasize that the number of SSRs dropped substantially after 11 units. The SSR content for 93-11 was 1.7% of the genome, lower than in the human, where it was 3%. The overwhelming majority of rice SSRs were mononucleotides, primarily (A)n or (T)n, with n from 6 to 11. In contrast, for the human, the greatest contributions came from dinucleotides.

From Nipponbare (Goff et al., 2002, Science):


The most prevalent SSR is tri-nucleotide; Most frequent 2-SSR is AG, 3-SSR is CGG, 4-SSR is CGAT.

Microsatellites
How do microsatellites mutate?

- Replication slippage
- Unequal crossing-over during meiosis

Replication Slippage
- When the DNA replicates, the polymerase loses track of its place, and either leaves out repeat units or adds too many repeat units. This is also called polymerase slippage or slipped-strand mispairing.
- Replication slippage is a commonly observed replication error, which occurs at repetitive sequences when the new strand mispairs with the template strand.
- Microsatellite polymorphism is mainly caused by replication slippage.
- If the mutation occurs in a coding region, it could produce abnormal proteins, leading to diseases.

Unequal crossing-over during meiosis

This is thought to explain more drastic changes in numbers of repeats. In this diagram, chromosome A obtained too many repeats during crossing-over, and chromosome B obtained too few repeats.

Microsatellites
Why do microsatellites exist?
- "Junk" DNA, and the variation is mostly neutral
- A necessary source of genetic variation
- Regulate gene expression and protein function
Moxon, E. R., Wills, C. 1999. "DNA microsatellites: Agents of Evolution?" Scientific American. Jan., pp. 72-77. Kashi, Y. and M. Soller. 1999. "Functional Roles of Microsatellites and Minisatellites." In: Microsatellites: Evolution and Applications. Edited by Goldstein and Schlotterer. Oxford University Press.

Models of Microsatellite Mutation (1)


1. Stepwise Mutation Model (SMM) This model holds that when microsatellites mutate, they only gain or lose one repeat. This implies that two alleles that differ by one repeat are more closely related (have a more recent common ancestor) than alleles that differ by many repeats. In other words, size matters when doing statistical tests of population substructuring. The SMM is generally the preferred model when calculating relatedness between individuals and population substructuring, although there is the problem of homoplasy.

Models of Microsatellite Mutation (2)


2. Infinite Alleles Model (IAM)
Each mutation can create any new allele randomly. A 15-repeat allele could be just as closely related to a 10-repeat allele as to an 11-repeat allele. All that matters is that they are different alleles. In other words, size isn't important.
[Figure: alleles with 15, 11, 10, and 8 repeats]
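The contrast between the two models can be illustrated with a toy allele distance. This is a simplified sketch and not a full population-genetic estimator: `smm_distance` counts the minimum number of single-repeat steps between two alleles, while `iam_distance` only records whether the alleles differ. Both function names are hypothetical.

```python
def smm_distance(repeats_a, repeats_b):
    """Stepwise model: each mutation adds or removes one repeat, so size matters."""
    return abs(repeats_a - repeats_b)


def iam_distance(repeats_a, repeats_b):
    """Infinite alleles model: only identity matters, size does not."""
    return 0 if repeats_a == repeats_b else 1


# Compare the 15-repeat allele with the other alleles named in the figure.
for other in (11, 10, 8):
    print(other, smm_distance(15, other), iam_distance(15, other))
```

Under the stepwise measure the 11-repeat allele is closer to the 15-repeat allele than the 10-repeat allele is; under the infinite alleles measure all three are equally distant.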

Conventional Developmental Steps of SSR Markers

[Flowchart]
Genomic DNA
-> Specific SSR DNA library
-> SSR probes
-> Positive clones
-> Sequencing of positive DNA clones
-> PCR test using diverse genotypes
-> SSR

Four Assay Methods


1. The customary method for SSR genotyping is denaturing polyacrylamide gel electrophoresis using silver-stained PCR products. These assays can usually distinguish alleles differing by 4 bp and may distinguish alleles differing by 2 bp.

2. Semi-automated SSR genotyping can be performed by assaying fluorescently labelled PCR products for length variants on an automated DNA sequencer. Several instruments have been developed (e.g., Applied Biosystems and Li-Cor). Alleles differing by 2 to 4 bp can usually be distinguished.

3. SSR length polymorphisms can be assayed using non-denaturing high performance liquid chromatography (Marino et al. 1998). Alleles differing by 2 to 4 bp can usually be distinguished.

4. SSR alleles differing by several repeat units can often be distinguished on agarose gels.

SSRs assayed on polyacrylamide gels typically show a characteristic stuttering. Stutter bands are artifacts produced by DNA polymerase slippage. Typically, the most prominent stutter bands are +1 and - 1 repeat (e.g., + or - 2 bp for a di-nucleotide repeat), and, if visible, the next most prominent stutter bands are +2 and -2 repeats.
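The expected positions of stutter bands follow directly from the motif length. The sketch below computes them for a given allele size; the function name `stutter_sizes` and the 150 bp example allele are hypothetical.

```python
def stutter_sizes(allele_bp, motif_len, max_steps=2):
    """Map repeat offset (-2, -1, +1, +2, ...) to the expected stutter band size in bp."""
    return {
        step: allele_bp + step * motif_len
        for step in range(-max_steps, max_steps + 1)
        if step != 0
    }


# A hypothetical 150 bp allele at a di-nucleotide SSR locus.
print(stutter_sizes(150, motif_len=2))  # {-2: 146, -1: 148, 1: 152, 2: 154}
```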

Weaknesses
- The development of SSRs is labor intensive (not so in sequence-based SSR development).
- SSR marker development costs are very high.
- SSR markers are taxa specific.
- Start-up costs are high for automated SSR assay methods.
- Developing PCR multiplexes is difficult and expensive. Some markers may not multiplex.

Single Nucleotide Polymorphisms


- SNPs are the molecular basis for most phenotypic differences between individuals.
- SNPs are the most common type of genetic variation.
- SNPs are highly abundant, stable, and distributed throughout the genome.
- SNP assays are amenable to automation and high throughput.
- SNPs are usually biallelic.

GATTTAGATCGCGATAGAG
GATTTAGATCTCGATAGAG
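The difference between the two example sequences can be found by a position-by-position comparison. This is a minimal sketch that assumes the sequences are already aligned and of equal length; the function name `find_snps` is hypothetical.

```python
def find_snps(seq_a, seq_b):
    """Return (position, base_a, base_b) for each mismatch; positions are 1-based."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


print(find_snps("GATTTAGATCGCGATAGAG", "GATTTAGATCTCGATAGAG"))  # [(11, 'G', 'T')]
```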

Single Nucleotide Polymorphisms


SNPs in non-coding regions (including intergenic regions) may:
- Have no genetic effect
- Affect genetic regulatory signals
- Interfere with RNA splice sites
SNPs in coding regions (cSNPs) may:
- Synonymously change the codon of an amino acid, which may have no further effect, or may influence e.g. codon bias
- Non-synonymously alter the encoded amino acid (nsSNP), by a conservative exchange or a non-conservative (radical) mutation
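The synonymous versus non-synonymous distinction for coding SNPs can be sketched with the standard genetic code. The function name `classify_csnp` and its labels are hypothetical; the sketch treats a single codon in isolation and does not attempt to separate conservative from radical amino acid exchanges.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_csnp(codon, pos, new_base):
    """Classify a single-base change at 0-based position `pos` of a DNA codon."""
    mutated = codon[:pos] + new_base + codon[pos + 1:]
    old_aa, new_aa = CODON_TABLE[codon], CODON_TABLE[mutated]
    if old_aa == new_aa:
        return "synonymous"
    if new_aa == "*":
        return "nonsense (premature stop)"
    if old_aa == "*":
        return "stop lost"
    return f"missense ({old_aa} -> {new_aa})"


print(classify_csnp("GAA", 2, "G"))  # GAA and GAG both encode E
print(classify_csnp("GAA", 0, "A"))  # E becomes K
print(classify_csnp("TGG", 2, "A"))  # W becomes a stop codon
```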

SNP Variation in Maize and Soybean (%)

[Bar chart: frequency (%) of SNP types C/T, G/A, G/C, A/C, G/T, A/T, and deletions in maize and soy; y-axis from 0 to 40%]

Frequency of Candidate SNPs from Different Sources in Maize and Soy


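Region          Maize       Soy
EST (5' end)    1/1.5 kb    1/1.9 kb
Genomic         1/640 bp    1/750 bp
3' UTR          1/441 bp    1/416 bp

[Figure residue, category labels not recoverable: 1 SNP/250 bp, 1 SNP/268 bp, 1 SNP/236 bp, 1 SNP/243 bp; percentage breakdowns 16.5% / 18.2% / 65.3%, 23.5% / 21.8% / 54.7%, 14.3% / 16.3% / 69.4%, 23.3% / 22.4% / 54.3%]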

SNPs Discovery
1. Sequence database searches
2. Target-specific SNP discovery and development
   - Conformation-based mutation scanning
   - Direct DNA sequencing

Identify SNP from Sequence Databases

Identification of Target Specific SNPs


Steps:
1. Amplify the genes of interest with PCR
2. Scan for mutations with various methods
   - Conformation-based mutation scanning
     - Single-strand conformation polymorphism analysis
     - Gel electrophoresis
     - Chemical and enzymatic mismatch cleavage detection
     - Denaturing gradient gel electrophoresis
     - Denaturing HPLC
3. Sequence positive PCR products
   - Sequence multiple individuals
   - Sequence heterozygotes
4. Align sequences from different sources to find SNPs
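Step 4 can be sketched as a column-by-column scan of sequences from several individuals. The sketch assumes the sequences have already been aligned to equal length by a separate alignment tool, with "-" for gaps; the function name `snp_columns` and the sample names are hypothetical.

```python
from collections import Counter


def snp_columns(aligned):
    """Return {1-based position: {base: count}} for columns with more than one base.

    `aligned` maps a sample name to its aligned sequence. Gap and ambiguity
    characters are ignored when counting alleles.
    """
    sequences = list(aligned.values())
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same aligned length")
    snps = {}
    for i in range(length):
        counts = Counter(s[i] for s in sequences if s[i] in "ACGT")
        if len(counts) > 1:
            snps[i + 1] = dict(counts)
    return snps


# Made-up alignment of four individuals.
alignment = {"ind1": "ACGTA", "ind2": "ACCTA", "ind3": "ACGTA", "ind4": "AC-TA"}
print(snp_columns(alignment))
```

Keeping the allele counts per column makes it easy to discard singletons that are more likely sequencing errors than real SNPs, if such a filter is wanted.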

Technologies for Detecting Known SNPs


Gel-Based Methods
- PCR-restriction fragment length polymorphism analysis
- PCR-based allele-specific amplification
- Oligonucleotide ligation assay genotyping
- Minisequencing (10-20 bases)
Non-Gel-Based High-Throughput Genotyping Technologies
- Solution hybridization using fluorescent dyes
- Allele-specific ligation
- Allele-specific nucleotide incorporation
  1. High-resolution separation
  2. Chemical color reaction
- DNA microarray genotyping

Oligo Ligation Assay (OLA)

Two allele-specific oligonucleotide probes (one specific for the wild-type allele and the other specific for the variant allele) and a fluorescent common probe are used in each assay. The 3' ends of the allele-specific probes are immediately adjacent to the 5' end of the common probe. In the presence of thermally stable DNA ligase, ligation of the fluorescently labeled probe to the allele-specific probe(s) occurs only when there is a perfect match between the variant or the wild-type probe and the PCR product template. These ligation products are then separated by electrophoresis, which permits the recognition of the wild-type genotypes, the variants, the heterozygotes, and the unligated probes.

Allele-Specific Codominant PCR Strategy


Figure. Schematic representation of the allele-specific codominant PCR strategy. Oligonucleotide primers with 3' nucleotides that correspond to an SNP site are used to preferentially amplify specific alleles. A, Primer P1 forms a perfect match with allele 1 but forms a mismatch at the 3' terminus with the DNA sequence of allele 2. Primer P2 similarly forms a perfect match with allele 2 and a 3' terminus mismatch with allele 1. B, Schematic of agarose gel analysis showing the expected outcome for the amplification of organisms homozygous and heterozygous for both alleles using primers P1 and P2. P1, primer 1; P2, primer 2; A1, allele 1; A2, allele 2.
Eliana Drenkard et al. 2000. Plant Physiol 124: 1483-1492.
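The expected gel outcome in panel B can be expressed as a small lookup. This is an idealized sketch that assumes perfectly specific primers; the function name `expected_bands` and the tuple encoding of genotypes are hypothetical.

```python
def expected_bands(genotype):
    """Return which allele-specific primers are expected to give a PCR product.

    genotype is a pair of alleles, each "A1" or "A2". P1 matches allele 1 and
    P2 matches allele 2 at the 3' terminus.
    """
    if any(allele not in ("A1", "A2") for allele in genotype):
        raise ValueError("alleles must be 'A1' or 'A2'")
    return {"P1": "A1" in genotype, "P2": "A2" in genotype}


for genotype in (("A1", "A1"), ("A1", "A2"), ("A2", "A2")):
    print(genotype, expected_bands(genotype))
```

Because heterozygotes give a product with both primers, the two reactions together distinguish all three genotypes, which is what makes the assay codominant.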

SNP Detection: Allele-Specific Oligo Hybridization

Principle: A 1 bp mismatch in the center of a 15-mer will change the Tm by 5-10 degrees; therefore a SNP in the middle of a 15-mer can be genotyped using paired allele-specific oligonucleotides (ASOs).
- PCR-amplify the target gene (from different individuals) in 96-well format
- Prepare a dot blot on nylon filter
- Hybridize to the allele-specific 15-mer and detect the signal
- Wash at stringency temperature
- Repeat for the alternate allele and other SNPs

Single-Strand Conformation Polymorphism Analysis


Single-stranded DNAs are generated by denaturation of the PCR products and separated on a nondenaturing polyacrylamide gel. A fragment with a single-base modification generally forms a different conformer and migrates differently when compared with wild-type DNA.
Size <200 bp: accuracy 70%-95%
Size >400 bp: accuracy 50%
1% false positives

SNP Genotyping Using Oligo Chip


Oligo chip: a set of 15-nucleotide probes arranged in overlapping probe sets, with adjacent sets overlapping by 14 nucleotides. The four probes in one set have the same sequence except at one position, which carries A, G, C, or T.

[Figure: hybridization patterns for the T genotype and the C genotype]

http://www.ricesnp.org/index.aspx##
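The probe layout described for the oligo chip can be sketched as follows. The sketch assumes that the variable base sits at the center of each 15-mer and that successive probe sets are shifted by one nucleotide, which is what a 14-nucleotide overlap implies; the function name `probe_sets` and the reuse of the earlier example sequence as a reference are illustrative choices.

```python
def probe_sets(reference, probe_len=15):
    """Return {1-based interrogated position: [four probes, one per A/C/G/T]}.

    Each set tiles one window of the reference; the central base is replaced
    by each of the four nucleotides in turn.
    """
    half = probe_len // 2
    sets = {}
    for center in range(half, len(reference) - half):
        window = reference[center - half:center + half + 1]
        sets[center + 1] = [
            window[:half] + base + window[half + 1:] for base in "ACGT"
        ]
    return sets


reference = "GATTTAGATCGCGATAGAG"
for probe in probe_sets(reference)[11]:
    print(probe)
```

In such a design the probe in each set that matches the sample perfectly would be expected to give the strongest hybridization signal, which is how the genotype at that position is read.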

Direct Sequencing - New Sequencing Technology


Pyrosequencing technology offers rapid and accurate genotyping, allowing for dependable SNP and mutation analysis. This technology utilizes an enzyme cascade system that results in the production of measurable light whenever a nucleotide forms a base pair with its complementary base in a DNA template strand.
Solexa/Illumina Sequencing
Munroe & Harris (2010). Third-generation sequencing fireworks at Marco Island. Nature Biotechnology 28: 426-428.

Use of SNPs

1. Markers for linkage mapping: discover SNPs that contribute to agronomic traits
2. Trace the origin of introgressions
3. Markers for association studies (linkage disequilibrium)
4. Markers for population genetic analysis

Further Reading:
McNally et al., 2009. Genomewide SNP variation reveals relationships among landraces and modern varieties of rice. PNAS 106(30): 12273-8.
Jones et al., 2009. Development of single nucleotide polymorphism (SNP) markers for use in commercial maize (Zea mays L.) germplasm. Mol Breeding 24(2): 165-176.
Varshney et al., 2007. Single nucleotide polymorphisms in rye (Secale cereale L.): discovery, frequency, and applications for genome mapping and diversity studies. TAG 114(6): 1105-1116.
Wu et al., 2010. SNP discovery by high-throughput sequencing in soybean. BMC Genomics 11: 469.

Single Feature Polymorphisms (SFPs)


- SFPs are a consequence either of insertion/deletion (InDel) polymorphisms or of multiple SNPs across the complementary sequences.
- SFPs are identified through hybridization of genomic DNA to whole-genome tiling arrays (e.g., Affymetrix GeneChips) or home-made microarrays.
References
Yeast: Wodicka, L., H. Dong, M. Mittmann, M.H. Ho, and D.J. Lockhart. 1997. Nat Biotechnol 15: 1359-1367.
Arabidopsis: Borevitz, J.O., D. Liang, D. Plouffe, H.S. Chang, T. Zhu, D. Weigel, C.C. Berry, E. Winzeler, and J. Chory. 2003. Genome Res 13: 513-523.

Further reading: Kumar et al., 2007. Single Feature Polymorphism Discovery in Rice. Plos ONE, 2(3): e284

Principle of Microarray-based genotyping of Single Feature Polymorphisms (SFPs) by Oligo Chip.

[Figure: hybridization signals for the A genotype, the B genotype, and the A/B genotype]

http://cropwiki.irri.org/gcp/images/6/61/Single_Feature_Polymorphism.pdf

Classification of DNA Markers


A. Mutation at restriction sites (RFLP, CAPS, AFLP) or PCR primer sites (RAPD, DAF, AP-PCR, SSR, ISSR)
B. Insertion or deletion between restriction sites (RFLP, CAPS, AFLP) or PCR primer sites (RAPD, DAF, AP-PCR, SSR, ISSR)
C. Changes in the number of repeat units between restriction sites or PCR primer sites: SSR, VNTR, ISSR
D. Mutations at single nucleotides: SNP

Summary of Common Molecular Markers


Single Locus                                               Detection
RFLP (restriction fragment length polymorphism)            Hybridization
CAPS (cleaved amplified polymorphic sequences)             PCR
SSLP (simple sequence length polymorphism)
---- VNTR (variable number of tandem repeats)              Hybridization or PCR
---- SSR/STR (simple sequence repeats/tandem repeats)      PCR
SCAR (sequence characterized amplified region)             PCR
SNP (single nucleotide polymorphism)
---- DASH (dynamic allele-specific hybridization)          Hybridization
---- SSCP (single strand conformation polymorphism)        Conformation

Summary of Common Molecular Markers


Multiple Loci                                              Detection
AFLP (amplified fragment length polymorphism)              PCR
RAPD (random amplified polymorphic DNA)                    PCR
AP-PCR (arbitrarily primed PCR)                            PCR
DAF (DNA amplification fingerprinting)                     PCR
SSLP (simple sequence length polymorphism),                PCR
     when multiple pairs of primers are used
ISSR (inter-simple sequence repeat)                        PCR
SNP (single nucleotide polymorphism)
---- SSCP (single strand conformation polymorphism),       Conformation
     when used to scan for randomly located SNPs

Conclusion
Not all molecular markers are equal. None is ideal. Some are better for some purposes than others. However, all are generally preferable to morphological markers for mapping and marker-assisted selection.

Thanks For Your Attention!
