
80A

Statistical evaluation in
forensic DNA typing
by

Henry Roberts
Aimee Pollett
[We are indebted to numerous people for communicating their
ideas to us. Sections of this chapter are based on material
presented in particular by J.S. Buckleton, B. Budowle, I.W.
Evett and B.S. Weir in international forums and informal
discussions. Suggestions by J.S. Buckleton, S.J. Gutowski, C.M.
Triggs and B.S. Weir greatly improved the first edition of this
chapter.]

THOMSON REUTERS

80A - 1

Update: 59

EXPERT EVIDENCE

Author information
Aimee Pollett is a forensic scientist in the Victoria Police Forensic Services
Department. She gained her Bachelor of Science degree majoring in Biochemistry and
Molecular Biology at the University of Melbourne in 2000. She completed a
Postgraduate Diploma in forensic science at La Trobe University in 2002. In 2003, she
started her employment within the Biology division of the Victoria Police Forensic
Services Centre at Macleod, Victoria. In 2008, she completed a postgraduate course
entitled Biostatistics for Forensic DNA Profile Interpretation offered by the University
of Washington.
Dr Henry Roberts is a forensic scientist in the Victoria Police Forensic Services
Department (VPFSD). He gained his Bachelor of Arts degree in Biology with Chemistry at the
University of York (United Kingdom) in 1969. He completed a Doctor of Philosophy
degree at the University of Oxford in 1971, working in protein chemistry. He has 30
years' experience in the areas of forensic biology and forensic chemistry. From 1988 to
2000 he was head of the VPFSD DNA analysis laboratory. His current position is
leader of the DNA Interpretation and Statistics Unit. He is a member of the Australasian
Scientific Working Group Forensic DNA Statistics (STATSWG). He is author or
co-author of 12 papers in the scientific literature on the subjects of biochemistry and
the use of DNA profiling in forensic science.
Ms Pollett and Dr Roberts may be contacted at:
DNA Interpretation and Statistics Unit, Biological Examination Branch, Victoria Police
Forensic Services Department, 31 Forensic Drive, MACLEOD VIC 3085 AUSTRALIA
Telephone: 61 (03) 9450 3444 Fax: 61 (03) 9450 3601
Aimee.pollett@police.vic.gov.au; Henry.Roberts@police.vic.gov.au
COPYRIGHT AND INFRINGEMENT NOTICE
All rights reserved under Australian and International Copyright Conventions. No part
of this work covered by Copyright may be used, reproduced or copied in any form or by
any means (graphic, electronic or mechanical, including photocopying, recording,
record taping, or information retrieval systems) without the written permission of the
Victoria Police Force. Copyright in this work has worldwide protection, and any
unauthorised use, reproduction or copy of this work may be an infringement of
copyright which the Victoria Police Force is entitled to prevent.


TABLE OF CONTENTS

INTRODUCTION [80A.10]
  Why do we need statistics to interpret DNA profiles? [80A.10]
    What does a DNA profile tell us? [80A.20]
    What kind of information does a DNA profile NOT provide? [80A.30]
  Laboratory error [80A.100]
  Match probability [80A.200]
    Can we ever be certain that the suspect is the source of the DNA at the crime scene? [80A.210]
    If alleles are not unique, are whole profiles unique? [80A.220]
    Is DNA typing different from other comparative forensic techniques in this respect? [80A.230]
  Approaches to solving the match probability problem [80A.300]
    What assumptions are made? [80A.310]
    Which population? [80A.320]
    Drawing conclusions from a database [80A.330]
PROBABILITY THEORY [80A.1000]
  Probability [80A.1000]
    Estimating allele frequency [80A.1110]
  Laws of Probability [80A.1210]
    First law of probability [80A.1210]
    Second law of probability [80A.1220]
    Joint probabilities [80A.1230]
    Conditional probabilities - Third Law of Probability [80A.1240]
  Likelihood Ratios [80A.1300]
  Genotype frequency and match probability [80A.1400]
FALLACIES AND FANTASIES [80A.2100]
  The danger of misusing statistics [80A.2100]
    Prosecutor's fallacy [80A.2110]
    Defence attorney's fallacy [80A.2120]
    The meaning of frequencies [80A.2130]
    Database searches [80A.2140]
    Uniqueness and individualisation [80A.2150]
  Verbal scales [80A.2200]
DATABASES [80A.3000]
  Sample selection [80A.3100]
  Making estimates from population samples [80A.3200]
  Sampling uncertainty [80A.3300]
    Confidence limits [80A.3310]
    The factor of 10 rule [80A.3320]
    Bootstrap [80A.3330]
    Bayesian support interval or size bias correction [80A.3340]
    Highest posterior density [80A.3350]
    Comparison of methods to estimate sampling effects [80A.3360]
MODELLING THE POPULATION [80A.4100]
  A simple model [80A.4100]
    Testing the model [80A.4110]
    Chi-square test [80A.4120]
    Exact tests [80A.4130]
    Conclusions from testing [80A.4140]
  Subpopulation theory [80A.4200]
    Modelling subpopulations [80A.4210]
    The sampling formula [80A.4220]
    Heterozygote probability [80A.4230]
    Homozygote probability [80A.4240]
    Subpopulation theory and linkage between loci [80A.4250]
    What value of θ to use [80A.4260]
  Nonconcordances [80A.4300]
    Considering drop-out in an assumed single-source profile [80A.4310]
MORE COMPLEX PROBABILITIES [80A.5100]
  Mixtures [80A.5100]
    Identifying the presence of a mixture [80A.5110]
    Characteristics of single-source DNA profiles [80A.5120]
    Procedure for the interpretation of mixtures [80A.5130]
    Simple mixed stain example [80A.5140]
    Random Man Not Excluded [80A.5150]
    Application of subpopulation theory to mixtures [80A.5160]
    Complex mixtures [80A.5170]
    Implementing guidelines for mixture interpretation [80A.5180]
    Resolving 2-person mixtures [80A.5190]
    Unresolvable two-person mixtures [80A.5200]
    Low-level profiles with the possibility of dropout [80A.5210]
    Mixtures of DNA from more than two people [80A.5220]
  Paternity calculations [80A.5300]
    Paternity trio [80A.5310]
    Exclusions [80A.5320]
  Missing persons [80A.5400]
    Family tree [80A.5410]
    Comparison of DNA profiles [80A.5420]
  Relatives [80A.5500]


Abbreviations
df      degrees of freedom
DNA     deoxyribonucleic acid
FST     Wright's fixation index
IBD     identical by descent
LR      likelihood ratio
NAFIS   National Automated Fingerprint Identification System
NRC     National Research Council
PCR     polymerase chain reaction
POI     person of interest
RFLP    restriction fragment length polymorphism
RFU     relative fluorescence unit
RMNE    random man not excluded
STR     short tandem repeat
θ       co-ancestry coefficient

Glossary
allele one of two or more alternative forms of a gene or DNA sequence at a genetic locus,
which may differ between the two chromosomes of a pair
allele frequency the proportion of all alleles observed at a locus, among the profiles of
individuals within a particular database, that are of a particular type (the number of
occurrences of that allele divided by the total number of alleles counted at that locus)
Bayes' theorem a mathematical formula used for calculating conditional probabilities. It
figures prominently in Bayesian approaches to statistics
co-ancestry coefficient see FST
concordance an allele in an evidentiary sample that matches a corresponding allele in a
person of interest's profile
confidence interval an interval, calculated from sample data, which is expected to include
the unknown true value of a particular parameter a specified proportion of the time
constrained model a model that utilises peak height information and/or mixture proportion
rules to exclude those genotype combinations within a mixed DNA profile that do not meet
acceptable thresholds
cumulative density function (more usually called the cumulative distribution function) a
function giving the probability that a variable takes a value less than or equal to a given
value. For a continuous variable this is the area under the curve of the probability density
function up to that value
database a list of DNA profiles obtained from a collection of individuals in a group or
population
drop-out a phenomenon where an allele may not be detected due to low levels of template
DNA in a sample
ethnic group a group of people whose members have common ancestral origin
explicable non-concordance the absence of a person of interest's allele in an evidentiary
profile that can be explained by known phenomena such as drop-out or somatic mutation
leading to extreme peak height imbalance. This type of non-concordance leads to
non-exclusions
FST or θ more or less interchangeable terms that describe the relatedness of individuals
within a population
genetic drift the tendency for the genetic makeup of a population to change with time
owing to the random nature of inheritance of alleles, and the consequent finite probability of
some alleles becoming rare or even extinct in the population simply because they failed by
chance to be passed from one generation to another
genotype characterisation of an individual's alleles at a particular site on their DNA
Hardy-Weinberg Equilibrium a state in which the genotype proportions at a locus can be
predicted from the allele frequencies (p² for a homozygote, 2pq for a heterozygote) and
remain the same in successive generations in a randomly mating population
highest posterior density a statistical method that is used to account for the uncertainty
which arises as a result of using a sample of a population to make estimates about the whole
population. The method generates an interval which captures the most probable values of a
particular variable such as allele frequency.
inexplicable non-concordance the absence of a person of interest's allele in an evidentiary
profile that cannot be explained in terms of drop-out or the number of contributors proposed. In
other words, the absence of an allele when it would be expected to be present if the person of
interest were a contributor to the evidentiary sample. This type of non-concordance leads to
exclusions.
intron a portion of the gene not translated into protein; an intervening or non-coding
sequence
Likelihood Ratio the ratio of the probability of the evidence given one proposition (usually
the prosecution hypothesis) to the probability of the evidence given an alternative proposition
(usually the defence hypothesis). For single-source profiles the Likelihood Ratio is the inverse
of the match probability.
linkage equilibrium a state in which multilocus genotype proportions are the same in
successive generations in a population; where there is statistical independence between alleles
at different loci and where the genotype at one locus does not influence the probability of a
genotype at another
match/matches the situation where a person of interest's alleles are the same as those in the
evidentiary profile. This means that the person of interest is not excluded as being the source of
the DNA.
match probability the probability that a second person from some population possesses the
same single-source DNA profile
mean the mathematical average of a set of numbers
mixture proportion the relative proportions of DNA from the individual contributors to a
mixed DNA profile
multinomial distribution a statistical formula that gives the probability of the possible
results of an experiment with repeated trials in which each trial can result in one of a specified
number of outcomes greater than two, eg the counts of each face obtained when a die, which
can land on one of six possible values, is rolled repeatedly
mutually exclusive a statistical term used to describe two or more possible alternative
outcomes of which only one can occur: the occurrence of one event precludes the occurrence
of the others, so it is impossible for mutually exclusive events to occur at the same time.
non-coding sections of the DNA that are not translated into protein
non-concordance an allele in a person of interest's profile that is not present in an evidentiary
profile
normal distribution a symmetrical, bell-shaped statistical distribution. In a normal
distribution, the shape of the curve is completely described by the mean and the variance.
peak height ratio the ratio of the heights of the two peaks of a heterozygote (the height of the
smaller peak divided by that of the larger peak)
population genetics the study of the frequency of genes and alleles in various populations
probability density function a function describing the distribution of a continuous variable;
the area under its curve between two values gives the probability that the variable takes a
value within that range
probability interval an interval which is expected to include the unknown true value of a
particular parameter, with an associated probability
product rule a model that is used to evaluate the strength of a DNA match that involves
multiplying allele frequencies to obtain locus genotype frequencies, and multiplying these to
estimate the frequency of the whole profile; a statistical model in which the probability of a set
of characteristics is the product of the probabilities of the individual characteristics
racial group a population genetic term used to describe one of the four major racial
classifications of humans: Caucasian, Negroid, Mongoloid (east Asian) and Australoid
random man not excluded the chance that someone selected at random (random man)
could not be excluded as a contributor to a set of alleles observed in a mixture
relative fluorescence unit the unit of measurement of the fluorescence intensity (peak height) of an allele
relative frequency the number of times a particular outcome is observed (counts) divided
by the total number of trials
standard deviation a measure of the spread of a set of data about its mean. The more spread
apart the data, the higher the standard deviation. Mathematically, the standard deviation is the
square root of the variance.
stutter a phenomenon which occurs during the amplification process, generating a small peak
one repeat unit (four base pairs for tetranucleotide repeats) directly before or after a larger
allele peak
unconstrained model a model that considers all possible genotype combinations within a
mixed DNA profile
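Wahlund effect the reduction in heterozygosity (an increase in homozygotes and a decrease in
heterozygotes) observed when a sample is drawn from two or more populations that differ in
allele frequencies, for example because they were once isolated, and which do not undergo
random mating with each other, but is treated as a single population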
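
Several of the glossary entries above (Hardy-Weinberg Equilibrium, product rule, match probability, Likelihood Ratio and random man not excluded) describe simple calculations. The Python sketch below illustrates them. It is a minimal sketch under stated assumptions, not the method used in casework: the allele frequencies in the hypothetical `FREQS` table are invented, Hardy-Weinberg and linkage equilibrium are assumed, and no sampling uncertainty or subpopulation (θ) correction is applied. With θ = 0 the match probability for an unrelated person reduces to the profile frequency given by the product rule. The function names are illustrative only.

```python
from math import prod
from typing import Dict, Set, Tuple

# Hypothetical allele frequencies, invented for illustration only;
# real casework uses frequencies estimated from a population database.
FREQS: Dict[str, Dict[str, float]] = {
    "D3S1358": {"15": 0.25, "16": 0.25, "17": 0.20},
    "vWA": {"16": 0.20, "17": 0.26, "18": 0.22},
}


def genotype_frequency(p: float, q: float, homozygote: bool) -> float:
    """Hardy-Weinberg proportion: p^2 for a homozygote, 2pq for a heterozygote."""
    return p * p if homozygote else 2 * p * q


def match_probability(profile: Dict[str, Tuple[str, str]]) -> float:
    """Product rule: multiply the single-locus genotype frequencies.

    Assumes Hardy-Weinberg and linkage equilibrium and applies no theta
    correction, so the result equals the estimated profile frequency.
    """
    return prod(
        genotype_frequency(FREQS[locus][a], FREQS[locus][b], homozygote=(a == b))
        for locus, (a, b) in profile.items()
    )


def likelihood_ratio(profile: Dict[str, Tuple[str, str]]) -> float:
    """Single-source LR: a matching profile has probability 1 if the person of
    interest is the source and probability equal to the match probability if
    someone else is, so LR = 1 / match probability."""
    return 1.0 / match_probability(profile)


def rmne(mixture: Dict[str, Set[str]]) -> float:
    """Random man not excluded: at each locus a random person is not excluded
    if both of their alleles are among those seen in the mixture, which under
    Hardy-Weinberg has probability (sum of those allele frequencies)^2.
    The per-locus values are multiplied across loci."""
    return prod(
        sum(FREQS[locus][a] for a in alleles) ** 2
        for locus, alleles in mixture.items()
    )


if __name__ == "__main__":
    profile = {"D3S1358": ("15", "17"), "vWA": ("16", "16")}
    print(match_probability(profile))  # about 0.004 (0.1 x 0.04)
    print(likelihood_ratio(profile))  # about 250
    mixture = {"D3S1358": {"15", "16", "17"}, "vWA": {"16", "17", "18"}}
    print(rmne(mixture))  # about 0.2266 (0.49 x 0.4624)
```

For these invented frequencies, a D3S1358 15,17 heterozygote contributes 2 × 0.25 × 0.20 = 0.1 and a vWA 16,16 homozygote contributes 0.20² = 0.04, giving a product-rule estimate of 0.004 and a Likelihood Ratio of 250.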


INTRODUCTION
Why do we need statistics to interpret DNA profiles?
[80A.10] Let us suppose that a DNA profile has been obtained from a biological sample
found at a crime scene that is believed to have been left by the perpetrator of the crime. This
profile is then compared with profiles from one or more reference samples from individuals
who are considered to be possible sources of this material.

[80A.20] What does a DNA profile tell us?


1. We can eliminate people whose DNA profile characteristics (alleles) are not present
when we would expect to find them.
2. Conversely, a person whose DNA profile matches the profile of the crime scene
sample is not excluded as a source of the biological material in question.
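
The exclusion logic in points 1 and 2 can be sketched in Python. This is a minimal illustration under stated assumptions, not a laboratory procedure: both profiles are assumed to be complete single-source profiles with no drop-out, each profile is a hypothetical dictionary mapping STR locus names to the set of allele designations observed, and the function name `compare_single_source` and all allele values are invented for illustration.

```python
from typing import Dict, FrozenSet

# Hypothetical representation: STR locus name -> set of allele designations.
# A homozygote shows a single designation, e.g. frozenset({"16"}).
Profile = Dict[str, FrozenSet[str]]


def compare_single_source(reference: Profile, crime_scene: Profile) -> str:
    """Classify a reference profile against a single-source crime scene profile.

    Assumes both profiles are complete single-source profiles with no
    drop-out, so at every locus typed in both samples the two allele sets
    are expected to be identical if the reference person is the source.
    """
    shared_loci = reference.keys() & crime_scene.keys()
    if not shared_loci:
        # Nothing to compare, so no conclusion can be drawn.
        return "inconclusive"
    for locus in sorted(shared_loci):
        if reference[locus] != crime_scene[locus]:
            # An allele is absent from one profile where it would be expected
            # to be present: the reference person is eliminated.
            return "excluded"
    # Agreement at every compared locus means only that this person is not
    # excluded; it does not show that nobody else shares the profile.
    return "not excluded"


if __name__ == "__main__":
    crime_scene = {
        "D3S1358": frozenset({"15", "17"}),
        "vWA": frozenset({"16"}),
        "FGA": frozenset({"21", "24"}),
    }
    suspect_a = dict(crime_scene)
    suspect_b = {**crime_scene, "D3S1358": frozenset({"15", "18"})}
    print(compare_single_source(suspect_a, crime_scene))  # not excluded
    print(compare_single_source(suspect_b, crime_scene))  # excluded
```

A result of "not excluded" corresponds to point 2: it is a non-exclusion, not an identification, and its weight depends on how many other people would also not be excluded.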

[80A.30] What kind of information does a DNA profile NOT provide?
1. It does not identify the suspect as the source of crime scene material whose profile
matches his, because we cannot be sure that no-one else has the same set of matching
characteristics (alleles).
2. It tells us nothing about how or when the DNA came to be at the crime scene.
3. In particular it does not tell us who else could have been the source of the DNA.
There are several possible explanations for a match between two DNA profiles:
1. The samples come from the same person.
2. The crime scene sample comes from another individual whose profile matches by
chance.
3. The profile matches because it comes from a close relative.
4. A laboratory error occurred.
The second and third explanations are the main focus of this chapter. However, to put the
debate in context, it is first necessary to consider how the possibility of an error may be
handled.
