Journal of Geochemical Exploration, 3(1974)129--149

Elsevier Scientific Publishing Company, Amsterdam. Printed in The Netherlands

Department of Geological Sciences, University of British Columbia, Vancouver, B.C.
(Accepted for publication November 7, 1973)
Sinclair, A.J., 1974. Selection of threshold values in geochemical data using probability
graphs. J. Geochem. Explor., 3: 129--149.
A method of choosing threshold values between anomalous and background geochemical data, based on partitioning a cumulative probability plot of the data, is described. The procedure is somewhat arbitrary but provides a fundamental grouping of data values. Several practical examples of real data sets that range in complexity from a single population to four populations are discussed in detail to illustrate the procedure.
The method is not restricted to the choice of thresholds between anomalous and background populations but is much more general in nature. It can be applied to any polymodal distribution that contains enough values and whose component populations have appropriate density distributions. As a rule, such distributions for geochemical data closely approach a lognormal model. Two examples of the more general application of the method are described.
Tennant and White (1959) were among the first to recognize the usefulness of probability graph paper for concise visual representation of geochemical data. Since the appearance of their publication, probability paper has been used somewhat spasmodically, but with increasing regularity, for graphical representation and analysis of many types of geochemical data. In particular, Williams (1967) and Lepeltier (1969) have emphasized the ease with which such plots can be used for rapid graphical analysis of large quantities of data. Bolviken (1971) states that probability graphs are now used routinely by the Norwegian Geological Survey as an aid in interpreting geochemical analytical results. Woodsworth (1972) makes extensive use of probability plots as the basis for a thorough statistical analysis of about 2000 reconnaissance stream sediment analyses from an exploration program in central British Columbia. Numerous other examples could be cited. None of these papers, however, treats in detail the problem of useful and efficient selection of threshold values.
Threshold is a term used throughout the mineral exploration industry to signify a specific value that effectively separates high and low data values of fundamentally different character that reflect different causes. Commonly, the term is applied to a value that distinguishes an upper or anomalous data set from a lower or background set. For many types of data, particularly those of a geochemical nature, anomalous values are related to mineralized rock. Consequently, the choice of a threshold value has considerable importance in directing exploration to specific anomalous sample sites where the chances of discovery of an economic mineral deposit are greatly enhanced.
Thresholds in geochemical data are chosen in a variety of ways. A method recommended in several publications involves estimating the mean and standard deviation of a data set and arbitrarily choosing a threshold at the value corresponding to the mean plus two standard deviations (see Hawkes and Webb, 1962; Lepeltier, 1969). In some cases this procedure might be adequate, but it ignores the fact that no a priori reason exists for exactly the upper 2½% of every data set being anomalous. Furthermore, the method does not adequately take into account the fact that anomalous and background populations have fairly extensive ranges of overlap in some cases; and because the data comprise two populations, the mean and standard deviation derived from the whole data set have no real statistical validity and are just numbers. These failings are recognized by many field practitioners, who rely on subjective visual examination of histograms of data sets to choose threshold values.
A third approach is to define thresholds at points of maximum curvature in cumulative probability plots (e.g., Woodsworth, 1972). The procedure entails approximating segments of a probability curve by straight lines and picking threshold values at ordinate levels that correspond to intersections of these "linear" segments. At best this method is approximate; at worst it can result in a high proportion of anomalous values going unrecognized.
Obviously, a procedure is desirable for choosing threshold values that maximizes the likelihood of recognition of anomalous values and minimizes the number of background values included with anomalous data. Cumulative probability plots provide an effective graphical means of meeting these ends.
Arithmetic probability paper is a special kind of commercially available graph paper, generally designed with an arithmetic ordinate scale and an unusual abscissa scale of probability (or cumulative frequency percent) arranged such that a normal (gaussian) cumulative distribution plots as a straight line. Lognormal probability paper differs only in that the ordinate scale is logarithmic. Arithmetic values of a single lognormal distribution, grouped in exactly the same manner as required for the construction of a cumulative histogram, plot as a straight line on log probability paper. A bimodal distribution consisting of two lognormal populations plots as a curve. Examples of a single lognormal distribution and of bimodal lognormal distributions are shown in Fig. 1. In these examples, and throughout the remainder of this paper, values are cumulated for plotting by starting at the upper or high-value end (cf. Lepeltier, 1969). The probability scale is taken as the abscissa because most commercially available probability paper in North America is arranged in this manner.

Fig. 1. Examples of unimodal and bimodal real distributions plotted on logarithmic probability paper.
There are numerous advantages to probability plots that are worth noting:
(1) The form of the density distribution of a data set can be examined.
(2) Parameters of normal and lognormal populations can be estimated rapidly and with adequate accuracy for most sets of geochemical data.
(3) Several data sets can be represented on a single graph with much greater clarity than with multiple histograms.
(4) Plots of several data sets can be compared visually for rapid recognition of similarities or differences.
Additional advantages resulting from the ability to partition polymodal distributions into their individual populations will become apparent in examples presented later. Of course, these plots also have limitations that must be recognized: (1) data might not have normal or lognormal distributions; (2) construction of a probability graph normally requires a minimum of about 100 values, although techniques are available for dealing with fewer data (see Koch and Link, 1970); (3) scatter of data on a probability plot can be too great to permit a confident analysis of the data. Despite these limitations, a high proportion of geochemical data sets can be analysed usefully and confidently on probability graph paper.
Partitioning refers to methods used to extract individual populations from a polymodal distribution consisting of a combination of two or more populations. The methods are not well described in the literature but are referred to, or implied, by various writers (e.g., Harding, 1949; Bolviken, 1971). Cassie (1954) and Williams (1967) describe partitioning procedures briefly, but their publications are not widely available. Consider the case of a bimodal distribution: provided that the populations in the data set have normal or lognormal density distributions and are plotted on the appropriate probability paper, an estimate of their proportions is given by an inflection point, or change in direction of curvature, on the probability curve (Harding, 1949). For example, in Fig. 2 an inflection point at the 20 cumulative percentile, indicated by an arrow, shows the presence of 20% of a higher population A and 80% of a lower population B. The form of the curve is characteristic of two overlapping populations, the relatively gently sloping central segment indicating considerable overlap of the two.
Fig. 2. Two idealized hypothetical populations A and B are combined in the proportions A/B = 20/80 to produce the intermediate curved distribution drawn through calculated points shown as solid dots. An inflection point is shown by the arrowhead. Arbitrary thresholds at the 1% level of the B population and the 99% level of the A population correspond to 78 and 44 ppm, respectively. (Ordinate: ppm, logarithmic; abscissa: probability, cumulative %.)
The uppermost plotted point on the curve, at the 180-ppm ordinate level, represents 1% of the total data. However, it also represents (1/20 × 100) = 5 cumulative percent of population A, because at this extremity of the data set there is no effective contribution from population B. Consequently, a point on population A is defined at 5 cumulative percent on the 180-ppm level. In the same manner, the point plotted on the curve at the 150-ppm ordinate level represents (2.6/20 × 100) = 13 cumulative percent of population A, and a second point on population A is obtained. This procedure is repeated until sufficient points are obtained to define population A by a straight line, or until the replotted points begin to depart from a linear pattern, indicating that population B is present in significant amounts. When sufficient points are obtained, a line is drawn through them as an estimate of population A. Population B can be obtained in precisely the same way, provided the probability scale is read as complementary values; e.g., 90 cumulative percent is read as (100 - 90) = 10 cumulative percent. Calculated points for both A and B populations are shown as open circles in Fig. 2.
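The arithmetic of this partitioning step is simple enough to write down. The Python sketch below (function names are hypothetical) converts a point read from the mixture curve into a point on population A near the high-value end, where B contributes nothing, and into a point on population B near the low-value end, where all of A lies above the ordinate level and the complementary percentage is scaled by the proportion of B. It reproduces the 5 and 13 cumulative percent values quoted above for Fig. 2. Both conversions are valid only over the part of the curve where the other population is negligible, which is why the procedure stops when replotted points depart from a straight line.

```python
def upper_population_percent(p_mix, f_a):
    """Cumulative percent (from the high end) of upper population A.

    Valid only near the high-value end of the curve, where population B
    makes no effective contribution, so P_M = f_A * P_A.
    """
    return p_mix / f_a


def lower_population_percent(p_mix, f_b):
    """Cumulative percent (from the high end) of lower population B.

    Valid only near the low-value end, where all of A lies above the
    ordinate level. The complement 100 - P_M then comes entirely from B,
    so (100 - P_M) = f_B * (100 - P_B).
    """
    return 100.0 - (100.0 - p_mix) / f_b


# Fig. 2 example: A and B are mixed in the proportions 20/80.
print(upper_population_percent(1.0, 0.20))   # point at 180 ppm: about 5 % of A
print(upper_population_percent(2.6, 0.20))   # point at 150 ppm: about 13 % of A
print(lower_population_percent(90.0, 0.80))  # 90 % read as 10 %, scaled by 0.8: about 87.5 % of B
```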
Validity of the two-population model can be checked by combining the two populations in the proportions 20% A and 80% B at various ordinate levels. In this hypothetical example, check points are not shown because the example has been constructed ideally. Throughout the remainder of the paper, however, check calculations are indicated by open triangles. The checking procedure involves the calculation of ideal combinations of the partitioned populations at various ordinate levels using the relationship

$$P_M = f_A P_A + f_B P_B$$

where $P_M$, the probability of the "mixture", is to be calculated (see Bolviken, 1971); $P_A$ and $P_B$ are cumulative probabilities of populations A and B read from the graph at a specified ordinate level; $f_A$ is the proportion of population A; and $f_B = 1 - f_A$ is the proportion of population B. In practice, several trials might be necessary to obtain a good fit of the ideal mixture with the real data because of the difficulty of defining the inflection point accurately. In most cases the partitioning procedure is as straightforward as outlined. In other cases a slight modification is necessary when dealing with real data, as will become apparent in some of the examples that follow.
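A check calculation of this kind can be sketched in Python, assuming each partitioned population is lognormal and is summarized by its straight line on the graph, i.e. by the values read at the 50, 16 and 84 cumulative percentiles (written b, b + sL and b - sL in the tables later in the paper). The helper names are hypothetical. With the Fig. 2 parameters given in the next paragraphs, A = 100 (144, 71) and B = 42 (55, 33), the ideal mixture at 180 ppm comes out close to the 1 cumulative percent of the uppermost plotted point, and close to the 2.6 percent used at 150 ppm.

```python
import math
from statistics import NormalDist


def population_from_graph(b, b_plus_s, b_minus_s):
    """Lognormal population, as a normal distribution of log10 values, from a straight
    line read at the 50 (b), 16 (b + sL) and 84 (b - sL) cumulative percentiles."""
    mean_log = math.log10(b)
    sd_log = (math.log10(b_plus_s) - math.log10(b_minus_s)) / 2.0
    return NormalDist(mean_log, sd_log)


def percent_above(population, ppm):
    """Cumulative percent of a population, cumulated from the high end, at one ordinate level."""
    return 100.0 * (1.0 - population.cdf(math.log10(ppm)))


def mixture_percent(populations, fractions, ppm):
    """Ideal mixture P_M = sum of f_i * P_i at one ordinate level."""
    return sum(f * percent_above(pop, ppm) for pop, f in zip(populations, fractions))


pop_a = population_from_graph(100, 144, 71)   # upper population A of Fig. 2
pop_b = population_from_graph(42, 55, 33)     # lower population B of Fig. 2
fractions = (0.20, 0.80)

for level in (180, 150, 78, 44):
    print(level, round(mixture_percent((pop_a, pop_b), fractions, level), 1))
```

Plotting these mixture percentages against the original data curve is the numerical counterpart of the open-triangle check points.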
Partitioning of polymodal curves containing three or more populations is somewhat more complex but is done in an analogous way, proceeding in stages. Generally, partitioning begins with the populations represented by the extremities of the probability curve, followed by partitioning of more centrally located populations.
Note that in this idealized example, parameters of the individual partitioned populations can be estimated. The geometric mean of each can be read at the 50 percentile, and the range including 68% of the values can be determined at the 84 and 16 cumulative percentiles. This range, encompassing 2 standard deviations, is asymmetric about the geometric mean on the arithmetic scale. The method of representation adopted here is to quote the geometric mean, followed in brackets by the range that includes 68% of the values. These parameters for the partitioned populations A and B are 100 (144, 71) and 42 (55, 33), respectively. Estimates of the arithmetic mean and variance can be obtained from this information as described by Krumbein and Graybill (1965), but normally are not required.
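For readers who do need them, the conversion can be sketched with the standard lognormal moment formulas. Assuming the log10 values are normal with standard deviation σ, and writing σ_e = σ ln 10 for the corresponding natural-log standard deviation, the arithmetic mean is b·exp(σ_e²/2) and the arithmetic variance is mean²·(exp(σ_e²) - 1). The Python below applies these formulas to the Fig. 2 estimates; it is a sketch of the standard relations, not a transcription of Krumbein and Graybill's presentation, and the function name is hypothetical.

```python
import math


def arithmetic_moments(b, b_plus_s, b_minus_s):
    """Arithmetic mean and variance of a lognormal population from graph readings.

    b is the geometric mean (50 percentile); b_plus_s and b_minus_s bound the
    central 68 % of values (16 and 84 cumulative percentiles).
    """
    sd_log10 = (math.log10(b_plus_s) - math.log10(b_minus_s)) / 2.0
    sd_ln = sd_log10 * math.log(10.0)          # convert to natural-log units
    mean = b * math.exp(sd_ln ** 2 / 2.0)
    variance = mean ** 2 * (math.exp(sd_ln ** 2) - 1.0)
    return mean, variance


mean_a, var_a = arithmetic_moments(100, 144, 71)   # population A of Fig. 2
mean_b, var_b = arithmetic_moments(42, 55, 33)     # population B of Fig. 2
print(round(mean_a, 1), round(math.sqrt(var_a), 1))
print(round(mean_b, 1), round(math.sqrt(var_b), 1))
```

As expected for lognormal data, each arithmetic mean is somewhat larger than the corresponding geometric mean.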
The hypothetical example in Fig. 2 illustrates a common general situation of high and low populations with an effective range of overlap. If no significant overlap of values existed, the central moderately steep segment of the curve would be nearly vertical, and a single threshold could be chosen rapidly at its midpoint. In the general case, however, choice of thresholds is more complex. Consider 2 thresholds chosen arbitrarily at the 99 and 1 cumulative percentiles of the partitioned populations A and B of Fig. 2, respectively (recall that A and B are present in the ratio A/B = 20/80). These percentiles divide the data into 3 groups at the 44- and 78-ppm ordinate levels. Sixteen percent of the total data lie above the upper threshold of 78 ppm. In a hypothetical sample of 100 values, this upper group would consist of approximately 15 values from population A and 1 value from population B. The lower group, below 44 ppm, contains 46% of the total data. It consists of 1% of population A (at most 1 value in this case) and 57% of population B (about 46 values). The intermediate group between the two thresholds contains about 38% of the total data, consisting of 42% of the B population and 23% of the A population. In our hypothetical sample this corresponds to about 4 or 5 A values and 33 or 34 B values (Table I).
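Table I. Populations A and B in the three groups defined by thresholds at 78 and 44 ppm (Fig. 2)

                      Total data     A population     B population
                      %     No.*     %      No.*      %      No.*
Group I (> 78 ppm)    16    16       76     15.2      1      0.8
Group II (44-78 ppm)  38    38       23      4.6      42     33.6
Group III (< 44 ppm)  46    46       1       0.2      57     45.6
Total                 100   100      100    20.0      100    80.0
*Sample = 100 values, of which 20 are population A and 80 are population B.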
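Table I can be approximated directly from the straight-line estimates of A and B. The Python sketch below assumes, as the text does, that both populations are lognormal with the Fig. 2 parameters 100 (144, 71) and 42 (55, 33), and integrates each population between the thresholds of 78 and 44 ppm; the helper names are hypothetical. Because the parameters are read from a graph, the results agree with the rounded percentages of Table I only to within a percent or so.

```python
import math
from statistics import NormalDist


def population_from_graph(b, b_plus_s, b_minus_s):
    """Normal distribution of log10 values from the 50, 16 and 84 cumulative percentiles."""
    return NormalDist(math.log10(b), (math.log10(b_plus_s) - math.log10(b_minus_s)) / 2.0)


def group_fractions(population, upper_ppm, lower_ppm):
    """Fractions of one population above, between and below two thresholds."""
    below_upper = population.cdf(math.log10(upper_ppm))
    below_lower = population.cdf(math.log10(lower_ppm))
    return {
        "I (above upper)": 1.0 - below_upper,
        "II (between)": below_upper - below_lower,
        "III (below lower)": below_lower,
    }


pop_a = population_from_graph(100, 144, 71)
pop_b = population_from_graph(42, 55, 33)
f_a, f_b, n = 0.20, 0.80, 100          # hypothetical sample of 100 values, as in Table I

groups_a = group_fractions(pop_a, 78, 44)
groups_b = group_fractions(pop_b, 78, 44)
for group in groups_a:
    count_a = n * f_a * groups_a[group]
    count_b = n * f_b * groups_b[group]
    print(f"{group}: A {100 * groups_a[group]:.0f}% ({count_a:.1f}), "
          f"B {100 * groups_b[group]:.0f}% ({count_b:.1f}), total {count_a + count_b:.1f}")
```

Changing the two threshold arguments shows directly how a different choice of percentiles (for example 98 and 2) redistributes the two populations among the groups.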
The procedure, although arbitrary, has thus divided the data rather effectively into three groups, two of which contain significant proportions of the upper A population and a third that almost exclusively represents the lower B population. Let us assume for the moment that A and B represent anomalous and background populations, respectively. The upper group, above the upper threshold, can be considered top priority for follow-up examination because practically all of its values are anomalous. Lower priority can be attached to values in the intermediate group because, although it contains virtually all remaining anomalous values, more exploration manpower per anomalous sample is required to check them and sort them out from background values in the same range.
There is nothing sacrosanct about the percentiles used to define thresholds. In this case, values were chosen that corresponded to the 99 and 1 cumulative percentiles of the A and B populations, respectively. Thresholds could equally well have been defined by the 98 and 2 cumulative percentiles of the appropriate partitioned populations. Whatever choice is made, it is possible to determine estimates of the proportions of each population occurring in the groups thus delimited. In the writer's experience, the two sets of figures mentioned above have proved most useful, but different values could be chosen depending on the nature of the data and the required probability that all anomalous values be retained in the upper two groups.
Note that in this hypothetical but typical case, the choice of a threshold at the mean plus two standard deviations would have placed most of the anomalous values with background. The same effect would be obtained with a common variation of this procedure, the assumption that the upper 2½% of values are anomalous. Were the probability curve approximated by three linear segments, their intersections would have provided thresholds at approximately 103 and 55 ppm. The common procedure of adopting the upper value as threshold would result in rejection of more than 50% of the anomalous values. Even the choice of the lower value would result in rejection of about 5% of anomalous values.
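The claim about the mean plus two standard deviations can be checked numerically under an assumption the text does not state: that the mean and standard deviation are computed on log10 values, which is natural for lognormal data. For the Fig. 2 mixture these whole-data moments follow from the component means and variances (within-population plus between-population variance). The sketch, with hypothetical variable names, then reports what fraction of population A would fall below that threshold, and below the 103- and 55-ppm intersection thresholds mentioned above.

```python
import math
from statistics import NormalDist


def population_from_graph(b, b_plus_s, b_minus_s):
    """Normal distribution of log10 values from the 50, 16 and 84 cumulative percentiles."""
    return NormalDist(math.log10(b), (math.log10(b_plus_s) - math.log10(b_minus_s)) / 2.0)


pop_a = population_from_graph(100, 144, 71)   # assumed anomalous population, 20 %
pop_b = population_from_graph(42, 55, 33)     # assumed background population, 80 %
mix = [(0.20, pop_a), (0.80, pop_b)]

# Moments of the whole (mixed) data set on the log10 scale.
mean_all = sum(f * p.mean for f, p in mix)
var_all = sum(f * (p.variance + (p.mean - mean_all) ** 2) for f, p in mix)
threshold_2sd = 10 ** (mean_all + 2.0 * math.sqrt(var_all))


def fraction_of_a_missed(threshold_ppm):
    """Fraction of population A lying below a threshold (i.e. classed as background)."""
    return pop_a.cdf(math.log10(threshold_ppm))


print(f"mean + 2 sd threshold: {threshold_2sd:.0f} ppm")
for t in (threshold_2sd, 103, 55):
    print(f"{t:6.0f} ppm: {100 * fraction_of_a_missed(t):.0f}% of A missed")
```

Under this assumption the whole-data threshold lands well inside population A, so most anomalous values fall below it, consistent with the argument above.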
Fig. 3 is a probability graph of 173 zinc analyses of B-horizon soils taken on a grid pattern in an area of known Mo-Cu mineralization near Tchentlo Lake in central British Columbia. Underlying rock is a texturally and mineralogically uniform, well-jointed diorite. Joints are mineralized, principally with quartz and pyrite, but in some places molybdenite is abundant and small amounts of chalcopyrite occur. A thin layer of overburden covers the area except for sporadic outcrop knolls.
Fig. 3. Probability plot of 173 values of Zn in B-zone soils, Tchentlo Lake, B.C. (N = 173). Listed parameters of the distribution (b = 87, b + sL = 140, b - sL = 55 ppm Zn) were obtained from the straight line drawn through the original data points (solid dots). 95% confidence limits are shown after Lepeltier (1969).
The probability plot is linear if one neglects slight divergences at the extremities, which commonly result from sampling error. Consequently, an estimate of the distribution can be obtained by a straight line through the plotted points. 95% confidence limits of the population were determined graphically (cf. Lepeltier, 1969). Woodsworth (1972) suggests that a useful procedure for recognizing significant curvature in a probability graph is to assume the presence of a single population and construct its 95% confidence belt. Significant curvature of the plot is assumed at points that plot outside the zone of 95% confidence. None of the plotted points for the Tchentlo Lake data lie outside the band defined by the 95% confidence limits, suggesting that only a single population is present.
In this case, the range of values and the form of the probability graph suggest that the data represent a single background population. A wise procedure, however, is to assume that the few highest values are anomalous until proven otherwise. This is a convenient safety precaution in cases where anomalous values are present in too low a proportion to define a second population. To standardize a procedure for dealing with such data, it is convenient to pick an arbitrary threshold at an ordinate level corresponding to the mean plus 2 standard deviations, as recommended by Hawkes and Webb (1962). This procedure assumes that approximately the upper 2½% of values are anomalous until shown otherwise, and should be applied only when a single population is indicated by examination of the probability graph. In this example, the upper 5 zinc values were found to plot sporadically on a plan of the grid, away from known mineralized areas.
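For a single lognormal population such as the Tchentlo Lake zinc data, the mean plus 2 standard deviations can be read straight off the fitted line. A minimal sketch, assuming the mean and standard deviation refer to log10 values and using the parameters listed for Fig. 3 (b = 87, b + sL = 140, b - sL = 55 ppm Zn); the function name is hypothetical. It also shows that the share of a single normal population above its mean plus 2 standard deviations is about 2.3%, which the text rounds to 2½%.

```python
import math
from statistics import NormalDist


def two_sd_threshold(b, b_plus_s, b_minus_s):
    """Threshold at the mean plus 2 standard deviations of log10 values, in ppm."""
    mean_log = math.log10(b)
    sd_log = (math.log10(b_plus_s) - math.log10(b_minus_s)) / 2.0
    return 10 ** (mean_log + 2.0 * sd_log)


threshold_zn = two_sd_threshold(87, 140, 55)   # Tchentlo Lake Zn, Fig. 3
upper_tail = 1.0 - NormalDist().cdf(2.0)      # share of one population above mean + 2 sd
print(f"threshold = {threshold_zn:.0f} ppm Zn, upper tail = {100 * upper_tail:.2f}%")
```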
Copper analyses for 158 stream sediment samples from the Mt. Nansen area, Yukon Territory, are shown as a probability plot in Fig. 4 (see Bianconi and Saagar, 1971). A smooth curve through the data points has the form of a bimodal density distribution with an inflection point at the 15 cumulative percentile. The curve was partitioned using the method described previously to obtain populations A and B, whose estimated parameters are given in Table II. The partitioning procedure was checked at various Cu ppm levels by combining the two partitioned populations in the proportion of 15% A and 85% B. Check points are shown as open triangles on the figure and are seen to coincide with the real data curve. In this case, some high values are associated with known Cu-Mo mineralization related to porphyritic intrusions, and it seems reasonable to interpret the two populations as anomalous (A) and background (B).
Two arbitrary threshold values can be determined readily from the graph at the 1.0 and 99 cumulative percentiles of the B and A populations, respectively. These percentiles coincide with 70 and 37 ppm Cu, respectively. Hence, the data are divided into 3 groups: an upper group of predominantly anomalous values, a lower group of predominantly background values, and an intermediate group containing both anomalous and background values. Of the 158 values, about 23 are anomalous and 135 are background. About 80%, or 18, of the anomalous values are above the 70-ppm threshold; the other 5 are below it and, for all practical purposes, in the intermediate range. Of the 135 background values, 91.5%, or 124 values, are below the lower threshold; the remaining 11 background values are above the lower threshold, in the intermediate range. Consequently, anomalous values occur in only two ppm intervals, to which priorities can be assigned for follow-up exploration. Virtually all values above 70 ppm are anomalous and have top priority. Second priority is assigned to the 16 values in the intermediate range, about 5 of which are anomalous.

Fig. 4. Bimodal probability plot of 158 Cu's in stream sediments, Mt. Nansen, Yukon. Open circles are partitioning points used to establish populations A and B. Open triangles are check points obtained by combining A and B in the ratio 15/85.

Table II. Estimated parameters of partitioned populations, Cu in stream sediments, Mt. Nansen area (Yukon Territory)

Population       Proportion  No. of    Values in ppm Cu
                 (%)         samples   b       b + sL   b - sL
A: anomalous     15          24        101     155      63
B: background    85          134       14.7    28.5     7.4
A + B            100         158
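The percentages quoted for Mt. Nansen can be recomputed from the Table II parameters under the same lognormal assumption. In this Python sketch (variable names hypothetical), the model gives close to 80% of population A above 70 ppm and close to 91.5% of population B below 37 ppm. The counts differ slightly from those in the text because Table II rounds the population sizes to 24 and 134, whereas the text works with about 23 and 135.

```python
import math
from statistics import NormalDist


def population_from_graph(b, b_plus_s, b_minus_s):
    """Normal distribution of log10 values from the 50, 16 and 84 cumulative percentiles."""
    return NormalDist(math.log10(b), (math.log10(b_plus_s) - math.log10(b_minus_s)) / 2.0)


anomalous = population_from_graph(101, 155, 63)      # population A, Table II
background = population_from_graph(14.7, 28.5, 7.4)  # population B, Table II
upper_t, lower_t = 70.0, 37.0                        # thresholds in ppm Cu
n_a, n_b = 24, 134                                   # sample counts from Table II

a_above_upper = 1.0 - anomalous.cdf(math.log10(upper_t))
b_below_lower = background.cdf(math.log10(lower_t))
a_between = anomalous.cdf(math.log10(upper_t)) - anomalous.cdf(math.log10(lower_t))
b_between = background.cdf(math.log10(upper_t)) - background.cdf(math.log10(lower_t))

print(f"A above {upper_t:.0f} ppm: {100 * a_above_upper:.0f}% (about {n_a * a_above_upper:.0f} values)")
print(f"B below {lower_t:.0f} ppm: {100 * b_below_lower:.1f}% (about {n_b * b_below_lower:.0f} values)")
print(f"intermediate group: about {n_a * a_between:.0f} A and {n_b * b_between:.0f} B values")
```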
Theoretically, individual values that lie between the two thresholds cannot be assigned to either the A or the B population. Therefore, since only about 1 in 3 values in this range is anomalous, about three times as much work is required to check each anomalous sample as is required for values above 70 ppm Cu; hence the reason for assigning priorities to the two groups. In practice, some of the anomalous values in this central range can be recognized with a fair degree of certainty. For example, a number of them might be expected to occur downstream from top-priority anomalous samples. This sort of geographic relationship stands out particularly well if samples are colour-coded by group on a plan of the sampled streams. In many cases, virtually all samples in the intermediate range can be identified in this manner with a fair degree of certainty. A comparable procedure can be used when dealing with soil or whole-rock analyses for which two thresholds are determined: intermediate-range samples that group geographically with known anomalous samples commonly can also be considered anomalous. In this way, follow-up examination of second-priority anomalies can be cut to a minimum and in many cases avoided completely.
Fig. 5 is a log probability graph of 166 Ni analyses of soils obtained from a grid superimposed on a known Cu-Ni mineralized zone. The mineral showing is associated with ultramafic rocks enclosed in regionally metamorphosed, fine-grained clastic sedimentary rocks near Hope in southern British Columbia. A smooth curve drawn through the data points has the form of at least three populations, based on inflection points at the 5.5 and 25 cumulative percentiles. The A and C populations were partitioned using the method described in a previous section. Population B was then estimated using the relationship

$$P_M = f_A P_A + f_B P_B + f_C P_C$$

In this equation, $f_A = 0.055$, $f_B = 0.195$ and $f_C = 0.75$, and $P_M$, $P_A$ and $P_C$ can be read from the graph for any ordinate level. Hence, $P_B$ is the only unknown and can be estimated for various ordinate levels and plotted, and an estimate of population B determined by passing a straight line through the calculated points. The three partitioned populations A, B and C were then combined ideally in the proportions 5.5/19.5/75 for a number of ordinate values to check the partitioning procedure. These check values are shown in Fig. 5 as open triangles that almost coincide with the smooth curve through the original data.
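Because $P_B$ is the only unknown, the estimation step is a one-line rearrangement, $P_B = (P_M - f_A P_A - f_C P_C)/f_B$. A minimal Python sketch of that rearrangement follows (the function names are hypothetical). The usage lines build a synthetic mixture from the Table III parameters so that the recovered value can be compared with the known one; with real data, $P_M$, $P_A$ and $P_C$ would instead be read from the graph, and a computed $P_B$ outside 0-100% would suggest that A or C has been poorly estimated.

```python
import math
from statistics import NormalDist


def solve_middle_population(p_mix, p_a, p_c, f_a, f_b, f_c):
    """Cumulative percent of the middle population B from P_M = f_A P_A + f_B P_B + f_C P_C."""
    return (p_mix - f_a * p_a - f_c * p_c) / f_b


def percent_above(b, b_plus_s, b_minus_s, ppm):
    """Cumulative percent (from the high end) of a lognormal population at one ppm level."""
    pop = NormalDist(math.log10(b), (math.log10(b_plus_s) - math.log10(b_minus_s)) / 2.0)
    return 100.0 * (1.0 - pop.cdf(math.log10(ppm)))


f_a, f_b, f_c = 0.055, 0.195, 0.75                  # proportions used for Fig. 5
level = 300.0                                       # an arbitrary ordinate level, ppm Ni
p_a = percent_above(1170, 1380, 980, level)
p_b_true = percent_above(356, 515, 248, level)
p_c = percent_above(52, 108, 24.5, level)
p_mix = f_a * p_a + f_b * p_b_true + f_c * p_c     # synthetic "reading" of the data curve

p_b = solve_middle_population(p_mix, p_a, p_c, f_a, f_b, f_c)
print(f"P_B at {level:.0f} ppm: {p_b:.1f}% (true value {p_b_true:.1f}%)")
```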
Population A is obviously not well defined, as indicated by the scatter of points about its linear estimator. The reason is that only a small proportion of the total data represents population A; thus its estimation by partitioning is based on very few data points (four in this case). Populations B and C appear well defined, principally because their ideal combination in the ratio 19.5/75.0 agrees with the real data curve. Estimated parameters of the three populations are given in Table III. On the basis of the partitioned populations, a single threshold at 780 ppm Ni can be chosen to distinguish effectively between populations A and B. Populations B and C overlap somewhat, and two thresholds must be chosen. These thresholds are arbitrarily taken at the 2 cumulative percentile of population C (i.e., 236 ppm Ni) and the 98 cumulative percentile of population B (i.e., 170 ppm Ni).

Fig. 5. Probability plot of 166 Ni's in soils, Hope, B.C., with 2 inflection points (indicated by arrowheads) suggesting it results from the combination of three lognormal populations in the ratio 5.5/19.5/75. A, B and C are the three partitioned populations estimated by lines through the calculated points (open circles). Parameters of each population are listed. Open triangles are check points that agree well with the original data (black dots).

Table III. Estimated parameters of partitioned populations, Ni in soils, Hope area (southern British Columbia)

Population       Proportion  No. of    Values in ppm Ni
                 (%)         samples   b       b + sL   b - sL
A: anomalous     5.5         9         1170    1380     980
B: background    19.5        32        356     515      248
C: background    75          125       52      108      24.5
A + B + C        100         166
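The two B/C thresholds quoted above can be recomputed from the Table III parameters. The Python sketch below (function name hypothetical) assumes lognormal populations and finds the level exceeded by 2% of population C and the level exceeded by 98% of population B with `NormalDist.inv_cdf`. It gives roughly 238 and 168 ppm Ni, close to the 236 and 170 ppm read from the graph; the small differences reflect reading the tabulated values from a plot.

```python
import math
from statistics import NormalDist


def level_at_cumulative_percent(b, b_plus_s, b_minus_s, cum_percent):
    """ppm level at a cumulative percentile counted from the high-value end.

    A cumulative percentile of 2 means 2 % of the population lies above the level.
    """
    pop = NormalDist(math.log10(b), (math.log10(b_plus_s) - math.log10(b_minus_s)) / 2.0)
    return 10 ** pop.inv_cdf(1.0 - cum_percent / 100.0)


upper_bc = level_at_cumulative_percent(52, 108, 24.5, 2)    # 2 cumulative percentile of C
lower_bc = level_at_cumulative_percent(356, 515, 248, 98)   # 98 cumulative percentile of B
print(f"B/C thresholds: {lower_bc:.0f} and {upper_bc:.0f} ppm Ni")
```

Substituting 1 and 99, or 2.5 and 97.5, for the percentile arguments gives the alternative threshold pairs discussed below.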
These three threshold values divide the data into 4 groups, 3 of which each consist principally of a single population and a fourth containing values from two populations (Table IV). The thresholds can now be used as contour values on a plan of the grid, or to code data on a plan using colours or symbols, to aid in interpreting the significance of each population. In this case, population A is related to Ni mineralization and is therefore interpreted as an anomalous population. Population B corresponds to areas underlain by ultramafic rocks, and population C occurs in areas underlain by metasedimentary rocks.

Table IV. Estimated thresholds, Ni in soils, Hope area (southern British Columbia)

Range (ppm Ni)   Principal content of group
> 780            almost exclusively population A
236-780          almost exclusively population B
170-236          combination of populations B and C
< 170            almost exclusively population C
The choice of thresholds is arbitrary. For example, one could equally well have chosen the two thresholds for the B and C populations at the 1 and 99 cumulative percentiles of the C and B populations, respectively, or at the 2.5 and 97.5 cumulative percentiles, and so on. A choice should be made with the idea of defining a short range of overlap of the two populations while, at the same time, producing adjacent ranges that to all intents and purposes contain values of a single population, with negligible or minor amounts of other populations.
A probability plot of 795 soil copper analyses is shown in Fig. 6. The sinuous character of the plot is probably real because of the large number of values in the data set. This type of data is characteristic of the sort obtained from reconnaissance surveys, where large quantities of information are obtained in a relatively short time. The area sampled is underlain predominantly by acid to intermediate intrusive bodies that cut a thick, monotonous sequence of volcanic rocks.

Inflection points are evident at approximately the 1, 2 and 32 cumulative percentiles, indicating the presence of at least four populations. These populations can be estimated by partitioning the curve in stages. In this case, it is most convenient to begin with population C, for which most data points are available. Once C has been defined, population D can be estimated using C and the original data curve. These two populations can be specified reasonably well. The upper two populations, A and B, can be approximated roughly but cannot be delineated with much precision because of the small percentage of total data that each represents, and hence the small number of points available for partitioning. Crude estimates of populations A and B are shown, based on the limited data available.

Fig. 6. Probability plot of 795 Cu's in B-horizon soils, Smithers area, B.C. Symbols are as defined for Fig. 5.
A number of check points, shown as open triangles on the curve, were calculated for the partitioned populations A, B, C and D combined in the ratio 1/1/30/68. These points agree almost perfectly with the smooth curve describing the data, suggesting that the partitioning represents a plausible model for the data. Estimated parameters of the partitioned populations are listed in Table V. Comparison of the data with a geological map of the sampled area suggested that populations C and D represent background Cu in soils over volcanic and plutonic rocks, respectively. By the same means, it was concluded that populations A and B are anomalous populations in areas underlain by volcanic and plutonic rocks, respectively.
In choosing thresholds for distinction between anomalous and background values, there is no need to consider either population A or D. The critical part of the graph is the range of overlap of populations B and C.
We know that about 2% of the data, or about 16 values, are anomalous. Of these, 11 are above 100 ppm Cu, as is 1 value of the C population. Hence, only one of the 12 values above 100 ppm Cu is not anomalous, and 100 ppm Cu can be chosen as an arbitrary upper threshold.
Table V. Estimated parameters of partitioned populations, Cu in soils, Smithers area (central British Columbia)

Population       Proportion  No. of    Values in ppm Cu
                 (%)         samples   b       b + sL   b - sL
A                1           8         135     145      128
B                1           8         100     108      93
C                30          239       42.8    57.2     32.1
D                68          540       14.8    21.8     9.6
A + B + C + D    100         795
Virtually all of the anomalous population is above 85 ppm Cu. Thus, the range 85-100 ppm Cu contains the remaining 5 anomalous values. This range also contains about 1.0% of the C background population, about 2 values. Thus, two thresholds are delimited that for all practical purposes define all anomalous values with a minimum of background values represented.
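The counting argument above can be cross-checked against the partitioned populations by converting each population's share of a concentration interval into an expected number of samples. This is a rough sketch, again assuming lognormal populations with the Table V parameters and a log standard deviation estimated from b+s and b-s; the helper names are hypothetical. Model counts will not match the observed counts quoted above exactly, because the partitioned populations only approximate the data.

```python
from math import log10
from statistics import NormalDist

N_TOTAL = 795  # number of soil samples (Table V)

# Table V parameters (ppm Cu): name -> (proportion, b, b+s, b-s).
POPS = {
    "A": (0.01, 135.0, 145.0, 128.0),
    "B": (0.01, 100.0, 108.0, 93.0),
    "C": (0.30, 42.8, 57.2, 32.1),
}


def log_dist(b, b_plus_s, b_minus_s):
    """Normal distribution of log10(Cu); sd assumed = half of log10((b+s)/(b-s))."""
    return NormalDist(log10(b), (log10(b_plus_s) - log10(b_minus_s)) / 2.0)


def expected_count(name, lo_ppm, hi_ppm=None):
    """Expected number of samples from one population with lo_ppm < Cu <= hi_ppm.

    hi_ppm=None means the interval has no upper limit."""
    f, b, bp, bm = POPS[name]
    d = log_dist(b, bp, bm)
    p_hi = 1.0 if hi_ppm is None else d.cdf(log10(hi_ppm))
    return N_TOTAL * f * (p_hi - d.cdf(log10(lo_ppm)))


anomalous_above_100 = expected_count("A", 100) + expected_count("B", 100)
background_above_100 = expected_count("C", 100)
anomalous_85_100 = expected_count("A", 85, 100) + expected_count("B", 85, 100)
background_85_100 = expected_count("C", 85, 100)

for label, value in [
    ("anomalous > 100 ppm", anomalous_above_100),
    ("background (C) > 100 ppm", background_above_100),
    ("anomalous 85-100 ppm", anomalous_85_100),
    ("background (C) 85-100 ppm", background_85_100),
]:
    print(f"{label:28s} {value:6.1f}")
```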
This example illustrates several important points in procedure:
(1) It is wise to carry through with a complete partitioning procedure in examining complex distributions in order to check the realism of the interpretation.
(2) Even when individual populations cannot be defined particularly accurately, thresholds can commonly be determined with adequate accuracy.
(3) Inflection points in a probability curve based on abundant data are probably real and should form a basis for interpretation.
(4) An alternative approach would have been to group the data into two subclasses based on presence of underlying volcanic or plutonic rock. This procedure was not used here only because adequate thresholds could be obtained without spending additional manpower in carrying out a more detailed analysis.
(5) The bottom population, D, is reasonably well known despite the fact that its partitioning was based on only two points.
The foregoing examples show that the major advantage of probability plots is to provide a useful grouping of data. Commonly, this grouping is not simply for the purpose of obtaining thresholds between anomalous and background populations, but more generally to derive thresholds between populations that aid in a general interpretation of the significance of the data.
pH measurements are commonly an integral part of stream sediment surveys. A probability plot of pH values from one such survey in southern British Columbia is shown in Fig.7. The plot is on arithmetic probability paper, a logarithmic transform being incorporated in the original data because of the very nature of pH values. A smooth curve through the data has the form of a trimodal distribution with inflection points at the 16 and 85 cumulative percentiles. The curve has been partitioned using the method described previously to obtain populations A, B and C. Check points based on ideal mixtures of the three populations in the proportion 16/69/15 agree remarkably well with the real data curve.
[Fig.7 plot: pH against PROBABILITY (cum. %); partitioned populations A:B:C = 16:69:15.]
Fig.7. Probability plot of pH values obtained from a stream sediment survey in southern British Columbia. Symbols are as defined for Fig.5.
Thresholds arbitrarily chosen at the 99 cumulative percentiles of the A and B populations, and at the 1 cumulative percentiles of the B and C populations, provide the information in Table VI.
TABLE VI
Estimated thresholds, pH values (southern British Columbia)
pH group                     % of total data
principally population A     15
populations A + B            4.5
principally population B     64.5
principally population C     16
Thus, the data can be divided into four groups on the basis of pH measurements prior to further analysis and interpretation. Such a grouping could have fundamental significance in interpretation of trace element data because of the effect of pH on metal dispersion.
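Because the pH populations are treated as normal on arithmetic paper, each threshold is simply a percentile of one population, and the share of the data between successive thresholds follows from the mixture. The sketch below is a hedged illustration: the population means and standard deviations are hypothetical values chosen only to make the example run, not the parameters behind Table VI. It also assumes, as an interpretation, that a "cumulative percentile" counts values above a level. The proportions 16/69/15 are those quoted in the text.

```python
from statistics import NormalDist

# Hypothetical parameters for three normally distributed pH populations,
# for illustration only: name -> (proportion, mean pH, sd).
POPS = {
    "A": (0.16, 7.20, 0.10),
    "B": (0.69, 6.69, 0.15),
    "C": (0.15, 5.88, 0.21),
}


def exceeded_by(name, pct):
    """pH level exceeded by `pct` per cent of population `name`.

    Assumes a cumulative percentile counts values above a level, so the
    99 percentile lies near the bottom of a population and the 1 percentile
    near its top."""
    _, mean, sd = POPS[name]
    return NormalDist(mean, sd).inv_cdf(1.0 - pct / 100.0)


def mixture_cdf(x):
    """Fraction of all data at or below pH x for the ideal mixture."""
    return sum(f * NormalDist(m, s).cdf(x) for f, m, s in POPS.values())


thresholds = sorted([
    exceeded_by("A", 99),  # bottom of A
    exceeded_by("B", 99),  # bottom of B
    exceeded_by("B", 1),   # top of B
    exceeded_by("C", 1),   # top of C
])

# Percentage of the total data between successive thresholds (low to high).
edges = [float("-inf")] + thresholds + [float("inf")]
shares = [100.0 * (mixture_cdf(hi) - mixture_cdf(lo)) for lo, hi in zip(edges, edges[1:])]

for lo, hi, share in zip(edges, edges[1:], shares):
    print(f"{lo:6.2f} to {hi:6.2f}: {share:5.1f} % of data")
```

With real partitioned parameters, intervals containing essentially one population correspond to the "principally" groups of Table VI, and intervals where two populations contribute correspond to mixed groups such as A + B.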
The Guichon batholith has long been known as an important Cu-rich pluton in central British Columbia, with several large porphyry-type deposits either producing or nearing production at the present time. An investigation of the whole rock Cu content of unmineralized samples scattered over the batholith was undertaken by Brabec and involved an analysis of the data using probability graphs (Brabec and White, 1971). A probability plot of the total data, some 330 analyses, could not be interpreted with confidence. However, when data were grouped on the basis of relative age and lithology and each such group plotted separately, a realistic interpretation became possible.
Fig.8 contains probability graphs of each of the three groups, replotted from data of Brabec and White (1971). The general similarity of shape of the three curves suggests that the grouping has fundamental significance. Each curve has the form of a bimodal distribution. In each case, however, the bottom part of the bimodal curve is partly missing due to the bar interval chosen for construction of the probability plots (15 ppm Cu). Assuming that all distributions are lognormal, it is possible to partition each curve using a modification of the procedure described earlier. The upper population can be determined in the normal manner. Points on the lower population are then calculated using the expression:
$$P_M = f_A P_A + f_B P_B$$
where $P_M$ is read from the data curve, $f_A$ and $f_B$ are known from the position of the inflection point, and $P_A$ is read from the partitioned population A. $P_B$ is the only unknown and can be calculated and plotted for various ordinate levels. A line can then be passed through these calculated points to estimate population B.
[Fig.8 plot: whole rock Cu against PROBABILITY (cum. %) for Groups I, II and III.]
Fig.8. Probability plots of whole rock Cu's for 3 rock groups of the Guichon batholith, central British Columbia. Group I = youngest age, Group II = intermediate age and Group III = oldest age (after Brabec and White, 1971).
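Rearranging the mixture relation gives $P_B = (P_M - f_A P_A)/f_B$ at each ordinate level. The sketch below implements that rearrangement and checks it by round-tripping a synthetic two-population mixture. The function name `solve_pb` is hypothetical. The synthetic populations borrow the group II entries of Table VII, with the log standard deviation assumed to be half of log10((b+s)/(b-s)).

```python
from math import log10
from statistics import NormalDist


def solve_pb(p_m, p_a, f_a):
    """Solve P_M = f_A*P_A + f_B*P_B for P_B, taking f_B = 1 - f_A.

    p_m and p_a are cumulative probabilities (fractions 0-1) read at the
    same ordinate level from the data curve and from population A."""
    f_b = 1.0 - f_a
    if not 0.0 < f_b <= 1.0:
        raise ValueError("f_A must lie in [0, 1)")
    return (p_m - f_a * p_a) / f_b


# Synthetic mixture patterned on group II of Table VII (log10 ppm Cu);
# the standard deviations are an assumption (half the b+s to b-s log range).
pop_a = NormalDist(log10(69.0), (log10(139.0) - log10(34.5)) / 2)
pop_b = NormalDist(log10(10.9), (log10(20.0) - log10(5.9)) / 2)
F_A = 0.80

for cu in (5, 10, 20, 40):
    x = log10(cu)
    p_a = 1 - pop_a.cdf(x)                   # fraction of A exceeding cu
    p_b_true = 1 - pop_b.cdf(x)              # fraction of B exceeding cu
    p_m = F_A * p_a + (1 - F_A) * p_b_true   # what the data curve would show
    print(cu, round(solve_pb(p_m, p_a, F_A), 4), round(p_b_true, 4))
```

The relation is linear, so it holds whether cumulative probabilities are counted from the top or from the bottom of the curve. With real data, $P_M$ comes from a smoothed curve. Calculated $P_B$ values outside the range 0-1, which typically appear near the tails, therefore indicate readings too imprecise to use.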
One example is described in detail. The probability plot for group II rocks is reproduced in Fig.9. Some difficulties were encountered in specifying an inflection point precisely, because the two populations overlap to a considerable extent. However, a series of trial values was used until the upper population plotted as a straight line, leading to an inflection being assigned at the 80 cumulative percentile. One additional problem with the data is a flattening at the upper end of the curve. In fact, this flattening is present to some extent in plots for each of the 3 groups and is a characteristic pattern obtained when a symmetric population has been top-truncated. Brabec and White (1971) arbitrarily rejected a small proportion of high values from their analysis, thereby imposing this artificial top truncation on their data. Since the truncated values account for only about 2% of the data, no effort was made to correct for their absence. The upper extremities of all curves, however, were ignored during the partitioning.
[Fig.9 plot: whole rock Cu against PROBABILITY (cum. %); N = 116.]
Fig.9. Probability plot of 116 whole rock Cu's in Group II rocks (intermediate age) of the
Guichon batholith, central British Columbia, showing partitioned populations and their
parameters. Symbols are those defined for Fig.5.
Once the upper population A is defined, population B can be estimated using the relationship
$$P_M = f_A P_A + f_B P_B$$
as described earlier. Check points of ideal mixtures of partitioned populations A and B, shown as open triangles in Fig.9, coincide with the real data curve except at the upper truncated end. Parameters of the partitioned populations for each of the 3 groups are given in Table VII.
TABLE VII
Estimated parameters, whole rock Cu, Guichon batholith (central British Columbia)
Lithologic   Population   Proportion   No. of     Values in ppm Cu
group                     (%)          samples    b       b+s     b-s
I            A            60           56         98      142     68
             B            40           39         26.7    46.4    15.2
             A + B        100          95
II           A            80           93         69      139     34.5
             B            20           23         10.9    20      5.9
             A + B        100          116
III          A            40           28         54      85      34.5
             B            60           91         10.3    20.2    5.1
             A + B        100          119
For group II data, thresholds can be chosen arbitrarily as the 98 and 2 cumulative percentiles of populations A and B, respectively. These percentiles correspond to 16.5 and 39 ppm Cu, respectively, and divide the data into 3 groups. An upper group above 39 ppm Cu consists of 63% of the total data and is essentially only A population. A lower group below 16.5 ppm Cu consists of about 16% of the data and for all practical purposes contains only B population. The remaining 21% of the data, in the range between the two thresholds, is a mixture of A and B populations. In this case, considerable overlap exists between the two populations. Nevertheless, it is possible to identify the population to which most of the individual values belong, and this grouping could aid considerably in interpretation of the significance of each population.
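The group II thresholds can be approximated numerically from the Table VII parameters. The sketch assumes lognormal populations, a log standard deviation of half log10((b+s)/(b-s)), and cumulative percentiles counted as percentages of values exceeding a level; the helper names are hypothetical. Under these assumptions, the computed thresholds and group shares land close to, but not exactly on, the graphically read values (16.5 and 39 ppm Cu; 63%, 21% and 16%).

```python
from math import log10
from statistics import NormalDist

# Group II parameters from Table VII (ppm Cu): (proportion, b, b+s, b-s).
POP_A = (0.80, 69.0, 139.0, 34.5)
POP_B = (0.20, 10.9, 20.0, 5.9)


def log_dist(b, b_plus_s, b_minus_s):
    """Normal distribution of log10(Cu); sd assumed = half of log10((b+s)/(b-s))."""
    return NormalDist(log10(b), (log10(b_plus_s) - log10(b_minus_s)) / 2.0)


def exceeded_by(pop, pct):
    """Cu level (ppm) exceeded by `pct` per cent of a lognormal population."""
    _, b, bp, bm = pop
    return 10 ** log_dist(b, bp, bm).inv_cdf(1.0 - pct / 100.0)


def share(lo_ppm=None, hi_ppm=None):
    """Fraction of all data with lo_ppm < Cu <= hi_ppm (None = open end)."""
    total = 0.0
    for f, b, bp, bm in (POP_A, POP_B):
        d = log_dist(b, bp, bm)
        p_lo = 0.0 if lo_ppm is None else d.cdf(log10(lo_ppm))
        p_hi = 1.0 if hi_ppm is None else d.cdf(log10(hi_ppm))
        total += f * (p_hi - p_lo)
    return total


lower = exceeded_by(POP_A, 98)  # 98 cumulative percentile of A
upper = exceeded_by(POP_B, 2)   # 2 cumulative percentile of B

print(f"thresholds: {lower:.1f} and {upper:.1f} ppm Cu")
print(f"above upper: {100 * share(lo_ppm=upper):.0f} %")
print(f"between:     {100 * share(lower, upper):.0f} %")
print(f"below lower: {100 * share(hi_ppm=lower):.0f} %")
```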
Thus far, an implicit assumption in the procedure for estimating thresholds is that analytical values are known precisely. In practice, of course, recorded values include a combined sampling and analytical error. Consequently, some values above the threshold actually belong below it, and vice versa. Normally this confusion affects only a small proportion of the data, but it becomes increasingly pronounced as precision deteriorates.
In some cases the confusion is minimal relative to the problem at hand and can be ignored. More generally, however, the sampling and analytical error should be taken into account in defining thresholds. A convenient procedure to achieve this end is to consider the threshold as a range of values centred about the single threshold obtained by assuming that values are perfectly known. The threshold range is a confidence belt based on the precision of the data. Average precision is normally adequate for defining such threshold ranges. Precision, however, does vary with the absolute amount of the variable being estimated (e.g., Bolviken and Sinding-Larsen, 1973), and this can be taken into account where adequate data are available. Such threshold ranges define narrow bands on contour maps.
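A minimal sketch of such a threshold range follows. It assumes normally distributed combined errors whose standard deviation is a fixed fraction of concentration, that is, constant relative precision at the threshold level. The function name and the 10% precision in the example are hypothetical.

```python
from statistics import NormalDist


def threshold_range(threshold, rel_sd, confidence=0.95):
    """Return (low, high) limits of a threshold range.

    Assumes the combined sampling and analytical error is normally
    distributed with a standard deviation of rel_sd * threshold."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie between 0 and 1")
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # two-sided multiplier
    half_width = z * rel_sd * threshold
    return threshold - half_width, threshold + half_width


# Hypothetical example: a 100 ppm Cu threshold and 10 % relative precision (1 sd).
low, high = threshold_range(100.0, 0.10)
print(f"threshold range: {low:.1f} to {high:.1f} ppm Cu")
```

Where precision is known to vary with concentration, `rel_sd` could be supplied from a calibration of precision against concentration level instead of a single average figure.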
This procedure increases the number of potentially anomalous samples and therefore involves additional time and money in checking such added samples. These efforts can be minimized by examining the geographic positions of the additional samples relative to known anomalous samples.
The method for choosing thresholds described here is a standardized technique applicable to the vast quantity of geochemical data. It can be used for any polymodal distribution if sufficient data of adequate quality are present so that partitioning is feasible. A grouping of the data values is obtained that can be invaluable in interpretation. For this reason, the method is more fundamental and potentially more useful than other methods in common use. In particular, the method outlined here stresses the concept that both background and anomalous values represent populations that in many cases overlap (see Bolviken, 1971).
The procedure is not restricted to the choice of thresholds between anomalous and background populations. It is much more general in nature, permitting grouping of many types of data with appropriate density distributions. In addition, probability graph analysis of data is simple, rapid and amenable to use in the field (see Lepeltier, 1969).
Examples used to illustrate the selection of thresholds give ample evidence of the general usefulness of probability plots in dealing with geochemical data. This is true even if three or four populations are represented in the data, although, in general, simpler interpretations result if data are first grouped on the basis of some fundamental physical or geological criterion.
(1) Geochemical analyses commonly approximate a lognormal density distribution sufficiently closely that the distributions can be represented usefully on lognormal probability paper.
(2) Provided a data set contains adequate values, normally a minimum of about 100, a polymodal cumulative probability plot can be partitioned to produce estimates of the individual populations that make up the total distribution.
(3) The partitioned populations can be used to define arbitrary but meaningful thresholds that divide the data into groups that have fundamental significance.
(4) In the special case of no effective overlap between anomalous and background populations, a single threshold can be defined. In the common simple case of two overlapping anomalous and background populations, two thresholds are obtained that divide the data into three groups: an upper group of predominantly anomalous values, a central group of anomalous and background values, and a third group of background values.
(5) Polymodal distributions of geochemical data consisting of more than two populations can commonly be treated in the same way as bimodal distributions to obtain useful threshold values. In some cases, however, the procedure can be simplified by grouping data on the basis of some fundamental characteristic (e.g., pH, underlying rock type) to produce simpler probability plots that permit greater confidence in partitioning and interpretation.
(6) The method described for choosing thresholds is not confined to the distinction between anomalous and background values but has general application to any type of data, provided the individual populations approximate a lognormal (or normal) density distribution. Fortunately, this criterion is met by the bulk of geochemical data.
This paper is an outgrowth of a more extensive study of the use of probability paper in dealing with various kinds of data obtained from mineral exploration programs. The study is supported by a grant from the Department of Energy, Mines and Resources of Canada. Technical assistance through much of the study was given by Mr. A.C.L. Fox. Examples are drawn entirely from real problems encountered in industry and in university research projects. Appreciation is expressed to the numerous individuals and companies involved for permission to publish them. Dr. W.K. Fletcher offered constructive criticism of an earlier draft of the paper.
