
Richard Carrier: Proving history or idiocy?

In preparation for the release of the sequel to Dr. Richard Carrier's Proving History, I thought it would be good to address the foundations of the sequel as presented in Proving History: namely, Dr. Carrier's arguments about Bayes' Theorem, or BT (as presented in his book, anyway). There are several reasons for this. First, Dr. Carrier argues that BT should be THE method for all historical research. Second, the entire point of Proving History was to argue that the approach we will find in the sequel is the best one. The third and most important reason, however, is that Dr. Carrier doesn't understand BT, how it differs from the method he actually advocates (Bayesian inference/statistics), or how fundamentally inaccurate basically every claim he makes about his methodology is.
We can begin our journey through Dr. Carrier's failure to understand his own method by looking at his flawed proof. At one point, Dr. Carrier states "we can conclude here and now that Bayes' Theorem models and describes all valid historical methods. No other method is needed" (p. 106). His proof, though, stands or falls with the first proposition in it: "BT is a logically proven theorem" (ibid.). This is true. However, Dr. Carrier doesn't seem to have read the sources he cites. For example, on p. 50 Dr. Carrier refers the reader via an endnote (no. 9) to several highly commendable texts on BT. The one he states gives a "complete proof of formal validity" of BT is Papoulis, A. (1986). Probability, Random Variables, and Stochastic Processes (2nd ed.). I don't have the 2nd edition, but I do have the 3rd, and as this proof is trivial I could really use any intro probability textbook. Papoulis begins his "complete proof of formal validity" (as opposed to a proof of informal validity? or an incomplete proof?) by defining a set and a probability function for which the axioms of probability hold. A key axiom is that the probabilities of all possible outcomes must sum (or integrate) to 1 (simplistically, for those who haven't taken any calculus, integration is a kind of summation). For example, imagine an individual named Anna is drawing cards from a pack. Let's imagine that
It wasnt the Jack of Diamonds
Nor the Joker she drew at first
It wasnt the King or the Queen of Hearts
But the Ace of Spades reversed
The probability of drawing the card she did is 1/52. This is true for the other 51 cards as well. The probability that she would draw some card that was in the deck is 52/52, or 1. This is intuitive and obvious, but the important point is that it also follows from the fact that there are 52 cards and the probability of drawing any one of them is 1/52; hence the probability of drawing a card is given by the sum of the probabilities of drawing each individual card, or 1/52 summed 52 times. In Dr. Carrier's appendix (p. 284) he notes that probability functions must sum to 1. What he apparently doesn't understand is what this entails: in order to use BT to evaluate how probable some outcome, result, historical event, etc., is, one must consider every single possible outcome.
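In symbols, for the deck example (a standard statement of this axiom, not a formula from the book):

$$
P(\text{some card in the deck is drawn}) \;=\; \sum_{i=1}^{52} P(\text{card}_i) \;=\; 52 \times \tfrac{1}{52} \;=\; 1.
$$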
Let me make this concrete with a simple example. I can calculate the probability that, given a full deck of cards, a random draw will yield an ace, because I know in advance every possible outcome. If, however, someone mixed together 10 cards drawn at random from each of 300 different decks and asked me to pick a card, I can't calculate the probability anymore. Even if I were told that the new deck contained 3,000 cards, I have no idea how many are aces. Incidentally, this is the perfect situation for Bayesian statistical inference, which works (simplistically) by assuming, e.g., a certain distribution of aces and then updating my model of how likely it is that the next draw will yield an ace as I learn more about the distribution of cards in the entire 3,000-card deck.
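To make that concrete, here is a minimal sketch of that kind of updating, assuming a Beta prior on the unknown proportion of aces and an entirely hypothetical set of observed draws (nothing here comes from Carrier's book):

```python
# Bayesian updating of the unknown proportion of aces in the mixed 3,000-card deck.
# The prior and the observed draws below are hypothetical, chosen only for illustration.
from scipy import stats

prior_a, prior_b = 1.0, 12.0        # prior guess: roughly 1 card in 13 is an ace
aces_seen, others_seen = 2, 48      # hypothetical result of 50 draws (with replacement)

# Conjugate update: Beta prior + binomial data -> Beta posterior
posterior = stats.beta(prior_a + aces_seen, prior_b + others_seen)
print("Posterior mean for P(next draw is an ace):", round(posterior.mean(), 3))
```

Each new batch of draws would be folded in the same way, which is the iterative "learning" that distinguishes Bayesian inference from a one-shot application of a theorem.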
Dr. Carrier wishes to use what he thinks BT is to evaluate the probability that particular events occurred ~2,000 years ago. For example, on pp. 40-42 he considers the possibility that Jesus was a legendary rabbi in terms of the class of legendary rabbis and the information we have on such a class. We are in a far worse position than in the mixed card deck example above, because we don't even know the number of legendary rabbis, still less who they might be (if we did, we'd have the answer: Jesus would either be one or wouldn't be).
There is another basic property of BT Dr. Carrier seems to have missed. As Papoulis
clearly states, BT is only valid for events/outcomes that are mutually exclusive. Often, both of these requirements (the sum to 1 and mutual exclusivity) are given together: the set of outcomes must be collectively exhaustive and mutually exclusive. In other words, BT is only valid if
1) all possible outcomes are known
2) one and only one outcome can occur.
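For reference, here is the standard textbook form of the theorem under exactly these two conditions, with hypotheses $H_1, \dots, H_n$ that are mutually exclusive and collectively exhaustive (the usual statement, not Carrier's notation):

$$
P(H_i \mid E) \;=\; \frac{P(E \mid H_i)\,P(H_i)}{\sum_{j=1}^{n} P(E \mid H_j)\,P(H_j)}, \qquad \text{with } \sum_{j=1}^{n} P(H_j) = 1.
$$

The denominator is the law of total probability, which is exactly where the requirement that all possible outcomes be known enters.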
This makes BT useless for most purposes, including historiography. However, Dr. Carrier isn't really using BT. As his references (as well as his formulation of the theorem) show, he is actually using something called Bayesian inference/Bayesian analysis. This, however, negates his entire proof, because it no longer matters whether BT is "a logically proven theorem": there is no "complete proof of formal validity" for some Bayesian inference/analysis theorem Dr. Carrier could use in place of his first proposition.
OK, so we can't use BT, but that doesn't mean we can't use Bayesian methods. However, in order to use Bayesian methods Dr. Carrier would have to understand Bayesian statistics (and statistics in general). He doesn't. We can see this clearly when Dr. Carrier, whose expertise is ancient history, addresses the frequentist vs. Bayesian debate. To keep things simple, let's just say that this is an ongoing debate, arguably going back to Bayes himself, and definitely over a century old. Dr. Carrier is apparently so confident in his mathematical acuity that he resolves the dispute in a few pages, with almost no reference to the math or the literature: "The whole debate between frequentists and Bayesians, therefore, has merely been about what a probability is a frequency of, the rules are the same for either" (p. 266). Hm. Amazing that generations of the best statistical minds missed this. Oh wait. They didn't.
Let's look at how Carrier describes the dispute: "The debate between the so-called frequentists and Bayesians can be summarized thus: frequentists describe probabilities as a measure of the frequency of occurrence of particular kinds of event within a given set of events, while Bayesians often describe probabilities as measuring degrees of belief or uncertainty" (p. 265). This is laughably wrong:
Frequentist statistical procedures are mainly distinguished by two related features; (i)
they regard the information provided by the data x as the sole quantifiable form of
relevant probabilistic information and (ii) they use, as a basis for both the construction
and the assessment of statistical procedures, long-run frequency behaviour under
hypothetical repetition of similar circumstances.
Bernardo, J. M. & Smith, A. F. (1994). Bayesian Theory. Wiley.
Undoubtedly, the most critical and most criticized point of Bayesian analysis deals with
the choice of the prior distribution, since, once this prior distribution is known,
inference can be led in an almost mechanic way by minimizing posterior losses,
computing higher posterior density regions, or integrating out parameters to find the
predictive distribution. The prior distribution is the key to Bayesian inference and its
determination is therefore the most important step in drawing this inference. To some
extent, it is also the most difficult. Indeed, in practice, it seldom occurs that the available
prior information is precise enough to lead to an exact determination of the prior
distribution, in the sense that many probability distributions are compatible with this
information… Most often, it is then necessary to make a (partly) arbitrary choice of the
prior distribution, which can drastically alter the subsequent inference.
Robert, C. P. (2001). The Bayesian Choice: From Decision-Theoretic Foundations to
Computational Implementation (Springer Texts in Statistics). (2nd Ed.). Springer.
The "frequency" part of "frequentist" does have to do with kinds of events, but frequencies are the measure of probability, not the reverse. To illustrate, consider the bell curve (the graph of the normal distribution). It's a probability distribution. Now imagine a standardized test like the SAT, which is designed so that scores will be normally distributed and have this bell-curve graph. What does the graph tell us? It tells us that most people who take the test get scores close to the average, but, infrequently, some test-takers get very high scores and others very low ones. In other words, the bell curve is the graph of a probability function (technically, a probability density function, or pdf), and it is formed by the frequencies of particular scores. We know that it is very improbable for a person's score to fall in either of the ends/tails of the bell curve because these are very infrequent outcomes.
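To see that direction of explanation (frequencies of outcomes giving us the probability), here is a small sketch using invented test-score parameters (mean 500, standard deviation 100, purely illustrative): the relative frequency of simulated scores landing in the upper tail matches the probability the normal model assigns to that tail.

```python
# Frequencies of simulated test scores vs. the tail probability of the normal model.
# The mean and standard deviation are invented for illustration, not real test parameters.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
scores = rng.normal(loc=500, scale=100, size=100_000)   # simulated test-takers

tail_frequency = np.mean(scores > 700)                  # how often scores exceed 700
tail_probability = stats.norm(500, 100).sf(700)         # P(score > 700) under the model
print(tail_frequency, tail_probability)                 # both come out near 0.023
```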
What does this mean for frequentist methods? Well, Kaplan, The Princeton Review, and other test-prep companies try to show their methods work by using this normal distribution. They claim that the scores of people who take their classes aren't distributed the way the population's scores are, because students taking their classes obtain above-average scores too frequently. They use the frequency of higher-than-average scores to argue that their classes must improve scores.
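A rough sketch of that frequentist logic, with invented numbers: take a small sample of class-takers' scores and ask how surprising their mean would be if the class made no difference (a one-sample t-test against the population mean; none of these figures are real):

```python
# Frequentist check: is the class-takers' sample mean surprisingly high if the class
# actually has no effect? All numbers are invented for illustration.
import numpy as np
from scipy import stats

population_mean = 500
class_scores = np.array([540, 610, 575, 520, 590, 630, 555, 585])   # hypothetical sample

# One-sample t-test: a small p-value means this sample mean would be an infrequent
# outcome under the "no effect" hypothesis, which frequentists read as evidence of an effect.
t_stat, p_value = stats.ttest_1samp(class_scores, population_mean)
print(t_stat, p_value)
```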
What's key is that the data are obtained and analyzed, but the distribution is only used to determine whether the values the analysis yielded are statistically significant. Bayesian inference reverses this, and the reversal creates fundamental differences. The process starts with a probability distribution: the prior distributions represent uncertainty and make predictions about the data that will be obtained. Once new data are obtained, the model is adjusted to better fit them. This is usually done many, many times, as more and more information is tested against an increasingly accurate model. The key differences are
1) the iterative process
2) the use of models which make predictions
3) the use of distributions to represent unknowns and (in part) the way the model will
learn or adapt given new input.
So why don't we find any of this in Dr. Carrier's description of Bayesian methods? Why do we always find ad hoc descriptions of priors? Because Dr. Carrier wants to use Bayesian analysis but apparently doesn't understand what priors actually are or how complicated they can be even in simple models:
In many situations, however, the selection of the prior distribution is quite delicate in
the absence of reliable prior information, and generic solutions must be chosen instead.
Since the choice of the prior distribution has a considerable influence on the resulting
inference, this choice must be conducted with the utmost care.
Marin, J. M., & Robert, C. (2007). Bayesian Core: A Practical Approach to
Computational Bayesian Statistics. (Springer Texts in Statistics). Springer.
While the axiomatic development of Bayesian inference may appear to provide a solid
foundation on which to build a theory of inference, it is not without its problems.
Suppose, for example, a stubborn and ill-informed Bayesian puts a prior on a population
proportion p that is clearly terrible (to all but the Bayesian himself). The Bayesian will
be acting perfectly logically (under squared error loss) by proposing his posterior mean,
based on a modest size sample, as the appropriate estimate of p. This is no doubt the
greatest worry that the frequentist (as well as the world at large) would have about
Bayesian inference that the use of a bad prior will lead to poor posterior inference.
This concern is perfectly justifiable and is a fact of life with which Bayesians must
contend… We have discussed other issues, such as the occasional inadmissibility of the traditional or favored frequentist method and the fact that frequentist methods don't
have any real, compelling logical foundation. We have noted that the specification of a
prior distribution, be it through introspection or elicitation, is a difficult and imprecise
process, especially in multiparameter problems, and in any statistical problem, suffers
from the potential of yielding poor inferences as a result of poor prior modeling.
Samaniego, F. J. (2010). A Comparison of the Bayesian and Frequentist Approaches to
Estimation. (Springer Texts in Statistics). Springer.
The "stubborn and ill-informed Bayesian" is in a much better position than Dr. Carrier. Dr. Carrier has confused BT with Bayesian analysis and the Bayesian approach with the frequentist one, all because he apparently hasn't understood any of these. Instead of prior distributions, his priors are best guesses. Instead of real belief functions, we find "here's what I believe." No consideration is given to the nature of the data (categorical, nominal, and non-numerical data in general require specific models and tests, Bayesian or not).
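The practical force of the quoted warnings about priors can be shown in a few lines: with the same small, hypothetical data set, two different (and individually defensible-looking) priors yield noticeably different posteriors, which is exactly the choice that "best guesses" never model:

```python
# Same hypothetical data, two different priors: the posterior (and so the inference) shifts.
from scipy import stats

successes, failures = 3, 7   # invented small data set

for name, (a, b) in {"flat prior": (1, 1), "skeptical prior": (1, 9)}.items():
    posterior = stats.beta(a + successes, b + failures)
    print(name, "-> posterior mean:", round(posterior.mean(), 3))
# flat prior -> about 0.33, skeptical prior -> about 0.20
```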
So instead of the universally valid historical method Dr. Carrier argues BT provides, all that he's actually done is butcher mathematics in order to plug values into a formula, in a way that is about as mathematical as numerology but apparently seems impressive if you have no clue what you are talking about. Perhaps that's why Dr. Carrier's CV indicates he's been lecturing on Bayes' Theorem since 2003, yet his 2008 dissertation contains no reference to it.
