
Some Problems of Statistics and Everyday Life

GEORGE E. P. BOX*

When Fred Leone, our Executive Director, was explaining to me what my presidential duties were, he told me that one of the "perks" associated with this job is that I get to give the annual address to the Association, and I have a captive audience for as long as they are prepared to sit there. Fred said to me, "George, don't give them anything too technical because this is a light occasion and there will be a lot of people that the statisticians have dragged along (husbands, wives, friends) who have had about all the statistics they can take." Well, imagine my disappointment. I had prepared a 200-page draft of my talk. It was called "The Present Status of the One-Armed Secretary Problem: A Decision-Theoretic Approach," and it made free use of σ-fields, Hilbert spaces, and all kinds of squiggly letters with dots on. This I reluctantly set aside. (I don't think any of you would have understood it anyway.) I have had to look for an alternative. I toyed for some time with the title, "Whither Statistics?," subtitled "Perhaps We Shouldn't Start from Here," but in the end abandoned that too. Eventually it struck me that many of the issues that we face as members of the American Statistical Association are really not very different from those we face as ordinary human beings. This is what my talk is about.

* George E. P. Box is R. A. Fisher Professor of Statistics, University of Wisconsin, Madison, WI 53706. This article is the text of the Presidential Address delivered at the 138th Annual Meeting of the American Statistical Association, August 15, 1978, in San Diego.

Journal of the American Statistical Association, March 1979, Volume 74, Number 365, Presidential Address

THE BEST IS OFTEN NOT VERY GOOD

Some of us have had a preoccupation with optimal or best procedures. But the best, of course, is not necessarily very good. For instance, to bring in the aspect of everyday life, if ever I had to decide between cutting my throat with a razor blade or with a rusty nail, I suppose I would choose the razor blade. But, although not strictly relevant to the problem as posed, one question that might cross my mind would be, "Have I considered all my options?"

A principle that is being given more attention these days is that of "robustification." Here one doesn't attempt to guarantee that things will be optimal over some tractable, but perhaps very narrow, set of circumstances. Instead one tries to ensure that they will be fairly good over a wide range of possibilities likely to happen in practice. Look at the human hand, for example. I doubt if there is any single thing that it does that could not be done better by some special instrument, but it is very good at doing a very large number of things that come up in facing the world as it actually is. Another way to say this is that there is really nothing wrong with optimization per se, but that we ought to try to optimize over that distribution of circumstances which the world really presents to us. The mistake is suboptimization: choosing the best over too narrow a set of alternatives. It is sometimes argued that by doing simplified exercises, we can at least obtain useful pointers. However, I feel that such pointers are very likely to indicate the wrong direction, as might be true in the case of the razor blade and the rusty nail.

WHAT IS THE REAL WORLD LIKE?

The difficulty in taking the wider robustification approach is that we cannot expect to get good results unless we are really prepared to engage in the hazardous undertaking of finding out what the world is really like. It requires us, as statisticians, to have some knowledge of reality. I believe we do have members, we may have ASA fellows, possibly we have even had ASA Presidents who really do not care what the world is really like. Some years ago a friend of mine told me about his daughter who was then at Oxford University. She was a very bright girl, but she got interested in politics (it was in the 1960s); she got behind in her studies, and the time of graduation was approaching. You may know that in the English system, there are many different grades of bachelor's degree. The young lady started to worry: was she going to get a "pass" degree (which is almost like the University spitting at you), or was it to be a third class, lower second class, upper second class, or a first class honours degree? She decided to ask her tutor about it. Finding him buried somewhere in the dust of one of the Oxford colleges, she eventually got around to asking him the delicate question, "Would it matter in the outside world if I didn't get a very good degree?" He looked very startled and said, "Outside world? What do I know about the outside world?"

When the statistician looks at the outside world, he cannot, for example, rely on finding errors that are independently and identically distributed in approximately normal distributions. In particular, most economic and business data are collected serially and can be expected, therefore, to be heavily serially dependent. So is much of the data collected from the automatic instruments which are becoming so common in laboratories these days. Analysis of such data, using procedures such as standard regression analysis which assume independence, can lead to gross error. Furthermore, the possibility of contamination of the error distribution by outliers is always present and has recently received much attention. More generally, real data sets, especially if they are long, usually show inhomogeneity in the mean, the variance, or both, and it is not always possible to randomize. To find out what the world is really like, we must spend more time looking for ourselves at real sets of data. For example, David Cox says that deviations from normality often occur in the direction of light-tailed distributions as well as heavy-tailed ones. Let us discover if he's right. If he is, it could seriously affect some proposed robust methods.

In order to better confront the realities of the outside world, some have urged us to abandon classical methods of estimation, such as those that employ likelihood and Bayes' Theorem, and resort to a new empiricism for each problem that arises, and for each author that writes about it. I think that notion is wrong-headed. The imperfection, of course, lies not with the estimation method, but with the model that we put into it. For example, it is true that the sample average, which is the maximum likelihood estimate of the mean on standard normal assumptions, could be a very poor estimate if we believed that the data were generated by a contaminated normal distribution. Nevertheless, if our model took account, not of what we did not believe, but rather of what we did believe, we could obtain excellent estimates by standard methods. The great advantage of the model-based over the ad hoc approach, it seems to me, is that at any given time we know what we are doing. Models, of course, are never true, but fortunately it is only necessary that they be useful. For this it is usually needful only that they not be grossly wrong.

I think rather simple modifications of our present models will prove adequate to take account of most realities of the outside world. The difficulties of computation which would have been a barrier in the past need not deter us now.

CHOOSING THE "BEST" DOESN'T MAKE SENSE WITH OPTIONS THAT ARE NOT ALTERNATIVES

Another difficulty with optima is a tendency to want to choose the best one of a set of items that are not really alternatives. The relevant question then is not, "Which is best?," but "Do these different entities have a role, and if so, what is it?" For example, it turns out that it is much better to have two sexes than one, and this not merely for hedonistic reasons. Again, if one mentions Bayesian analysis and sampling theory analysis in the same breath, one not only hears the question, "Which is best?," but also the question, "Which is right?," and religious passions are quickly aroused. Yet, to my mind, Bayes theory and sampling theory are not alternatives at all.

It is widely recognized that the advancement of learning does not proceed by conjecture alone, nor by observation alone, but by an iteration involving both. Certainly, scientific investigation proceeds by such iteration. Examination of empirical data inspires a tentative explanation which, when further exposed to reality, may lead to its modification. This modified explanation is again put in jeopardy by further exposure to reality, and so on, in a continued alternation between induction and deduction.

I am continually surprised that statisticians, even good ones, still seem to ignore this iterative aspect of investigation and talk as if the movement from an initial (perhaps ill-posed) question, to design, to data collection, to analysis of the data, to "the answer" were a one-shot affair. The wise investigator expends his effort not in one grand design (necessarily conceived at a time when he knows least about unfolding reality), but in a series of smaller designs, analyzing, modifying, and getting new ideas as he goes. This iterative aspect of research has a profound influence on almost everything the investigator and the statistician do, and it has been the source of much misunderstanding. Just as the rules that govern mathematical iteration are very different from those that govern solutions in closed form, so the rules that ought to apply to the statistics of most real scientific investigations are different, broader, and vaguer than those that might apply to a single decision or to a single test of hypothesis. Now, since scientific advance, to which all statisticians must accommodate, takes place by the alternation of two kinds of reasoning, we would expect also that two different kinds of inferential process would be required to put it into effect.

The first, used in estimating parameters from data conditional on the truth of some tentative model, is appropriately called Estimation. The second, used in checking whether, in the light of the data, any model of the kind proposed is plausible, has been aptly named Criticism by Cuthbert Daniel. While estimation should, I believe, employ Bayes' Theorem, or (for the fainthearted) likelihood, criticism needs a different approach. In practice, it is often best done in a rather informal way by examination of residuals or other suitable functions of the data. However, when it is done formally, using tests of goodness of fit, it must, I think, employ sampling theory for its justification. Bayes and likelihood inferences are necessarily conditional and, therefore, should not be used alone, for the same reason that the statement, "If the moon was made of green cheese, it would be a great place for mice," should not tempt a mouse to hang around Cape Kennedy.

INAPPROPRIATE DIVISION OF AN ENTITY

While we can make a mistake by looking for one answer when we should be looking for two or more, we can make another mistake by dividing an entity inappropriately. You will recall the story of Solomon, who determined the true mother of a child of disputed parentage by offering to cut it in two. One slicing of our subject which I think can be harmful is that into Applied Statistics and Theoretical Statistics. I hear people saying things like, "Of course I'm a theoretical statistician myself, but I agree there should be some applied statisticians and there should even be applied statistics departments; in fact, some of my best friends are applied statisticians." Now, in my opinion, that isn't any good, because, if you imagine the theoretical statisticians distributed about a point on the right of a scale and the applied statisticians distributed about a point on the left, you will end up with a bimodal distribution with low density in the center. Now the people most needed are, in my opinion, those in the middle, and perhaps that's why they seem to be in such short supply. If, alternatively, we aimed at a central target, then we might achieve a single unimodal distribution. This would still, of course, allow diversity. We would have some highly theoretical people in one tail and some highly applied people in the other. But the majority, while having proper theoretical training, might also possess ability and experience in applying what they knew to the solution of scientific problems.
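The picture of theoretical and applied statisticians clustered about two points on a scale is, in effect, a two-component mixture distribution, and whether it really shows "low density in the center" depends on how far apart the two clusters sit relative to their spread. The Python sketch below illustrates this under purely illustrative assumptions not taken from the address: each group is normal, the groups are equal in size, both have the same standard deviation sigma, and they are centred at -d and +d. The helper names `normal_pdf`, `mixture_pdf`, and `count_modes` are hypothetical; modes are counted numerically on a fine grid rather than derived analytically.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x: float, d: float, sigma: float = 1.0) -> float:
    """Equal-weight mixture of two normals centred at -d and +d.

    Illustrative assumption: "applied" statisticians ~ N(-d, sigma^2) and
    "theoretical" statisticians ~ N(+d, sigma^2), in equal numbers.
    """
    return 0.5 * normal_pdf(x, -d, sigma) + 0.5 * normal_pdf(x, d, sigma)


def count_modes(d: float, sigma: float = 1.0, grid_points: int = 2001) -> int:
    """Count local maxima of the mixture density on an evenly spaced grid."""
    half_width = abs(d) + 5.0 * sigma  # covers both humps and their tails
    step = 2.0 * half_width / (grid_points - 1)
    xs = [-half_width + i * step for i in range(grid_points)]
    ys = [mixture_pdf(x, d, sigma) for x in xs]
    # A grid point is a mode if it rises above its left neighbour and is not
    # exceeded by its right neighbour (so a two-point plateau counts once).
    return sum(
        1
        for i in range(1, grid_points - 1)
        if ys[i - 1] < ys[i] >= ys[i + 1]
    )


if __name__ == "__main__":
    for d in (0.5, 1.5, 2.0):
        print(f"centres at +/-{d}: {count_modes(d)} mode(s), "
              f"density at centre = {mixture_pdf(0.0, d):.3f}")
```

With sigma = 1, the sketch reports a single mode for d = 0.5 and two modes for d = 1.5 and d = 2.0. This agrees with the standard result for an equal-weight mixture of two normals with a common spread: the mixture is bimodal only when the two centres are more than 2σ apart. Counting on a grid was chosen over a closed-form argument so that the same check can be reused if the equal-size or equal-spread assumptions are relaxed.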

Let me mention another distinction which is now needed, and which threatens to become an unnecessary and harmful slicing. Statistical practitioners have known for a long time that, prior to using the methods that most textbooks emphasize, there is a very important and largely neglected¹ phase of activity which Fisher called specification and which has also been called model identification. This involves informal techniques of analysis of data, many of them graphical, aimed at looking at the data in a preliminary and exploratory way in order to help understand what questions should be asked and what tentative models might be entertained. Until recent years, however, this process was regarded by the majority as not entirely respectable. Like the black art, it was widely felt that it should be conducted, if at all, only behind closed doors.

¹ An early exception was the second chapter of Statistical Methods for Research Workers (Fisher 1925), first published in 1925, in which Fisher discussed the use of preliminary graphical techniques.

It was a stroke of genius to realize that to render "a deed without a name" respectable, you should name it (or perhaps I should say rename it), and we are all grateful for the name "Data Analysis." This important part of our subject can now be studied without apology or shame, and courses on it are taught and may be attended by consenting adults. The elevation of Data Analysis to its proper place as a subject meriting serious study makes me as happy as I would be if some neglected but important activity of the carpenter, such as the use of the saw or the chisel, had at last received proper recognition and study. But my enthusiasm for the naming of Data Analysis does not extend to the renaming of Statisticians as "Data Analysts," any more than I should be happy to hear a carpenter described as a sawyer or a chiseler. Indeed, I am as appalled by the appearance of Data Analysts as entities as I would be at contemplating one half of the baby over which Solomon adjudicated, and for the same reason. There can be no feedback between the parts of a once-living thing cut in two. Please can Data Analysts get themselves together again and become whole Statisticians before it is too late? Before they, their employers, and their clients forget the other equally important parts of the job statisticians should be doing, such as designing investigations and building models? By invention of the concept of Experimental Design, Fisher promoted the statistician from a curator of dusty relics to a valued member of a scientific team, responsible for planning and taking part in the conduct of an investigation. Let us not allow him to be relegated to his previous passive and inferior role by an injudicious choice of a name. "Our Data Analyst" is too close for my liking to "Our Tame Statistician," a poor thing if that is all he is.

TRAINING OF STATISTICIANS

This suggests the question of how statisticians should be trained. It's fairly easy to see how we should not train them. I will make an analogy with swimming. Swimming could be taught by lecturing the student swimmers in the classroom three times a week on the various kinds of strokes and the principles of buoyancy and so forth. Some might believe that on completing such a course of study, the graduates would all eagerly run down to the pool, jump in, and swim at once. But I think it's much more likely that they would want to stay in the classroom to teach a fresh lot of students all that they had learned.

THE AMERICAN STATISTICAL ASSOCIATION

Finally, I want to talk a bit about our Association because it is ours and it can be as good or as bad as we make it. During my time as president I have received a number of letters, all of them interesting and some of them critical. Some members feel that we should be doing things we are not doing, some feel that we are doing things we should not be doing, some feel that the articles in our journals are not on the subjects they would like, or are not written with sufficient clarity. The suggestions made in such letters are, of course, given careful consideration not only by your president but by your Board and its committees. If you have ideas on these or other subjects, however, and want to see something more done about them, I do urge you, if you have not done so already, to volunteer for active duty. The Association is always seeking new faces and new ideas for its committees.

And there are other things you can do too. Suppose, for example, as a statistical practitioner, you feel that the journal Technometrics is not adequately fulfilling that part of its mandate that says it will publish papers illustrating application of known statistical methods to new or novel environments, expository or tutorial papers on statistical methods, and papers dealing with the philosophy and particular problems of applying statistical methods to research, development, design, and performance. I have it on good authority that editors have two problems in carrying out such a mandate: in the first place, articles of this kind are extremely hard to come by, and in the second place, referees tend to reject such articles for the wrong reasons (perhaps because they have not read the mandate). I urge you, therefore, to consider one or both of the following courses:

1. If you have suitable material, please write it up and send it in.
2. Please volunteer to act as a referee.

The editors are in desperate need of good referees for method articles of this and every other sort. They need people who will go carefully through a paper, say encouraging words about good things, suggest how ideas could be clarified and how imperfections could be put right, and who, when necessary, will firmly reject unsuitable manuscripts.

In closing, I want to say how much I have enjoyed being your president, especially because of the kindness, consideration, and help I have received from the members, board, and officers. In particular, I wish to thank Fred Leone, Ed Bisgyer, Jean Smith, and the rest of the Washington staff. The Association is indeed fortunate to be served by such accomplished and dedicated people.

REFERENCE
Fisher, R. A. (1925), Statistical Methods for Research Workers, Edinburgh: Oliver & Boyd.
