
Journal of Clinical and Experimental Neuropsychology 1380-3395/02/2403-383$16.00
2002, Vol. 24, No. 3, pp. 383–405 © Swets & Zeitlinger

From the Binet–Simon to the Wechsler–Bellevue:
Tracing the History of Intelligence Testing
Corwin Boake
Department of Physical Medicine and Rehabilitation, University of Texas-Houston Medical School and
The Institute for Rehabilitation and Research, Houston, TX, USA

ABSTRACT

The history of David Wechsler's intelligence scales is reviewed by tracing the origins of the subtests in the 1939 Wechsler–Bellevue Intelligence Scale. The subtests originated from tests developed between 1880 and World War I, and were based on approaches to mental testing including anthropometrics, association psychology, the Binet–Simon scales, language-free performance testing of immigrants and school children, and group testing of military recruits. Wechsler's subtest selection can be understood partly from his clinical experiences during World War I. The structure of the Wechsler–Bellevue Scale, which introduced major innovations in intelligence testing, has remained almost unchanged through later revisions.

Address correspondence to: Corwin Boake, TIRR, 1333 Moursund, Houston, TX 77030-3405, USA. Tel.: 1-713-797-5913. Fax: 1-713-797-5208. E-mail: corwin.boake@uth.tmc.edu
Accepted for publication: July 31, 2001.

When future psychologists look back upon the practice of psychological testing during the 1900s, they will be struck by the profound influence of David Wechsler's intelligence scales, beginning with the publication of the Wechsler–Bellevue Intelligence Scale in 1939. Current surveys of psychologists in the U.S.A. (Camara, Nathan, & Puente, 2000) demonstrate that the Wechsler intelligence scales continue to dominate individual intelligence testing. According to Lezak (1995), administration of a Wechsler intelligence scale "typically constitutes a substantial portion of the test framework of the neuropsychological examination" (p. 688). Yet, the history of the most popular intelligence tests is unfamiliar to most of the psychologists who use them. It is timely to retrace the origins of these scales because they represent a major chapter in the history of psychological assessment and because this history might have implications for future use of the scales.

The method used in this article is to review the history of the Wechsler intelligence scales by tracing the origins of the tests that became the Wechsler subtests. The historical record demonstrates that the Wechsler subtests represent the continuation of tests that existed long before 1939 and, in many cases, before 1900. The article identifies the persons who created the tests and reveals the purposes for which the tests were developed. Finally, the article discusses implications of the fact that, in the midst of accelerating scientific progress, advances from early intelligence tests to those of today have been relatively minor.

EARLY MENTAL TESTING

Tests for the assessment of cognitive and perceptual abilities had been developed well before the publication of the first Binet–Simon intelligence scale in 1905 (Peterson, 1925). Form boards were used during the mid-1800s by the French physician Edouard Seguin (Pichot, 1948, 1949) for training cognitively impaired children. The form board of Halstead's (1947) Tactual Performance Test was adapted by the American psychologist
Henry Goddard from one of Seguin's designs. The British scientist Francis Galton developed a set of "anthropometric" measures, such as line bisection, that were administered to persons attending the 1884 International Health Exhibition in London (Galton, 1885). The pioneering American psychologist James McKeen Cattell, who in 1890 coined the term mental test, adapted Galton's tests for research with American college students during the 1890s. The following examples of the digit span and substitution tests demonstrate that the Wechsler scales contain mental tests from the period before the Binet–Simon scale and before the concept of psychometric intelligence.

Digit Span
The digit span test was already familiar in psychology before 1900, having been used in England during the 1880s in studies by Galton (1887) and by the cultural historian Joseph Jacobs (1887). In the article introducing this test of "the power of reproducing sounds accurately," Jacobs (1887) noted that the test stimuli had been switched to digits after it was found that nonsense syllables, which had been used initially following the example of Ebbinghaus, were too difficult for school children. Jacobs (1887) described how the test instructions were developed:

It was necessary, in the first place, to adopt some uniform rate at which the dictation should be given, as the power of apprehension varied with the rate of utterance. A sound every half-second was found to be a convenient rate, and a little practice with a metronome beating twice a second gives the experimenter a sense of the proper interval. If possible, two sets of the series of sounds should be given, and the highest number correctly reproduced is to be regarded as the limit which we wish to find, and which we term here the span. The reading should be in a monotonous tone, so as not to give any perceptible accent or rhythm, either of which, it appeared, assists the power of repetition in a considerable degree. (p. 76)

Addressing "the question as to what is the exact power of the mind which is involved in reproducing these sounds," Jacobs (1887) claimed that the digit span test selectively measured a process that he termed prehension and that he defined "as the mind's power of taking on certain material" (p. 79). Jacobs concluded that since "we clearly cannot take in without first taking on . . . the mental operation we have been testing thus seems a necessary preliminary to all obtaining of mental material," and he recommended "that 'span of prehension' should be an important factor in determining mental grasp, and its determination one of the tests of mental capacity" (1887, p. 79).

Substitution Test
The substitution test, the precursor of the Wechsler Digit Symbol and Coding subtests, was probably created around 1900 as an American college classroom experiment for demonstrating the processes involved in learning associations (Dearborn, 1910; Starch, 1911). In his 1911 book of classroom teaching exercises, Starch reported that the test had been "originally devised several years ago by [Joseph] Jastrow" of the University of Wisconsin. The test's rationale was explained in a 1910 compendium of mental tests authored by Guy Whipple, who described the substitution test as a measure of "the rapidity with which associations are formed by repetition" (p. 350). Whipple proposed that "in theory, an S whose nervous system is plastic and retentive, who, in other words, is a quick learner, will make the most rapid progress" (p. 350). One version of the substitution test consisted of "pages headed with an imitation typewriter keyboard" in which "each letter of the alphabet is enclosed with a number in a circle" (Starch, 1911, pp. 47–48). Below the keyboard diagram was a text passage to be "transcribed" by the subject through "substituting the numbers for the letters" in blank spaces placed alongside the text (Starch, 1911, p. 48).

The Digit Symbol subtest of the Wechsler intelligence scales probably descended from another University of Wisconsin substitution test having a smaller key in which the nine digits were each paired with a different typographical symbol (e.g., equal sign). Revisions of this nine-pair substitution test were included in early children's batteries (Healy & Fernald, 1911; Pyle, 1913) as measures of association learning. The version of the substitution test shown in Figure 1, developed
by Cattell's students Robert Woodworth and F. L. Wells (Woodworth & Wells, 1911), is probably the source of the Coding subtest of the Wechsler children's scales.

Fig. 1. Substitution Test. Introduced in a 1911 monograph by Robert Woodworth and Frederic Lyman Wells, the test was designed to measure the ability to learn new associations. The test was derived from an earlier substitution test with 20 letter pairs (Kirkpatrick, 1909), by pairing digits with geometric forms in order to prevent "easy mnemonics." This Substitution Test page also served as a test of rapid naming of geometric forms. The five geometric forms were selected because they do not change appearance when rotated, which allowed the test page to be presented in different orientations.
Note. From "Association tests. Being a part of the Report of the Committee of the American Psychological Association on the Standardization of Procedure in Experimental Tests," by R.S. Woodworth and F.L. Wells, 1911, Psychological Monographs, 13 (Whole no. 57), pp. 52–53. In the public domain.

BINET AND SIMON'S MEASURING SCALE OF INTELLIGENCE

In 1905, the psychologist Alfred Binet and psychiatrist Theodore Simon published a "measuring scale of intelligence" that they had developed for use with Paris school children (Binet & Simon, 1905). The scale consisted of a series of 30 brief cognitive tests, arranged in order of difficulty, which could be administered in about 40 min. The selection of tests included measures of language skills (e.g., naming, following commands, semantic judgments), memory, reasoning, digit span, and psychophysical judgments. Many of the tests had been discussed by Binet and Victor Henri in their 1895 review of "differential
psychology." The scale included tests developed by others (e.g., digit span) as well as ones developed by Binet and his collaborators. Validity of the scale was demonstrated, as in Jacobs' studies, by the increase of scores with age and by the scale's ability to differentiate normal and cognitively impaired children (Peterson, 1925).

In 1908, Binet and Simon made a fundamental revision of their scale by grouping the tests into age levels. In this procedure, later known as a year scale, each test was assigned to the age level at which most children performed it successfully. For example, digit span items of different lengths were administered at various age levels, with longer spans assigned to older ages. Administration to an individual child began with the tests at the child's age level and proceeded to higher or lower age levels, until finding the age level at which most tests were failed. The child's intelligence was quantified in terms of the intellectual level (later, mental age), defined as the highest age level at which the child completed most tests successfully. Although the 1908 revision contained both verbal and non-verbal tests, the consensus of psychometric opinion was that the scale overemphasized verbal skills such as vocabulary and repetition.

The Binet–Simon scales have served as both a model of form and source of content for later intelligence tests. The basic procedure of combining different mental tests to yield a composite score is the foundation of intelligence scales. Binet emphasized that since "a particular test isolated from the rest is of little value" or even "signifies nothing," the important information contributed by intelligence scales was the subject's average performance over various tests. He facetiously wrote that "one might almost say, 'It matters very little what the tests are so long as they are numerous.'" (1911/1916, p. 329). In addition to establishing the basic form that intelligence tests would take, the Binet–Simon scales contributed items and tests that have been recycled in later intelligence scales. Table 1 shows tests and items of the 1905 and 1908 Binet–Simon scales that have been duplicated in the Wechsler intelligence scales. For example, the Binet–Simon Unfinished pictures item, a drawing of a face missing a nose, is duplicated as an item of the Picture Completion subtest of the Wechsler–Bellevue scale and later Wechsler intelligence scales.

In only a few years, the Binet–Simon scale became widely used in Europe and North America. Goddard learned of the scale while traveling in Europe and arranged for its translation into English. From his position as research director of the Training School at Vineland, a residential center in New Jersey for children with cognitive disorders, Goddard led a drive to popularize intelligence testing that rapidly led to the use of the Binet–Simon scale in American institutions (Zenderland, 1998).

Shortly before the U.S.A. entered World War I, two major revisions of the Binet–Simon scale were produced by American psychologists. The first revision, by Robert Yerkes and James Bridges of the Boston Psychopathic Hospital, restructured the Binet–Simon scale from a year scale into a point scale termed the Yerkes–Bridges Point Scale Examination, by grouping items of similar content into a smaller number of subtests (Yerkes, Bridges, & Hardwick, 1915). For example, the Memory Span for Digits subtest of the Yerkes–Bridges scale was formed by consolidating the digit span items (Repetition of two figures, five figures, etc.) that were spread over various Binet–Simon age levels. Point-scale tests were administered beginning with the easiest item and proceeding in order of difficulty until completing the test. The correspondence between the Binet–Simon tests and the Yerkes–Bridges tests is illustrated in Table 1, where it can be seen that the titles of the Yerkes–Bridges tests were formed by abbreviating the Binet–Simon titles into descriptive phrases. The point-scale method introduced by the Yerkes–Bridges scale was the model for the tests that evolved into the Wechsler scales (Thorndike & Lohman, 1990).

The second revision, by Lewis Terman of Stanford University, extended the age range into adulthood and, most important, replaced the mental age with the intelligence quotient (IQ) as the preferred composite score. In addition, Terman supplemented the Binet–Simon tests with newer tests such as arithmetic reasoning items developed by Bonser (1910) and a form board developed by Healy and Fernald (1911). Terman's
revision, the Stanford–Binet Intelligence Scale (Terman, 1916), quickly became the dominant measure in American intelligence testing.

Table 1. Binet–Simon Tests Duplicated in the Wechsler–Bellevue Scale.

| Test in Binet–Simon scales (1905, 1908) | Description/example | Related test in Yerkes–Bridges Point Scale (1915) | Related test(s) in Stanford–Binet scale (1916) |
|---|---|---|---|
| Unfinished pictures (1908) | Identify missing parts of a drawing of a face or person. | Perception and comparison of pictures (missing parts) | Finding omissions in pictures |
| Reply to an abstract question (1905); Comprehension questions (1908) | "When one breaks something belonging to another what must one do?" (p. 224). | Comprehension of questions | Comprehension |
| Repetition of three figures; Immediate repetition of figures (1905) | Repeat spoken list of digits in same order. | Memory span for digits | Repeating three digits, etc.; Repeating three digits reversed, etc. |
| Definitions of abstract terms (1905) | Define abstract nouns. Score is based on abstractness of definition. | Definition of abstract words | Defining abstract words; Vocabulary |
| Verbal definition of known objects (1905); Definition of familiar objects (1908) | Define concrete nouns, in terms of use or superior to use. | Definitions of concrete terms | Giving definitions in terms of use; Giving definitions superior to use; Vocabulary |
| Resemblances of several known objects given from memory (1905) | Example: "In what way are a fly, an ant, a butterfly, and a flea alike?" (p. 61). | Comparison of three pairs of objects | Giving similarities |
| Making change from 20 sous (1908) | Make change when 4 sous are taken from 20. | none | Making change; Arithmetical reasoning |

Note. Quotations are from the translation of the Binet–Simon tests by Elizabeth S. Kite (Binet, 1905/1916, 1908/1916).

PERFORMANCE TESTS OF INTELLIGENCE

The need for non-verbal measures of intelligence was felt by clinicians examining subjects with limited English-language skills, such as hearing-impaired and foreign-born persons. A 1911 monograph by the psychiatrist William Healy and psychologist Grace Fernald, of the Chicago Juvenile Psychopathic Institute, presented a group of "practical" tests designed for use with juvenile delinquents. In a criticism of the Binet–Simon scale to be repeated by many other authors, Healy and Fernald complained that the scale "helps very little where the language factor is a barrier, either on account of foreign parentage or insufficient schooling, and with uneducated deaf and dumb children" (1911, p. 5). As a substitute for the Binet–Simon scale, Healy and Fernald claimed that their tests had been constructed in order "to ascertain the mental ability quite apart from the individual's experience in formal training in our language, or indeed in any language" (1911, p. 4). For example, Healy's (1914, 1921) Pictorial Completion tests consisted of picture boards of childhood scenes with empty spaces designed to be filled by different picture elements. The subject's task was to select the element that completed the picture in the most appropriate way (e.g., placing a ball in the hand of a boy throwing something). Healy and Fernald explained that, because their tests were designed "with the idea of invoking always as much interest as possible in our tests, we have ever had in mind the development of them in forms resembling games and puzzles, but really involving points much more open than puzzles to solution by use of simple reasoning ability" (1911, p. 7). This method of measuring intelligence using non-verbal tasks came to be termed performance testing.

A seminal event in the history of performance testing was the assessment program at Ellis Island, in New York harbor, by U.S. Public Health Service physicians who were responsible for screening arriving immigrants for mental and physical disorders (Knox, 1914b; Yew, 1980). The task of mental screening was complicated by the fact that many immigrants spoke no English and had little or no formal education. The Ellis Island physicians rejected the Binet–Simon scale as inappropriate for testing immigrants, on the rationale that the scale was an "arbitrary and artificial scale that was derived from experiments performed on French school children" (Knox, 1914a) and that it would be "manifestly absurd to use educational tests in the case of uneducated persons" (Knox, 1913). To substitute for the Binet–Simon scale, the Ellis Island physicians assembled various form board and puzzle assembly tasks into "a graduated scale with performance tests for determining the intelligence of aliens, and especially illiterates" (Knox, 1915). Two surviving tests from the Ellis Island testing program are the feature profile (Fig. 2) and the cube imitation test (Knox, 1914b). In the words of Howard Knox, an Ellis Island physician who may have coined the term performance tests, the tests required abilities such as "a little native ingenuity, constructive imagination and sense of form and some judgment" (1915, p. 53), but not any formal education. Collection of child and adult norms from Italian-, Spanish-, and German-speaking immigrants began at Ellis Island during 1914 but was discontinued when World War I caused immigration to decline (Mullan, 1917). In an interview conducted 60 years later, the former Ellis Island physician Grover Kempf recalled:

The mental examination of immigrants was usually done in a quiet room; it had to be done with an interpreter present, a man or woman who was well versed in the language of the immigrant. Usually two officers sat in during the examination and it was conducted in a question and answer method, and also mostly with the board tests, putting blocks back together and the Knox cube test. And I had one test there of a face, called the Kempf test, which required the immigrant to place the blocks in order to form a human face. However, none of these tests were standardized. (U.S. Public Health Service, 1977)
Fig. 2. Feature Profile Test. The test, on display at the Ellis Island Immigration Museum, was designed for mental screening of arriving immigrants at Ellis Island. It was developed by Grover Kempf and Howard Knox, Assistant Surgeons of the U.S. Public Health Service. Knox claimed that the test was "eminently fair because everyone has seen a human head" (1914b, p. 744).
Note. Reproduced courtesy of the Ellis Island Immigration Museum.

Following the example of the Ellis Island testing program, Rudolf Pintner and Donald Paterson of Ohio State University developed a performance test battery, the Pintner–Paterson Performance Scale, for assessment of hearing-impaired school children (Pintner & Paterson, 1917). The rationale of the Pintner–Paterson scale was that the intelligence of these children, like that of many immigrants, would be underestimated by the verbally weighted Binet–Simon scale. Table 2 shows the tests comprising the Pintner–Paterson scale. Of the 15 tests in the scale, 4 were newly developed by Pintner and Paterson, and the majority were existing tests borrowed from the Ellis Island tests, the Healy–Fernald tests, and other sources. Although the Pintner–Paterson scale is no longer sold as a battery, some of its component tests are still in use today. For example, the Pintner Manikin Test is the beginning item of the Object Assembly subtest of the Wechsler intelligence scales.

U.S. ARMY INTELLIGENCE TESTING DURING WORLD WAR I

Probably the most important event in the spread of intelligence testing was the testing program carried out in the U.S. Army during the first world war. Yerkes, who was president of the American Psychological Association at the time when the United States declared war, formed a committee of testing
experts to design tests to determine if Army recruits were fit for military service. The committee, shown in Figure 3, created a preliminary version of a group intelligence test in its first 2-week session. Trials of the new group test were conducted during July 1917 and the data were analyzed by a statistical unit at Columbia University headed by Edward Thorndike and assisted by Arthur Otis and Louis Thurstone. The history of wartime intelligence testing in the U.S. Army was later recorded in an 890-page monograph edited by Yerkes and written partly by Terman and E. G. Boring (Yerkes, 1921; see also Yoakum & Yerkes, 1920). The monograph's technical errors and conclusions about racial and ethnic differences have been widely criticized (e.g., Gould, 1981).

Table 2. Performance Scales that Preceded the Wechsler–Bellevue Scale.

| Pintner–Paterson performance test series (Pintner & Paterson, 1917) | Army Performance Scale (Yerkes, 1921) | Point Scale of Performance Tests (Arthur, 1930) | Performance Ability Scale (Cornell & Coxe, 1934) |
|---|---|---|---|
| Mare & Foal Picture Board | Ship Test | Knox Cube | Manikin-Profile |
| Seguin Form Board | Manikin & Feature Profile | Seguin Form Board | Block-Designs |
| Five Figure Board | Knox Cube Imitation | Two-Figure Form Board | Picture-Arrangement |
| Two Figure Board | Cube Construction | Casuist Form Board | Memory-for-Designs |
| Casuist Form Board | Form Board | Manikin; Feature-Profile | Digit-Symbol |
| Triangle Test | Memory for Designs | Mare & Foal | Cube-Construction |
| Diagonal Test | Digit Symbol Test | Healy Picture Completion I | Picture-Completion |
| Healy Puzzle 'A' | Porteus Maze Test | Porteus Maze | |
| Manikin Test | Picture Arrangement | Kohs Block Design | |
| Feature Profile Test | Picture Completion | | |
| Ship Test | | | |
| Picture Completion Test | | | |
| Substitution Test | | | |
| Adaptation Board | | | |
| Knox Cube Test | | | |

The main Army intelligence tests were Group Examinations Alpha and Beta, designed to be administered to groups of recruits by trained psychological examiners. The transformation from the pre-war individual intelligence scales to the group-administered examinations was made possible by the multiple-choice method, which was credited to Otis (Samelson, 1987; Yerkes, 1921, pp. 299–300). Group Examination Alpha was designed for the assessment of literate English speakers and Group Examination Beta for the minority of recruits who were illiterate or non-proficient in English. The complementary missions of the Alpha and Beta examinations in the Army testing program paralleled the roles of the Binet–Simon scale and performance tests in individual intelligence testing. Both the Alpha and Beta examinations were point scales consisting of a series of subtests that could be administered in less than 1 hr. Army records estimate that, from 1917 to 1919, the Alpha and Beta examinations were administered to 1,726,966 recruits (Yerkes, 1921, p. 103).

The Army group examinations were a major source of subtests and items used in the Wechsler–Bellevue scale, as explicitly stated by Wechsler (1939) and noted by many authors (e.g., Frank, 1983). Table 3 shows the three Alpha subtests that correspond to Wechsler–Bellevue verbal subtests. The Wechsler–Bellevue Arithmetic subtest is virtually an orally administered short form of the Arithmetical Problems test of the Alpha examination. Of the 10 Wechsler–Bellevue Arithmetic items, 7 are from the Arithmetical Problems test, which contained items adapted from a test for middle-school students (Bonser, 1910). The Wechsler–Bellevue Comprehension subtest is largely derived from the Alpha Practical Judgment test, which itself borrowed items from the 1905 Binet–Simon scale, Bonser's (1910) Selective Judgment test, and pre-war tests developed at Stanford University by Terman and his students. The Alpha Information test, which appears to be the model
for the Wechsler–Bellevue Information subtest, also appears to be an adaptation of earlier tests (Healy & Fernald, 1911; von Mayrhauser, 1987).

Fig. 3. Committee on the Psychological Examination of Recruits, during one of their meetings at the Vineland Training School between May and July 1917. Front: Edgar Doll, Henry Goddard, and Thomas Haines. Rear: F.L. Wells, Guy Whipple, Robert Yerkes (chairman), Walter Bingham (secretary), and Lewis Terman. Initial versions of the Army group and individual tests were developed during meetings of this committee in summer 1917. Doll, who was Goddard's assistant at the Training School, was invited to join the committee at later sessions when the individual tests were developed.
Note. Reproduced with permission of the Archives of the History of American Psychology, Henry Goddard Papers.

Table 3. Alpha Examination Tests Related to Wechsler–Bellevue Subtests.

| Test in Alpha examination | Committee member(s) responsible for test | Example of item related to Wechsler–Bellevue scale |
|---|---|---|
| Arithmetical Problems | Bingham | "If it takes 6 men 3 days to dig a 180-ft drain, how many men are needed to dig it in half a day?" [Yerkes, 1921, p. 221] |
| Information | Wells | "Darwin was most famous in literature / science / war / politics" (choose best answer) [Yerkes, 1921, p. 234] |
| Practical Judgment | Goddard; Haines | "If you are lost in the forest in the daytime, what is the thing to do? hurry to the nearest house you know of / look for something to eat / use the sun or a compass as a guide" (choose best answer) [Yerkes, 1921, p. 229] |

Three Beta tests are sources of Wechsler–Bellevue performance subtests. The Beta Digit Symbol test (Yerkes, 1921, p. 254) was simply duplicated in the Wechsler–Bellevue subtest with the same title. Although the Beta Digit Symbol test was credited to Otis, the test appears to be only a slight modification of the University of Wisconsin substitution test with nine digit-symbol pairs. The Beta Picture Completion test (Yerkes, 1921, pp. 238–239, 256) contributed items to the Picture Completion subtest of the Wechsler–Bellevue scale. Figure 4 presents an item from the Beta Picture Completion test that has been duplicated in the Wechsler intelligence scales. The Beta Picture Completion test was modeled on Pintner and Hoops' (1918) 'drawing completion' technique for group administration of the Binet–Simon Unfinished pictures test (Table 1). The Beta Picture Arrangement test (Yerkes, 1921, pp. 242–243), included in a preliminary version of the Beta examination but dropped before the final version, supplied items to the Wechsler–Bellevue Picture Arrangement subtest (Wechsler, 1939, p. 90). The Beta Picture Arrangement test was modeled upon a children's test from Belgium (Decroly, 1914).

Recruits who failed the group examinations were assessed with individual intelligence tests that included the Stanford–Binet, the Yerkes–Bridges point scale, and the new Army Performance Scale. It was estimated that 83,500 individual examinations were performed in the Army testing program (Fig. 5). The Army Individual Examination, an individual intelligence test developed for the Army testing program by the same committee, was to have a particularly strong influence on the Wechsler scales.

Army Individual Examination
The Army Individual Examination was a point scale designed for English-speaking recruits that
Fig. 4. Item in the Picture Completion test of Group Examination Beta. This Beta test, credited to Truman Kelley,
was designed to assess intelligence of U.S. Army recruits who were illiterate or not procient in English.
Group administration was accomplished by asking subjects to draw the missing parts directly onto pictures
printed on the response form.
Note. From `Psychological examining in the United States Army,' edited by R. M. Yerkes, 1921, Memoirs of
the National Academy of Sciences, 15, p. 256. In the public domain.
HISTORY OF INTELLIGENCE TESTING 393

Fig. 5. Individual administration of the Pintner Manikin Test to a U.S. Army recruit during World War I. Individual
testing was administered to recruits who failed the Alpha and Beta examinations. The test originated with
the PintnerPatterson scale (Pintner & Paterson, 1917) and was included in the Army Performance Scale.
After the war, the test was used in the CornellCoxe Performance Ability Scale (Cornell & Coxe, 1934) and
later in the Object Assembly subtest of the WechslerBellevue scale. The original test is still commercially
available as part of the MerrillPalmer scale (Stutsman, 1931).
Note. From `Psychological examining in the United States Army,' edited by R.M. Yerkes, 1921, Memoirs of
the National Academy of Sciences, 15, p. 91. In the public domain.

consisted of 22 mostly verbal tests taken from the never came to serve as an individual intelligence
YerkesBridges point scale or adapted from the test. The Individual Examination appears to have
Alpha examination. It is a historical paradox that been a major source for the Wechsler intelligence
the Individual Examination, which was to strongly and memory scales, given that some Wechsler
inuence intelligence testing in the long run, was subtests are closely related to the tests in the
discontinued during its standardization stage and Individual Examination. Table 4 presents the three

Table 4. Army Individual Examination Tests Related to WechslerBellevue Subtests.


Test in Individual Examination Committee member(s) Example of item related to WechslerBellevue scale
responsible for test
Arithmetical reasoning Wells ``If a man buys 6 cents worth of postage stamps at the
post ofce and pays a dime, how much change does he
get back?'' (Yerkes, 1921, p. 144)
Comprehension Goddard ``Why are people who are born deaf usually dumb?''
Terman (Yerkes, 1921, p. 143)
Wells
Likenesses and differences Terman ``In what way are the eye and the ear alike?''
(Yerkes, 1921, p. 139)
394 CORWIN BOAKE

Individual Examination tests that correspond to subtests of the Wechsler–Bellevue scale. Half of the items in the Wechsler–Bellevue Comprehension subtest were taken from the Comprehension test of the Individual Examination.

Army Performance Scale
Like the Beta examination, the Army Performance Scale was designed for recruits who performed poorly on the Army group examinations or who were not proficient in English. The composition of the Army Performance Scale, shown in Table 2, corresponded closely to the Pintner–Paterson scale. The Wechsler–Bellevue Picture Arrangement, Object Assembly, and Digit Symbol subtests correspond to the Army Performance Scale tests of the same titles. The Army Performance Scale can therefore be regarded as a transitional scale that refined available performance tests into a form leading to the Wechsler–Bellevue performance scale.

Wechsler as Wartime Psychological Examiner
In 1917 Wechsler, having completed his master's thesis with Woodworth at Columbia University, began work at an Army camp on Long Island where he scored Alpha examination protocols in a unit supervised by Boring. Wechsler then enlisted in the U.S. Army and during summer 1918 he attended the School for Military Psychology in Georgia for training as a psychological examiner. Upon graduating with the rank of corporal, he was assigned to Camp Logan near Houston, Texas, where he conducted individual psychological examinations (Edwards, 1974). Army records estimate that 319 individual examinations were performed at this base (Yerkes, 1921, p. 80). Wechsler later recalled how his wartime experiences inspired the intelligence scale he developed later:

My duties there called mostly for the administration of individual psychological tests to soldiers who had failed both the Army Alpha and Army Beta, and who were being evaluated for possible discharge, or special labor assignments. My usual examination of subjects included, in addition to a short interview, administration of the Stanford–Binet or Yerkes Point Scale, and nearly always one or more of the available performance tests. It then occurred to me that an intelligence scale, combining verbal and nonverbal tests, would be a useful addition to the psychometrist's armamentarium (Wechsler, 1979, p. 2).

Wechsler often remarked that he became convinced of the shortcomings of existing intelligence tests through wartime experiences with recruits who had functioned normally as civilians before induction, but who had failed the Army group examinations and had obtained low mental-age scores on the Stanford–Binet scale. He attributed these misdiagnoses to the emphasis of the Stanford–Binet scale on verbal skills acquired through formal education. He recalled that the first such subject he encountered was "a native, white Oklahoman" who obtained a Stanford–Binet mental age of 8 years, but who "before entering the Army . . . had gotten along very well, was supporting a family, had been working as a skilled oil-driller for several years and, at time of draft, was earning from $60 to $75 per week" (1935, p. 256). Wechsler claimed that he had often encountered the type of subject who, like the Oklahoman,

systematically rates as a mental defective on mental tests, but, who can in no way be judged as such, when diagnosed on the basis of concrete social standards, i.e., in terms of capacity to adjust to the normal demands of his social and economic environment. (1935, p. 256)

The next few years of Wechsler's life were a kind of brief odyssey that widened his clinical and scientific perspectives. While still in uniform he was transferred from the Texas army camp to the University of London, where he studied with Charles Spearman and Karl Pearson. Wechsler then won a 2-year fellowship to study in France, where he worked with Henri Piéron and other Paris psychologists researching the psychophysiology of emotions. Upon returning to the U.S.A. in 1922, he spent a summer internship with Wells at the Boston Psychopathic Hospital. During this internship Wechsler attended teaching conferences conducted by Healy and the psychologist Augusta

Bronner (Edwards, 1974). Wechsler then returned to New York where he completed his dissertation at Columbia University on physiological measures of emotion (Wechsler, 1925) and worked as a psychologist at a child guidance bureau.

Individual Mental Testing in the 1920s and 1930s
For intelligence testing, the postwar decades were a period of aggressive growth. The Army testing program had promoted the credibility of intelligence testing and trained many psychologists in test construction and interpretation. The Psychological Corporation was founded in 1921 to offer psychological testing services to industry, using the tests administered to the Columbia students as well as those developed in the Army (Sokal, 1981). The publication of new group intelligence tests dramatically expanded the application of intelligence testing, especially in schools.

Development of performance tests increased during the 1920s and 1930s, probably driven by concerns about the adequacy of the Stanford–Binet scale. One of the most successful of the new performance tests was the Block Design test, developed by Samuel Kohs as a Stanford University dissertation with Terman (Kohs, 1923). The test's origin is unusual in that it was adapted from Color Cubes, a commercially available game in which children constructed decorative mosaic designs from a set of 16 painted wooden cubes. The sides of each cube were painted red, white, blue, yellow, red-white and blue-yellow. Kohs' instructions advised that the blocks "may be secured at any of the large department stores and at the various distributing centers of Milton Bradley's" (1923, p. 64). Instructions for a later revision of the Block Design test recommended giving subjects "extended practice" on the initial designs in order "to offset in some measure the advantage possessed by those patients who have had these sets of blocks as toys" (Arthur, 1930, p. 30). According to Kohs, the test required "first, the breaking up of each design presented into logical units, and second, a reasoned manipulation of the blocks to reconstruct the original design from these separate parts" (1923, p. 271). Kohs proposed that the test's involvement of both analytic and synthetic thinking would implement the "combination-method" that had been proposed by Ebbinghaus as a fundamental feature of intelligence tests. Kohs emphasized the test's value as a measure of intelligence, as demonstrated by correlations with the Stanford–Binet scale and other indicators of general intelligence, but did not mention such concepts as visual-spatial perception.

Some postwar performance scales pursued the Pintner–Paterson strategy of constructing intelligence scales using only nonverbal tests. The Leiter International Performance Scale was developed by Russell Leiter and later adapted by Stanley Porteus of the University of Hawaii for his research on racial differences (Leiter, 1936). Table 2 shows the tests comprising the Cornell–Coxe Performance Ability Scale (Cornell & Coxe, 1934), which includes the immediate precursors of most Wechsler–Bellevue performance subtests and of the Wechsler Memory Scale (WMS) Visual Reproduction subtest. The significance of the overlap between the Cornell–Coxe and the Wechsler–Bellevue scales will be discussed below. Grace Arthur's Point Scale of Performance Tests (Arthur, 1930), also shown in Table 2, consisted of seven Pintner–Paterson tests plus adaptations of the Porteus maze test and the Kohs Block Design test. The impact of these tests was shown by a 1933–1934 survey of testing practices in U.S. psychological clinics (Report of Committee of Clinical Section of American Psychological Association, 1935, pp. 23–27). The survey found that while the most popular psychological test was the Stanford–Binet scale, 5 of the 9 most widely used tests were either performance tests (Healy Pictorial Completion II, Porteus Mazes) or scales comprised of such tests (Arthur Point Scale, Merrill–Palmer, Pintner–Paterson). The popularity of these performance tests suggests that many psychologists followed a practice of supplementing the Stanford–Binet scale with one or more performance measures, out of concern with using the Stanford–Binet scale as the sole measure of intelligence.

The Psychological Corporation published civilian versions of the Alpha and Beta examinations designed for educational or industrial testing (Kellogg & Morton, 1935; Wells, 1932). The Revised Beta Examination included a substitution

test that used the stimulus key from the Beta Digit Symbol subtest but required the subject to translate symbols into numbers. The Revised Beta Examination also included a Picture Completion test that shared some items with the Wechsler–Bellevue subtest.

WECHSLER AND THE BELLEVUE INTELLIGENCE EXAMINATION

In 1932 Wechsler became the chief psychologist at Bellevue Psychiatric Hospital in New York. Of Wechsler's publications during the 1920s and early 1930s, few concerned intelligence testing and only one reported primary research on the topic. His most relevant pre-1939 publication is a 1932 note in which he suggested that the Army Alpha examination had the potential advantage, not available with other intelligence tests, "of analyzing the subject's performance on the individual tests which comprise the examination, in order to discover" if the subject had "any special abilities or disabilities" (Wechsler, 1932, p. 254). In contrast, profile interpretation of the Stanford–Binet scale depended on procedures that were complicated and unstandardized (Wells, 1927).

Wechsler later recalled that, in his position at Bellevue, the "immediate experience of working with, and supervising the testing of, the diverse patient population at the mental hygiene clinic, and Psychiatric Wards of Bellevue Hospital" reinforced his "growing conviction of the need for an alternate to the Binet tests, and in particular, of an intelligence scale more suitable for use with adults" (1979, p. 3). The chief rationale that he cited for the new adult intelligence test was statistical in nature. Wechsler argued that contemporary intelligence tests were applicable only to children and adolescents because of the insurmountable statistical artifacts caused by applying the mental age and ratio IQ to adults. Repeating the views of Thurstone and others, Wechsler advocated replacing the ratio IQ with the deviation score, a method that calculated IQ by converting the sum of subtest scores into a standard score, using the mean and standard deviation at each age level. Basically this method changed the meaning of the IQ from a mental age–chronological age ratio score into a standard score having the same distribution at each age level (Thorndike & Lohman, 1990).

Wechsler's second major technical innovation was to incorporate verbal and performance tests into the same scale, enabling the Wechsler–Bellevue scale to exploit both of the major contemporary approaches to intelligence testing. The 'verbal' and 'performance' labels of the two Wechsler–Bellevue scales were already in common use. One rationale for the verbal-performance combination was to minimize the over-diagnosing of feeble-mindedness that was, he believed, caused by intelligence tests that were too verbal in content. He viewed verbal and performance tests as equally valid measures of intelligence and criticized the labeling of performance tests as measures of "special abilities" while reserving for the Binet–Simon scale the property of measuring "general intelligence." This view, he argued, was "incorrect because it not only assumes that there are different kinds of general intelligence, but because it further implies that the Performance tests are relatively unimportant as measures of general intelligence" (Wechsler, 1939, p. 138).

A further rationale cited by Wechsler in favor of the verbal-performance combination was that performance tests might be more sensitive to "temperamental and personality factors" such as "the subject's interest in doing the task set, his persistence in attacking them and his zest and desire to succeed" (Wechsler, 1939, p. 10). He argued that by including tests to measure such "capacities that cannot be defined as either purely cognitive or intellective," the relationship of test results with everyday functioning would be strengthened. Echoing Terman, he stated that his scale was "constructed on the hypothesis that an individual manifests intelligence by his ability to do things, as well as by the way he can talk about them" (Wechsler, 1939, p. 138). These arguments suggest that Wechsler intended the performance subtests to serve not only as measures of intelligence, but also to measure personality traits similar to what would now be termed executive functions.

Constructing the Wechsler–Bellevue Scale
In retracing Wechsler's rationale for selecting the Wechsler–Bellevue subtests, a few clues may be

noted. First, it is obvious that all of the Wechsler–Bellevue subtests except for Block Design were derived from the Army tests. Even the titles of several Army tests (i.e., Information, Comprehension, Vocabulary, Picture Completion, Picture Arrangement, Digit Symbol) were duplicated as Wechsler–Bellevue subtest titles. This reliance on the Army tests is understandable given Wechsler's wartime experience, the endorsement of these tests by contemporary testing experts, and their continued use by Wechsler, Wells, and other psychologists through the 1930s. Wechsler's construction of his new intelligence scale from old tests, which contrasts sharply with today's custom of constructing new scales from new tests, was standard practice among contemporary test developers (Frank, 1983). Like the authors of the performance scales in Table 2, Wechsler explicitly identified the sources of all of his subtests. The reason why he had "drawn so heavily on the experience of others," he explained, was because his "aim was not to produce a set of brand new tests but to select, from whatever source available, such a combination of them as would best meet the requirements of an effective adult scale" (Wechsler, 1939, p. 78).

A second clue is that Wechsler appears to have assigned subtests to the verbal and performance scales in accordance with existing intelligence scales, rather than on an empirical basis. The Wechsler–Bellevue verbal scale was constructed mostly from Army Alpha tests, and the Wechsler–Bellevue performance scale from contemporary performance tests. While emphasizing that most of the subtests correlated strongly with the total score, he drew attention to the fact that some subtests did not. For example, he commented that the Object Assembly subtest, despite having serious statistical weaknesses, "was included in our battery only after much hesitation" because the subtest provided clinical information about "one's mode of perception, the degree to which one relies on trial and error methods, and the manner in which one reacts to mistakes" (1939, p. 100). In light of the fact that the verbal and performance scales were based on an a priori classification that was inconsistent with correlational evidence, it is understandable that later factor analytic studies would suggest different composites.

Third, the composition of the Wechsler–Bellevue performance scale corresponds closely with the performance scale developed by Cornell and Coxe (1934). As seen in Table 2, 4 of the 7 Cornell–Coxe tests also appear in the Wechsler–Bellevue performance scale. It should be noted that the Cornell–Coxe Picture Completion test is one of the Pictorial Completion tests produced by Healy (1914, 1921), and not a missing-detail type of test like the Picture Completion tests in the Beta examination and Wechsler scales. The Memory-for-Designs test, 1 of 2 Cornell–Coxe tests not represented in the Wechsler–Bellevue scale, is closely related to the WMS Visual Reproduction subtest.

Fourth, the initial group of Wechsler–Bellevue subtests differed from the now-traditional Wechsler subtest profile. At the start of standardization, there were fewer verbal subtests (Information, Digit Span, Arithmetic, and Comprehension) than performance subtests. The performance scale consisted of the five traditional subtests plus a sixth subtest, Cube Analysis, which was based on the Beta examination subtest of the same title. Because the standardization data revealed a sex difference on the Cube Analysis subtest in favor of males, the subtest was omitted from the published scale. Wechsler explained that the Similarities and Vocabulary subtests had been omitted from the initial subtest selection "because of the mistaken belief" that they "would be unduly influenced by the language factor" (1939, p. 87) and were "seemingly unfair to illiterates and persons with foreign language handicap" (p. 100). These were the very shortcomings for which he criticized the Stanford–Binet scale and which he was constructing his own scale to overcome. The time needed to develop items for these subtests may explain why they were added so late in the standardization. The Vocabulary subtest, the last subtest added to the scale, acquired a smaller normative sample of about 400 subjects and was included in the published scale as an 'alternate' verbal subtest.

Finally, it may be significant that the Wechsler–Bellevue scale, unlike the Stanford–Binet scale and the performance scales in Table 2, does not include any memory tests. It is possible that Wechsler's selection of tests for his intelligence

scale was influenced by his development of the WMS during the same period (Wechsler, 1945). Indeed, Wechsler's selection of subtests for the WMS shows some of the same influences as the Wechsler–Bellevue scale.

It appears that Wechsler began work on the new intelligence scale shortly after he came to Bellevue. In his 1939 book, The measurement of adult intelligence, he stated that work on the scale had taken a little more than 7 years, of which the initial 2 years were spent in trying out tests with adult subjects. Collection of the standardization sample was funded by a grant from the Works Progress Administration (WPA), a federal agency that funded employment in public works projects. The entire Wechsler–Bellevue standardization sample, ranging from 7 to 59 years of age, was collected in the New York area. The adult sample was limited to whites who understood and wrote English. Wechsler noted facetiously that a "special source worth mentioning was the Coney Island beach, where one of our resourceful examiners went daily throughout a summer" (Wechsler, 1939, p. 114). The adult norms were refined by a kind of stratification technique in which subjects were selected to match the proportions of the adult population employed in various occupations, according to the U.S. Census. Wechsler's use of occupational stratification was similar to that of the 1937 Stanford–Binet revision, in which children were selected so that their parents' occupations would match census data about the proportions of adults in various occupational categories (Terman & Merrill, 1937).

The Wechsler–Bellevue children's sample was tested at schools labeled 'average' by the Board of Education of the City of New York. The children's norms were likewise stratified according to the proportions of children at each age who were enrolled in different school grades of the New York public schools. The normative data for 7–9-year-olds were discarded because of floor effects, leaving a normative sample ranging from 10 to 59 years old.

Wechsler had already brought the scale to completion when he contracted with the Psychological Corporation to produce the test materials (Edwards, 1974). The examiner's manual was published as an appendix to his 1939 book. Thus, it appears that Wechsler's 7-year project of selecting subtests, creating items, supervising a standardization, and writing an introductory book and an examiner's manual was basically a one-man show. The only individual whom Wechsler (1939) acknowledged for making substantive contributions was Wells, "whose continued interest in the Scales and helpful suggestions were a source of both inspiration and encouragement" (p. vii).

AFTERMATH OF THE WECHSLER–BELLEVUE SCALE

In comparison with the Stanford–Binet scale and other contemporary intelligence tests, the advantages of the Wechsler–Bellevue scale were formidable. The Wechsler–Bellevue scale was based on familiar tests that psychologists accepted as valid. These tests were organized into verbal and performance scales that could be administered and interpreted separately. The use of deviation scores obviated statistical artifacts and provided a statistical basis for interpreting the subtest profile and verbal-performance discrepancy. The large standardization sample, spanning from childhood to adulthood, had been selected in a principled and precise manner. By incorporating these technical innovations into a single scale, Wechsler accomplished a major advance in the technology of individual intelligence testing. Published reviews of the scale drew attention to these technical advances, while Wechsler's borrowing of tests was treated as an advantage, if mentioned at all. For example, Lorge's (1943) review in the Journal of Consulting Psychology commented that Wechsler's chief contribution was "in the organization of well-known tests into a composite scale" (p. 167).

The combination of technical advances was probably more than sufficient for the Wechsler–Bellevue scale to become the dominant adult individual intelligence test. However, the impact of the scale was further magnified because it met the need created by the rapid growth of clinical psychology during the 1940s, particularly in adult psychiatry. As described by Matarazzo, "overnight this massive social need teamed up with the new developments in professional assessment and

psychotherapy which had been occurring in relative isolation among the country's few hundred full time practitioner-psychologists of the 1930s and early 1940s, and psychology in the United States found itself fully launched as a profession" (1972, p. 11). A measure of this impact is that Wechsler's book, containing the manual for the Wechsler–Bellevue scale, went through three editions between 1939 and 1944. The second edition (Wechsler, 1941) introduced a chapter describing how profiles of subtest scores could be used in differential diagnosis, a procedure that shaped the practice of test interpretation by later clinicians. The scale was selected for the standard psychological battery recommended by prominent psychologists at the Menninger Clinic (Rapaport, Schafer, & Gill, 1944). A 1946 survey of psychological testing practices (Louttit & Browne, 1947) found that the Wechsler–Bellevue scale was one of the most widely used psychological measures, second only to the 1937 Stanford–Binet revision. Research with the scale increased rapidly and soon comprised a significant part of research in clinical psychology.

The entry of the U.S.A. into World War II created the need for new individual intelligence tests for screening and assigning recruits. An alternate form of the Bellevue Intelligence Scale was constructed, termed the Wechsler Mental Ability Scale (Wechsler, 1946). The scale was published after the war as Form II of the Wechsler–Bellevue scale. The Form II manual reported that the new scale had been standardized on a sample of 18- to 40-year-old males, but data from this standardization were not presented. Instead, Wechsler (1946) instructed the clinician to obtain IQs and subtest scaled scores using the 1939 norms, by adding a correction constant to the Form II scores in order to equate the difficulty level between the two forms.

During the war, the Army Individual Test (Staff, Personnel Research Section, Classification and Replacement Branch, The Adjutant General's Office, 1944) was developed as an individually administered IQ test that would imitate the verbal-performance structure of the Wechsler–Bellevue scale but require less time to administer. The Trail Making Test was one of the "nonverbal" subtests of this scale. After the war, Leiter (1951) published most of the subtests of the Army Individual Test as the Leiter Adult Intelligence Scale (LAIS).

The Wechsler–Bellevue scale may have put an end to the development of separate performance intelligence tests. A comparison of testing surveys in 1935 and 1946 (Louttit & Browne, 1947) revealed a sharp decline in the usage of purely performance tests during that interval. Two of the last performance batteries were created by Halstead (1947) and by Goldstein and Scheerer (1941) for testing patients with brain disorders. Both batteries were comprised of performance tests that required little or no spoken response. The Goldstein–Scheerer Tests of Abstract and Concrete Thinking (Goldstein & Scheerer, 1941) included the Kohs Block Design test as a measure of "abstract attitude," a concept akin to executive function. In Halstead's battery, 3 of the 10 "neuropsychological indicators" comprising the Impairment Index were derived from the Tactual Performance Test. Halstead's adaptation of this form board followed the earlier procedure of administering three trials in order to assess learning. Administration of the Tactual Performance Test was probably based upon a procedure known as the 'tactual form,' used with subjects older than 10½ years, in which the subject performed the test blindfolded and was instructed after the final trial to "try to sketch the positions of the forms and their shapes" (Whipple, 1914, p. 301).

In 1949 a revision of Wechsler–Bellevue Form II was published as a children's scale, the Wechsler Intelligence Scale for Children (WISC; Wechsler, 1949). Wechsler's comment that the scale had been 5 years in preparation implies that the children's revision had begun in 1944 or 1945, before Wechsler–Bellevue Form II was published. The revision from Wechsler–Bellevue Form II to the WISC involved few major changes, so that the children's scale inherited almost all the Wechsler–Bellevue Form II items. In a clarification repeated in the manual of each WISC revision, Wechsler (1949) stated that "most of the items in the WISC are from Form II of the earlier scales, the main additions being new items at the easier end of each test" (p. 1). The WISC introduced the basic standardization procedure to be followed by later

Wechsler intelligence scales. The sample consisted of 100 male and 100 female subjects at each age from 5 to 15 years. The subjects, who were all white, were selected to represent the proportions of the U.S. population residing in four geographical regions, as well as the proportion of parents working in various occupations.

The Wechsler–Bellevue scale has undergone three revisions (Wechsler, 1955, 1981, 1997) culminating in the current Wechsler Adult Intelligence Scale-III (WAIS-III). These revisions have been restricted to adults but have provided improved norms, based on samples of the adult U.S. population that are considerably larger than the Wechsler–Bellevue sample and that are geographically as well as demographically representative. While successive revisions have raised the upper end of the age range and have more accurately sampled the U.S. population's racial-ethnic composition, these revisions have retained the original subtest structure and many of the original items. The only change in the subtest profile was made by the WAIS-III, which substituted an untimed Matrix Reasoning subtest in place of the Object Assembly subtest, whose statistical weaknesses were discussed by Wechsler (1939). A final edition of Wechsler's book about adult intelligence was published in 1958, probably to coincide with the 1955 revision. The last revision of the book, which Wechsler described as a "necessary supplementary text" (1955, p. iv), was published almost 30 years ago by Matarazzo (1972).

Several factor analyses have been conducted of the Wechsler subtests in order to validate their categorization into verbal and performance scales. The factors obtained in these studies, starting with Balinsky's (1941) study of the Wechsler–Bellevue standardization sample, have tended to reproduce the historical origins of the subtests. The verbal factor is defined by subtests derived from the Binet–Simon scale and Alpha examination, and the 'perceptual' factor by subtests derived from the Army Performance Scale and the Cornell–Coxe scale. The Digit Span and Digit Symbol subtests, which cannot be assigned to either factor, differ in their historical origins from the other subtests.

The WISC has undergone two revisions (Wechsler, 1974, 1991) which, like the adult revisions of the Wechsler–Bellevue scale, conserve the original subtests and many items of Wechsler–Bellevue Form II. Like the adult revisions, successive WISC revisions have been based on standardization samples that are larger and more racially and ethnically representative.

A Wechsler intelligence scale for preschool children was published in 1967, based on the same verbal-performance structure but with some of the subtests replaced by ones designed for younger children.

Wechsler died in 1981. The newest revisions of his intelligence scales (Wechsler, 1991, 1997), which were developed long after his death, continue to list him as sole author. Because the current revisions of the WISC and WAIS were published about 16 years after the preceding versions (Wechsler, 1974, 1981, 1991, 1997), the WISC-IV could be anticipated in 2007 and the WAIS-IV in 2013.

CONCLUSIONS

From a historical perspective, the Wechsler–Bellevue Intelligence Scale is a battery of intelligence tests developed between the 1880s and World War I. In their origins, the Wechsler subtests represent the major pre–World War I approaches to cognitive assessment. These approaches (and related subtests) include anthropometrics (Digit Span), association psychology (Digit Symbol), the Binet–Simon tests (Comprehension, Similarities, Vocabulary, Picture Completion), performance testing of immigrants to the U.S.A. (Object Assembly), and group testing in American schools and businesses (Arithmetic, Information). The Wechsler–Bellevue verbal subtests were adapted mostly from the Binet–Simon and Yerkes–Bridges scales and from the Alpha examination. The Wechsler–Bellevue performance subtests were taken from the various tests developed or popularized by Healy and Fernald, the Ellis Island physicians, Pintner and Paterson, Kohs, and Cornell and Coxe. These tests had been selected by the Army testing program for the purpose of rapid personnel screening of young, physically healthy men, and were mostly designed for testing groups rather than individuals. These Army tests were transformed directly into Wechsler–Bellevue subtests with the same titles and, in some cases, the same content.

Wechsler's recycling of old intelligence tests was similar to the construction of other intelligence scales of the period. His grouping of subtests into verbal and performance scales represented the two major contemporary approaches to individual intelligence testing. The verbal and performance scales were designed to serve as different measures of the same construct, that of general intelligence.

Wechsler's major contributions were not the subtests themselves but rather the technical innovations of deviation scores, combining verbal and performance tests into a single scale, and selecting the adult standardization sample by occupation. These are the features of the Wechsler–Bellevue scale that have been emulated by later individual intelligence tests. For various reasons, including statistical innovations and historical circumstances, the Wechsler intelligence scales have achieved almost monopoly status in the field of individual adult intelligence testing. The point scale model of intelligence testing, as adapted by the Wechsler–Bellevue scale, has replaced the year scale as the preferred model for children's intelligence tests, most of which have abandoned the IQ label for their composite scores.

The history of the Wechsler intelligence scales presents several paradoxes. First, it is remarkable that the rate of change in the technology of individual intelligence testing has been so slow given the unprecedented pace of scientific pro-

of an examiner individually testing a subject with an invariant sequence of items, is not fundamentally different from how the Binet–Simon scale was administered. Therefore, it is paradoxical that the Wechsler–Bellevue scale, which was a model of technical innovation in 1939, represents in its current revision one of the oldest mental tests in continuous use. The intelligence scale that is relied upon to make medical, educational, and legal decisions does not reflect advances in understanding of cognitive functioning during the past 60 years and contains tests from the 1800s.

While the revisions of the Wechsler–Bellevue scale have introduced few changes in its basic structure, clinical interpretation of Wechsler test results has changed dramatically. Nowadays clinicians are encouraged to interpret the subtests not as measures of general intelligence, but rather as separate measures of specific cognitive abilities. For example, Lezak (1995) described the Picture Arrangement subtest as a measure of both "socially appropriate thinking" and "sequential thinking," so that the subtest can serve "as a nonverbal counterpart of that aspect of Comprehension" (p. 639). To some extent this method of subtest interpretation is inspired by changes in Wechsler's own views. Wechsler's earlier view, expressed in his 1939 book, was that the important factor underlying most of the subtests was general intelligence and not special abilities. For example, Wechsler criticized the view that the Picture Arrangement subtest measured "social intelligence," arguing that he did "not believe in such an entity" because "social intelligence is
gress in the century since the 1905 BinetSimon just general intelligence applied to social situa-
scale. Over the century, advances in psycholog- tions'' (1939, pp. 90 91). However, his 1941
ical testing have been signicant albeit far less chapter introducing prole interpretation impli-
radical than in other technologies. Widespread citly assumed that subtests could be interpreted as
advances in psychological testing include the use measures of specic abilities, particularly when
of factor analysis to guide scale design, selection testing patients whose mental disorders were
of items using item response theory, evaluation of characterized by strengths or impairments in
test-taking motivation, and computerized test these abilities. He speculated that ``the good
administration. However, the only signicant score frequently obtained by the psychopath on
innovation incorporated into the Wechsler intel- the Picture Arrangement test'' might occur
ligence scales is the availability of factor scores because psychopaths ``generally have a grasp of
for the WAIS-III and WISC-III. The basic social situations'' (1941, p. 153), even though
administration procedure used by the Wechsler they tend to use this knowledge in an anti-social
scales and other individual intelligence tests, that way.
A third paradox is that the Wechsler children's intelligence scale, now the dominant intelligence scale administered to school-aged children, is a downward extension of tests designed for adults. Thus, the Stanford–Binet scale, which was designed for children and based on year-scale developmental norms, was replaced by an adaptation of an adult scale with neither a developmental rationale nor sensitivity to developmental stages. Revisions of the Wechsler children's scale have conserved the scale's basic structure without introducing major upgrades to improve its clinical utility.

While the 1940s clinicians probably recognized the Wechsler–Bellevue subtests as old tests with familiar origins, currently these subtests are used without recognition of the psychologists and physicians who created them. It is paradoxical that the creators of these subtests were not individuals of little note, but rather public figures, such as Terman, who are famed for other contributions. The most direct way to recognize the tests' creators would be to rename their tests in their honor. For example, the Digit Symbol subtest could be renamed the Wisconsin–Jastrow Substitution Test. Whether or not the tests are renamed, which is unlikely, test manuals and psychology textbooks should properly credit the persons who created the tests.

Finally, it is curious that the Wechsler scales have come to play so central a role in neuropsychological assessment, given that the measurement of cognitive deficit does not appear to have been a major consideration in Wechsler's subtest selection. Only in the case of the Digit Span subtest did Wechsler (1939) indicate that the subtest was selected to measure cognitive deficit, in this case as a measure of 'attention' in cognitively impaired persons. He commented that the Block Design and Digit Symbol subtests were especially sensitive to certain brain diseases, but not that these or other subtests were selected for that purpose. Because the neuropsychological understanding of intelligence tests would have been quite limited in 1939, it would have been difficult to foresee that the use of the Wechsler intelligence scales as measures of cognitive deficit would be successful. Indeed, it might have been predicted that intelligence testing of brain-disordered patients would be more misleading than helpful. Such a warning was given by the British psychiatrist Andrew Paterson, at the time a colleague of Oliver Zangwill at the Edinburgh Brain Injuries Unit in Scotland, in a 1944 address to the Royal Society of Medicine. Referring to the wartime context in which intelligence tests were introduced into psychiatry and neurology, Paterson stated:

On the outbreak of war there was a clamour for tests for intellectual impairment. The academic psychologists through no fault of their own were encouraged to produce tests for conditions of which they had little knowledge and no clinical experience. This is the very opposite of the clinical approach where close observation should lead to the formulation of a test. In no other sphere of clinical science are tests devised before the phenomena have been studied. Such tests devised a priori tie nature down to a certain pattern of breakdown and such an assumption has always hindered progress. It also leaves out of account the variety of ways in which interference with cerebral function may express itself in the field of performance. There is more than a danger that the stereotyping of modes of investigation will force us to think along those lines only, and to close our eyes to and cease investigation of the breakdown which the hard facts of clinical observation present. (1944, p. 559)

In hindsight, Paterson's warning can be seen as only partly true. On the one hand, the sensitivity of the verbal-performance discrepancy to brain disorders contributed important evidence pointing to brain-behavior relationships. In a discussion of his own wartime experiences, Zangwill (1945) concluded that performance tests "appear well adapted to assess the finer grades of constructive disability" and had proven "especially helpful in studying cases with lesions of the parieto-occipital region" (p. 249). On the other hand, years of inconclusive research on Wechsler subtest profiles, which have produced "no firm rules" (Spreen & Strauss, 1998), were probably caused by reliance on the Wechsler intelligence scales as the sole measures of cognition. By breaking away from what Zangwill (1945), echoing Paterson, called "premature stereotyping of methods" (p. 248), the
invention of new neuropsychological tests has helped reveal brain-behavior relationships and has produced a testing technology allowing clinical neuropsychologists to exploit these discoveries.

The history of the Wechsler intelligence scales leads to the question of what the scales have to offer neuropsychological assessment in the new century. New revisions of the Wechsler scales offer the benefits of updated norms and what Lezak (1995) termed the "hard won achievements of familiarity and experience," at the price of reinvesting in testing technology that is becoming obsolete. It is reasonable to anticipate that in the new century, emerging technologies using computerized administration will offer decisive advantages. Eventually, new tests based on these technologies will replace the individual intelligence test as we know it. Then it will be the job of these new tests to carry on the tradition of mental testing established by the Binet–Simon and Wechsler–Bellevue scales.

ACKNOWLEDGMENTS

An earlier version of this paper was presented at the 2000 meeting of the Midwest Neuropsychology Group in Madison, Wisconsin. I am grateful to David Baker, Bill Barr, Kevin Daley, Jeff Dosik, Janice Goldblum, Chris Grote, Kathy Hickey, Walter High, Sean Little, Kent Mercer, Mary Nowak, and Bernie Silver for their help. John Parascandola and John Wasserman shared historical materials. The Archives of the History of American Psychology and National Park Service gave permission to reproduce the figures.

REFERENCES

Arthur, G. (1930). A point scale of performance tests. New York: Commonwealth Fund.
Balinsky, B. (1941). An analysis of the mental factors of various age groups from nine to sixty. Genetic Psychology Monographs, 23, 191–234.
Binet, A. (1911/1916). New investigations upon the measure of the intellectual level among school children. In H.H. Goddard (Ed.), Development of intelligence in children (the Binet–Simon Scale) (E.S. Kite, Trans., pp. 274–328). Baltimore: Williams & Wilkins.
Binet, A., & Henri, V. (1895). La psychologie individuelle. L'Année Psychologique, 2, 411–465.
Binet, A., & Simon, T. (1905). Méthodes nouvelles pour le diagnostic du niveau intellectuel des anormaux. L'Année Psychologique, 11, 191–244.
Binet, A., & Simon, T. (1905/1916). New methods for the diagnosis of the intellectual level of subnormals. In H.H. Goddard (Ed.), Development of intelligence in children (the Binet–Simon Scale) (E.S. Kite, Trans., pp. 37–90). Baltimore: Williams & Wilkins.
Binet, A., & Simon, T. (1908). Le développement de l'intelligence chez les enfants. L'Année Psychologique, 14, 1–90.
Binet, A., & Simon, T. (1908/1916). The development of intelligence in the child. In H.H. Goddard (Ed.), Development of intelligence in children (the Binet–Simon Scale) (E.S. Kite, Trans., pp. 182–273). Baltimore: Williams & Wilkins.
Bonser, F.G. (1910). The reasoning ability of children of the fourth, fifth, and sixth school grades (Teachers College Contributions to Education, no. 37). New York: Teachers College, Columbia University.
Camara, W.J., Nathan, J.S., & Puente, A.E. (2000). Psychological test usage: Implications in professional psychology. Professional Psychology: Research and Practice, 31, 141–154.
Cattell, J.M. (1890). Mental tests and measurements. Mind, 15, 373–381.
Cornell, E.L., & Coxe, W.C. (1934). A performance ability scale. New York: World Book.
Dearborn, W.F. (1910). Experiments in learning. Journal of Educational Psychology, 1, 373–388.
Decroly, O. (1914). Épreuve nouvelle pour l'examen mental. L'Année Psychologique, 20, 140–159.
Edwards, A.J. (1974). Introduction. Selected papers of David Wechsler (pp. 3–29). New York: Academic Press.
Frank, G. (1983). The Wechsler enterprise: An assessment of the development, structure, and use of the Wechsler tests of intelligence. Oxford/New York: Pergamon Press.
Galton, F. (1885). On the Anthropometric Laboratory of the late International Health Exhibition. Journal of the Anthropological Institute of Great Britain and Ireland, 14, 205–221.
Galton, F. (1887). Supplementary notes on 'prehension' in idiots. Mind, 12, 79–82.
Goldstein, K., & Scheerer, M. (1941). Abstract and concrete behavior. An experimental study with special tests. Psychological Monographs, 53 (Whole No. 239).
Gould, S.J. (1981). The mismeasure of man. New York: Norton.
Halstead, W.C. (1947). Brain and intelligence: A quantitative study of the frontal lobes. Chicago: University of Chicago Press.
Healy, W. (1914). A pictorial completion test. Psychological Review, 7, 140–143.
Healy, W. (1921). Pictorial Completion Test II. Journal of Applied Psychology, 5, 225–239.
Healy, W., & Fernald, G.M. (1911). Tests for practical mental classification. Psychological Monographs, 13 (Whole no. 54).
Jacobs, J. (1887). Experiments on "prehension". Mind, 12, 75–79.
Kellogg, C.E., & Morton, N.W. (1935). Revised beta examination. New York: Psychological Corporation.
Kirkpatrick, E.A. (1909). Studies in development and learning. Archives of Psychology, 12, 1–101.
Knox, H.A. (1913). The moron and the study of alien defectives. Journal of the American Medical Association, 60, 105–106.
Knox, H.A. (1914a). A scale, based on the work at Ellis Island, for estimating mental defect. Journal of the American Medical Association, 62, 741–747.
Knox, H.A. (1914b). Mental defectives. New York Medical Journal, 99, 215–222.
Knox, H.A. (1915). Measuring human intelligence. A progressive series of standardized tests used by the Public Health Service to protect our racial stock. Scientific American, 112, 52–53, 57–58.
Kohs, S.C. (1923). Intelligence measurement: A psychological and statistical study based upon the block design tests. New York: Macmillan.
Leiter, R.G. (1936). The Leiter International Performance Scale. Honolulu: University of Hawaii Press.
Leiter, R.G. (1951). The Leiter Adult Intelligence Scale. Psychological Service Center Journal, 3.
Lezak, M.D. (1995). Neuropsychological assessment (3rd ed.). New York: Oxford University Press.
Lorge, I. (1943). The measurement of adult intelligence [review]. Journal of Consulting Psychology, 7, 167–168.
Louttit, C.M., & Browne, C.G. (1947). The use of psychometric instruments in psychological clinics. Journal of Consulting Psychology, 11, 49–54.
Matarazzo, J.D. (1972). Wechsler's measurement and appraisal of adult intelligence (5th ed.). New York: Oxford University Press.
Mullan, E.H. (1917). Mentality of the arriving immigrant (Public Health Bulletin 90). Washington, DC: Government Printing Office.
Paterson, A. (1944). Discussion on disorders of personality after head injury (Section of Neurology. May 4, 1944). Proceedings of the Royal Society of Medicine, 37, 556–561.
Peterson, J. (1925). Early conceptions and tests of intelligence. Yonkers-on-Hudson, NY: World Book.
Pichot, P. (1948). French pioneers in the field of mental deficiency. American Journal of Mental Deficiency, 53, 128–137.
Pichot, P. (1949). Les tests mentaux en psychiatrie, I. Instruments et méthodes. Paris: Presses Universitaires de France.
Pintner, R., & Hoops, H.A. (1918). A drawing completion test. Journal of Applied Psychology, 2, 164–173.
Pintner, R., & Paterson, D.G. (1917). A scale of performance tests. New York: Appleton.
Pyle, W.H. (1913). The examination of school children: A manual of directions and norms. New York: Macmillan.
Rapaport, D., Schafer, R., & Gill, M. (1944). Manual of diagnostic psychological testing: 1: Diagnostic testing of intelligence and concept formation. New York: Josiah Macy, Jr. Foundation.
Report of Committee of Clinical Section of American Psychological Association. (1935). Psychological Clinic, 23.
Samelson, F. (1987). Was early mental testing (a) Racist inspired, (b) Objective science, (c) A technology for democracy, (d) The origin of the multiple-choice exams, (e) None of the above? (Mark the RIGHT answer). In M.M. Sokal (Ed.), Psychological testing and American society, 1890–1930 (pp. 113–127). New Brunswick, NJ: Rutgers University Press.
Sokal, M.M. (1981). The origins of the Psychological Corporation. Journal of the History of the Behavioral Sciences, 17, 54–67.
Spreen, O., & Strauss, E. (1998). A compendium of neuropsychological tests. Administration, norms, and commentary (2nd ed.). New York: Oxford University Press.
Staff, Personnel Research Section, Classification and Replacement Branch, The Adjutant General's Office. (1944). The new Army Individual Test of general mental ability. Psychological Bulletin, 41, 532–538.
Starch, D. (1911). Experiments in educational psychology. New York: Macmillan.
Stutsman, R. (1931). Mental measurement of preschool children with a guide for the administration of the Merrill-Palmer Scale of Mental Tests. New York: World Book.
Terman, L.M. (1916). The measurement of intelligence: An explanation of and a complete guide for the use of the Stanford Revision and Extension of the Binet–Simon Intelligence Scale. Boston: Houghton Mifflin.
Terman, L.M., & Merrill, M.A. (1937). Measuring intelligence: A guide to the administration of the new revised Stanford–Binet tests of intelligence. Boston: Houghton Mifflin.
Thorndike, R.M., & Lohman, D.F. (1990). A century of ability testing. Chicago: Riverside.
U.S. Public Health Service. (1977). Interviews with physicians stationed at Ellis Island in the 1910s and 1920s, by Elizabeth Yew, 1977–1978. Bethesda, MD: National Library of Medicine (unpublished transcript of interview with Grover A. Kempf, September 11, 1977).
von Mayrhauser, R.T. (1987). The manager, the medic, and the mediator: The clash of professional psychological styles and the wartime origins of group mental testing. In M.M. Sokal (Ed.), Psychological testing and American society, 1890–1930 (pp. 128–157). New Brunswick: Rutgers University Press.
Wechsler, D. (1925). The measurement of emotional reactions: Researches on the psychogalvanic reflex. Archives of Psychology, 76.
Wechsler, D. (1932). Analytic use of the Army Alpha examination. Journal of Applied Psychology, 16, 254–256.
Wechsler, D. (1935). The concept of mental deficiency in theory and practice. Psychiatric Quarterly, 9, 232–236.
Wechsler, D. (1939). The measurement of adult intelligence. Baltimore: Williams & Wilkins.
Wechsler, D. (1941). The measurement of adult intelligence (2nd ed.). Baltimore: Williams & Wilkins.
Wechsler, D. (1945). A standardized memory scale for clinical use. Journal of Psychology, 19, 87–95.
Wechsler, D. (1946). The Wechsler–Bellevue Intelligence Scale. Form II. Manual for administering and scoring the test. New York: Psychological Corporation.
Wechsler, D. (1949). Wechsler Intelligence Scale for Children. Manual. New York: Psychological Corporation.
Wechsler, D. (1955). Manual for the Wechsler Adult Intelligence Scale. New York: Psychological Corporation.
Wechsler, D. (1958). The measurement and appraisal of adult intelligence (4th ed.). Baltimore: Williams & Wilkins.
Wechsler, D. (1967). Wechsler Preschool and Primary Scale of Intelligence. New York: Psychological Corporation.
Wechsler, D. (1974). Wechsler Intelligence Scale for Children–Revised. Manual. New York: Psychological Corporation.
Wechsler, D. (1979, September). The psychometric tradition: Developing the Wechsler Adult Intelligence Scale. Paper presented at the 87th Annual Meeting of the American Psychological Association, New York.
Wechsler, D. (1981). Wechsler Adult Intelligence Scale–Revised. Manual. New York: Psychological Corporation.
Wechsler, D. (1991). WISC-III. Wechsler intelligence scale for children. Manual. San Antonio: Psychological Corporation.
Wechsler, D. (1997). Wechsler Adult Intelligence Scale–Third Edition. Administration and scoring manual. San Antonio: Psychological Corporation.
Wells, F.L. (1927). Mental tests in clinical practice. Yonkers, NY: World Book.
Wells, F.L. (1932). Army Alpha–revised. Personnel Journal, 10, 411–417.
Whipple, G.M. (1910). Manual of mental and physical tests. A book of directions compiled with special reference to the experimental study of school children in the laboratory or classroom. Baltimore: Warwick & York.
Whipple, G.M. (1914). Manual of mental and physical tests. A book of directions compiled with special reference to the experimental study of school children in the laboratory or classroom. Part I: Simple processes (2nd ed.). Baltimore: Warwick & York.
Woodworth, R.S., & Wells, F.L. (1911). Association tests. Being a part of the Report of the Committee of the American Psychological Association on the Standardization of Procedure in Experimental Tests. Psychological Monographs, 13 (Whole no. 57).
Yerkes, R.M. (Ed.) (1921). Psychological examining in the United States Army. Memoirs of the National Academy of Sciences, 15 (Parts 1–3), Washington DC: Government Printing Office.
Yerkes, R.M., Bridges, J.W., & Hardwick, R.S. (1915). A point scale for measuring mental ability. Baltimore: Warwick & York.
Yew, E. (1980). Medical inspection of immigrants at Ellis Island, 1891–1924. Bulletin of the New York Academy of Medicine, 56, 488–510.
Yoakum, C.A., & Yerkes, R.M. (1920). Army mental tests. New York: Holt.
Zangwill, O.L. (1945). A review of psychological work at the Brain Injuries Unit, Edinburgh, 1941–1945. British Medical Journal, 2, 248–250.
Zenderland, L. (1998). Measuring minds: Henry Herbert Goddard and the origins of American intelligence testing. New York: Cambridge University Press.
