
Purpose of Standardized Tests

Our goal at ETS and the purpose of standardized tests are the same: to provide fair, valid and
reliable assessments that produce meaningful results. Standardized testing, if done carefully and
with a high degree of quality assurance, can eliminate bias and prevent unfair advantages by
testing the same or similar information under the same testing conditions.

Standardized tests allow the comparison of test takers from different areas of the state, the
country and the world. What a test can do depends on whether it is well-designed for a particular
purpose. Well-designed tests can provide results that can be used in a variety of meaningful
ways, such as:

Licensure or Certification: Verify whether someone has the necessary knowledge and skills to be a
qualified practitioner or to be given advanced standing in an occupation or profession

Admissions: Inform decisions about which people should be selected for entrance to an
educational institution

Placement: Determine which courses or level of a course a student should take

Employment: Inform decisions on the hiring, placement and promotion of potential and
current employees

Curriculum-based End-of-Course Testing: Determine whether students have mastered the
objectives of the course taken

Exit Testing: Find out whether students have learned the amount necessary to graduate
from a level of education

Policy Tools: Provide data to policymakers that helps them make decisions regarding
funding, class size, curriculum adjustments, teacher development and more

Course Credit: Indicate whether a student should receive credit for a course he or she
didn't take, through demonstration of course content knowledge

Accountability: Hold various levels of the education system responsible for test results that
indicate whether students have learned what they should have learned

How Tests and Test Questions are Developed


ETS develops assessments that are of the highest quality, accurately measure the necessary
knowledge and skills, and are fair to all test takers. We understand that creating a fair, valid and
reliable test is a complex process that involves multiple checks and balances.

That's why dozens of professionals, including test specialists, test reviewers, editors, teachers
and specialists in the subject or skill being tested, are involved in developing every test
question, or "test item." And it's why all questions (or "items") are put through multiple, rigorous
reviews and must meet the highest standards for quality and fairness in the testing industry.


To help you further understand our process, here's an overview of the key steps ETS takes when
developing a new test.

Step 1: Defining Objectives


Educators, licensing boards or professional associations identify a need to measure certain skills
or knowledge. Once a decision is made to develop a test to accommodate this need, test
developers ask some fundamental questions:

Who will take the test and for what purpose?

What skills and/or areas of knowledge should be tested?

How should test takers be able to use their knowledge?

What kinds of questions should be included? How many of each kind?

How long should the test be?

How difficult should the test be?

Step 2: Item Development Committees


The questions in Step 1 are usually answered with the help of item development
committees, which typically consist of educators and/or other professionals appointed by ETS
with the guidance of the sponsoring agency or association. Responsibilities of these item
development committees may include:

defining test objectives and specifications


helping ensure test questions are unbiased

determining test format (e.g., multiple-choice, essay, constructed-response)

considering supplemental test materials

reviewing test questions, or test items, written by ETS staff

writing test questions

Step 3: Writing and Reviewing Questions


Each test question, whether written by ETS staff or by item development committees, undergoes
numerous reviews and revisions to ensure that it is as clear as possible, has only one correct
answer among the options provided on the test, and conforms to the style rules used
throughout the test. Scoring guides for open-ended responses, such as short written answers,
essays and oral responses, go through similar reviews.

Step 4: The Pretest


After the questions have been written and reviewed, many are pretested with a sample group
similar to the population to be tested. The results enable test developers to determine:

the difficulty of each question

if questions are ambiguous or misleading

if questions should be revised or eliminated

if incorrect alternative answers should be revised or replaced
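The pretest statistics above can be sketched as a classical item analysis. The following is a minimal, hypothetical illustration (the response data, the `item_analysis` helper and the flagging thresholds are invented for this example, not ETS's operational values):

```python
# Classical item analysis on hypothetical pretest data. "Difficulty" here is
# the p-value: the proportion of test takers answering the item correctly.

def item_analysis(responses, key):
    """responses: one answer string per test taker; key: the correct answers."""
    n_takers = len(responses)
    report = []
    for i in range(len(key)):
        answers = [r[i] for r in responses]
        p = answers.count(key[i]) / n_takers          # item difficulty (p-value)
        # Count how often each incorrect alternative was chosen.
        distractors = {a: answers.count(a) for a in set(answers) if a != key[i]}
        flag = p < 0.2 or p > 0.9                     # too hard or too easy
        report.append({"item": i, "difficulty": round(p, 2),
                       "distractors": distractors, "flag": flag})
    return report

takers = ["ABCD", "ABDD", "ABCA", "CBCD", "ABCD"]     # 5 test takers, 4 items
for row in item_analysis(takers, "ABCD"):
    print(row)
```

A distractor that is never chosen, or an item nearly everyone answers correctly, would be a candidate for revision or elimination under the criteria listed above.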

Step 5: Detecting and Removing Unfair Questions


To meet the stringent guidelines of the ETS Standards for Quality and Fairness, trained
reviewers must carefully inspect each individual test question, the test as a whole and any
descriptive or preparatory materials to ensure that language, symbols, words, phrases and content
generally regarded as sexist, racist or otherwise inappropriate or offensive to any subgroup of the
test-taking population are eliminated.

Through a process called differential item functioning (DIF) analysis, ETS statisticians can also
identify questions on which two groups of test takers who have demonstrated similar knowledge
or skills perform differently. If one group consistently performs better than another on a
particular question, that question receives additional scrutiny and may be deemed biased or
unsatisfactory. Note: If people in different groups actually differ in their average levels of
relevant knowledge or skills, a fair test question will reflect those differences.
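One common DIF statistic is the Mantel-Haenszel odds ratio, which compares two groups after matching them on overall ability. The sketch below is illustrative only: the counts and the flagging thresholds are hypothetical, and ETS's operational procedure is more elaborate than this.

```python
# Mantel-Haenszel DIF screening sketch. Test takers are grouped into strata
# by total score, so the comparison is between groups of similar ability.

def mh_odds_ratio(strata):
    """strata: one 2x2 table per score band, as a tuple
    (ref_correct, ref_wrong, focal_correct, focal_wrong)."""
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * d / n       # reference-correct x focal-wrong
        den += b * c / n       # reference-wrong x focal-correct
    return num / den           # near 1.0 means no DIF after matching

# Hypothetical counts for one item across three total-score bands:
strata = [(30, 10, 25, 15), (40, 5, 35, 10), (45, 2, 44, 3)]
ratio = mh_odds_ratio(strata)
if ratio > 2.0 or ratio < 0.5:     # hypothetical flagging thresholds
    print(f"Item flagged for review (MH odds ratio = {ratio:.2f})")
else:
    print(f"No strong DIF signal (MH odds ratio = {ratio:.2f})")
```

A ratio far from 1.0 says that, even among test takers of comparable overall ability, one group finds the item harder, which is exactly the signal that triggers the additional scrutiny described above.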

Step 6: Assembling the Test


After the test is assembled, it is reviewed by other specialists, committee members and
sometimes other outside experts. Each reviewer answers all questions independently and submits
a list of correct answers to the test developers. The lists are compared with the ETS answer keys
to verify that the intended answer is, indeed, the correct answer. Any discrepancies are resolved
before the test is published.
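The independent key check in Step 6 is essentially a comparison of each reviewer's answer list against the intended key. A minimal sketch, with hypothetical reviewers and answers (this is not an ETS tool):

```python
# Each reviewer answers the assembled test independently; any disagreement
# with the intended answer key is flagged for resolution before publication.

intended_key = {1: "B", 2: "D", 3: "A", 4: "C"}
reviewer_keys = {
    "Reviewer 1": {1: "B", 2: "D", 3: "A", 4: "C"},
    "Reviewer 2": {1: "B", 2: "C", 3: "A", 4: "C"},   # disagrees on item 2
}

discrepancies = [
    (name, item, answer, intended_key[item])
    for name, answers in reviewer_keys.items()
    for item, answer in answers.items()
    if answer != intended_key[item]
]
for name, item, got, expected in discrepancies:
    print(f"{name} answered {got} on item {item}; the key says {expected}")
```

A discrepancy does not automatically mean the key is wrong; it means the item is ambiguous enough that qualified reviewers disagree, and that must be resolved before the test is published.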

Step 7: Making Sure the Test Questions are Functioning Properly, Even After the Test is Administered


Even after the test has been administered, statisticians and test developers review the results to make sure
that test questions are working as intended. Before final scoring takes place, each question
undergoes preliminary statistical analysis and results are reviewed question by question. If a
problem is detected, such as the identification of a misleading answer to a question, corrective
action, such as not scoring the question, is taken before final scoring and score reporting takes
place.

Tests are also reviewed for reliability. Performance on one version of the test should reasonably
predict performance on any other version of the test. If reliability is high, results will be similar
no matter which version a test taker completes.


MEASUREMENT AND EVALUATION: CRITERION- VERSUS NORM-REFERENCED TESTING

Source: Huitt, W. (1996). Measurement and evaluation: Criterion- versus norm-referenced testing. Educational
Psychology Interactive. Valdosta, GA: Valdosta State University. Retrieved [date], from
http://www.edpsycinteractive.org/topics/measeval/crnmref.html


Many educators and members of the public fail to grasp the distinctions between criterion-
referenced and norm-referenced testing. It is common to hear the two types of testing referred to
as if they serve the same purposes or share the same characteristics. Much confusion can be
eliminated if the basic differences are understood.

The following is adapted from: Popham, J. W. (1975). Educational evaluation. Englewood
Cliffs, New Jersey: Prentice-Hall, Inc.

Purpose

Criterion-referenced tests: To determine whether each student has achieved specific skills or
concepts, and to find out how much students know before instruction begins and after it has
finished.

Norm-referenced tests: To rank each student with respect to the achievement of others in broad
areas of knowledge, and to discriminate between high and low achievers.

Content

Criterion-referenced tests: Measure specific skills which make up a designated curriculum.
These skills are identified by teachers and curriculum experts, and each skill is expressed as an
instructional objective.

Norm-referenced tests: Measure broad skill areas sampled from a variety of textbooks, syllabi,
and the judgments of curriculum experts.

Item Characteristics

Criterion-referenced tests: Each skill is tested by at least four items in order to obtain an
adequate sample of student performance and to minimize the effect of guessing. The items which
test any given skill are parallel in difficulty.

Norm-referenced tests: Each skill is usually tested by fewer than four items. Items vary in
difficulty, and items are selected that discriminate between high and low achievers.

Score Interpretation

Criterion-referenced tests: Each individual is compared with a preset standard for acceptable
achievement; the performance of other examinees is irrelevant. A student's score is usually
expressed as a percentage, and student achievement is reported for individual skills.

Norm-referenced tests: Each individual is compared with other examinees and assigned a
score, usually expressed as a percentile, a grade-equivalent score, or a stanine. Student
achievement is reported for broad skill areas, although some norm-referenced tests do report
student achievement for individual skills.
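The contrast in score interpretation can be made concrete with a short sketch. The scores and the mastery standard below are hypothetical, chosen only to illustrate the two interpretations:

```python
# A criterion-referenced score compares a student to a preset standard;
# a norm-referenced score ranks the student against a comparison group.

def percent_score(correct, total):
    """Criterion-referenced: percentage correct, judged against a fixed standard."""
    return 100 * correct / total

def percentile_rank(score, group_scores):
    """Norm-referenced: percent of the comparison group scoring below this student."""
    below = sum(1 for s in group_scores if s < score)
    return 100 * below / len(group_scores)

group = [55, 60, 62, 70, 71, 75, 80, 82, 90, 95]
print(percent_score(42, 50))        # 84.0 -- meets a hypothetical 80% mastery standard
print(percentile_rank(82, group))   # 70.0 -- scored above 70% of the group
```

Note that the percentage depends only on the student and the standard, while the percentile would change if the comparison group changed, even with the student's raw score held fixed.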

The differences outlined above are discussed in many texts on testing. The teacher or administrator
who wishes to acquire a more technical knowledge of criterion-referenced tests or their
norm-referenced counterparts may find the text from which this material was adapted particularly
helpful.

Additional resources:

Bond, L. (1996). Norm- and criterion-referenced testing. Practical Assessment, Research &
Evaluation, 5(2). Retrieved September 2002, from http://ericae.net/pare/getvn.asp?v=5&n=2

Linn, R. (2000). Assessments and accountability. ER Online, 29(2), 4-14. Retrieved September
2002, from http://www.aera.net/pubs/er/arts/29-02/linn01.htm

Sanders, W., & Horn, S. (1995). Educational assessment reassessed: The usefulness of
standardized and alternative measures of student achievement as indicators for the assessment of
educational outcomes. Education Policy Analysis Archives, 3(6). Retrieved September 2002, from
http://olam.ed.asu.edu/epaa/v3n6.html
