
BHAVAN’S INSTITUTE OF MANAGEMENT

MYSORE

SUBJECT: HRM

ASSIGNMENT ON: VALIDITY AND RELIABILITY

SUBMITTED TO: PROF. ROHINI G. SHETTY
HR DEPARTMENT
BHAVAN’S INSTITUTE OF
MANAGEMENT
MYSORE

SUBMITTED BY: SRINIVAS.D
IV {TRI-SEM}, P.G.PM
BHAVAN’S INSTITUTE OF
MANAGEMENT
MYSORE

DATE: 14-10-2008
DAY: TUESDAY

VALIDITY

Introduction

Validity is arguably the most important criterion for the
quality of a test. The term validity refers to whether or not
the test measures what it claims to measure. On a test with
high validity the items will be closely linked to the test’s
intended focus. For many certification and licensure tests
this means that the items will be highly related to a specific
job or occupation. If a test has poor validity then it does not
measure the job-related content and competencies it ought
to. When this is the case, there is no justification for using
the test results for their intended purpose. There are several
ways to estimate the validity of a test, including content
validity, concurrent validity, and predictive validity. The face
validity of a test is sometimes also mentioned.

TYPES OF VALIDITY

Content Validity

While there are several types of validity, the most important
type for most certification and licensure programs is
probably that of content validity. Content validity is a logical
process where connections between the test items and the
job-related tasks are established. If a thorough test
development process was followed, a job analysis was
properly conducted, an appropriate set of test specifications
were developed, and item writing guidelines were carefully
followed, then the content validity of the test is likely to be
very high. Content validity is typically estimated by
gathering a group of subject matter experts (SMEs) together
to review the test items. Specifically, these SMEs are
given the list of content areas specified in the test blueprint,
along with the test items intended to be based on each
content area. The SMEs are then asked to indicate whether
or not they agree that each item is appropriately matched to
the content area indicated. Any items that the
SMEs identify as being inadequately matched to the test
blueprint, or flawed in any other way, are either revised or
dropped from the test.
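
A simple way to summarize such a review is sketched below: for each item, it tallies the proportion of SMEs who judged the item matched to its blueprint content area and flags items that fall below a cutoff. The votes, the 0.75 cutoff, and the helper name are illustrative assumptions rather than a standard procedure; Lawshe's content validity ratio is one well-known formal index built on the same idea.

```python
def sme_agreement(ratings):
    """Proportion of SMEs who judged the item matched to its
    blueprint content area (ratings are True/False votes)."""
    return sum(ratings) / len(ratings)

# Each key is a test item; each value holds one True/False vote per SME.
review = {
    "item_01": [True, True, True, True, True],
    "item_02": [True, False, True, True, False],
    "item_03": [False, False, True, False, False],
}

THRESHOLD = 0.75  # assumed cutoff for an acceptable item-blueprint match

for item, votes in review.items():
    p = sme_agreement(votes)
    status = "keep" if p >= THRESHOLD else "revise or drop"
    print(f"{item}: agreement={p:.2f} -> {status}")
```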

Concurrent Validity

Another important method for investigating the validity of a
test is concurrent validity. Concurrent validity is a statistical
method using correlation, rather than a logical method.
Examinees who are known to be either masters or non-
masters on the content measured by the test are identified,
and the test is administered to them under realistic exam
conditions. Once the tests have been scored, the
relationship is estimated between the examinees’ known
status as either masters or non-masters and their
classification as masters or non-masters (i.e., pass or fail)
based on the test. This type of validity provides evidence
that the test is classifying examinees correctly. The stronger
the correlation is, the greater the concurrent validity of the
test is.
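
As a rough illustration of the correlation involved, the sketch below codes both the examinees' known status and their test classification as 1 (master/pass) or 0 (non-master/fail) and correlates the two; with two binary variables, Pearson's r is the phi coefficient. All of the data are invented for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation; on two 0/1 variables this is the phi coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

known_status  = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]  # established masters (1) / non-masters (0)
test_decision = [1, 1, 0, 0, 0, 1, 0, 1, 1, 0]  # pass (1) / fail (0) on the test

print(f"concurrent validity (phi) = {pearson_r(known_status, test_decision):.2f}")  # 0.60
```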

Predictive Validity

Another statistical approach to validity is predictive validity.
This approach is similar to concurrent validity, in that it
measures the relationship between examinees'
performances on the test and their actual status as masters
or non-masters. However, with predictive validity, it is the
relationship of test scores to an examinee's future
performance as a master or non-master that is estimated. In
other words, predictive validity considers the question,
"How well does the test predict examinees' future status as
masters or non-masters?" For this type of validity, the
correlation that is computed is between the examinees'
classifications as master or non-master based on the test
and their later performance, perhaps on the job. This type of
validity is especially useful for test purposes such as
selection or admissions.
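
The computation is the same in spirit; only the timing of the second variable changes. The sketch below correlates pass/fail decisions made at test time with later on-the-job performance ratings. The rating scale and all the numbers are invented for illustration.

```python
from statistics import correlation  # Pearson r, available in Python 3.10+

test_decision = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]  # pass (1) / fail (0) at test time
job_rating    = [4.2, 3.8, 2.1, 2.9, 4.5, 2.4, 3.6, 4.0, 3.1, 2.0]  # later supervisor ratings

print(f"predictive validity = {correlation(test_decision, job_rating):.2f}")
```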

Face Validity

One additional type of validity that you may hear mentioned
is face validity. Like content validity, face validity is
determined by a review of the items and not through the use
of statistical analyses. Unlike content validity, face validity
is not investigated through formal procedures and is not
determined by subject matter experts. Instead, anyone who
looks over the test, including examinees and other
stakeholders, may develop an informal opinion as to
whether or not the test is measuring what it is supposed to
measure. While it is clearly of some value to have the test
appear to be valid, face validity alone is insufficient for
establishing that the test is measuring what it claims to
measure. A well-developed exam program will include formal
studies into other, more substantive types of validity.

Summary

The validity of a test is critical because, without sufficient
validity, test scores have no meaning. The evidence you
collect and document about the validity of your test is also
your best legal defense should the exam program ever be
challenged in a court of law. While there are several ways to
estimate validity, for many certification and licensure exam
programs the most important type of validity to establish is
content validity.

RELIABILITY

Introduction

Reliability is one of the most important elements of test
quality. It has to do with the consistency, or reproducibility,
of an examinee's performance on the test. For example, if
you were to administer a test with high reliability to an
examinee on two occasions, you would be very likely to
reach the same conclusions about the examinee's
performance both times. A test with poor reliability, on the
other hand, might result in very different scores for the
examinee across the two test administrations. If a test
yields inconsistent scores, it may be unethical to take any
substantive actions on the basis of the test. There are
several methods for computing test reliability, including
test-retest reliability, parallel forms reliability, decision
consistency, internal consistency, and interrater reliability.
For many criterion-referenced tests decision consistency is
often an appropriate choice.

Types of Reliability

Test-Retest Reliability

To estimate test-retest reliability, you must administer a test
form to a single group of examinees on two separate
occasions. Typically, the two separate administrations are
only a few days or a few weeks apart; the time should be
short enough so that the examinees' skills in the area being
assessed have not changed through additional learning. The
relationship between the examinees' scores from the two
different administrations is estimated, through statistical
correlation, to determine how similar the scores are. This
type of reliability demonstrates the extent to which a test is
able to produce stable, consistent scores across time.
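
A minimal sketch of that computation, assuming each examinee's score from both sittings is available (scores invented for illustration):

```python
from statistics import correlation  # Pearson r, available in Python 3.10+

first_sitting  = [78, 85, 62, 90, 71, 66, 88, 74]  # scores, administration 1
second_sitting = [80, 83, 65, 92, 69, 70, 85, 76]  # same examinees, a few weeks later

print(f"test-retest reliability = {correlation(first_sitting, second_sitting):.2f}")
```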

Parallel Forms Reliability

Many exam programs develop multiple, parallel forms of an
exam to help provide test security. These parallel forms are
all constructed to match the test blueprint and to be similar
in average item difficulty. Parallel forms reliability is estimated by
administering both forms of the exam to the same group of
examinees. While the time between the two test
administrations should be short, it does need to be long
enough so that examinees' scores are not affected by
fatigue. The examinees' scores on the two test forms are
correlated in order to determine how similarly the two test
forms function. This reliability estimate is a measure of how
consistent examinees’ scores can be expected to be across
test forms.
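
The arithmetic mirrors the test-retest sketch above; only the meaning of the two score lists changes. Scores are again invented for illustration.

```python
from statistics import correlation  # Pearson r, available in Python 3.10+

form_a = [71, 88, 64, 92, 75, 69]  # each examinee's score on Form A
form_b = [74, 85, 61, 90, 78, 72]  # the same examinees' scores on Form B

print(f"parallel forms reliability = {correlation(form_a, form_b):.2f}")
```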

Decision Consistency

In the descriptions of test-retest and parallel forms
reliability given above, the consistency or dependability of
the test scores was emphasized. For many criterion-
referenced tests (CRTs), a more useful way to think about
reliability may be in terms of examinees’ classifications. For
example, a typical CRT will result in an examinee being
classified as either a master or non-master; the examinee
will either pass or fail the test. It is the reliability of this
classification decision that is estimated in decision
consistency reliability. If an examinee is classified as a
master on both test administrations, or as a non-master on
both occasions, the test is producing consistent decisions.
This approach can be used either with parallel forms or with
a single form administered twice in test-retest fashion.
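
One simple index is the proportion of examinees who receive the same classification on both occasions, sketched below with invented pass/fail data (1 = master, 0 = non-master).

```python
first_decision  = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]  # classification, occasion 1
second_decision = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]  # classification, occasion 2

consistent = sum(a == b for a, b in zip(first_decision, second_decision))
p0 = consistent / len(first_decision)  # proportion of consistent decisions
print(f"decision consistency = {p0:.2f}")  # 8 of 10 agree -> 0.80
```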

Internal Consistency

The internal consistency measure of reliability is frequently
used for norm-referenced tests (NRTs). This method has the
advantage that it can be computed from a single test form
given at a single administration. The internal consistency
method estimates how well the set of items on a test
correlate with one another; that is, how similar the items on
a test form are to one another. Many test analysis software
programs produce this reliability estimate automatically.
However, two common differences between NRTs and CRTs
make this method of reliability estimation less useful for
CRTs. First, because CRTs are typically designed to have a
much narrower range of item difficulty and examinee
scores, the value of the reliability estimate will tend to be
lower. Additionally, CRTs are often designed to measure a
broader range of content; this results in a set of items that
are not necessarily closely related to each other. This aspect
of CRT test design will also produce a lower reliability
estimate than would be seen on a typical NRT.
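
Cronbach's alpha is the most common such estimate. The sketch below computes it from a small invented matrix of 0/1 item scores (rows are examinees, columns are items); a real program would typically rely on its analysis software's built-in estimate.

```python
from statistics import pvariance

scores = [            # rows: examinees, columns: items scored 1 (right) / 0 (wrong)
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [0, 1, 0, 1, 0],
]

k = len(scores[0])                                     # number of items
item_vars = [pvariance(col) for col in zip(*scores)]   # variance of each item
total_var = pvariance([sum(row) for row in scores])    # variance of total scores

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```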

Interrater Reliability

All of the methods for estimating reliability discussed thus
far are intended to be used for objective tests. When a test
includes performance tasks, or other items that need to be
scored by human raters, then the reliability of those raters
must be estimated. This reliability method asks the
question, "If multiple raters scored a single examinee's
performance, would the examinee receive the same score?"
Interrater reliability provides a measure of the dependability
or consistency of scores that might be expected across
raters.
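
Cohen's kappa is one widely used index for two raters because it corrects raw agreement for the agreement expected by chance. The sketch below applies it to invented pass/fail ratings (1 = pass, 0 = fail).

```python
rater_a = [1, 1, 0, 1, 0, 1, 0, 0, 1, 1]  # rater A's pass/fail scores
rater_b = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]  # rater B's scores for the same performances

n = len(rater_a)
p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # observed agreement

# chance agreement, from each rater's marginal pass rate
pa, pb = sum(rater_a) / n, sum(rater_b) / n
p_e = pa * pb + (1 - pa) * (1 - pb)

kappa = (p_o - p_e) / (1 - p_e)
print(f"interrater reliability (Cohen's kappa) = {kappa:.2f}")  # 0.58 here
```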

Summary

Test reliability is the aspect of test quality concerned with
whether or not a test produces consistent results. While
there are several methods for estimating test reliability, for
objective CRTs the most useful types are probably test-
retest reliability, parallel forms reliability, and decision
consistency. A type of reliability that is more useful for NRTs
is internal consistency.
For performance-based tests, and other tests that use
human raters, interrater reliability is likely to be the most
appropriate method.

