
EDUCATIONAL MEASUREMENT & EVALUATION

MEASUREMENT AND EVALUATION

- Concept Of Measurement & Evaluation


- Difference between Measurement and Evaluation
- Nature, Scope, need, types and limitations of Educational Measurement
- Nature, Scope, need, types and limitations of Educational Evaluation
- Concept of true and error sources.

CONCEPT OF MEASUREMENT & EVALUATION


MEASUREMENT
Epigraph:

“A good picture of the pupil is the fundamental picture of the teacher. Every
measurement is the knowledge of an objectively existing relation between the objects and
phenomena of objective reality.” ---- Karel Berka.

DEVELOPMENT OF MEASUREMENT:

Etymological: The word measurement comes from the word measure, related to the Greek metria,
meaning "a measuring". To know an object or a person really means to be able to describe him
accurately and comprehensively. But any description of an object, phenomenon or person is
selective. Being multi-dimensional, a person or phenomenon as a whole cannot be measured; we can,
however, measure the characteristics of a person or phenomenon. Evaluation is a formal as well as
informal and continuous process of knowing objects and persons.

The development of measurement can be classified into the following phases:


Periodical Development:

a) The Beginning of psychological Measurement

Psychological measurement was originally a part of philosophy; it began with attempts to measure the
mind or soul. After 1850 psychology moved towards quantitative measurement in terms of the amount of
forgetting and the level of intelligence. By 1900 psychology had extended its measurement techniques in all
directions. Early attempts to measure human behavior through experiments were ridiculed, but the growth
of experimentation, Darwinian demonstration, and the clinical adjustment of individuals laid the basic
foundations of psychological measurement.
b) Development in the nineteenth century
Galton was the first man to undertake systematic and statistical investigation of individual
differences. He was preceded by Weber and by Alexander Bain, whose "The Senses and the Intellect" and
"The Emotions and the Will" led to the development of a number of scientific tools for measurement. The
first psychological laboratory was set up in 1879 at the University of Leipzig. Later, Karl Pearson developed
the product-moment method for computing the correlation coefficient, and Binet
developed the different intelligence scales.

c) Development in the twentieth century

This period started with the exploration and standardization of early tests and methods:
spelling tests, arithmetic tests, language tests and the Thorndike handwriting scale, along with group
tests of intelligence. Tests were then standardized and multiplied; data sheets, questionnaires and
inventories came into being. Large-scale batteries for educational and personnel use were developed as
tools. The Binet-Simon Intelligence Scale was one such scale. The Army Alpha and Beta tests and others
gave rise to a new methodology known as psychometric theory.

Topical Development:

a) Beginning of experimental Psychology


b) Early study of individual Differences
c) Clinical Study of Deviates
d) Early Educational Measurement.

Contributions of great psychologists

a) Contribution of Galton
b) Contribution of Alfred Binet

Development of various types of tests

Individual tests, performance tests, aptitude tests, group tests, multifactor tests, personality tests, test
batteries, rating scales, inventories, etc.

Thus, scientific interest and method spread from the biological sciences to developments in
behavioral science; the rate of learning and the time taken for mental tasks were measured and
interpreted in terms of statistical design and techniques. Analyzing and describing individual
differences was made simpler; measurement of intelligence was supplemented by oral and
written examinations and tests in the classroom. Measurement established its own problems of study,
factors, and scope. It expanded learning theory to all the sense organs. Measurement can establish
new relationships and clarify the functioning of memory, mental imagery, attention, strength of feeling, skill and
value judgment.

CONCEPT OF MEASUREMENT:

N. R. Campbell (1920) defines the concept of measurement in the following ways:
(i) The assignment of numerals to represent properties.
(ii) The process of assigning numbers to represent qualities.
(iii) The assignment of numerals to represent properties according to scientific laws.
(iv) The assignment of numerals to things so as to represent facts or conventions about them.

According to B. Russell, the concept of measurement is regarded in terms of the mathematical concept
'number'; it is the combination of number and numerals.

According to S.S. Stevens, the concept of measurement is the assignment of numerals to objects or
events or aspects of objects according to rules - according to any rule.

MEANING OF MEASUREMENT:

1. Measurement is a process of quantification. It means precision and quantification of a
phenomenon or variable.
2. Measurement is a process of assigning symbols to the dimensions of a phenomenon in order to
characterize the status of the phenomenon as precisely as possible.
3. Measurement is always done of a quality or attribute of a thing or a person.
4. The process of measurement converts a variable into a variate which is used for drawing
inferences; for example, intelligence is quantified in terms of I.Q. and achievement in terms of scores.
5. Campbell defines "measurement as the assignment of numerals to objects or events according to
rules."
6. Measurement means a characteristic is defined and an instrument is selected to measure it, e.g.,
height can be measured with a tape measure, weight can be measured with a weighing scale.

STEPS OR ESSENTIALS OF MEASUREMENTS:

1. According to K.S. Sidhu, the requirements in measurement are:
 a set of objects,
 a set of numbers, and
 a rule or rules for the assignment of a number to each object.
2. According to Rani Swarup, measurement in any field involves three essentials:
 Identifying and defining the quality, attribute or variable that is to be measured.
 Determining the set of operations by which the attribute or variable may be made manifest
and perceivable, and
 Establishing a set of procedures or definitions for translating observations into quantitative
statements of degree, extent or amount.

COMPARISON OF PHYSICAL AND MENTAL MEASUREMENT:

Physical measurement exists in the physical and material world and is concerned primarily
with dimensions like age, weight, length, capacity, etc. These measures are quantitative and
therefore require units like years, months, kilograms, metres, litres, etc. Mental measurement,
as against physical measurement, is qualitative. It is not precise. It is
subjective and indefinite.

Physical Measurement | Mental Measurement
1. There is a zero-point. | There is no zero-point; there is only a standard or norm.
2. Units are fundamental and of definite value. | Units have to be derived; they are of indefinite value.
3. Measurement of the entire quantity is possible. | The entire quantity cannot be measured; only a sample of it is measured.
4. Measurement is absolute. | Measurement is relative.
5. It is objective. | It is very subjective.

FUNCTIONS OF MEASUREMENT:

According to Lee J. Cronbach (1949), there are three main functions of measurement:

1. Prognosis function.
This function tells about the differences among students' performances at the moment. The
prognosis function serves administrative purposes such as the classification, selection, promotion and gradation
of students. All such decisions involve prediction; when psychological testing is mentioned, the so-called I.Q. tests
administered to students in school to predict their academic performance, and the prediction
of future behavior, come to mind.

2. Diagnosis function.
The diagnosis function identifies weaknesses in student learning. Remedial instruction
can be prepared on the basis of diagnosis. It also implies prediction, but there is considerable
justification for listing diagnosis as a separate function of measurement. It establishes the cause-effect
relationship and thereby improves instructional procedures.

3. Research function.
Measurement provides a more objective and dependable basis for comparison than do rough
impressions. Test scores are quantified into real and useful variables. Scientific hypotheses are verified
with the help of measurement.

Again, according to K.S. Sidhu, the functions of measurement are classified as:


a) Classification
b) Selection
c) Diagnosis
d) Comparison
e) Prediction
f) Research

NEED OF MEASUREMENT IN EDUCATION:

There is need for measurement in education and psychology for a large number of reasons and
purposes. Educational or psychological measurement is simply the means by which qualitative
aspects of human behavior are observed with greater accuracy. The purpose is to make possible
more accurate prediction and control in the educational process.

 Overall planning in education.

An effective utilization of manpower is essential in any modern society. We need to know the levels
of aptitude and the combinations of abilities required for the development of different types of behavior in each
vocational area. Measurement provides feedback on all aspects of educational planning so
that our educational programmes can be oriented and updated from time to time.

 Educational placement

There are two overall functions of education - the integrative and the differentiative.
Integrative education is designed to make people alike in their ideals, values, virtues, language
and general intellectual and social adjustment; it is also known as general education.
Differentiative education, according to individual aptitudes, is designed to prepare individuals for the
required professions and specialities. General education adapts the curriculum to the measured
aptitudes and abilities of the students, whereas in the differentiative function the students are selected
in terms of their ability to succeed in various professional and specialized courses.

 Guidance and counseling

Measurement is done for selecting students who will succeed in a given curriculum.
Counseling is concerned with measurement as an aid to help the individual student to find the
vocation, college curriculum and social environment which will ensure his successful
adjustment. Aptitude, interest, trait, skill and achievement profiles are assessed with a view to
enabling him to make an optimum vocational, educational or social adjustment. The uses of
psychological measurement in counseling are the objective appraisal of personality for better
self-understanding and self-direction, an improved basis for the prediction of achievement and
growth, the measurement of capacity, the diagnosis of mental disabilities, deficiencies
and aberrations, and the evaluation of the outcomes of counseling.

 Improvement of instruction

The power of measurement is to modify and improve instructional procedures. Educational
goals have become more definite and meaningful. This involves the identification and
formulation of the major objectives of a curriculum and the construction of valid, reliable and
practical instruments. It helps in selecting objectives for a course and in organizing
learning experiences. Thereby, measurement helps to improve instruction, teaching
methods and strategies.

 The problems of individual difference

Standardized educational methods assume general ability and use uniform content,
assignments, methods of teaching and learning, and examinations appropriate to the level of
achievement. Measurement helps to identify the differences within groups, not only in skill
areas but also in terms of interest in a topic, personality and social needs.
Merely reporting marks or letter grades to parents is not consistent with a policy of meeting
the needs of individual pupils in each skill area.

 Diagnosis and treatment of learning difficulties

Effective learning should result in complex behavior patterns which may be differentiated
into higher or lower degrees of habits, skills, understanding, feeling, etc. Measurement helps in
studying the rate of development of a given trait and the level of development attained at maturity,
which matters because the various traits of an individual develop at different rates and reach
different levels of maturity.

 Increase of accountability

Testing and measurement have received immense impetus from such recent educational
movements as excellence, effective schooling, minimum competency and, above all, public
accountability. The pressure of these movements has brought more effective and different kinds
of tests to schools. It has heightened the demand of policy-makers and the public for detailed
information about tests and test results. Measurement supports accountability in teaching,
administration, counseling, curriculum construction and instructional design for better performance.
Individual accountability in testing can be established with the help of measurement.
 Value of testing in Education

Students and teachers depend on the immediate and ultimate rewards or satisfactions
obtained from their efforts. Measurement helps to establish creative values and work products,
and to recognize good administrators, excellent students and their teachers.

PROCESS or LEVELS OF MEASUREMENT


According to Baumgartner & Jackson, the levels of Measurement are as follows:

 Nominal level
 Ordinal or Rank level
 Interval level
 Ratio level

Nominal (categorical) scores - when a score places people or things into a category these
are called nominal scores. Nominal scores cannot be ranked or ordered along any
dimension. The categories must be exhaustive and mutually exclusive.

Ordinal scores - means people or things are rank ordered along some dimension. No
common unit of measurement exists between rankings in a system of ordinal scores.
Comparisons cannot be made across different group rankings.

Interval scores - These scores have a common unit of measurement between adjacent
points. No true zero point exists on the interval scale.

Ratio scores - These scores have a common unit of measurement between adjacent scores.
Ratio scores have a true zero point.
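To make the four levels concrete, the sketch below (illustrative only; the examples and the Python representation are not from the text) lists an everyday example of each level together with the statistical operations that are meaningful at that level.

```python
# A minimal sketch of Baumgartner & Jackson's four levels of measurement.
# The examples and the "permitted operations" lists are illustrative assumptions.

levels = {
    "nominal":  {"example": "jersey numbers, subject categories",
                 "permits": ["counting", "mode"]},
    "ordinal":  {"example": "class ranks, race finishing order",
                 "permits": ["counting", "mode", "median", "ordering"]},
    "interval": {"example": "temperature in Celsius (no true zero)",
                 "permits": ["counting", "mode", "median", "ordering",
                             "mean", "differences"]},
    "ratio":    {"example": "height, weight, elapsed time (true zero exists)",
                 "permits": ["counting", "mode", "median", "ordering",
                             "mean", "differences", "ratios"]},
}

for name, info in levels.items():
    print(f"{name:8s} e.g. {info['example']}")
    print(f"         meaningful operations: {', '.join(info['permits'])}")
```

Each level permits everything the previous level permits plus something more, which is why ratio scores support the widest range of statistical treatment.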

EVALUATION
Epigraph:

“To evaluate is to determine what something is worth to somebody. Evaluation is the
discovery of the nature and worth of something.” ---- Denny, 1969.

DEVELOPMENT OF EVALUATION:

Etymological: The word Evaluation comes from French word evaluer meaning "to find the
value of."

Evaluation means that you gather information to draw conclusions and make new
predictions. It is useful to start with the differences between evaluation and research. First, problem
selection and definition in research is the responsibility of the researcher, whereas in
evaluation the context of the study almost completely defines the problem for investigation.
Second, research hypotheses are usually derived by deduction from theories or by
induction from an organized body of knowledge; in evaluation, precise hypotheses can rarely be generated,
and the task more usually becomes that of testing generalizations derived from previous
knowledge and experience. Third, every evaluation study is unique. Fourth, evaluation has
to be conducted in the presence of a multitude of variables which could have relevance for the
interpretation of results, with randomization generally impossible or impractical to
accomplish. Fifth, the data to be collected are heavily influenced in evaluation. Sixth, there is value
judgment. According to B. Bloom, evaluation is the collecting and analyzing of evidence of the
extent to which various groups “see value - see worth in - stated objectives”.

CONCEPT OF EVALUATION:

1. Evaluation is the process of delineating, obtaining and providing useful information for judging
decision alternatives.
2. According to Stufflebeam (1971), the adequacy of an evaluation may therefore be assessed on five
criteria:
---Relevance (is the information what the decision maker needs?)
---Pervasiveness (does the information reach all decision makers who need it?)
---Credibility (is the information trusted by the decision maker and those he must serve?)

CIPP model of evaluations:

Planning decisions --------- context evaluation: identifies what improvements are needed in a
part of the educational system by specifying the major goals and specific
objectives to be served. It defines the context to be served, questions the
value of any stated goals, identifies and assesses needs and unused
opportunities, and identifies and delineates the problems underlying the needs.

Programming decisions --------- input evaluation: designed to provide information on how


resources may be utilized to meet the goals of a programme. Its aim is
to ascertain the available capabilities of an educational sub-system and to
identify and assess potential strategies. The method of
input evaluation is to describe and analyse available human and material
resources, solution strategies and procedural designs in terms of their
relevance, feasibility and economy for the course of action to be taken.
The end product is an analysis of alternatives in terms of potential costs and
benefits, which enables the decision-maker to specify such factors as the
procedures, materials, etc. which will be required in the next stage.

Implementation decisions --------- Process evaluation: thus involves monitoring the educational
activities in order to aid the decision-maker responsible for controlling
their execution. The aim is to identify, and if possible anticipate any
defects in the design of the project or its implementation. The process
evaluator therefore accepts the programme as it is and as it evolves, and
focuses whatever evaluative technique may be appropriate on the most
crucial aspects of the project, in order to build up an account of what
actually happens. Process evaluation may thus have both a formative
function and a summative function.

Recycling decisions --------- product evaluation: is therefore used to determine the


effectiveness of the project after it has run full cycle. Its aim is to relate
outcomes to context, input and process. Product evaluation is the simple
concept of summative evaluation.

MEANING OF EVALUATION:

Evaluation involves assessing the strengths and weaknesses of programs, policies, personnel,
products, and organizations to improve their effectiveness.

Evaluation is the systematic collection and analysis of data needed to make decisions, a process in
which most well-run programs engage from the outset. Here are just some of the evaluation activities
that are already likely to be incorporated into many programs or that can be added easily:

 Pinpointing the services needed, for example, finding out what knowledge, skills, attitudes, or
behaviors a program should address

 Establishing program objectives and deciding the particular evidence (such as the specific
knowledge, attitudes, or behavior) that will demonstrate that the objectives have been met. A
key to successful evaluation is a set of clear, measurable, and realistic program objectives. If
objectives are unrealistically optimistic or are not measurable, the program may not be able
to demonstrate that it has been successful even if it has done a good job

 Developing or selecting from among alternative program approaches, for example, trying
different curricula or policies and determining which ones best achieve the goals

 Tracking program objectives, for example, setting up a system that shows who gets services,
how much service is delivered, how participants rate the services they receive, and which
approaches are most readily adopted by staff

 Trying out and assessing new program designs, for example, determining the extent to which a particular
approach is being implemented faithfully by school or agency personnel or the extent to
which it attracts or retains participants.

Through these types of activities, those who provide or administer services determine what to offer
and how well they are offering those services. In addition, evaluation in education can identify
program effects, helping staff and others to find out whether their programs have an impact on
participants' knowledge or attitudes.

The different dimensions of evaluation have formal names: process, outcome, and impact evaluation.

Rossi and Freeman (1993) define evaluation as "the systematic application of social research
procedures for assessing the conceptualization, design, implementation, and utility of ... programs."
There are many other similar definitions and explanations of "what evaluation is" in the literature.
Our view is that, although each definition, and in fact each evaluation, is slightly different, there are
several steps that are usually followed in any evaluation, and it is these steps which organize the
discussion that follows.

Process Evaluations
Process Evaluations describe and assess program materials and activities. Examination of materials
is likely to occur while programs are being developed, as a check on the appropriateness of the
approach and procedures that will be used in the program. For example, program staff might
systematically review the units in a curriculum to determine whether they adequately address all of
the behaviors the program seeks to influence. A program administrator might observe teachers using
the program and write a descriptive account of how students respond, then provide feedback to
instructors. Examining the implementation of program activities is an important form of process
evaluation. Implementation analysis documents what actually transpires in a program and how
closely it resembles the program's goals. Establishing the extent and nature of program
implementation is also an important first step in studying program outcomes; that is, it describes the
interventions to which any findings about outcomes may be attributed. Outcome evaluation assesses
program achievements and effects.

THE PROCESS OF EVALUATION INCLUDES THE FOLLOWING:
The teaching learning process
The teacher
The student's progress
The Parents
The administrators and supervisors
Guidance cell
Agencies

Outcome Evaluations
Outcome Evaluations study the immediate or direct effects of the program on participants. For
example, when a 10-session program aimed at teaching refusal skills is completed, can the
participants demonstrate the skills successfully? This type of evaluation is not unlike what happens
when a teacher administers a test before and after a unit to make sure the students have learned the
material. The scope of an outcome evaluation can extend beyond knowledge or attitudes, however,
to examine the immediate behavioral effects of programs.

Impact Evaluations
Impact Evaluations look beyond the immediate results of policies, instruction, or services to identify
longer-term as well as unintended program effects. It may also examine what happens when several
programs operate in unison. For example, an impact evaluation might examine whether a program's
immediate positive effects on behavior were sustained over time. Some school districts and
community agencies may limit their inquiry to process evaluation. Others may have the interest and
the resources to pursue an examination of whether their activities are affecting participants and
others in a positive manner (outcome or impact evaluation). The choices should be made based upon
local needs, resources, and requirements.

Regardless of the kind of evaluation, all evaluations use data collected in a systematic manner. These
data may be quantitative such as counts of program participants, amounts of counseling or other
services received, or incidence of a specific behavior. They also may be qualitative such as
descriptions of what transpired at a series of counseling sessions or an expert's best judgment of the
age-appropriateness of a skills training curriculum. Successful evaluations often blend quantitative
and qualitative data collection. The choice of which to use should be made with an understanding
that there is usually more than one way to answer any given question.

Need of Evaluation:
Evaluations serve many purposes. Before assessing a program, it is critical to consider who is most
likely to need and use the information that will be obtained and for what purposes. Listed below are
some of the most common reasons to conduct evaluations. These reasons cut across the three types
of evaluation just mentioned. The degree to which the perspectives of the most important potential
users are incorporated into an evaluation design will determine the usefulness of the effort.

Evaluation for Project Management

Administrators are often most interested in keeping track of program activities and
documenting the nature and extent of service delivery. The type of information they seek to
collect might be called a "management information system" (MIS). An evaluation for project
management monitors the routines of program operations. It can provide program staff or
administrators with information on such items as participant characteristics, program
activities, allocation of staff resources, or program costs. Analyzing information of this type
(a kind of process evaluation) can help program staff to make short-term corrections
ensuring, for example, that planned program activities are conducted in a timely manner.
This analysis can also help staff to plan future program direction such as determining
resource needs for the coming school year.

Operations data are important for responding to information requests from constituents, such
as funding agencies, school boards, boards of directors, or community leaders. Also,
descriptive program data are one of the bases upon which assessments of program outcomes
are built; it does not make sense to conduct an outcome study if results cannot be connected
to specific program activities. An MIS also can keep track of students when the program ends
to make future follow-up possible.

Evaluation for Staying on Track

Evaluation can help to ensure that project activities continue to reflect project plans and
goals. Data collection for project management may be similar to data collection for staying
on track, but more information might also be needed. An MIS could indicate how many
students participated in a prevention club meeting, but additional information would be
needed to reveal why participants attended, what occurred at the meeting, how useful
participants found the session, or what changes the club leader would recommend. This type
of evaluation can help to strengthen service delivery and to maintain the connection between
program goals, objectives, and services.

Evaluation for Project Efficiency

Evaluation can help to streamline service delivery or to enhance coordination among various
program components, lowering the cost of service. Increased efficiency can enable a program
to serve more people, offer more services, or target services to those whose needs are
greatest. Evaluation for program efficiency might focus on identifying the areas in which a
program is most successful in order to capitalize upon them. It might also identify

weaknesses or duplication in order to make improvements, eliminate some services, or refer
participants to services elsewhere. Evaluations of both program process and program
outcomes are used to determine efficiency.

Evaluation for Project Accountability

When it comes to evaluation for accountability, the users of the evaluation results likely will
come from outside of program operations: parent groups, funding agencies, elected officials,
or other policymakers. Be it a process or an outcome evaluation, the methods used in
accountability evaluation must be scientifically defensible, and able to stand up to greater
scrutiny than methods used in evaluations that are intended primarily for "in-house" use. Yet
even sophisticated evaluations must present results in ways that are understandable to lay
audiences, because outside officials are not likely to be evaluation specialists.

Evaluation for Program Development and Dissemination

Evaluating new approaches is very important to program development in any field.


Developers of new programs need to conduct methodical evaluations of their efforts before
making claims to potential users. Rigorous evaluation of longer-term program outcomes is a
prerequisite to asserting that a new model is effective. School districts or community
agencies that seek to disseminate their approaches to other potential users may wish to
consult an evaluation specialist, perhaps a professor from a local university, in conducting
this kind of evaluation.

Use in Decision Making

Since there is no single "best" approach to evaluation which can be used in all situations, it
is important to decide the purpose of the evaluation, the questions you want to answer, and
which methods will give you usable information that you can trust. Even if you decide to hire
an external consultant to assist with the evaluation, you, your staff, and relevant stakeholders
should play an active role in addressing these questions. You know the project best, and
ultimately you know what you need. In addition, because you are one of the primary users of
evaluation information, and because the quality of your decisions depends on good
information, it is better to have "negative" information you can trust than "positive"
information in which you have little faith. Again, the purpose of project-level evaluation is
not just to prove, but also to improve.

People who manage innovative projects have enough to do without trying to collect
information that cannot be used by someone with a stake in the project. By determining who
will use the information you collect, what information they are likely to want, and how they
are going to use it, you can decide what questions need to be answered through your
evaluation.

TYPES OF EVALUATION:

MacDonald (1976) has elucidated the complexity of the variety of contexts of evaluation
by characterizing three styles of evaluation:

Bureaucratic: It is an unconditional service to those government agencies which have major control
over the allocation of educational resources. The evaluator accepts the values of those
who hold office, and offers information which will help them to accomplish their
policy objectives. He acts as a management consultant, and his criterion of success is
client satisfaction. His technique of study must be credible to the policy-makers and
not lay them open to public criticism. He has no independence, no control over the use
made of his information, and no court of appeal. The report is owned by the
bureaucracy and lodged in its files.

Autocratic: It is a conditional service to those government agencies which have major control over
the allocation of educational resources. It offers external validation of policy in
exchange for compliance with its recommendations. Its values are derived from the
evaluator's perception of the constitutional and moral obligations of the bureaucracy.
He focuses upon issues of educational merit and acts as an expert adviser. His techniques
of study must yield scientific proofs, because his power base is the academic research
community. His contractual arrangements guarantee non-interference by the client, and
he retains ownership of the study. His report is lodged in the files of the bureaucracy,
but is also published in academic journals. If his recommendations are rejected, policy
is not validated. His court of appeal is the research community, and high levels in the
bureaucracy.

Democratic: It is an information service to the whole community about the characteristic of an


educational programme. Sponsorship of the evaluation study does not itself confer a
special claim upon this service. The democratic evaluator recognizes value pluralism
and seeks to represent a range of interests in his issue formulation. The basic value is an
informed citizenry, and the evaluator acts as broker in exchanges of information
between groups who want knowledge of each other. His techniques of data-gathering
and presentation must be accessible to non-specialist audiences. His main activity is the
collection of definitions of, and reactions to, the programme. He offers confidentiality to
informants and gives them control over his use of the information they provide. The
report is non-recommendatory, and the evaluator has no concept of information misuse.
He engages in periodic negotiation of his relationships with sponsors and programme
participants. The criterion of success is the range of audience served.

‘Evaluation from above’ and ‘Evaluation from below’

PURPOSES OF EVALUATION:

According to J. Rani Swarup (2004), the purposes of evaluation are as follows:


a) Motivating students to develop good study habits.
b) Correcting errors.
c) Directing their activities towards the achievement of desired goals.
d) Diagnosing weakness.
e) Defining teaching objectives.
f) Differentiation of pupils for various purposes.
g) Certification of pupils.

Program evaluations are typically conducted to accomplish one, two or all
of the following:

 To render judgments

 To facilitate improvements

 To generate knowledge

PRINCIPLES OF EVALUATION:

It is said that evaluation should always be regarded as a process that is guided by principles.
The Principles that govern the operation of evaluation process are as follows:

1. Evaluation should be based on clear instructional objectives.


2. Evaluation procedures and techniques should be selected in terms of the objectives they serve.
3. Evaluation should be comprehensive.
4. Evaluation should be continuous.
5. Evaluation should be diagnostic and functional.
6. Evaluation should be a co-operative endeavor.
7. Evaluation should be used judiciously.

PLACES WHERE MEASUREMENT AND EVALUATION ARE USED:

Research, Education, Business, Sports, Medicine, Health and Rehabilitation

REASONS FOR MEASUREMENT AND EVALUATION

Motivation
Accountability
Equipment
Placement
Diagnosis
Evaluation of learning
Prediction
Program Evaluation

STANDARDS OF MEASUREMENT AND EVALUATION — MEASUREMENT THEORY

Reliability
Validity
Usability
Objectivity
Sensitivity

1. RELIABILITY: adequacy, objectivity, testing condition, test administration procedures


Methods of estimating reliability (a brief computational sketch follows this list):
1. Test-retest method (uses the Spearman rank correlation coefficient)
2. Parallel forms / alternate forms (paired observations are correlated)
3. Split-half method (odd-even halves, computed using the Spearman-Brown
formula)
4. Internal-consistency method (Kuder-Richardson formula 20)
5. Scorer reliability method (two examiners independently score a set of test
papers then correlate their scores)
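As a rough illustration of two of the methods named above, the sketch below works through a split-half estimate stepped up with the Spearman-Brown formula and a Kuder-Richardson formula 20 (KR-20) estimate. The item-score matrix is invented purely for demonstration, and the code is a minimal sketch rather than a complete item-analysis routine.

```python
# A minimal sketch, assuming a small matrix of dichotomously scored items
# (1 = correct, 0 = wrong); the data are invented for illustration only.
import numpy as np

scores = np.array([           # rows = students, columns = items
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 1],
    [1, 1, 0, 1, 1, 1, 0, 1],
])

# Split-half method: correlate odd-item and even-item half scores, then step
# the half-test correlation up with the Spearman-Brown formula.
odd_half = scores[:, 0::2].sum(axis=1)
even_half = scores[:, 1::2].sum(axis=1)
r_half = np.corrcoef(odd_half, even_half)[0, 1]
split_half_reliability = 2 * r_half / (1 + r_half)

# Internal-consistency method: Kuder-Richardson formula 20 (KR-20).
k = scores.shape[1]                      # number of items
p = scores.mean(axis=0)                  # item difficulties (proportion correct)
q = 1 - p
total_var = scores.sum(axis=1).var(ddof=1)
kr20 = (k / (k - 1)) * (1 - (p * q).sum() / total_var)

print(f"Split-half (Spearman-Brown corrected): {split_half_reliability:.2f}")
print(f"KR-20 internal consistency:            {kr20:.2f}")
```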

2. VALIDITY: Content, concurrent, predictive, construct


Content validity – face validity or logical validity, used in evaluating
achievement tests
Concurrent validity – test agrees with or correlates with a criterion (ex.
entrance examination)
Predictive validity – the degree of accuracy with which a test predicts the level of
performance in the activity which it intends to foretell
Construct validity – agreement of the test with a theoretical construct or trait
(ex. IQ)

3. USABILITY: (practicality) ease in administration, scoring, interpretation and application,


low cost, proper mechanical make-up

DIFFERENCE BETWEEN MEASUREMENT AND EVALUATION

MEASUREMENT | EVALUATION
1. Measurement refers to the process by which the attributes or dimensions of some physical object are determined. | Evaluation is perhaps the most complex and least understood of the terms; inherent in the idea of evaluation is "value".
2. Measurement is the process of gathering data; it is the narrowest of the terms and involves comparative judgments. | Evaluation is the process of making judgments about measured data; it is the technique for value judgment.
3. Without measurement there is no positive assurance that the judgments are accurate, no proof that a real problem exists, and no assurance that training efforts have achieved their objectives. |
4. Measurement is objective; a measurement remains constant whoever measures. | Evaluation may be subjective; evaluation depends on the mind frame of the doer.
5. For example, taking out 1 kg of rice is measurement, | but determining whether it is of good quality or otherwise is evaluation.
6. Measurement can be a valuable input into evaluation, but it should never be equated with it. | Evaluation is really composed of three component parts: "e", "value" and "action"; the central element of the concept of evaluation is value, and the outcome of an evaluation is a judgment.
7. Measurement is associated with organization. | Evaluation is associated with organizational process.
8. Measurement is meant to provide high-quality information to assist in learning and improvement, rather than just to monitor goal achievement. |
9. Measurement is the most fundamental management system; it includes management, motivation, service, and training. |
10. Measurement directs behavior, increases the visibility of performance, increases alignment, improves decision making and problem solving, and gives early warning signals. |
11. Measurement enables prediction and understanding. |

MEASUREMENT | EVALUATION
1. Measurement provides data. | Evaluation interprets the data provided by measurement.
2. Measurement is only a part of the system of examination. | Evaluation is a comprehensive whole of the system of examination.
3. Measurement suffers from limitations and shortcomings. | Evaluation is an attempt to remove these limitations and shortcomings.
4. Measurement is restricted to quantitative description of pupil behavior. | Evaluation includes both quantitative and qualitative description of pupil behavior and value judgments of that behavior.
5. Measurement tools may not be in a position to provide data on many educational factors. | Evaluation endeavors to cover all aspects of the process of education.
6. Measurement is like a product obtained after testing. | Evaluation is like a process developing out of the products of testing.
7. Equal-interval and ratio scales are used. | Nominal, ordinal and equal-interval scales are used.
8. Its functions are prognosis, diagnosis and research. | Its functions are selection, grading, guidance, prediction and diagnosis.
9. Only formal processes are planned. | Both formal and informal processes are planned.
10. It can be done at any time and place. | It is a continuous process.
11. It is content centered. | It is objective centered.
12. It is one-dimensional in relation to its environment. | It is multi-dimensional in relation to its environment.

NATURE, SCOPE, NEED, TYPES AND LIMITATIONS
OF EDUCATIONAL MEASUREMENT &
EVALUATION
NATURE OF EDUCATIONAL MEASUREMENT
Thorndike wrote that "the nature of educational measurement is the same as that of all
scientific measurement." Educational measurement includes mental measurement along with
physical measurement.
i) Measurement in education is quantitative in nature when it expresses results in
quantitative terms;
ii) Measurement is expressed in constant units.
Q. What is the value of measuring accurately the results of teaching?
Answer: The common answer to the above rests on common sense, logical reasons or
experimental evidence. The logical reason for the value of accurate measurement by means of
standardized tests is a generally accepted principle in any field of human endeavour.
The nature of educational measurement is as follows:
a) It should be objective, reliable, and valid.
b) It should be comprehensive and precise.
c) It should be usable and practicable.
- Usability implies the following features:
i) Ease in administering the tool
ii) Ease in scoring the answer scripts
iii) Ease in interpreting scores
iv) Economy from the point of view of time, energy and money.

SCOPE OF EDUCATIONAL MEASUREMENT


Educational measurement discusses problems in the measurement of individual differences.
According to Richard H. Lindeman, the scope of educational measurement is as follows:
a) Pupil characteristics to be measured: three kinds of pupil characteristics:
- Achievement: what the pupil has learned, i.e. the knowledge and abilities he has when the test is
given.
- General and specific aptitudes: what the pupil can learn, given the appropriate learning
experience.
- Personal and social adjustment: concerned with a number of affective characteristics
such as cooperativeness, honesty, interest, and attitudes.
b) Measurement of academic achievement: Teacher made test, standardized test,
performance test, observation in the classroom setting
c) Measurement of personality and adjustment
d) Organizational processes and techniques exercised by the leaders of learning centers
e) Measurement and Curriculum Design
f) Measurement with students Guidance and development

NEED OF EDUCATIONAL MEASUREMENT


Robert L. Ebel (1961) has outlined six needs for educational measurement:
i) Know the educational uses, as well as the limitations, of educational tests
ii) Know the criteria by which the quality of a test should be judged and how to
secure evidence relating to these criteria.
iii) Know how to plan a test and write the test questions to be included in it
iv) Know how to select a standardized test that will be effective in a particular
situation
v) Know how to administer a test properly, efficiently and fairly.
vi) Know how to interpret test scores correctly and fully, but with recognition of
their limitations

TYPE OF EDUCATIONAL MEASUREMENT


Tests are tools of measurement, and measurements guide us and facilitate the realization of
the different purposes of education in the varying contexts of use. The several forms of
measurement are scale, rank, classification and description. The types of educational
measurement are categorized as:
a) Purpose-specific categorization;
b) Mode-specific categorization;
c) Purpose-specific categorization.

Purpose - specific categorization


Diagnostic Measurement:
Identifies the areas of learning in which learners need remedial courses. It gives us a
profile of what the learner knows and does not know in a given area of learning.
Aptitude Measurement:
Helps us identify potential talents. Aptitude tests identify the prerequisite characteristics
which are essential for one to be competent to perform a given task.
Achievement Measurement:
Measures the extent to which the objectives of a course have been achieved. It
measures the objectives of the given course and covers the areas of learning demarcated by
the given syllabus.

Proficiency Measurement:
Assesses the general ability of a person at a given time; it rests on a reasonable expectation
of what abilities learners of a given status should possess. National-level selection in
different states and university jurisdictions can be taken as a typical example.

Mode -specific categorization

Formal assessment vs. informal assessment


Formal assessment is applicable to a situation where a body answerable to the public
is holding a test for a selection or an award. Assessment in such a situation has to ensure
objectivity, credibility and relevance with a set of standardized norms. Informal
assessment, by contrast, is applicable to situations where an individual or a voluntary body is holding a
test to obtain some information to fulfill some personal requirements. It also needs to be
objective and reliable.

Formative assessment vs. Summative assessment


Formative assessment is concerned with identifying learners' weaknesses in attainment
with a view to helping the learner with remedies, while summative assessment aims at certifying and
grading the attainment of the learner at the end of a given course.

Continuous assessment vs. Terminal assessment


Course Work vs. Examination
Process vs. Product Assessment
Internal assessment vs. external assessment
Purpose -specific categorization
Teacher-made and standardized
Norm-referenced and criterion-referenced
LIMITATIONS OF EDUCATIONAL MEASUREMENT

Apart from the characteristic, measurement has its own limitations too:
i) The most important limitation of measurement is that it is quite difficult to decide
about the nature of the object to be measured.
ii) Its scope is narrow and quite limited.
iii) Measurement fails to make a clear-cut distinction between two traits such as
character and personality, or achievement and aptitude, resulting in a low level of
measurement.
iv) The process of measurement is often complex.
v) The traits measured are both concrete and abstract. They have
different meanings for different categories; as a result, measurement is not as accurate as
physical measurement.
vi) Measurement only provides information rather than any kind of decision.
vii) One of the most important limitations of measurement is that the characteristics
measured under measurement are not physical and fixed. They are continually
changeable.
viii) In the absence of knowledge of the dimensions of educational characteristics the
measurement process lacks accuracy in comparison to physical measurement.

NATURE OF EDUCATIONAL EVALUATION

Evaluation includes all the means of collecting information about the students' learning.
The evaluator should make use of tests, observation, interview, rating scale, check list,
intuition and value judgment to gather complete and reliable information about the students.
Some of the following characteristics are the nature of educational evaluation:

a) It involves systematic collection of quantitative and qualitative data.


b) It is comprehensive, not simply concerned with the academic status of the
student but with all aspects of his growth-cognitive and non-cognitive
aspects.
c) It is continuous and not confined to one particular class or stage of education,
or any semester. It is to be conducted continuously as the student progresses.

SCOPE OF EDUCATIONAL EVALUATION

According to Arora and Vashist, the scope of modern educational evaluation is as follows:

Designs for secondary and elementary schools


a) Aspects of thinking: Test of interpretation of data, application of principles,
logical reasoning, and nature of proof.
b) Social sensitivity: Test of application to social problems of social values,
social facts, and generalization.
c) Civic and social beliefs: Scales of social, political and economic beliefs.
d) Aspects of appreciation in literature and art: a variety of techniques.
e) Interests: An inventory of personal, social, and school interests.
f) Personal and social development: Various self-reporting scales and anecdotal
records.
g) Various records: pupil record forms for noting reading and listening.

Some of the common aspects/scopes of educational evaluation are as follows:

1. National assessment programs


2. International assessment programs
3. School performance reporting
4. Student monitoring systems
5. Assessment-based school self-evaluation
6. Examinations
7. System level Management Information Systems
8. School Management Information Systems
9. International review panels
10. School inspection/supervision
11. School self-evaluation, including teacher appraisal
12. School audits
13. Monitoring and evaluation as part of teaching
14. Program evaluation
15. School effectiveness and educational productivity studies

NEED OF EDUCATIONAL EVALUATION

Educational evaluation has come a long way since its initiation by Ralph Tyler
more than half a century ago. A thorough educational evaluation or psycho-educational
evaluation will provide you with your child's educational strengths, weaknesses, and
recommendations for educational interventions. Remember that a student's inability to stay
on task, hyperactivity, distractibility, and/or impulsivity will affect her performance on the
educational evaluation. There is usually no certification or license for an educational
evaluation. With this in mind, it is a priority to get information from the child's teachers:
classroom teacher, special subject teachers (music, art, physical education, and computer),
lunchroom monitor, playground monitors, and others who come into contact with the child
in the school setting.

According to Sally M. Thomas (2003), we need educational evaluation in the following


ways:
a) To formally regulate desired levels of quality of educational outcomes and
provisions.
b) To hold education systems accountable for their functioning and performance and
support direct democracy in education.
c) As a mechanism to stimulate improvement in education
d) Decentralization policies in education in many countries act as a stimulating condition.

According to Dr. B.S. Bloom (1971) the need of educational evaluation are as follows:
a) To discover the extent of competence which the student has developed in initiating,
organizing and improving his day-to-day work.
b) To diagnose the strengths and weakness of the learner with a view to guide him in
future.
c) To predict the educational practices which a student-teacher can best make use of.
d) At the end of a career, to certify the students' degree of competency in a particular
field.
e) To provide information to enable each pupil to develop his potentialities within the
framework of educational programme.

TYPE OF EDUCATIONAL EVALUATION


Before going into the types of educational evaluation, let us recall the steps of educational
evaluation:
The steps are as follows:
a) Identification and defining general objectives.
b) Identification and defining specific objectives.
c) Selecting teaching points.
d) Planning, implementing of suitable learning programmes and activities
e) Appraising and assessing the achievements
f) Using the result as feedback.

Purpose/function of educational evaluation

The functions of evaluation are as follows:

- To make provisions for guiding the growth of individual pupils.


- To diagnose the weakness and strengths of pupil.
- To locate areas where remedial measures are needed.
- To provide a basis for the modification of curriculum and the courses.
- To provide a basis to meet students' needs and requirements.
- To motivate pupils towards better attainment
- To test the efficiency of teachers in providing learning experiences and
the effectiveness of instruction.
- To bring out the inherent capabilities of pupils such as attitudes, habits,
appreciation and understanding, manipulative skills in addition to
conventional acquisition of knowledge.
- To achieve instructional objectives successfully.

LIMITATIONS OF EDUCATIONAL EVALUATION


The following are the limitations of evaluation approach:

 It requires training and understanding for using in class-room teaching.


 The content analysis and the identification of the objectives are not easy tasks.
 There is no standard criterion for determining teaching and testing points.
 The yearly plan and unit plan are prepared by a teacher, so they involve subjectivity.
 Teachers often do not take interest in using the evaluation approach in classroom
teaching; it is used mainly by teachers in training programmes.
 It is difficult to write objectives in behavioral terms.

CONCEPT OF TRUE AND ERROR SOURCES

Types of error (from Basch and Gold 1986: 300–1)


Common problems in drawing conclusions from evaluation research include:
Type I error The wrong conclusion that an intervention has achieved significant change
when it has actually failed to do so.
Type II error The wrong conclusion that an intervention has failed to have a significant
effect when it actually has done so.
Type III error Judging that an intervention has failed when it was so poorly designed that it
could not have achieved the desired effect.
Type IV error Carrying out an evaluation of a programme that no-one cares about and is
irrelevant to decision-making.
Type V error The intervention is shown to have a statistically significant effect, but the
change is so small as to have no practical significance
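The Type I and Type II errors above can be made concrete with a small, purely hypothetical simulation: an "intervention" group and a control group are compared with a t-test, first when there is truly no effect (so any significant result is a Type I error) and then when there is a real but modest effect (so any non-significant result is a Type II error). The group sizes, means and effect size below are invented for illustration.

```python
# A minimal simulation sketch (illustrative only) of Type I and Type II errors.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, trials = 0.05, 2000

# Type I error: the intervention truly has NO effect, yet we sometimes
# conclude (wrongly) that it does.
false_positives = 0
for _ in range(trials):
    control = rng.normal(50, 10, size=30)
    intervention = rng.normal(50, 10, size=30)   # same true mean: no real effect
    if stats.ttest_ind(intervention, control).pvalue < alpha:
        false_positives += 1

# Type II error: the intervention truly HAS a small effect, yet we sometimes
# conclude (wrongly) that it does not.
misses = 0
for _ in range(trials):
    control = rng.normal(50, 10, size=30)
    intervention = rng.normal(55, 10, size=30)   # true effect of 5 points
    if stats.ttest_ind(intervention, control).pvalue >= alpha:
        misses += 1

print(f"Estimated Type I error rate:  {false_positives / trials:.2f}")  # close to alpha
print(f"Estimated Type II error rate: {misses / trials:.2f}")
```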
Concept of true and error source

Truth destroys error, and Love destroys hate.
According to Neil J. Salkind (2009), Observed Score = True Score + Error Score; the less the
error, the more reliable the measure. It is that simple: in other words, reduce the error and you increase the
reliability.
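Salkind's relation can be illustrated with a small simulation sketch: the same set of (unobservable) true scores is "measured" twice with different amounts of random error, and the correlation between the two administrations, a test-retest style reliability estimate, falls as the error grows. The numbers are invented for illustration only.

```python
# A minimal sketch of Observed = True + Error: reliability drops as error grows.
import numpy as np

rng = np.random.default_rng(1)
true_scores = rng.normal(60, 10, size=200)       # unobservable true scores

for error_sd in (2, 5, 15):
    form_a = true_scores + rng.normal(0, error_sd, size=200)  # observed, attempt 1
    form_b = true_scores + rng.normal(0, error_sd, size=200)  # observed, attempt 2
    reliability = np.corrcoef(form_a, form_b)[0, 1]
    print(f"error SD = {error_sd:2d}  ->  estimated reliability = {reliability:.2f}")
```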

Sources of Error: A source of error is a limitation of a procedure or an instrument that


causes an inaccuracy in the quantitative results of an experiment. A human error is not
considered a source of error under this definition. Students should strive to identify,
understand, and limit sources of error in their procedures whenever possible.
Reliability may be improved by clarity of expression (for written assessments),
lengthening the measure, and other informal means. However, formal psychometric
analysis, called item analysis, is considered the most effective way to increase
reliability. This analysis consists of computation of item difficulties and item
discrimination indices, the latter index involving computation of correlations between the
items and sum of the item scores of the entire test. If items that are too difficult, too easy,
and/or have near-zero or negative discrimination are replaced with better items, the
reliability of the measure will increase.
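The item analysis described above can be sketched in a few lines: item difficulty is the proportion of students answering the item correctly, and item discrimination is taken here as the correlation between the item and the rest-of-test score. The small score matrix is hypothetical, and this is a minimal sketch rather than a full psychometric procedure.

```python
# A minimal item-analysis sketch, assuming dichotomous item scores (invented data).
import numpy as np

scores = np.array([            # rows = students, columns = items
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
])

total = scores.sum(axis=1)
for j in range(scores.shape[1]):
    difficulty = scores[:, j].mean()              # proportion answering correctly
    rest = total - scores[:, j]                   # exclude the item from its own total
    discrimination = np.corrcoef(scores[:, j], rest)[0, 1]
    print(f"item {j + 1}: difficulty = {difficulty:.2f}, "
          f"discrimination = {discrimination:+.2f}")
```

Items with very high or very low difficulty, or with near-zero or negative discrimination, are the candidates for replacement mentioned above.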

Error is the amount of deviation in a physical quantity that arises as a result of the
process of measurement or approximation. Another term for error is uncertainty. Physical
quantities such as weight, volume, temperature, speed, or time must all be measured by an
instrument of one sort or another. No matter how accurate the measuring tool—be it an
atomic clock that determines time based on atomic oscillation or a laser interferometer that
measures distance to a fraction of a wavelength of light, some finite amount of uncertainty is
involved in the measurement. Thus, a measured quantity is only as accurate as the error
involved in the measuring process. In other words, the error, or uncertainty, of a
measurement is as important as the measurement itself.

As an example, imagine trying to measure the volume of water in a bathtub. Using a


gallon bucket as a measuring tool, it would only be possible to measure the volume
accurately to the nearest full bucket, or gallon. Any fractional gallon of water remaining
would be added as an estimated volume. Thus, the value given for the volume would have a
potential error or uncertainty of something less than a bucket. Now suppose the bucket were
scribed with lines dividing it into quarters. Given the resolving power of the human eye, it is
possible to make a good guess of the measurement to the nearest quarter gallon, but the
guess could be affected by factors such as viewing angle, accuracy of the scribing, tilts in
the surface holding the bucket, etc. Thus, a measurement that appeared to be 6.5 gal (24.6 l)
could be in error by as much as one quarter of a gallon, and might actually be closer to 6.25
gal (23.6 l) or 6.75 gal (25.5 l). To express this uncertainty in the measurement process, one
would write the volume as 6.5 gallons +/-0.25 gallons.

As the resolution of the measurement increases, the accuracy increases and the error
decreases. For example, if the measurement were performed again using a cup as the unit of
measure, the resultant volume would be more accurate because the fractional unit of water
remaining (less than a cup) would be a smaller volume than the fractional gallon. If a
teaspoon were used as a measuring unit, the volume measurement would be even more
accurate, and so on.
As the example above shows, error is expressed in terms of the difference between
the true value of a quantity and its approximation. A positive error is one in which the
observed value is larger than the true value; in a negative error, the observed value is
smaller. Error is most often given in terms of positive and negative error. For example, the
volume of water in the bathtub could be given as 6 gallons +/-0.5 gallon, or 96 cups +/-0.5
cup, or 1056 teaspoons +/-0.5 teaspoons. Again, as the uncertainty of the measurement
decreases, the value becomes more accurate.

An error can also be expressed as a ratio of the error of the measurement and the true
value of the measurement. If the approximation were 25 and the true value were 20, the
relative error would be 5/20. The relative error can also be expressed as a percent. In this
case, the percent error is 25%.
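The arithmetic of the example above, an approximation of 25 against a true value of 20, can be written out directly; the sketch below simply restates that calculation.

```python
# A minimal sketch of absolute, relative and percent error for the worked example.
true_value = 20.0
approximation = 25.0

absolute_error = approximation - true_value   # +5 (a positive error)
relative_error = absolute_error / true_value  # 5/20 = 0.25
percent_error = relative_error * 100          # 25%

print(f"absolute error: {absolute_error:+.1f}")
print(f"relative error: {relative_error:.2f}")
print(f"percent error:  {percent_error:.0f}%")
```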

Measurement error can be generated by many sources. In the bathtub example, error
could be introduced by poor procedure such as not completely filling the bucket or
measuring it on a tilted surface. Error could also be introduced by environmental factors
such as evaporation of the water during the measurement process. The most common and
most critical source of error lies within the measurement tool itself, however. Errors would
be introduced if the bucket were not manufactured to hold a full gallon, if the lines
indicating quarter gallons were incorrectly scribed, or if the bucket incurred a dent that
decreased the amount of water it could hold to less than a gallon.

In electronic measurement equipment, various electromagnetic interactions can


create electronic interference, or noise. Any measurement with a value below that of the
electronic noise is invalid, because it is not possible to determine how much of the
measured quantity is real, and how much is generated by instrument noise. The noise level
determines the uncertainty of the measurement. Engineers will thus speak of the noise floor
of an instrument, and will talk about measurements as being below the noise floor, or "in the
noise."

Measurement and measurement error are so important that considerable effort is


devoted to ensure the accuracy of instruments by a process known as calibration.
Instruments are checked against a known, precision standard, and adjusted to be as accurate
as possible. Even gas pumps and supermarket scales are checked periodically to ensure that
they measure to within a predetermined error.

Measurement Error
Knowledge gained from the study of measurement science will make clinicians
more or less certain of their interpretations of the research summarized above and their
confidence in the values reported as the true amount of axial rotation permitted by the
orthoses. For example, measurement theory shows one can never absolutely measure the
true quantity of a concept. Every measure taken by clinicians or scientists has a shadow
component, termed error.
The error associated with a measurement is defined as the difference between the
unknowable true score and the observed score recorded while taking measurements. Since
in theory the true score always remains unknown, it is crucial to estimate the errors
associated with observed scores, or measures, as a means of establishing confidence in the
measuring devices and procedures. This can be done by taking repeated measures of the
same phenomenon and then describing the various observed scores. If repeated observed
scores are consistent, it is assumed that the measurement error is small and that the observed
scores closely approximate the true score (9). The measurement device and procedures are
declared reliable, and one of the major pitfalls of clinical research, measurement error, has
been overcome.
Measurement error is often categorized as occurring either randomly or
systematically in an experiment. Random errors are inconsistent discrepancies that occur by
chance in a study. They are not found to follow any pattern that could introduce bias into
the results; they are simply naturally occurring events that detract from the precision of
clinical measures (10). If the researcher is inexperienced with the measurements to be taken
and is uncertain about his/her judgments, the possibility of random error is introduced.
Research procedures often include repeated trials for measurements to decrease these types
of random errors so an average of several trials may be entered for the subject's score. This
method will provide a score that will more closely approximate the subject's true score than
does any one trial score (10). Consistent errors that persist from one subject to another are
considered systematic errors.
Both random error and systematic error will undermine the validity of the clinical
measure (6). Systematic error is of particular concern since its effect on reliability can go
undetected; thus, a clinician may assume the clinical measure is reliable and proceed with
its use. In the study example, range-of-motion measurements were taken by placing a
precision protractor on a video monitor's screen and measuring both beginning and ending
angular measurements. If the numbers marked on the protractor were in error by three
degrees, then all measurements made with that protractor would be off by three degrees in
the same direction. One can see that this systematic error would not affect the reliability of
the measurements taken by the investigators, but statements about the average amount of
axial rotation permitted by an orthosis would carry with them the error of three degrees. This
illustrates the intimate relationship between reliability and validity and the influence of
measurement error on both of these important characteristics of measures.
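The protractor example can be sketched numerically: a constant (systematic) offset of three degrees leaves the agreement between repeated measurements untouched, so reliability looks fine, while every reported angle is biased, which is a validity problem. The angles and noise levels below are invented purely for illustration.

```python
# A minimal sketch: systematic error spares reliability but biases the result.
import numpy as np

rng = np.random.default_rng(2)
true_rotation = rng.normal(45, 8, size=50)                  # true angles (degrees)

trial_1 = true_rotation + rng.normal(0, 1.5, size=50) + 3   # random noise + 3 deg offset
trial_2 = true_rotation + rng.normal(0, 1.5, size=50) + 3

reliability = np.corrcoef(trial_1, trial_2)[0, 1]           # unaffected by the offset
bias = trial_1.mean() - true_rotation.mean()                # roughly +3 degrees

print(f"test-retest reliability with offset: {reliability:.2f}")
print(f"average systematic bias:             {bias:+.1f} degrees")
```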

Quantification of Measurement Error


Several statistical techniques are available to clinical researchers to allow them to
quantify the errors associated with their measures. In instances where the research project's
purpose is to predict measures of central tendency (e.g., mean) of a variable for a set of
subjects so that the researcher and reader may generalize this value to a population of
similar patients who were not studied, both confidence intervals and the standard error of
the mean are very useful tools (9). Applying these tools in the sample study would allow the
investigator to report the associated error (e.g., +/- 5 degrees) along with the estimate of
average cervical axial rotation available in each orthotic condition. These analyses improve
the reader's confidence in the reported means, especially those derived from studies with
small sample sizes.
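Reporting a mean together with its standard error and a confidence interval, as suggested above, looks roughly like the sketch below. The axial-rotation measurements are invented, and this is a minimal illustration rather than the procedure used in any particular study.

```python
# A minimal sketch of a mean, its standard error, and a 95% confidence interval.
import numpy as np
from scipy import stats

axial_rotation = np.array([38.0, 41.5, 36.2, 44.0, 39.8, 42.1, 37.5, 40.3])

mean = axial_rotation.mean()
sem = stats.sem(axial_rotation)                    # standard error of the mean
ci_low, ci_high = stats.t.interval(0.95, df=len(axial_rotation) - 1,
                                   loc=mean, scale=sem)

print(f"mean axial rotation: {mean:.1f} degrees")
print(f"standard error:      {sem:.1f} degrees")
print(f"95% CI:              ({ci_low:.1f}, {ci_high:.1f}) degrees")
```

Reporting the interval alongside the mean is what lets a reader judge how much confidence to place in an estimate derived from a small sample.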
Other Sources of Error in Research
Measurement error is considered the primary source of error in most research
designs. Errors in sampling, instrumentation, procedures and data analysis also occur so that
a thorough review of all aspects of the research design should be undertaken before
beginning an experiment. The search for errors of procedure and technique can be
facilitated by conducting a pilot study.
Particular attention should be paid to all procedures that might affect the measurements,
including lack of consistent stabilization of subjects in equipment, inconsistent instructions
given to subjects, inaccurate procedures for reading measurement dials or gauges (parallax),
inadequate procedures for initiating timing sequences, equipment failures and plans for
backup equipment. If the interaction between the subject and any research equipment is
novel, procedures should allow the subject to become thoroughly familiar with the
equipment prior to the recording of any measurements to eliminate learning effects in the
study.

As quickly becomes obvious, not all errors can be completely eliminated from
clinical investigations. Attempts to control sources of errors in measurement and procedures
of research are not unlike the tension between internal and external experimental validity
discussed by Lunsford (11). The application of too great an effort to control error may
enhance reliability but somewhat decrease validity. Reasonable efforts to assess
measurement properties should be expected of those conducting clinical investigations.
