
Introduction to plausible values

National Research Coordinators Meeting


Madrid, February 2010

Content of presentation


- Rationale for scaling
- Rasch model and possible ability estimates
- Shortcomings of point estimates
- Drawing plausible values
- Computation of measurement error

Rationale for IRT scaling of data


- Summarising data instead of dealing with many single items
- Raw scores or percent correct are sample-dependent
- Makes equating possible and can deal with rotated test forms

The Rasch model

- Models the probability of responding correctly to an item as:

$$P_i(X_{ni} = 1) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}$$

- Likewise, the probability of NOT responding correctly is modelled as:

$$P_i(X_{ni} = 0) = \frac{1}{1 + \exp(\theta_n - \delta_i)}$$

where $\theta_n$ is the ability of person $n$ and $\delta_i$ is the difficulty of item $i$.
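As a minimal illustration of the two expressions above (a sketch in Python; the function name is ours, not from any ICCS tool):

```python
import numpy as np

def rasch_p_correct(theta, delta):
    """P(X = 1) under the Rasch model for ability theta and item difficulty delta."""
    return np.exp(theta - delta) / (1.0 + np.exp(theta - delta))

# An average person (theta = 0) on easy, medium and hard items
for delta in (-1.0, 0.0, 1.0):
    p1 = rasch_p_correct(0.0, delta)
    print(f"delta = {delta:+.1f}: P(correct) = {p1:.3f}, P(incorrect) = {1 - p1:.3f}")
```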

IRT curves

[Figure: item characteristic curves under the Rasch model]

How might we impute a reasonable proficiency value?

- Choose the proficiency that makes the score most likely:
  - Maximum Likelihood Estimate (MLE)
  - Weighted Likelihood Estimate (WLE)
- Choose the most likely proficiency for the score:
  - empirical Bayes
- Choose a selection of likely proficiencies for the score:
  - multiple imputations (plausible values)

Maximum Likelihood vs. Raw Score

[Figure: maximum likelihood proficiency estimates plotted against raw scores; axes: Score vs. Proficiency]

The Resulting Proficiency Distribution

[Figure: distributions of proficiency estimates for scores 0-6; x-axis: Proficiency on Logit Scale]

Characteristics of Maximum Likelihood Estimates (MLE)

- Unbiased at the individual level with sufficient information, BUT biased towards the ends of the ability scale
- An arbitrary treatment of perfect and zero scores is required
- The discrete scale and measurement error lead to bias in population parameter estimates

Characteristics of Weighted Likelihood Estimates (WLE)

- Less biased than MLE
- Provides estimates for perfect and zero scores
- BUT the discrete scale and measurement error still lead to bias in population parameter estimates

Plausible Values

- What are plausible values?
- Why do we use them?
- How to analyse plausible values?

Purpose of educational tests

- Measure particular students (minimise the measurement error of individual estimates)
- Assess populations (minimise the error when generalising to the population)

Posterior distributions for test scores on 6 dichotomous items

[Figure: posterior distributions for each possible test score (0-6)]

Empirical Bayes

- Expected A Posteriori estimates (EAP): the mean of each student's posterior distribution
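A small numerical sketch of such posteriors and the resulting EAP estimates, assuming a standard normal population prior and six Rasch items that all have difficulty 0 (all values here are illustrative):

```python
import numpy as np
from math import comb

theta = np.linspace(-4, 4, 401)          # proficiency grid
prior = np.exp(-theta**2 / 2)            # standard normal prior (unnormalised)
p = 1 / (1 + np.exp(-theta))             # P(correct) per item at difficulty 0

for score in range(7):
    # Binomial likelihood of this score given theta, combined with the prior
    likelihood = comb(6, score) * p**score * (1 - p)**(6 - score)
    posterior = prior * likelihood
    posterior /= posterior.sum()
    eap = (theta * posterior).sum()      # posterior mean = EAP estimate
    print(f"score {score}: EAP = {eap:+.2f}")
```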

Characteristics of EAPs

- Biased at the individual level, but population means are unbiased (NOT variances)
- The discrete scale, bias and measurement error lead to bias in population parameter estimates
- Requires assumptions about the distribution of proficiency in the population

Plausible Values

[Figure: plausible values drawn from the posterior distributions for scores 0-6; x-axis: Proficiency on Logit Scale]

Characteristics of Plausible Values

- Not fair at the student level
- Produces unbiased population parameter estimates, if the assumptions of the scaling are reasonable
- Requires assumptions about the distribution of proficiency

Estimating percentages below a benchmark with Plausible Values

[Figure: distribution of plausible values around the Level One cutpoint]

- The proportion of plausible values below the cut-point is a superior estimator of this percentage compared to estimates based on EAP, MLE or WLE values (see the sketch below)
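A minimal sketch of this estimator (the data, the number of students and the cutpoint are all illustrative):

```python
import numpy as np

# pv: five plausible values per student, shape (n_students, 5)
pv = np.random.default_rng(1).normal(0.0, 1.0, size=(1000, 5))
cutpoint = -1.0  # illustrative benchmark on the logit scale

# Estimate the proportion below the cutpoint separately for each PV,
# then average the five per-PV estimates
per_pv = (pv < cutpoint).mean(axis=0)
print(f"Estimated proportion below benchmark: {per_pv.mean():.3f}")
```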

Methodology of PVs

- Mathematically computing posterior distributions around test scores
- Drawing 5 random values for each assessed individual from the posterior distribution for that individual (see the sketch below)
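A toy illustration of the drawing step, assuming each student's posterior has already been summarised by a normal mean and standard deviation (real posteriors come from the IRT model combined with the conditioning model, and all numbers here are made up):

```python
import numpy as np

rng = np.random.default_rng(42)

# Posterior mean and standard deviation for three students (toy values)
post_mean = np.array([-0.4, 0.1, 0.9])
post_sd = np.array([0.35, 0.30, 0.40])

# Draw five plausible values per student from each posterior
pvs = rng.normal(post_mean[:, None], post_sd[:, None], size=(3, 5))
print(pvs.round(2))
```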

What is conditioning?

- Assuming a normal posterior distribution: $N(\mu, \sigma^2)$
- Modelling sub-populations: $N(\mu + \beta X, \sigma^2)$, with X = 0 for boys and X = 1 for girls
- Extended to further background variables: $N(\mu + \beta_1 X + \beta_2 Y + \beta_3 Z + \ldots, \sigma^2)$

[Figure: posterior distributions for sub-populations under the conditioning model]
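A schematic of what the sub-population model does, sketched in Python (purely illustrative; operationally, the latent regression is estimated jointly with the IRT model rather than from known abilities):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a girl indicator and abilities that differ between groups
girl = rng.integers(0, 2, size=2000)
theta = rng.normal(0.2 * girl, 1.0)

# Conditioning fits N(mu + beta*X, sigma^2), so posteriors (and hence the
# plausible values drawn from them) are centred correctly per sub-population
beta_hat = theta[girl == 1].mean() - theta[girl == 0].mean()
print(f"Estimated group effect: {beta_hat:.2f}")
```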

Conditioning Variables

- Plausible values should only be analysed together with data that were included in the conditioning (otherwise, results may be biased)
- Aim: maximise the information included in the conditioning, that is, use as many variables as possible
- To reduce the number of conditioning variables, factor scores from a principal component analysis were used in ICCS (see the sketch below)
- Using classroom dummies takes between-school variation into account (no inclusion of school or teacher questionnaire data is needed)
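A minimal sketch of that reduction step via numpy's SVD (the number of background variables and retained components are invented for illustration; they are not the ICCS values):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 40))       # 500 students, 40 standardised background variables (toy)
Xc = X - X.mean(axis=0)              # centre before extracting components

# Principal components via SVD; keep the first k component scores
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 10
factor_scores = Xc @ Vt[:k].T        # these replace the 40 raw variables
print(factor_scores.shape)           # (500, 10)
```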

Plausible values

- A model with conditioning variables improves the precision of the prediction of ability (population estimates ONLY)
- Conditioning provides unbiased estimates for modelled parameters
- Simulation studies comparing PVs, EAPs and WLEs show that:
  - population means give similar results
  - WLEs (or MLEs) tend to overestimate variances
  - EAPs tend to underestimate variances

Calculation of measurement error

- As in the TIMSS and PIRLS data files, there are five plausible values for the cognitive test scales in ICCS
- Using five plausible values enables researchers to obtain estimates of the measurement error

How to analyse PVs - 1

- The estimated mean is the AVERAGE of the means for the individual PVs:

$$\hat{\mu} = \frac{1}{M} \sum_{i=1}^{M} \hat{\mu}_i$$

- The sampling variance is the AVERAGE of the sampling variances for the individual PVs:

$$\sigma^2_{(\hat{\mu})} = \frac{1}{M} \sum_{i=1}^{M} \sigma^2_{(\hat{\mu}_i)}$$
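In code, these two averages are one line each; a sketch with invented numbers (in practice the per-PV sampling variances come from the replication method described later):

```python
import numpy as np

# Per-PV means and their sampling variances (toy values, M = 5)
pv_means = np.array([512.1, 509.8, 511.4, 510.6, 512.9])
pv_samp_var = np.array([4.1, 3.9, 4.3, 4.0, 4.2])

estimate = pv_means.mean()              # average of the five per-PV means
sampling_variance = pv_samp_var.mean()  # average of the five sampling variances
print(estimate, sampling_variance)
```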

How to analyse PVs - 2

- The measurement variance is computed as:

$$\sigma^2_{(PV)} = \frac{1}{M - 1} \sum_{i=1}^{M} (\hat{\mu}_i - \hat{\mu})^2$$

- The total standard error is computed from the measurement and sampling variance as:

$$\sigma_{(E)} = \sqrt{\sigma^2_{(\hat{\mu})} + \left(1 + \frac{1}{M}\right) \sigma^2_{(PV)}}$$
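Continuing the toy numbers from the previous sketch, the combination above in code:

```python
import numpy as np

pv_means = np.array([512.1, 509.8, 511.4, 510.6, 512.9])  # toy per-PV means
pv_samp_var = np.array([4.1, 3.9, 4.3, 4.0, 4.2])         # toy sampling variances
M = len(pv_means)

# Measurement (between-PV) variance
meas_var = np.sum((pv_means - pv_means.mean())**2) / (M - 1)

# Total standard error combining sampling and measurement variance
total_se = np.sqrt(pv_samp_var.mean() + (1 + 1/M) * meas_var)
print(f"measurement variance: {meas_var:.3f}, total SE: {total_se:.3f}")
```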

How to analyse PVs - 3

- $\hat{\mu}$ can be replaced by any statistic, for instance:
  - SD
  - percentile
  - correlation coefficient
  - regression coefficient
  - R-square
  - etc.

Steps for estimating both sampling and measurement error

- Compute the statistic for each PV for the fully weighted sample
- Compute the statistic for each PV for the 75 replicate samples
- Compute the sampling error (based on the previous steps)
- Compute the measurement error
- Combine the error variances to calculate the standard error (see the sketch below)
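The whole procedure compressed into one sketch, assuming the per-PV statistics for the full sample and for each of the 75 replicate samples have already been computed (all numbers are invented, and the factor applied to the squared replicate deviations depends on the replication design; consult the ICCS technical documentation for the operational formula):

```python
import numpy as np

rng = np.random.default_rng(7)
M, R = 5, 75

# full[i]    : statistic for PV i on the fully weighted sample (toy values)
# reps[i, r] : statistic for PV i on replicate sample r (toy values)
full = rng.normal(510, 2, size=M)
reps = full[:, None] + rng.normal(0, 0.25, size=(M, R))

# Sampling variance: replicate variability, averaged over the five PVs
samp_var = np.mean(np.sum((reps - full[:, None])**2, axis=1))

# Measurement variance: variability between the five per-PV statistics
meas_var = np.sum((full - full.mean())**2) / (M - 1)

# Combine both error components into a single standard error
total_se = np.sqrt(samp_var + (1 + 1/M) * meas_var)
print(f"estimate: {full.mean():.2f}, SE: {total_se:.2f}")
```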

Questions or comments?
