
Introduction to plausible values

National Research Coordinators Meeting


Madrid, February 2010

Content of presentation


- Rationale for scaling
- Rasch model and possible ability estimates
- Shortcomings of point estimates
- Drawing plausible values
- Computation of measurement error

Rationale for IRT scaling of data


- Summarising data instead of dealing with many single items
- Raw scores or percent correct are sample-dependent
- Makes equating possible and can deal with rotated test forms

The Rasch model

- Models the probability of responding correctly to an item as:

$$P_i(X_{ni} = 1) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}$$

- Likewise, the probability of NOT responding correctly is modelled as:

$$P_i(X_{ni} = 0) = \frac{1}{1 + \exp(\theta_n - \delta_i)}$$

where $\theta_n$ is the ability of person $n$ and $\delta_i$ is the difficulty of item $i$.
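As a minimal illustration of the two expressions above (a sketch in Python; the function name is ours, not from any ICCS tool):

```python
import numpy as np

def rasch_p_correct(theta, delta):
    """P(X = 1) under the Rasch model for ability theta and item difficulty delta."""
    return np.exp(theta - delta) / (1.0 + np.exp(theta - delta))

# An average person (theta = 0) on easy, medium and hard items
for delta in (-1.0, 0.0, 1.0):
    p1 = rasch_p_correct(0.0, delta)
    print(f"delta = {delta:+.1f}: P(correct) = {p1:.3f}, P(incorrect) = {1 - p1:.3f}")
```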

IRT curves

[Figure: item characteristic curves under the Rasch model]

How might we impute a reasonable proficiency value?

- Choose the proficiency that makes the score most likely:
  - Maximum Likelihood Estimate (MLE)
  - Weighted Likelihood Estimate (WLE)
- Choose the most likely proficiency for the score:
  - empirical Bayes
- Choose a selection of likely proficiencies for the score:
  - multiple imputations (plausible values)

Maximum Likelihood vs. Raw Score

[Figure: maximum likelihood proficiency estimates plotted against raw scores; axes: Score vs. Proficiency]

The Resulting Proficiency Distribution

[Figure: distributions of proficiency estimates for scores 0-6; x-axis: Proficiency on Logit Scale]

Characteristics of Maximum Likelihood Estimates (MLE)

- Unbiased at the individual level with sufficient information, BUT biased towards the ends of the ability scale
- An arbitrary treatment of perfect and zero scores is required
- The discrete scale and measurement error lead to bias in population parameter estimates

Characteristics of Weighted Likelihood Estimates (WLE)

- Less biased than MLE
- Provides estimates for perfect and zero scores
- BUT the discrete scale and measurement error still lead to bias in population parameter estimates

Plausible Values

- What are plausible values?
- Why do we use them?
- How to analyse plausible values?

Purpose of educational tests

- Measure particular students (minimise the measurement error of individual estimates)
- Assess populations (minimise the error when generalising to the population)

Posterior distributions for test scores on 6 dichotomous items

[Figure: posterior distributions for each possible test score (0-6)]

Empirical Bayes

- Expected A Posteriori estimates (EAP): the mean of each student's posterior distribution
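A small numerical sketch of such posteriors and the resulting EAP estimates, assuming a standard normal population prior and six Rasch items that all have difficulty 0 (all values here are illustrative):

```python
import numpy as np
from math import comb

theta = np.linspace(-4, 4, 401)          # proficiency grid
prior = np.exp(-theta**2 / 2)            # standard normal prior (unnormalised)
p = 1 / (1 + np.exp(-theta))             # P(correct) per item at difficulty 0

for score in range(7):
    # Binomial likelihood of this score given theta, combined with the prior
    likelihood = comb(6, score) * p**score * (1 - p)**(6 - score)
    posterior = prior * likelihood
    posterior /= posterior.sum()
    eap = (theta * posterior).sum()      # posterior mean = EAP estimate
    print(f"score {score}: EAP = {eap:+.2f}")
```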

Characteristics of EAPs

- Biased at the individual level, but population means are unbiased (NOT variances)
- The discrete scale, bias and measurement error lead to bias in population parameter estimates
- Requires assumptions about the distribution of proficiency in the population

Plausible Values

[Figure: plausible values drawn from the posterior distributions for scores 0-6; x-axis: Proficiency on Logit Scale]

Characteristics of Plausible Values

- Not fair at the student level
- Produces unbiased population parameter estimates, if the assumptions of the scaling are reasonable
- Requires assumptions about the distribution of proficiency

Estimating percentages below a benchmark with Plausible Values

[Figure: distribution of plausible values around the Level One cutpoint]

- The proportion of plausible values below the cut-point is a superior estimator of this percentage compared to estimates based on EAP, MLE or WLE values (see the sketch below)
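A minimal sketch of this estimator (the data, the number of students and the cutpoint are all illustrative):

```python
import numpy as np

# pv: five plausible values per student, shape (n_students, 5)
pv = np.random.default_rng(1).normal(0.0, 1.0, size=(1000, 5))
cutpoint = -1.0  # illustrative benchmark on the logit scale

# Estimate the proportion below the cutpoint separately for each PV,
# then average the five per-PV estimates
per_pv = (pv < cutpoint).mean(axis=0)
print(f"Estimated proportion below benchmark: {per_pv.mean():.3f}")
```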

Methodology of PVs

- Mathematically computing posterior distributions around test scores
- Drawing 5 random values for each assessed individual from the posterior distribution for that individual (see the sketch below)
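A toy illustration of the drawing step, assuming each student's posterior has already been summarised by a normal mean and standard deviation (real posteriors come from the IRT model combined with the conditioning model, and all numbers here are made up):

```python
import numpy as np

rng = np.random.default_rng(42)

# Posterior mean and standard deviation for three students (toy values)
post_mean = np.array([-0.4, 0.1, 0.9])
post_sd = np.array([0.35, 0.30, 0.40])

# Draw five plausible values per student from each posterior
pvs = rng.normal(post_mean[:, None], post_sd[:, None], size=(3, 5))
print(pvs.round(2))
```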

What is conditioning?

- Assuming a normal posterior distribution: $N(\mu, \sigma^2)$
- Modelling sub-populations: $N(\mu + \beta X, \sigma^2)$, with X = 0 for boys and X = 1 for girls
- Extended to further background variables: $N(\mu + \beta_1 X + \beta_2 Y + \beta_3 Z + \ldots, \sigma^2)$

[Figure: posterior distributions for sub-populations under the conditioning model]
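A schematic of what the sub-population model does, sketched in Python (purely illustrative; operationally, the latent regression is estimated jointly with the IRT model rather than from known abilities):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a girl indicator and abilities that differ between groups
girl = rng.integers(0, 2, size=2000)
theta = rng.normal(0.2 * girl, 1.0)

# Conditioning fits N(mu + beta*X, sigma^2), so posteriors (and hence the
# plausible values drawn from them) are centred correctly per sub-population
beta_hat = theta[girl == 1].mean() - theta[girl == 0].mean()
print(f"Estimated group effect: {beta_hat:.2f}")
```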

Conditioning Variables

- Plausible values should only be analysed together with data that were included in the conditioning (otherwise, results may be biased)
- Aim: maximise the information included in the conditioning, that is, use as many variables as possible
- To reduce the number of conditioning variables, factor scores from a principal component analysis were used in ICCS (see the sketch below)
- Using classroom dummies takes between-school variation into account (no inclusion of school or teacher questionnaire data is needed)
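A minimal sketch of that reduction step via numpy's SVD (the number of background variables and retained components are invented for illustration; they are not the ICCS values):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 40))       # 500 students, 40 standardised background variables (toy)
Xc = X - X.mean(axis=0)              # centre before extracting components

# Principal components via SVD; keep the first k component scores
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 10
factor_scores = Xc @ Vt[:k].T        # these replace the 40 raw variables
print(factor_scores.shape)           # (500, 10)
```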

Plausible values

- A model with conditioning variables improves the precision of the prediction of ability (population estimates ONLY)
- Conditioning provides unbiased estimates for modelled parameters
- Simulation studies comparing PVs, EAPs and WLEs show that:
  - population means give similar results
  - WLEs (or MLEs) tend to overestimate variances
  - EAPs tend to underestimate variances

Calculation of measurement error

- As in the TIMSS and PIRLS data files, there are five plausible values for the cognitive test scales in ICCS
- Using five plausible values enables researchers to obtain estimates of the measurement error

How to analyse PVs - 1

- The estimated mean is the AVERAGE of the means for the individual PVs:

$$\hat{\mu} = \frac{1}{M} \sum_{i=1}^{M} \hat{\mu}_i$$

- The sampling variance is the AVERAGE of the sampling variances for the individual PVs:

$$\sigma^2_{(\hat{\mu})} = \frac{1}{M} \sum_{i=1}^{M} \sigma^2_{(\hat{\mu}_i)}$$
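In code, these two averages are one line each; a sketch with invented numbers (in practice the per-PV sampling variances come from the replication method described later):

```python
import numpy as np

# Per-PV means and their sampling variances (toy values, M = 5)
pv_means = np.array([512.1, 509.8, 511.4, 510.6, 512.9])
pv_samp_var = np.array([4.1, 3.9, 4.3, 4.0, 4.2])

estimate = pv_means.mean()              # average of the five per-PV means
sampling_variance = pv_samp_var.mean()  # average of the five sampling variances
print(estimate, sampling_variance)
```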

How to analyse PVs - 2

- The measurement variance is computed as:

$$\sigma^2_{(PV)} = \frac{1}{M - 1} \sum_{i=1}^{M} (\hat{\mu}_i - \hat{\mu})^2$$

- The total standard error is computed from the measurement and sampling variance as:

$$\sigma_{(E)} = \sqrt{\sigma^2_{(\hat{\mu})} + \left(1 + \frac{1}{M}\right) \sigma^2_{(PV)}}$$
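Continuing the toy numbers from the previous sketch, the combination above in code:

```python
import numpy as np

pv_means = np.array([512.1, 509.8, 511.4, 510.6, 512.9])  # toy per-PV means
pv_samp_var = np.array([4.1, 3.9, 4.3, 4.0, 4.2])         # toy sampling variances
M = len(pv_means)

# Measurement (between-PV) variance
meas_var = np.sum((pv_means - pv_means.mean())**2) / (M - 1)

# Total standard error combining sampling and measurement variance
total_se = np.sqrt(pv_samp_var.mean() + (1 + 1/M) * meas_var)
print(f"measurement variance: {meas_var:.3f}, total SE: {total_se:.3f}")
```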

How to analyse PVs - 3

- $\hat{\mu}$ can be replaced by any statistic, for instance:
  - SD
  - percentile
  - correlation coefficient
  - regression coefficient
  - R-square
  - etc.

Steps for estimating both sampling and measurement error

- Compute the statistic for each PV for the fully weighted sample
- Compute the statistic for each PV for the 75 replicate samples
- Compute the sampling error (based on the previous steps)
- Compute the measurement error
- Combine the error variances to calculate the standard error (see the sketch below)
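The whole procedure compressed into one sketch, assuming the per-PV statistics for the full sample and for each of the 75 replicate samples have already been computed (all numbers are invented, and the factor applied to the squared replicate deviations depends on the replication design; consult the ICCS technical documentation for the operational formula):

```python
import numpy as np

rng = np.random.default_rng(7)
M, R = 5, 75

# full[i]    : statistic for PV i on the fully weighted sample (toy values)
# reps[i, r] : statistic for PV i on replicate sample r (toy values)
full = rng.normal(510, 2, size=M)
reps = full[:, None] + rng.normal(0, 0.25, size=(M, R))

# Sampling variance: replicate variability, averaged over the five PVs
samp_var = np.mean(np.sum((reps - full[:, None])**2, axis=1))

# Measurement variance: variability between the five per-PV statistics
meas_var = np.sum((full - full.mean())**2) / (M - 1)

# Combine both error components into a single standard error
total_se = np.sqrt(samp_var + (1 + 1/M) * meas_var)
print(f"estimate: {full.mean():.2f}, SE: {total_se:.2f}")
```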

Questions or comments?
