
GESIS Lecture Series (Vortragsreihe), Mannheim, July 7, 2015

The Vagaries of the Vignette World Revisited:


Method Effects in Factorial Surveys

Katrin Auspurg (Goethe University, Frankfurt)

July 7, 2015
Applications of Factorial Surveys in the Social Sciences

1951: Established by Peter H. Rossi in his dissertation


1982: First introductory textbook by Rossi and Nock
1982-2006: 106 studies published in core journals of sociology

Source: Thomson Web of Science
Overview

1. Idea and Motivation of Factorial Surveys


2. Methodological Research: Nothing More Than Artifacts?
3. Recommendations
Response Scales
Experimental Design

Factorial Surveys: Idea and Motivation

Combination of experiment and survey research: respondents evaluate short descriptions of objects or situations (vignettes). Within these descriptions, attributes (dimensions) are experimentally varied in their levels.
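The vignette universe implied here, every combination of dimension levels, can be sketched in a few lines of Python. The dimension names and levels below are purely illustrative, not the study's actual design:

```python
from itertools import product
import random

# Illustrative dimensions and levels (hypothetical, not from the talk)
dimensions = {
    "gender": ["male", "female"],
    "education": ["without degree", "vocational degree", "university degree"],
    "monthly earnings (EUR)": [1500, 2500, 3500],
}

# The vignette universe is the Cartesian product of all dimension levels
universe = [dict(zip(dimensions, combo)) for combo in product(*dimensions.values())]
print(len(universe))  # 2 * 3 * 3 = 18 vignettes

# In a survey, each respondent typically rates a small deck drawn from the universe
random.seed(1)
deck = random.sample(universe, k=5)
```

In practice the deck is not drawn at random per respondent but sampled by design, which is exactly the topic of the later slides.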
Research Aims and Advantages

Measuring influence of:


Dimensions on evaluations
Respondents' characteristics on vignette evaluations
Amount of social consensus across respondents.

Advantages:
Experimental design
Heterogeneous respondents
Less social desirability bias

Example: Discrimination of Female Employees?

Persisting pay gaps in the US and European countries.

Caused by discrimination?
Underlying mechanisms?

Results for Germany: Mean Justice Evaluations

Source: Project Factorial Survey Design, general population sample 2009

Just Gender Pay Gaps in %

[Bar chart, values in %. By respondent's gender: male 8.88, female 9.78. By vignette person's education: without degree 7.25, vocational degree 7.18, university degree 5.81.]
Hints on Underlying Mechanisms

Some evidence for double standards (e.g. education)


Evidence for priming processes: just gender pay gaps were more pronounced for respondents who experienced strong actual pay gaps in their own occupations.
Only little evidence for statistical discrimination (pay gaps were found to be very resistant to increasing amounts of information on employees' performance).
Or nothing more than artifacts?

The vagaries of the vignette world revisited

Criticism: A Test of Intelligence (Faia 1980)

The combination of dimensions often leads to implausible or even illogical cases.


This might cause respondents to rate the plausibility instead of the fairness.
(Respondents "may have been sophisticated enough to recognize functional untenabilities and to classify such vignettes as overpaid or underpaid largely on the basis of size of income alone"; Faia 1980: 952.)
Research on the effects of implausible vignettes showed that respondents pay less attention to the dimensions that cause the implausibility (Auspurg et al. 2009).
One should therefore exclude at least very implausible cases. But the remaining vignettes may still vary in their plausibility.
Research on construct validity is lacking (do respondents really rate fairness?).
Criticism: Untenable Results? (Markovsky/Eriksson 2012)

The indirect evaluations could lead to untenable, biased results with regard to fair earnings (M/E 2012: 199); one reason would be strong range and anchor effects.
Using direct measurements as a reference point (directly asking for fair amounts of earnings without providing anchors in the vignettes), M/E found indirect measurements to be completely unreliable:

[Scatter plot: median fair annual earnings per vignette; indirect measurement (100,000 USD) vs. direct measurement (100,000 USD); r = -.07 (without outlier: .25)]

Source: own figure based on M/E 2012; n = 20 vignettes / 45 students
But: is this really caused by indirect evaluations and anchor effects?

M/E actually tested only one very specific answer scale (a magnitude scale) and, in general, used very specific design features: e.g., random samples of vignettes in combination with Jasso's justice evaluation function (replication of Jasso/Meyersson Milgrom 2000).

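Jasso's justice evaluation function mentioned above maps an actual reward A and a just reward C onto a rating J = θ·ln(A/C); inverting it is how indirect, vignette-based ratings can be converted into implied fair earnings. A minimal sketch (θ = 1 is an assumption here; M/E's exact estimation procedure differs):

```python
import math

def justice_evaluation(actual, just, theta=1.0):
    # Jasso's justice evaluation function: J = theta * ln(actual / just).
    # J = 0: reward seen as just; J < 0: under-reward; J > 0: over-reward.
    return theta * math.log(actual / just)

def implied_just_reward(actual, rating, theta=1.0):
    # Inverting the function recovers the just reward implied by a rating.
    return actual / math.exp(rating / theta)

# A vignette person earning 40,000 who "should" earn 50,000 gets a negative rating:
J = justice_evaluation(40_000, 50_000)
assert J < 0

# Round trip: the implied just reward recovers the 50,000 benchmark
assert abs(implied_just_reward(40_000, J) - 50_000) < 1e-6
```

The logarithmic form is why range and anchor choices in the vignettes can propagate so strongly into the derived fair-earnings estimates.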
Revisiting Research with Split Ballot Experiments

Research project "The Factorial Survey as a Method for Measuring Attitudes in Population Surveys" (funded by the German Research Foundation (DFG), 2007-2010; principal investigators: Thomas Hinz & Stefan Liebig).

Split ballot experiments focusing on more than 60 different design features.


E.g., separate variation of:
response scales (rating, magnitude, direct questioning for amounts)
range of earnings presented in the vignettes
number of dimensions and vignettes, number of levels
order of vignettes and dimensions

Several online surveys with university students (in total n > 2,000 students)
[Several large-scale population surveys in Germany, Ukraine, Kyrgyzstan]
Simply Artifacts? Construct Validity and Anchor Effects

Survey with university students, summer term 2011 (n = 89)

2x2 experimental design:


Evaluation task: fairness | plausibility
Range of vignette income: low | high

Design and number of cases: respondents (vignette judgements)

                                        Evaluation task
                                        fairness      plausibility
Range of earnings    500 to 10,000      22 (438)      22 (438)
presented in
vignettes            750 to 15,000      23 (460)      22 (437)
Range and Anchor Effects: Reasons

One reason could be priming effects caused by the income values indicated in the vignettes (assimilation or contrast effects; M/E 2012; Groves et al. 2009).

Another reason could be respondents' tendency to use the whole response scale, including its endpoints (Tourangeau et al. 2000):

VP's income ranging from 500 to 10,000 euros
VP's income ranging from 750 to 15,000 euros
Simply Artifacts? Impact of Range (Anchors)

[Figure: mean justice evaluations with 95% CIs by vignette income (0 to 15,000 euros); y-axis runs from "unfairly low" through "fair" to "unfairly high"; separate lines for the low and the high earnings range; estimation with robust standard errors]

n = 89 respondents; n = 1,773 vignette judgments.
Simply Artefacts? Impact of Evaluation Task

[Figure: earnings in euros (0 to 9,000), comparing vignette earnings, fair earnings, plausible earnings, and real earnings]

n = 89 respondents, n = 1,773 vignette judgements
Reliability Revisited

Further experiment, comparing rating scales and direct evaluations; D-efficient sample of vignettes, simple cross-elasticities to calculate fair earnings; n = 68 university students.

Median fair earnings per vignette (annual/monthly):

[Two scatter plots of indirect vs. direct measurement:
M/E (2012): r = -.07 (without outlier: .25); n = 20 vignettes / 45 students; magnitude scale, random vignette sample, Jasso formula.
Own survey: r = .70***; n = 240 vignettes / 68 students; rating scale, D-efficient vignette sample, cross-elasticities.]
The design matters!
Recommendations on how to increase reliability and validity

Use Standard Response Scales

Example for a Magnitude Scale

Jasso 2006, fairness of incomes of CEOs:

Own Survey (Split Ballot)

Experiment with n = 448 university students; online survey; three-step answer scale: (a) fair or not? (b) if not: too high or too low? (c) if not: which number would indicate the amount of injustice?
Respondents didn't use magnitude scales as metric scales.

[Histogram: percent of responses (0 to 25) by rated magnitude of injustice, from 0 (fair) to 200 (unfairly paid)]

Source: Project Factorial Survey Design, student sample
Own Survey (Split Ballot)

Rating scales clearly outperformed magnitude scales (fewer missing values, fewer response sets, fewer implausible results).

                                 Rating    Magnitude
Proportion of missing values     3%        14%
Proportion of vignettes          23%       32%
evaluated as fair

Source: Project Factorial Survey Design, student sample

With magnitude scales we got implausible and unreliable (or untenable) results that resembled the findings of Markovsky/Eriksson (2012).
At least in self-administered surveys, we recommend using standard response scales.
Use smart experimental designs for sampling vignettes

Sampling Means Confounding

Sampling means reducing the number of vignettes (combinations of dimension levels).


As a result, some stimuli or regressors get confounded: they are perfectly correlated, meaning that their effects cannot be separated.
For example, what happens if one samples two vignettes from a 2x2 universe, i.e., 4 vignettes with two dimensions (D1, D2), both having 2 levels (+1/-1)?
[Figure: the two sampled designs, Sample A and Sample B]

Main effects get confounded with interaction terms. The data do not allow deciding which equation holds:
Y = β1·D1 + β2·D2
Y = β1·D1 + β2·D1·D2
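The confounding in the 2x2 example can be verified numerically. A small sketch (the two sample compositions are chosen to reproduce the problem; the slide's figure may show different draws):

```python
import numpy as np

# Full 2x2 vignette universe in +1/-1 coding; columns are D1, D2
universe = np.array([[+1, +1], [+1, -1], [-1, +1], [-1, -1]])

# Sample A: the two vignettes in which D2 is always +1
sample_a = universe[[0, 2]]
d1, d2 = sample_a[:, 0], sample_a[:, 1]
# The interaction D1*D2 equals D1 itself, so beta2*D2 (a constant here) and
# beta2*D1*D2 fit the data equally well: the effects cannot be separated.
assert np.array_equal(d1 * d2, d1)

# Sample B: the two vignettes in which D1 == D2
sample_b = universe[[0, 3]]
d1, d2 = sample_b[:, 0], sample_b[:, 1]
# D1 and D2 are identical (perfectly correlated), and D1*D2 is constant
# (confounded with the intercept) - the equations cannot be told apart.
assert np.array_equal(d1, d2)
assert (d1 * d2 == 1).all()
```

Any two-vignette sample from this universe produces one of these confounding patterns; only the full factorial separates both main effects and the interaction.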
Don't Leave Confounding to Chance!

With random samples, one leaves confounding patterns simply to chance (Kuhfeld et al. 1994; Carson et al. 2009).

In contrast, deliberately fractionalized, D-efficient samples allow controlling which effects get confounded.
At the same time, they allow optimizing further target criteria (such as the precision of parameter estimates): the higher the D-efficiency, the more precise the parameter estimates.
(With one sample being x times as efficient as another, one would need only about 1/x as many respondents to achieve the same precision of parameter estimates; see also Dülmer 2007, forthcoming.)

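D-efficiency can be computed directly from the design matrix. A minimal sketch using the normalized determinant criterion |X'X/n|^(1/p) (one common normalization; dedicated design software may scale it differently):

```python
from itertools import product

import numpy as np

def d_criterion(X):
    # Normalized D-criterion |X'X / n|^(1/p) for a +1/-1 coded design matrix.
    # For +1/-1 coding it is at most 1, reached by an orthogonal design.
    n, p = X.shape
    det = max(np.linalg.det(X.T @ X / n), 0.0)  # guard tiny negative round-off
    return det ** (1 / p)

# Full 2x2x2 factorial in effects (+1/-1) coding: orthogonal, hence optimal
full = np.array(list(product([-1, 1], repeat=3)), dtype=float)
assert abs(d_criterion(full) - 1.0) < 1e-9

# Random subsamples of 4 vignettes are usually worse (often even singular)
rng = np.random.default_rng(0)
sub = full[rng.choice(len(full), size=4, replace=False)]
assert d_criterion(sub) <= 1.0
```

A D-efficient search simply keeps the candidate subsample with the highest criterion value instead of accepting whatever a random draw delivers.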
Summary and Recommendations

Factorial survey experiments provide appealing possibilities for analyzing discrimination, testing theories, or producing meaningful measurements such as just gender pay gaps.
But: valid results require good (experimental) designs. Based on more than 60 split-ballot experiments, we derived several recommendations (Auspurg/Hinz 2015):
1. Use standard response scales.
2. Use fractionalized, D-efficient vignette samples.
3. Use a mid-level of complexity (seven plus/minus two dimensions).
4. Ensure a random allocation of vignettes and decks to respondents.
5. Prevent the occurrence of many implausible vignettes.
6. Consider the hierarchical data structure (cluster robust standard errors, multilevel analyses).

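Recommendation 6, respecting the hierarchical structure (several judgements nested within each respondent), can be implemented with cluster-robust standard errors. A minimal sketch of the cluster sandwich estimator on simulated data (variable names and the data-generating process are made up for illustration; in practice one would typically use a statistics package):

```python
import numpy as np

rng = np.random.default_rng(42)

def ols_cluster_se(X, y, clusters):
    # OLS point estimates with cluster-robust (sandwich) standard errors,
    # clustering on respondents: judgements by the same respondent are
    # not independent, so classical OLS standard errors would be too small.
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        Xg, ug = X[clusters == g], resid[clusters == g]
        score = Xg.T @ ug          # cluster-level score contribution
        meat += np.outer(score, score)
    V = bread @ meat @ bread
    return beta, np.sqrt(np.diag(V))

# Simulated data: 50 respondents rating 10 vignettes each
n_resp, n_vig = 50, 10
clusters = np.repeat(np.arange(n_resp), n_vig)
x = rng.normal(size=n_resp * n_vig)                 # one vignette dimension
u_resp = rng.normal(size=n_resp)[clusters]          # respondent-level error
y = 1.0 + 0.5 * x + u_resp + rng.normal(size=n_resp * n_vig)
X = np.column_stack([np.ones_like(x), x])

beta, se = ols_cluster_se(X, y, clusters)
```

Multilevel (mixed) models are the alternative named on the slide; the sandwich approach shown here only corrects the standard errors, not the point estimates.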
Thank you for your attention!

Further information:

https://study.sagepub.com/auspurg_hinz
http://www.fb03.uni-frankfurt.de/49649875/publikationen

References

Auspurg, K./Hinz, T. (2015): Factorial Survey Experiments. Thousand Oaks: Sage.
Auspurg, K./Hinz, T./Liebig, S. (2009): Complexity, learning effects and plausibility of vignettes in factorial surveys. Paper presented at the 104th Annual Meeting of the American Sociological Association (ASA), San Francisco.
Dülmer, H. (2007): Experimental plans in factorial surveys: Random or quota design? Sociological Methods & Research 35: 382-409.
Faia, M. (1980): The vagaries of the vignette world: A comment on Alves and Rossi. American Journal of Sociology 85(1): 951-954.
Groves, R. M./Fowler Jr., F. J./Couper, M. P./Lepkowski, J. M./Singer, E./Tourangeau, R. (2009): Survey Methodology. Hoboken, NJ: John Wiley & Sons.
Homans, G. C. (1974): Social Behavior: Its Elementary Forms. New York: Harcourt, Brace & World.
Jasso, G. (2006): Factorial survey methods for studying beliefs and judgments. Sociological Methods and Research 34(3): 334-423.
Jasso, G./Meyersson Milgrom, E. M. (2008): Distributive justice and CEO compensation. Acta Sociologica 51(2): 123-143.
Markovsky, B./Eriksson, K. (2012): Comparing direct and indirect measures of just rewards: What have we learned? Sociological Methods and Research 41(1): 240-245.
Rossi, P. H./Anderson, A. B. (1982): The factorial survey approach: An introduction. In: Rossi, P. H./Nock, S. L. (Eds.): Measuring Social Judgments: The Factorial Survey Approach. Beverly Hills: Sage: 15-67.
Tourangeau, R./Rips, L. J./Rasinski, K. A. (2000): The Psychology of Survey Response. Cambridge: Cambridge University Press.
Appendix: Censoring of Responses? Vignette Order Effects

Does the order of vignettes influence parameter estimates?
(random order vs. extreme cases first)

Source: own data, student sample

No differences between random order and extreme cases first!
If ceiling effects occur: use cluster-robust (!) Tobit or multilevel Tobit regressions.
Appendix: Simulations Efficiency by Sample Size

Correlations between dimensions are consistently lower for D-efficient samples, e.g.:
Mean correlations when sampling 50 vignettes out of a universe of 432 vignettes: D-eff.: .018; random: .071.
Maximum correlations when sampling 100 vignettes: D-eff.: .050; random: .190.

With D-efficient samples, far fewer respondents are needed for the same precision of parameter estimates, and one can control which effects get confounded.
