
GESIS Lecture Series (Vortragsreihe), Mannheim, July 7, 2015

The Vagaries of the Vignette World Revisited:


Method Effects in Factorial Surveys

Katrin Auspurg (Goethe University, Frankfurt)

July 7, 2015
Applications of Factorial Surveys in the Social Sciences

1951: Established by Peter H. Rossi in his dissertation


1982: First introductory textbook by Rossi and Nock
1982-2006: 106 studies published in core journals of sociology

Source: Thomson Web of Science
Overview

1. Idea and Motivation of Factorial Surveys


2. Methodological Research: Nothing More Than Artifacts?
3. Recommendations
Response Scales
Experimental Design

Factorial Surveys: Idea and Motivation

Combination of experiment and survey research: respondents evaluate short descriptions of objects or situations (vignettes). Within these descriptions, attributes (dimensions) are experimentally varied in their levels.
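The vignette universe implied here, every combination of dimension levels, can be sketched in a few lines of Python. The dimension names and levels below are purely illustrative, not the study's actual design:

```python
from itertools import product
import random

# Illustrative dimensions and levels (hypothetical, not from the talk)
dimensions = {
    "gender": ["male", "female"],
    "education": ["without degree", "vocational degree", "university degree"],
    "monthly earnings (EUR)": [1500, 2500, 3500],
}

# The vignette universe is the Cartesian product of all dimension levels
universe = [dict(zip(dimensions, combo)) for combo in product(*dimensions.values())]
print(len(universe))  # 2 * 3 * 3 = 18 vignettes

# In a survey, each respondent typically rates a small deck drawn from the universe
random.seed(1)
deck = random.sample(universe, k=5)
```

In practice the deck is not drawn at random per respondent but sampled by design, which is exactly the topic of the later slides.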
Research Aims and Advantages

Measuring influence of:


Dimensions on evaluations
Respondents' characteristics on vignette evaluations
Amount of social consensus across respondents.

Advantages:
Experimental design
Heterogeneous respondents
Less social desirability bias

Example: Discrimination of Female Employees?

Persisting pay gaps in the US and European countries.

Caused by discrimination?
Underlying mechanisms?

Results for Germany: Mean Justice Evaluations

Source: Project Factorial Survey Design, general population sample 2009

Just Gender Pay Gaps in %

[Bar chart, values in %. By respondent's gender: male 8.88, female 9.78. By vignette person's education: without degree 7.25, vocational degree 7.18, university degree 5.81.]
Hints on Underlying Mechanisms

Some evidence for double standards (e.g. education)


Evidence for priming processes: just gender pay gaps were more pronounced for respondents who experienced strong actual pay gaps in their own occupations.
Only little evidence for statistical discrimination (pay gaps were found to be very resistant to increasing amounts of information on employees' performance).
Or nothing more than artifacts?

The vagaries of the vignette world revisited

Criticism: A Test of Intelligence (Faia 1980)

The combination of dimensions often leads to implausible or even illogical cases.


This might cause respondents to rate the plausibility instead of the fairness.
(Respondents "may have been sophisticated enough to recognize functional untenabilities and to classify such vignettes as overpaid or underpaid largely on the basis of size of income alone"; Faia 1980: 952.)
Research on the effects of implausible vignettes showed that respondents pay less attention to the dimensions that cause the implausibility (Auspurg et al. 2009).
One should therefore exclude at least very implausible cases. But the remaining vignettes may still vary in their plausibility.
Research on construct validity is lacking (do respondents really rate fairness?).
Criticism: Untenable Results? (Markovsky/Eriksson 2012)

The indirect evaluations could lead to untenable, biased results with regard to fair earnings (M/E 2012: 199); one reason would be strong range and anchor effects.
Using direct measurements as a reference point (directly asking for fair amounts of earnings without providing anchors in the vignettes), M/E found indirect measurements to be completely unreliable:

[Scatter plot: median fair annual earnings per vignette; indirect measurement (100,000 USD) vs. direct measurement (100,000 USD); r = -.07 (without outlier: .25)]

Source: own figure based on M/E 2012; n = 20 vignettes / 45 students
But: is this really caused by indirect evaluations and anchor effects?

M/E actually tested only one very specific answer scale (a magnitude scale) and, in general, used very specific design features: e.g., random samples of vignettes in combination with Jasso's justice evaluation function (replication of Jasso/Meyersson Milgrom 2000).

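Jasso's justice evaluation function mentioned above maps an actual reward A and a just reward C onto a rating J = θ·ln(A/C); inverting it is how indirect, vignette-based ratings can be converted into implied fair earnings. A minimal sketch (θ = 1 is an assumption here; M/E's exact estimation procedure differs):

```python
import math

def justice_evaluation(actual, just, theta=1.0):
    # Jasso's justice evaluation function: J = theta * ln(actual / just).
    # J = 0: reward seen as just; J < 0: under-reward; J > 0: over-reward.
    return theta * math.log(actual / just)

def implied_just_reward(actual, rating, theta=1.0):
    # Inverting the function recovers the just reward implied by a rating.
    return actual / math.exp(rating / theta)

# A vignette person earning 40,000 who "should" earn 50,000 gets a negative rating:
J = justice_evaluation(40_000, 50_000)
assert J < 0

# Round trip: the implied just reward recovers the 50,000 benchmark
assert abs(implied_just_reward(40_000, J) - 50_000) < 1e-6
```

The logarithmic form is why range and anchor choices in the vignettes can propagate so strongly into the derived fair-earnings estimates.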
Revisiting Research with Split Ballot Experiments

Research project "The Factorial Survey as a Method for Measuring Attitudes in Population Surveys" (funded by the German Research Foundation (DFG), 2007-2010; principal investigators: Thomas Hinz & Stefan Liebig).

Split ballot experiments focusing on more than 60 different design features.


E.g., separate variation of:
response scales (rating, magnitude, direct questioning for amounts)
range of earnings presented in the vignettes
number of dimensions and vignettes, number of levels
order of vignettes and dimensions

Several online surveys with university students (in total n > 2,000 students)
[Several large-scale population surveys in Germany, Ukraine, Kyrgyzstan]
Simply Artifacts? Construct Validity and Anchor Effects

Survey with university students, summer term 2011 (n = 89)

2x2 experimental design:


Evaluation task: fairness | plausibility
Range of vignette income: low | high

Design and number of cases: respondents (vignette judgements)

                                        Evaluation task
                                        fairness      plausibility
Range of earnings    500 to 10,000      22 (438)      22 (438)
presented in
vignettes            750 to 15,000      23 (460)      22 (437)
Range and Anchor Effects: Reasons

One reason could be priming effects caused by the income values indicated in the vignettes (assimilation or contrast effects; M/E 2012; Groves et al. 2009).

Another reason could be respondents' tendency to use the whole response scale, including its endpoints (Tourangeau et al. 2000):

VP's income ranging from 500 to 10,000 euros
VP's income ranging from 750 to 15,000 euros
Simply Artifacts? Impact of Range (Anchors)

[Figure: mean justice evaluations with 95% CIs by vignette income (0 to 15,000 euros); y-axis runs from "unfairly low" through "fair" to "unfairly high"; separate lines for the low and the high earnings range; estimation with robust standard errors]

n = 89 respondents; n = 1,773 vignette judgments.
Simply Artefacts? Impact of Evaluation Task

[Figure: earnings in euros (0 to 9,000), comparing vignette earnings, fair earnings, plausible earnings, and real earnings]

n = 89 respondents, n = 1,773 vignette judgements
Reliability Revisited

Further experiment, comparing rating scales and direct evaluations; D-efficient sample of vignettes, simple cross-elasticities to calculate fair earnings; n = 68 university students.

Median fair earnings per vignette (annual/monthly):

[Two scatter plots of indirect vs. direct measurement:
M/E (2012): r = -.07 (without outlier: .25); n = 20 vignettes / 45 students; magnitude scale, random vignette sample, Jasso formula.
Own survey: r = .70***; n = 240 vignettes / 68 students; rating scale, D-efficient vignette sample, cross-elasticities.]
The design matters!
Recommendations on how to increase reliability and validity

Use Standard Response Scales

Example for a Magnitude Scale

Jasso 2006, fairness of incomes of CEOs:

Own Survey (Split Ballot)

Experiment with n = 448 university students; online survey; three-step answer scale: (a) fair or not? (b) if not: too high or too low? (c) if not: which number would indicate the amount of injustice?
Respondents didn't use magnitude scales as metric scales.

[Histogram: percent of responses (0 to 25) by rated magnitude of injustice, from 0 (fair) to 200 (unfairly paid)]

Source: Project Factorial Survey Design, student sample
Own Survey (Split Ballot)

Rating scales clearly outperformed magnitude scales (fewer missing values, fewer response sets, fewer implausible results).

                                 Rating    Magnitude
Proportion of missing values     3%        14%
Proportion of vignettes          23%       32%
evaluated as fair

Source: Project Factorial Survey Design, student sample

With magnitude scales we got implausible and unreliable (or untenable) results that resembled the findings of Markovsky/Eriksson (2012).
At least in self-administered surveys, we recommend using standard response scales.
Use smart experimental designs for sampling vignettes

Sampling Means Confounding

Sampling means reducing the number of vignettes (combinations of dimension levels).


As a result, some stimuli or regressors get confounded: they are perfectly correlated, meaning that their effects cannot be separated.
For example, what happens if one samples two vignettes from a 2x2 universe, i.e., 4 vignettes with two dimensions (D1, D2), both having 2 levels (+1/-1)?
[Figure: the two sampled designs, Sample A and Sample B]

Main effects get confounded with interaction terms. The data do not allow deciding which equation holds:
Y = β1·D1 + β2·D2
Y = β1·D1 + β2·D1·D2
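The confounding in the 2x2 example can be verified numerically. A small sketch (the two sample compositions are chosen to reproduce the problem; the slide's figure may show different draws):

```python
import numpy as np

# Full 2x2 vignette universe in +1/-1 coding; columns are D1, D2
universe = np.array([[+1, +1], [+1, -1], [-1, +1], [-1, -1]])

# Sample A: the two vignettes in which D2 is always +1
sample_a = universe[[0, 2]]
d1, d2 = sample_a[:, 0], sample_a[:, 1]
# The interaction D1*D2 equals D1 itself, so beta2*D2 (a constant here) and
# beta2*D1*D2 fit the data equally well: the effects cannot be separated.
assert np.array_equal(d1 * d2, d1)

# Sample B: the two vignettes in which D1 == D2
sample_b = universe[[0, 3]]
d1, d2 = sample_b[:, 0], sample_b[:, 1]
# D1 and D2 are identical (perfectly correlated), and D1*D2 is constant
# (confounded with the intercept) - the equations cannot be told apart.
assert np.array_equal(d1, d2)
assert (d1 * d2 == 1).all()
```

Any two-vignette sample from this universe produces one of these confounding patterns; only the full factorial separates both main effects and the interaction.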
Don't Leave Confounding to Chance!

With random samples, one leaves confounding patterns simply to chance (Kuhfeld et al. 1994; Carson et al. 2009).

In contrast, deliberately fractionalized, D-efficient samples allow controlling which effects get confounded.
At the same time, they allow optimizing further target criteria (such as the precision of parameter estimates): the higher the D-efficiency, the more precise the parameter estimates.
(With one sample being x times as efficient as another, one would need only about 1/x as many respondents to achieve the same precision of parameter estimates; see also Dülmer 2007, forthcoming.)

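D-efficiency can be computed directly from the design matrix. A minimal sketch using the normalized determinant criterion |X'X/n|^(1/p) (one common normalization; dedicated design software may scale it differently):

```python
from itertools import product

import numpy as np

def d_criterion(X):
    # Normalized D-criterion |X'X / n|^(1/p) for a +1/-1 coded design matrix.
    # For +1/-1 coding it is at most 1, reached by an orthogonal design.
    n, p = X.shape
    det = max(np.linalg.det(X.T @ X / n), 0.0)  # guard tiny negative round-off
    return det ** (1 / p)

# Full 2x2x2 factorial in effects (+1/-1) coding: orthogonal, hence optimal
full = np.array(list(product([-1, 1], repeat=3)), dtype=float)
assert abs(d_criterion(full) - 1.0) < 1e-9

# Random subsamples of 4 vignettes are usually worse (often even singular)
rng = np.random.default_rng(0)
sub = full[rng.choice(len(full), size=4, replace=False)]
assert d_criterion(sub) <= 1.0
```

A D-efficient search simply keeps the candidate subsample with the highest criterion value instead of accepting whatever a random draw delivers.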
Summary and Recommendations

Factorial survey experiments provide appealing possibilities for analyzing discrimination, testing theories, or producing meaningful measurements such as just gender pay gaps.
But: valid results require good (experimental) designs. Based on more than 60 split-ballot experiments, we derived several recommendations (Auspurg/Hinz 2015):
1. Use standard response scales.
2. Use fractionalized, D-efficient vignette samples.
3. Use a mid-level of complexity (seven plus/minus two dimensions).
4. Ensure a random allocation of vignettes and decks to respondents.
5. Prevent the occurrence of many implausible vignettes.
6. Consider the hierarchical data structure (cluster robust standard errors, multilevel analyses).

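Recommendation 6, respecting the hierarchical structure (several judgements nested within each respondent), can be implemented with cluster-robust standard errors. A minimal sketch of the cluster sandwich estimator on simulated data (variable names and the data-generating process are made up for illustration; in practice one would typically use a statistics package):

```python
import numpy as np

rng = np.random.default_rng(42)

def ols_cluster_se(X, y, clusters):
    # OLS point estimates with cluster-robust (sandwich) standard errors,
    # clustering on respondents: judgements by the same respondent are
    # not independent, so classical OLS standard errors would be too small.
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(clusters):
        Xg, ug = X[clusters == g], resid[clusters == g]
        score = Xg.T @ ug          # cluster-level score contribution
        meat += np.outer(score, score)
    V = bread @ meat @ bread
    return beta, np.sqrt(np.diag(V))

# Simulated data: 50 respondents rating 10 vignettes each
n_resp, n_vig = 50, 10
clusters = np.repeat(np.arange(n_resp), n_vig)
x = rng.normal(size=n_resp * n_vig)                 # one vignette dimension
u_resp = rng.normal(size=n_resp)[clusters]          # respondent-level error
y = 1.0 + 0.5 * x + u_resp + rng.normal(size=n_resp * n_vig)
X = np.column_stack([np.ones_like(x), x])

beta, se = ols_cluster_se(X, y, clusters)
```

Multilevel (mixed) models are the alternative named on the slide; the sandwich approach shown here only corrects the standard errors, not the point estimates.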
Thank you for your attention!

Further information:

https://study.sagepub.com/auspurg_hinz
http://www.fb03.uni-frankfurt.de/49649875/publikationen

References

Auspurg, K./Hinz, T. (2015): Factorial Survey Experiments. Thousand Oaks: Sage.
Auspurg, K./Hinz, T./Liebig, S. (2009): Complexity, learning effects and plausibility of vignettes in factorial surveys. Paper presented at the 104th Annual Meeting of the American Sociological Association (ASA), San Francisco.
Dülmer, H. (2007): Experimental plans in factorial surveys: Random or quota design? Sociological Methods & Research 35: 382-409.
Faia, M. (1980): The vagaries of the vignette world: A comment on Alves and Rossi. American Journal of Sociology 85(1): 951-954.
Groves, R. M./Fowler Jr., F. J./Couper, M. P./Lepkowski, J. M./Singer, E./Tourangeau, R. (2009): Survey Methodology. Hoboken, NJ: John Wiley & Sons.
Homans, G. C. (1974): Social Behavior: Its Elementary Forms. New York: Harcourt, Brace & World.
Jasso, G. (2006): Factorial survey methods for studying beliefs and judgments. Sociological Methods and Research 34(3): 334-423.
Jasso, G./Meyersson Milgrom, E. M. (2008): Distributive justice and CEO compensation. Acta Sociologica 51(2): 123-143.
Markovsky, B./Eriksson, K. (2012): Comparing direct and indirect measures of just rewards: What have we learned? Sociological Methods and Research 41(1): 240-245.
Rossi, P. H./Anderson, A. B. (1982): The factorial survey approach: An introduction. In: Rossi, P. H./Nock, S. L. (Eds.): Measuring Social Judgments: The Factorial Survey Approach. Beverly Hills: Sage: 15-67.
Tourangeau, R./Rips, L. J./Rasinski, K. A. (2000): The Psychology of Survey Response. Cambridge: Cambridge University Press.
Appendix: Censoring of Responses? Vignette Order Effects

Does the order of vignettes influence parameter estimates?
(random order vs. extreme cases first)

Source: own data, student sample

No differences between random order and extreme cases first!
If ceiling effects occur: use cluster-robust (!) Tobit or multilevel Tobit regressions.
Appendix: Simulations Efficiency by Sample Size

Correlations between dimensions are consistently lower for D-efficient samples, e.g.:
Mean correlations when sampling 50 vignettes out of a universe of 432 vignettes: D-eff.: .018; random: .071.
Maximum correlations when sampling 100 vignettes: D-eff.: .050; random: .190.

With D-efficient samples, far fewer respondents are needed for the same precision of parameter estimates, and one can control which effects get confounded.
