
Visual Design Effects on Respondents' Behavior in Web-Surveys.
An Experimental Comparison of Different Input Controls for Scalar Questions and Visual Analog Scales, and a Software Solution.
Dissertation
submitted in fulfilment of the requirements for the academic degree of
Doctor of Social and Economic Sciences (Doktor der Sozial- und Wirtschaftswissenschaften)
at the Faculty of Political Science and Sociology
of the University of Innsbruck
submitted by
DI Mag. Albert Greinöcker
First assessor: Prof. Dr. Gilg Seeber
Second assessor: Prof. Dr. Martin Weichbold
Innsbruck, June 2009
Abstract
The field of visual design effects on respondents' behavior in online surveys is well researched; however, in several cases the outcomes of different studies have provided contradictory findings. In this thesis, the focus will be on experiments dealing mainly with Visual Analogue Scales (VAS) and rating scales. A VAS is an instrument that tries to measure a characteristic or attitude that is believed to range across a continuum of values and is verbally anchored at each end (e.g. strongly agree vs. strongly disagree as such anchors). In survey research, the use of VAS is relatively rare, in part because of operational difficulties (Couper et al. 2006). Hence a detailed view on technical possibilities and pitfalls should be given.
Three main studies with the same experimental design (ensuring that occurring effects were reproducible) were carried out, whereby six different types of such scales were presented to the interviewees in order to measure the effect of varying appearance and functionality of the controls used for implementing the scales. To run these experiments, software was developed that focuses on good support for Web survey experimentation. The results refer to the general fill-out behavior, completion time, dropout, reliability and usage of extreme points.
Acknowledgements
First, I would like to express my most sincere gratitude towards my academic advisor, Professor Gilg Seeber, for his continuous support throughout the Ph.D. program. I deeply appreciate his constructive advice and input, his time, and ongoing patience. The regular discussions with him have definitely benefited me and have greatly motivated me to move forward during the period of dissertation writing. I would also like to thank Professor Martin Weichbold from Salzburg University for his methodological input and constructive criticism.
Furthermore, I would like to thank Professor Hermann Denz (†), Professor Brigitte Mayer and Egon Niederacher from the University of Applied Sciences Vorarlberg, who supported the development and dissemination of the software.
In addition, I would like to thank Mag.ra Annabell Marinell for proofreading the whole work.
Last but not least, I would like to thank the Innsbruck University IT-Service center for providing the infrastructure for the experiments.
Affirmation
I hereby affirm that I wrote the present thesis without any inadmissible help from a third party and without using any aids other than those indicated. Thoughts that were taken directly or indirectly from other sources are indicated as such. This thesis has not been presented to any other examination board in this or a similar form, neither in Austria nor in any other country.

Albert Greinöcker
Contents
1 Introduction 9
2 Terminology 12
2.1 Online Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.1.1 Operating Fields and Organizations . . . . . . . . . . . . . . . . . . . . . 14
I Current Literature 15
3 Introduction 16
4 Effects on Visual Analog Scales, Slider Scales, Categorical Scales 17
4.1 Completion Rate/Breakoffs and Missing Data . . . . . . . . . . . . . . . . . . . . 19
4.2 Different Response Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.3 Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.4 Categorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.5 Use of Midpoint and Extremes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.6 Actual Position Feedback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.7 Spacing, Positioning, Disproportion, Shading, Labelling . . . . . . . . . . . . . . 23
4.8 Response Time / Completion Time . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.9 Feedback Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
5 General Visual Effects in Online Questionnaires 25
5.1 Paging versus Scrolling (All-in-one or Screen-by-screen) . . . . . . . . . . . . . . 25
5.2 Fancy vs. Plain Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
5.3 Different Alternatives Representation in Closed-ended Questions . . . . . . . . . . 26
5.3.1 Check-all-that-apply vs. Forced-choice . . . . . . . . . . . . . . . . . . . . 26
5.3.2 Grouping of Alternatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
5.3.3 Double or Triple Banking . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
5.3.4 Response Order Manipulations . . . . . . . . . . . . . . . . . . . . . . . . 29
5.3.5 Different HTML-Controls Used . . . . . . . . . . . . . . . . . . . . . . . . . 30
5.4 Color Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
5.5 Modifications on Input Fields for Open-ended Questions . . . . . . . . . . . . . . 32
5.5.1 Different Size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5.5.2 Lines in Answer Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
5.6 Influence of Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
5.6.1 Personalization - Virtual Interviewer Effects . . . . . . . . . . . . . . . . . 34
5.7 Heaping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.8 Additional Content and Visual Hints . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.8.1 Placement of Instructions and Help Texts . . . . . . . . . . . . . . . . . . 35
5.9 Date Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
5.10 Progress Bar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
5.11 Sponsor's Logo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
6 Non-Visual Design Experiments in Online Surveys 40
6.1 Incentives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
6.2 Invitation and First Contact . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
6.3 Different Welcome Screens . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
6.4 Length of the Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
6.5 Time to Complete-Statement at the Beginning . . . . . . . . . . . . . . . . . . . 42
7 Outlook 44
7.1 Dynamic Forms, AJAX, WEB 2.0 . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
II Theoretical Background 46
8 Introduction 47
9 Methodological Theories 48
9.1 Total Survey Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
9.1.1 Respondent Selection Issues . . . . . . . . . . . . . . . . . . . . . . . . . . 49
9.1.2 Response Accuracy Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
9.1.3 Measurement Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
9.1.4 Statistical Impact of Error . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
9.2 Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
9.3 Validity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
9.4 Mode Eects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
10 Psychological Theories 67
10.1 The Response Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
10.2 Visual Interpretive Heuristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
10.3 Gestalt Principles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
10.3.1 Principle of Proximity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
10.3.2 Principle of Similarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
10.3.3 Principle of Pragnanz . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
10.4 Types of Respondents in Web Surveys . . . . . . . . . . . . . . . . . . . . . . . . 69
III Experiments Conducted by the Author 71
11 Description of the Experiments 72
11.1 General Experimental Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
11.2 Different Input Controls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
11.2.1 Radio Button Scale (radio) . . . . . . . . . . . . . . . . . . . . . . . . . . 75
11.2.2 Empty Button Scale (button) . . . . . . . . . . . . . . . . . . . . . . . . . 76
11.2.3 Click-VAS (click-VAS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
11.2.4 Slider-VAS (slider-VAS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
11.2.5 Text Input Field (text) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
11.2.6 Dropdown Menu (dropdown) . . . . . . . . . . . . . . . . . . . . . . . . . 80
11.2.7 Differences and Similarities . . . . . . . . . . . . . . . . . . . . . . . . . . 81
11.2.8 Technical Preconditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
11.3 Specic Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
12 Research Questions 85
13 Overall Response 87
13.1 Overall Input Control Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . 87
14 Paradata and Additional Information 89
14.1 Overall Operating System Distribution . . . . . . . . . . . . . . . . . . . . . . . . 90
14.2 Overall Browser Agents Distribution . . . . . . . . . . . . . . . . . . . . . . . . . 90
14.3 Overall Screen Resolution Distribution . . . . . . . . . . . . . . . . . . . . . . . . 91
14.4 Overall Distribution of Additional Browser Settings . . . . . . . . . . . . . . . . . 91
14.5 Summary and Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
15 Demographic Information 93
15.1 Tourism Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
15.2 Webpage Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
15.3 Snowboard Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
15.4 Differences in Demographic Distributions Across Browser and OS-Versions . . . . 97
15.5 Summary and Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
16 Feedback Questions / Subjective Evaluation 98
16.1 Boring vs. Interesting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
16.2 Sufficient vs. Non-Sufficient Number of Scale Intervals . . . . . . . . . . . . . . . 99
16.3 Easy to Use vs. Complicated . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
16.4 Summary and Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
17 Response Time / Completion Time 105
17.1 Standardization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
17.2 Outlier Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
17.2.1 Robust Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
17.3 Learning Effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
17.4 Summary and Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
18 Dropout 114
18.1 Tourism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
18.2 Webpage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
18.3 Snowboard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
18.4 Summary and Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
19 Response Distributions 122
19.1 Comparison by Mean Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
19.2 Compare the Distributions within the Categories . . . . . . . . . . . . . . . . . . 122
19.3 Midpoint Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
19.4 Analysis per Question Battery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
19.5 Summary and Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
20 A Closer Look at the VAS Distributions 128
20.1 Distributions of the VAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
20.2 Categorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
20.3 Summary and Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
IV Software Implemented 136
21 Introduction 137
21.1 Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
21.2 General Software Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
21.3 Overview of Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
21.4 Supported Question Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
21.5 Editor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
21.6 Technical Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
21.6.1 Technical Preconditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
22 Software Architecture 144
22.1 QSYS-core . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
22.1.1 Questionnaire Items . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
22.1.2 Answers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
22.1.3 Paradata Tracking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
22.1.4 Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
22.1.5 Exporting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
22.2 StruXSLT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
22.2.1 Basic Functionality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
22.2.2 Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
22.2.3 Action Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
22.2.4 Mapper Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
22.2.5 Language Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
22.3 QSYS-Web . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
22.3.1 Class Hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
22.3.2 Configuration and Installation . . . . . . . . . . . . . . . . . . . . . . . . . 163
22.3.3 Additional Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
22.4 Utility Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
22.5 Additional Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
22.5.1 Quality Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
22.5.2 Software Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
22.5.3 Schema RFC for Questionnaires (and Answer Documents) . . . . . . . . . 166
23 Additional Tasks to be Implemented 167
23.0.4 Federated Identity Based Authentication and Authorization . . . . . . . . 167
23.0.5 R Reporting Server . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
23.0.6 Observation Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
23.0.7 Accessibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
24 Evaluation of the Software 169
25 Existing Open Source Online Survey Tools 171
25.1 Limesurvey (formerly PHPSurveyor) . . . . . . . . . . . . . . . . . . . . . . . . . 171
25.2 FlexSurvey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
25.3 Mod_survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
25.4 MySurveyServer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
25.5 phpESP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
25.6 Rapid Survey Tool (formerly Rostock Survey Tool) . . . . . . . . . . . . . . . . . 173
25.7 Additional Web Survey Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
25.7.1 Tools for HTML Form Processing . . . . . . . . . . . . . . . . . . . . . . . 173
25.7.2 Experiment Supporting Frameworks . . . . . . . . . . . . . . . . . . . . . 174
25.7.3 Tools for Retrieving Paradata . . . . . . . . . . . . . . . . . . . . . . . . . 174
25.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
1 Introduction
Internet-based (or simply online) research in general has recently become a new and important topic for empirical, psychological and related research. Therefore, the importance of new insights in this field, as well as of tools for conducting online research projects, has increased. Within the scope of this thesis, work was done on both aspects:
Experiments On the one hand, experiments dealing with online surveys were conducted, in which the effects of different visual designs on respondents in Web surveys were examined. Concretely, different input controls for scalar questions were given to the respondents to see whether any effects emerge from different points of view. In online surveys, the degree of freedom concerning visual design is much higher than for the usual paper-and-pencil questionnaires, and therefore a closer examination of this specific phenomenon is important. In addition, since experiments with mixed-mode studies proved that there were differences between these modes [1], it was partly tested whether insights already confirmed for paper-and-pencil questionnaires are also valid for online questionnaires. Paper-and-pencil questionnaires are only relevant in regard to these questions; comparisons of online and offline questionnaires or other mixed-mode designs are not part of this thesis. Sample selection, a very frequently published topic for online questionnaires, as well as the effects of incentives, pre-notification of potential respondents, and online panels are only considered theoretically as part of the overview of the current status of research within this field when summarizing the state of the art of online survey research in part I. This summary covers recent experiments on visual design effects on respondents' behavior in Web surveys. In some cases the results were contradictory, which leads to the assumption that a lot of research is still necessary within this field.
In addition, modifications made to question texts or answer alternative texts, and comparisons of effects based on Flash technology [2], were excluded [3]. Research on response behavior using online questionnaires offers a lot of possibilities. As a result, a clear definition of the main research focus becomes necessary. The main focus is set on the visual design elements offered by HTML and related technologies, such as Javascript, Java and CSS. The presence of such effects was indicated in several studies, which is also mentioned by Dillman (2007, p.478): "a rapidly expanding body of literature now shows that visual design has effects on the question answering process that go far beyond influencing initial perceptions of what parts of the questionnaire page are relevant and the navigational process through the questionnaire." Don A. Dillman also confirms the presence of visual design effects in scalar questions: "the effects of visual design are particularly apparent for scalar questions in which respondents are attempting to express their degree of agreement, satisfaction or perhaps favorableness to an idea, using an ordinary scale
[1] See section 9.4 for the corresponding results.
[2] Based on very complex graphical audio-visual representations and multimedia respectively.
[3] As one exception, the number of scale marks of categorical scales can differ in the different questionnaire styles for scalar and semantic differential questions.
that attempts to capture both the direction and intensity of feeling" (Dillman 2007, p.478). In addition, the relevance for future research is mentioned: "We expect that research in this area will continue over the next several years" (Dillman 2007, p.482). Similarly Couper (2000, p.475): "While the importance of question wording in influencing respondent answers is well-recognized, there is a growing literature that suggests that the design of the survey instrument [...] plays an important role as well", as does Couper & Coutts (2004, p.217) (translated): "it is clear that examination of methodological aspects of internet-based surveys should be a fixed element of further research." It is known that within self-administered surveys, in the absence of an interviewer, the respondent tends to seek information from the instrument itself, which means that visual elements of the questionnaire become even more important.
Moreover, Witmer et al. (1999, p.158) give a clear statement on the need for additional insight into effects in survey research when the computer environment becomes an additional factor: "computer-mediated research needs specific and carefully designed instruments that not only accommodate but exploit the features of the electronic environment to attract respondents who otherwise may have their fingers on a delete key. Researchers cannot merely import paper and pencil methodologies to online studies, but must adapt them to the electronic environment and create new methods to expand our knowledge in and of computer-mediated communication."
This work should make a contribution to this need. Similarly Bandilla & Bosnjak (2000, p.19) (translated): "It can often be observed that, when conducting Web surveys, the logic of the visual layout simply follows the printed version. Obviously, when following this approach it is only weakly taken into consideration that the reading behavior of a Web user is not compatible with certain design guidelines of offline questionnaires." Couper & Miller (2008, p.831) start their article with "A key characteristic of Web surveys is their diversity", which describes one of the major problems with this survey mode: many more possibilities are offered to run the surveys, which can cause several problems.
The experiments focused on the use of Visual Analogue Scales (VAS) and other scale types in Web surveys. Respondents were presented with semantic differentials in the form of six different input controls for filling out. The impact of the different controls on dropout, response times and response behavior in general was statistically evaluated. Technical aspects (e.g. the usage of Java and Javascript) are also discussed. Only a few experiments had already been conducted with VAS and related scale types [4], with mostly contradictory outcomes. This probably stems from the fact that VAS are defined differently in the literature, which means that the VAS used in the experiments behave and look different; it also shows that the look and feel of these scale controls has a major influence on general response behavior. Although advantages of these scale types are mentioned in the current literature [5], they are used relatively seldom in online surveys.
Survey Tool Development On the other hand, this research also includes technical aspects: to provide a basis for running all these experiments, the implementation of software became necessary, focusing on the described experimental tasks concerning different visual designs. Therefore, a logically developed concept was essential for the whole software architecture, which strictly separates questionnaire content from its design. Research projects dealing with online research in general and online surveys in particular are often limited due to the lack of technical
[4] Which are reported in detail in chapter 4.
[5] See e.g. Couper et al. (2006), p.229.
possibilities. In fact, the development of such software, created especially for online experimenting, opens up new opportunities in the field of empirical social (online) research. A lot of software for the creation of questionnaires is available (even free of charge), but no suitable software existed that is available for free (or at an affordable price) and centres on the experimental possibilities listed above, so it had to be programmed. Thus a fully functional and fully featured tool for experimenting with the influence of visual design effects in Web surveys was created and offered to survey conductors to enable further research in this field (and its related topics).
Furthermore, the software enables the completion of surveys without the need to create one's own experiments. With this application it is possible to create questionnaires and to run surveys under certain criteria with different levels of participation control [6]. Even without technical knowledge, such as programming skills or HTML, one can create an online questionnaire. For example, one can select from a list of different question types and customize the question content with an easy-to-use online editor. The software should offer students and researchers the possibility to run Web surveys efficiently and for free. Because of the generic software architecture of the system, it is possible to integrate new experiments and features necessary in future studies. The software is published under an open source license in an attempt to push forward the development of a free, well-featured, stable and extendible piece of software. Documentation for both developers and end users is also (in shortened form) part of this thesis [7].
Moreover, this section also includes an overview of existing (open-source) tools, together with evaluation criteria for such software. Additionally, an XML schema was created for storing and exchanging questionnaires efficiently. This would enable the exchange of questionnaires (and parts of questionnaires such as a demographic block) between institutions and would clearly separate content and representation. The schema will not be published within this document, but it is accessible at http://www.survey4all.org.
[6] An overview of all features and possibilities of the software is given in section 21.2.
[7] See part IV.
2 Terminology
Visual Analogue Scales (VAS) are one of the main input controls tested within this work. Therefore it is necessary to give a clear definition of VAS and a clear demarcation from related scale types. Generally speaking, the VAS is a scale type which measures ratings for semantic differentials. The most suitable and representative description of VAS is given by Couper et al. (2006, p.227f): the VAS has a line anchored at each end by the extremes of the variable being measured; this can represent a continuum between opposing adjectives in a bipolar scale or between complete absence and the most extreme value in a monopolar scale. A similar definition is given by DeVellis (1991, p.71). This should serve as a basis and will be distinguished from other definitions and differentiated from other scale types.
The previous definitions did not define a certain mode of how these ratings have to be made on the VAS. Interestingly, in Funke & Reips (2007a, p.70) and Funke & Reips (2008a), a clear boundary between VAS and slider scales is drawn, with the argument that inaccuracies result from using the (relatively) broad slider control of a slider scale. The problem here is that this definition comes from a very fixed concept of a slider scale [1]. It is true that the default slider in most technologies [2] is relatively broad, but, as can be seen in figure 11.4, the slider has an apex at the bottom which enables positioning on a pixel level. The precision of the measurement should be as high as possible: "VAS are nearly continuous measurement instruments, each pixel is clickable and results in a raw value" (Funke & Reips (2008a), adapted from Funke & Reips (2007a)). This definition places a great deal of importance on the assignment of a measurement point to exactly one pixel.
In general, the question format used most often by questionnaire designers is the scalar question. DeVellis (1991, p.8f) defines measurement scales as "measurement instruments that are collections of items intended to reveal levels of theoretical variables, not readily observable by direct means [...]. We develop scales when we want to measure phenomena that we believe to exist because of our theoretical understanding of the world, but which we cannot assess directly." How scales should be created, with the focus mainly on visual aspects in order to achieve the best results, will be part of this work.
GRS Another important measurement scale closely related to the VAS is the Graphical Rating Scale (GRS). As mentioned in Couper et al. (2006, p.228), the distinction between VAS and GRS is blurred. This is how the terminology is used within this work: "the key distinction is not how the scale is presented to the respondents but how they indicate their responses. We will use the term VAS to indicate a scale on which a respondent directly marks his or her position on the scale,
[1] Here, the slider scales do have a slider, but only values at a few tick points can be selected, which is then categorized as being more similar to a radio button scale.
[2] E.g. Java when using a slider within an Applet.
whereas discrete choice measures require the respondent to first select a number or adjective and then indicate that preference."
Here are some definitions of GRS:
- "The GRS adds verbal descriptors along the line and sometimes also check marks dividing the line into distinct segments" (Couper et al. (2006, p.229)).
- Cook et al. (2001, p.700) describe an unnumbered graphic scale (as part of graphic rating scale formats) as a scale which presents a continuous line between two antonyms. Respondents are then asked to draw through the continuum at the point most indicative of their views regarding the antonyms. The responses are scored by measuring the distance (e.g. in millimetres) from the left end of the continuum to the line drawn through the continuum.
Here are some definitions and properties of simple discrete measurement scales:
- "A likert-type scale explicitly presents its scoring metrics. When a scale uses numerous score intervals, participants are told how many scale points there are, and they not only can but are expected to accommodate these intervals within their conscious thinking" (Cook et al. (2001), p.700).
- "[...] respondents select a number or adjective that most closely represents their positions on the scale. The number of scale points may be relatively small (e.g. 5, 7 or 9) or large, as in the case of the 101-point feeling thermometer" (Couper et al. (2006, p.228)). Interestingly, Schünemann et al. (2003, p.1171) describe the feeling thermometer as a Visual Analogue Scale (VAS) shown as a thermometer [3].
Finally, as a summary, an attempt was made to put all the definitions and distinctions from above together without any inconsistencies. These are the properties of a VAS as used for the experiments within this thesis (a minimal technical sketch of such a control follows below):
- The scale consists of a simple, continuous line
- Labels exist only at the two extreme marks
- No labels, tick marks or number labels are placed on the scale
- No feedback about the actual position is given to the respondent
- Measures are as fine-grained as possible (ideally one measurement point per pixel, but this is not mandatory)
- Accurate positioning must be possible
- The way the marks can be set on the scale is not fixed: clicking on the scale at a specific position works as well as using a slider (which means, for this definition, that slider scales are also VAS when the other points are fulfilled)
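To make the technical side of these properties concrete, the following is a minimal sketch of a click-VAS built with plain HTML and Javascript. It is an illustration only, not the implementation used for the experiments in part III, and the element names (vasLine, vasValue) are chosen freely for this example. The scale is a plain clickable area with verbal anchors at both ends, and the raw horizontal pixel offset of the click is stored in a hidden form field, so that every pixel corresponds to one possible value and no position feedback is shown.

    <form method="post" action="store">
      strongly disagree
      <span id="vasLine" style="display:inline-block; width:400px; height:16px;
            background:#e0e0e0; border:1px solid #888; cursor:pointer;"></span>
      strongly agree
      <input type="hidden" name="vasValue" id="vasValue" value="">
    </form>
    <script type="text/javascript">
      // Store the horizontal click position (0 .. width in pixels) as the raw
      // answer value; no tick marks, numbers or feedback are displayed.
      var line = document.getElementById("vasLine");
      line.onclick = function (event) {
        event = event || window.event;                      // older browsers
        var x = event.clientX - line.getBoundingClientRect().left;
        document.getElementById("vasValue").value = Math.round(x);
      };
    </script>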
Methodologically, the advantage is that data from a VAS reaches the desired level of an interval scale. A good discussion of the pros and cons of VAS compared to other scale types can be found in Flynn et al. (2004). Whether such a fine-grained scale as the VAS is needed in all cases is not clear: "the meaningfulness of 100 categories is seriously diminished, given that research has demonstrated that individuals can only efficiently discriminate between a maximum of seven categories when processing sensory information" (Flynn et al. (2004, p.52)).
[3] For the use of feeling thermometer scales in the social sciences, see e.g. Noelle-Neumann & Petersen (2000, p.149) and von Kirschhofer-Bozenhardt & Kaplitza (1975).
2.1 Online Research
Because Online Research is the superior topic of this thesis, a general description of this research
eld is given in this section. In this context the internet is focused on as the major eld of
research. A good overview of all parts of this research eld is given in German by Welker et al.
(2005). Advantages of online data collection are discussed in van Selm & Jankowski (2006) and
Granello & Wheaton (2004).
2.1.1 Operating Fields and Organizations
In this chapter, a definition of the research area of Online Research will be attempted, as well as a description of the core operating fields. There are a few important institutions in this field that provide a lot of useful information:
- DGOF [4] (Deutsche Gesellschaft für Online-Forschung). This institution annually organizes the most important Online Research conference in Europe, namely General Online Research [5]. Participating gives a good overview of the state of the art of online research. DGOF also released a miscellany (Welker & Wenzel (2007)) containing fundamentals and case studies.
- a.o.i.r [6] (Association of Internet Researchers; AoIR). a.o.i.r is an international academic association dedicated to the advancement of the cross-disciplinary field of internet studies. AoIR hosts a free, open access mailing list with over 2000 subscribers and organizes an annual internet research conference, one of the premier academic conferences in this field.
- Web Survey Methodology [7] is an important resource for general information about current research on Web surveys. Publications are listed, events are announced and existing survey software is mentioned.
- The German Internet Research List (gir-l) is a discussion forum aimed at anybody in German-speaking countries interested in Online Research [8].
[4] http://www.dgof.de
[5] http://www.gor.de
[6] http://aoir.org
[7] http://www.websm.org
[8] More information can be obtained at http://www.online-forschung.de
Part I
Current Literature
3 Introduction
In the following, an overview of the current status of research in the field of visual design effects in online surveys is given. Experiments with visual design are considered to be all visual manipulations of the presented questionnaire which could influence the respondents' behavior, such as the presence or absence of additional signs and information, different colors, different positioning of content or different input controls used for answering. The outcomes of the experiments described below may vary depending on the question types [1]. A list and description of different question types (classified by the question's content) can be found in Holm (1975a, p.32). The same is valid for differentiations in person and situation, of which Mummendey (2003, p.41) gives a good overview. All experiments are categorized in several sections. Some experiments described come up in more than one section, because several different sub-experiments were run within one experiment, which could possibly have caused some side effects.
Firstly, a summary of already accomplished experiments and findings regarding VAS is given, because VAS are the main focus of this work. This should serve as a basis for the experiments conducted in this thesis. The design of the experiments was influenced by these findings, and concrete results were checked in regard to their reproducibility.
In addition, literature on general visual design effects in online surveys is discussed. For the experiments conducted, it is also necessary to take these effects into consideration to avoid possible side effects. Furthermore, some effects described in this section, such as the numbering or labelling of scale items, play a role for the experiments. For the sake of completeness, literature on non-visual design effects, like the effects of incentives or the length of the survey, is also part of this chapter.
Finally, an outlook is given which mainly deals with the effects of new technologies. The most striking new technology, which is already integrated in many Web pages and would be suitable for employment in online surveys, is AJAX.
[1] As regards content and other criteria, which may in some cases be the reason for contradictory findings.
4 Effects on Visual Analog Scales, Slider Scales, Categorical Scales
In this chapter an overview of the state of the art is given, which is particularly important for this thesis since the central experiments deal with different scale types, especially VAS. Several studies that deal with this topic but focus on different aspects have already been published. These studies and their findings are also mentioned in this thesis in the appropriate subchapters. The summary given here concentrates on the application in Web surveys. For an overview of VAS applied in paper-and-pencil questionnaires, see e.g. Flynn et al. (2004) as well as Hasson & Arnetz (2005), where, amongst other things, advantages and disadvantages of VAS and likert scales are discussed. Mode effects of VAS in comparison to those of likert-type responses are discussed and empirically tested in Gerich (2007).
In recent studies, similar experiments with different outcomes were completed; e.g. Couper et al. (2006) discuss the effectiveness of VAS in Web experiments. For this purpose, a slider-VAS written in Java with 101 scale points was compared to common HTML controls such as radio buttons and numeric entries in a text input field (21 scale points), together with other variations, such as the presence or absence of numeric feedback or a midpoint. The response distributions for the Java VAS did not differ from those of the other scale types. Furthermore, the VAS had higher rates of missing data and noncompletion, and longer completion times. These findings could partially be explained by technical difficulties.
A few other studies in this field which deal with similar experiments should also be mentioned: Heerwegh & Loosveldt (2002a) compared radio buttons and dropdown boxes; Walston et al. (2006) compared three different item formats for scales: radio buttons, bigger buttons and a graphical slider; Hasson & Arnetz (2005) compared VAS and likert scales for psychosocial measurement; Christian (2003) examines the influence of visual layout on scalar questions. Additionally, three presentations on research regarding VAS were given at the 9th General Online Research (GOR) Conference 2007 in Leipzig (Lütters et al. (2007), Reips & Funke (2007), Thomas & Couper (2007)). The most innovative experiment presented at this conference was reported by Lütters et al. (2007), who compared a new kind of scale (the so-called sniperscale) to a graphical slider based on Java technology as well as to a classical representation based on radio buttons. The sniperscale was based on Flash technology and enabled the respondents to shoot at the scale in order to select an item (the mouse pointer was a crosshair).
Funke & Reips (2008a) presented an experiment in which data and paradata from three scales were compared, namely visual analogue scales (VAS), slider scales (SLS) and radio button scales (RBS). The design was as follows: respondents of a 40-item personality inventory [...] were randomly assigned to either a VAS with 250 possible values, an SLS with 5 discrete values, or a
5-point RBS. No initial marker was present on the scales. The VAS could only be clicked, but the SLS marker could be clicked or slid. It is important to mention that the slider scales used in these experiments had tick marks and only these marked points on the scale could be selected. A similar (self-selection) study with VAS was reported in Funke & Reips (2008b), which focussed on respondent burden, cognitive depth and data quality; here the VAS was compared to radio button scales.
One empirical test of interval-level measurement reported in Funke & Reips (2007c) [1] varied the length of VAS (50, 200 and 800 pixels) to check whether data collected with VAS are equidistant, in a self-selected student sample (n=355). On each of the VAS, positions had to be located (e.g. 50%, and uneven proportions like 67% or 33%). As a result, there is strong evidence that data collected with VAS are equidistant, on the level of an interval scale. On average, the difference from a linear relationship was 3.2 percentage points, ranging from 2.8 for the medium VAS to 3.9 for the shortest VAS. As length has no great effect, VAS should be robust to differences in appearance due to different screen sizes. As a general remark: when using pixels as units, the length is displayed differently on computers according to the screen resolution set within the operating system.
Cook et al. (2001) compared the Cronbach's alpha coefficients of scores for scales administered on the Web using a 1-9 radio button format, in which respondents used the mouse to toggle on their responses to 41 items (i.e. the analog of a likert-type scale), with Web-administered unnumbered graphic scales. Furthermore, sliders (1-5; 1-9; 1-100) were used as input controls. 3987 respondents were assigned the radio button format and 420 the sliders.
In Walston et al. (2006) an experiment within a Web site user satisfaction survey is described, where recruitment was done directly via the Web page. Different controls for scale questions were used, namely radio buttons, labelled buttons and a slider bar. The slider had tick marks corresponding to five labelled response options, but the respondent could drag the slider bar to any position along the scale. Each control had the same item labels.
Mathieson & Doane (2003) discuss the benefits of fine-grained scales and consequently compare a radio button likert scale to a so-called fine-grained likert scale, which contains the same labels as the radio button scale but additionally makes it possible to click on positions between these anchor points. This scale is implemented in Javascript (with HTML tables behind it). When clicking on a cell of the table, the current image is exchanged for a higher one to mark the current selection. The question posed in this experiment is whether respondents use the additional points offered between the anchor points. Since the scale had 150 clickable points, there were many more responses on the anchor points than would have been predicted by chance if the probability of selecting each clickable point were equal. It seems that, "although there is substantial use of points between the anchors, respondents are still attracted to the anchor points" (Mathieson & Doane (2003, p.7)). Respondents who chose an on-anchor point also tended to use on-anchor points for the rest of the items. This was also valid for off-anchor points.
van Schaik & Ling (2007) compared likert scales (7-point) and VAS with 103 undergraduate
[1] The key information is also given in Reips & Funke (2007).
psychology students. For the VAS, previously defined design principles were applied or violated.
The tests were held under laboratory conditions. Responses to the VAS statements were given
by dragging a slider along the scale. The slider always started at the middle of the scale. The
response format did not include subdivisions, although strongly agree and strongly disagree were
presented on either end of the scale.
Flynn et al. (2004) also conducted an offline study with 112 psychology students comparing VAS
and 7-point likert scales. They used a within-subjects design to compare the data equivalence
of likert scales (LS) and VAS.
4.1 Completion Rate/Breakoffs and Missing Data
One of the most important tasks for Web surveys is to keep the respondents' burden as low as possible in order to minimize dropout. In the following, a summary of the comparison of different controls in regard to dropout is given.
Funke & Reips (2008a) found a tendency for VAS to perform better concerning dropout than SLS and RBS (but this finding was not statistically significant). Similar results were given by Funke & Reips (2008b) when they compared VAS to radio button scales, with a higher proportion of missing data reported for VAS. However, results contradicting the previous findings were reported in Funke (2005), who observed a higher dropout rate, more lurkers and more nonresponse for VAS in comparison to categorical scales. The same VAS controls were used in all of these experiments.
Walston et al. (2006) took a look at the dropout in different phases of the survey. When comparing the initial exit (when the browser window was closed or the "I decline to take this survey" button near the top of the survey page was pressed [2]), the most striking difference in the outcomes is that 80.2% of those receiving a slider bar survey exited the survey, compared to 61.6% (buttons), 62.5% (graphic radio) and 64.7% (plain radio) under the other appearance/item format conditions.
The results regarding missing data in Couper et al. (2006) are the following: VAS had higher rates of missing data [3] and higher breakoff rates (χ² = 26.54, df = 2, p < .001) compared to the other controls; these results may have something to do with technical difficulties linked to Java Applets. An explanation for the high missing data rates of the VAS was the long time it took the Applet to load, and that respondents clicked the next button in error before it had appeared. Healey (2007) conducted a similar experiment and found no significant differences.
In another experiment by Couper, Traugott & Lamias (2004, p.373), respondents were asked to rate comments on a 10-point scale either by clicking on a radio button next to a number or by entering a number in a text entry box. Here, more missing data was found for text input fields compared to radio buttons; respondents receiving the text input field version were
[2] The type of scale input was visible on the page where the button was displayed.
[3] About twice as high as for the other controls; particularly for the first item.
more likely to leave the box blank. Naturally, more invalid responses were also given in the input field version because of the absence of an integrity check (every entry was allowed). Heerwegh & Loosveldt (2002a, p.477) discovered higher completion rates for radio buttons in comparison to dropdown boxes as input controls (even though these differences were not statistically significant).
When an innovative implementation of a scale (the sniperscale) was offered, diminished dropout was observed, as reported in Lütters et al. (2007): compared to the Java slider and the classical radio button scale, the sniperscale fulfilled the task of keeping the respondents' attention during the entire interview, which means the dropout was low compared to the other approaches.
One conclusion from all these results is that Java Applets are possibly not the right technology for generating VAS, because of technical preconditions which need to be fulfilled on all client machines.
4.2 Different Response Behavior
In a study by Flynn et al. (2004), participants tended to rate higher on the scale when using a likert scale, and lower when using a VAS. A two-way repeated measures ANOVA revealed a significant main effect of the response format. Similar results were found by Funke & Reips (2008a), who also observed lower mean scores for VAS compared to slider and radio button scales. In contrast, Thomas & Couper (2007) reported higher means for the VAS in comparison to a GRS. In a similar study to the previously mentioned ones, by Couper et al. (2006), VAS were compared to radio button scales and no difference in means could be found. In addition, no differences in the response distributions for the VAS compared to those of the other scale types under control (radio buttons and text input fields) could be found. Distributions across the three input types were remarkably similar and no significant effect of input type could be found (MANOVA, Wilks' lambda). Regarding variance (between persons/within question and same person/different questions), all tests were non-significant. This is equally true for the range of scores.
In one experiment by Christian & Dillman (2004) a five-point likert scale was compared to a number box. On the five-point likert scale only the extreme points were verbally labelled and the remaining points only got numbers; for the number box a number between one and five had to be entered. As a result, the use of the number box significantly increased the mean for each of the questions tested. When the respondents were given the chance to correct their original answers, 10 percent of the answer box respondents [4] scratched out the answers to at least one of the questions and provided a different answer. Most of these errors occurred because respondents reversed the scale on the answer box version, which means 4 was swapped with 2 and 1 with 5.
In Stern et al. (2007), this experiment was repeated (with a modest modification: a don't know response was included), and the findings of the original study were supported. An additional
[4] Compared to one percent of the polar point scale respondents.
finding was that those respondents who received the number input box version were significantly more likely to provide a don't know response (Stern et al. (2007, p.124)). But the main reason for repeating the experiments was to see whether there were stronger differences in certain demographic groups (age greater or less than 60 years; college degree or not; gender). The effect described above could be found in all demographic groups except men. One conclusion is that demographic information (whether respondents are older or younger than 60, or whether they have finished college, respectively) does not have any influence.
4.3 Reliability
Cook et al. (2001) found the highest alpha coefficients for radio buttons (compared to slider-VAS). The more items were used, the higher the coefficient. Contradictory findings are reported in Flynn et al. (2004), where higher alpha values were found for the VAS. In Funke & Reips (2008a), test-retest reliability was measured for radio button scales, slider scales and VAS. The highest scores were found for VAS. Similar findings were reported in Funke & Reips (2008b), where radio button scales were compared to VAS. In this case the VAS also reached the highest score, and so it was concluded that VAS were used in a more consistent way. This was also observed in Funke & Reips (2007c), where a 5-point categorical scale with radio buttons was compared to VAS. Additionally, lower variance compared to categorical scales was reported in Funke (2005).
4.4 Categorization
This section provides a short summary of strategies for categorizing VAS values by using different transformation strategies. Funke & Reips (2006) conducted two experiments: in the first one, 667 participants were randomly chosen to rate 16 items under 3 different conditions. The only difference between the experimental conditions consisted in the applied rating scales: either a 4-point categorical scale, an 8-point categorical scale or a VAS was presented. Systematic differences in the distribution, especially concerning the extreme categories, were found when applying a linear transformation (VAS had higher frequencies in the extreme categories). Transformation with reduced extremes (see figure 4.1) led to greater accordance between VAS and categorical scales than linear transformation. The difference between measurement with VAS and categorical scales was systematic.
Figure 4.1: Transformation with reduced extremes
Subsequently, in the second experiment, the space between the two extreme categories and the adjoining ones was decreased for the radio buttons. The outcome was that, when using categorical scales with reduced extremes, the frequencies of the extreme categories decreased. In both studies, no information about statistical significance was given.
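The two transformation strategies can be illustrated with two small mapping functions. The following is a sketch in Javascript; a VAS range of 0-100 and five target categories are assumed here, and the half-width chosen for the reduced extreme categories is only an illustrative assumption, not the exact cut points used by Funke & Reips.

    // Linear transformation: the VAS range is cut into k equally wide categories.
    function linearCategory(value, max, k) {
      var cat = Math.floor(value / (max / k)) + 1;
      return Math.min(cat, k);               // value == max falls into category k
    }

    // Transformation with reduced extremes: the two outer categories are made
    // narrower (here half as wide as the inner ones), so fewer VAS values fall
    // into the extreme categories.
    function reducedExtremesCategory(value, max, k) {
      var innerWidth = max / (k - 1);        // (k-2) inner widths + 2 half widths
      if (value < innerWidth / 2) return 1;
      if (value >= max - innerWidth / 2) return k;
      return Math.floor((value - innerWidth / 2) / innerWidth) + 2;
    }

    // Example: a VAS value of 15 on a 0-100 scale with 5 categories falls into
    // category 1 under the linear mapping but into category 2 under the
    // reduced-extremes mapping.
    linearCategory(15, 100, 5);            // -> 1
    reducedExtremesCategory(15, 100, 5);   // -> 2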
In Funke & Reips (2008a) as well as in Funke & Reips (2007c) [5], the findings from Funke & Reips (2006) were revalidated: in both, the frequencies at the extreme points of the VAS used were higher [6]. Similar experiments with the same outcome were reported in Funke (2005), Funke (2004) and Funke (2003). In contradiction to these findings, Couper et al. (2006) observed a significantly higher use of extreme values for the radio and numeric input versions compared to the VAS version. The problem is that the two VAS used were difficult to compare: in the first VAS, clicking was necessary to position on the scale; in the second, a slider was used for positioning.
4.5 Use of Midpoint and Extremes
Whether it is useful to provide a midpoint response in scalar question formats is a highly debated topic. Couper et al. (2006) found no statistically significant differences between using and not using a midpoint, but radio buttons and numeric input fields showed a higher use of values around the midpoint. One interesting finding was that when there was no real midpoint on the scale (1-20), 10 was selected more often in input fields if numbers had to be entered. One explanation is that 10 seems to best represent the midpoint.
van Schaik & Ling (2007, p.18f) compared radio button likert scales with VAS. After converting the VAS to the 7-point scale, differences in frequencies for the middle neutral response category were within 8% between the two response formats, with more neutral answers for VAS than for likert scales. Interestingly, respondents believed that likert scales lead to a bias towards neutral answers. The results for the extreme values were similar: participants believed that likert scales might lead to the avoidance of extreme responses. After conversion of the VAS, the difference in frequencies for the lowest extreme response category was within 3% between the two response formats, and within 2% for the highest response category.
4.6 Actual Position Feedback
The provision of feedback regarding the current position on the VAS (e.g. via a tooltip or labelled positions) had a significant effect on respondents' behaviour. Couper et al. (2006) found significant differences between means when comparing VAS with and without feedback (for 2 of the questions under control). Another interesting effect was found: when feedback was given on the VAS, rounded values (heaping) were used more often. These findings are statistically significant, and this suggests that providing feedback may negate some of the advantages of using a continuous measurement input device such as the VAS.
[5] n=576; 5, 7, and 9 categories were used.
[6] All experiments which used a VAS were generated with http://VASGenerator.net.
4.7 Spacing, Positioning, Disproportion, Shading, Labelling
In Tourangeau et al. (2007), two experiments were carried out in order to investigate how the shading of the options in a response scale (7-point likert scale) affected the answers to the survey questions. The first experiment varied two factors. The color scheme of the shading came in two versions: the first ranged from dark to light blue, the second from dark red to dark blue (the middle point was white). In addition, the labels for the scale points were varied: (a) verbal labels only, (b) numerical labels only, (c) verbal labels plus numerical labels ranging from -3 to 3, (d) same as (c) but with numerical labels ranging from 1 to 7. The second experiment was a replication of the first one, with new questions and different respondents. The general result was that when the end points of the scale were shaded in different hues, the responses tended to shift toward the right end of the scale, compared to the scales with both ends shaded in the same hue. When verbal labels were used, the color effect vanished. The way the points were numbered also influenced response behavior. In the case of numerical labels ranging from -3 to 3 instead of 1 to 7, responses were pushed towards the high end of the scale. It would be interesting to repeat the experiment with two different color schemes using different colors, to check whether the color red, which is a special color as mentioned in Gnambs (2008), has any influence.
Yan (2005, p.75) also reported the results of a Web experiment in which the numerical values on a scale were manipulated (0 to 6 vs. -3 to 3). Significant effects were found for three of four items, where the mean ratings were higher for the (-3 to 3) version, which confirms the findings of Tourangeau et al. (2007).
4.8 Response Time / Completion Time
Couper et al. (2006) report longer completion times for VAS (170.6 sec) compared to radio buttons (124.8 sec) and numeric input fields (153.8 sec). Similar results were reported in Cook et al. (2001), who also observed longer response times for slider controls (VAS). Additionally, at the end of the questionnaire, respondents were asked to estimate how long they thought the survey took: VAS had the highest mean value (17.49 min) compared to radio buttons (16.41 min) and numeric input fields (16.69 min). In Heerwegh & Loosveldt (2002a, p.481), differences were found when comparing radio buttons and dropdown boxes, where, as expected, radio buttons led to shorter response times. As a possible reason it was argued that people were less familiar with dropdown boxes. This experiment was carried out with 3rd-year students; interestingly, the effect could not be replicated with people from a public internet database. Response times were tracked on the client side.
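The client-side timing mentioned above can be illustrated with a minimal sketch; the hidden field name and the trackResponseTime helper are assumptions made for illustration, not the instrumentation used by Heerwegh & Loosveldt (2002a).

// Minimal sketch (TypeScript, plain DOM): measure the time from page rendering to
// form submission on the client and send it to the server in a hidden field.
function trackResponseTime(form: HTMLFormElement): void {
  const start = Date.now();
  const hidden = document.createElement("input");
  hidden.type = "hidden";
  hidden.name = "responseTimeMs"; // assumed field name
  form.appendChild(hidden);

  form.addEventListener("submit", () => {
    hidden.value = String(Date.now() - start);
  });
}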
Tourangeau et al. (2007) found a correlation between the type of scale point label and completion
time: when items were fully labelled (compared to when only the anchor points were labelled)
it took respondents longer to answer, regardless of whether the scale points were numerically
labelled or not. Presumably the additional time was needed to read the verbal labels.
When comparing text input fields and radio buttons, as reported in Couper, Traugott & Lamias (2004, p.375), no statistically significant differences concerning time could be found, but in Funke (2005) higher response times were found for VAS when compared to categorical scales. The sniperscale (a Flash-based scale where the respondent had to shoot at the scale items), as
reported in Lütters et al. (2007), took respondents longer than both a slider implemented in Java and a radio button scale.
4.9 Feedback Questions
In Walston et al. (2006, p.285ff), respondents were provided with five feedback questions and asked to rate five qualities of the survey concerning different item formats (radio button, button, slider VAS), whereby the following extreme points were offered: attractive vs. unattractive, worthwhile vs. a waste of time, stimulating vs. dull, easy vs. difficult, satisfying vs. frustrating. In all pairs, the slider VAS had the highest mean values, which in this case reflects badly on VAS. However, none of these comparisons reached statistical significance. In Funke & Reips (2008b), VAS were compared to radio button scales in 2 studies, whereby higher response times were measured for VAS; interestingly, response times were underestimated more often when using VAS (respondents were asked how long they thought the questionnaire had taken).
van Schaik & Ling (2007, p.18f) reported respondents' preferences when comparing radio button Likert scales and VAS. Here, a significant majority preferred the Likert response format. Based on open-ended questions, the following advantages and disadvantages of the two response formats were mentioned: ease/speed of use was named as an advantage of both formats, clarity of response as an advantage of the Likert scale, and the degree of choice as an advantage of the VAS. As disadvantages, difficulty of mapping a judgement to the 7-point numerical scale and response set were named for the Likert format, and lack of clarity, consistency and usability for the VAS.
5 General Visual Effects in Online Questionnaires
5.1 Paging versus Scrolling (All-in-one or Screen-by-screen)
The question dealt with in this section is whether the entire questionnaire should be visible at once or whether each question should be placed on a separate screen. The advantage of the first approach is that the respondent can answer questions faster, as there are no additional loading times for each separate page and question. Furthermore, the first approach corresponds to the format of paper-and-pencil questionnaires, which people are more familiar with. Respondents can also very easily see what they previously filled out. A general recommendation on how many questions should be placed on one page is given in Crawford et al. (2005, p.52): "As standard we recommend for Web-based surveys, when the capability exists, is to provide only so many questions on one screen as fit without requiring the respondent to scroll to see the navigation buttons". The consequences of selecting a certain style are discussed subsequently, when the results of concrete experiments are presented.
In Weisberg (2005, p.123), the screen-by-screen approach was preferred for several reasons: "This enables the computer programmer to control the flow of questions instead of requiring respondents to follow complicated skip patterns themselves. The screen-by-screen approach also facilitates sophisticated survey experiments in which different half-samples (split-ballot experiments) are given different question wordings, question frames, and/or question orders." Additionally, improved observability is given when pages consist of single questions (e.g. time tracking or the exact point where dropout occurred). On the other hand, questions are presented in much more isolation to the respondents: individual screen construction techniques provide less context than people normally have for answering questions, which is especially problematic when people are asked a series of related questions (Dillman et al. (1998)). In addition, if backtracking to previous answers is not enabled in screen-by-screen solutions, respondents may lose their sense of context. In some cases, however, scrolling to the next question has to be prohibited due to order effects. A hybrid version, consisting of blocks of questions which are presented all at once, is methodologically more similar to the scrolling mode.
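How the presentation modes relate to each other can be sketched as a simple pagination step; the Question type and the paginate helper below are illustrative assumptions, not part of any cited system. A chunk size of 1 yields the screen-by-screen mode, a chunk size equal to the number of questions yields the scrolling (all-in-one) mode, and intermediate sizes give the hybrid block version mentioned above.

// Minimal sketch (TypeScript): splitting a questionnaire into pages of a given size.
interface Question { id: string; text: string; }

function paginate(questions: Question[], questionsPerPage: number): Question[][] {
  const pages: Question[][] = [];
  for (let i = 0; i < questions.length; i += questionsPerPage) {
    pages.push(questions.slice(i, i + questionsPerPage));
  }
  return pages;
}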
Peytchev et al. (2006) checked the differences in response behavior concerning paging versus scrolling in an experiment with undergraduate students. No significant differences between the versions or in break-off rates were observed, but concerning the overall completion times, the scrolling version took significantly longer.
A related question is whether there is a higher correlation among items when they are placed together.
In Couper, Traugott & Lamias (2004, p.372) and Couper et al. (2001), higher correlations (measured by Cronbach's alpha coefficient) could be observed, but they were not statistically significant. Furthermore, the effect of multi- versus single-item screens on item-missing data was examined. As a result, nonresponse decreased when the multi-item screen version was used, because this version was less burdensome to respondents. In a paper by Toepoel et al. (2008), paging versus scrolling was not compared directly, but through the presentation of scale items on one screen versus on two screens. No evidence was found that correlations between the items were higher when items were presented on a single screen than when presented on 2 screens. Moreover, no evidence was found that placing all items on one screen increases nonresponse; no differences in filling-out times could be found either.
5.2 Fancy vs. Plain Design
In this section, a comparison between very simple and graphically complex designs in Web surveys is provided. As Dillman et al. (1998, p.3f) showed in their experiments, very complex surveys with advanced graphical designs (regarding colors and images) can be counterproductive. This is due to the higher complexity of the Web page compared to the plain presentation without graphics and with only black and white colors. The plain version also takes less time to complete (although some arguments about response times are outdated, e.g. initial loading times are no longer as relevant as they were in the past due to high-speed internet connections), and higher completion rates were observed for it. This stands in stark contrast to the warnings that can, for example, be found in Andrews et al. (2003, p.190), that poorly designed Web-based surveys encourage novice Web users to break off the survey process.
5.3 Different Representation of Alternatives in Closed-ended Questions
5.3.1 Check-all-that-apply vs. Forced-choice
Check-all-that-apply questions list all possible alternatives, and respondents can select the applicable ones (in Web surveys this is usually realized with checkboxes). In the forced-choice format, respondents are explicitly asked about each alternative, and whether it applies or not is indicated with radio buttons.
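The difference between the two formats can be made concrete with a minimal sketch in TypeScript against the plain DOM; the helper names and the field-naming scheme are illustrative assumptions. The same option list is rendered once as checkboxes (check-all-that-apply) and once as explicit yes/no radio pairs (forced choice).

// Minimal sketch: check-all-that-apply rendering - one checkbox per option.
function renderCheckAll(form: HTMLElement, name: string, options: string[]): void {
  for (const option of options) {
    const label = document.createElement("label");
    const box = document.createElement("input");
    box.type = "checkbox";
    box.name = name;
    box.value = option;
    label.append(box, option);
    form.appendChild(label);
  }
}

// Minimal sketch: forced-choice rendering - an explicit yes/no radio pair per option.
function renderForcedChoice(form: HTMLElement, name: string, options: string[]): void {
  for (const option of options) {
    const row = document.createElement("div");
    row.append(option + ": ");
    for (const answer of ["yes", "no"]) {
      const label = document.createElement("label");
      const radio = document.createElement("input");
      radio.type = "radio";
      radio.name = name + "[" + option + "]"; // assumed naming scheme
      radio.value = answer;
      label.append(radio, answer);
      row.appendChild(label);
    }
    form.appendChild(row);
  }
}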
A general principle for the use of check-all-that-apply questions is stated by Dillman et al. (1998, p.13), and similarly in Dillman (2007, p.398f), as: "be cautious about using question structures that have known measurement problems on paper questionnaires, e.g. check-all-that-apply [...]". The drawback of using check-all-that-apply on very long lists is that people often satisfice, i.e. they check answer choices only until they think they have satisfactorily answered the question. "Considerable evidence exists that people often do not read all of the answer choices before going on to the next question" (Dillman et al. (1998, p.13)). This effect increases when scrolling becomes necessary to see all alternatives. Another principle can be found in Dillman (2007, p.62): eliminate check-all-that-apply question formats to reduce primacy effects, which means that items listed first are more likely to be checked. In addition, it is difficult to figure out the real reason for leaving a checkbox unchecked:
does the option not apply to the respondent, is he or she neutral or undecided about it, or was
the option simply overlooked?
A comparison of these two question formats in two Web surveys and one paper survey is described in Smyth et al. (2006a). The general finding was that respondents endorse more options (on average 5.0 versus 4.1) and take longer to answer (at minimum 45 percent longer, and on average two and a half times longer) in the forced-choice format than in the check-all format. Regarding duration, an interesting difference could be found: "Respondents who spent over the mean response time on check-all questions marked significantly more answers on average than those who spent the mean response time or less (5.6 vs. 3.7). [...] In contrast, forced choice respondents using greater than the mean response time did not mark significantly more options for most questions than their counterparts who used the mean response time or less (5.2 vs. 5.0)" (Smyth et al. (2006a, p.72)). Additionally, it was found that the use of the more time-intense forced-choice mode had no influence on nonresponse. Furthermore, when neutral alternatives, such as don't know, were provided in the forced-choice format, the third (neutral) category did not draw responses from the yes category for either question. In Stern et al. (2007), these experiments were repeated and the results were consistent.
Smyth et al. (2008) compared (forced-choice) yes/no answers on the telephone with check-all-that-apply answers on the Web. Recent experimental research has shown that respondents to forced-choice questions endorse significantly more options than respondents to check-all questions, and this was also confirmed across modes in this study. Within the Web mode, the forced-choice question format yielded higher endorsement of options than the check-all format: overall, the forced-choice format yielded an average of 4.74 endorsed options (42.3% of them) and the check-all format an average of 4.19 (38.3%). A similar effect was found within a telephone survey (4.44 (41.3%) vs. 3.87 (37.2%)). Additional comparisons showed that the forced-choice format performs similarly across telephone and Web modes (Smyth et al. (2008, p.108)).
Thomas et al. (2007) examined how these response format modes (among others, forced-choice and check-all-that-apply) could affect behavioral intention measurement. 56,316 U.S. respondents (aged 18+) participated in a Web survey. The forced-choice mode yielded significantly higher likelihood endorsement frequencies across behaviors than check-all-that-apply.
5.3.2 Grouping of Alternatives
Grouping of alternatives in closed-ended questions can be achieved in several ways. The results of experiments in which lines, spacing and additional headings for these groups were manipulated are reported below.
Effects of grouping the alternatives of closed-ended questions are reported in Smyth et al. (2006b) and Smyth et al. (2004). Three versions of presenting the alternatives of a closed-ended question were assigned to the respondents:
(1) an underlined heading was placed above each of two subsets of three response options arranged in a vertical format.
(2) the same as version 1, but with an additional message in the question text: "Please select the best answer".
(3) all choices were placed in a single vertical line with no indication of sub grouping (which
means no headings and no additional spacing between groups).
The results of these experiments indicated that the use of headings and spacing influenced answer choices: respondents to the grouped version not only chose more response categories than respondents to the version with no subgrouping, but were also more likely to select at least one answer from each of the subgroups (Smyth et al. (2006b, p.11)). Interestingly, the effect is stronger in fact-based questions compared to opinion-based questions. Similar findings were reported in Healey et al. (2005), where it was found that when response options were placed in close graphical proximity to each other and separated from other options, respondents perceived visual subgroups of the categories, which increased the likelihood that they selected an answer from each subgroup (see section 10.2 for the near means related principle).
In Tourangeau et al. (2004), two experiments were carried out comparing methods for including nonsubstantive answer categories (like don't know and no opinion responses) along with substantive scale responses. In the first experiment, two versions were compared: (1) nonsubstantive options were presented simply as additional radio buttons; (2) a divider line was placed to separate the scale points from the nonsubstantive options. In the second experiment, additional spacing was added to segregate the scale points from the nonsubstantive options. As a result, the mean values of the substantive answers were higher when there was no separation between the five scale points and the nonsubstantive options. Based on these findings, the following recommendation was given: "As a practical matter these results suggest that nonsubstantive response options (if they are offered at all) should be clearly separated from the substantive ones. This strategy may have the drawback of calling attention to the nonsubstantive answers, producing higher rates of item nonresponse" (Tourangeau et al. (2004, p.376)). The reason for the second phenomenon may be the effect created when subgrouping is used (the nonsubstantive items form a group when they are separated), so that the respondent tends to select one item per group, as found by Smyth et al. (2006b), who had also enabled multiple selection. Similar findings can be found in Christian & Dillman (2004), where experiments with equal versus unequal spacing between response boxes in closed-ended questions were conducted. Significant results were found for nominal scale questions: the alternative which was more set off from the others was selected more often, as it was possibly seen as one independent group. Thus the findings of this study support the findings of the studies described above.
Another experiment carried out by Tourangeau et al. (2004) also worked with the spacing between the verbally labelled ordinal items of a closed-ended question, but not in order to check grouping effects. Tourangeau et al. (2004) examined what happened when the answer categories for an item were unevenly spaced and, as a result, the conceptual midpoint of the item did not coincide with the visual midpoint. Spacing between the items was reduced only for items located to the right of the conceptual midpoint. When uneven spacing was used, 63.4% of the respondents chose answers from the right side of the scale, where there were bigger spaces between the items, compared to 58.3% with even spacing. This shows that not only the verbal label attached to a scale point but
also its position in relation to the visual midpoint conveys which specific value the scale point is supposed to represent.
Tourangeau et al. (2004, p.387) also checked the "near means related" heuristic by examining whether there are stronger interconnections among items displayed on a single screen than among those displayed on separate screens, which would boost the correlation among them. Eight items had to be rated on the same 7-point response scale. As expected, the responses to the eight items were more highly correlated when the items were presented in a grid on a single screen (Cronbach's alpha of 0.621) than when they were presented in two grids on separate screens (Cronbach's alpha of 0.562).
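Since several studies in this chapter report item interconnections as Cronbach's alpha, a short computational sketch may help; the function name is an illustrative assumption. It implements the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of the sum score).

// Minimal sketch (TypeScript): Cronbach's alpha for complete response rows,
// where each row holds one respondent's scores on the k items.
function cronbachAlpha(rows: number[][]): number {
  const k = rows[0].length;
  const variance = (xs: number[]): number => {
    const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
    return xs.reduce((a, b) => a + (b - mean) ** 2, 0) / (xs.length - 1);
  };
  const itemVariances = Array.from({ length: k }, (_, i) =>
    variance(rows.map((row) => row[i]))
  );
  const totalScores = rows.map((row) => row.reduce((a, b) => a + b, 0));
  const sumOfItemVariances = itemVariances.reduce((a, b) => a + b, 0);
  return (k / (k - 1)) * (1 - sumOfItemVariances / variance(totalScores));
}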
5.3.3 Double or Triple Banking
When using an ordinal scale, respondents make their decision on a (more or less) implied continuum; when double or triple banking is used, this continuum is interrupted, which makes the task harder for the respondent, because some sort of mental transformation has to be performed first. When a nominal scale is used, there should not be any differences, because no ordering exists at all.
Christian & Dillman (2004) conducted two experiments: one with a linear layout versus triple banking of an ordinal scale, and a second with a linear layout versus double banking of an ordinal scale. It was found that respondents are more likely to select responses from the top line (primacy effect) in the nonlinear version, which was statistically significant for the triple-banked version. Healey et al. (2005) found that when double banking and no banking were compared, double-banked items showed 6% more checked items per respondent than non-banked items, but this finding was not statistically significant.
As a recommendation, a box should be placed around the categories in order to group them as all being relevant for the question (Dillman et al. (1998, p.12)).
5.3.4 Response Order Manipulations
Stern et al. (2007, p.127f) describe experiments dealing with the order of alternatives in two closed-ended questions. In one version, the response options started with the high end and in the other version with the low end, in order to check for primacy effects. For the first question (how often do you travel more than 100 miles outside the area), the reversal of the response categories did not result in differences in the response distributions. In contrast, the reversal of the response options for the second question (how often do you use an internet connection to access the Web for e-mail?) resulted in significant differences in the response distributions. When "everyday" was first in the list, it was selected at much higher rates than when it appeared last (56.5% and 37.5%, respectively). The reason was assumed to lie in the similarity of the response options "everyday" and "nearly everyday": in version 1, where "nearly everyday" appeared below "everyday", 15.6% of respondents chose it, whereas in version 2, where it appeared before "everyday", 28.8% of respondents chose this response item. An extension of the experiment described above was the inclusion of a don't know response option. The items were reversed as in the experiment
above, but the don't know option appeared at the end in both versions. It was expected that the don't know response would be chosen less often when the options were ordered from most positive to most negative, because in this ordering respondents could quickly find an option that fits them. One significant finding was that when the options started with the negative categories, the don't know category was more likely to be chosen than when the response options began with the positive categories (23.4% versus 14.5%). In the same paper, experiments with response order manipulations in a ranking question were carried out. For this purpose, two versions of a question that asked respondents to rank eight problems facing the area from the largest to the smallest were created. The response options that appeared in the first and last two positions showed the largest effects; the middle categories seemed unaffected by the reversal.
Another experiment reported in Tourangeau et al. (2004, p.380) varied the order of the response options. Three versions were offered: (1) response options were presented in an order that was consistent with the "top is first" characteristic (see section 10.2 for further explanation), which means the top option was one of the endpoints and each of the succeeding options followed in order of extremity; in version (2), items were mildly inconsistent, i.e. only two options were exchanged; and in version (3), items were ordered inconsistently. As a result, the hypothesis that when the scale options do not follow their conceptual rank order, respondents are slowed down (and their answers possibly affected), was confirmed by this experiment. Furthermore, the distribution of responses was affected, particularly for the option "it depends": the proportion of respondents selecting the "it depends" option dropped dramatically when that option came at the bottom of the list compared to when it came in the middle or at the top of the list.
Hofmans et al. (2007) conducted a Web experiment with 156 highly educated participants in which the scale orientation was manipulated. The items were rearranged into 8 subsets: subsets 1 and 4 appeared with a decremental scale, and the items in subsets 5 and 8 were scored on an incremental scale. Two randomly selected items from each of subsets 1, 4, 5, and 8 were repeated in subsets 5, 8, 1, and 4 respectively, so that these items were filled out twice but with reversed scales. The main effect of orientation was non-significant, which means that the orientation (incremental or decremental) of the scale had no impact on the average values of the ratings.
Galesic et al. (2008) introduced a new method to directly observe what respondents do and do not look at during the filling-out process by recording their eye movements (a method called eye-tracking). The eye-tracking data indicated that respondents do in fact spend more time looking at the first few options in a list of response options than at those at the end of the list. Malhotra (2008) found that response order effects are even stronger when short completion times are observed.
5.3.5 Different HTML Controls Used
Couper, Tourangeau, Konrad & Crawford (2004) compared the use of a list of radio buttons, a dropdown box and a drop box (or select field) for selecting alternatives in closed-ended questions, together with certain variations such as reversing the order of the alternatives and varying the number of initially visible items. They attempted to check the visibility principle: options that are visible to the respondent are more likely to be selected than options that are not (initially)
visible (Couper, Tourangeau, Konrad & Crawford (2004, p.114)). The visibility principle was confirmed: visible items were more likely to be selected. Concerning response times, no differences were found.
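Where the visibility principle comes from can be seen in a minimal sketch of the controls (TypeScript, plain DOM; the helper names are illustrative assumptions): with radio buttons all options are visible at once, with a dropdown box only the currently selected option is visible until the list is opened, and with a drop box the first few options are visible from the start.

// Minimal sketch: a select element with size = 1 behaves as a dropdown box, while a
// larger size turns it into a "drop box" whose first few options are initially visible.
function renderSelect(options: string[], visibleRows: number): HTMLSelectElement {
  const select = document.createElement("select");
  select.size = visibleRows; // 1 = dropdown box; e.g. 5 = drop box with 5 visible items
  for (const option of options) {
    const entry = document.createElement("option");
    entry.value = option;
    entry.textContent = option;
    select.appendChild(entry);
  }
  return select;
}

// Minimal sketch: a radio button scale, where every option is visible at once.
function renderRadioScale(name: string, options: string[]): HTMLElement {
  const group = document.createElement("div");
  for (const option of options) {
    const label = document.createElement("label");
    const radio = document.createElement("input");
    radio.type = "radio";
    radio.name = name;
    radio.value = option;
    label.append(radio, option);
    group.appendChild(label);
  }
  return group;
}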
A similar experiment was carried out by Reips (2002a), who compared a radio button scale with a dropdown box (10 scale points with numerical labels were presented to the respondent). Two versions of the questionnaire, one in English and one in German, were created. The German link was sent to the German Internet Research List (gir-l) to check for possible expert effects. No differences concerning the answer distributions could be found. Additionally, the labels were varied between (-5/+5) and (0/10). Interestingly, the mean ratings for the (-5/+5) labels were much higher in the English (and non-gir-l) version. It seems as if people on the expert list were much more aware of the well-known effect of avoiding negative numbers, as described in detail in Schwarz et al. (1991) as well as in Fuchs (2003, p.30f). A possible explanation for the shift to higher values when using (-5/+5) labels instead of (0/10) labels is given by Schwarz et al. (2008, p.20): when the numeric values of the rating scale range from 0 to 10, respondents infer that the question refers to different degrees of success, with "not at all successful" marking the absence of noteworthy achievements. But when the numeric values range from -5 to +5, with 0 as the middle alternative, they infer that the researcher has a bipolar dimension in mind, with "not at all successful" marking the opposite of success, namely the presence of failure.
van Schaik & Ling (2007, p.19f) report the results of a comparison of radio buttons and dropdown boxes (both with 7 scale points) in an experiment with 127 undergraduate psychology students. With regard to completion times, radio buttons were significantly faster (109.7 vs. 127.64 seconds). Moreover, respondents changed their answers more often when using radio buttons. Couper, Tourangeau, Konrad & Crawford (2004) found that the primacy effect was largest with drop boxes (in which five options were initially displayed) compared to radio buttons (in which all options are visible) and dropdown boxes.
Heerwegh & Loosveldt (2002b) conducted two experiments with university students comparing radio buttons and dropdown boxes. In both experiments, the response rates did not differ significantly across conditions (it has to be mentioned that the controls were already visible the moment respondents logged on to the survey). In the second experiment, significantly lower (p<0.01) dropout was produced when radio buttons were used (percent of respondents retained: 96.6% versus 86.4%). Furthermore, in the first experiment, a no answer category was also available for both controls; no significant difference concerning the selection of this item was observed. Download times differed significantly between the two controls in the first experiment (but not in the second), and showed that the radio button control allows faster filling out.
5.4 Color Effects
Since color in general is extensively used in Web design, this section will deal with some recommendations on how colors should be applied and which problems can occur when color is used inappropriately. Unfortunately, only a few experiments deal with color effects.
There are some colors which have a special influence on respondents, e.g. red (see Gnambs (2008), who gives a psychological explanation and a more detailed discussion). Two experiments with the color red in a General Knowledge Test (GKT) are subsequently described: in the first, the color of a progress bar was manipulated (black, green, red), and in the second, the colors of buttons were manipulated (red vs. blue vs. red/blue, with red only on the first screen). Color cues in red resulted in significantly lower scores on the GKT for men and women in study one. In study two, a slight increase in GKT scores under the red condition was observed for women.
The Web Content Accessibility Guidelines (see Caldwell et al. (2008)) suggest that color should not be used as the only visual means of conveying information, indicating an action, prompting a response, or distinguishing a visual element, since this would exclude respondents with visual handicaps from being able to complete surveys. In any case, using color to transport information on Web pages is problematic. Dillman (2007, p.383) exemplified the possible use of color in surveys by allowing respondents in his experiment to choose their preferred color for various articles of clothing through the selection of the correspondingly colored button. He mentions, however, that although such items are attention-getting, it is important to recall what concept is being measured: because colors may appear quite different on various respondents' screens due to variations in operating systems, browsers and display settings, the blue selected or rejected by one respondent may be quite different from that viewed by another. Even when several displays are attached to one computer, colors appear differently when the screens are not calibrated accordingly.
Some combinations of background and text color make it impossible for people with color blindness to read the contents of the questionnaire at all (Dillman (2007, p.382)). Generally, Crawford et al. (2005, p.48) recommended that surveys should not contain background colors, since they may create problems when the contrast is too great and the text becomes difficult to read. Jackob & Zerback (2006, p.19) are of a similar opinion and recommended the use of standard colors and fonts.
5.5 Modifications of Input Fields for Open-ended Questions
5.5.1 Different Size
Christian & Dillman (2004) carried out experiments with the size of input fields for open-ended questions. They doubled the size of the open-ended answer space in one version of each of three questions. Varying the amount of answer space in open-ended questions influenced both the number of words and the number of themes provided by respondents. For all three questions, the larger space produced longer answers with a significantly greater number of words; in addition, the longer answers generally contained more topics (Christian & Dillman (2004, p.68)). Stern et al. (2007) repeated this experiment and found consistent results. Moreover, there were no significant findings when comparing demographic groups. However, there was a tendency for respondents who were either over 60 years of age, had less than a college degree, or were women, to provide responses that were one to two words longer than their comparison
group, regardless of the size of the box. Weisberg (2005, p.122) also mentioned this effect: people are more likely to give a long answer when a large space is left blank after the question, and less likely when questions are tightly packed on the screen.
5.5.2 Lines in Answer Spaces
The Christian & Dillman (2004) experiment described in section 5.5.1 was modified by adding lines to the open-ended answer spaces. It was expected that more detailed answers would be given in the version with lines, but no significant results were found.
5.6 Influence of Images
Images in general play a major role in Web design and thus are also integrated into Web questionnaires. Couper, Tourangeau & Kenyon (2004, p.257f) distinguish three types of picture usage in Web surveys:
1. Questions for which images are essential (such as questions on recall of an advertisement, brand recognition questions, questions on magazine readership).
2. Questions in which images supplement the question text, whether the images are intended as motivational embellishments or as illustrations of the meaning of the question.
3. Questions in which the images are incidental (providing branding, an attractive background, etc.).
Both Couper, Conrad & Tourangeau (2007) and Tourangeau, P. Couper & Conrad (2003) carried out experiments aiming to research the consequences of image availability on response behavior in a relatively extreme experimental design. They both experimented with visual context effects, namely the effect that pictures of a healthy woman exercising versus a sick woman in a hospital bed have on self-rated health, whereby the size and placement of the images were varied. In this case, the pictures were much more than a simple embellishment. The response scale was a 5-point, fully labelled radio button scale ranging from excellent to poor. The general outcome was that people consistently rated their own health lower when exposed to the picture of the sick woman. Interestingly, in 2 of 3 experiments there was no effect when the picture was placed within the header region; the authors mentioned banner blindness, i.e. the fact that people ignore banners at the top of pages despite their being designed to be attention-getting (Couper, Conrad & Tourangeau (2007, p.628)), as a possible reason. The conclusion was that images can systematically affect responses when their content has relevance to the survey question. Crawford et al. (2005, p.48) recommended a similar use of images when they state: "Care must be taken to using graphics that will not influence how respondents respond to the survey question. For example, if a survey were in support of an evaluation of a program geared to connect police officers with children in the community, it would not be a good design to include a logo that showed a friendly-looking police officer smiling at the respondent throughout the survey."
Witte et al. (2004) report the results of a National Geographic survey tracking support for or opposition to the protection of endangered species (animals). One instrument provided a photographic image
of the species in question and the other used simple text. As expected, strong support for protection increased when respondents received a photograph. This effect was found more often among male respondents. Additionally, picture quality played an important role.
Deutskens et al. (2004) report on a further study concerned with product evaluations and experimental modifications in two versions of a questionnaire: (1) simply a list of article names and (2) all articles with names and a picture. Interestingly, the visual representation of the questionnaire had a significantly lower response rate (19.0%) than the textual presentation (21.9%). This was possibly due to the additional time needed by the browser to download and align the images correctly, which resulted in a higher burden for the respondent. Another effect was that respondents who got the textual representation were more likely to select don't know as an answer than those who got the visual representation.
Couper, Tourangeau & Kenyon (2004) experimented with the use of photographic images to supplement question texts. In a travel, leisure and shopping questionnaire, respondents were given one of 4 versions: (1) no pictures, (2) a picture of a low-frequency instance (i.e. a picture showing an action people seldom do), (3) a picture of a high-frequency instance and (4) both pictures. Respondents were asked about the frequency of certain actions in their daily life, such as listening to recorded music in the past week. For this question, the low-frequency instance was listening to the hi-fi, and the high-frequency instance was listening to the ear-radio. As a result, for four of the six topics, the means for the high and low frequency conditions differed significantly (p<.01) from each other. It was observed that when the picture of the high-frequency instance was shown, respondents reported a higher average than when shown the picture of the low-frequency instance. In addition, there was no correlation between the experimental treatment and the age, sex and education of the respondents.
5.6.1 Personalization - Virtual Interviewer Effects
Interviewer effects in live interviews are well known and documented. This section describes Web survey experiments in which an interviewer (or researcher) is present in the form of photos.
In an experiment reported by Tourangeau, Couper & Steiger (2003), which deals with interviewer gender effects in Web surveys, images of (1) a male researcher, (2) a female researcher, or (3) the study logo were displayed on the questionnaire. This was varied in a second study, where a text message from the (male or female) researcher was additionally displayed or not. Furthermore, questions about the roles of men and women were asked. Concerning socially desirable responses, with the exception of the effect of personalization on gender attitudes, no results were reported where the effects reached statistical significance. The authors expected respondents of both sexes to report the most pro-feminist attitudes when the questionnaire contained pictures and messages of the female investigator, and the least pro-feminist attitudes when the program displayed the pictures and messages of the male investigator. This pattern was apparent in the scale means by condition and reached statistical significance, but the effect was much smaller than found for live interviews in other studies. One idea behind displaying the photos was to humanize the questionnaire process, but there was no evidence that adding humanizing features improved response or data quality in the survey.
Krysan & Couper (2006) conducted a Web experiment focussing on the effects of the interviewer's race. A representative sample of white respondents was surveyed. The race of the interviewer and social presence versus mere presence (4 pictures) were manipulated using images of black and white people. Only one of the four attitude scales, namely the stereotype scale (the other scales were discrimination, racial policies, and race-associated policies), showed a significant (p<0.01) effect with regard to race. When white respondents were presented with images of African Americans (regardless of mere versus social presence), their endorsement of negative stereotypes of African Americans was lower (3.11) than when presented with images of whites (3.32). In addition, the control condition - which had no images of people - was closer to the mean of the white image condition. No overall effect of social versus mere presence could be found (Krysan & Couper (2006, p.21)).
5.7 Heaping
Several researchers have discovered that both open-ended questions in which numerical values have to be entered and closed-ended questions using scales with numerical labels pose a similar difficulty: when an answer to a numerical question with a large range has to be given and the respondent does not know the exact value, rounded answers are given. The respondents create their own grouped answer categories; one example would be age heaping on multiples of 5 (Tourangeau et al. (2000, p.233)). These rounded values were used when it was difficult (or impossible) for respondents to come up with an exact answer. "The difficulty may arise from imprecise encoding of the information in memory, from indeterminacy on the underlying quantities, or from the burden of retrieving numerous specific pieces of information. In each of these cases, the use of rounded values may reflect problems in the representation of the quantity in the question" (Tourangeau et al. (2000, p.235)). Heaping introduces bias into survey reports for two reasons: (1) round values are not evenly spaced, because the distances between successive round values increase as the numbers get larger, with the consequence that more values will be rounded down than rounded up; (2) respondents may not round fairly but instead follow a biased rounding rule, which means they do not necessarily round to the nearest number, but instead characteristically round up or down (for a more detailed description of these effects, see Tourangeau et al. (2000, p.238f)).
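A very simple indicator for heaping in collected numeric answers can be sketched as follows; the function name and the chance baseline of 1/base are illustrative assumptions, not a measure used in the cited studies.

// Minimal sketch (TypeScript): share of answers that are multiples of a rounding base
// (e.g. 5 for age heaping). Values well above 1/base suggest heaping.
function heapingShare(answers: number[], base = 5): number {
  const heaped = answers.filter((value) => value % base === 0).length;
  return heaped / answers.length;
}

// Example: for ages reported as 20, 23, 25, 30, 31, 35 the share of multiples of 5 is
// 4/6 (about 0.67), far above the 0.2 expected if all end digits were equally likely.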
5.8 Additional Content and Visual Hints
Christian & Dillman (2004) added an arrow to direct respondents toward answering a subordinate open-ended question, where detailed information should be provided. The researchers took care that the arrow was placed inside the foveal view (see section 10.3.1 for further information). This arrow significantly increased the percentage of eligible respondents answering the subordinate question (unfortunately, it increased ineligible mentions by the same percentage).
5.8.1 Placement of Instructions and Help Texts
Respondents might be more willing to request a definition or explanation from a Web-based resource than from a human interviewer, which makes this section important for Web surveys.
It has also often been observed that when respondents obtain clarification for possibly ambiguous questions, response accuracy improves dramatically.
Christian & Dillman (2004) varied the location of instructions for yes/no questions by placing them either before or after the response categories. The instructions contained information on how to skip the question if a certain condition was not fulfilled. More people skipped the question (26.2% versus 4.8%) when the instructions were located before the response options, because they were more likely to see and thus read the instructions before answering. Additionally, placing the instruction after the alternatives introduced confusion: some respondents appear to have applied the instructions to the following question as well, since item nonresponse also increased for the succeeding question.
Two experiments within this field were conducted by Conrad et al. (2006): a questionnaire about lifestyle with rather complicated phrases (food nutrition concepts) was used, including helpful definitions. Three ways to retrieve the definitions were provided: (1) a one-click interface, where respondents needed to click on a phrase; (2) a two-click interface, where an initial click displayed a list of terms from which respondents could select one by clicking; (3) a click-and-scroll interface, where clicking on a hyperlinked term displayed a glossary (an alphabetic list) of all definitions, of which only the first few were initially visible, so that scrolling was necessary to see the others. In general, respondents rarely requested clarification (13.8% of those who answered the experimental questions), and the number of requests is sensitive to the amount of effort (here: the number of clicks). Due to these findings, another experiment was carried out, in which the definition hovered above the text when the mouse was simply moved to a certain term. This resulted in an increase to 22.4% of respondents requesting definitions (Conrad et al. (2006, p.259)). While it is clear that more respondents request more definitions with rollovers than by clicking, it could be that many of the rollover requests were less deliberate than requests registered by clicking.
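The two access modes can be illustrated with a minimal sketch (TypeScript, plain DOM); attachDefinition is an illustrative helper, not the interface used by Conrad et al. (2006). The same definition element is revealed either by a click or by a mouse rollover.

// Minimal sketch: show a definition next to a term either on click or on rollover.
function attachDefinition(term: HTMLElement, definition: string, mode: "click" | "rollover"): void {
  const box = document.createElement("span");
  box.textContent = definition;
  box.style.display = "none";
  term.insertAdjacentElement("afterend", box);

  if (mode === "click") {
    // One-click interface: each click toggles the definition.
    term.addEventListener("click", () => {
      box.style.display = box.style.display === "none" ? "inline" : "none";
    });
  } else {
    // Rollover interface: the definition appears while the mouse is over the term.
    term.addEventListener("mouseover", () => { box.style.display = "inline"; });
    term.addEventListener("mouseout", () => { box.style.display = "none"; });
  }
}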
In Conrad et al. (2007), additional help texts that were meant to improve the comprehension of questions were provided by clicking on a highlighted text. As a result, respondents who could obtain clarification provided substantially more accurate answers than those unable to obtain clarification; in particular, respondents reliably provided more accurate answers when they could click for clarification than when no clarification was available. The overall goal was to bring features of human dialogue to Web surveys. A more extreme possibility of bringing human factors to self-administered Web surveys is to employ interviewer avatars (see chapter 7 and section 5.6.1 for more information).
Galesic et al. (2008) conducted an eye-tracking experiment which tried to find out under which conditions definitions of certain terms used in the questions are read or not. The eye-tracking data revealed that respondents were reluctant to invest any effort in reading definitions of survey concepts that were only a mouse click away, which suggests that definitions and help texts should be located next to the corresponding questions or question alternatives.
5.9 Date Formats
Christian et al. (2007) and Smyth et al. (2004) conducted experiments concerned with the influence of given date formats on response behavior. A version with equally sized month and year answer spaces was compared to one where the month space was about half the size of the year space. In addition, the effects of using word labels versus symbols (like MM YYYY) to indicate the number of digits respondents should use when answering were checked.
As a result, when the month box was about half the size of the year box, respondents were significantly more likely to report the date in the desired format. While reducing the size of the month box did not significantly impact how respondents reported the month, it did significantly increase the likelihood that respondents reported the year using four digits. The use of the symbols M and Y for month and year had the effect that the number of symbols determined the number of digits entered by the respondents. Furthermore, a slight improvement in giving the correct format was observed when the two input fields were connected to each other (no space between the fields). Locating the symbols below the answer spaces, as opposed to locating them to the right of the answer spaces, resulted in a significantly larger proportion of respondents using the correct four-digit format to report the year. This shows that explanatory texts should be located as close as possible to the input fields they describe.
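A minimal sketch of the better-performing layout may make this concrete (TypeScript, plain DOM; the field sizes and the helper name are illustrative assumptions): a month box of roughly half the width of the year box, with the digit symbols placed directly below the fields.

// Minimal sketch: month and year answer spaces with the symbols "MM YYYY" below them.
function renderDateFields(container: HTMLElement): void {
  const month = document.createElement("input");
  month.maxLength = 2;
  month.size = 2; // about half the width of the year box
  const year = document.createElement("input");
  year.maxLength = 4;
  year.size = 4;

  const labels = document.createElement("div");
  labels.textContent = "MM YYYY"; // the number of symbols signals the expected digits

  container.append(month, year, labels);
}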
5.10 Progress Bar
A progress bar is an instrument which shows the respondent the current position within the questionnaire (e.g. how many questions or pages remain, or what percentage of the whole questionnaire has already been filled out). Generally speaking, a progress bar should motivate respondents to finish the survey and not quit a few questions before reaching the end, and is therefore an attempt to increase response rates. A progress bar is also called a progress indicator or point of completion (POC) indicator. The progress bar is a feature that was first introduced for Web surveys because, as Conrad et al. (2003) outline, it had not been necessary for questionnaires before: paper questionnaires inherently communicate information about respondents' progress, since the thickness of the yet-to-be-completed part of the booklet provides immediate and tangible feedback to the respondent about how much work remains. When branching and additional questioning dependent on previously answered questions are heavily used, calculating the current progress is somewhat difficult, and when displayed to the respondent, this value can lead to confusion.
The most interesting study on the influence of progress bars on dropout is reported in Heerwegh & Loosveldt (2006a). One group of respondents had a progress bar displayed and the other group did not. As a result, the group with no progress bar had a higher, but not statistically significant, dropout. When a progress bar was present, item nonresponse (here: the proportion of unanswered questions) was lower. The necessity to repeat the study with a population of non-students is mentioned in the discussion part of the paper. Furthermore, similar results were reported by Supersurvey (Hamilton (2004)), in an experiment which emphasized the positive effect of a progress bar on completion rates.
Contrary to these findings, in Crawford et al. (2001, p.156) the progress bar appeared to have a negative effect: among those who started the survey, 74.4% completed it when no progress indicator was present, compared with 68.5% of those who received the progress indicator version. In Crawford et al. (2005, p.49), an attempt at an explanation is given: "When respondents see a progress indicator, we believe they extrapolate the time they have taken thus far in the survey and decide on how long the survey will take overall. In a survey that begins with burdensome questions that may take longer to answer than average, such an interpretation may result in an evaluation of burden that is too high." Heerwegh (2004b) conducted an experiment with 2520 university students and found no significant effect on dropout of a graphical progress bar (a grey bar with blue as the progress color; the percentage completed was written below it). Healey et al. (2005) report on experiments which provided one group of respondents with a progress bar and the other group without one. None of the differences were statistically significant, so no support for the efficacy of the progress bar was found. The same is true for the results of a similar experiment reported in Couper, Traugott & Lamias (2004, p.370f) and Couper et al. (2001), where differences also did not reach statistical significance. Interestingly, the average time to complete the survey with the progress indicator was significantly longer than without it (22.7 compared to 19.8 minutes). Two explanations were given by the authors: (1) download times increased because additional resources had to be downloaded for the progress bar; (2) respondents who received the progress indicator took more care over their answers.
In Conrad et al. (2005), the progression of a (textual) progress indicator was modified, so that three versions were presented to the respondents: (1) progress following a linear function, i.e. the current page position divided by the total number of questions; (2) fast-to-slow, achieved by dividing the log of the current page by the log of the final page; and (3) slow-to-fast, achieved by dividing the inverse log of the current page by that of the final page. As a result, break-off rates varied with the speed of the progress indicator: respondents were more likely to break off under slow-to-fast feedback (21.8%) than under fast-to-slow feedback (11.3%). This also affected respondents' judgement of the duration of the task, which was asked at the end of the questionnaire: fast-to-slow respondents estimated that it took fewer minutes to complete than respondents in the other groups. An extended experiment varied how often the progress bar was displayed: (1) always on, as in the previous experiment; (2) intermittent, at nine transaction points in the questionnaire; (3) on demand, displayed when respondents clicked a link labelled "show progress". In addition, the speed of progress was varied as in the previous experiment. Concerning the impact of the speed of progress, the results replicated what was observed in the previous experiment, but the frequency of progress feedback did not reliably affect response rates.
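The three progress schedules can be written down as simple functions. The linear and fast-to-slow versions follow the formulas given above; for the slow-to-fast version, the mirror image of the fast-to-slow curve is used here, which is only one plausible reading of "inverse log", so this part is an assumption.

// Minimal sketch (TypeScript): displayed progress as a fraction between 0 and 1,
// with page numbers starting at 1.
function linearProgress(page: number, lastPage: number): number {
  return page / lastPage;
}

function fastToSlowProgress(page: number, lastPage: number): number {
  return Math.log(page) / Math.log(lastPage);
}

function slowToFastProgress(page: number, lastPage: number): number {
  // Mirrored curve: little apparent progress at the start, rapid progress near the end.
  return 1 - Math.log(lastPage - page + 1) / Math.log(lastPage);
}

// Example: with 20 pages, after page 5 the displayed progress would be 25% (linear),
// about 54% (fast-to-slow) and about 7% (slow-to-fast).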
In general, it can be stated that one should look at the properties of the questionnaire before deciding whether or not to employ a progress bar; for example, the overall length of the questionnaire could be crucial for this decision: "In case of extensive surveys containing a large number of questions, it seems to be recommended not to inform the respondent about his individual progress because it may be particularly de-motivating if the progress indicator indicates no or only marginal progress. Therefore, it may be recommended to add a progress-bar to short surveys but to remove it in longer ones because otherwise dropout rates are likely to increase" (Jackob & Zerback (2006, p.12)).
5.11 Sponsor's Logo
On some questionnaires, the logo of the sponsor or operator (such as a research institute, university or company) is displayed. The effects of these logos are discussed in this section. Walston et al. (2006) observed the effects of governmental and non-governmental sponsoring and found that sponsorship had an effect (at least when combined with a fancy design) when the survey was marked as governmentally sponsored. Moreover, Heerwegh & Loosveldt (2006a) observed an effect on dropout when a sponsor's logo (the logo of the university) was displayed, since the logo led to a (statistically non-significant) break-off rate reduction. As a possible reason, the authority principle was mentioned, since the respondents may have held the organization behind the logo in high esteem. The experiment of Heerwegh (2004b) found no effect on the completion rate when the university logo was displayed on a questionnaire for university students.
It is hard to make general statements about the influence of a sponsor's logo because the context is very important in this case. "If the respondents hold the organization in high esteem, then repeating the logo on each survey screen could decrease break-off rates. Conversely, if the respondents do not hold the organization in high esteem, or even have serious doubts about its legitimacy, then a logo might produce the opposite effect" (Heerwegh & Loosveldt (2006a, p.196)).
6 Non-Visual Design Experiments in Online Surveys
Of course, other experiments not dealing with visual design effects have been reported in recent publications, and some of these results are discussed in the following sections. The influence of the high-hurdle technique and of confidence and security (e.g. offering https as the protocol) are topics not dealt with in this chapter, but they present additional possibilities for running experiments with Web surveys.
6.1 Incentives
Deutskens et al. (2004) offered respondents different incentives (depending on the length of the questionnaire assigned) in their study: (1) a voucher worth €2 or €5 for an online book and CD store; (2) a donation of a maximum of €500, whereby respondents could choose between the WWF, Amnesty International, or a cancer association; and (3) a lottery, in which respondents had the chance of winning one of 5 vouchers of €25 or €50, respectively. As a result, vouchers and lotteries had a higher response rate (22.8%) compared to the donation to a charity group (16.6%; here 61% chose the cancer association, 25% the WWF and 15% Amnesty International).
Bosnjak & Tuten (2003) conducted experiments with prepaid and promised monetary incentives. The positive effects of prepaid monetary incentives in mail surveys are well known. Due to new technological services (e.g. PayPal), which enable money to be transferred to people online in advance, this incentive mode is now also possible for Web surveys. The results indicate that prepaid incentives ($2) in Web surveys seem to have no advantages concerning the willingness to participate, actual completion rates, or the share of incomplete response patterns when compared with post-paid incentives. Furthermore, post-paid incentives show no advantages in comparison to giving no incentives at all. Finally, compared to no incentives, participation in prize draws (a $50 prize and four $24 prizes) increased completion rates and also reduced various incomplete participation patterns.
Göritz (2006a) took a closer look at the use of cash lotteries as incentives, whereby various experiments with different forms of cash lotteries were conducted. As an overall result, cash lotteries compared to no incentives did not reliably increase response or retention; neither did it make a significant difference whether one large prize or multiple smaller prizes were raffled. In the master's thesis of Rager (2001), a survey was announced on a Web page either as a questionnaire or as a lottery. Those who received the lottery announcement had a significantly higher rate of completion (58.1% compared to 27.2%).
Birnholtz et al. (2004) carried out an experiment in fall 2002 with three different incentive modes: (1) a $5 bill sent with the survey instructions via first-class mail; (2) a $5 gift certificate code for amazon.com sent with the survey instructions via first-class mail; or (3) a $5 gift certificate code for amazon.com sent with the survey instructions via e-mail. The results show that the $5 bills led to significantly higher response rates than either of the gift certificates (57% for cash vs. 36% for the gift certificates overall, with 40% for the paper and 32% for the e-mail invitation). This finding was statistically significant (p<.01) and suggests that cash is a superior incentive for an online survey, even with technologically sophisticated respondents. This may be due to the perceived limitations, delayed payoff, or reduced visibility of online gift certificates. In summary, as noted in Bosnjak & Tuten (2003, p.216), the success of prize draws and cash lotteries will possibly depend on cultural factors, which could explain the differing findings of Göritz (2006a, p.216) and Bosnjak & Tuten (2003).
6.2 Invitation and First Contact
Quintano et al. (2006) compared invitation modes to a Web survey, either via telephone or via e-mail. The study found that the willingness to be a respondent increased when the initial contact was made by telephone. Whitcomb & Porter (2004) tested complex graphical e-mail designs and their effect on survey response. Respondents were contacted with one of six e-mail designs that varied in format (text vs. HTML), background color (white vs. black), and graphical design (simple vs. complex). As a result, when a black background was used, response rates were lower (9.2%, compared to 12.6% for white as the background color). With regard to the e-mail format, participants who were sent the HTML e-mail with a white background and a simple header were more likely to respond to the survey than participants mailed the bare-bones text message, with a difference in response rates of 3.6%. When complex graphical headers were used in the HTML format, response rates were also lower than with a simple header (9.9% compared to 11.9%).
Heerwegh et al. (2004) conducted experiments on the personalization of e-mail invitation letters. The first experiment, on the topic of attitudes towards marriage and divorce, had two conditions: (1) no personalization as the control condition, where respondents were addressed as "Dear Student", and (2) a personal address, e.g. "Dear John Smith", in the e-mail message itself. The response rates differed significantly (49.1% for the no-personalization condition versus 57.7%). No personalization effects on social desirability bias could be found. The second study had the same design and reaffirmed the positive effect of personalization on the response rate in Web surveys. In this study, the effect of personalization on social desirability reached statistical significance: the average score on the debriefing question "to which degree did you feel at ease to honestly and sincerely respond to the questions?" was significantly higher in the impersonal salutation group, indicating that the personalized group felt less at ease honestly reporting opinions, behavior or facts (average scores of 4.13 versus 4.01 on a 1 to 5 point scale). Furthermore, responses to sex-related questions differed between the two experimental groups: when a personal salutation was used in the e-mail, the number of sexual partners reported increased (only respondents who had ever been in a sexual relationship were included in this analysis). To sum up, the effect of personalization on response was so pronounced (an increase of 8.6 percentage points) that it seems worthwhile to consider personalizing e-mail contacts whenever
possible. Heerwegh & Loosveldt (2006b) also tested the hypothesis that personalization would induce a social desirability bias. In addition, they carried out a further test of the effect of personalization on Web survey response rates, with the result that personalization significantly increased the response rate.
6.3 Different Welcome Screens
This section discusses the influence of the first page visible to the respondent, which should contain introductory information for the whole survey. Healey et al. (2005, p.6) carried out an experiment with 837 respondents and analyzed what influence the visibility of the first question on the screen had. The outcome was that having the full question visible had little or no effect on whether or not respondents decided to complete the question or continue with the survey. Another recommendation comes from Dillman et al. (1998): begin the Web questionnaire with a question that is fully visible on the first screen of the questionnaire and will be easily comprehended and answered by all respondents.
6.4 Length of the Survey
Deutskens et al. (2004) varied survey length in an experiment, where the long version took about 30 to 45 minutes and the short version about 15 to 30 minutes to finish. As expected, the short version of the questionnaire had a significantly higher response rate, with 24.4% compared to 17.1%. The analysis of the number of don't knows in the long and in the short version revealed that there were proportionally more don't-know answers in the longer version (statistically significant with p<0.05). There were also more semi-completed questionnaires in the longer version.
Ganassali (2008) carried out experiments on the length of the questionnaire in combination with interaction effects (repetition of previous answers and some types of forced answering), where the short version had 20 and the long version 42 questions, with interesting results. The length was not mentioned prior to completion, only on the first screen via a page number indicator. Interestingly, the longer questionnaire obtained longer textual responses to open-ended questions (78 words compared to approximately 60 words, i.e. about 25% less). Also surprisingly, respondents who got the longer survey reported significantly higher satisfaction than those with the short questionnaire (a score of 6.75 versus 6.10) (Ganassali (2008, p.28f)). One would have to take a look at the questions to see whether there are any side effects.
6.5 Time-to-Complete Statement at the Beginning
Walston et al. (2006) found that a shorter time-to-complete statement positively affected the decision to begin the survey (14.4% for 5 minutes vs. 10.8% for 15 minutes), with a similar general trend for the response rate. Similar results were found by Crawford et al. (2001, p.153): those who were informed that the survey would take 8 to 10 minutes to complete had a lower nonresponse rate than those who were told it would take 20 minutes (63.4% vs. 67.5%), but the 20-minute group had a lower rate of breakoff once they started the survey. Similar outcomes were reported in Heerwegh (2004b), where a vague length statement ("as short as possible") produced
a significantly higher login rate (66.5%) than the more specific length statement ("approximately 20 to 25 minutes") (62.8%). The vague length statement did not produce higher break-off rates than the specific length statement.
7 Outlook
Couper (2005, p.487) identifies five general technology-related trends in survey research:
1. The move from interviewer-administered to self-administered surveys: an interesting new approach is IVR (Interactive Voice Response), a data collection technology in which the computer plays a recording of the question to the respondent over the telephone and the respondent indicates the response by pressing the appropriate keys on his or her telephone keypad (Steiger & Conroy (2008)); additional information on IVR can be found in de Leeuw (2008b, p.255).
2. The move from verbal (written or spoken) inputs and outputs to visual and haptic and/or sensorimotor inputs and outputs. Concerning auditory communication, apart from advantages such as cost reduction, the capture of verbal inputs could allow the analysis not only of the selected response to a particular question but could also assist in the analysis of the certainty with which the respondent holds that view, based on the verbal qualifiers used in responding, or even extracted from other nonverbal qualities of the vocal response (Couper (2005, p.489)).
Applications which employ audio-visual (multimedia) communication are video-CASI or the use of videos as stimulus material. Mayntz et al. (1978, p.114) describe an interview as a social situation in which interviewing turns out to be a special form of social interaction. It is not clear whether interviewer avatars can easily substitute this situation and which possible side effects could result. Some research on this has been reported in Fuchs (2008) and Gerich (2008). Adding multimedia to Web surveys makes it more attractive for respondents to complete the survey, but nevertheless the use of multimedia content needs additional steps to avoid excluding e.g. visually handicapped persons (see more on this in Zunic & Clemente (2007) and Caldwell et al. (2008)).
Since the early days of Web surveying it was recognized that computer-mediated surveying could potentially enrich studies with multimedia stimuli such as graphics, pictures, spoken word or other sounds. In fact, these possibilities have only seldom been put into action. In recent years Web surveys have been enriched with graphics and pictures, some of which were content bearing. Methodological evaluations have shown that these pictures can have a serious impact on the perceived question meaning and thus on the responses provided. An evaluation of this technology with respect to unit nonresponse, social desirability and social presence is given by Fuchs & Funke (2007). The use of images as visual communication is currently making its way into mainstream survey research, because the use of full-color images and photographs is a technically trivial and relatively inexpensive undertaking (for research results in this field, see section 5.6).
Another new form of computer-assisted data collection is the use of touch screen terminals, as reported in Weichbold (2003) and Weichbold (2005). A touch screen is a display which can detect the location of touches within the display area, in the case of surveys usually performed with the human hand.
3. The move from fixed to mobile information and communication technology, for data collectors and for respondents (mobile phones). Small devices like Personal Digital Assistants (PDAs) were used by interviewers, e.g. for household screening in several large-scale surveys in the United States.
4. The move from discrete surveys to continuous measurement. Because of new technological possibilities, diary surveys can be conducted more easily.
5. The move from data only, to data and metadata, and also to paradata (concrete examples of paradata collection can be found in chapter 14).
7.1 Dynamic Forms, AJAX, Web 2.0
In the following section, a description of new upcoming technology trends in survey research is given.
Dynamic forms is the generic heading for dynamic text fields and dynamic lists, two innovative ways of reactive data collection in self-administered online surveys. These Web 2.0 techniques are described in Funke & Reips (2007b), where it is shown how to combine the advantages of open-ended and closed-ended question formats. When using dynamic text fields, after beginning an entry, suggestions for the most probable word are offered in an area below the field. With each new letter these suggestions are re-adapted. By using dynamic lists, even questions with large numbers of response categories can be brought into a hierarchical order and answered like closed-ended questions. At first, the respondent sees only a single table with very general categories. As soon as one of these categories is selected, more specific choices appear in a second table. The underlying technology which enables this functionality is called AJAX (Asynchronous JavaScript and XML) and is regularly mentioned in connection with Web 2.0. These two methods have not been examined in survey research yet. It would be interesting to start experiments regarding their influence on data quality or the cognitive processes underlying response behavior.
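To make the mechanism behind dynamic text fields more concrete, the following TypeScript sketch shows one possible browser-side implementation. It is only an illustration: the element ids (#occupation, #suggestions) and the /suggest endpoint returning a JSON array of strings are assumptions made here, not part of Funke & Reips (2007b) or of the software described later in this thesis.

// Sketch of a dynamic text field: after every keystroke the browser asks a
// server endpoint for likely completions and shows them below the input.
const input = document.querySelector<HTMLInputElement>("#occupation")!;
const suggestionList = document.querySelector<HTMLUListElement>("#suggestions")!;

let latestQuery = "";

input.addEventListener("input", async () => {
  const query = input.value.trim();
  latestQuery = query;
  if (query.length < 2) {                 // do not query the server for single letters
    suggestionList.innerHTML = "";
    return;
  }
  const response = await fetch(`/suggest?q=${encodeURIComponent(query)}`);
  const suggestions: string[] = await response.json();
  if (query !== latestQuery) return;      // a newer keystroke superseded this request
  suggestionList.innerHTML = "";
  for (const word of suggestions) {
    const item = document.createElement("li");
    item.textContent = word;
    // Clicking a suggestion copies it into the field, like a closed-ended answer.
    item.addEventListener("click", () => {
      input.value = word;
      suggestionList.innerHTML = "";
    });
    suggestionList.appendChild(item);
  }
});

With each new letter the list of suggestions is rebuilt, which is exactly the re-adaptation described above; the stale-response check prevents an earlier, slower request from overwriting newer suggestions.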
Part II
Theoretical Background
8 Introduction
Of course it is important to embed the experiments in the general theory of survey research. The methodological as well as the psychological theory will be presented, together with terminological definitions such as reliability and validity.
The main methodological concept is the total survey error. Even though a general overview of all kinds of errors which can occur when conducting a survey is given, the focus is set on those errors which can occur when running Web surveys in general and those which are important for the experiments in particular. Reducing these errors increases survey quality. The most important errors for this work are:
- Nonresponse at the unit level (see chapter 9.1.1.4): when running Web surveys, this error is often caused by technical problems. In the experiments some advanced technologies are used, and it should be checked whether these increased this particular error.
- Dropout (see chapter 9.1.2.2): one of the main questions is how different styles of questions influence dropout. This is evaluated in chapter 18 and is closely related to respondents' burden (see chapter 9.1.2.3).
- Measurement error (see chapter 9.1.3): this error is central for the experiments and concerns the evaluations given in chapters 19 and 20.
Differences between survey modes (mainly differences between paper-and-pencil and online) are also discussed. Because many findings already made for offline questionnaires should also be applied to Web surveys, it is important to identify possible mode effects.
Additionally, some psychological theories are presented, namely the individual steps involved in the response process and some visual interpretive heuristics together with gestalt principles, which serve as a good basis for visually designing (Web) questionnaires.
Finally, the whole research topic is embedded in online research. Therefore, this topic is described in more detail, and all major institutions dealing with online research are mentioned.
9 Methodological Theories
This chapter summarizes methodological theories relevant for conducting surveys in general and Web surveys in particular, and describes how to reach high survey quality by reducing survey error. All sources of error which can occur in Web surveys are discussed in the subsequent sections, with a focus on possible problems when carrying out surveys on the Web.
9.1 Total Survey Error
The total survey error covers all errors which bias the results of surveys and is systematically split up into several specific error types. The experiments carried out for this dissertation should help to diminish particular errors. Sample surveys are subject to four major sources of error, and each must be taken into consideration in order to obtain valid and reliable sample estimates. A good overview of all survey errors is given by Weisberg (2005). The book defines the total survey error approach as follows: "The total survey error approach is based on analyzing the several different sources of error in surveys and considering how to minimize them in the context of such practical constraints as available money" (Weisberg (2005, p.16)). Subsequently, all errors are described and their minimization briefly discussed, but the focus is set on those important for the experiments in this thesis. An error in this context is a mistake made within the whole surveying process. It refers to the difference between an obtained value and the true value for the larger population of interest.
Biemer & Lyberg (2003, p.34) give another definition of the total survey error: the quality of an estimator of a population parameter is a function of the total survey error, which includes components of error that arise solely as a result of drawing a sample rather than conducting a complete census, called sampling error components, as well as other components that are related to the data collection and processing procedures, called nonsampling error components. Sampling errors result from selecting a sample instead of the entire population, and nonsampling errors are the result of system deficiencies. According to Biemer & Lyberg (2003, p.38), nonsampling errors can be split up into five major sources and potential causes: specification, frame, nonresponse, measurement and processing error. All of these will be described in more detail in the following chapters together with error reduction strategies. A particular focus will be placed on the errors which play a major role for design effects in online surveys.
Weisberg (2005, p.19) gives a slightly different categorization of these error types, whereby the source of the error is foregrounded (the relevant ones will be discussed in the following chapters; those not relevant are described directly within the following enumeration):
- Respondent selection issues
  - Sampling error
  - Coverage error
  - Nonresponse error at the unit level
- Response accuracy issues
  - Nonresponse error at the item level
  - Measurement error due to respondents: this comes up when respondents do not give the answers they should, according to the researcher's intentions. This can be the result of wording problems or questionnaire issues such as question order effects or question context effects.
  - Measurement error due to interviewers: all experiments carried out and described here deal with self-administered surveys, so no further description is given (for the interviewing style debate, see Weisberg (2005, chap.4)).
- Survey administration issues
  - Postsurvey error: this error occurs after the interviews are conducted (e.g. within the data editing stage). For Web surveys, the potential for creating this kind of error is smaller because the data matrix is generated automatically. Therefore no manual data entry errors (e.g. coding or data-entry errors) or automated reading errors, e.g. from Optical Character Recognition (OCR), can occur (see Weisberg (2005, chap.11) for further information). In the terminology of Biemer & Lyberg (2003), this error is called processing error.
  - Mode effects: see section 9.4
  - Comparability effects: comparability issues arise when different surveys are compared. Surveys taken by different organizations, in different countries, and/or at different times are often compared, but the differences that are found are not necessarily meaningful (Weisberg (2005, chap.13)); for a more detailed discussion of equivalence limits, see Weisberg (2005, chap.13).
In other classifications, item and unit nonresponse are more closely linked, but here they fall into different superordinate groups. "Nonresponse can occur at two levels, the unit, by which we mean a person or household (though it can also be an institution such as a business or school); and the item, which is an individual question in our questionnaire" (Czaja & Blair (1996, p.181f)). This clear separation makes even more sense for Web-based surveys because the reasons for unit and item nonresponse differ even more in this mode. Nonresponse (particularly item nonresponse) plays a major role for this thesis, as the effect of different designs on the dropout rate is part of the analysis.
9.1.1 Respondent Selection Issues
9.1.1.1 Probability and Non-Probability Web Surveys
Before discussing respondent selection issues, the difference between probability and non-probability Web surveys must be clarified. The latter group involves no sampling, which has several consequences for the generated error, because in most cases a self-selection error is added. From the
aspect of sample selection, there are several types of probability and non-probability Web surveys (this enumeration and description is taken from Manfreda & Vehovar (2008, p.265), with slight modifications):
Probability Web surveys (often perceived as scientific surveys) are performed on a probability sample if units obtained from a sampling frame satisfactorily cover the target population. There are several types of probability surveys:
1. List-based surveys of high-coverage populations: these lists consist of samples of e.g. students, members of organizations etc., whereby all have access to the Web and where a sampling frame with satisfactory contact information is available.
2. Surveys on probability pre-recruited lists or panels of internet users: this sample of internet users is pre-recruited with a sampling method such as telephone surveys on a random sample of households or random-digit dialing.
3. Surveys on probability panels of the general population: in this case, not only is pre-recruitment done, but the hardware and software equipment needed for participation in several Web surveys is also provided.
4. Web surveys as an alternate option in mixed-mode surveys: a probability sample of respondents can be given the opportunity to choose a Web questionnaire among the available survey modes, or the researcher can allocate a part of the sample to the Web mode (for possible mode effects as a drawback of this method, see section 9.4).
Intercept surveys should be placed somewhere in between: systematic sampling is used to intercept visitors of a particular Web site. Respondents are supposed to be representative of visitors to that site, who constitute the target population. According to Manfreda & Vehovar (2008, p.265), this intermediate classification is based on two reasons: (1) it is true that, when taking a look at the log files, a list of all visitors can be generated; the problem is that these visitors cannot be uniquely assigned to a person, because one person may account for several visits (even cookies cannot help here). (2) Visitors who do not visit the page while the questionnaire is online are lost.
Nonprobability Web surveys (often perceived as non-scientific surveys) do not have a probability sample of units obtained from a sampling frame covering the target population satisfactorily. In some cases (e.g. volunteer opt-in panels) probability sampling may be used, but the sampling frame is not representative of the target population. There are several types of such Web surveys:
1. Web surveys using volunteer opt-in panels (also called access panels): some controlled selection of units from lists of panel participants is used for a particular survey project. These lists are basically a large database of volunteer respondents; opt-in here means self-inclusion. The problem is (at least for some surveys) that the internet is not structured in a way that allows researchers to construct well-defined sampling frames, i.e. complete lists of internet users that could be used to draw probability samples with known characteristics. One attempt to solve these problems is the employment of internet-based access panels. One problem with panels is that they consist of volunteers, and it is impossible to determine how well these volunteers represent the general population (see de Leeuw (2008b, p.250) for further information).
2. Web surveys using purchased lists: these lists typically consist of e-mail addresses purchased from a commercial provider, usually obtained either by specific computer programs searching for e-mail addresses on Web sites or by participants' self-inclusion. Usually, there are neither access restrictions nor control over multiple completions.
3. Unrestricted self-selected Web surveys: open invitations on different Web sites, but also in online discussion groups and traditional media.
4. Online polls: similar to the group above, these serve more for entertainment purposes and as forums for opinion exchange, like public polls or question-of-the-day polls.
For categories 1, 3 and 4, self-selection error can cause trouble in Web surveys where no access limitations are given. Because of the effect of these volunteer respondents, under these conditions "inference from survey respondents to any larger population through inferential statistics is scientifically unjustified" (Dillman & Bowker (2001, p.3)). The reason for selecting this recruitment mode is in most cases cost considerations. Self-selection is a form of nonprobability sampling and should be used with caution.
Faas & Schoen (2006) examined the effects of self-selection by running an experiment based on a comparison of online and offline surveys conducted in the context of the German federal election 2002, i.e. a topic on which most respondents will have a relatively clear opinion if interested in it at all. One of the online versions used self-selection as the recruitment mode. For the other interviews, an online access panel was used and offline face-to-face interviews were conducted. As is often the case when self-selection is used, those more interested and possibly involved in the topic were overrepresented. If respondents are recruited using procedures that permit self-selection, results can be expected to be biased both in terms of marginal distributions and in terms of associations among variables (Faas & Schoen (2006, p.179f)). One reason for this is that advertisements for online surveys are not distributed equally among Web sites; rather, the subject of the survey determines on which Web sites ads will be placed: information about political surveys will be found more frequently on Web sites with political content than on sports sites. This effect is verified by the empirical findings: the self-selected respondents are highly interested in politics, highly interested in the campaign, more polarized, even more certain to vote and, most strikingly, almost one in four of them is a party member (Faas & Schoen (2006, p.179f)). It is also possible that people who are strongly connected to a certain party, such as party members, fill out the questionnaire multiple times to influence the results in their favour. In this case, self-selection clearly does not lead to the desired results, but for other types of surveys this mode can nevertheless make sense (e.g. when users of a certain Web page are the target population). Another finding reported by Faas & Schoen (2006, p.839) demonstrates the dangers associated with self-selection: the marginal distributions of those who participated in the unrestricted online survey show that these people are substantially younger (33 years on average), better educated (76% have university entrance diplomas) and more often male than female (78% were male). This mirrors the actual figures for internet users.
As a common strategy, when marginal distributions differ between two samples, weighting is applied. Faas & Schoen (2006, p.187) had empirically grounded doubts concerning the conducted study: "As with distortions of marginal distributions, bias in associations is not reduced substantially when data are weighted socio-demographically." Similar thoughts are mentioned in Dever et al. (2008, p.57). Loosveldt & Sonck (2008, p.93) carried out an evaluation of the weighting procedures for an online access panel survey. They come to the result that weighting adjustment had only a minor impact on the results and did not eliminate the differences.
9.1.1.2 Sampling Error
The first concrete error discussed here is the sampling error. A few different definitions are given: "Sampling error arises from the fact that not all members of the frame population are measured. If the selection process was repeated, a slightly different set of sample persons would be obtained" (Couper (2000, p.467)). Or, in other words, sampling error is "the result of surveying a sample of the population rather than the entire population" (Dillman & Bowker (2001, p.2)). In contrast to the coverage error, here every person in the frame population has a nonzero chance of being part of the sample. Representativeness is an important term in this context: "Even when interviewing a sample, the survey researcher normally wishes to be able to generalize beyond the people who were sampled, sometimes even to people in other locations and at other time points. That is possible only when the sample is representative of the larger population of interest" (Weisberg (2005, p.225)). Weisberg (2005, p.231) gives descriptions of different sampling techniques. In general, a distinction can be made between probability sampling, where there is a known chance for every element in the sampling frame to be selected for the sample, and the more problematic nonprobability sampling, where the chance of being selected is not known, which causes potential biases and means the sampling error cannot be estimated (strategies to reduce sampling error can be found in Dillman (2007, chap.5)).
9.1.1.3 Coverage Error
Coverage error can be defined as a function of the mismatch between the target population and the frame population, the latter being the actual entities from the target population with a positive probability of inclusion in the survey. In other words, coverage error is "the result of all units in a defined population not having a known nonzero probability of being included in the sample drawn to represent the population" (Dillman & Bowker (2001, p.2)).
Coverage error is mainly associated with the sampling frame, the actual set of units from which the sample will be taken: "The sampling frame in a survey is the list from which the sample is selected. Frame error is the error that can arise when the elements in the sampling frame do not correspond correctly to the target population to which the researcher wants to make inferences" (Weisberg (2005, p.205f)). Weisberg (2005, p.205) sees coverage error as the most important frame error and defines it as the mathematical difference between a statistic calculated for the population studied and the same statistic for the target population. Weisberg (2005, p.205) also gives a good example: a simple case of a coverage problem involves sampling from phone books, because they exclude people with unlisted numbers who are certainly part of the target population. When applying this error to Web surveys, the problem that arises is that there are people who do not have access to the internet (or, for some special surveys, do not have an e-mail address to receive an invitation letter) and are therefore automatically excluded.
Similarly, Couper (2000, p.467) states: "Coverage error is presently the biggest threat to inference from Web surveys, at least to groups beyond those defined by access to or use of the Web." This statement was even more true when it was published in 2000 than it is now, because the effect has decreased as more people have gained access to the Web (for concrete figures see the demographic differences paragraph below). The problem is still present, however, and will not vanish completely in the future. Additionally, even if everyone who is part of the target population had internet access, "the difficulties of constructing a frame to select a probability sample of such persons are daunting" (Couper (2000, p.467)). There are also demographic differences between those who have access to the internet and those who do not (e.g. concerning income, residence and education): "Handling the coverage problem by weighting underrepresented groups would assume that rural low-income people with internet access are like rural low-income people without that access, which would be a risky assumption" (Weisberg (2005, p.213)).
Demographic Differences
The demographic differences between people who have access to the internet and those who do not are well documented. For example, Couper & Coutts (2004, p.221) give some figures for Germany, Switzerland and Austria concerning this problem and list the variables with problematic differences. These are complemented below with current figures for Austria (source: Statistik Austria, http://www.statistik.at/web_de/statistiken/informationsgesellschaft/ikt-einsatz_in_haushalten), as far as available:
- Age: in 2004 in Germany, 85% of people aged 14-24 lived in a household with internet access, compared to 32% of those older than 54. In Switzerland, 78% of those aged 20-29 used the internet regularly, compared to 31% of those older than 49. In Austria, the percentage of those who had used the internet during the last 6 months varied between 81% (aged 16-24) and 10% (aged 65-74). Looking at the figures for 2008, both groups show increasing percentages: 91.8% (aged 16-24) and 25.5% (aged 65-74).
- Gender: in Germany in the first quarter of 2004, 63% of men and 53% of women used the internet. In Switzerland as well as in Austria the share of male internet users is even higher (63% compared to 46% in Switzerland, 60% compared to 48% in Austria; for Austria these are figures from 2003, where people were asked whether they had used the internet during the last 12 months). In 2008 there was still a gap in Austria: 77.2% male users compared to 65.3% female users (the question was whether they had used the internet within the last 3 months). The gap is even more extreme for those aged 55-74 years (50.3% versus 29.2%), but almost no gap exists for those younger than 35 years.
- Education: in Switzerland in 2004, 81% of those with a university degree used the internet several times a week, compared to 54% of those who had finished an apprenticeship or professional school. In Austria, 82% of those with a university degree had used the internet during the last year, compared to 24% of those who had completed compulsory schooling and 21% of those without compulsory schooling. In Austria in 2008, 95% of higher educated people had used the internet in the last three months, compared to 46.2% of those with secondary education.
- Income: in Germany, 87% of households with an income of € 2600 or more had access to the internet in 2004; the share of households with an income below € 1300 is only 34%. In Switzerland, 81% of people with an income higher than SFR 10.000 or
more use the internet several times a week. For people with an income equal to or lower than SFR 4000, the share is 25%. Unfortunately, no figures concerning income are available for Austria.
Additional figures concerning Germany can be taken from Ehling (2003), e.g. on the purposes for which the internet is used and why some households do not have internet access, as well as from Scheffler (2003), where the percentage of internet users in all European countries for 2003 is given, and from Wolfgang Bandilla (2003), where e.g. European countries are compared. Current and detailed figures for Germany can be taken from the (N)Onliner Atlas (http://www.initiatived21.de/fileadmin/files/08_NOA/NONLINER2008.pdf).
As a concrete example of the effects of different demographic distributions, Bandilla & Bosnjak (2003) compared a traditional written survey with a Web-based survey using a CATI (Computer Assisted Telephone Interviewing) pre-recruited panel of internet users in a mixed-mode study. When taking a look at the distribution of the respondents, there are disproportionately more males (66.1% compared to 48.2% for the paper questionnaire), and on average they are younger and better educated than respondents from the general population sample, which makes the comparability of the two results questionable. As already mentioned for nonprobability Web surveys, weighting cannot necessarily solve this problem: "Taking the characteristics gender, age and education into account, the adjustment produces weighting factors for several online respondents exceeding the factor 5" (Bandilla & Bosnjak (2003, p.238)). For example, Bandilla & Bosnjak (2003)'s Web survey had 40.4% of respondents under 29, compared to 16.4% in the paper survey. Similar problems with internet sample diversity were reported by Best et al. (2001, p.138f).
Couper, Kapteyn, Schonlau & Winter (2007) reported experiences from an internet survey of persons 50 years old and older, where about 30% answered (via telephone) that they use the internet, and of these 73% expressed willingness to participate in a Web survey. A subset was sent a mailed invitation to participate in a survey, and 78% completed the survey, which is relatively high for Web survey response rates. The authors conclude that noncoverage (in this case a lack of access to the internet) appears to be of greater concern than nonresponse (in this case unwillingness to participate) for representation in internet surveys for this age group.
Demographic differences are not the only problem, since there may be other differences between Web users and those who do not use the internet: "Even if internet users matched the target population on demographic characteristics such as sex, income, and education, there is still likely coverage bias because they may well differ on other characteristics" (Lohr (2008, p.102)). Self-selection is a mode often used by Web surveys, which causes general problems with coverage (this problem was also discussed for the sampling error): "Coverage cannot be determined in samples that consist of volunteers, such as internet surveys in which a Web site invites visitors to click here to participate in our online survey" (Lohr (2008, p.102)).
Subsequently, three different sources of coverage error are described (taken from Weisberg (2005, p.217)), from which related additional information can also be retrieved:
1. Ineligibles
What is commonly meant by coverage error is undercoverage (or underrepresentation), but there is overcoverage as well: "In some circumstances, individuals not in the target population are included in the sampling frame and are not screened out of the sample. These ineligibles can also systematically differ from the members of the target population" (Lohr (2008, p.100)). Another example would be the inclusion of businesses in a household survey. To avoid these situations (which bias the results), a good strategy is to ask screening questions at the beginning of the interview, which is not always an easy task. Another option would be to purchase eligible sampling frames, which is currently very common, e.g. from internet panel institutions; this can have certain drawbacks and can itself be a source of bias (for a discussion, see e.g. Sikkel & Hoogendoorn (2008, p.485)).
2. Clustering
This frame mismatch comes up when groupings of units are listed as one in the sampling frame; e.g. when household surveys are conducted, the unit being sampled is actually the household, even when the researcher really wants to interview only one person in the household (the choice of whom to interview in a household is discussed in Weisberg (2005, p.245)). Another possible situation is when a phone number is shared by multiple households; the chance of these households being selected diminishes because it is divided by the number of households. If the number of households is known, weighting within the selection process can be a strategy against this error.
3. Multiplicity (Duplicates)
Multiplicity describes the situation in which a case appears more than once in a sampling frame, thus giving it a higher chance of selection than other cases. "It often occurs in list samples when the sampling frame is based on several different lists, as when using lists from multiple environmental organizations that have partially overlapping membership, giving those people a higher chance of selection. It would also be a problem in sampling from lists of e-mail addresses, since it is common for people to have multiple e-mail addresses" (Weisberg (2005, p.221f)). When it is not possible to remove these duplicates in advance, one strategy is to weight based on the reciprocal of the case's multiplicity, as sketched below.
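A compact way to express this correction (the notation is introduced here for illustration and is not taken from Weisberg (2005)): if case $i$ appears $m_i$ times in the combined sampling frame, it receives the weight
\[
  w_i = \frac{1}{m_i},
\]
so that, for instance, a person listed on two overlapping membership lists ($m_i = 2$) counts only half as much as a person listed once.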
Reduction of Coverage Error in Web Surveys
Again, Weisberg (2005, p.213) mentions two general strategies which attempt to avoid coverage error:
(1) Use probability-based telephone sampling methods to recruit people with internet access for an internet panel. People who agree to participate are sent e-mail requests to fill out particular surveys, with access controlled through passwords.
(2) "Offer free Web access to respondents who are originally contacted using telephone sampling procedures to obtain a probability sample of the full population" (Weisberg (2005, p.213)); this strategy is applied e.g. with WebTV by http://www.knowledgenetworks.com.
Of course, systematic data organization becomes a major issue when dealing with samples. Dillman (2007, p.198) stresses the importance of a good concept for creating and maintaining sample lists.
9.1.1.4 Nonresponse Error at the Unit Level
A further problem found in Web surveys is unit nonresponse: "Unit nonresponse occurs when some people in the designated sample are not interviewed" (Weisberg (2005, p.159)). In other words, (unit-) nonresponse error is the result of nonresponse from people in the sample who, if they had responded, would have provided different answers to the survey questions than those who did respond to the survey (Dillman & Bowker (2001)).
Weisberg (2005, p.160) distinguishes three types of (unit-) nonresponse, which can also be applied to Web surveys:
(1) Noncontact refers to situations in which designated respondents cannot be located, e.g. when an invitation letter cannot be delivered via e-mail. Lynn (2008, p.41) identifies some reasons for unit nonresponse in Web surveys: for invitation-only surveys, where a preselected sample of persons is sent an invitation to complete the questionnaire (typically by e-mail), noncontact can be considerable. This can be caused by incorrect or out-of-date e-mail addresses, by the recipient's e-mail system judging the e-mail to be spam and therefore not delivering it, or by the recipient judging the e-mail to be spam and not opening it. In the case of recruitment via popups on Web pages, noncontact occurs when popup blockers are activated or JavaScript is turned off.
(2) Incapacity means the designated respondent is incapable of being interviewed, e.g. due to the technological inability to deal with an internet survey (when certain necessary technologies like Java are not installed or activated on the respondent's computer).
(3) Noncooperation occurs when the designated respondent refuses to participate in the study. This form is likely to be non-ignorable if people are not willing to participate in a survey because of its topic, such as when people with conservative social values are less willing to answer a survey on sexual behavior.
Some theoretical approaches which try to explain the important determinants of survey participation are summarized in Weisberg (2005, p.165):
(1) Social exchange theory: this theory assumes that people's actions are motivated by the return they expect to obtain. The exchange concept emphasizes that the survey must have value to the respondents in order to interest them. Social exchange theory is based on establishing trust that the rewards for participation will outweigh the costs.
(2) Altruism: survey participation is viewed as being helpful to a stranger.
(3) Opinion change: another psychological approach, which tries to convince the person that the survey topic is salient, relevant, and of interest to the respondent.
Weisberg (2005, p.172f) summarizes findings about demographic correlates of cooperation as follows:
(1) Age: younger people are more willing to participate if they are successfully contacted, but they are less likely to be at home, which results in an underrepresentation of young people
in some surveys. This statement is true for telephone surveys, but for internet-based surveys respondents can choose themselves when to fill out the questionnaire. Additionally, younger people make up a higher percentage of those with internet access (as was already shown in section 9.1.1.3). Similar observations were made in the study by Bech & Christensen (2009), where a significantly lower response rate was found in the Web-based survey compared to an equivalent postal survey. In this study, individuals were randomly allocated to receive either a postal questionnaire or a letter with a Web link to an online version of the same questionnaire.
(2) Gender: surveys routinely underrepresent men, because men are less likely to be at home and more likely to refuse. This is again true for telephone surveys, but possibly not for internet-based surveys, because more men have internet access than women. However, it also holds for internet studies that males are more likely to refuse (Weisberg (2005, p.174f)). For (3) race and (4) education, only less consistent results can currently be found.
There have been some efforts to decrease nonresponse; e.g. Thomas M. Archer (2007) systematically examined 99 Web-based surveys to find out which characteristics were significantly associated with increasing the response rate. Thirteen Web deployment characteristics and nine Web-based questionnaire survey characteristics were examined, with the following main outcomes:
(1) Increasing the total number of days a questionnaire is left open, with two reminders, may significantly increase response rates. It may be wise to launch in one week, remind in the next week, and then send the final reminder in the third week.
(2) Potential respondents must be convinced of the potential benefit of accessing the questionnaire.
(3) Do not be overly concerned about the length or detail of the questionnaire; getting people to the Web site of the questionnaire is more important for increasing response rates. Additionally, the fact that no instructor or interviewer is available can in some cases cause break-offs, because respondents do not know how to fill out the questionnaire or how to work with the input controls.
Weisberg (2005, p.130) stresses the problem with (unit-) nonresponse data: the bias cannot be estimated, since the nonrespondent mean is not known. The bias problem is most serious when people who would belong to one answer category are most likely not to answer that question, as when people opposing a candidate from a racial minority do not indicate their voting intentions in a preelection survey. The meta-analysis of 59 methodological studies in Groves & Peytcheva (2008) tried to estimate the magnitude of nonresponse bias.
Reduction of Nonresponse Error at the Unit Level
Czaja & Blair (1996, p.182f) give some strategies for the reduction of unit nonresponse (for paper questionnaires) which can also be applied to Web questionnaires: the options available to prevent unit nonresponse include designing an interesting and nonburdensome questionnaire and using effective data collection procedures, such as advance letters and well-crafted introductions or cover letters.
Specific to Web surveys is the extent of unit nonresponse due to technical problems, like browser problems or slow connection times (Weisberg (2005, p.189)). Furthermore, a lack of technical skills (when respondents do not know how to deal with certain input controls necessary for participating in a survey) can be a barrier to participation. Because of this, it is important to keep the survey simple and to use standard technologies. Any additional technological risk factors which could cause trouble within the respondent's browser should be avoided.
In the case of Web surveys, technical problems, together with the technical equipment in general (as also mentioned in Vehovar et al. (2002, p.235)), can have an important influence on this rate, as was documented for the experiments described in chapter 14. It is relatively difficult for Web survey software implementers to be responsive to all possible combinations of different browsers, browser settings, operating systems and possibly also custom networking conditions as they exist in some companies or institutions. This risk increases with the technical complexity of the survey (e.g. when using multimedia elements or technologies not installed by default in every browser, like Flash). Technical problems can also be a relevant factor for item nonresponse.
After applying all strategies for reducing nonresponse error, there will in most cases still be some nonresponse error left. According to Weisberg (2005, p.193), there are basically two statistical solutions to (unit-) nonresponse:
1. Weighting: survey respondents are sometimes weighted differently so that the obtained sample better resembles the population of interest (a small worked example is sketched after this list). Methods for doing so are:
(1) Probability-of-response weights, where observations are weighted inversely to their ease of acquisition, with people who were easiest to interview being weighted less than the hardest. The general assumption is that those who are hard to reach are similar to those who could not be reached.
(2) Weighting class adjustment, which uses variables that were involved in the sample design and calculates weights as the reciprocal of the response rates within each class.
(3) Poststratification adjustment differs from weighting class adjustment in that information about nonrespondents is not available, so the weights are based on population figures instead.
2. Modelling nonresponse: this strategy attempts to model the nonresponse process. These methods are controversial in that they make assumptions about the nonresponse, and they are difficult to employ because they must be estimated separately for each variable of interest rather than simply deriving a single weight [...]. They require having some predictors of participation in the survey, including at least one predictor that affects participation but not the dependent variable of interest.
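As a rough sketch of the two weighting adjustments named above (the notation is introduced here for illustration and is not taken from Weisberg (2005)): weighting class adjustment assigns respondents in class $c$ the inverse of the response rate observed in that class, while poststratification compares the known population share of a stratum with its share among the respondents,
\[
  w^{\mathrm{class}}_{c} = \frac{1}{r_c}, \qquad r_c = \frac{\text{respondents in class } c}{\text{sampled units in class } c},
\]
\[
  w^{\mathrm{post}}_{h} = \frac{N_h / N}{n_h / n},
\]
where $N_h/N$ is the population share of stratum $h$ and $n_h/n$ the corresponding share among the $n$ respondents.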
9.1.2 Response Accuracy Issues
9.1.2.1 Nonresponse Error at the Item Level
In contrast to the unit nonresponse discussed in the previous section, here the respondent has already decided to participate but does not finish filling out the survey, for various reasons. Nonresponse error arises through the fact that not all people are willing or able to complete the survey (Couper (2000)). According to Weisberg (2005, p.131), there are three different types of item nonresponse:
1. Don't know or no-opinion responses: in most cases, "don't know" will simply mean that the person lacks the information needed to answer the question, but this is not always the case: don't-know answers are often a form of satisficing, since they are an easy option for moving on to the next question. It is also possible that respondents did not understand the question. Derouvray & Couper (2002) explore alternative designs for such uncertain responses; for instance, it was found that a reminder prompt decreases item-missing-data rates.
2. Item refusal: refusals are relatively rare, even in questions about political voting and sexual behavior. Interestingly, the question that is most often refused is the income question. It is relatively difficult to find out whether a respondent refused to answer in Web surveys (unless an explicit "refuse to answer" option is given).
3. Not ascertained: this can occur if skipping questions is allowed. In Web surveys, so-called navigational errors can arise, e.g. when the submit button is accidentally pressed again once the new page has already loaded (in paging mode). It can also happen when the structure of the questionnaire (and its branching) is not obvious. These problems can be avoided by clearly separating single questions and by using the paging mode when the questionnaire contains many branches; a sketch of a simple client-side safeguard against accidental double submits follows below.
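The following TypeScript sketch illustrates one way such navigational errors can be mitigated on the client side: the "next" button is disabled as soon as the form is submitted, so a second click cannot skip the freshly loaded page. The element ids are assumptions for this example, not identifiers from the software described in this thesis.

// Disable the submit button after the first click to prevent accidental
// double submits in paging mode.
const pageForm = document.querySelector<HTMLFormElement>("#survey-page")!;
const nextButton = document.querySelector<HTMLButtonElement>("#next")!;

pageForm.addEventListener("submit", () => {
  nextButton.disabled = true;             // ignore further clicks on this page
  nextButton.textContent = "Please wait...";
});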
Biemer & Lyberg (2003, p.112) list concrete possible reasons for not answering all items:
- The respondent deliberately chose not to respond to a question because it was difficult to answer or the question was sensitive.
- The questionnaire was complicated, and if the mode is self-administered, the respondent might overlook the instructions or certain questions, or the respondent might simply exit the response process because it was boring, frustrating, or time consuming.
- The questionnaire contains open-ended questions, which increases the risk of item nonresponse.
- The respondent (or the interviewer) makes a technical error, so that the answer to a specific question has to be deleted.
- The questionnaire is too long.
Reduction of Nonresponse Error at the Item Level
Two strategies for minimizing item nonresponse are given in Czaja & Blair (1996, p.182f): "First, let the respondents know why the question is necessary. Some respondents will want to know, for example, why some of the demographic questions (e.g. age, race or income) are needed. Often a simple response will suffice. [...] Second, remind respondents that all their answers are completely confidential and will never be linked to their name, address or telephone number." For Web surveys in particular it is essential
to track as much paradata as possible in order to be able to decide at an early stage whether such problems have technical origins, and to intervene.
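A minimal sketch of such client-side paradata tracking is given below: every answer change is logged with the elapsed time since the page was loaded, and the log is written into a hidden field when the page is submitted. The element and field names used here are assumptions for illustration, not the actual implementation used for the experiments in this thesis.

// Log each answer change with a timestamp and send the log back with the page.
interface ParadataEvent {
  question: string;   // name attribute of the answered element
  value: string;      // answer at the moment of the change
  elapsedMs: number;  // milliseconds since the page was loaded
}

const events: ParadataEvent[] = [];
const pageLoadedAt = Date.now();

document
  .querySelectorAll<HTMLInputElement | HTMLSelectElement | HTMLTextAreaElement>(
    "input, select, textarea"
  )
  .forEach((element) => {
    element.addEventListener("change", () => {
      events.push({
        question: element.name,
        value: element.value,
        elapsedMs: Date.now() - pageLoadedAt,
      });
    });
  });

// Serialize the log into a hidden field just before the page is sent back.
document.querySelector<HTMLFormElement>("form")?.addEventListener("submit", () => {
  const hidden = document.querySelector<HTMLInputElement>("#paradata");
  if (hidden) hidden.value = JSON.stringify(events);
});

Such a log allows response times per question to be reconstructed and makes it easier to recognize whether missing answers coincide with technical problems on the respondent's side.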
9.1.2.2 Dropout
A special form of item nonresponse is dropout: the respondent stops filling out the questionnaire, which means that from a certain point on there are no more answers available for that unit. Dropout is mainly a problem if it is systematic: "some participants selectively drop out, for example, people who dislike the subject that an internet study is about. If the systematic dropout coincides with an experimental manipulation, then the study in many cases is severely compromised" (Reips (2002b, p.242)). The largest proportion of dropout occurs in the first part of the questionnaire (Ekman et al. (2007)). Because of this, questions at the beginning should impose a low burden. The positive effect of this strategy was experimentally confirmed by Ekman et al. (2007): they gave respondents two versions of the questionnaire, one with easy questions at the beginning and one with hard questions at the beginning. The dropout rate for those with easy questions at the beginning was lower.
The high-hurdle technique stands in contrast to this strategy: the idea behind this technique is to artificially create a higher burden at the beginning of the survey in order to filter out less-motivated respondents. Göritz & Stieger (2008) carried out two experiments in which participants had to wait for the first page of the study to appear on the screen. It was expected that those who continued would be more highly motivated, so that data of higher quality would be produced and dropout lowered. Against all expectations, the dropout rate and the quality of the data remained independent of the loading time; in this case, artificially delaying the loading of the first page was counterproductive. It was also found that questions about personal information should be placed at the beginning of an internet study, because this lowers dropout rates.
In the experiments conducted as part of this thesis, the effects of the different input controls on dropout were also evaluated. As mentioned e.g. by Lynn (2008, p.41), break-offs are typically higher for Web surveys than for other survey modes, but they can be reduced by good design. For concrete results, see chapter 18. In Ganassali (2008, p.28), it was hypothesized that a short questionnaire would, amongst other data quality improvements, produce less dropout and a higher completion rate.
9.1.2.3 Respondents' Burden
When running the experiments described in part III, the burden imposed by a certain input control is an important influence, particularly on item nonresponse, and it will be used in several aspects of the data analysis.
A definition of respondent burden is taken from Biemer & Lyberg (2003, p.107): one important correlate of nonresponse is the burden the respondent perceives in completing the questionnaire and other survey tasks. It is widely accepted that if a survey request is perceived as interesting and easy, the likelihood of obtaining cooperation will increase. Examples of burden are: the length of the interview, the pressure the respondent might feel when confronted with questions,
and also the number of survey requests the respondent receives within a certain time period. For Web surveys, additional points have to be considered as burden, namely the time needed for filling out a certain input control (or in some cases loading times) and the necessity of learning how certain input controls are used. Additionally, Galesic (2006, p.315) states: "The effect of burden might be cumulative whereby the burden experienced at each question is a function of both specific characteristics of that question and burden experienced while answering the preceding questions." According to Funke & Reips (2008b), respondent burden can be measured with the actual and perceived response time, although this is just one factor of many; see e.g. Hedlin et al. (2008, p.301), who note that measurement of response burden tends to focus on response time although response burden as perceived by respondents is not determined by time alone.
Incentives
An increasingly popular approach for dealing with noncooperation is to give respondents incentives to participate. A good overview of the state-of-the-art usage of incentives in Web studies is provided by Göritz (2006b). Weisberg (2005, p.133) even recommends incentives for answering single questions in order to minimize don't-know answers. However, the question is whether data quality is really improved when following this strategy.
Galesic (2006) attempted to shed light on respondents' decisions to drop out by registering their momentary subjective experiences throughout the whole survey. Furthermore, the length of the questionnaire was announced and the type of incentive was manipulated. Additionally, it was analyzed whether dropouts could be attributed to changes of interest and burden experienced while answering survey questions. Characteristics of the respondent were also taken into consideration, and the formal characteristics of the questions (such as position, whether the question is open or closed, and how many questions are on one page) were also relevant for this analysis. Announced length, respondent's age (older respondents were more likely to complete the questionnaire) and block-level interest significantly affected the risk of dropout, while incentives, gender, education and work-related education had no significant effect. The subjective experience of interest had a strong influence on dropout: in this study, respondents with above-median interest had a 40% lower risk of dropout than respondents with below-median interest. Similarly, respondents with above-median experienced burden had a 20% higher risk of dropout than respondents with below-median experienced burden.
9.1.3 Measurement Error
Measurement error, simply stated, is the deviation of the answers of respondents from their true values on the measure (Couper (2000, p.12)). Dillman & Bowker (2001, p.2) mention poor question wording, poor interviewing, survey mode effects and/or some aspects of the respondents' behavior as the main causes for this error. The bias caused by this error is relatively hard to measure because the true value is normally not known. Again, the risk of higher measurement error is greater in self-administered surveys and even greater in Web surveys because of additional influences like design and wrong use of input controls. Because of this it is even more important for Web surveys to keep the survey instruments as simple as possible (even if this comes at the expense of design).
The experiments accomplished in this thesis strongly focus on reducing this kind of error, whereby the focus was placed on scale questions. Therefore, one source for this kind of error can be the survey instrument itself. Experimental design and results are described in part III.
9.1.4 Statistical Impact of Error
Statisticians distinguish between two general types of errors:
1. Systematic Error: Systematic error is usually associated with a patterned error in the measurement. This patterned error is also known as bias. An example of systematic bias would be measuring the average age from a survey if older people were less willing to be interviewed (Weisberg (2005, p.19)). As the overview above shows, there are a lot of possible sources of bias, which directly influence the statistical results and measures (e.g. the mean).
2. Random Error: Random error is error that is not systematic. For example, if people do not think hard enough about a question to answer it correctly, some may err on one side and others may err in the opposite direction, but without a systematic tendency in either direction (Weisberg (2005, p.19)). In contrast to systematic error, where the mean value is influenced by the error, random error has a mean of zero and thus does not affect the mean of a variable (but it does increase the variance of a variable).
Although a general overview was given, the discussion of total survey error is not the main topic here [25].
[25] For detailed information on Web survey error, consider Manfreda (2001); for a general discussion of the total survey error, see Weisberg (2005), Biemer & Lyberg (2003), Vehovar et al. (2002) and Dillman & Bowker (2001).
9.2 Reliability
When talking about measurement instruments, reliability is often mentioned as a quality criterion for this instrument. The reliability of a measure has to do with the amount of random error it contains. The reliability of a measure can be thought of as the extent to which repeated applications would obtain the same result (Weisberg (2005, p.19)). This definition focuses more on the repeatability of measures; reliability is then based on correlations between scale scores.
Internal consistency An important term when talking about reliability is internal consistency, which is a commonly used psychometric measure for assessing survey instruments and scales. It is applied not to single items but to groups of items that are thought to measure different aspects of the same concept. Internal consistency is an indicator of how well the different items measure the same issue (Litwin (1995, p.21)). Or in other words: a scale is internally consistent to the extent that its items are highly intercorrelated. High interitem correlations suggest that the items are all measuring the same thing (DeVellis (1991, p.25)). This means that there has to be a strong link between the scale items and the latent variable [26]. Internal consistency is typically equated with Cronbach's coefficient alpha, which is widely used as a measure of reliability [27].
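For orientation, the standard textbook form of coefficient alpha (stated here from general psychometric knowledge, not quoted from DeVellis) for a scale of $k$ items with item variances $\sigma_i^2$ and total-score variance $\sigma_X^2$ is:

\[ \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_X^2}\right) \]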
[26] Which is the underlying phenomenon or construct that a scale is intended to reflect (DeVellis (1991, p.12)).
[27] See DeVellis (1991, p.26) for an explanation and the formula.
Split-half One technique used to measure reliability is split-half [28]. The dataset is split into two parts which are compared with each other. When using this method, it has to be assured that both datasets have the same properties; otherwise it is not advisable to simply split the dataset in the middle. Alternative approaches would be odd-even reliability (which compares the subset of odd-numbered items to the even-numbered items), balanced halves (important item characteristics are identified and used as splitting criteria), and random halves (items are randomly allocated to one of the two subsets) (DeVellis (1991, p.34f)).
[28] More information can be found e.g. in Mayntz et al. (1978, p.65).
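As a brief addition (standard psychometric practice rather than a claim taken from the sources above), the correlation $r_{hh}$ between the two half-scales is usually stepped up to an estimate of the reliability of the full-length scale with the Spearman-Brown formula:

\[ r_{\text{full}} = \frac{2\, r_{hh}}{1 + r_{hh}} \]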
Test-retest Ideally, test-retest correlations are calculated in the case that two identical repeated measurements (one person has filled out the questionnaire twice) exist (Mummendey (2003, p.76)) and the same conditions prevailed, which is seldom the case. It is measured by having the same set of respondents complete a survey at two different points in time to see how stable the responses are. It is a measure of how reproducible sets of responses are. Correlation coefficients [...] are then calculated to compare the two sets of responses [...]. In general, r values are considered good if they equal or exceed 0.70 (Litwin (1995, p.8)). Again, when applying this approach, it must be assured that between the two (or more) measurements, no environmental conditions have changed (temporal stability).
9.3 Validity
The validity of a measure is whether it measures the construct of interest (Weisberg (2005, p.19)), or similarly Svensson (2000, p.420): validity refers to the operational definitions of the variable, and can be defined as the extent to which an instrument measures what it is intended to measure. Validity is connected with systematic bias: variables that are measured with systematic bias are not valid. A description of the validation process is given in Spector (1992, p.46) and focuses mainly on the validity of scales: validation of a scale is like the testing of a theory, in that its appropriateness cannot be proven. Instead, evidence is collected to either support or refute validity. When a sufficient amount of data supporting validity is amassed, the scale is (tentatively) declared to be construct valid. Users will accept the theoretical interpretation of what it represents.
According to DeVellis (1991, p.43), there are three types of validity (Litwin (1995, p.35) adds a fourth type, face validity, which is based on a cursory review by untrained judges to see whether they think the items look ok to them):
1. Content validity is the extent to which a specific set of items reflects a content domain. Content validity is the easiest to evaluate when the domain is well defined. The issue is more subtle when measuring attributes, such as beliefs, attitudes, or dispositions, because it is difficult to determine exactly what the range of potential items is and when a sample item is representative. The assessment of content validity (compared to face validity) should be carried out by reviewers who have some knowledge of the subject matter. The assessment of content validity typically involves an organized review of the survey's contents to ensure that it includes everything it should and does not include anything it shouldn't (Litwin (1995, p.35)).
2. Criterion-related validity means that an item or scale is required only to have an empirical association with some criterion. In other words, it is the degree of effectiveness with which the scale predicts practical issues. Spector (1992, p.47f) describes how to achieve criterion-related validity (mainly for scales): a criterion-related validity study begins with the generation of hypotheses about relations between the construct of interest and other constructs. Often a scale is developed for the purpose of testing an existing, well-developed theory. In this case the scale can be validated against the hypotheses generated by the theory. If no theory and therefore no hypotheses exist, some theoretical work must be done in advance to generate hypotheses about the construct. Litwin (1995, p.37) distinguishes between two main components: (1) concurrent validity, which requires that the survey instrument in question is judged against some other method that is acknowledged as a standard for assessing the same variable, and (2) predictive validity, which is the ability of a survey instrument to forecast future events, behavior, attitudes or outcomes. Predictive validity can be calculated as a correlation coefficient between the initial test and the secondary outcome.
3. Construct validity is directly concerned with the theoretical relationship of a variable (e.g. a score on some scale) to other variables. It is the extent to which a measure behaves, with regard to established measures of other constructs, the way the construct it measures should behave. This means that when we have a (theoretical) variable which is related to other constructs, then a scale which purports to measure that construct should bear a similar relationship to measures of those constructs. It is a measure of how meaningful the scale or survey instrument is when in practical use (Litwin (1995, p.43)).
The most frequently used measurement parameters for validity are correlation coefficients as well as factor analysis, but the concrete application depends on the type of validity that is being measured. Related to this topic is the sensitivity of a measurement instrument. Sensitivity (or responsiveness) is the ability of a scale to discriminate among various systems, user populations or tasks. In order to be useful, an instrument needs to be sensitive, that is, it needs to have the power to detect differences that are expected to exist (van Schaik & Ling (2007, p.4)).
9.4 Mode Effects
There are many different modes of conducting surveys: self-administered with paper-and-pencil questionnaires, face-to-face, telephone, mail and internet. This section focuses on the comparison of self-administered (in contrast to interviewer-administered) questionnaires and will mainly compare paper-and-pencil with internet-based versions. The effects or differences between these two modes are subsequently discussed. It is difficult to study whether the results of a survey would vary had a different mode been used, because there are multiple sources of differences between modes. The biggest problems with online questionnaires are, compared to other (postal) ways of delivering, the low response rates and poor data quality (Healey et al. (2005)), which lead to measurement and non-response errors. In addition, different computer skills (which may also be dependent on age and education) may cause a certain bias.
A further consideration in comparison to paper questionnaires is that computer logic is added
in addition to questionnaire logic. All information entered has to be committed [30], in most cases via pressing a (submit) button. Therefore it becomes necessary to give some advice when running Web surveys, but this important information is often forgotten when creating Web surveys. Meshing the demands of questionnaire logic and computer logic creates a need for instructions and assistance, which can easily be overlooked by the designer who takes for granted the respondent's facility with computer and Web software (Dillman (2007, p. 358f)).
[30] Technically this is not compulsory when using AJAX, but respondents expect this behavior of a questionnaire, so it is recommended.
There is one fundamental distinction between designing for paper and for the internet: the paper questionnaire designer produces a questionnaire that gives the same visual appearance to the designer as to the respondent. However, in the case of both e-mail and Web surveys, the intentions of the designer for the creation, sending and receiving of the questionnaire are mediated through hardware, software and user preferences. The final design, as seen by the creator, is sent to the respondent's computer, which displays the questionnaire. It is not only possible but likely that the questionnaire seen by the respondent will not be exactly the same as that intended by the creator, for several reasons. The respondent may have a computer with a different operating system, a different kind of Web browser, or even a different release of browser software. Respondents may also choose a different setting for the screen configuration, and may choose to view the questionnaire as a full or partial (tiled) screen. Of course it should be attempted to make these questionnaires appear as similar as possible across technologies (Dillman (2007, p.361)). Additionally, browsers display some HTML controls (e.g. radio buttons) in their own special way, and both similar and different default fonts are used. Questionnaires are thus displayed in a manner that is well known to the respondent, which is in some cases more desirable than an identical presentation for all respondents. It is also possible to use custom CSS settings, which would override those delivered with the questionnaire. This can modify the visual design dramatically. Of course some general disparities must be avoided: similar colors for text and background; differences in the relative distances between scale categories and alternatives for closed-ended questions; text alignment must be equal across all browsers; all content must be initially visible (e.g. all scale categories or alternatives for closed-ended questions [31]). But generally speaking, software developers and Web designers these days usually know all the common pitfalls and difficulties when creating cross-browser solutions, so the troubles described above can be avoided quite easily. Nevertheless, pretesting the questionnaire (ideally with pretesters using very heterogeneous technology) becomes even more important.
[31] For methodological consequences, see the results of systematic experiments in chapter 5.
With hardware, different input devices are also meant, e.g. scroll mice [32], laptop keyboards compared to usual keyboards, or, in the case of visually handicapped people, braille reader devices, which have special technical preconditions for pages to be properly readable (nested tables, for example, are problematic). Furthermore, the use of advanced technologies in addition to HTML (like Javascript, Java, Flash, ...) could cause difficulties, because some parts of the questionnaire might not be displayed correctly or not displayed at all.
[32] In Healey (2007), effects of using scroll mice when the questionnaire contains dropdowns are affirmed.
Converting a paper questionnaire into a Web-based questionnaire while retaining data comparability between internet and paper-based responses is a challenging task for survey designers. Even minor changes in the layout, color, fonts and other formatting of questions between these two modes can convey different expectations about the kind of answer required. Because both modes are self-administered, the respondent takes all cues into account, not only the textual ones. Thus different interpretations of questions are possible.
Additionally, usability issues must be considered when designing internet forms, as the task of responding to an internet questionnaire differs in significant ways from the task of responding on paper (Potaka (2008, p.1)). Filling out should be made as easy and clear as possible, with low burden for the respondent.
9.4.0.1 Mixed Mode
There are studies which make it possible to fill out the questionnaire in multiple modes (e.g. a paper questionnaire and also an online version). When choosing such a survey design, all mode effects discussed above must be taken into consideration. The advantage of such a survey design is that comparisons to determine the extent of such mode effects become possible. Weisberg (2005, p.293) found another advantage concerning coverage: they are particularly useful in overcoming noncoverage problems, since the coverage involved in one mode is likely to be different from the noncoverage in the other mode. A drawback is that they of course raise the costs and administrative effort.
Shih & Fan (2007) report the results of a meta-analysis comparing response rates and mode preferences in Web/mail mixed-mode surveys, with the overall result that mail surveys are preferred over Web surveys, but with variation of mode preference across the studies. Because mode differences can possibly change the context of the survey dramatically, the article by Smyth et al. (2008) on context effects may be helpful. Additionally, Heerwegh & Loosveldt (2008) investigated the differences in data quality between a face-to-face and a Web survey, where the hypothesis that Web surveys produce lower data quality was supported. An interesting study dealing with the effects of mode and question sensitivity is provided by Kreuter et al. (2008), who compare CATI, IVR and Web surveys with the result that there are differences between modes. Additional discussion about mixed-mode data collection can be found in de Leeuw (2005), de Leeuw (2008a), de Leeuw & Hox (2008), Duffy et al. (2005), Denscombe (2006), Fricker et al. (2005), McDonald & Adam (2003), Meckel et al. (2005), Voogt & Saris (2005), Dillman et al. (2008), Dillman & Christian (2005a) and Dillman & Christian (2005b).
10 Psychological Theories
In this chapter, psychological processes relevant for filling out questions are documented. Differences in response behavior may have psychological origins, possibly even more so for Web-based questionnaires than for other types of questionnaires, as different and additional cognitive processes are running. Dillman et al. (1998, p.6) identify differences in the logic of using a computer and in filling out a questionnaire as one reason for different response behavior between paper-and-pencil and online surveys. Hand and eye focus positions are set on different places when using a computer.
10.1 The Response Process
Participants respond to interview administered surveys in four basic steps (Tourangeau et al.
(2000, p.8)):
1. Comprehension: comprehension encompasses such processes as attending to the question and accompanying instructions, assigning a meaning to the surface form of the question, and inferring the question's point - that is, identifying the information sought. It is necessary to remember that respondents of online surveys do more than simply read the texts. Although question wording plays a major role, visual design elements are also important (see part I for concrete findings concerning such influences). Biemer & Lyberg (2003, p.129) mention context effects which could influence the comprehension of the questions: a context effect occurs when the interpretation of a question is influenced by other information that appears on the questionnaire.
2. (Information) Retrieval: the retrieval component involves recalling relevant information from long-term memory.
3. Judgment: Couper, Tourangeau, Konrad & Crawford (2004, p.114) distinguish between two modes of selection used by respondents for closed-ended questions:
a) Serial processing model: if respondents have a pre-existing answer to the question and search serially through the list of response options until they find that answer, or if respondents do not have a pre-existing answer but make a dichotomous judgement about the acceptability of each option, stopping as soon as they encounter an acceptable answer. A linear relation between response times and the position of the answer selected is expected when following this model.
b) Deadline model: according to this model, respondents allocate a certain amount of time to answering a question and select the best answer they have considered up to the time when the deadline expires.
Satisficing and primacy effects play an important role in this process step. [...] respondents begin checking answers and go down the list until they feel they have provided a satisfactory answer. Although certain respondents will read and consider each answer, others will not. As respondents proceed down the list they may feel that enough has been done. Thus, the items listed first are more likely to be checked (Dillman (2007, p.63f)).
4. Response: Biemer & Lyberg (2003, p.125) additionally see a step before these, namely Encoding and Record Formation, whereby knowledge is obtained, processed, and is either stored in memory, or a physical record is made. For example, for a respondent to accurately answer a question regarding the behavior of a household member in a survey, the behavior must first be observed and committed to memory so that it can be recalled during the interview.
There are additional effects besides verbal language effects, namely numeric language (numbers in the queries and answer categories), graphical language (size, spacing, and location of information on the page), and symbolic language (e.g. arrows and answer boxes) (Christian & Dillman (2004)).
10.2 Visual Interpretive Heuristics
Tourangeau et al. (2004, p.370f) as well as Tourangeau et al. (2007, p.94f) distinguish between five special visual interpretive heuristics that respondents follow when evaluating the visual layout of survey questions. Each heuristic assigns a meaning to a spatial or visual cue. The five heuristics are as follows (additional information can also be found e.g. in Schwarz et al. (2008)):
1. Middle means typical: this means that respondents will see the middle item in an array
(or the middle option in a set of response options) as the most typical, which can cause
the respondents to see the middle point as the anchor point (e.g. as the mean value for
the population) for their own judgements.
2. Left and top means first: this interpretive principle reflects the reading order of English (and most western languages) and is therefore a cultural phenomenon. Accordingly, the topmost or leftmost position in a closed-ended list will be seen as one extreme of the two endpoints (the same principle is valid for rightmost and bottommost).
3. Near means related: this heuristic means that respondents expect items that are physically near each other on the screen to be related conceptually (e.g. items on the same screen
versus items on separate screens). This heuristic is related to the gestalt principle of
proximity, which is described in a section below.
4. Up means good: this heuristic is related to the second one and means that, with a
vertically oriented list, the top item or option will be seen as the most desirable.
5. Like means close: this means that items that are visually similar will be seen as closer
conceptually. This heuristic is anticipated by the gestalt law of similarity described below.
10.3 Gestalt Principles
A few principles from gestalt psychology can also be applied to the visual layout of (Web-)
questionnaires, like the principle of similarity (objects of the same size, brightness, color or shape are more easily seen together), the principle of proximity (objects close to each other are grouped together), and the principle of prägnanz.
10.3.1 Principle of Proximity
The Gestalt grouping principles suggest that placing instructions for respondents within the
foveal view as well as visually grouping them with the corresponding answer space using prox-
imity should increase the number of respondents who comply with the instruction (Christian
et al. (2007, p. 121)). This phenomenon is known as the principle of proximity and is described
in other words in Smyth et al. (2006b, p.8): we tend to group things together based on the dis-
tance of their spatial separation. In other words, we will see items that are close to one another
as a group and items that are distant as separate. One way to achieve this in Web surveys is
to use space. For example, using greater space between questions than between the stem of a
query and the accompanying response options creates grouping based on proximity and clarifies
the boundaries between questions (Smyth et al. (2006b, p.9)).
10.3.2 Principle of Similarity
The same principle is valid for visual similarity (e.g. when similar or equal colors are used):
when two options are similar in appearance, respondents will see them as conceptually closer
than when they are dissimilar in appearance (Tourangeau et al. (2007, p.91)). This concept is
also part of the underlying theory used by Tourangeau et al. (2000). Similarly in Smyth et al. (2006b,
p.8): respondents are more likely to mentally group images that appear alike. Similarity can
be established through several means such as font, shape and color.
10.3.3 Principle of Prägnanz
The principle of prägnanz states that figures with simplicity, regularity, and symmetry are easier
to perceive and remember (Dillman (2007), Smyth et al. (2006b, p.8) and Smyth et al. (2004,
p.3)).
10.4 Types of Respondents in Web Surveys
To fix the terminology concerning the status of response, a short description of all types is subsequently given. Additionally, a graphical representation of observable response patterns, which leads to a differentiation between seven processing types [3], is given in figure 10.1 [4]:
[3] For a detailed description of these 7 types and their individual motivation, take a look at Bosnjak & Tuten (2001, p.6) or Bandilla & Bosnjak (2000, p.21).
[4] Taken from Bosnjak & Tuten (2001, p.5).
Figure 10.1: User patterns in Web surveys
1. Complete responders are those respondents who view all questions and answer all
questions.
2. Unit nonresponders are those individuals who do not participate in the survey. There
are two possible variations to the unit nonresponder. Such an individual could be tech-
nically hindered from participation, or he or she may purposefully withdraw after the
welcome screen is displayed.
3. Answering dropouts consist of individuals who provide answers to those questions dis-
played, but quit prior to completing the survey.
4. Lurkers view all of the questions in the survey but do not enter any answers to the
questions. Compared to nonresponders, lurkers are potentially easier to persuade to complete an already started Web questionnaire (Ekman et al. (2007, p.6)).
5. Lurking dropouts: represent a combination of points 3 and 4. Such a participant views
some of the questions without answering, but also quits the survey prior to reaching the
end.
6. Item nonresponders view the entire questionnaire, but only answer some of the ques-
tions.
7. Item nonresponding dropouts: represent a mixture of points 3 and 6.
Part III
Experiments Conducted by the Author
11 Description of the Experiments
Experiments in three different surveys were conducted which offered respondents six different input controls for answering questions dealing with rating scales. All participants were randomized to the different scale controls and the assigned control remained the same for the whole questionnaire. Although there would have been a lot of possibilities for varying each of the controls (e.g. labelling scale points or modifying the width between scale points, ...), only one version per control existed. All questions under control were organized as question batteries, ensuring that scale controls were used for each sub question. All experimental questions had two anchors (two extremes), one on each side, and the respondent had to position a concrete vote between these two extremes. To get ideal support from the survey tool, custom-made software was developed by the author, which satisfies all these needs with less effort [1].
[1] For detailed information about usage and design of the software, see part IV.
11.1 General Experimental Design
The general composition of the experiments is as follows: if an interviewee opens the Web page containing the questionnaire, the questions are presented in a certain visual design which was randomly selected. The probabilities for each control type were fixed before starting the experiment. This design is called a split-ballot experiment [2], which is generally used for finding out instrument effects. After assigning a control to each of the respondents, the groups can be compared and analyzed with statistical methods. Respondents were not informed about the experiments conducted behind the ostensible questionnaire to avoid a Hawthorne effect [3].
[2] For a more detailed description and discussion of this method see Noelle-Neumann & Petersen (2000, p.192).
[3] For an explanation see Diekmann (1999, p.299).
Paging Although the software can display multiple questions on one page, the paging (one question per page) mode was used because of the increase in observability for the experiments. Thus it became easier to determine the question where dropout occurred as well as the time needed to fill out one question. A client-side time tracking instrument was used to measure the time from when the page was initially loaded until the data was submitted [4], in order to avoid bias caused by data transfer and by the time needed for loading and processing the answers on the server, and to remove any possible effect of differential download speeds [5]. Reips (2002c, p.248) has similar arguments, especially in regard to dropout reports: if all or most of the items in a study are placed on a single Web page, then meaningful dropout rates cannot be reported. For further information on the paging versus scrolling (all questions on one screen) design and the effects of this decision, see the experiment conducted by Peytchev et al. (2006). The drawback of this design was higher completion times, which in turn lead to higher dropout rates. For more information on results of already accomplished experiments dealing with paging versus scrolling, see section 5.1.
[4] Which means in this case pressing the next button to go to the next question.
[5] Which was also mentioned by Heerwegh (2003), extended from Heerwegh (2002).
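To illustrate the client-side timing described above (a minimal sketch only; the element ids answerTime and questionForm are hypothetical and not taken from QSYS), the loading timestamp is stored when the page is rendered and the elapsed time is written into a hidden form field just before the next button submits the page:

// Remember when the question page was rendered in the respondent's browser (sketch).
var pageLoadedAt = new Date().getTime();

function submitPage() {
  // Elapsed client-side time in milliseconds, unaffected by network and server delays.
  var elapsedMs = new Date().getTime() - pageLoadedAt;
  // Hypothetical hidden input that is delivered to the server together with the answers.
  document.getElementById("answerTime").value = elapsedMs;
  document.getElementById("questionForm").submit();
}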
Question Batteries All questions under experiment were organized as batteries, which means that one question consisted of multiple sub questions. This made it easier for the respondents to answer quickly and made it obvious to the respondent that these questions logically belonged together. Putting each subquestion on a separate page would have resulted in a higher burden due to loading times and the time needed for orientation when each page was loaded. There are certain drawbacks of this design, such as context effects, difficulties when attempting to track the exact time needed for a sub question, and, in the case of dropout, difficulties in determining the last successfully filled out subquestion [6]. An example of a context effect is that respondents tend to take the first answer of a question battery as some kind of reference, which influences the answering of the remaining questions.
[6] Similar arguments can be found in Reips (2002c) and Reips (2002b).
Mandatory Answering Answering all questions was mandatory, which meant that if the next button was pressed before sufficiently answering the question on the page, feedback was given directly in the form of a soft prompt explaining the lack of response within the client's browser, and the data was not delivered to the server. More thoughts on the possibilities of real-time validation of required responses can be found in Peytchev & Crawford (2005, p.239). Validation was performed on the client side via Javascript to display the feedback to the respondent immediately. When Javascript was disabled, the check was performed on the server and the previous question was displayed again if an answer had not been submitted successfully. This approach was met with criticism [7]. It may lead to higher dropout, particularly when non-standard input controls are used. However, in all 3 surveys it was desired by the sponsors. No default selection or slider positioning was provided, as recommended by Reips (2002b, p.246). Additionally, no midpoints were marked.
[7] E.g. Reips (2002c, p.248) argues that it is potentially biasing to use scripts that do not allow participants to leave any items unanswered.
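A minimal sketch of such a client-side completeness check (hypothetical element names; it assumes that every sub question mirrors its current answer into an input whose name starts with "answer_"): if any field is still empty, a soft prompt is shown and nothing is sent to the server. When Javascript is disabled, the same check would have to run on the server, as described above.

// Check that every sub question of the battery has been answered before submitting (sketch).
function checkBatteryComplete(form) {
  var inputs = form.getElementsByTagName("input");
  for (var i = 0; i < inputs.length; i++) {
    if (inputs[i].name.indexOf("answer_") === 0 && inputs[i].value === "") {
      // Soft prompt displayed in the client's browser; submission is blocked.
      document.getElementById("prompt").innerHTML =
        "Please answer all sub questions before continuing.";
      return false;
    }
  }
  return true; // all sub questions answered, data may be delivered to the server
}

Such a function would typically be attached to the form's onsubmit handler, so that returning false suppresses the submission.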
Instructions To ease the filling out process, a simple instruction text on the use of the controls currently assigned to the respondent was displayed on top of each question battery. The position did not vary, as this was not the focus of these experiments [8].
[8] Christian & Dillman (2004, p. 61) did however research this in their experiments.
Technical Preconditions For the assignment and proper use of the different input controls, it was important to determine which technical preconditions had to be fulfilled by the respondent's browser. This should be illustrated with the following example: if Javascript was disabled, which was the case for approximately 2% of the respondents [9], some of the controls could not work properly. Thus, it was detected beforehand whether Javascript was enabled within the client's browser, and if not, a control which works without this technology was assigned at random. This approach should minimize technical side effects.
[9] See chapter 14.
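The assignment logic can be sketched roughly as follows (expressed in Javascript for consistency with the other sketches; the probability values and object names are purely illustrative and not the actual QSYS configuration): a control is drawn according to the fixed probabilities and redrawn until its technical preconditions are met by the respondent's browser.

// Illustrative control definitions with fixed assignment probabilities and preconditions.
var CONTROLS = [
  { name: "radio",      p: 0.17, needsJavascript: true,  needsJava: false },
  { name: "button",     p: 0.17, needsJavascript: true,  needsJava: false },
  { name: "click-VAS",  p: 0.17, needsJavascript: true,  needsJava: false },
  { name: "slider-VAS", p: 0.17, needsJavascript: true,  needsJava: true  },
  { name: "text",       p: 0.16, needsJavascript: false, needsJava: false },
  { name: "dropdown",   p: 0.16, needsJavascript: false, needsJava: false }
];

// Draw one control according to the configured probabilities.
function drawControl() {
  var r = Math.random(), cumulative = 0;
  for (var i = 0; i < CONTROLS.length; i++) {
    cumulative += CONTROLS[i].p;
    if (r < cumulative) { return CONTROLS[i]; }
  }
  return CONTROLS[CONTROLS.length - 1];
}

// Redraw until the drawn control works with the detected browser capabilities.
function assignControl(hasJavascript, hasJava) {
  var control = drawControl();
  while ((control.needsJavascript && !hasJavascript) || (control.needsJava && !hasJava)) {
    control = drawControl();
  }
  return control;
}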
Schwarz & Reips (2001, p.78) discuss possible problems when Javascript is disabled: Missing or
turned off Javascript compatibility can have disastrous consequences for online research projects. This is particularly abundant whenever Javascript functions interact with other factors that have an influence on motivation for participation or dropout. Participation may be low, dropout rates may be high, and participants' behavior may be systematically biased.
VAS As special controls, Visual Analogue Scales (VAS) were implemented and compared to the other controls. For definitions and properties of VAS, see chapter 2; for screenshots and descriptions of concrete behavior, see the corresponding section below, which describes all the controls. VAS are relatively new in online survey research; nevertheless, a few experiments have already been accomplished, with somewhat contradictory findings (see chapter 4 for more information on this).
Feedback To retrieve direct feedback from the respondents, a few questions were placed at the end of the questionnaire, such as (translated from German): (1) Is the scale fine-grained enough to express one's own opinion exactly?; (2) Are the controls too complicated in general?; (3) Does it take too long to understand the use of the instruments? These individual respondent impressions were very important, because dropout could partially be explained through the feedback on the different controls. Unfortunately, those who quit the survey did not reach these questions, and their feedback would have been even more interesting.
Considerations regarding Design When thinking about all the different possible settings on a respondent's local PC, it becomes obvious that only providing different designs is not enough; it is also important that the designs remain comparable. It is not manageable to make the questions look the same for all configurations and browser settings due to e.g. the different symbols and looks used for input controls within the different browsers. There are a few settings which are not possible to control and which would significantly change the appearance of the questionnaire. As an example, the Web page designer cannot influence the consequences of using custom style sheets (CSS), which can be set for browsers like Firefox. Regarding the cell size of and spacing between the scale points, Dillman & Bowker (2001) report possible problems with distances between points which change as a result of different screen resolutions or switching to full screen mode. Questions in the tourism and snowboard questionnaires had a yellow background with black text color; the webpage survey had a white background with black text color, which was desired by the sponsors of the surveys.
Standards The 16 standards for Web experimenting as defined by Reips (2002c), as well as the points mentioned in Reips (2002c), Reips (2002b), Andrews et al. (2003), Crawford et al. (2005), Kaczmirek & Schulze (2005), Kaczmirek (2005), Lumsden & Morgan (2005) and in a very early document by Dillman et al. (1998), were taken into consideration and were implemented, if applicable to the concrete experiments.
Navigation No button enabling respondents to quit the survey was provided, for the simple reason that respondents were always able to quit the survey by simply closing their browser (or the browser's tab). Responses were stored for each screen, so that the use of such a button was not necessary. After answering each page, respondents had to explicitly take an action to proceed (namely pressing the next button) to reach the next page. The next button was displayed in the bottom left corner of the screen, as recommended by Crawford et al. (2005, p.56). It was not possible to return to the previous question, since the correction of previously given answers would not have been favourable for the experiments.
Technical Infrastructure For all surveys, the online survey software QSYS (in a very early version) was used, which was developed as part of this thesis [10]. For the database, Oracle version 10g was used, hosted by the Central Information Technology Service of Innsbruck University, which greatly supported the whole dissertation project. To avoid data loss, a complete backup was created every day. The application itself runs on a Web Application Server belonging to the IT-Center of Innsbruck University, which provided system stability. To encrypt information transported via the internet and to provide confidence, https was used as the protocol.
[10] See part IV for a detailed description of the software and its capabilities.
11.2 Different Input Controls
To get an impression of the different looks of the different controls, see the screenshots below together with a short description of their use. Only one example is given for each question type, but all controls look the same in all other questions. For each control, a short name is given in brackets, which will be used for simplicity within the evaluation chapter. For all controls, 10 scale points were used, except for slider-VAS (200 scale points) and click-VAS (20 scale points). When the selectable items stand isolated with spaces between them (as is the case for the radio and button controls), focus was placed on using equal spaces between these items to avoid negative side effects as e.g. reported in Tourangeau et al. (2004, p.379f).
11.2.1 Radio Button Scale (radio)
The most commonly used control elements for such question types are radio buttons. Radio buttons appear as a row of small circles; each circle corresponds with a response option. Recent studies have already pointed out the advantages of these types of control elements [11]. The advantages of this type are easy usability (just one single click is necessary to rate) and familiarity. Radio buttons are frequently used on Web pages in general and they resemble the equivalent paper-based input fields (in most cases). They are also recommended by standards for Web surveys, e.g. in Crawford et al. (2005, p.55): for single response questions, radio buttons should be used for respondent input. Funke & Reips (2008a) even categorize this type as Radio Button Scales (RBS).
[11] For detailed information, see chapter 4.
Figure 11.1: Screenshot of a sample of a radio question
A possible (unproved) drawback could be the small size of the button and the necessity to move the mouse exactly onto the small circle, which could lead to higher burden (with the possible consequence of higher dropout rates). It was attempted to find cross-browser solutions for all controls, so that they would all look the same in all browsers as far as possible. In the case of radio buttons this is impossible, because each browser uses different visualizations of these control elements, which can also lead to side effects [12].
[12] E.g. Welker et al. (2005, p.21) mention a shadow effect for radio buttons in some browsers, which can have an influence on response behavior.
11.2.2 Empty Button Scale (button)
To avoid the possible drawback of the size of the radio buttons, bigger buttons were used for this
control, which behave in the same way radio buttons do (with the difference that these buttons
turn red when selected and gray when not selected), so the clickable action area for each response
option is much larger when using this control compared to radio buttons. Technically, these are
normal buttons used within HTML-forms, but with adapted styling using CSS. Javascript was
used to change the color of the buttons when clicking on them. Again, one simple click was
sufficient to answer a (sub-) question with this kind of control.
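A minimal sketch of this selection mechanism (hypothetical ids; the actual QSYS markup is not reproduced here): all buttons of a sub question are reset to gray, the clicked one is colored red, and the chosen value is written into a hidden field for submission.

// Mark one of ten styled buttons as selected and store the chosen value (sketch).
function selectButton(subQuestionId, value) {
  for (var i = 1; i <= 10; i++) {
    var btn = document.getElementById(subQuestionId + "_btn_" + i);
    btn.style.backgroundColor = (i === value) ? "red" : "gray";
  }
  // Hidden input that carries the answer when the next button is pressed.
  document.getElementById(subQuestionId + "_value").value = value;
}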
Figure 11.2: Screenshot of a sample of a button question
11.2.3 Click-VAS (click-VAS)
When examining the radio and button controls, there are still spaces between the buttons. When there are no spaces on the scale, as is the case with this control type, it would possibly give the respondent the feeling of rating on a continuous scale. Technically, Javascript was again used to simply exchange pictures when clicking on them. One picture was used for each scale item. Again, red was used as the color to accentuate the selected scale item. This control possibly best fits the definition of a VAS given by Funke & Reips (2007a). However, it is important to mention that one scale point had a width of more than one pixel, which was a requirement implied by Funke & Reips (2007a). No tick marks or labels were present on the scale to ensure a real continuum [13].
[13] For a definition (a list of properties) of VAS as they are utilized in these experiments, take a look at section 2.
Figure 11.3: Screenshot of a sample of a click-VAS question
This approach had one technical disadvantage: some browsers have image drag-and-drop functionality, used to easily copy pictures from a webpage to a local directory on the computer. Thus, when some respondents accidentally did not click on the items but tried to slide over the control (with the left mouse button pressed), this feature was activated and the mouse pointer changed to the drag-and-drop mode, which could have confused the respondents. A better solution would have been to use simple div-boxes as scale items, which change background color when clicking on them. Nevertheless, this behavior in all likelihood did not have any influence on data quality; possibly respondents were only confused when this occurred for the first time. Again, one simple click was sufficient to make the selection.
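The div-box alternative mentioned above could be sketched as follows (hypothetical ids and styling, not the markup used in the experiments); because no images are involved, the browser's image drag-and-drop mode can no longer be triggered accidentally.

// Build a click-VAS out of 20 adjacent div elements instead of images (sketch).
function buildClickVas(containerId, subQuestionId) {
  var container = document.getElementById(containerId);
  for (var i = 1; i <= 20; i++) {
    var box = document.createElement("div");
    box.style.cssText = "float:left;width:15px;height:20px;background:#ccc;cursor:pointer;";
    box.onclick = (function (value) {
      return function () {
        // Highlight the chosen box, reset the others, and store the selected value.
        for (var j = 0; j < container.childNodes.length; j++) {
          container.childNodes[j].style.background = (j + 1 === value) ? "red" : "#ccc";
        }
        document.getElementById(subQuestionId + "_value").value = value;
      };
    })(i);
    container.appendChild(box);
  }
}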
11.2.4 Slider-VAS (slider-VAS)
This control consists of a VAS with 200 scale points behind it, each of which was selectable. The user had to position a slider between the two anchor points. When starting to answer a question with this slider control, the initial position of the slider was in the middle of the scale. To make it possible to distinguish between skipping a question and deliberately placing the slider in the middle of the scale, the slider had to be moved. There were other approaches, e.g. Couper et al. (2006) used a slider which was not automatically present on the scale in their experiments. It was necessary to initially click on the horizontal bar to activate (and see) the slider. This had several advantages (e.g. no influence on response behavior by the starting position), but it was possibly harder for the respondents to understand how to use this control. When showing the slider, it became clearer how to handle the control. Unfortunately, the initial positioning had two side effects: (1) because respondents initially had to move the slider, some moved away from the midpoint (mostly to the right) although the desired position to be selected would have been the midpoint; that is why there is a small gap at the midpoint for this control [14]. (2) A midpoint could be found by the respondents when taking a look at the slider positions underneath. This is problematic, because a condition for all 6 controls was that no position on the slider was marked.
[14] Take a look at figure 20.1 to see the effect.
Figure 11.4: Screenshot of a sample of a slider-VAS question
Technically, a simple (configurable) Java Applet was used to implement the slider. To communicate with this Applet, a Javascript callback function was passed to the Applet at initialization time, which noted each time the slider position was modified. This value was set into a hidden form field within the page via Javascript and was delivered to the server when pressing the next button. The reason why the Applet did not communicate directly with the server was better integrability into the existing structures of the survey software.
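On the Javascript side, this coupling can be sketched as follows (hypothetical names; the applet's real interface is not documented here): the applet is given the name of a global callback function as an applet parameter and calls it with the current slider value whenever the slider is moved, so the value only has to be mirrored into the hidden form field.

// Callback invoked by the slider applet each time the slider position changes (sketch).
function onSliderMoved(subQuestionId, value) {
  // value is assumed to be an integer between 1 and 200.
  document.getElementById(subQuestionId + "_value").value = value;
}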
11.2.5 Text Input Field (text)
This control uses simple text form input fields (customized with CSS) where a number has to be entered to rate. The line with the scale positions is displayed above the input field. This is the only control where, additionally, the keyboard has to be used to give the answer. To avoid invalid input, a client-side integrity check was performed before data was sent to the server. If anything other than a number between 1 and 10 was entered, a soft prompt on the top of the questionnaire was displayed in red and data submission to the server was blocked until only valid numbers were entered. This was necessary to keep the data clean, but it could have also increased respondents' burden.
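A minimal sketch of this integrity check (hypothetical function and element names, for illustration only):

// Verify that every text answer is an integer between 1 and 10 before submission (sketch).
function validateTextAnswers(form) {
  var valid = /^(10|[1-9])$/; // accepts only the values 1..10
  var fields = form.getElementsByTagName("input");
  for (var i = 0; i < fields.length; i++) {
    if (fields[i].type === "text" && !valid.test(fields[i].value)) {
      var prompt = document.getElementById("prompt");
      prompt.style.color = "red";
      prompt.innerHTML = "Please enter a number between 1 and 10 for every sub question.";
      return false; // block the data submission to the server
    }
  }
  return true;
}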
Figure 11.5: Screenshot of a sample of a text question
A negative effect could have been caused by the autocomplete function (based on previously entered values) offered by most of the common browsers (e.g. Firefox) when using text input fields with the same names, which could have led to identical answers for sub questions.
11.2.6 Dropdown Menu (dropdown)
This control is similar to the text control, but here dropdown boxes were used to select the appropriate position on the scale instead of a text input field. As in all other controls, no default selection was given [15]; instead, simply an empty item was shown. Providing a default selection was for example mentioned in Weisberg (2005, p.123) as an important point to avoid, since a respondent skipping the question would then inadvertently be recorded as giving that default answer. To see all possible alternatives (which means the values 1-10), no scrolling was necessary; all were visible after clicking on the control. In contrast to the other controls, an initial click was necessary to see all alternatives.
[15] This is especially recommended for dropdown boxes, as e.g. mentioned by Crawford et al. (2005, p.55).
Figure 11.6: Screenshot of a sample of a dropdown question
Two steps were necessary for answering: clicking on the dropdown box and selecting the desired
entry. As found in previous experiments, this control seems to be the most problematic one.
E.g. one principle in Dillman (2007, p.392) is to use dropdown boxes sparingly. Healey (2007)
found out that measurement errors can occur with this control in combination with scroll mice.
Dillman & Bowker (2001) observed that not knowing what to do with a dropdown menu is a reason for respondent frustration and thus a possible source of higher dropout rates. Similarly,
Crawford et al. (2005, p.55) also recommend that list boxes should in general only be used when
the list of responses exceeds twenty.
11.2.7 Differences and Similarities
All styles have different properties which can influence results. It is important to evaluate all controls on this level, because the likelihood of satisficing increases with task difficulty and effort to answer. In table 11.1, all properties of the controls are listed at a glance.
Feedback in this case means whether any numerical feedback is given, so that the respondent can precisely enter a number, in contrast to the controls where the choice is placed without any numerical feedback provided as an informative basis [16]. Also, Cook et al. (2001, p.700) argue why no feedback concerning interval count should be given on graphic scales: when a scale uses numerous score intervals, participants are told how many scale points there are, and they not only can but are expected to accommodate these intervals within their conscious thinking.
[16] For more on numerical feedback in scale questions, see Schwarz et al. (1991).
control feedback continuous input devices
radio no no mouse
button no no mouse
click-VAS no yes mouse
slider-VAS no yes mouse
text yes no mouse, keyboard
dropdown yes no mouse
Table 11.1: Different properties of all input controls
11.2.8 Technical Preconditions
Because some of the controls only worked when a certain technology was enabled within the client's browser, these prerequisites were checked before starting the questionnaire, and according to this information the control was assigned (e.g. when Javascript or Java was disabled, only controls which did not depend on these technologies were assigned; thus, controls were drawn randomly until one whose technical preconditions were all met was assigned). Table 11.2 shows the technical preconditions for all controls:
control interaction mechanism precondition(s)
radio one click Javascript
button one click Javascript
click-VAS one click Javascript
slider-VAS click to activate and slide Java, Javascript
text one click or tab key pressed; enter text none
dropdown two clicks (initiating and selecting), mouse-move none
Table 11.2: Technical preconditions and interaction mechanisms for all input controls
11.3 Specific Experiments
Three independent studies were accomplished. An overview of each survey's experimental variables is given in tables 11.3, 11.4 and 11.5 (the difference between the two question types is simply that an interval matrix additionally has a subquestion (or statement) on the left side of each rating scale). The wording for all three surveys was in German, so in later chapters, when concrete questions are referenced, a translation is provided.
Here, a short overview of the three studies is given (the short names will be used when describing
the results of the experiments):
1. tourism: the most important study for this thesis was a research project conducted at Salzburg University dealing with students' attitudes towards alternative tourism. For recruitment, all students of Salzburg University received an invitation letter. Additionally, a link was placed on the Innsbruck University webpage to also reach students from Innsbruck [17]. This is the only survey which gave some sort of incentive: at the end of the questionnaire, an individual traveller's type was derived from the answers given.
[17] See the concrete distributions in the demographic description, section 15.
10 question batteries with a total of 72 sub questions were available. In the following, the positions and types of the questions are listed, together with the number of sub questions for each battery:
position question type no. of sub questions
6 interval matrix 7
9 interval matrix 12
12 interval matrix 6
13 interval matrix 6
15 semantic dierential 12
16 interval matrix 11
17 interval matrix 6
36 semantic dierential 4
42 semantic dierential 4
43 semantic dierential 3
Table 11.3: Experimental variables - tourism
2. webpage: after the re-launch of the Innsbruck University Web page, a survey to receive
feedback about the new design and functionality was conducted. All students, employees and
alumni of Innsbruck University were invited to participate via e-mail. The survey consisted of
12 scale questions (which means a total of 32 sub questions). This survey was only available
online for a short period of time. Subsequently, an overview of the positions and numbers of subquestions of these questions is given:
position question type no. of sub questions
2 interval matrix 1
3 interval matrix 4
4 interval matrix 1
5 interval matrix 5
6 semantic dierential 3
8 interval matrix 1
10 interval matrix 1
12 interval matrix 8
14 interval matrix 1
16 interval matrix 3
24 interval matrix 1
36 semantic dierential 3
Table 11.4: Experimental variables - webpage
3. snowboard: the third survey was conducted as part of a diploma thesis written at Innsbruck University. The topic was market research within the field of snowboarding. A different recruitment strategy was chosen here, namely several postings were placed in relevant snowboard forums. Again, an
overview of all questions under control with the number of sub questions is given in the table
below (15 questions with a total of 79 sub questions):
position question type no. of sub questions
10 semantic dierential 19
11 semantic dierential 19
14 interval matrix 8
21 semantic dierential 6
22 semantic dierential 2
23 semantic dierential 4
24 semantic dierential 4
25 semantic dierential 2
26 semantic dierential 3
27 semantic dierential 3
28 semantic dierential 2
29 semantic dierential 2
31 semantic dierential 1
35 semantic dierential 1
42 semantic dierential 3
Table 11.5: Experimental variables - snowboard
Within this survey, an additional experiment was run with different styles for the ranking of closed-ended alternatives, whereby six different controls for ranking alternatives were implemented. Because the number of respondents was too low to find differences between controls,
this experiment will be repeated in a survey with more participants and the results will not be
reported within this thesis.
Invitation letter In the case of the first two surveys, an invitation letter was sent which contained a PIN number enabling all recipients to enter the survey and serving as an access control to prevent uninvited respondents from taking part in the survey. The PIN had to be entered on a login page. There was an idea to integrate the PIN directly as a parameter into the URL, so that no manual entering would have been necessary and the respondent would have come directly to the start page of the questionnaire, but this solution had the drawback of increasing the length of the URL, which could cause problems with some e-mail clients when line breaks become necessary within the invitation letter. Heerwegh & Loosveldt (2002c) found out in an experiment that manual login procedures did not decrease response rates and, as a positive effect, increased the overall degree of data quality (lower dropout rates), which led to the decision to use the manual login procedure. The PIN login mode does not prevent respondents from filling out the questionnaire multiple times, but if each respondent had his/her own PIN, anonymity could not be assured anymore, which was very important for the first two surveys. The letter simply contained a short description; no personal salutation was used. HTML was used instead of plain text to integrate the link to the questionnaire (respondents just had to click on the link to go to the login page of the questionnaire).
12 Research Questions
The main hypotheses will strongly focus on VAS. The main research questions discussed in these
experiments are as follows:
Dropout Concerning completion rate and breakoffs, the hypothesis is that the higher the com-
plexity of an input control, the higher the dropout will be. This was e.g. observed by Couper
et al. (2006). Complexity in this case refers to the number of actions necessary to answer the
question and also the completion time needed for answering.
Another influencing factor is how well known the input controls are to the respondents. Concretely, this would mean that radio and dropdown would have lower dropout than the other controls, because these are standard HTML input controls.
Concerning VAS, contradictory findings were reported in the summary in section 4.1. It seems that results strongly depend on the type of VAS used in the experiments. Experiments with VAS which are more similar to click-VAS tended to result in lower dropout rates. In contrast, in the experiments where VAS similar to slider-VAS were used, higher dropout rates were reported compared to the other scales.
Response Distributions Three important questions should be treated in these experiments:
(1) Is any category on the scale preferred in one control group compared to the others? In particular, it should be checked if there are any differences in the selection of extreme values (minimum, maximum) and the midpoint when comparing the different scales.
(2) Are there any general differences when comparing the means of the scales? Experiments dealing with these effects are summarized in section 4.2, where a slight tendency for achieving lower values for VAS can be observed.
(3) Does the numeric feedback have any influence on the response behavior? Some numeric feedback is given for text and dropdown, because numbers are shown on the drawn scale. The question is whether the distributions of these two are different compared to the other input control
types.
Response Time / Completion Time Can any differences be found between the different controls regarding the time needed for filling out the questions and regarding dropout? The general hypothesis is that, on average, scales which involve fewer steps to arrive at an answer also lead to shorter processing times. Dropdown and text require the most steps for answering, so it should be shown that the use of one of these two input controls results in higher response times compared to the other controls. This was also reported in similar experiments. In addition to the general trend to avoid dropdown boxes, Healey (2007), for example, concluded that dropdown menus in general led to longer response times. It will be examined whether this effect can be reproduced.
The focus is particularly on response times when using VAS compared to the other controls. Section 4.8 showed contradictory findings in regard to completion times, which result from different definitions of VAS in the experiments conducted. An interesting question is whether the two VAS, which differ considerably in technology, appearance and handling, behave similarly regarding their response times.
Learning Effect Another hypothesis is that, when custom controls (i.e. controls not written in plain HTML) are used, answering a question takes very long the first time, but relative completion time decreases from question to question due to a learning effect.
Categorization/Transformation click-VAS and slider-VAS both use different numbers as scale marks compared to the four other controls. To enable comparability, these values have to be transformed. Funke & Reips (2006) (and also Funke (2004) and Funke (2003)) compared two possible ways of transformation: linear transformation (equal intervals form one category) and transformation with reduced extremes (the intervals at the extreme points are smaller than the other intervals). They found that transformation with reduced extremes led to higher correspondence with the scale marks used in radio buttons. This experiment will be carried out again to see whether these findings can be reproduced.
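As an illustration, the following sketch shows how both transformations could be applied to raw VAS values in R. The concrete break points (in particular the width of the reduced extreme intervals) are assumptions for demonstration purposes, not the exact values used by Funke & Reips (2006).

## Sketch: mapping raw VAS values (0-100) onto a 10-point scale in two ways
vas <- c(0, 3, 17, 49, 50, 88, 97, 100)          # example VAS readings

## (1) linear transformation: ten equal-width intervals of 10 points each
linear <- cut(vas, breaks = seq(0, 100, by = 10),
              labels = 1:10, include.lowest = TRUE)

## (2) transformation with reduced extremes: the outermost intervals are
## narrower (here 5 points wide, an assumed value), so only answers very
## close to the end points fall into categories 1 and 10
reduced <- cut(vas, breaks = c(0, seq(5, 95, length.out = 9), 100),
               labels = 1:10, include.lowest = TRUE)

data.frame(vas, linear, reduced)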
Feedback Questions The feedback questions (see section 16 for a detailed description) asked at the end of the questionnaire should give an impression of how the interviewees feel about the assigned input control concerning usability, attractiveness and a sufficient number of scale points. Two experiments reported by Walston et al. (2006, p.285) and van Schaik & Ling (2007, p.18f), which both included similar questions, found that the use of VAS was not rated highly by respondents. Concerning interestingness, it is expected that common HTML controls score nearer to the boring pole, but these controls are also expected to achieve higher results in regard to ease of use.
Usage of Java Technology It will be evaluated whether Java controls are generally suitable for online questionnaires, despite the drawback of long initial loading times. Concerning the Java version of the VAS (slider-VAS), previous studies such as Couper et al. (2006) reported technical problems using this type of slider. It will be checked whether these technical difficulties can also be observed here and minimized as necessary. The experiments carried out for this thesis had very detailed paradata tracking in order to examine whether technical problems occurred more often when a certain operating system or Web browser was used. This point is strongly related to dropout.
In an experiment conducted by Couper et al. (2006), it was necessary to activate the VAS via clicking. A different strategy was chosen for the slider-VAS within these experiments, and this thesis aims at checking whether displaying the slider without the necessity of clicking had any effect.
13 Overall Response
In table 13.1, information about the overall response for all three surveys is provided, whereby the number of respondents for each experiment is shown together with the number and proportion of those who filled out at least one question. Generally, the portion of respondents who completely filled out the questionnaire is relatively low (53.11% on average). The number of overall participants in the snowboard survey is also low (402 participants). That is why this survey will be partially excluded from the analysis.
                           tourism         webpage         snowboard
Participation in general   1262            1538            402
Lurkers                    148 (11.73%)    74 (4.81%)      107 (26.62%)
At least one question      1114 (88.27%)   1464 (95.19%)   295 (73.38%)
Completed                  761 (60.30%)    850 (55.27%)    176 (43.78%)
Table 13.1: Overall response for all 3 surveys
13.1 Overall Input Control Distribution
Due to technical problems when starting the webpage survey, the input controls are not equally distributed. See tables 13.2, 13.3 and 13.4 for the input control distribution for all 3 surveys, including those who finished the survey and those who did not. Another technical problem may have caused an inappropriately high number of break-offs for the slider-VAS within the tourism survey. Unfortunately, it could not be determined whether those who quit the questionnaire had technical difficulties or simply did not know how to use the control.
control        overall portion   not completed   completed
radio 15.89% 40 (22.60%) 137
button 17.41% 50 (25.77%) 144
click-VAS 14.36% 31 (19.38%) 129
slider-VAS 21.01% 127 (54.27%) 107
text 14.72% 55 (33.54%) 109
dropdown 16.61% 50 (27.03%) 135
Table 13.2: Input control distribution for the tourism survey
control        overall portion   not completed   completed
radio 8.67% 35 (27.56%) 92
button 8.40% 33 (26.83%) 90
click-VAS 9.08% 42 (31.85%) 91
slider-VAS 9.36% 61 (44.53%) 76
text 30.81% 237 (52.55%) 214
dropdown 33.67% 206 (41.78%) 287
Table 13.3: Input control distribution for the webpage survey
control        overall portion   not completed   completed
radio 15.93% 15 (31.91%) 32
button 20.34% 18 (30.00%) 42
click-VAS 17.97% 14 (26.42%) 39
slider-VAS 19.66% 37 (63.79%) 21
text 8.81% 16 (61.54%) 10
dropdown 17.29% 19 (37.25%) 32
Table 13.4: Input control distribution for the snowboard survey
14 Paradata and Additional Information
When conducting online surveys, a lot of additional information, apart from the answers given to the questions, can be collected about the filling-out process itself. This information is called paradata. First, some thoughts on which kind of information can be retrieved from the questioning for further analysis: even if the user takes part quasi-anonymously, some technical parameters can be retrieved automatically. All of these parameters were tracked when running the concrete experiments; some of them could only be read when Javascript was enabled within the client's browser. Heerwegh (2003, p.361) notes one drawback of client-side paradata collection: "One drawback of client-sided paradata is that they will be lost if the respondent does not submit the Web page." This problem was solved by employing AJAX technology: when the page containing the question was loaded, the client-side collected paradata was sent directly to the server and not together with the other form data containing the answers to the questions.
An overview of the collected paradata is given below:
- The kind of Web browser used: this information was tracked on the server side by reading out information from the HTTP header. Numerous different browsers are available; here only the most commonly used ones were classified (IE6, IE7, Mozilla, Safari and Opera).
- The Web browser settings currently in use: e.g., it was checked whether Javascript, Java and Cookies were enabled.
- The operating system used: again, this information was extracted from the HTTP header on the server side.
- The screen resolution used: the resolution can be inferred from the window size of the browser if maximized, which is read via Javascript within the client's browser.
- (Geographical) pinpointing using the user's IP address: from this it is possible to derive whether the respondent filled out the questionnaire at home, in a public venue (like a library or internet café) or at work.
- Duration: how long did respondents take to fill out the questions? To exactly measure the time between the respondent receiving a question and a response being sent, client-side time tracking was implemented with Javascript (the software used for these experiments can be downloaded from http://survey4all.org; for similar approaches concerning time tracking, see Heerwegh (2002), Heerwegh (2003) or Heerwegh (2004a)).
- Referrer: the referrer holds the page where respondents came from. This information is only meaningful if links were placed at different locations on the Web (as was the case for the snowboard survey).
Theoretically, much more information could be tracked, such as mouse movements over the Web page, previously written and deleted text in text input fields, previously selected radio buttons, etc., but this information was not of primary interest when running the experiments described.
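For illustration, a minimal R sketch of how logged user-agent strings could be collapsed into the browser classes used later in table 14.2 is given below; the function name and the example strings are hypothetical, and this is not the code used in QSYS.

## Sketch: classify HTTP user-agent strings into the browser classes used here
classify_browser <- function(ua) {
  # order matters: Opera may identify itself as MSIE in some configurations
  if (grepl("Opera",   ua)) return("Opera")
  if (grepl("MSIE 6",  ua)) return("IE 6")
  if (grepl("MSIE 7",  ua)) return("IE 7")
  if (grepl("Firefox", ua)) return("Firefox")
  if (grepl("Safari",  ua)) return("Safari")
  "others"
}

agents <- c("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
            "Mozilla/5.0 (X11; Linux) Gecko/2008 Firefox/3.0")
sapply(agents, classify_browser)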
14.1 Overall Operating System Distribution
One of the most time-intensive tasks when creating Web pages in general is ensuring cross-browser compatibility. To make sure that no side effects concerning browser and operating system usage occur, these variables have also been taken into consideration. This should indicate whether there are more (particularly initial) break-offs when a certain browser or operating system was used. Table 14.1 shows the distribution of these variables for all 3 surveys. As expected, the vast majority of the respondents used Windows as operating system, followed by a Mac operating system (like OS X). Only a few used Linux.
tourism webpage snowboard
not completed completed not completed completed not completed completed
Windows 337 728 564 799 107 163
Mac 15 26 29 24 11 10
Linux 1 7 21 27 1 3
Table 14.1: Use of operating systems for all three surveys
Concerning the distribution of the operating systems, no noticeable problems could be observed.
More detailed research on this topic would be possible (e.g. which Windows versions or Linux
distributions were used), but this is not of further interest in this work.
14.2 Overall Browser Agents Distribution
An overview of the most common browser agents used by the respondents in the three surveys
is given in table 14.2:
tourism webpage snowboard
not completed completed not completed completed not completed completed
IE 6 124 302 190 310 25 33
IE 7 83 156 115 176 28 31
Firefox 108 279 216 280 51 101
Opera 21 2 30 3 9 0
Safari 8 17 18 14 4 8
others 9 5 45 67 2 3
Table 14.2: Use of browser agents for all three surveys
As can be seen for all three surveys, it seems that there were technical problems with the Opera browser. Opera users had problems when loading the new page containing the next question after pressing the next button, because the page with the already answered question was cached and not newly loaded, since the URL did not change (this behavior is reported in several forum entries and has been fixed in the current version of QSYS). The completion rate is significantly lower than with other browsers. This is unfortunate, but, because of the low proportion of respondents using Opera, it is not critical for further analysis.
14.3 Overall Screen Resolution Distribution
In addition, respondents' screen resolutions were extracted. The vast majority of all participants used 1024x768, 1280x800 and 1280x1024. No noticeable problems were found in any of the three surveys. Table 14.3 shows the most common resolutions for all three surveys. Unfortunately, for the tourism survey, technical problems prevented tracking of this paradata for about half of the respondents.
tourism webpage snowboard
not completed completed not completed completed not completed completed
1024x768 149 365 84 169 35 78
1280x800 66 175 37 49 17 23
1280x1024 76 117 72 140 36 38
Table 14.3: Screen resolutions as used by the respondents for all three surveys
14.4 Overall Distribution of Additional Browser Settings
The technical parameters of the clients' machines were also tracked similarly as described in section 14.3. This information was read with Javascript running within the client's browser. It should be mentioned that reading out the flag "Java enabled" within the client's browser did not automatically mean that Java worked properly on the client's machine. A better way to determine the correct functioning of Java is to run a small invisible Applet within the Web page, which supplies information on whether initialization was successful or not (this strategy is implemented in the current version of QSYS).
In the tourism survey, 1.3% did not have Cookies enabled, 2.7% had disabled Java (or the machine had no Java installation) and 1.3% had turned off Javascript. The snowboard survey had higher rates of disabled technical preconditions: 2.3% with no cookies, 3% with no Java and 2.3% with no Javascript. These are higher rates concerning Java compared to other online surveys, such as reported e.g. by Couper et al. (2006), where 1.7% did not have Java enabled. Table 14.4 shows all additional browser settings for all three surveys at a glance (again, due to the temporary technical problems in the webpage survey, all technical conditions were set to false there, which is why this information is not available for that survey's affected cases).
                             tourism        webpage        snowboard
technical condition          n.c.    c      n.c.    c      n.c.    c
cookies enabled      false   7       8      -       -      4       3
                     true    346     753    235     420    115     173
java enabled         false   14      17     -       -      4       5
                     true    339     744    218     406    115     171
Javascript enabled   false   7       8      -       -      4       3
                     true    346     753    236     420    115     173
Table 14.4: Distribution of browser settings for all three surveys (c = completed; n.c. = not completed)
14.5 Summary and Conclusion
These figures indicate the heterogeneity of the combinations of browsers and browser settings used, together with the different operating systems. Except for the troubles with the Opera Web browser, no technical problems can be inferred from the frequency tables given above. Only a few respondents had turned off Javascript and Java in their browsers, but even these cases were not lost for further analysis, because they were assigned input controls which work without these technologies. Only a few had turned cookies off in their browsers. Cookies are used to store a session variable, which is needed to assign the answers given to a certain question. Disabling them had no influence on the proper functionality of the survey, because URL-rewriting was enabled (with URL-rewriting, the session id is encoded within the URL for each request). Paradata tracking in general is an important advantage of online surveys compared to paper-and-pencil questionnaires. It would be possible to extend the tracking abilities of the software for further experiments.
15 Demographic Information
Different demographic information was collected for all 3 surveys. Unfortunately, in all surveys, this information was only collected at the end, so it is not available for respondents who dropped out before. It is true that the aim of the experiments was not to make statements about e.g. students of Salzburg University in general, but it makes sense to know the demographic structure of the sample used for this evaluation (e.g. young people may behave differently when working with a VAS). For example, Stern et al. (2007) mention that findings concerning the visual effects are limited to their samples. Since many existing studies have used random samples of college students (see chapter 4), it also makes sense to take a look at differences between the students' and the employees' responses within the tourism survey.
Concerning all results presented in this chapter, the number of participants in general differs between tables due to dropout. In other studies, dropouts were erased from the dataset. Data cleaning was not carried out for the experiments in this thesis, in order to increase the number of cases.
15.1 Tourism Survey
Not only students from Salzburg University (n=678) participated in the survey, but also a small number from Innsbruck University (n=88). Faculty and field of study were also asked, but these variables were not included in the analysis here.
About 80% of all respondents specified Austria as their country of origin. Of those with other countries of origin, 116 came from Germany and 19 from Italy. About 75% of all respondents were female. The mean age was 24.34 years and the mean number of semesters 6.42. To see the distribution of both, take a look at the boxplots in figures 15.1 and 15.2.
Figure 15.1: Age distribution of respondents - tourism survey
Figure 15.2: Semester distribution of respondents - tourism survey
About 85% filled out the questionnaire at work (here it is not clear whether students who answered at the university chose "at work", because "university" was not available as an answer option).
Another interesting consideration was the interviewees' mood during the course of the filling-out process. The respondents were asked about their mood at the beginning and at the end of the questionnaire and answered by selecting one of 5 smileys shown in a list. A screenshot of this question is given in figure 15.3 (for more information on research concerning the influence of mood as a possible determinant of respondents' behavior, refer to Bachleitner & Weichbold (2007)).
Figure 15.3: Smileys used for indicating mood of interviewees - tourism survey
These variables could also be combined with the experimental scale variables. In table 15.1, the absolute change of mood from the beginning to the end of the questionnaire for each scale control is given (1 means that mood improved by one step, -1 means that mood worsened by one step):
controls -4 -3 -2 -1 0 1 2
radio 0.00 0.00 0.13 1.31 14.06 2.23 0.26
button 0.00 0.00 0.13 1.58 13.67 3.29 0.26
click-VAS 0.00 0.00 0.26 1.58 12.75 2.37 0.00
slider-VAS 0.00 0.00 0.00 1.31 10.12 2.10 0.53
text 0.13 0.00 0.13 1.18 11.04 1.71 0.13
dropdown 0.00 0.13 0.00 1.97 12.88 2.37 0.39
Table 15.1: Overall portion of Mood changes for all controls - tourism survey
No significant differences could be found concerning mood changes when comparing the different input controls.
15.2 Webpage Survey
In this survey, both students and employees were invited to participate. Table 15.2 provides an overview of the distribution of respondents' relation to the university (e.g. student, employee, alumni). 96 respondents were both students and employees. 30% of the respondents were employees and 65% were students.
relation    finished
employee 299
student 636
partner 4
external 7
alumni 28
Table 15.2: Respondents' relation to the university (multiple answers possible) - webpage survey
Additionally, the distribution of demographic information (age and gender) for the webpage survey is shown in tables 15.3 and 15.4. Among those who finished the survey, a higher rate of male respondents can be found in the employees category compared to the students (57% vs. 49%). Furthermore, employees are significantly older than the students.
gender    finished    employees    students
male 424 172 310
female 421 132 327
Table 15.3: Gender distribution - webpage survey
Figure 15.4: Age distribution of respondents - snowboard survey
age    finished    employees    students
under 18 4 3 2
18-25 441 49 424
26-30 174 76 134
31-40 130 92 62
41-50 51 47 4
51-60 26 24 2
over 60 15 9 6
Table 15.4: Age distribution - webpage survey
15.3 Snowboard Survey
Only a few demographic questions were asked, namely gender, origin and age: 129 respondents were male and 47 female. 130 participants were from Germany, 32 from Austria and 7 from Switzerland. The age distribution of the respondents can be seen in figure 15.4. The mean age in this survey is 22.33 years.
15.4 Differences in Demographic Distributions Across Browser and OS Versions
According to Funke & Reips (2005), there are differences in the demographic distributions of respondents depending on the Web browser used. For example, a higher percentage of women use Internet Explorer compared to alternative browsers (75% versus 44%). Furthermore, it was found that people using alternative browsers had a higher level of education. Even response times differ between Internet Explorer and alternative browsers.
The clearest demographic difference between users of certain browsers in all three surveys is that between men and women. As reported in Funke & Reips (2005), a far higher percentage of Internet Explorer users are female compared to those who use an alternative browser (tourism: 82.43% vs. 64.80%; webpage: 57.43% vs. 37.28%; in both surveys the findings were highly significant). A similar effect was found when comparing the users of operating systems. Women used Windows more frequently than any other operating system: tourism: 77.19% (Windows), 42.31% (Mac), 14.29% (Linux); webpage: 50.95% (Windows), 40.00% (Mac), 25.00% (Linux). Concerning age, no significant differences in regard to browser and operating system usage were found.
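The association reported above can be tested with a simple cross-tabulation and a chi-squared test; the following R sketch uses assumed variable names (resp, gender, browser) and toy data rather than the original data set.

## Sketch: association between gender and browser class
resp <- data.frame(
  gender  = c("female", "female", "male", "male", "female", "male"),
  browser = c("IE", "IE", "Firefox", "Opera", "Firefox", "IE")
)
resp$class <- ifelse(resp$browser == "IE", "Internet Explorer", "alternative")

tab <- table(resp$gender, resp$class)   # cross-tabulation gender x browser class
prop.table(tab, margin = 2)             # share of women per browser class
chisq.test(tab)                         # test of association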
15.5 Summary and Conclusion
In all three surveys, the majority of the participants were students. However, in the webpage survey, employees and alumni of the university also participated (65% of the respondents of this survey were students). This has the consequence that respondents were relatively young (the majority was between 20 and 25 years old). Respondents were equally distributed concerning gender, except in the snowboard survey, where the majority was male (73%). In the tourism survey, the current mood of the respondents was asked at the beginning and the end of the survey. It was seen that the input control types did not have any influence on mood changes. When comparing demographic differences in the use of browsers and operating systems, it was found that female respondents had a higher tendency to use Internet Explorer and Windows.
It is always important to take a look at the demographic distribution of the respondents in order to know for which population the results of the experiments are valid and applicable. Because the majority of the respondents are relatively young and strongly related to the university, it would be interesting to repeat the experiments with older people.
16 Feedback Questions / Subjective Evaluation
At the end of each survey, respondents were asked to answer three feedback questions so as to get a direct evaluation of the different input controls. Because respondents did not know about the experiments, the questions did not explicitly mention the input controls to be evaluated but referred to the appearance of the questionnaire as a whole. The strategy was to get a more general impression and not a rational evaluation of the scale control elements. The question wording was slightly different in all 3 surveys (translated from German here), which complicated comparison. Generally, no significant differences between the scale controls could be found for the snowboard survey; for this reason, no results are reported for this survey. To answer these questions, the assigned input control was used. To enable comparison with those controls offering more than 10 scale points, linear transformation was applied.
16.1 Boring vs. Interesting
The idea behind this question was to get an impression of how innovative and interesting a certain control was rated. The concrete wording was as follows (translated):
- tourism, snowboard: how would you evaluate the online questionnaire itself? Unfortunately the formulation of this question was not very precise, so some respondents may have misinterpreted its meaning (possibly some thought of the content of the questionnaire, especially for the first sub question).
- webpage: how did you like the design and technical realization of the online questionnaire compared to other online questionnaires? This question refers to the input controls, and therefore clearer effects were expected for this survey.
Within the webpage survey, when asking how interesting the design of the questionnaire was, the following conspicuities can be found: the dropdown and text controls are more often classified as boring than the other controls, followed by the radio control. click-VAS, slider-VAS and button controls are more likely to be rated as interesting. This means that questionnaires with custom controls are more interesting to the respondents than those with well-known standard controls. This finding is highly significant and was verified with a Kruskal-Wallis test applied on pairwise comparisons. Here the differences of the mean values for the input controls from the overall mean are presented in descending order: click-VAS 0.99, slider-VAS 0.94, button 0.61, radio -0.15, text -0.37 and dropdown -0.42. Highly significant differences (p < 0.001) were observed between click-VAS and slider-VAS vs. radio, text and dropdown. Highly significant differences could also be found between button vs. text and button vs. dropdown. The same effect was found in the tourism survey, even though it was not as clear-cut as in the webpage survey (which is a possible result of the imprecise wording). Here the button and click-VAS controls have a higher interest rate. The differences from the overall mean are: button 0.42, click-VAS 0.19, slider-VAS 0.00, radio 0.00, text 0.04 and dropdown -0.60. For a graphical illustration of these results, take a look at figures 16.1, 16.2 and 16.3.
Figure 16.1: Comparison of results of feedback question 1 (boring=1 vs. interesting=10) - webpage survey
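As a sketch of this kind of test, the following R code runs an overall Kruskal-Wallis test on the (transformed) ratings and then pairwise Wilcoxon tests with adjusted p-values; variable names and data are assumed, and the pairwise step shown here is one possible implementation rather than necessarily the exact procedure behind the figures.

## Sketch: overall and pairwise comparison of feedback ratings per control
fb <- data.frame(
  control = factor(rep(c("radio", "button", "click-VAS",
                         "dropdown", "text", "slider-VAS"), times = 4)),
  rating  = c(4, 6, 9, 3, 4, 8,   5, 7, 10, 2, 3, 9,
              4, 5, 8, 3, 2, 7,   6, 6, 9, 4, 3, 10)   # 1 = boring ... 10 = interesting
)

kruskal.test(rating ~ control, data = fb)              # overall difference between controls
pairwise.wilcox.test(fb$rating, fb$control,            # pairwise comparisons
                     p.adjust.method = "bonferroni")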
16.2 Sufficient vs. Non-Sufficient Number of Scale Intervals
To ask the respondents whether they felt that the number of scale intervals presented (dependent on each input control) was sufficient, the following two anchor points were presented, and the respondent had to give an evaluation between these extreme points (translated): "scaling of the rating questions is sufficient" vs. "scaling of the rating questions is not sufficient".
Interestingly, no tendency could be observed that controls with more than 10 intervals (namely click-VAS and slider-VAS) were evaluated as having sufficient intervals. Nevertheless, the overall differences were significant (Kruskal-Wallis rank sum test; χ² = 22.42, p < 0.001). The detailed results indicate that for those controls where the scale points could be seen at first view (either through labeling or because the spaces between the items allowed counting them easily), the respondents tended to answer that the scale points were sufficient. Compared to the overall mean, dropdown was nearest to the sufficient extreme with -0.30 (tourism) and -0.32 (webpage); on the other extreme were click-VAS (0.29 tourism, 0.72 webpage) and slider-VAS (0.10 tourism, 0.39 webpage); the differences between these extremes are highly significant.
Figure 16.2: Comparison of results of feedback question 1 (boring=1 vs. interesting=10) - tourism survey
Figure 16.3: Line diagram comparing results of feedback question 1 (boring=1 vs. interesting=10) for all three surveys
Figure 16.4: Comparison of results of feedback question 2 (sufficient=1 vs. insufficient=10 number of scale points) - webpage survey
The reason why those who had the dropdown input control assigned gave the best score for a sufficient number of intervals probably has to do with the way the intervals were selected: possibly respondents thought it would not be feasible to put more than 10 intervals into the dropdown box. Concerning the results for click-VAS and slider-VAS, it is hard to determine where the origin of this phenomenon lies, but these are the only two scale types where the number of scale points could not be determined easily; because no feedback was given and no counting of items was possible, it was possibly not obvious to the respondent which scaling was meant, since the control appeared as a continuum. For a graphical illustration of these results, take a look at figures 16.4, 16.5 and 16.6.
16.3 Easy to Use vs. Complicated
The best evaluation regarding ease of use was given by those who were assigned the radio input, followed by click-VAS and button. Here the differences from the overall mean (webpage: 2.70; tourism: 1.91) are given: -0.65, -0.65 and 0.53 (webpage); -0.29, -0.14 and -0.06 (tourism). The worst evaluations regarding ease of use were given to the slider-VAS and text input controls, followed by dropdown. Overall differences between input controls were highly significant; for the tourism survey, however, no significant differences could be observed, most likely because of the imprecise question wording. For a graphical illustration of these results, take a look at figures 16.7, 16.8 and 16.9. These results correlate with the number of steps necessary to fill out a question with a certain input control (see table 11.2).
Figure 16.5: Comparison of results of feedback question 2 (sufficient=1 vs. insufficient=10 number of scale points) - tourism survey
Figure 16.6: Line diagram comparing results of feedback question 2 (sufficient=1 vs. insufficient=10 number of scale points) for all three surveys
Figure 16.7: Comparison of results of feedback question 3 (easy=1 to use vs. complicated=10) - webpage survey
Figure 16.8: Comparison of results of feedback question 3 (easy=1 to use vs. complicated=10) - tourism survey
Figure 16.9: Line diagram comparing results of feedback question 3 (easy=1 to use vs. complicated=10) for all three surveys
16.4 Summary and Conclusion
Custom controls are, as expected, more likely to be rated as interesting than standard HTML controls. When asking whether the input controls offered a sufficient or non-sufficient number of scale points, no consistent tendency was found. The problem here possibly lies in the inaccurate wording of the question, so the respondents did not exactly know what was meant. It would be possible to repeat the experiments with clearer statements, possibly even mentioning the intention of the experiments at the end of the questionnaire when the input controls are to be evaluated.
The third question aimed at evaluating the ease with which the different input controls could be used. The best results were achieved by radio, click-VAS and button. This result corresponds with the number of steps required to give an answer, and therefore is not surprising.
Controls such as dropdown and text were rated badly in both feedback questions one and three. Therefore, input controls of this kind should be avoided if possible. The only meaningful application for these two input controls would be scales with many (labelled) items on the scale, which was not part of this thesis.
17 Response Time / Completion Time
In this section, a comparison of the controls in regard to completion time is given. The response time or completion time is available for each question and was tracked on the client side. The time needed for filling out is an important factor, because it gives an indication of respondent burden, which could lead to dropout. Time values could only be tracked when Javascript was enabled in the client's browser.
17.1 Standardization
Unfortunately, for questions containing multiple sub questions, the time needed to fill out a single sub question could not be determined. For this reason, the response time was standardized by the number of sub questions + 1 (the additional 1 stands for reading the question text of the whole question itself). This standardization enabled better comparability between questions, but it must be mentioned that this approach is just an approximation of the real values, which are very hard to determine because each question is different (e.g. different lengths of the question text, some questions needed more time for thinking, ...). In the following, when talking about a question, a sub question is meant.
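A minimal R sketch of this standardization, with assumed column names and invented numbers, could look as follows.

## Sketch: standardize screen-level response times by (sub questions + 1)
times <- data.frame(
  question = c(6, 9, 12),
  duration = c(27.5, 16.1, 52.2),   # seconds spent on the whole question screen
  n_sub    = c(4, 2, 5)             # number of sub questions on that screen
)
times$std_duration <- times$duration / (times$n_sub + 1)   # "+ 1" for the main question text
times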
17.2 Outlier Detection
As a first step when analyzing the durations needed for questions under experiment, outliers have to be detected and eliminated (or weighted). See tables 17.1, 17.2 and 17.3 for some statistical parameters which refer to all respondents (outliers were included; all questions were standardized by the number of sub questions + 1):
position mean median sd min max
6 6.89 5.61 7.29 0.31 160.12
9 5.35 4.39 4.94 0.18 76.40
12 8.70 6.96 10.72 0.15 239.88
13 9.01 7.54 7.60 0.15 138.86
15 8.18 6.99 6.86 0.16 117.26
16 5.26 4.42 4.99 0.08 110.21
17 5.87 4.79 6.13 0.15 98.88
36 8.21 6.72 7.58 1.36 122.35
42 6.77 5.93 6.98 1.72 123.72
43 6.27 5.53 3.21 0.57 35.69
Table 17.1: Basic parameters concerning duration - tourism survey (in seconds)
In the tourism and webpage surveys, respondents were detected who had response times for the experimental questions below 0.1 seconds (for details, take a look at the minimum values in tables 17.1, 17.2 and 17.3), which is not realistic. This could arise from technical difficulties or from respondents' use of an automated form-filling function, as is offered e.g. by the Web Developer plug-in for Firefox; the latter is the most plausible explanation, because identical fill-out patterns were found for these cases. These respondents were removed from further analysis.
position mean median sd min max
2 16.27 11.71 38.18 2.52 1064.42
3 8.05 6.44 10.20 0.36 227.25
4 13.61 10.08 20.18 0.44 515.27
5 8.42 6.92 9.32 0.94 184.15
6 10.36 7.61 17.47 1.02 476.78
8 13.21 9.76 29.53 1.71 820.66
10 12.42 8.03 48.35 1.53 1150.43
12 11.46 7.33 13.16 0.26 171.08
14 15.91 11.88 14.38 1.67 165.61
16 9.73 6.72 13.48 1.12 223.97
24 17.61 10.85 26.87 2.98 362.89
30 8.44 7.28 8.38 0.35 147.81
Table 17.2: Basic parameters concerning duration - webpage survey (in seconds)
position mean median sd min max
10 6.55 5.17 5.66 1.29 54.63
11 4.79 3.89 3.35 0.82 27.51
14 4.39 3.95 2.03 1.66 14.71
21 5.00 4.51 2.07 1.01 14.28
22 10.98 8.77 9.22 1.09 79.06
23 7.46 4.00 19.00 1.01 246.70
24 5.29 4.57 3.20 1.08 25.32
25 6.51 5.86 3.27 1.08 27.95
26 4.48 4.04 1.92 0.78 16.45
27 4.69 3.71 4.01 1.06 31.46
28 7.43 6.09 5.16 1.20 47.53
29 10.15 5.65 46.91 1.11 626.42
31 10.20 8.99 4.94 2.39 32.41
35 5.82 5.32 2.45 2.60 14.55
42 6.36 5.99 2.60 0.64 22.93
Table 17.3: Basic parameters concerning duration - snowboard survey (in seconds)
Outliers can occur e.g. as a result of respondents leaving the browser window with the survey open, turning to another task and continuing later. Also, those who just clicked through the survey without seriously answering the questions should be excluded, which means outlier detection should be carried out on both sides: those who took too long and those who did not take long enough. In the literature, different strategies for excluding outliers before comparing completion times can be found:
1. Absolute time: filling out took longer or shorter than a given absolute time border (e.g. longer than 1 hour or shorter than 5 minutes for the whole questionnaire). This strategy was chosen e.g. for the experiments by Heerwegh & Loosveldt (2002a), Healey (2007), Tourangeau et al. (2007) and Tourangeau et al. (2004, p.382), who set an absolute time limit for the filling-out process of the entire questionnaire.
2. Relative time: single questions are excluded when they take the same respondent much longer to fill out than other questions (the number of sub questions, the number of characters per sub question text or the level of difficulty of the question can also be taken into consideration), e.g. Couper et al. (2006, p.242): "On a particular question, if a respondent spent more than six times the average time he or she spent on the other questions, this response time was removed from the analysis." The major problem with this strategy is that in some cases it is hard to compare questions with each other concerning their expected filling-out time.
3. Compare to the time needed by other respondents and exclude outliers accordingly, e.g. Heerwegh (2002, p.15): "Outliers at the item level were defined as response times smaller than mean minus 2 times standard deviation, or greater than mean plus 2 times standard deviation."
4. All observations falling above (or below) the 75th percentile (or 25th percentile) of the time spent on any given screen or question plus (or minus) 1.5 times the interquartile range of the time spent on the screen (Crawford et al. (2001, p.161)).
Because questions from different experiments are hard to compare with each other, e.g. when they differ in the number of items or in the length of the question text, strategy 2 may not be the right choice. This is equally true for strategy 1: it is hard to define upper and lower borders for extracting the outliers when questions differ in the criteria mentioned before. A minimal sketch of strategies 3 and 4 is given below.
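The following R sketch illustrates strategies 3 and 4 on the standardized durations of a single question; the data are invented, and only the thresholds quoted above are used.

## Sketch: flag outliers by two of the strategies listed above
d <- c(2.1, 3.4, 4.8, 5.0, 5.6, 6.1, 7.9, 95.0)   # example durations in seconds

## strategy 3: outside mean +/- 2 standard deviations
out_sd <- d < mean(d) - 2 * sd(d) | d > mean(d) + 2 * sd(d)

## strategy 4: below the 25th percentile minus, or above the 75th percentile
## plus, 1.5 times the interquartile range
q <- quantile(d, c(0.25, 0.75))
out_iqr <- d < q[1] - 1.5 * IQR(d) | d > q[2] + 1.5 * IQR(d)

data.frame(d, out_sd, out_iqr)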
Figure 17.1 provides an example of outlier detection when applying strategy 3 (± 2 standard deviations). The figure shows a density plot of a question from the tourism survey (question number 17, again standardized by the number of sub questions + 1) together with the detected outliers. There is no visible cut-off or gap between those identified as outliers and non-outliers; the boundary between them is fluid. On average over all questions, 16.3 outliers were removed from the tourism survey, 17.1 from the webpage survey and 5.6 from the snowboard survey. As can be seen in this figure, there was even an outlier who completed the question only after about 1.5 minutes; in other questions there were responses for one block after more than 2 minutes. Another problem of this strategy became obvious: because all time measures were right-skewed, no exclusion of those whose completion times were too short was carried out. Those who only clicked through and marked answer alternatives at random should also be filtered out, which could easily be achieved with outlier detection on the left side of the distribution; this is an advantage of Web-based questionnaires, as in other modes this detection would not be possible (see also Funke & Reips (2007a, p.62f)). Secondly, the border chosen for the right side is relatively arbitrary.
Figure 17.1: Sample density plot of question number 17 with outliers - tourism survey
The extraction of the real outliers is an impossible task, because it can never be assessed whether the respondent is currently busy with the questionnaire or, e.g., with responding to an e-mail. Even employing Javascript to check for simultaneous activities on the computer would not give any clarification: it is only possible to track activities inside the browser, and furthermore the absence of computer activity, such as mouse movements, does not necessarily mean that the respondent is not working on the questionnaire at that moment, e.g. during the phase of question comprehension (see chapter 10.1 for more about the phases of the response process). To make sure that no technical difficulties or other influences affecting the controls were responsible for the outliers, the distributions of the outliers across input controls were also checked. When doing so, no noticeable problems were found. But it should be taken into consideration that it is dangerous to exclude outliers in an automatic manner (see e.g. Faraway (2005, p.68f)). As an alternative when comparing durations between the different controls, robust regression methods are a good choice in this case, because the estimators are not so strongly affected by outliers.
17.2.1 Robust Regression
Detecting outliers the usual way is a procedure which either accepts or rejects items. Robust regression follows a different approach according to Rousseeuw & Leroy (2003, p.8): "it is by looking at the residuals from a robust (or resistant) regression that outliers may be identified, which usually cannot be done by means of the LS (least squares) residuals. Therefore, diagnostics and robust regression really have the same goals, only in the opposite order: when using diagnostic tools, one tries to delete the outliers and then to fit the good data by least squares, whereas a robust analysis first wants to fit a regression to the majority of the data and then to discover the outliers as those points which possess large residuals from that robust solution." When using robust regression, weights are determined for each item, which can be used as a factor in further analysis. The most common way of performing robust fitting of linear models in R is applying the rlm function, whereby fitting is done using an M-estimator with the Huber method; this method is well described in Faraway (2005, p.98). Fitting is done by iterated re-weighted least squares (IWLS). For weighting, all time measures are taken into consideration as well as the dependent variable, which can be seen as a benefit of this procedure.
As a demonstrative example, take a look at figure 17.2, which presents a sample question battery taken from the middle section of the tourism survey. Colored marks show the weights for the different input controls, and it can be seen that the weights are calculated differently between controls. It can also be seen that there is no direct association between duration and weight. The weights in this example are based on a single regression model with the logarithm of the duration of only one question and the control type.
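A minimal sketch of such a robust fit in R, using rlm() from the MASS package with its default Huber M-estimator, is given below; the data and variable names are assumed and merely stand in for one question of the survey.

## Sketch: robust regression of log duration on the control type
library(MASS)

times <- data.frame(
  control  = factor(rep(c("radio", "button", "click-VAS",
                          "slider-VAS", "text", "dropdown"), each = 10)),
  duration = rlnorm(60, meanlog = 1.6, sdlog = 0.4)   # right-skewed durations in seconds
)

fit <- rlm(log(duration) ~ control, data = times)     # robust fit via IWLS with Huber weights
summary(fit)

## the case weights from the IWLS iterations can be kept and used to
## down-weight (or drop) suspicious observations in further analysis
times$weight <- fit$w
head(times[order(times$weight), ])                    # observations with the smallest weights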
Figure 17.2: Density plot with weights of question number 17 - tourism survey
To quantify the differences, the results of the comparison of the durations for this question after removing the outliers (all weights smaller than 0.5 were eliminated, which amounted to 38 items in this case) can be seen in figure 17.3 and table 17.4. Although many outliers were removed, the differences between the controls concerning completion time are relatively clear. The patterns which can be extracted from these two sources can also be found within the other questions. When comparing to the overall mean (5.034 seconds) and median (4.71 seconds) for this question, three groups can be found:
1. text and dropdown are more than 0.4 seconds slower than average.
2. slider-VAS lies in the middle, close to the mean value.
3. button and radio are faster than average. The fastest control for this question was click-VAS, but this control was added to this group because its time measurements lay closer to button and radio in the other questions.
Figure 17.3: Boxplots comparing the response times across input controls for question 17 of the tourism survey
control mean median
button 4.74 4.53
dropdown 5.70 5.32
slider-VAS 5.05 4.82
click-VAS 4.37 3.96
radio 4.95 4.55
text 5.47 4.95
Table 17.4: Basic parameters concerning duration for question 17 of the tourism survey (in seconds)
The findings of Heerwegh & Loosveldt (2002a), who simply compared radio buttons and dropdown menus, were confirmed by the experiments reported here. In some other experiments (e.g. Heerwegh & Loosveldt (2002b, p.1)), faster download times for dropdown boxes were observed, which obviously had no effect in these experiments: given the average speed of internet connections, the minimal increase in HTML code produced by radio buttons (which, as always, depends on the way it is written) did not have an effect.
A summarized overview of all three surveys is given in table 17.5, which contains the same kind of results as the detailed view of the single question above. For each survey, a multiple robust regression was used with control type and question number as independent variables and the logarithm of the duration as the dependent variable. Pairwise comparisons were carried out using Tukey's HSD test (for more details, see Faraway (2005, p.186f)). All highly significant results are listed in the table together with their direction: a "+" sign means that the control in the row took longer to complete than the one in the column (e.g. dropdown was slower than button), and a "-" sign means that filling out took longer with the control in the column.
button dropdown slider-VAS click-VAS radio text
button -|-|- -|-|. .|-|+ .|.|+ -|-|-
dropdown +|+|+ +|.|. +|.|+ +|+|+ .|.|.
slider-VAS +|+|. -|.|. +|+|+ +|+|+ -|.|.
click-VAS .|+|- -|.|- -|-|- .|.|. -|.|-
radio .|.|- -|-|- -|-|- .|.|. -|-|-
text +|+|+ .|.|. +|.|. +|.|+ +|+|+
Table 17.5: Overview of duration comparisons for all three surveys
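A sketch of this comparison step is given below; it assumes that glht() from the multcomp package can be applied to the rlm fit for Tukey-style all-pairs contrasts, which is one possible way to obtain the pairwise tests but not necessarily the exact procedure behind table 17.5. Data and variable names are again invented.

## Sketch: robust regression plus Tukey-style pairwise contrasts between controls
library(MASS)
library(multcomp)

times <- data.frame(
  control  = factor(rep(c("radio", "button", "click-VAS",
                          "slider-VAS", "text", "dropdown"), times = 40)),
  question = factor(rep(1:10, each = 24)),
  duration = rlnorm(240, meanlog = 1.8, sdlog = 0.5)
)

fit  <- rlm(log(duration) ~ control + question, data = times)
comp <- glht(fit, linfct = mcp(control = "Tukey"))    # all pairwise control contrasts
summary(comp)                                         # adjusted p-values per pair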
17.3 Learning Effect
Figure 17.4 offers an example of a learning effect, calculated for the tourism survey. The effect is similar for the webpage survey (figure 17.5). These figures show the share of each input control in the overall time needed for filling out a certain question (e.g. taking all mean values of a certain question, the share of the mean value of the text input control for question number 6 is 17%). Only questions under experiment are shown. Mean values were used for these line plots; they should show whether the share of the sliders increases or decreases over the course of the questionnaire.
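The quantity plotted in these figures can be computed as sketched below (assumed variable names and invented numbers): for each question, a control's mean duration is divided by the sum of all controls' mean durations for that question.

## Sketch: share of each control in the total mean duration per question
times <- data.frame(
  question = rep(c(6, 9, 12), each = 6),
  control  = rep(c("radio", "button", "click-VAS",
                   "slider-VAS", "text", "dropdown"), times = 3),
  duration = c(5.1, 4.9, 4.6, 8.3, 6.9, 6.2,
               4.2, 4.0, 3.9, 6.0, 5.5, 5.1,
               6.8, 6.5, 6.1, 9.4, 8.7, 8.0)
)

means <- aggregate(duration ~ question + control, data = times, FUN = mean)
means$share <- ave(means$duration, means$question,
                   FUN = function(x) x / sum(x))        # share of each control per question
round(xtabs(share ~ control + question, data = means), 2)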
For the custom input controls, especially for slider-VAS, van Schaik & Ling (2007, p.7) see the difficulty in (learning to) use VAS, caused by the lack of indication of intermediate points (only end points are displayed), as a disadvantage of VAS in general. This is one possible explanation; the second is that the two VAS controls are simply not very commonly used on Web pages and therefore respondents need some orientation time (to figure out how the controls have to be handled). Another explanation for the longer initial response times when using slider-VAS could be initial loading times (as e.g. reported by Couper et al. (2006)), but when testing the survey with this control, only minimal delays were observed, because the Applet containing the control was relatively small (it was simply programmed to select a value from a slider and did not contain any additional program logic).
Figure 17.4: Percentage of time needed for each control per question - mean values - tourism survey
Figure 17.5: Percentage of time needed for each control per question - mean values - webpage survey
17.4 Summary and Conclusion
Outlier detection using robust regression generated better results than the other approaches and served as a good basis for further analysis. When comparing the response times of the input controls, it was observed that text and dropdown take more time than slider-VAS and much more time than button, radio and click-VAS. This confirms the hypothesis that response time correlates with the number of steps necessary to answer the question. Because the response times for the two VAS differ, it can be inferred that VAS in itself does not influence the response times and that the number of steps for answering is more important. Furthermore, the hypothesis can be confirmed that response times for custom controls are initially (for the first questions) longer than for questions at the end, when comparing them to the standard input controls. This learning effect is most obvious for the slider-VAS.
It would not make sense to give a general recommendation for input controls with faster response times, because the other control options also have advantages. It possibly depends on the type of question: if difficult questions are asked and there is time to think about the answers, the time needed for filling out is not so important; if easy questions are asked, the chance to give quick answers and move on to the next question is possibly more important.
18 Dropout
Dropout in general occurs more often in Web surveys than in other survey modes (e.g. Lynn (2008, p.41)). Therefore it is important to establish the possible factors which motivate respondents to quit the questionnaire. It should be evaluated whether the use of certain input controls can diminish dropout. Consequently, the assignment of certain input controls and the corresponding dropout rates will be analysed. Ganassali (2008, p.25) gives a definition of the dropout rate: "The dropout rate represents the frequency of the respondents who started the survey and finally did not end it. An exit after viewing only the first screen of the questionnaire is considered as dropout as well." (The methodological theory section contains a section on dropout in general (see section 9.1.2.2), which provides the theoretical basis for the subsequent analysis.)
All questions in all three surveys were mandatory, so if a question was not filled out satisfactorily, a soft prompt was displayed, which may have annoyed respondents and led to dropout. Unfortunately, it was not tracked whether respondents received such a prompt before quitting the questionnaire (since every question was mandatory, a message was displayed whenever part of a question was not filled out). These messages possibly increased respondent frustration and thus led to survey termination.
A typical effect in online questionnaires is that dropout is more likely at the beginning of the questionnaire than at the end (this was also observed by Weichbold (2005, p.221), Galesic (2006, p.317) and Hamilton (2004)), which was also the case for all three surveys under experiment.
When taking a look at all three Kaplan-Meier curves (figures 18.1, 18.2 and 18.3), very high dropout occurred at the first question when a slider-VAS was used. This is probably due to either technical problems or the high level of burden linked with using slider-VAS. In Walston et al. (2006), a significantly higher dropout for sliders was reported, and Java technology was also used there, which would support the theory that technical problems are the reason for the higher dropout rates. Reips (2002c, p.248) shares this opinion: "Sophisticated technologies and software used in Internet research [...] may contribute to dropout. Potential participants unable or unwilling to run such software will be excluded from the start, and others will be dropped, if resident software on their computer interacts negatively with these technologies." Funke & Reips (2008b) and Funke & Reips (2008a) did not report any technical problems during their experiments (as were faced in the experiments described here with Java), which strengthens the assumption that the difficulties were based on technical problems and not on the use of VAS itself; for their experiments, employing Javascript was sufficient.
For all further analysis, lurkers were excluded and only those who showed initial cooperation were included (i.e. those who filled out at least one question). Unfortunately, due to conceptual mistakes, it cannot be determined whether dropout occurred at the text blocks placed between the questions or at the succeeding question.
To statistically evaluate this question, survival analysis was used (for the usage of survival analysis in R (http://cran.r-project.org), see Everit & Hothorn (2007, p.143pp)). In the following, Kaplan-Meier survival curves are shown to compare the dropout rates between the input controls, together with tables in which the dropout at the questions under control is quantified (the columns are the positions of the questions where dropout occurred).
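A minimal sketch of this analysis with the R survival package is given below; the data frame, its column names and the toy values are assumptions, with the position of the last answered question as the survival time and dropout as the event indicator.

## Sketch: Kaplan-Meier curves per control and a Cox model for dropout
library(survival)

resp <- data.frame(
  control = factor(rep(c("radio", "button", "click-VAS",
                         "slider-VAS", "text", "dropdown"), each = 20)),
  last_q  = sample(1:30, 120, replace = TRUE),   # position of the last answered question
  dropped = rbinom(120, 1, 0.4)                  # 1 = dropped out, 0 = completed (censored)
)

km <- survfit(Surv(last_q, dropped) ~ control, data = resp)
plot(km, col = 1:6, xlab = "question position", ylab = "proportion still in the survey")

cox <- coxph(Surv(last_q, dropped) ~ control, data = resp)
summary(cox)                                     # hazard ratios relative to the baseline control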
18.1 Tourism
In the tourism survey (see figure 18.1 and table 18.1), the technical problems concerning the slider-VAS appear dramatic; only about 55% of the participants survived the first question under experiment. Concerning the click-VAS, only 3 dropouts were observed (these occurred within the first two questions), which is the lowest dropout rate of all controls. It is closely followed by the button and radio controls (which behaved almost identically within this survey, with a survival rate of about 92%). Higher dropout can be observed when dropdown is used. The text control has the highest dropout rate, which narrowly fails to reach statistical significance in the Cox regression when compared to the button control. As expected, dropouts of about 10% at the second question confirm previous findings that most dropout can be observed at the beginning of the questionnaire.
Figure 18.1: Survival times for the tourism survey comparing input controls
control/pos 6 9 12 13 15 16 17 36 42 43
button 2.58 1.03 1.03 0.00 1.55 0.52 0.00 0.00 0.00 0.00
dropdown 1.62 1.62 1.08 0.54 2.70 1.62 0.54 1.08 0.00 0.54
slider-VAS 29.06 0.85 0.00 2.14 2.14 0.43 0.43 0.00 0.00 0.00
click-VAS 1.25 0.00 0.00 0.00 0.62 0.00 0.00 0.00 0.00 0.00
radio 0.56 0.56 1.13 1.13 2.82 0.00 0.00 0.00 0.00 0.00
text 0.00 3.66 1.22 1.22 3.05 1.22 1.22 0.00 0.61 0.00
Table 18.1: Dropout questions - tourism survey (in percent for each control)
18.2 Webpage
The webpage survey presented a similar situation (see figure 18.2 and table 18.2). Button, radio and click-VAS were nearly identical. Highly significant differences from the button control can be observed for the text (p<0.01) and slider-VAS controls. There was also a significant difference between the button and dropdown controls (p<0.05).
The largest dropout rate for the middle section of the questionnaire was observed when respondents reached question 12. Specific to this question was that each of the eight sub questions was linked to a screenshot. One reason for the high dropout may be the time-intensive process of clicking on each link to reach the screenshot and then having to return to the questionnaire to rate the image. Another reason may be that respondents were confused because multiple browser pages were open. Interestingly, the highest dropout rate occurred when the text input control was used. It is possible that the likelihood of dropout increases when entering answers is made more complex through the use of an additional input device (namely the keyboard).
Figure 18.2: Survival times for the webpage survey comparing input controls
control/pos 2 3 4 5 6 8 10 12 14 16 24 30
button 4.07 4.07 0.81 1.63 1.63 0.81 0.00 7.32 0.81 1.63 0.00 0.00
dropdown 4.87 5.48 1.42 3.85 1.22 1.42 0.81 9.33 1.01 1.42 0.00 0.61
slider-VAS 19.71 3.65 2.19 2.19 0.00 0.73 0.73 6.57 0.00 3.65 0.73 0.73
click-VAS 3.76 4.51 0.75 0.00 1.50 1.50 2.26 4.51 2.26 1.50 0.00 0.00
radio 3.15 5.51 0.00 2.36 0.79 0.00 0.79 6.30 0.00 2.36 0.00 0.00
text 6.87 3.55 1.77 3.55 3.10 1.11 0.44 16.85 1.11 1.77 0.44 0.22
Table 18.2: Dropout questions - webpage survey
18.3 Snowboard
Similar behavior can be observed for the snowboard survey (see figure 18.3 and table 18.3; columns missing from the table contain only zeros). Again, click-VAS shows the best results.
Figure 18.3: Survival times for the snowboard survey comparing input controls
control/pos 10 11 14 21 25 27
button 5.00 3.33 0.00 0.00 0.00 0.00
dropdown 5.88 5.88 0.00 0.00 0.00 1.96
slider-VAS 36.21 5.17 0.00 0.00 0.00 0.00
click-VAS 5.66 3.77 0.00 0.00 0.00 0.00
radio 4.26 4.26 2.13 0.00 2.13 2.13
text 11.54 7.69 0.00 3.85 0.00 0.00
Table 18.3: Dropout questions - snowboard survey
In table 18.4, an overview of the outcome of a pairwise comparison between input controls for all three surveys is given. The signs in a cell stand for the three surveys (tourism, webpage, snowboard). A "+" sign means that the input control listed in the row led to a significantly (p<0.05) higher dropout (or a higher hazard rate) in the particular survey compared to the input control given in the column, a "-" sign indicates the opposite, and a dot shows that no significant difference between the input controls could be found. To create this table, a Cox proportional hazards model with all three experiments was used, and the non-experimental variables were censored. The results for the slider-VAS are relatively clear in this table, but it is again important to mention that these effects possibly result from technical difficulties with the (Java-based) input control.
button dropdown slider-VAS click-VAS radio text
button .|-|. -|-|- +|.|. .|.|. .|-|-
dropdown .|+|. -|-|- +|+|. .|+|. .|-|.
slider-VAS +|+|+ +|+|+ +|+|+ +|+|+ +|.|.
click-VAS -|.|. -|-|. -|-|- .|.|. -|-|-
radio .|.|. .|-|. -|-|- .|.|. -|-|.
text .|+|+ .|-|. -|.|. +|+|+ +|+|.
Table 18.4: Overview of dropout for all three surveys - paired comparisons
Slider-VAS, dropdown and text led to higher dropout than the other controls. This stands in contrast to button, click-VAS and radio, which form the group of controls that led to lower dropout.
Finally, an overall overview of dropout rates[6] for all three surveys is given in table 18.5. The rates counting only dropouts that occurred at controlled questions are given in brackets. Even though a large portion of dropouts occurs at controlled questions, there is also a (varying) portion of dropouts within uncontrolled questions, which could nevertheless be affected by the input type of the preceding questions. The table indicates that the lowest dropout rates can be observed for the controls button, click-VAS and radio within all three surveys.
[6] How many of those who started the survey did not complete it
tourism webpage snowboard
button 27.87% (6.71%) 26.84% (22.78%) 30.00% (8.33%)
dropdown 29.17% (11.34%) 41.38% (31.44%) 39.20% (13.72%)
slider-VAS 55.15% (35.05%) 44.53% (40.88%) 63.80% (41.38%)
click-VAS 19.98% (1.87%) 31.56% (22.55%) 26.41% (9.43%)
radio 23.70% (6.20%) 28.35% (21.26%) 31.94% (14.91%)
text 34.77% (12.20%) 51.86% (40.78%) 61.56% (23.08%)
Table 18.5: Overview of dropout for all three surveys - dropout rates
18.4 Summary and Conclusion
The most obvious point which can be seen in the Kaplan-Meier curves is the high dropout for slider-VAS. One important research question was to check the usability of Java Applets within questionnaires. Because slider-VAS is based on this technology, it can be implied that the use of advanced technologies like Applets is at least critical, because it seems that some respondents did not have Java installed properly. Avoiding these technologies is therefore recommended, unless it is possible to ensure the installation and configuration of the respondents' PCs, which would e.g. be the case for employee attitude surveys.
When looking at tables 17.5 and 18.4, similarities apart from slider-VAS can be found, which means that there is a slight correlation between the time needed to complete the filling-out process with a certain control (and the number of steps necessary for filling out) and the dropout rate. This shows that a higher burden for the respondents directly results in higher dropout rates, which confirms one of the hypotheses of this work. The controls with the smallest dropout were click-VAS, button and radio.
19 Response Distributions
A central question is whether there are differences in the response distributions themselves across input controls. For all these analyses, the evaluation questions at the end of each survey[1] were not included. In the following, comparisons of the means across input controls are carried out, and the distributions of the answer categories themselves are examined.
19.1 Comparison by Mean Values
As a first step, a comparison of the mean values across input types was prepared for all three surveys. To enable comparison, panel normalization[2] was applied to all input control values, which resulted in values between 0 and 1. Each question was treated separately. Table 19.1 contains the mean value of differences from the overall mean for each question. When taking a look at these values, no consistent trend can be found. In the tourism survey (and similarly for the webpage survey), dropdown tends to the left pole. Slider-VAS has a tendency to the right pole in the webpage survey. The results of a MANOVA for the tourism and webpage survey confirm these findings and bring clearer results: dropdown had significant differences (p<0.05) to all other scale types[3], with a tendency to the left pole (highly significant differences (p<0.01) for click-VAS and button in both surveys). Text showed a similar tendency compared to button and radio (tourism) and compared to click-VAS and slider-VAS (webpage).
tourism webpage snowboard
button 0.0175 0.0044 0.0045
dropdown -0.0180 -0.0156 -0.0035
slider-VAS 0.0030 0.0246 0.0054
click-VAS -0.0055 0.0133 -0.0158
radio 0.0106 -0.0130 0.0282
text -0.0076 -0.0136 -0.0188
Table 19.1: Deviations from mean per sub-question (unit: normalized (0-1) scale point)
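To make the panel normalization from footnote [2] concrete (assuming the radio control offers scale points 1 to 10 and the slider-VAS 1 to 200; these ranges are taken from the descriptions of the controls, not from this table):

$$V_{\text{stand}} = \frac{7-1}{10-1} \approx 0.67 \quad \text{for a radio answer of 7}, \qquad V_{\text{stand}} = \frac{140-1}{200-1} \approx 0.70 \quad \text{for a slider-VAS answer of 140.}$$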
19.2 Compare the Distributions within the Categories
To get a deeper understanding of how response behavior differs in the individual answer categories between controls, further analysis was conducted. As an initial data analysis step, paired differences between the input controls for each answer category were analyzed by creating contingency tables for each case, which means that for each column there were two input controls and for each row items were selected or not selected.
[1] See chapter 16
[2] $V_{\text{stand}} = \frac{V_i - V_{\min}}{V_{\max} - V_{\min}}$
[3] Except radio, where a strong difference (p<0.001) was only observed in the webpage survey
All crosstabs containing values smaller than 5 were eliminated. For each of these tables, a χ² value was calculated, and those with highly significant differences were taken and summarized. See table 19.2 for an example of one concrete subquestion in which two input controls were compared for one category. This crosstab would result in a p-value of 0.0052 using Fisher's Exact Test. This would add a point to the score, because the difference is highly significant.
To express the direction of the difference, the odds ratio was taken as measure[4]. An odds ratio greater than one means that there is a tendency for the control shown above to be selected more often, compared to the control given below. In the example in table 19.2, the resulting odds ratio of 3.09 would mean that there is a tendency for dropdown to be selected more often for this category. The results are summarized in figures 19.1 and 19.2[5][6].
style selected not selected
dropdown 25 101
slider-VAS 9 113
Table 19.2: Example of a 2x2 table for questionnaire tourism, question 9, sub question 11, com-
parison of dropdown and slider-VAS, category 5
[4] The mean value of odds ratios for a certain contingency table with highly significant differences
[5] The snowboard-survey had too few observations to get meaningful results, so results will not be shown.
[6] As a remark, slider-VAS and click-VAS were abbreviated to slider and click to save space within the figure.
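As a quick arithmetic check of this direction measure for table 19.2, the sample odds ratio is

$$\widehat{OR} = \frac{25 \times 113}{101 \times 9} \approx 3.11,$$

which is close to the reported value of 3.09 (the small difference presumably stems from the conditional estimate returned by Fisher's exact test rather than the raw cross-product ratio).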
Figure 19.1: Significant differences between input controls for each scale item - tourism survey
Figure 19.2: Significant differences between input controls for each scale item - webpage survey
Figures 19.1 and 19.2 show the number of highly significant differences in tables of the kind described in table 19.2. The odds ratio is given in brackets for each category. The results of these two surveys are not equal in every respect, but go in the same direction. The most obvious differences could be found when comparing text and dropdown with the other controls, particularly at the midpoint (one has to remember that text and dropdown are the only two input controls where numerical feedback is given). When there is no real midpoint available, the question that arises is: which category is more likely to be selected, "5" or "6"? On examining the tourism survey, it became clear that for these two controls, 5 is more often selected as midpoint if numerical feedback is given. Similarly, in the webpage survey the results showed that 6 was selected more often as a scale item when no numerical feedback was given.
19.3 Midpoint Selection
To examine the preference for scale item 5 over 6 for the input controls with numeric feedback, the ratio of use of 5 over 6 was calculated and compared across the input controls. See table 19.3 for the median values of the 5-over-6 ratios for each of the three surveys. In the table it is easy to see that for the dropdown and text input controls there was a higher use of scale item 5 over 6. Additionally, for the webpage survey, there was a general preference for scale item 5. In relation to other input controls, the slider-VAS had a higher use of scale item 6, because initial movement was necessary and respondents possibly moved the slider (mostly in writing direction) out of the (linearly reduced) region of scale item 5 into region 6. A similar finding can be found in Couper et al. (2006, p.239), who compared radio buttons with text input and found that respondents tended to choose 10 over 11 when using the text input field on a scale ranging from 1 to 20. One possible conclusion would be that the use of a real midpoint makes sense when working with scales.
tourism webpage snowboard
button 0.69 1.67 1.00
dropdown 1.50 2.47 2.00
slider-VAS 0.64 1.24 1.00
click-VAS 1.00 1.20 1.38
radio 0.91 1.57 1.00
text 1.25 2.67 2.00
Table 19.3: Median values of ratio of use of 5 over 6 for all three surveys, compared by input
controls
19.4 Analysis per Question Battery
In this section, the answer distributions of each battery were examined. The approach was as follows: for each individual, certain parameters like span (max - min), min, max, standard deviation and mean are calculated for each item battery and compared with each other by input control, using analysis of variance with input control and battery (variable name) as predictors. For paired comparisons between input controls, the Tukey HSD test was used.
In lists of scale questions, such as the question batteries under experiment, respondents sometimes tend to choose more or less the same points and only pick a very narrow range of responses from all the possibilities. This behavioural pattern is called use of subscales or non-differentiation. It shows a lack of interest and a weak level of effort in answering. Variety in the answers should increase the degree of involvement in the survey (Ganassali (2008)). It should be examined whether any differences between the different input controls concerning these definitions can be found. Concerning the range of answers (span) each individual had within one question battery, in the tourism and the webpage survey the radio input control had the smallest range. This was significant (p<0.05) compared to the slider-VAS (which had the highest range in the linearly reduced version) for the tourism survey, and compared to the click-VAS (which had the highest value for this survey) and the dropdown input control for the webpage survey. Additionally, in this survey, a significant difference between text input and click-VAS was found.
Looking at the minimum values selected within one battery, the radio input control had the highest values in the tourism survey, with significant differences when comparing to click-VAS, dropdown and text. In contrast, the results for the webpage survey were very different: radio had the lowest values together with the text input control. Here slider-VAS had the highest values, with significant differences to text, radio and dropdown. The unequal distribution of input controls assigned to respondents in the webpage survey heavily biased the results concerning minimum, maximum and span in general. This may be because controls which were underrepresented were e.g. more likely to have higher minimum values than other controls.
19.5 Summary and Conclusion
A simple comparison of the mean values showed no clear trend across input controls, only that dropdown (and, more moderately, text) had a tendency to the (lower) left pole. This is possibly because of the need to move the mouse down to reach the (higher) right pole. Similar results were found when analysing input controls per battery. It would make sense to repeat these experiments (possibly with a different population).
On examining the single categories, it was found that input controls with numerical feedback (such as dropdown and text) had category 5 selected more often. None of the input controls had a real midpoint. However, when 10 scale points were used, 5 was often interpreted as the midpoint if numbers were shown on the scale. This becomes even more obvious when the ratios of categories 5 and 6 are calculated, as shown in table 19.3. To avoid this effect, a real midpoint should be used.
20 A Closer Look at the VAS Distributions
20.1 Distributions of the VAS
In this chapter, possible strategies are presented for transforming the 200 scale items of the slider-VAS and the 20 categories of the click-VAS control to 10 categories, in order to enable comparability between those controls. Before categorization strategies are discussed, a closer look at the distributions of the two VAS controls (compared to all other controls) might be useful. In figure 20.1, bar plots showing the distribution of a summary of all variables of the tourism survey are presented for the slider-VAS, the click-VAS, and a summary of all controls with 10 scale points. The purpose of this figure should not be the direct comparison[1], but to show special properties of the two VAS. Aggregation is suitable in this case to show the effects described below more clearly.
[1] Which would not be advisable because of bias when taking the distributions of all variables together
Figure 20.1: Comparison of the categories slider-VAS, click-VAS and others - tourism survey
Here, it can be seen that the extreme values of the slider-VAS (1 and 200) were selected much more often than non-extreme values. It can be implied that this was done to make sure that only the most extreme category was selected, or that even more extreme categories would have been necessary. Another effect which can be observed for the slider-VAS is that the points around the midpoint were selected disproportionately seldom compared to values to the right of the midpoint. One possible reason for this is that all questions were mandatory and the slider was positioned in the middle. To distinguish between non-respondents and those who really wanted to select the midpoint, the slider had to be moved initially. Possibly those who wanted to select the midpoint just moved the slider a bit away from the midpoint and did not move it back. Interestingly, more respondents moved the slider initially to the right, possibly because this is the reading direction. van Schaik & Ling (2007) conducted a similar experiment, where the VAS slider was also initially positioned in the middle, but no such effects were reported.
A similar, but less intense effect concerning the higher selection of the extreme points can also be seen in the bar plots of the click-VAS. The second point which can be seen for this input control is the need for a middle category (which does not exist for the 10-scale-item controls). Categories 10 and 11 were disproportionately higher than their direct neighbours. All effects mentioned above could be reproduced for the other two surveys. To quantify these effects, see table 20.1, where the ratios of adjacent categories are compared. For all three surveys, the highest deviations from 1 can be found at the extreme points and at the midpoint.
1/2 3/4 5/6 7/8 9/10 11/12 13/14 15/16 17/18 19/20
tourism 1.43 1.05 1.18 1.03 0.46 1.46 0.97 0.87 1.05 0.60
webpage 1.61 0.92 1.16 0.85 0.73 1.76 0.85 1.25 1.01 0.62
snowboard 1.59 1.09 1.08 1.31 0.39 1.80 0.98 1.06 0.84 0.61
Table 20.1: Ratios of the adjacent categories for the click-VAS control
20.2 Categorization
As described in section 4.4, Funke & Reips (2006) found that linear categorization was not the most suitable strategy for transforming fine-grained (in this case the slider-VAS) to coarse-grained scales, but rather a categorization with reduced extremes[2]. These findings cannot be directly supported by the information gained from the experiments conducted here. A problem with finding meaningful categorizations came from the overrepresentation of the two extreme points 1 and 200.
The following transformations were applied to the results of the slider-VAS (a small code sketch of the binning follows the list):
linear: linear transformation (which means 10 categories of size 20)
reduced1: reduced extremes of size 16 at each end, four 20-point categories in the outer areas and four 22-point categories in the middle area.
reduced2: reduced extremes of size 12 at each end and 22 scale points for the rest.
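A minimal sketch of how this binning could be implemented (the category sizes are taken from the list above; the class and method names are illustrative, not part of the actual analysis code):

class SliderCategorizer {
    // category sizes of the three strategies; each array sums to 200
    static final int[] LINEAR   = {20, 20, 20, 20, 20, 20, 20, 20, 20, 20};
    static final int[] REDUCED1 = {16, 20, 20, 22, 22, 22, 22, 20, 20, 16};
    static final int[] REDUCED2 = {12, 22, 22, 22, 22, 22, 22, 22, 22, 12};

    // map a slider-VAS value in 1..200 to one of 10 categories
    static int categorize(int value, int[] sizes) {
        int upper = 0;
        for (int cat = 0; cat < sizes.length; cat++) {
            upper += sizes[cat];
            if (value <= upper) {
                return cat + 1;              // categories are numbered 1..10
            }
        }
        throw new IllegalArgumentException("value out of range: " + value);
    }
}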
To check which transformation suited best at the extreme points, the following test was made: for all variables of a questionnaire, the portions of the extreme values after applying the different transformation strategies were retrieved. These values were compared to a control which initially used a 10-point scale[3]. Thus, for all transformations, comparisons were made with the radio control for all variables (except the 3 evaluation questions at the end[4]). The comparison was simply performed with a χ² test, whereby only the lower and upper extreme points were taken into consideration[5].
[2] Which means that at both extremes, fewer scale points from the more fine-grained scale are added to the extremes of the coarse-grained scale
[3] Because the most established input control for such scales is the radio control, so this was taken for comparison
survey trans count sig (lower) hsig (lower) sig (upper) hsig (upper)
tourism linear 68 2 0 1 0
tourism reduced1 68 3 1 3 0
tourism reduced2 68 3 2 4 2
webpage linear 28 1 0 1 1
webpage reduced1 28 3 0 0 4
webpage reduced2 28 5 0 0 5
Table 20.2: Number of significant differences when using different categorization strategies of the slider-VAS control
After removing the evaluation questions, 68 comparisons for the tourism survey and 28 for the webpage survey were left for evaluation. Table 20.2 indicates that the more the extreme categories were reduced, the more significant differences between the categorized slider-VAS and the radio control resulted for both extremes. Sig here means a significance level of p<0.05 and hsig a significance level of p<0.01. The same is valid for the webpage survey. As a conclusion, the theory of reduced extremes as observed in the studies by Funke & Reips (2006) is not applicable for the webpage and tourism surveys. No significant differences at all regarding the extreme points were found for the snowboard survey.
Although linear categorization is the most suitable for the two surveys, there are still significant differences with regard to the radio input control. To find out which exactly are the most suitable sizes for both extreme categories, an attempt was made to calculate the best category sizes based on the distributions of all 10-item scale controls. This approach possibly had the drawback of being at risk of producing results which were only suitable for the respective surveys. Figure 20.2 should explain the approach of calculating the best extreme category sizes.
[4] For details, see chapter 16
[5] Which results in a 2x2 table with selected and not selected as columns.
Figure 20.2: Compare calculated categorization with linear categorization - tourism survey
Here, the black line is the distribution function of the slider-VAS control aggregated over all tourism questions. On the y-axis, the portions of the 10 scale categories are drawn. To achieve the same portions for the slider-VAS, the intersection with the distribution function is taken and projected vertically onto the x-axis. There, the cut-points can be read off and taken as ideal category boundaries. For comparison with the linear transformation (20 scale items each), green lines are drawn.
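A minimal sketch of this quantile-matching idea, assuming the empirical cumulative distribution of the slider-VAS and the cumulative category proportions of the 10-point controls are available as arrays (all names are illustrative):

class CutPointFinder {
    // sliderCdf[v-1] = share of slider-VAS answers <= v, for v = 1..200
    // targetCum[k]   = cumulative share of the 10-point controls up to category k+1 (k = 0..8)
    static int[] cutPoints(double[] sliderCdf, double[] targetCum) {
        int[] cuts = new int[targetCum.length];
        for (int k = 0; k < targetCum.length; k++) {
            int v = 0;
            while (v < sliderCdf.length - 1 && sliderCdf[v] < targetCum[k]) {
                v++;                         // walk up the CDF until it crosses the target share
            }
            cuts[k] = v + 1;                 // slider scale starts at 1
        }
        return cuts;
    }
}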
For this particular survey, it was observed that the lower extreme category was bigger than 20 and the upper extreme category was almost 20. In the webpage survey, both extreme categories had size 26 and were therefore in both cases bigger than if a linear transformation had been applied. Category 5 was bigger than category 6 in both surveys. In the snowboard survey, the lower extreme category had size 20, the upper one size 29[6]. Additionally, in all 3 surveys, the left middle categories (5) were bigger than the right ones (6), which fits the results reported in previous chapters.
[6] But again, this should be considered in regard to the small number of respondents
Figure 20.3: Compare linear categorization points of the slider-VAS with 10-scale controls - tourism survey
Figure 20.3 gives a similar view of the outcomes of the same questionnaire. Here again, the distribution function of the slider-VAS is presented in green. In red, the portions of the 10-scale-point controls are given. The top x-axis contains the differences between the 10-scale controls and the slider-VAS control. This figure should only give a different view of the same task visualized in figure 20.2.
Figure 20.4: Compare Boxplots for all cut-points - tourism survey
Figure 20.4 shows the distributions of the deviation from linear categorization for all cut-points of the tourism survey. A value greater than zero indicates that the cut-point for the slider-VAS was higher than if linear categorization had been used (which means multiples of 20), which suggests that the cut-point was nearer to the right extreme. It can be seen that there are many outliers located far away from the median. Because of this, it is hard to find ideal cut-points.
20.3 Summary and Conclusion
On examination of the slider-VAS distribution, it can be seen that categories 1 and 200 are overrepresented. One possible explanation for this phenomenon is that the number of scale items for this input control is too high, because no fine nuances are expressed when answering at the two extremes: respondents just wanted to express that they agreed or disagreed.
In the click-VAS distribution, categories 10 and 11 were higher than their direct neighbours, which shows a tendency towards the (non-existent) midpoint. These results suggest that offering a midpoint category is recommended. A similar, but not as strong, effect concerning the extreme points as described above for the slider-VAS can also be found for the click-VAS.
Several categorization strategies were evaluated for the slider-VAS, and it seems as if linear categorization is the most suitable one. The attempt to find better cut-points than those of linear categorization did not lead to satisfying results because of differences and contradictions between the questions in all three surveys.
Part IV
Software Implemented
21 Introduction
The software itself (the conceptual and technical work) is an essential part of this dissertation. For this reason, a short description of its functionality and technical background is given. The software is published as open source; the intention behind this is to create a community of developers who help to stabilize and improve this tool and apply it to their own needs in such a way that the community can also benefit from these additional implementations. These descriptions should also serve as a short introduction to how the software can be extended[1]. The experiments described in part III were run with an early version of the current software. For an example of the current version and additional information, visit http://www.survey4all.org.
21.1 Conventions
It was attempted to abide by a few general conventions in regard to the software:
Only open source tools were used for development, running, administering, documenting and all other tasks which have to do with the software itself.
The software was published as open source with all its components and subprojects.
The software development process should be driven by the end users, which means that the needs of the end users (those who are using this tool to run surveys) should be considered. This is a good test of the extensibility of the software architecture and an interesting experiment regarding which direction the development will take.
Nevertheless, QSYS will always focus on being an online survey tool; there is no intention to integrate e-learning capabilities or data analysis functionality. Of course it is desirable to create such functionality, but the focus should be kept on separating it into external projects, where QSYS serves as a core system offering basic functionality via Web services.
The software is already published as open source under the GNU General Public License (GPL)[2]; see St. Laurent (2004) for basic principles of open source software licensing in general and a detailed explanation of the GPL on pages 35-48. Visit http://www.survey4all.org or http://qsys.sourceforge.net for further information[3].
[1] E.g. add question types and access modes for the respondents or modify the styling of the Web page
[2] See: http://www.gnu.org/licenses/
[3] Holtgrewe & Brand (2007) provide a sociological approach to open source strongly related to the shift in labor in general
21.2 General Software Description
QSYS is a Web-based survey software specifically developed for running experiments dealing with visual design effects on respondents' behavior. In the current version, the software includes an easy-to-use online editor supporting the creation, execution and administration of online surveys. Users can develop an online questionnaire without any previous knowledge of HTML.
As online survey software constitutes a competitive market, one has to ask: why do we need another product in this field? In contrast to other commercial and non-commercial products, QSYS gives users and developers access to the source code and the permission to create derivative works from the original survey software. This option allows the adaptation and extension of the software to customers' needs and the installation of the complete package without any license fees. Additionally, there is no available software which provides good support for visual design experiments in Web surveys. QSYS attempts to bridge this gap. Of course it would have been much easier to use an existing package (preferably an open source package) and customize it according to the special needs of the experiments. But all the existing packages have one major drawback: the separation between content, style and design of the questionnaire is not strictly implemented. The consequence would have been to copy each questionnaire as often as different stylings were needed and assign the styles to the copies, which is not an optimal approach. The second motivation was simply to write a well-structured and extendable open source survey development tool, publish it as open source, and see what happens.
The development process focused on the design of a reliable and extendable software architecture. Therefore, new question types, extensions of existing question types, and the style and appearance of questions can be created and adapted quite easily without the necessity of breaking with existing software design concepts. Moreover, question visualization is strictly separated from the questionnaire's content, so each survey can be presented in an individual style with low demand on resources. QSYS supports all common question types together with a few innovative ones (like image map questions), which are customizable as well, and a diverse range of participation modes. Additionally, features like PDF and XML export are implemented. The elaborate software architecture allows rapid extension, customization and embedding into a proprietary infrastructure.
21.3 Overview of Features
This section provides an overview of all the QSYS features:
Several different question types are supported, with various configuration possibilities.
For all questions and alternative texts, a WYSIWYG editor is offered to the questionnaire designer to easily assign the desired styling, so changing font types and adding links, tables, images and all other elements supported by HTML is possible without any Web technology knowledge[4].
[4] See http://tinymce.moxiecode.com/ for further information.
Questions can be displayed in a paging (one page per question) or in a scrolling (multiple questions on one page) mode. When applying the scrolling mode, page separators support the categorization of questions into logical units.
Branching based on answers to closed-ended questions is possible.
Questions can be marked as mandatory to give feedback to the respondent when the question is not answered sufficiently.
PDF export creates an offline version of the questionnaire (e.g. for mixed-mode studies).
XML export serves as an exchange and archiving possibility (answers of respondents can also be exported as XML). When a survey creator is familiar with the XML schema used for questionnaires, survey creation or e.g. repeated modifications can be done very efficiently by directly editing the XML document. It is even possible to write a custom editor tailored to one's needs.
Paradata tracking is implemented, e.g. for exporting the time needed for filling out one question together with browser and operating system information. In addition, the export of the IP address for each participant can be enabled. Storage can be turned off or masked[5] to assure privacy, because this feature has to be used with caution: "Organizational researchers need to be sensitive to the confidentiality and security issues associated with the use of Internet survey tools, as participants may be unwilling to provide candid responses to a survey if their anonymity is not ensured" (Truell (2003, p.35)).
Language independence is ensured, as language tokens are stored externally. As a result, a simple translation of these tokens adapts the software to a further language.
An optional summary of all answers for one respondent can be offered at the end.
The questionnaire's completion can be limited to a certain period.
Advanced interviewee restriction modes (all can participate, common PIN code, user/password list, members of certain LDAP[6] groups, ...).
The entire software is published under an open source license allowing extension and customization to distinct needs.
A common Servlet engine is sufficient to install QSYS on a server. Not even a database is needed (but it is possible to use one).
A status (DEVELOPMENT, PRETEST, OPERATIONAL, DISABLED) can help the user to distinguish between different phases of the survey, stored together with the given answers.
Data can be exported as CSV (which can be imported into all statistical analysis tools including Excel), SPSS (currently sps-files are generated) and native XML.
All these tasks are described in more detail in section 21.5.
21.4 Supported Question Types
QSYS supports a wide range of question types (all allow several variations):
[5] To export only the parts necessary to identify e.g. an institution rather than an individual person, which would for example look something like this: 138.232.xxx.xxx
[6] Lightweight Directory Access Protocol
Closed-ended questions.
Dichotomous questions: similar to closed-ended questions; here, the respondent can select one of two alternatives.
Pictogram questions: similar to closed-ended questions, but with pictures instead of textual alternatives.
Closed-ended ranking questions: alternatives have to be brought into the right order.
Question matrix: multiple questions with the same selection of alternatives.
Question matrix with column grouping: integrates multiple matrices with the same sub-questions in one view.
Semantic differential or VAS: the Visual Analogue Scale (VAS) constitutes a measurement instrument for a characteristic or attitude believed to range across a continuum of values. The VAS is verbally anchored at each end, e.g. very good vs. very bad (multiple sub-questions are supported, and the anchor points can be set for each sub-question individually).
Interval question matrix: just like a semantic differential, except that for this type the labeling of the anchor points is the same for all sub-questions.
Interval questions: similar to the interval question matrix, but without sub-questions (the main question itself is rated).
Open-ended questions: for this type, the size of the input field can be varied; number and date fields are also supported.
Open-ended ranking questions: present multiple open-ended input fields for one question to the user.
Open-ended question matrix: presents an open-ended input field for each sub-question of the matrix to the user.
Image map questions: respondents can select a certain region of an image map (e.g. a geographical map).
Text blocks and page separators: these are workflow elements, not questions, for the purpose of separating parts of the questionnaire and adding HTML code between questions.
21.5 Editor
Generally, the CMS area, of which the editor is a part, is split up into groups. Each group has a group administrator with a password used for logging in. One group can contain multiple questionnaires, which are all listed and ready for editing on the group administrator's overview page.
The editor for creating and customizing the survey is easy to use and built on AJAX technology. This has the positive effect that pressing a submit button and waiting for the browser to reload the whole page becomes unnecessary. A simple click on e.g. a checkbox or button is sufficient to send edited data to the server for storage (and feedback is given whether the action was performed successfully or an error occurred). This drastically speeds up entering and editing questionnaire content.
Figure 21.1: A sample view on the questionnaire editor
In figure 21.1, the main editor window is shown: on the left, a navigation bar listing all questions together with an abbreviation of the question type is given. This has been done to give an overview of the whole questionnaire and to enable easy navigation. A full-view mode for the navigation bar is offered, which means that this bar occupies the whole screen, so full question texts are visible. This should ease and speed up exchanging, copying, and removing one or more questions. To add a new question, simply select the desired type and the position where the question should be added from the tab on the left. In addition, page separators can also be added this way.
The main section of the editor view is split up into different tabs. The tab for common question settings is the same for all question types and is used to enter the general question text as well as a description text[7]. Furthermore, a text to be displayed after the question can be specified. As additional settings for each question, some attributes can be set, like the mandatory status of the question and the general successor of the question, which is used for branching.
All other tabs depend on the question type which is currently being edited. There is a tab specifically for question settings, where question-type-specific editing can be done (e.g. for question batteries, column width and alignment can be set). All tabs to the right are also question-dependent. For example, when a question battery is edited, additional tabs like row and column come up, where sub-questions and alternatives for the battery can be added. The last tab gives a preview, which is also available for all question types. The currently edited question is displayed as it will be presented to the interviewee.
[7] Which should not be part of the question itself, but should give hints on how the question should be filled out. This text is usually written under the question text itself, in smaller letters.
The following basic settings can be applied to the whole questionnaire:
Short link: when the link to the questionnaire is published (in invitation letters, e-mails or news groups), it is beneficial to have a link with only a few short parameters (to avoid e.g. line-break problems with some e-mail clients).
Should the IP address be stored: when anonymity plays a major role for a survey, IP addresses can be made anonymous (which is the default setting) or masked.
Number of questions per page: it is possible to choose between one question per page, all questions on one page, and using separators (which can be added in the main editor view as described above).
Should a progress bar be offered to the respondent or not.
Should a summary of the entered data be offered to the respondent after filling out.
A logo can be uploaded which is displayed on each page of the questionnaire.
A date range can be specified within which the questionnaire is set as active. If the current date is outside this interval, an appropriate message is displayed when accessing the questionnaire and participation is impeded.
21.6 Technical Background
In this chapter, an overview of the technical background and technologies used is provided. In addition, the technical preconditions necessary to install and run the software are listed.
The application is based on Java Servlet technology. For the creation of the application, the Jakarta Struts framework[8] was applied as controller (in a slightly unconventional way). XML plays a major role on all layers of the software architecture. All questionnaires and additional information are stored as native XML documents. To render these questionnaires as HTML or PDF, XSLT and XSL-FO[9] are used, which results in a complete separation of content and view. The decision which style sheet is used for which content file (which means e.g. for which questionnaire) is administered in an external configuration file. It is even possible to assign a certain style sheet at random or based on certain conditions. In the case of the experiments, technical preconditions fetched from the browser settings[10] determined the style sheet to use. An Oracle database, eXist[11] or even the file system alone (which is currently the preferred option) can be used for data storage.
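To illustrate the rendering idea (this is not the actual QSYS code; the file names and the random style assignment are purely illustrative), such a transformation can be written with the standard Java XSLT API:

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class QuestionnaireRenderer {
    public static void main(String[] args) throws TransformerException {
        // pick one of several style sheets at random, e.g. for a design experiment
        String[] styles = {"radio.xsl", "dropdown.xsl", "click-vas.xsl"};
        String chosen = styles[new java.util.Random().nextInt(styles.length)];

        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(chosen));
        t.transform(new StreamSource("questionnaire.xml"),
                    new StreamResult("questionnaire.html"));
    }
}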
It should also be mentioned that during the development of the whole project, a standalone Struts-XSLT framework was created, which meets the demands of both Struts and XML/XSLT (with language independence and other useful features). Some features of the framework were specially created for the needs of the whole project, such as selecting an XSLT style sheet for one view by chance.
[8] http://struts.apache.org
[9] Extensible Stylesheet Language - Formatting Objects
[10] E.g. whether Javascript was enabled
[11] See: http://exist.sourceforge.net
21.6.1 Technical Preconditions
For installing and running the software, only a Servlet engine like e.g. Tomcat[12] (all tests have been run on this engine) or Jetty[13] is necessary. Data is stored (at least in the default configuration) directly within the file system, which means not even a database system has to be installed or connected to. Because of these few technical necessities, it is even possible to run the software on a local machine to prepare a questionnaire offline and upload it later to the productive system, or to use QSYS for small surveys (like evaluation sheets in the field of education) directly on a local machine or laptop (which then works as a small server).
[12] http://tomcat.apache.org
[13] http://www.mortbay.org/jetty
22 Software Architecture
In this chapter, a brief overview of the software architecture of the whole system and its components is given.
The software is split up into subprojects to ensure reusability of the particular components in other projects. Figure 22.1 shows a component model of the whole QSYS system. All parts are described in the following sections:
QSYS-core: the core functionality of the whole system, like creating, storing and managing questionnaires.
QSYS-web: the Web interface for the online version of QSYS with all XSLT style sheets and mapping information.
QSYS-tools: console-based tools for automated processing, implemented on top of QSYS-core.
struXSLT: a standalone XSLT extension for the Struts framework. Amongst other projects, QSYS-web is based on this framework.
q-utils: simple utility classes used by all components.
Figure 22.1: Component model of the whole QSYS-system
22.1 QSYS-core
This is the core component of the QSYS system. It was separated from the Web frontend to be open to a possible branch which would lead to a standalone (and not Web-based) application for creating questionnaires. Currently, only a Web version for creating the surveys is available.
One of the main tasks this component is responsible for is the storage of questionnaires. A questionnaire consists of a list of questionnaire items. Each questionnaire item has two representations, one as an object and the other one as an XML fragment. Mapping between these two manifestations is carried out within each class, which means each question type has its own XML (de-)serialization methods. There are tools which could do this in an automated way (like e.g. JAXB[1]), but most of them have drawbacks which in some cases influence the software architecture (e.g. concerning the visibility of member variables). Due to the strong similarity of the questions, inheritance is heavily used. For one simple question type, a UML[2] class diagram is shown in figure 22.2 to demonstrate the hierarchy and organization of the questionnaire item classes:
22.1.1 Questionnaire Items
Subsequently, a short overview of the class hierarchy of questionnaire items[3] is given. For a graphical representation see figure 22.2, which provides examples of the concrete question classes: an interval question (giving a rating on a scale between two anchor points) and a simple open-ended question. For the concrete questions, the isDataComplete method checks whether the question was completely filled out by the respondent (this method is called by canDataBeStored). All other question types are organized in the same manner. For a complete list of supported question types, see section 21.4. For all questions that hold sub-questions, an interface exists which manages all the tasks necessary for holding multiple questions.
In both the textual description and the UML class diagram, only the core concepts are illustrated; for a more detailed view refer to the source code, which can be browsed and downloaded at http://www.survey4all.org.
[1] http://jaxb.dev.java.net
[2] Unified Modeling Language
[3] A questionnaire item is the base class for questions and workflow elements
Figure 22.2 lists the following classes of package org.qsys.quest.model.question with their members:
abstract QQuestionnaireItem (fields: id : int, dispId : int; methods: toXML(Document doc) : Element, getItemType() : String, findSuccessorQuestion(QAnswer answer) : int, clone() : Object, equals() : boolean)
QTextBlock (fields: xHtmlText : String; methods: clone(), equals(), toXML(Document doc))
abstract QBaseQuestion (fields: qText : String, afterText : String, succ : int; methods: toXML(Document doc), findSuccessorQuestion(QAnswer answer), clone(), equals())
QQuestion (fields: obligatory : boolean, explanationText : String, style : String; methods: getProgressValue() : int, getVariables() : Vector<QVariable>, findSuccessorQuestion(QAnswer answer), clone(), equals(), toXML(Document doc), canDataBeStored(QAnswer answer) : boolean)
QOpenendedQuestion (fields: noLines : int; methods: clone(), equals(), toXML(Document doc), isDataComplete(QAnswer answer) : boolean)
QPageSeparator (methods: clone())
QSubQuestion (methods: clone(), equals(), toXML(Document doc))
QIntervalQuestion (fields: min : int, max : int, step : int, min_label : String, max_label : String; methods: clone(), equals(), toXML(Document doc), isDataComplete(QAnswer answer) : boolean)
Figure 22.2: Question classes diagram showcase
A QQuestionnaireItem class has two member variables: the id, which is unique for the whole questionnaire and used for referencing the questions, and dispId, which is the id to be displayed on the questionnaire. The item type of each item is set via annotations[4]. It is possible to get a list of all concrete questions (or questionnaire items respectively) with their class and type. This mechanism is also used to generate an instance of a class knowing only the type (such a type could be e.g. questionMatrix; the corresponding annotation would be @QuestionA(name = "questionMatrix", order = 5), where order is the position of the question when presenting all questions within a list. Another example is the annotation for the page separator: @QuestionA(name = "pageSeparator", order=2, isQuestion=false)). Class QFactory is responsible for retrieving concrete questionnaire item instances by question type. To achieve this, the reflection technique is heavily used: suitable constructors are invoked dynamically.
[4] "Annotations provide data about a program that is not part of the program itself. They have no direct effect on the operation of the code they annotate." (taken from the documentation pages of http://java.sun.com)
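A minimal sketch of the factory idea just described (simplified and not the actual QSYS source; the annotation definition and the way candidate classes are passed in are assumptions made for illustration):

import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

@Retention(RetentionPolicy.RUNTIME)
@interface QuestionA {
    String name();
    int order();
    boolean isQuestion() default true;
}

class QFactorySketch {
    // returns a new questionnaire item instance whose annotation name matches the given type
    static Object createByType(String type, Class<?>... candidates) throws Exception {
        for (Class<?> c : candidates) {
            QuestionA a = c.getAnnotation(QuestionA.class);
            if (a != null && a.name().equals(type)) {
                return c.getDeclaredConstructor().newInstance();   // reflective instantiation
            }
        }
        throw new IllegalArgumentException("Unknown question type: " + type);
    }
}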
For all question types, methods of class Object (like clone[5] and equals[6]) are overridden. Each question has a default successor question, which is used for branching and will be described in a later section. All of these classes implement the Cloneable interface, and all subclasses implement a toXML method used for serialization to XML, as well as a constructor taking an XML element, used for generating a concrete questionnaire item based on the content of this element. In addition, each questionnaire item knows best how to find the successor question according to the answer given[7].
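A minimal sketch of the successor lookup used for branching (illustrative only; the real classes work on the QAnswer hierarchy described in section 22.1.2):

import java.util.HashMap;
import java.util.Map;

class BranchingSketch {
    private int defaultSuccessor;                               // id of the default successor question
    private Map<Integer, Integer> branchMap = new HashMap<>();  // selected alternative -> successor id

    int findSuccessorQuestion(Integer selectedAlternative) {
        if (selectedAlternative != null && branchMap.containsKey(selectedAlternative)) {
            return branchMap.get(selectedAlternative);          // branch depending on the answer
        }
        return defaultSuccessor;                                // otherwise use the default successor
    }
}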
At any position of the questionnaire, text blocks can be integrated which make use of all capabilities of HTML. These blocks are also created with a WYSIWYG editor within the Web presentation layer. A page separator has the same functionality, but if manual page separation is enabled, these elements act as page separators and are displayed as the first entry of a section containing multiple questions. These items have to be strictly separated from questions, because they have no corresponding answers, which affects several central parts of the software.
All question types inherit from QBaseQuestion. Each question must contain a question text (qText), a text after the question (afterText) and a default successor question (succ). Again, all texts stored can contain HTML. When a question consists of multiple sub-questions, these classes contain a list of QSubQuestion.
All standalone questions (which means all but sub-questions) inherit from QQuestion. Therein, the information whether a question is obligatory is stored; an additional explanation text can be set, which is usually displayed under the question text itself and should guide the respondent through the filling-out process; and a style attribute can be set per question. This gives the presentation layer a directive on how the question should be displayed. An example of different styles would be the display of radio buttons instead of a text input field for an interval question. The method getProgressValue calculates a kind of expected duration weight for one question. These values depend e.g. on the number of sub-questions and the question type itself, but these strategies can be extended according to one's own needs. The method canDataBeStored determines whether the answer given by the respondent is sufficient when the question is marked as mandatory. The method getVariables returns the list of variables which are generated for one question, which is primarily used for data exporting.
The whole questionnaire is held as an instance of QQuestionnaireObj, whereby a list of QQuestionnaireItem objects is stored together with a QHeader instance, which holds administrative data regarding the questionnaire itself (e.g. begin and end date, status, creator, creation date). All modifications on such a questionnaire object are done by QuestionManager. This layer is necessary to retrieve and store the modifications carried out on the questionnaire object on the preferred medium[8].
[5] To create deep copies of question objects
[6] For comparing questions, mainly used within the test classes
[7] For a description of the Answer class hierarchy see section 22.1.2
22.1.2 Answers
One strategic consideration was to strictly separate the storage of the questionnaire and of the answers given, which has the consequence of a class hierarchy for answer classes parallel to the questionnaire item hierarchy. To weave question and answer objects together (which means to determine which answer class is responsible for storing data for which question type), annotations are once again used. For example, the annotation @QuestionsA(names={"questionMatrix", "questionMatrixMult"}) when placed on QClosedMatrixAnswer means that this answer type is responsible for processing answers for the question types questionMatrix and questionMatrixMult. There are fewer answer classes than question classes, because some questions generate the same answer data, and so some answer classes can be used more than once.
In figure 22.3, as for the questionnaire items, a UML class diagram together with a short description of the classes and their methods is shown. Again, only a few concrete answer classes are added to the diagram just to show the basic class structure. All classes inherit directly from QAnswer and simply contain the answered values, which are fetched from QAnswerDictionary. For example, in the case of the open-ended answer class this is a simple string, in the case of the closed answer a list of possibly selected alternatives and reason values[9], and in the case of the interval questions a position on the scale.
Figure 22.3 lists the following classes of package org.qsys.quest.model.answer with their members:
abstract QAnswer (fields: id : int, dateEntered : Date, duration : int; methods: toXML(Document doc) : Element, addVarValues(Vector<QVariable>, Vector<String>) : void)
QOpenendedAnswer (fields: textAnswer : String; methods: toXML(Document doc), addVarValues(Vector<QVariable>, Vector<String>))
QClosedAnswer (fields: alternatives : TreeMap<Integer, String>, reasonVals : TreeMap<Integer, String>; methods: toXML(Document doc), addVarValues(Vector<QVariable>, Vector<String>))
QIntervalAnswer (fields: intervalVal : int; methods: toXML(Document doc), addVarValues(Vector<QVariable>, Vector<String>))
Figure 22.3: Answer class diagram showcase
As for the questionnaire items, the toXML method exists, together with a constructor taking an XML element, used for XML serialization. Additionally, a constructor exists for each concrete QAnswer object with QAnswerDictionary as the parameter for stuffing the answer objects with content. This class inherits from Hashtable<String, String> and is used to transfer the answers from the frontend to the model in a generic way. Each answer class then knows best how to interpret the content of this Hashtable and reads the appropriate information. In addition, a method addVarValues(Vector<QVariable> var, Vector<String> vals) exists, through which concrete variable values are generated and used for exporting. All values are appended to the variable vals (call by reference) according to the variable settings given in var. The list of variables is the only communication point which shows the concrete answer how and which variable values should be exported. Additionally, for each answer, the duration is stored in milliseconds. In the current Web implementation, this duration is tracked on the client side, which means measurement starts when the page is completely loaded and ends when the submit button is pressed.
[8] For further information see section 22.1.4
[9] Which means additional texts which can be entered next to a selected alternative
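A minimal sketch of this pattern (simplified and not the actual QSYS source; the dictionary key naming is an assumption made for illustration):

import java.util.Hashtable;
import java.util.Vector;

class IntervalAnswerSketch {
    private final int intervalVal;

    // the value is read from the generic answer dictionary filled by the frontend
    IntervalAnswerSketch(Hashtable<String, String> dict, int questionId) {
        this.intervalVal = Integer.parseInt(dict.get("q" + questionId));   // hypothetical key scheme
    }

    // appends the exported value in the order given by the variable list
    void addVarValues(Vector<String> vals) {
        vals.add(String.valueOf(intervalVal));
    }
}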
Class QAnswersObj holds all answers given for a questionnaire together with a header, which holds some metadata about the filling-out process, such as the time when filling out started, a unique interviewee id, a flag whether the respondent finished filling out or not, which style was assigned[10], and paradata[11] collected, amongst others, directly within the client's browser.
22.1.3 Paradata Tracking
Online surveys give the additional opportunity to track information about the filling-out process and the instrument used by the respondent. Subsequently, a list of paradata tracked by the software, and what can possibly be implied from this information, is provided:
Session identifier: identifies answers filled out immediately one after the other on the same computer.
Screen resolution (width and height)
User agent (the Web browser used)
Operating system
Cookies enabled
Java enabled
Javascript enabled: this and the previous paradata within the list can be used to identify possible side effects or technical problems, e.g. when certain browsers have special configurations set (e.g. Javascript disabled or a low screen resolution).
First referrer: in the case of recruitment from different Web pages, it can be determined from which Web sites the respondents came.
Remote address (the IP address of the respondent). Storing the IP address unmasked is turned off by default because of possible problems concerning the anonymity of the respondent. If those who run the survey are aware of these problems and turn IP tracking on, additional possibilities come up: in some cases it is possible to determine which questionnaires were filled out on the same machine. This statement has to be qualified, because in some cases (in general within big institutions) different computers appear with the same IP address on the internet, which is in most cases the IP address of the proxy server of the institution. But in this case at least the institution can be found. In addition, some providers use dynamic IP addresses, which means that when the respondent's PC is connected to the internet sequentially and multiple times, a different IP address is assigned each time and there is no chance to identify PCs individually. In some cases it is possible to determine e.g. whether the respondent filled out the questionnaire at home or at work, if this information is of any interest for the survey.
[10] This was essential for the experiments
[11] See section 22.1.3
Technically, tracking is done directly within the client's browser via Javascript. An AJAX call delivers all the parameters to the server while the Web page is loading, in the background of the respondent's browser. The connection to the answers given by the respondent is made via the session id. If Javascript is turned off, most of these parameters cannot be accessed. Some parameters like operating system, user agent, session id and first referrer can also be fetched from the HTTP header on the server side.
22.1.4 Storage
Basically, three types of storage are currently supported; information is held...
1. ...directly within the file system as native XML documents.
2. ...within an Oracle database, where XML documents are stored as XMLType. A free version of an Oracle database (Express Edition[12]) is available, which fulfils the requirements of QSYS.
3. ...within the open source native XML database eXist[13].
To make QSYS work with one of these options, setting the entry DB within qsys.properties (which is located in the QSYS project's WEB-INF/classes) is sufficient. Possible entries are: FILESYSTEM, ORACLE, EXIST.
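A minimal sketch of how such a switch could look (the property name and its values are taken from the text; the storage classes here are illustrative stand-ins, not the real QSYS implementations):

import java.util.Properties;

interface Storage { /* load and store questionnaires and answers */ }
class FileSystemStorage implements Storage { }
class OracleStorage implements Storage { }
class ExistStorage implements Storage { }

class StorageConfig {
    static Storage fromProperties(Properties p) {
        String db = p.getProperty("DB", "FILESYSTEM");   // default to the file system backend
        if ("ORACLE".equals(db)) return new OracleStorage();
        if ("EXIST".equals(db))  return new ExistStorage();
        return new FileSystemStorage();
    }
}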
The first option is preferred for several reasons: no preconditions[14] concerning storage have to be fulfilled. It is also the most frequently used option across all currently installed versions of QSYS, so it is the most stable alternative. In addition, some features are currently not implemented for the other two alternatives. This mode is also very fast and does not need much space on the server. Furthermore, backup copies of simple XML files can easily be made. If any urgent modifications have to be done, simple direct changes to the files are sufficient. For these reasons, this storage mode is described here in more detail; the other two alternatives work very similarly.
22.1.4.1 File Organization
To set the root folder for XML storage, one should alter the file WEB-INF/classes/fsdb.properties within the QSYS project and set the parameter PATH to the desired storage location. The files are hierarchically organized as follows: within the folder conf, two files are located: groups.xml, which contains basic general and status information for each group, and shortlinks.xml, which contains a mapping of group id and questionnaire id to a tiny link used to shorten the link to a questionnaire. All other information is stored in group folders: each group has a folder[15] with a document questionnaires.xml, which contains basic overview information on each questionnaire, and a subfolder for each questionnaire. The organization of files within a group can be seen in listing 22.1. In this example, all terms within angle brackets symbolize a variable standing for any string.
[12] http://www.oracle.com/technology/products/database/xe
[13] http://exist.sourceforge.net
[14] Which means in this case no installation of or access to a running database is necessary
[15] The id of the group is the folder name
The following files are directly located within the group's root directory:
• basicSettings.xml holds basic settings on how questioning should be conducted (e.g. how
many questions should be displayed on one page, should the IP-address be stored, should
a summary be displayed at the end, and should a progress bar be displayed).
• interviewees.xml stores the mode of interviewee recruitment and basic settings for the
selected mode.
• questionnaire.xml is the questionnaire itself.
The subfolder answer contains all answers given by the respondents, one document for each.
The name of the file consists of the unique interviewee id together with a timestamp of when
the interview was made, to ensure uniqueness. When taking a look at the questionnaire during
creation, the results of these test runs are stored in groupadmin.xml in order to be able to exclude
this document from further analysis.
[12] http://www.oracle.com/technology/products/database/xe
[13] http://exist.sourceforge.net
[14] Which means in this case that no installation of or access to a running database is necessary
[15] The id of the group is the folder name
$ tree <group_1>
<group_1>
|-- questionnaires
|   |-- <quest_1>
|   |   |-- answer
|   |   |   |-- ADUDOIIO_1197533982151.xml
|   |   |   |-- AIQTONZY_1196676852825.xml
|   |   |   |-- ALFQWJMO_1196595314843.xml
|   |   |   |-- AVJRVCGO_1196589478675.xml
|   |   |   `-- groupadmin.xml
|   |   |-- basicSettings.xml
|   |   |-- interviewees.xml
|   |   `-- questionnaire.xml
|   `-- <quest_2>
|       |-- answer
|       |   |-- TPQDDYSJ_1195763591505.xml
|       |   `-- groupadmin.xml
|       |-- basicSettings.xml
|       |-- interviewees.xml
|       `-- questionnaire.xml
`-- questionnaires.xml

Listing 22.1: A simplified example of a tree-view for files stored for one group in QSYS
22.1.4.2 XML Queries within the File System
Even if there is no database available and all answers are stored as simple XML documents,
easy queries make it possible to get an impression of the current response using standard UNIX
tools such as xmlstarlet[16], uniq and sort. As an example, a simplified answer document is given
in listing 22.2, which is used to demonstrate the query examples in the following listings. All
examples run very quickly, at least not appreciably slower than if a database had been used.
[16] Which even supports XPath 2.0 functions like min, max, ...
<?xml version="1.0" encoding="ISO-8859-1"?>
<response>
  <header finished="true">
    <tracker remoteaddress="130.82.1.40" />
    <creDate>28/01/2008 03:36:59 PM</creDate>
  </header>
  <answer dateEntered="28/01/2008 03:39:21 PM" duration="135967"
          id="2" type="QClosedMatrixAnswer">
    <subquestion id="1">
      <alternative value="1"/>
    </subquestion>
    <subquestion id="2">
      <alternative value="2"/>
    </subquestion>
  </answer>
  <answer dateEntered="28/01/2008 03:56:09 PM" duration="33251"
          id="46" type="QOpenendedAnswer">
    <answer>1962</answer>
  </answer>
  <answer dateEntered="28/01/2008 03:56:09 PM" duration="35142"
          id="48" type="QClosedAnswer">
    <alternative value="5"/>
  </answer>
</response>

Listing 22.2: A simplified example of an answer document
To perform queries on a folder containing such XML documents, the combination of simple
UNIX tools (as mentioned above) is sufficient. This is illustrated with the following
commands: to find out how many respondents have currently filled out a questionnaire (with a
distribution of how many finished and how many dropped out), run the following command:
$ xmlstarlet sel -t -v "//header/@finished" * | sort | uniq -c
    156 false
     73 true

Listing 22.3: Command for a respondents' overview of one questionnaire
Here (and also in the following script examples) an XPath query is applied to all documents
in the answer folder. All resulting values are sorted and counted by the uniq-command. The
result of the example shows that 73 respondents completed the questionnaire and 156 dropped
out. Another example script provides a distribution of the last filled out question ids (NaN means
not a single question was filled out):
$ xmlstarlet sel -t -v "math:max(//answer/@id)" * | sort -n | uniq -c
     93 NaN
      5 2
     31 12
      1 13
      2 47
      1 49
     71 50

Listing 22.4: Command for finding out the distribution of the last filled out question
The same can be done with concrete results of single questions, e.g. for openended questions[17]:

$ xmlstarlet sel -t -v "//answer[@id=46]/answer/text()" * | sort | uniq -c
     14 1968
     15 1969
     11 1970
      5 1971
      1 1972

Listing 22.5: Command for querying the results of an openended question
[17] See question with id=46 in listing 22.2
The same procedure is valid for closedended questions:

$ xmlstarlet sel -t -v "//answer[@id=48]/alternative/@value" * | sort | uniq -c
     57 1
      1 2
      6 4
      5 5

Listing 22.6: Command for querying the results of a closedended question
These are just small examples; the possibilities these tools offer are enormous. If these scripts
were combined and more logic and existing information were integrated (like the question texts),
a whole reporting system could be implemented quite easily.
Naturally, some of the drawbacks of storing native XML documents in the file system compared
to using a database have to be mentioned as well:
• Data integrity cannot be assured as it could be within a database when using constraints.
• Central storage and access control are not as easily possible.
22.1.5 Exporting
When implementing the exporting package, the attempt was to create a common framework
for writing tabular data to a file, because this task is always the same, regardless of which content
is written. Furthermore, introducing a new export format should be possible without much effort.
The strategy was therefore to build a clear class hierarchy, whereby methods in the base classes do
all common tasks, and assigning different content is achievable within the sub classes.
BaseExporter and BaseDataExporter already implement common tasks for writing a header line
and multiple content lines, together with the whole exporting process (or abstract methods are
defined to leave concrete implementations open). Concrete implementations are then located
within the two sub classes. It is sufficient to request a concrete data-exporter (either CSV or
SPSS) from class ExporterFactory to get the desired output format.
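To illustrate this extension path, the following sketch shows how an additional format might be hooked in by subclassing BaseDataExporter. The method names are taken from the class diagram in figure 22.4, but the exact signatures and visibility in QSYS may differ, so this is an assumption-laden sketch rather than project code.

// Hypothetical sketch of an additional exporter (semicolon-separated output).
// Method names follow figure 22.4; the real QSYS signatures may differ.
public class SemicolonDataExporter extends BaseDataExporter {

    public void writeVarDelimiter() {
        write(";");                                  // separates two values within one line
    }

    public void writeLineDelimiter() {
        write(System.getProperty("line.separator")); // one respondent per line
    }

    public void writeValue(String val) {
        write(cleanValue(val));                      // reuse the cleaning logic of the base class
    }

    public void exportHeader() {
        writeLine(header(), true);                   // variable names as the first line
    }
}

With such a class in place, ExporterFactory could hand it out in the same way as the existing CSV and SPSS exporters.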
Currently, CSV[18] is used as the exporting format. As a second format, SPSS is supported, but
currently not fully implemented. Because the format for SPSS data files (.sav) is relatively
complex and only commercial libraries exist (like Java SPSS Writer from pmStation[19]), which
are solutions that do not fit into the overall concept, data is written into an SPSS syntax
file (which has the same effect: all necessary information can be set, and after running the syntax,
data is written as a common SPSS data file). Exporting can also be done in a separate thread to
enable an immediate return when requesting an export from the frontend. ParadataTimeExporter
exports either client- or server-side duration per question. An overview of the classes within the
export package is given in figure 22.4.
[18] Character Separated Values; TAB is used as the field separator character
[19] http://spss.pmstation.com
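As a minimal sketch of the threaded variant, an export could be detached from the request roughly as follows; the exporter variable is assumed to be a final, already configured BaseDataExporter obtained from ExporterFactory beforehand.

// Minimal sketch: run an export in the background so the frontend request returns immediately.
// "exporter" is assumed to be a configured, final BaseDataExporter instance (see figure 22.4).
Thread exportThread = new Thread(new Runnable() {
    public void run() {
        exporter.export(); // writes header and content lines to the configured target
    }
});
exportThread.start();      // the HTTP request can be answered without waiting for the file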
22.2 StruXSLT
Because the Web frontend of QSYS is based on this framework and its concepts, it is described
here before QSYS-web is covered.
22.2.1 Basic Functionality
StruXSLT sits on top of Struts, extending its existing functionality to allow Action classes to
return XML that will be transformed by technologies like XSLT and XSL-FO. One motivation
for using StruXSLT is to remove the need for JSP[20] and Taglibs[21] in the presentation layer of
the Struts framework. However, StruXSLT does not force the XML way exclusively; both
technologies work side by side. The basic idea was taken from an article published
at http://www.javaworld.com[22]; see figure 22.5 to get an idea of this concept. Here, no JSP
or Taglibs are used for visualizing data, but XSLT documents. Within the control layer, an
XML document is created instead of storing multiple variables to the session scope. When using
XML/XSLT instead of the conventional Struts approach, the separation between the view and the
other layers of the MVC-model is stricter. Furthermore, XSLT is standardized by the W3C[23]
and is vendor and technology independent, so all style sheets generated can be reused in other
projects, even in projects that use a completely different technology (like e.g. .NET). These and
other advantages are also mentioned in Karg & Krebs (2008).
Figure 22.5: MVC-2.x Model as taken from http://www.javaworld.com
[20] Java Server Pages
[21] http://jakarta.apache.org/taglibs
[22] Mercay & Bouzeid (2002)
[23] http://www.w3.org/TR/xslt
Figure 22.4: Export class diagram showcase (package org.qsys.quest.model.export with the abstract
classes BaseExporter and BaseDataExporter and the concrete classes TABDataExporter,
SPSSDataExporter and ParadataTimeExporter)
The framework described here is an essential part of QSYS and of the concept behind it (even
though it has become a standalone framework), and therefore a brief description of this component
is given here.
Several other open source solutions exist in this field (like StrutsCX[24] and stxx[25]), but
none of these packages implements random or conditional assignment of style sheets, which
was necessary for the experiments of this thesis (how this can be done is described below). For
this reason, this new framework was implemented, which is also employed in other projects
independent of QSYS. StruXSLT is also published as open source and can be downloaded from
sourceforge[26].
The basic features of this framework are listed here:
• Language independence: all language tokens are set within external XML files, which
are woven together with the content of the page to be visualized via XSLT afterwards.
The settings for linking language files to actions are described in section 22.2.4, where the
XSLT mapper is described in detail.
• No JSP necessary: the main concept of the framework is to simply use XSLT for
rendering content. Nevertheless, usual Actions working with e.g. Struts Taglibs[27] can be
used.
• AJAX-support: data generation and forwarding can be influenced by an action parameter
delivered when actions in the view-layer are requested.
• Support for debugging: several functions were implemented, e.g. showing the generated
XML used for rendering directly within the browser.
• Predefined methods for preparing information for actions in the view-layer, e.g.
addAdditionalMenueEntries, where e.g. menu entries, which should be offered on the Web
page, are listed in the XML to be rendered.
• Feedback functions: from every position within the action classes, message and error
codes can be set and transferred to the view-layer. Only error and message codes are
set; the textual version is automatically read from the language files to assure language
independence also for error and warning messages.
• Central configuration: one main configuration file exists which maps all Struts
Actions[28], XSLT style sheets and language token XML files.
• Style sheets can be assigned by chance. The probability for selecting a certain style can
be set within the main configuration file.
• Style sheets can be assigned following certain conditions. A concrete example: it is
sometimes useful to assign different style sheets depending on whether Javascript is turned
on or off within the client's browser. Another application would offer different styles for
different client types, e.g. to offer an iPhone version of a Web page.
[24] http://it.cappuccinonet.com/strutscx/doc/v08/en/intro/index.html
[25] http://stxx.sourceforge.net
[26] http://struxslt.sourceforge.net
[27] http://struts.apache.org/1.x/struts-taglib/index.html
[28] Which generate XML content in this case
Figure 22.6: Base Action classes of the StruXSLT-framework (package org.webq.struxslt with the
abstract classes BaseStruXSLAction, BaseStruXSLViewAction, BaseStruXSLProcessAction and
BaseStruXSLAjaxAction)
• The way XSLT style sheets are selected can easily be customized and extended. To do so,
simply implement a new XsltMapEntry class[29].
• The framework also clearly differentiates between Action classes for processing data and
Actions for generating content for visualization, which improves the architecture and gives
a better overview.
[29] See section 22.2.4 below
22.2.2 Usage
To use this package, a new base class has to be used instead of the Struts Action class, but it
is not necessary to exchange the ActionServlet, which would be the case with other packages
such as stxx, so registering Struts within web.xml can be done in the customary way. It is simply
necessary to copy q-struxsl.jar (which is the archive containing StruXSLT) to the application's
WEB-INF/lib folder and to extend the Action classes described below.
22.2.3 Action Classes
StruXSLT strictly separates view-Actions from process-Actions, so two main base classes exist
within the framework. The diagram in figure 22.6 gives an overview of the methods to be
overridden from the perspective of the framework user.
22.2.3.1 BaseStruXSLAction
This class serves as the base class for the Action classes within the whole framework. The main
purpose of this common base class is to band together core functionality necessary for all
concrete Actions described below. For example, an instance of a session manager is stored as well
as the basic settings for the user's session, which are common for all sub classes. In general,
messages, errors and other information, which can be used both for communication between
the concrete Action classes and for the presentation layer (all set messages are also written to
the XML used for rendering), can be set from any Action. The basic managing (like adding errors
and messages) of all these messages is therefore implemented here. All messages to be set are
language independent, which means that simply an error or message code is set. This code is
afterwards substituted by the definite message text taken from the XML language token files.
Furthermore, enabling a debug mode is implemented here, which is simply a flag that is also
written to the view-XML. This can be used to allow the output of additional helpful information
during development. Additionally, the retrieval of language tokens and logging also reside here.
This class works in the background; when using this framework, the developer is confronted
with the base classes described below.
22.2.3.2 BaseStruXSLViewAction
This class serves as the base class for all Actions generating (XML-) data to be used for rendering
via XSLT or XSL-FO. All information specific to a certain Action is generated within
the generateXML-method, so one must implement this abstract method in the project-specific
view class when overriding BaseStruXSLViewAction; the document is returned and can be
used for rendering. Within this Action, the XML is transformed into the desired output format
(e.g. HTML or PDF) according to the assigned stylesheet. Style sheet assignment is
done within the mapper classes described below, but storing and managing preconditions (e.g.
whether Javascript is enabled) are handled here.
Language tokens are added to the XML document automatically, according to the desired language.
If debug mode is enabled, the parameter out can be used to directly see the XML document
used for rendering by adding the parameter out=xml. The same goes for xsl, which shows the
XSL-document used for rendering. To uniquely add a stylesheet to an application-dependent
key, which is used to assign the same stylesheet for the whole session to a certain identifier, the
method getXsltKey has to be overridden in the concrete sub classes.
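A hypothetical concrete view Action could therefore look roughly as follows. The signatures are abbreviated in figure 22.7, so the parameter lists, the return type of getXsltKey and the class name used here are assumptions, not the actual QSYS code.

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.xml.parsers.DocumentBuilderFactory;
import org.apache.struts.action.ActionForm;
import org.apache.struts.action.ActionMapping;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch of a project-specific view Action; signatures are assumed.
public class HelloViewAction extends BaseStruXSLViewAction {

    public Document generateXML(ActionMapping mapping, ActionForm form,
                                HttpServletRequest request, HttpServletResponse response)
            throws Exception {
        // Build the XML that the mapped XSLT stylesheet will render.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("hello");
        root.setAttribute("user", String.valueOf(request.getRemoteUser()));
        doc.appendChild(root);
        return doc;
    }

    public String getXsltKey(HttpServletRequest request, String path) {
        // One key per session and path, so the same stylesheet stays assigned for the whole session.
        return request.getSession().getId() + path;
    }
}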
22.2.3.3 BaseStruXSLProcessAction
This is the base class for all Actions which process data (which in general means
writing information to the database, the file system or the session). The abstract method
doProcessData has to be overridden within the concrete subclass to do so. As a return value,
an ActionForward is received, which normally forwards to a concrete BaseStruXSLViewAction.
Within this method, error and warning messages can be generated (which
are stored within the session) and used by the view-layer to give feedback to the user as to
whether data processing was successful or not.
Two parameters can be set to influence the given output:
• ajax: if this parameter is set to true, just errors and feedback messages are delivered
to the requesting page. These are used to give the user feedback after performing an
AJAX-request.
• tfb: if this parameter is set to true, a simple text message is delivered to the requesting
page. This is used in the case of automated data processing or when certain functions are
called from external systems; e.g. in case everything worked fine, the return value is just
OK in mime type text/plain.
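For illustration, a process Action for saving some settings might be sketched as follows. The class diagrams abbreviate the signature of doProcessData (figure 22.6 versus figure 22.8), so the parameters, return value and the helper methods shown here (SettingsForm, persist, addError) are assumptions rather than the actual QSYS code.

import javax.servlet.http.HttpServletRequest;
import org.apache.struts.action.ActionForm;

// Hypothetical sketch of a data-processing Action; signature and helpers are assumed.
public class SaveSettingsProcessAction extends BaseStruXSLProcessAction {

    public String doProcessData(ActionForm form, HttpServletRequest request) {
        SettingsForm settings = (SettingsForm) form; // hypothetical ActionForm subclass
        boolean saved = persist(settings);           // hypothetical call writing to the file system
        if (!saved) {
            // assumed feedback method: only a code is set, the text comes from the language files
            addError("settings.save.failed");
        }
        return saved ? "success" : "error";          // forward names as defined in struts-config.xml
    }

    private boolean persist(SettingsForm settings) {
        // placeholder for the actual storage logic (database, file system or session)
        return settings != null;
    }
}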
22.2.3.4 BaseStruXSLAjaxAction
This class is integrated for future use and should handle simple AJAX requests for which no
response is delivered.
22.2.4 Mapper Classes
The central configuration file for the whole framework is xslt-map.xml, where the XML-generating
Action is woven together with the stylesheet and the language token documents to ensure language
independence. Place this document directly into your Web application's WEB-INF folder.
Internally, one XML document is generated per request, which is rendered by the XSLT-document
specified.
The main settings within this document have the following structure:
<xsltEntries debug="false">
  <supportedLangs langs="EN, DE" />
  <messageCodes lang_file="global/msgs.xml"/>
  <global_langs>
    <global_lang lang_file="global/menue.xml"/>
    <global_lang lang_file="global/head.xml"/>
  </global_langs>
</xsltEntries>

Listing 22.7: An example of a simple XSLT map header
With the attribute debug, the debug mode can be turned on by default. All supported
languages can be listed within the second element. The mapping from message codes to the
language dependent messages itself is done within the element messageCodes. Global language
token files which are needed by the view-layer can be specified within global_lang elements. These
and the message-code documents are automatically copied to the XML used for rendering. In the
following, the two currently implemented mapping modes are described. It is easy to write an
additional mapping.
simple A simple entry which weaves together an Action, a language token file and an XSLT
file can be defined as shown in the following example:
<entry path="/basicsettingsv" type="simple"
       lang_file="admin/basicsettings.xml">
  <xslt path="admin/basicsettings.xsl" />
</entry>

Listing 22.8: An example of a simple XSLT map entry
Each simple entry-element consists of the following attributes and child-elements:
• path: specifies the path of the Struts Action which is responsible for generating the XML
content used for rendering.
• lang_file: gives a comma separated list of language token files. These are dynamically
selected according to the language currently set within the running system.
• Also the mime type can be set[30], which would make sense e.g. when employing
XSL-FO for PDF generation, which needs an application/pdf mime type. No additional
tasks have to be done within the subclassed Actions; the correct transforming engine is
invoked automatically, depending on whether the mime type is HTML, XML, PDF or plain
text.
• xslt: contains an attribute path, where the relative path to the XSLT-stylesheet is specified.
Such an entry is necessary for all views of the Web application. These entries have to be placed
directly as child elements of the root element xsltEntries.
random Another, more complex mapping type called random can be configured in the following
way:
<entry path="/doqv" type="random" group="univ_uibk" questionnaire="webpage"
       lang_file="do/do_main.xml">
  <xslt path="/to/any.xsl" name="personal_1" prob="34"
        conditions="java, javascript"/>
  <xslt path="/to/any.xsl" name="personal_2" prob="33" conditions="javascript"/>
  <xslt path="/to/any.xsl" name="personal_3" prob="33"/>
</entry>

Listing 22.9: An example of a random XSLT map entry
These settings have the following meanings (those which are equal to the simple example are not
explained here):
• keys: for the attributes group and questionnaire, concrete groups and questionnaires can
be set, and only if these two values are equal to those given when requesting the map entry
will this map entry be selected. This allows the distinction between different groups
and questionnaires and assigns different style sheets to each without any code manipulation
being necessary. Consequently, it is possible that two or more entries with the same
path exist in the configuration file. The one to be chosen for rendering is selected via
these two additional attributes. Of course, the terms group and questionnaire are just
examples[31] and can be specified within the application. In fact, these two attributes could
also be named key1 and key2.
• prob: defines the likelihood for a certain style to be selected. The sum of all probabilities
must be 100.
• conditions: some preconditions can be set for a style to be selected (e.g. whether Javascript
and/or Java is enabled or not). Again, the naming of these conditions is not fixed and can
be freely chosen for each application. If a certain xslt-entry was selected but its conditions
are not satisfied, random selection is repeated until an XSLT-entry is selected which fulfils
all conditions (a sketch of this selection logic is given after this list). Therefore it is necessary
to pay attention to these conditions, because the framework has no way of avoiding infinite
loops.
• name: the name of the selected style can be used for branching within an XSLT-document.
In contrast to the example above, the path-attribute values can of course differ.
[30] This is not shown in the example above because the default response mimetype is set to text/html
[31] It was used that way within QSYS when running surveys with different styling in parallel
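The following sketch illustrates the selection idea described above: draw an entry according to its prob value and retry until all of its conditions are met. The class and method names (XsltEntry, getProb, getConditions) are illustrative stand-ins, not the actual StruXSLT implementation.

import java.util.List;
import java.util.Random;
import java.util.Set;

// Illustrative sketch of weighted random stylesheet selection with condition retry.
// XsltEntry, getProb() and getConditions() are hypothetical names.
class RandomStyleSelector {

    private final Random random = new Random();

    XsltEntry select(List<XsltEntry> entries, Set<String> enabledConditions) {
        // The caller must make sure at least one entry can satisfy the conditions,
        // otherwise this loop never terminates (see the remark on infinite loops above).
        while (true) {
            int pick = random.nextInt(100) + 1;   // prob values sum up to 100
            int cumulated = 0;
            for (XsltEntry entry : entries) {
                cumulated += entry.getProb();
                if (pick <= cumulated) {
                    if (enabledConditions.containsAll(entry.getConditions())) {
                        return entry;             // all preconditions (e.g. javascript) are satisfied
                    }
                    break;                        // conditions not met: draw again
                }
            }
        }
    }
}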
22.2.5 Language Independence
All language token files have to be located within the following path (the root for all language
files is WEB-INF/lang):
<language_abbreviation>/<path_to_lang_file>, e.g. EN/admin/login.xml.
A normal language token file has the following structure: because these language documents are
directly copied to the XML document used for rendering, hierarchies of elements or attributes
can also be set and used within the style sheets. One simply adds all child-elements of the
root-element <lang>, as is done e.g. within the QSYS project. The language key is the element
name and the language value is stored within the text node.
Introducing a new language could easily be done when following these steps:
1. Add an abbreviation for the language in xslt-map.xml at supportedlangs/langs, e.g. IT for
Italian.
2. Copy all language tokens from an existing language (e.g. EN) to the new language folder
and translate the text nodes.
22.3 QSYS-Web
QSYS-web is the Web frontend of the QSYS system. It is based on the StruXSLT framework
and uses the functionality of QSYS-core, so sections 22.1 and 22.2 should be read before this
section. For rendering XML content to other output formats, two open source
projects from the Apache group are used, namely Xalan[32] for generating HTML and FOP[33] for
PDF output. Both work very well; style sheets simply need to be written and the desired output
is generated. The FOP developers plan to improve RTF[34] support, which would then be used as
an additional output format for questionnaires with QSYS. The advantage of RTF over PDF is
that it can be modified by the user after generation, using e.g. OpenOffice or Microsoft Word.
22.3.1 Class Hierarchy
The concept of separating Actions responsible for generating content for the view-layer
on the one hand from Actions for processing requests on the other hand, as predetermined by
StruXSLT, is retained and extended with another differentiation, namely between publicly
accessible Actions and those only accessible by the group or system administrators. Because core
functionality is already implemented within StruXSLT, it is sufficient to concentrate on project
specific classes and methods without any distraction from basic and workflow functionality.
In the following, the main structure of these Action classes is shown exemplarily together
with some concrete derivations.
[32] Xalan-Java is an XSLT processor for transforming XML documents into HTML, text, or other XML document
types. It implements XSL Transformations (XSLT) Version 1.0 and XML Path Language (XPath) Version
1.0 and can be used from the command line, in an Applet or a Servlet, or as a module in another program
(http://xml.apache.org/xalan-j)
[33] Apache FOP (Formatting Objects Processor) is a print formatter driven by XSL formatting objects (XSL-FO)
and an output independent formatter (http://xmlgraphics.apache.org/fop)
[34] Rich Text Format
22.3.1.1 View
Figure 22.7: QSYS-web view class diagram showcase (packages org.webq.struxslt.view and
org.qsys.quest.action.view: BaseStruXSLViewAction, BaseQsysViewAction,
BaseQSysAdminViewAction, BaseGroupAdminViewAction and concrete classes such as
AnswerLoginViewAction, AdminViewAction and BasicSettingsViewAction)
In figure 22.7, a sample showcase of the view-class hierarchy within the QSYS-web component
is described. BaseQsysViewAction serves as a base class for all view-Actions used within
QSYS. Here, getXsltKey generates a key consisting of the group-id, the questionnaire-id and the
interviewee-id to ensure proper assignment of the style sheets. Within the overridden method
preCondition, necessary environment variables are set. doTrack is responsible for tracking user
paradata (gathered either from the client or from the server side).
22.3.1.2 Process
Figure 22.8: QSYS-web Process class diagram showcase (packages org.webq.struxslt.process and
org.qsys.quest.action.process: BaseStruXSLProcessAction, BaseQsysProcessAction,
BaseAdminProcessAction, BaseGroupAdminProcessAction and concrete classes such as
AnswerLoginProcessAction, AdminProcessAction and BasicSettingsProcessAction)
Figure 22.8 shows a sample showcase of the process-classes hierarchy. BaseQsysProcessAction
serves as a base class for all process-actions.
22.3.2 Configuration and Installation
22.3.2.1 Preconditions
To run QSYS on your server or local machine, a Servlet engine like Tomcat[35] with a Java Runtime
Environment (version >= 1.5) is sufficient, which is pre-installed in most cases. Not even
a database is necessary when running the file system storage mode. Because of this, it is also
possible to install QSYS locally (e.g. on a laptop), so QSYS can also be used offline. Questionnaires
can then easily be imported into the online system. To install, simply download the
.war[36] document from the project's homepage and deploy it[37].
[35] The software was tested with version 6
[36] Web archive
[37] Which means either copying the archive to Tomcat's webapps directory or deploying via the Tomcat manager
22.3.2.2 Property Files
Two files have to be created and located on the class path:
qsys.properties, which stores general settings used within the whole system.

DB=FILESYSTEM                            #mandatory
MAIL_ADDRESS=[mail_address]              #optional
SMTP_SERVER=[full_path_to_smtp_server]   #optional
DATA_EXPORT_DIR=[data_export_dir]        #mandatory
ADMIN_PW=[admin_pw]                      #mandatory
LDAP_URL=[path_to_ldap]                  #optional
LDAP_GROUP_PREFIX=[ldap_group_prefix]    #optional

Listing 22.10: Settings within qsys.properties
fsdb.properties, which defines the location where all information is stored when the file system is
the selected storage method.

PATH=[path_to_root_storage_dir]    #mandatory

Listing 22.11: Settings within fsdb.properties
22.3.2.3 Compile from Scratch
To compile the sources, a shell script (for Linux and Mac users; qsys-web/build/build.sh)
currently exists which completely builds all projects necessary to create the whole Web archive
file. This script calls ANT scripts and copies the created libraries to the WEB-INF/lib directory.
Each project contains a build folder, where the appropriate ANT script (which is always named
build.xml) is located. The resulting qsys.war file will be located according to the webapp property
as set within qsys-web/build/build.xml.
22.3.3 Additional Tools
Several tools were used to ease the development process; here, only the most important ones are
described:
• ANT[38] (which is an acronym for Another Neat Tool) is a Java-based build tool. In the
current version, many extensions are available, enabling the use of ANT for a lot of other
tasks (e.g. integrating JUnit-tests within the build process). In the QSYS-project, ANT
is used for automatically building all subprojects, generating a Web archive for deployment
on the Servlet engine, generating JavaDoc and running XDoclet-statements.
• XDoclet[39]: The generation of struts-config.xml is done via XDoclet-tasks, which has
several advantages: it is error-prone to edit struts-config.xml directly, because for bigger
projects this file becomes very complex and hard to view as a whole. Additionally, it is
beneficial when the whole configuration for Struts is written directly next to the concerned
classes. See listing 22.12 for a sample of an XDoclet statement for a Struts Action (here for
the login process Action). The instructions are placed directly above the class declaration,
integrated into a JavaDoc comment. This information goes directly into struts-config.xml
when running the corresponding ANT task.
In line 2 of the sample, an ActionForm-class and a Web path are assigned to the Action-class.
In the subsequent lines, the action forwards are defined with a name and a path
indicating where the forward should go. If redirect is set to true, the URL within the
navigation bar of the browser changes to this new URL.
[38] http://ant.apache.org
[39] XDoclet is an open source code generation engine. It enables Attribute-Oriented Programming for Java. In
short, this means that you can add more significance to your code by adding meta data (attributes) to your
Java sources. This is done in special JavaDoc tags. http://xdoclet.sourceforge.net/xdoclet
• Eclipse with Lomboz-plugins: As IDE[40], Eclipse was used with certain plug-ins[41]. To run
the Tomcat Servlet engine within Eclipse, a special plugin was used[42], which even enabled
debugging of Servlets.
[40] Integrated Development Environment
[41] The most essential ones are already bundled within the Lomboz distribution
[42] Taken from http://www.eclipsetotale.com/tomcatPlugin.html
1 /**
2  * @struts.action name="answerloginform" path="/answerlogin" scope="request"
3  * @struts.action-forward name="success" path="/answerstartv.qsys" redirect="true"
4  * @struts.action-forward name="toquestion" path="/doqv.qsys" redirect="true"
5  * @struts.action-forward name="interviewee_exists" path="/intervieweeexists.qsys"
6  *                        redirect="true"
7  * @struts.action-forward name="error" path="/answerloginv.qsys" redirect="true"
8  */
9 public class AnswerLoginProcessAction extends BaseQsysProcessAction { ... }

Listing 22.12: An example of XDoclet metadata attributes (for the login-process)
22.4 Utility Classes
The utility classes package is called q-utils. It simply contains some classes with handy functions
which are constantly needed. For example, classes are provided to ease XML processing
as well as accessing databases efficiently, working with generics and reflection, or connecting to
an LDAP server. This package serves as a toolkit for all Java projects to ensure reusability
of common functionality, which has several benefits: the functionality grows with every new
project, and the stability of these functions increases because they are heavily used, so errors or
strange behavior are revealed very early on. Unit tests are implemented (using JUnit[43]) for most
of these utility classes to protect against side effects, with regression testing after modifications.
Of course, the source code for this package is also available when downloading the actual QSYS
or StruXSLT release, but making an extra release for these classes is not worth the effort. There
are better solutions, like e.g. Jakarta Commons[44].
[43] http://www.junit.org
[44] http://commons.apache.org
22.5 Additional Notes
22.5.1 Quality Control
To ensure that the software quality is as good as possible, testing of the software became a major
focus during development. As the tool for unit and regression testing, JUnit was used.
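A minimal sketch of such a JUnit (3.x style) regression test is shown below; the utility class and method used here (XmlUtils.escape) are hypothetical stand-ins for the q-utils helpers and not actual QSYS classes.

import junit.framework.TestCase;

// Minimal sketch of a JUnit 3 style regression test; XmlUtils.escape is a hypothetical
// stand-in for one of the q-utils helper methods.
public class XmlUtilsTest extends TestCase {

    public void testEscapeReplacesReservedCharacters() {
        assertEquals("a &lt; b", XmlUtils.escape("a < b"));
    }

    public void testEscapeLeavesPlainTextUntouched() {
        assertEquals("plain text", XmlUtils.escape("plain text"));
    }
}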
22.5.2 Software Metrics
Here are just a few figures to describe the complexity of the generated software: the number of
classes for QSYS-core is 152 (12,514 LOC), QSYS-web consists of 98 classes (4834 LOC plus all
XSLT-documents), StruXSLT of 20 (1581 LOC) and the utility package of 86 classes (5785 LOC)[45].
22.5.3 Schema RFC for Questionnaires (and Answer Documents)
A schema to describe the XML documents which hold questionnaires and answers was implemented
within the scope of this thesis and will be submitted to the W3C in the form of a request for
comments (RFC). Standardization would be a big step forward in enabling the exchangeability of
questionnaire definitions between surveys. Furthermore, archiving of questionnaires and answer
documents would be simplified and generalized. The goal is not to create the standard itself but
rather to put it up for discussion, so that possible further steps can be taken by other survey
software developers. The schema definitions can be found at http://www.survey4all.org/qsys-xsd.
[45] Lines of code were determined with http://www.dwheeler.com/sloccount
23 Additional Tasks to be Implemented
23.0.4 Federated Identity Based Authentication and Authorization
A first step towards integrating QSYS into the infrastructure of companies and institutions
is given through the support of LDAP. This support, for example, enables limiting the
respondents of a certain questionnaire to members of one or more LDAP groups. Further steps
in this direction could be the integration of technologies like Shibboleth[1] and OpenID[2] to benefit
e.g. from single sign-on. Additionally, it is planned to offer most of the functionality of QSYS
via Web services.
23.0.5 R Reporting Server
Most commercial and open source online survey tools contain basic (or in some cases advanced)
reporting functionality. The strategy for QSYS is that reporting should be implemented outside
of the system; the software itself should concentrate on its core competence and further jobs
should be excluded (the same is true for e.g. panel management tasks), but communication with
these external components should be done over clearly defined interfaces. One idea was to use
the R environment for statistical computing[3] and its capabilities for reporting. R has several
advantages compared to usual reporting packages:
• The possibilities of statistical computing with R are enormous (also for generating certain
graphics and charts). Of course, in the first version, simple descriptive statistics, like
frequency distributions, and graphics, like histograms, will be part of the report. However,
once the infrastructure for using R as a reporting tool exists, improving and extending the
reports can be carried out with less effort.
• R supports several output formats, like PDF, HTML, RTF and even LaTeX.
• R contains a fully object oriented language, which enables well structured development.
Wrappers even exist for all common programming languages.
• R is also open source, which fits well into the license strategy QSYS has.
• R has a big community behind it, which means masses of packages and functionality from
different areas exist. Furthermore, a very active mailing list exists, which provides support
for concrete questions.
[1] http://shibboleth.internet2.edu
[2] http://openid.net
[3] More information about the huge functionality of R can be found at http://www.r-project.org and in R
Development Core Team (2006)
Because R was also used for the statistical analysis of the experiments, some code fragments
generated for this purpose can be reused, which is also a positive side effect[4].
23.0.6 Observation Features
With the introduction of Web 2.0 technologies like e.g. AJAX, observation of the user is not
limited to information gathered when the submit button is pressed; it is also possible to track
user behavior during the filling out process (i.e. what is done on the webpage). Concrete
scientific questions would be: for openended questions, which text was written in the text box
before pressing the submit button (possibly the first statement written was deleted and substituted
by a statement which better fits social desirability); the same is applicable for closedended
questions: did the respondent select another alternative than the one which was selected when
submitting the form? In the future, concrete experiments could be conducted regarding which of
these features should be implemented.
23.0.7 Accessibility
More effort will be invested in designing barrier-free questionnaires. The goal is to support all
should-criteria (AA) from http://www.w3.org/TR/WCAG20.
4
A lot of literature has been published on dierent levels of R usage, e.g.: introduction to R: Ligges (2007),
Everit & Hothorn (2007); graphics with R Murell (2006); linear modelling with R Faraway (2005), Wood
(2006)
24 Evaluation of the Software
In this chapter, a short evaluation of QSYS against external criteria is given. Kokemüller (2007),
for example, already evaluated eight commercial software solutions. The criteria are taken from
this article and applied to QSYS. Additionally, some ideas for an academic evaluation of survey
software were taken from Pocknee & Robbie (2002). Batinic (2003, p. 10) identifies the following
main requirements for Web based survey tools (the evaluation concerning QSYS is given directly
next to these points):
• Progress bar[1] (graphical or simply the current page and the total number of pages of the
questionnaire). Yes.
• Question filters used for branching. Yes, but branching can currently only be done depending
on the direct predecessor question.
• Randomized assignment of respondents to different conditions. Yes, this feature is extensively
supported, all possible variations are available (but HTML knowledge is necessary
for implementation, because XSLT documents must be created).
• Item rotation to avoid ranking effects[2]. No, this will be done in the next release.
• Plausibility checks to automatically identify data inconsistencies (ideally while entering
data). Yes, both on client and on server side.
• Possibility to send invitation letters and reminders. No, these tasks should be kept outside;
a tool is planned which has these capabilities and communicates with QSYS.
• Access limitation (e.g. the usage of passwords). Yes, there are several different access
modes.
• (Real time) report statistics. No, again this feature should be implemented in an additional
tool communicating with QSYS[3].
• Possible integration of multimedia (pictures, films, ...). Yes, everything which can be done
with HTML, CSS and Javascript is possible.
• Data export to common statistics software (like SPSS, SAS[4], etc.). Yes, CSV, which can
be imported into all common statistics packages, is supported.
• Multiple project managers can create questionnaires concurrently. Yes.
• Secure data transmission (SSL encryption). This depends on the server, but it is not a
problem to run QSYS e.g. on Tomcat using https.
In addition, Manfreda & Vehovar (2008, p. 281f) give a list of features a professional survey
software package should support[5]:
[1] See section 5.10 for actual findings
[2] As described in section 5.3.4
[3] See section 23.0.5 for further details
[4] http://www.sas.com
[5] Again, the QSYS evaluation is given next to the criteria
• Sample management allowing the researcher to send out prenotifications, initial invitations
and follow-ups for nonrespondents. Here again, this functionality[6] should be implemented
externally with interfaces to QSYS.
• User-friendly interface for questionnaire design, with several features, for instance manuals,
online help, tutorials, but also question/questionnaire libraries, and export from other
software packages. Yes, everything mentioned here exists within QSYS (e.g. a detailed
documentation of the editor exists in both English and German).
• Flexible questionnaire design regarding layout (e.g. background color/pattern, fonts, multimedia,
progress indicator), question forms (e.g. open, closed, grid, semantic differential,
yes/no questions), and features of computer-assisted survey information collection (e.g.
complex branching, variable stimuli based on previous respondents' selections, range controls,
missing data and consistency checks). As already mentioned for the criteria of Batinic
(2003, p. 10), all these points are implemented, except complex branching and related features.
• Reliable and secure transfer and storage of data. Storage of data has proven itself in several
surveys run with QSYS; secure data transfer can be assured with https, if the Servlet
container is configured this way.
Here, a few additional criteria are given:
• User, developer and administrator documentation: documentation exists for all of these
three roles.
• Printable survey version: it is possible to export the survey as PDF and print out this
document. The way the questionnaire should look is fully customizable by editing the
appropriate XSL-FO document. It is also possible to assign different style sheets which
generate PDFs for different questionnaires.
• Sufficient offer of question types: Yes.[7]
• Software should not only be suitable for market and social research, but also for other
scientific fields. Bälter & Bälter (2005) discuss the special needs of epidemiological research,
where e.g. the necessity to support Visual Analogue Scales is mentioned. VAS are fully
supported in different graphical representations.
[6] Possibly in combination with an open source panel management tool; one candidate could be phpPanelAdmin
(http://www.goeritz.net/panelware)
[7] As can be seen in section 21.4
25 Existing Open Source Online Survey Tools
There is a huge offer of online survey tools on the web, many of which are even available for
free. In this chapter, a short overview of the most important tools is given. Only open
source tools are taken into consideration (although QSYS would bear comparison with some of
the commercial providers), but the selection is not limited to those implemented in Java technology.
Only the most striking tools were selected and are described shortly in the following sections,
and only their key features are presented, as it is not worthwhile to give a detailed evaluation
here; when planning to evaluate survey tools, a closer look at all tools mentioned below will be
necessary anyway. Assessing online providers for online surveys (commercial and non-commercial)
is not the subject of this chapter, as the focus is on the software[1].
In the course of this thesis, several open source portals were scanned for appropriate projects, e.g.
http://sourceforge.net, as well as Web pages from institutions dealing with online surveys:
• http://www.gesis.org: At GESIS, only two active open source projects are currently
listed, Lime Survey and Open Survey Pilot. Because it is essential for open source projects
to be constantly improved (bug fixes, functional increments), only those with an active
development status are evaluated.
• http://www.websm.org: A lot of existing commercial and non-commercial software tools
for survey research are listed on the website run by Web Survey Methodology.
25.1 Limesurvey (formerly PHPSurveyor)
Limesurvey[2] seems to be the most widely distributed open source online survey tool. It is
written in PHP, has extensive user documentation and some basic descriptions of how to install
the system. Most of the common question types are supported (some concrete templates are
also integrated, like yes/no-questions or questioning gender). It offers support for multilingual
surveys, comes with a WYSIWYG HTML editor, can integrate pictures and movies into a
survey, makes it possible to print out the survey, supports different access control modes (LDAP
is also supported), allows data export in several formats (like SPSS), offers basic reporting,
and even has screen reader accessibility. A WIKI for documentation has been installed,
which enables information for users, developers and administrators to be entered (in multiple
languages), as well as a forum, where users can post their problems.
[1] For further information on this see e.g. McCalla (2003, p. 60) for a short overview
[2] http://www.limesurvey.org
25.2 FlexSurvey
FlexSurvey[3] is a small but powerful tool written in PHP which allows fast and flexible creation
of online surveys. It does not provide a GUI for creating the surveys, which need not necessarily
be a disadvantage. If one is acquainted with the technology and the structure of this software,
developing surveys becomes much faster, because only (PHP-)files have to be edited and no
form fields of a (Web) editor frontend have to be filled out[4]. Of course, creation becomes more
flexible when code is edited directly, but the necessary knowledge excludes the majority of survey
designers, because basic knowledge of (X)HTML, PHP and CSS is needed, at least if you want
to move beyond very simple surveys.
25.3 Mod_survey
Mod_Survey[5] is a mod_perl[6] module for Apache. In the core version, no editor exists; XML
files have to be edited to create a survey. Documentation exists, but only the tags to be set
within the XML file are described. Recently, a graphical editor for creating these questionnaire
XML documents was created (separately from the core tool). This approach of course has
the drawback of not being very end-user friendly, but there are also some advantages: (1) the
more possibilities are offered to influence code creation, the more flexibility is offered in designing
the survey. Also, (2) improvements (like new question types or new features) can be developed
much faster, because no modifications have to be made to an editor. (3) The way in which the
XML documents are generated is up to the end user, which means these files can be generated
automatically from any source, or a custom editor could be written to generate these files.
25.4 MySurveyServer
MySurveyServer[7] is an online survey tool developed in Java (the Web-GUI is based on Struts 1.1
and the business logic uses EJB Session Beans), currently in alpha state. It was last released in
March 2003, which means that there have not been any recent developments in the past years.
Unfortunately, no documentation could be found.
25.5 phpESP
phpESP[8] is a collection of PHP scripts that lets non-technical users create and administer surveys,
gather results and view statistics. All tasks can be managed online after database initialization.
There is a demo online which allows the examination of the editor. Input controls like simple
text boxes, radio buttons, dropdown boxes, rating scales and numeric and date-input fields are
supported. Although the editor looks very old-fashioned, creation of simple questions is very
easy and intuitive. All actions which can be performed with the software are described in
cooking-recipe style.
[3] http://www.flexsurvey.de
[4] QSYS offers a similar approach in directly editing the questionnaire XML-files and uploading them when
finished. This is only possible if the creator is very familiar with the XML-schema used for questionnaires,
but if so, creation becomes incomparably fast.
[5] http://www.modsurvey.org
[6] mod_perl is an optional module for the Apache HTTP server (http://httpd.apache.org). It embeds a Perl
interpreter into the Apache server, so that dynamic content produced by Perl scripts can be served in response
to incoming requests (http://perl.apache.org)
[7] http://mysurveyserver.sourceforge.net
[8] http://phpesp.sourceforge.net
25.6 Rapid Survey Tool (formerly Rostock Survey Tool)
Rapid Survey Tool[9] is an online survey tool written in Perl, which is responsible for creating
and displaying survey pages, storing the variables entered by the respondent, exporting data
(SPSS export is supported) as well as generating short reports from the results. Data is not
stored within a database but in single files. A questionnaire is created by generating a text
file containing all question definitions written in a custom markup language. All markup signs
are described, together with an installation guide, on the documentation page of the software.
Again, this is a simple way to efficiently create and run small surveys. The questionnaire itself
appears relatively old-fashioned, but this can be improved by editing the source files responsible
for the creation of the questionnaire.
25.7 Additional Web Survey Tools
Subsequently, a list of tools which are not described is given (reasons are e.g. lack of documentation,
no active development or too small a range of functionality): Phpsurvey (http://phpsurvey.sourceforge.net),
ActionPoll from the Open Source Technology Group (http://sourceforge.net/projects/actionpoll),
Open Survey Pilot (http://www.opensurveypilot.org),
Web Survey Toolbox (http://www.aaronpowers.com/websurveytoolbox),
PHPSurvey (http://phpsurvey.sourceforge.net),
Socrates Questionnaire Engine (http://socrates-qe.sourceforge.net),
SurJey (http://surjey.sourceforge.net),
ZClasses Survey/quiz product (http://www.zope.org/Members/jwashin/Survey),
Multi-Platform Survey Architect (http://sourceforge.net/projects/mpsa) and
Internet Survey Engine (Insuren) (http://insuren.sourceforge.net).
25.7.1 Tools for HTML Form Processing
25.7.1.1 Generic HTML Form Processor
This piece of PHP code is not survey software as a whole, but it can assist students and
researchers in quickly setting up surveys that can be administered via the Web. A simple (but
effective) mapping of entries from an HTML form to a database, with additional functionality like
input validation, random assignment of participants to experimental conditions, and password
protection, can be carried out. The software can be obtained from http://www.goeritz.net/brmic
and additional information can be found in Göritz & Birnbaum (2005).
[9] http://www.hinner.com/rst
25.7.1.2 SurveyWiz and FactorWiz
These online tools[10] give a helping hand with the generation of HTML forms used directly for
online experimenting or for any other needs where HTML forms are required. The functionality
is written in Javascript. The technology is a bit outdated but can nevertheless be helpful in
giving introductory assistance in HTML form creation. For further information, follow the links
or consult Birnbaum (2000).
25.7.2 Experiment Supporting Frameworks
25.7.2.1 WEXTOR
WEXTOR[11] is a Web based tool that lets you quickly design and visualize laboratory experiments
and Web experiments in a guided step-by-step process. It dynamically creates the
customized Web pages needed for the experimental procedure anytime, anywhere, on any platform.
It delivers a print-ready display of your experimental design. WEXTOR can be seen
as an attempt to standardize Web experimenting with the support of a framework. For more
information on this project, take a look at the website or at Reips & Neuhaus (2002).
25.7.3 Tools for Retrieving Paradata
25.7.3.1 Scientific LogAnalyzer
Scientific LogAnalyzer is a platform independent, interactive Web service for the analysis of log
files. Scientific LogAnalyzer offers several features not available in other log file analysis tools,
for example organizational criteria and computational algorithms suited to aid behavioral and
social scientists. For more information, see Reips & Stieger (2004). It is hard to find the
additional scientific impact of this tool compared to other log analyzing tools (of which there
are a lot in the field of open source).
25.8 Conclusion
After this short evaluation, the value of QSYS becomes evident: except for LimeSurvey, there is
no other open source tool which offers so many opportunities and has a comparable software
architecture.
[10] http://psych.fullerton.edu/mbirnbaum/programs/surveyWiz1.htm
[11] http://psych-wextor.unizh.ch/wextor/en
Bibliography
Andrews, D., Nonnecke, B. & Preece, J. (2003), Conducting Research on the Internet: Online
Survey Design, Development and Implementation Guidelines, International Journal of
Human-Computer Interaction 16(2), 185–210.
Bachleitner, R. & Weichbold, M. (2007), Befindlichkeit - eine Determinante im Antwortverhalten,
Zeitschrift für Soziologie 36(3), 182–196.
Bälter, O. & Bälter, K. A. (2005), Demands on Web Survey Tools for Epidemiological Research,
European Journal of Epidemiology 20, 137–139.
Bandilla, W. & Bosnjak, M. (2000), Online-Surveys als Herausforderung für die Umfrageforschung
- Chancen und Probleme, in Peter Ph. Mohler and Paul Luettinger (2000), pp. 71–82.
Bandilla, W. & Bosnjak, M. (2003), Survey Administration Effects? A Comparison of Web-Based
and Traditional Written Self-Administered Surveys Using the ISSP Environment Module,
Social Science Computer Review 21(2), 235–243.
Batinic, B. (2003), Internetbasierte Befragungsverfahren, Österreichische Zeitschrift für
Soziologie (4), 6–18.
Batinic, B., Reips, U.-D. & Bosnjak, M., eds (2002), Online Social Sciences.
Bech, M. & Christensen, M. B. (2009), Differential Response Rates in Postal and Web-Based
Surveys Among Older Respondents, Survey Research Methods 3(1), 1–6.
Best, S. J., Krueger, B., Hubbard, C. & Smith, A. (2001), An Assessment of the Generalizability
of Internet Surveys, Social Science Computer Review 19(2), 131–145.
Biemer, P. B. & Lyberg, L. E. (2003), Introduction To Survey Quality, Wiley Interscience,
Hoboken, New Jersey.
Birnbaum, M. H. (2000), SurveyWiz and FactorWiz: JavaScript Web Pages that make HTML
Forms for Research on the Internet, Behavior Research Methods, Instruments, & Computers
32(2), 339–346.
Birnholtz, J. P., Horn, D. B., Finholt, T. A. & Bae, S. J. (2004), The Effects of Cash, Electronic,
and Paper Gift Certificates as Respondent Incentives for a Web-Based Survey of Technologically
Sophisticated Respondents, Social Science Computer Review 22(3), 355–362.
Bosnjak, M. & Tuten, T. L. (2001), Classifying Response Behaviors in Web-Based Surveys,
Journal of Computer-Mediated Communication 6(3).
Bosnjak, M. & Tuten, T. L. (2003), Prepaid and Promised Incentives in Web Surveys: An
Experiment, Social Science Computer Review 21(2), 208–217.
Box-Steffensmeier, J. M. & Jones, B. S. (2004), Event History Modeling. A Guide for Social
Scientists, Cambridge University Press.
Caldwell, B., Cooper, M., Reid, L. G. & Vanderheiden, G. (2008), Web Content Accessibility
Guidelines 2.0. Online; accessed 21-May-2009.
URL: http://www.w3.org/TR/WCAG20/
Christian, L. M. (2003), The Influence of Visual Layout on Scalar Questions in Web Surveys,
Master's thesis, Washington State University, Department of Sociology.
Christian, L. M. & Dillman, D. A. (2004), The Influence of Graphical and Symbolic Language
Manipulations on Responses to Self-Administered Questions, Public Opinion Quarterly
68(1), 57–80.
Christian, L. M., Dillman, D. A. & Smyth, J. D. (2007), Helping Respondents Get it Right the
First Time: The Influence of Words, Symbols and Graphics in Web Surveys, Public Opinion
Quarterly 71(1), 113–125.
Conrad, F. G., Couper, M. P. & Tourangeau, R. (2003), Interactive Features in Web Surveys,
Joint Meetings of the American Statistical Association, San Francisco, CA.
Conrad, F. G., Couper, M. P., Tourangeau, R. & Peytchev, A. (2005), Impact of Progress Feedback
on Task Completion: First Impressions Matter, Proceedings of SIGCHI 2005: Human
Factors in Computing Systems, Portland, OR.
Conrad, F. G., Couper, M. P., Tourangeau, R. & Peytchev, A. (2006), Use and Non-use of
Clarification Features in Web Surveys, Journal of Official Statistics 22(2), 245–269.
Conrad, F. G., Schober, M. F. & Coiner, T. (2007), Bringing Features of Human Dialogue to
Web Surveys, Applied Cognitive Psychology 21, 165–187.
Cook, C., Heath, F., Thompson, R. L. & Thompson, B. (2001), Score Reliability in Web-
or Internet-Based Surveys: Unnumbered Graphic Rating Scales versus Likert-Type Scales,
Educational and Psychological Measurement 61(4), 697–706.
Couper, M. P. (2000), Web Surveys. A Review of Issues and Approaches, Public Opinion
Quarterly 64, 464–494.
Couper, M. P. (2001), Web Survey Research: Challenges and Opportunities, Proceedings of
the Annual Meeting of the American Statistical Association.
Couper, M. P. (2005), Technology Trends in Survey Data Collection, Social Science Computer
Review 23(4), 486–501.
Couper, M. P. & Coutts, E. (2004), Probleme und Chancen verschiedener Arten von Online-
Erhebungen, Kölner Zeitschrift für Soziologie und Sozialpsychologie Sonderheft 44, 217–243.
Couper, M. P., Conrad, F. G. & Tourangeau, R. (2007), Visual Context Effects in Web Surveys,
Public Opinion Quarterly 71(4), 623–634.
Couper, M. P., Kapteyn, A., Schonlau, M. & Winter, J. (2007), Noncoverage and Nonresponse
in an Internet Survey, Social Science Research 36, 131–148.
Couper, M. P. & Miller, P. V. (2008), Web Survey Methods, Public Opinion Quarterly 72(5), 831–835.
Couper, M. P., Tourangeau, R., Conrad, F. G. & Singer, E. (2006), Evaluating the Effectiveness of Visual Analog Scales: A Web Experiment, Social Science Computer Review 24(2), 227–245.
Couper, M. P., Tourangeau, R. & Kenyon, K. (2004), Picture This! Exploring Visual Effects in Web Surveys, Public Opinion Quarterly 68, 255–266.
Couper, M. P., Tourangeau, R., Conrad, F. G. & Crawford, S. D. (2004), What They See Is What We Get. Response Options for Web Surveys, Social Science Computer Review 22, 111–127.
Couper, M. P., Traugott, M. W. & Lamias, M. J. (2001), Web Survey Design and Administration, Public Opinion Quarterly 65, 230–253.
Couper, M. P., Traugott, M. W. & Lamias, M. J. (2004), Web Survey Design and Administration, in Questionnaires, SAGE Publications, London, pp. 362–381.
Crawford, S. D., Couper, M. P. & Lamias, M. J. (2001), Web Surveys: Perceptions of Burden, Social Science Computer Review 19(2), 146–162.
Crawford, S., McCabe, S. E. & Pope, D. (2005), Applying Web-Based Survey Design Standards, Journal of Prevention and Intervention in the Community 29(1/2), 43–66.
Czaja, R. & Blair, J. (1996), Designing Surveys. A Guide To Decisions and Procedures, SAGE Publications Ltd., Thousand Oaks, California.
de Leeuw, E. D. (2005), To Mix or Not to Mix Data Collection Modes in Surveys, Journal of Official Statistics 21(2), 233–255.
de Leeuw, E. D. (2008a), Choosing the Method of Data Collection, in de Leeuw et al. (2008), pp. 113–135.
de Leeuw, E. D. (2008b), Self-administered Questionnaires: Mail Surveys and other Applications, in de Leeuw et al. (2008), pp. 239–263.
de Leeuw, E. D. & Hox, J. J. (2008), Mixed-mode Surveys: When and Why, in de Leeuw et al. (2008), pp. 299–316.
de Leeuw, E. D., Hox, J. J. & Dillman, D. A., eds (2008), International Handbook of Survey Methodology, European Association of Methodology.
Denscombe, M. (2006), Web-Based Questionnaires and the Mode Effect: An Evaluation Based on Completion Rates and Data Contents of Near-Identical Questionnaires Delivered in Different Modes, Social Science Computer Review 24(2), 246–254.
Derouvray, C. & Couper, M. P. (2002), Designing a Strategy for Reducing No Opinion Responses in Web-Based Surveys, Social Science Computer Review 20(1), 3–9.
Deutskens, E., de Ruyter, K., Wetzels, M. & Oosterveld, P. (2004), Response Rate and Response Quality of Internet-Based Surveys: An Experimental Study, Marketing Letters 15(1), 21–36.
DeVellis, R. F. (1991), Scale Development. Theory and Applications, Applied Social Research Methods Series, Volume 26, SAGE Publications Inc., Newbury Park, California.
Dever, J. A., Rafferty, A. & Valliant, R. (2008), Internet Surveys: Can Statistical Adjustments Eliminate Coverage Bias?, Survey Research Methods 2(2), 47–60.
Diekmann, A. (1999), Empirische Sozialforschung. Grundlagen, Methoden, Anwendungen, fifth edn, Rowohlts Enzyklopädie, Reinbek bei Hamburg.
Dillman, D. A. (2007), Mail and Internet Surveys. The Tailored Design Method, second edn, John Wiley and Sons, Inc., Hoboken, New Jersey.
Dillman, D. A. & Bowker, D. K. (2001), The Web Questionnaire Challenge to Survey Methodologists, in Reips & Bosnjak (2001).
Dillman, D. A. & Christian, L. M. (2005a), Survey Mode as a Source of Instability in Responses across Surveys, Field Methods 17(1), 30–52.
Dillman, D. A. & Christian, L. M. (2005b), Survey Mode as a Source of Instability in Responses across Surveys, Paper presented at the Workshop on Stability of Methods for Collecting, Analyzing and Managing Panel Data, American Academy of Arts and Sciences.
Dillman, D. A., Phelps, G., Tortora, R., Swift, K., Kohrell, J., Berck, J. & Messer, B. L. (2008), Response Rate and Measurement Differences in Mixed Mode Surveys Using Mail, Telephone, Interactive Voice Response (IVR) and the Internet, Social Science Research, forthcoming.
Dillman, D. A., Tortora, R. & Bowker, D. (1998), Principles for Constructing Web Surveys, Technical report, SESRC.
Duffy, B., Smith, K., Terhanian, G. & Bremer, J. (2005), Comparing Data from Online and Face-to-Face Surveys, International Journal of Market Research 47(6), 615–639.
Ehling, M. (2003), Online-Erhebungen - Einführung in das Thema, in Statistisches Bundesamt (2003), pp. 11–20.
Ekman, A., Klint, A., Dickman, P. W., Adami, H.-O. & Litton, J. E. (2007), Optimizing the Design of Web-Based Questionnaires - Experience from a Population Based Study among 50,000 Women, European Journal of Epidemiology 22(5), 293–300.
Everitt, B. S. & Hothorn, T. (2007), A Handbook of Statistical Analyses Using R, Chapman & Hall/CRC, Boca Raton.
Faas, T. & Schoen, H. (2006), Putting a Questionnaire on the Web is not Enough - A Comparison of Online and Offline Surveys Conducted in the Context of the German Federal Election 2002, Journal of Official Statistics 22(2), 177–190.
Faraway, J. J. (2005), Linear Models with R, Chapman & Hall/CRC Texts in Statistical Science Series, Boca Raton, Florida.
Flynn, D., van Schaik, P. & van Wersch, A. (2004), A Comparison of Multi-Item Likert and Visual Analogue Scales for the Assessment of Transactionally Defined Coping Function, European Journal of Psychological Assessment 20(1), 49–58.
Fricker, S., Galesic, M., Tourangeau, R. & Yan, T. (2005), An Experimental Comparison of Web and Telephone Surveys, Public Opinion Quarterly 69(3), 370–392.
Fuchs, M. (2003), Kognitive Prozesse und Antwortverhalten in einer Internet-Befragung, Österreichische Zeitschrift für Soziologie (4), 6–18.
Fuchs, M. (2008), Die Video-unterstützte Online-Befragung. Auswirkungen auf den Frage-Antwort-Prozess und die Datenqualität, Presentation at Grenzen und Herausforderungen der Umfrageforschung, Salzburg.
Fuchs, M. & Funke, F. (2007), Multimedia Web Surveys: Results from a Field Experiment on the use of Audio and Video Clips in Web Surveys, in The Challenges of a Changing World. Proceedings of the Fifth International Conference of the Association for Survey Computing, M. Trotman et al.
Funke, F. (2003), Vergleich Visueller Analogskalen mit Kategorialskalen in Offline- und Onlinedesign, Master's thesis, Institut für Soziologie, Justus-Liebig-Universität Gießen.
Funke, F. (2004), Online- und Offlinevergleich Visueller Analogskalen mit 4- und 8-stufig skalierten Likert-Skalen bei einem Fragebogen zum Verhalten in sozialen Gruppen, in Soziale Ungleichheit, Kulturelle Unterschiede - Verhandlungen des 32. Kongresses der Deutschen Gesellschaft für Soziologie in München, Campus, Frankfurt am Main, pp. 4826–4838.
Funke, F. (2005), Visual Analogue Scales in Online Surveys, Poster Presentation at the 7th General Online Research (GOR) Conference, Zürich, Switzerland.
Funke, F. & Reips, U.-D. (2005), Stichprobenverzerrung durch Browserbedingten Dropout, Vortrag bei der Tagung der Methodensektion der Deutschen Gesellschaft für Soziologie in Mannheim.
Funke, F. & Reips, U.-D. (2006), Visual Analogue Scales in Online Surveys: Non-Linear Data Categorization by Transformation with Reduced Extremes, Poster Presentation at the 8th General Online Research (GOR) Conference, Bielefeld, Germany.
Funke, F. & Reips, U.-D. (2007a), Datenerhebung im Netz: Messmethoden und Skalen, in Welker & Wenzel (2007).
Funke, F. & Reips, U.-D. (2007b), Dynamic Forms: Online Surveys 2.0, Paper Presentation at the 9th General Online Research (GOR) Conference, Leipzig, Germany.
Funke, F. & Reips, U.-D. (2007c), Improving Data Quality in Web Surveys with Visual Analogue Scales, Paper presented at the second Conference of the European Research Association, Prague (CZ).
Funke, F. & Reips, U.-D. (2008a), Differences and Correspondences Between Visual Analogue Scales, Slider Scales and Radio Button Scales in Web Surveys, Poster Presentation at the 10th annual General Online Research (GOR) Conference, Hamburg, Germany.
Funke, F. & Reips, U.-D. (2008b), Visual Analogue Scales versus Categorical Scales: Respondent Burden, Cognitive Depth, and Data Quality, Paper presented at the 10th annual General Online Research (GOR) Conference, Hamburg, Germany.
Galesic, M. (2006), Dropouts on the Web: Effects of Interest and Burden Experienced During an Online Survey, Journal of Official Statistics 22(2), 313–328.
Galesic, M., Tourangeau, R., Couper, M. P. & Conrad, F. G. (2008), Eye-Tracking Data. New Insights on Response Order Effects and other Cognitive Shortcuts in Survey Responding, Public Opinion Quarterly 72(5), 892–913.
Ganassali, S. (2008), The Influence of the Design of Web Survey Questionnaires on the Quality of Responses, Survey Research Methods 2(1), 21–32.
Gerich, J. (2007), Visual Analogue Scales for Mode-Independent Measurement in Self-Administered Questionnaires, Behavior Research Methods 39(4), 985–992.
Gerich, J. (2008), Multimediale Elemente in der Computerbasierten Datenerhebung, Presentation at Grenzen und Herausforderungen der Umfrageforschung, Salzburg.
Gnambs, T. (2008), Response Effects of Colour Cues in Online Surveys: Exploratory Findings, Poster Presentation at the 10th annual General Online Research (GOR) Conference, Hamburg, Germany.
Göritz, A. S. (2006a), Cash Lotteries as Incentives in Online Panels, Social Science Computer Review 24(4), 445–459.
Göritz, A. S. (2006b), Incentives in Web Studies: Methodological Issues and a Review, International Journal of Internet Science 1(1), 58–70.
Göritz, A. S. & Birnbaum, M. H. (2005), Generic HTML Form Processor: A Versatile PHP Script to Save Web-Collected Data into a MySQL Database, Behavior Research Methods 37(4), 703–710.
Göritz, A. S. & Stieger, S. (2008), The High-Hurdle Technique put to the Test: Failure to Find Evidence that Increasing Loading Times Enhances Data Quality in Web-Based Studies, Behavior Research Methods 40(1), 322–327.
Granello, D. H. & Wheaton, J. E. (2004), Online Data Collection: Strategies for Research, Journal of Counseling and Development 82(4), 387–393.
Groves, R. M., Dillman, D. A., Eltinge, J. L. & Little, R. J. A., eds (2002), Survey Nonresponse, John Wiley & Sons, New York.
Groves, R. M. & Peytcheva, E. (2008), The Impact of Nonresponse Rates on Nonresponse Bias. A Meta-Analysis, Public Opinion Quarterly 72(2), 167–189.
Hamilton, M. B. (2004), Attrition Patterns in Online Surveys. Analysis and Guidance for Industry, White paper, Tercent Inc. Online; accessed 21-May-2009. URL: http://www.supersurvey.com/papers/supersurvey_white_paper_attrition.htm
Hassenzahl, M. & Peissner, M., eds (2005), Usability Professionals 2005, German Chapter of the Usability Professionals Association e.V.
Hasson, D. & Arnetz, B. B. (2005), Validation and Findings Comparing VAS vs. Likert Scales for Psychosocial Measurements, International Electronic Journal of Health Education (8), 178–192.
Healey, B. (2007), Drop Downs and Scroll Mice. The Effect of Response Option Format and Input Mechanism Employed on Data Quality in Web Surveys, Social Science Computer Review 25(1), 111–128.
Healey, B., Macpherson, T. & Kuijten, B. (2005), An Empirical Evaluation of Three Web Survey Design Principles, Marketing Bulletin 16(Research Note 2), 1–9.
Hedlin, D., Lindkvist, H., Bäckström, H. & Erikson, J. (2008), An Experiment on Perceived Survey Response Burden Among Businesses, Journal of Official Statistics 24(2), 301–318.
Heerwegh, D. (2002), Describing Response Behavior in Websurveys Using Client Side Paradata, Paper presented at the International Workshop on Websurveys held by ZUMA, 17-19 October 2002, Mannheim, Germany.
Heerwegh, D. (2003), Explaining Response Latencies and Changing Answers Using Client-Side Paradata from a Web Survey, Social Science Computer Review 21(3), 360–373.
Heerwegh, D. (2004a), Uses of Client Side Paradata in Web Surveys, Paper presented at the International Symposium in Honour of Paul Lazarsfeld, Brussels, Belgium.
Heerwegh, D. (2004b), Using Progress Indicators in Web Surveys, Paper presented at the 59th AAPOR Conference, Phoenix, Arizona.
Heerwegh, D. & Loosveldt, G. (2002a), An Evaluation of the Effect of Response Formats on Data Quality in Web Surveys, Social Science Computer Review 20(4), 471–484.
Heerwegh, D. & Loosveldt, G. (2002b), An Evaluation of the Effect of Response Formats on Data Quality in Web Surveys, Paper presented at the International Workshop on Household Survey Nonresponse, Copenhagen, Denmark.
Heerwegh, D. & Loosveldt, G. (2002c), Web Surveys: The Effect of Controlling Survey Access Using PIN Numbers, Social Science Computer Review 20(1), 10–21.
Heerwegh, D. & Loosveldt, G. (2006a), An Experimental Study on the Effects of Personalization, Survey Length Statements, Progress Indicators, and Survey Sponsor Logos in Web Surveys, Journal of Official Statistics 22(2), 191–210.
Heerwegh, D. & Loosveldt, G. (2006b), Personalizing e-Mail Contacts: Its Influence on Web Survey Response Rate and Social Desirability Response Bias, International Journal of Public Opinion Research 19(2), 258–268.
Heerwegh, D. & Loosveldt, G. (2008), Face-to-Face versus Web Surveying in a High-Internet-Coverage Population. Differences in Response Quality, Public Opinion Quarterly 72(5), 836–846.
Heerwegh, D., Vanhove, T., Loosveldt, G. & Matthijs, K. (2004), Effects of Personalization on Web Survey Response Rates and Data Quality, Paper presented at the Sixth International Conference on Logic and Methodology (RC-33).
Hofmans, J., Theuns, P., Baekelandt, S., Mairesse, O., Schillewaert, N. & Cools, W. (2007), Bias and Changes in Perceived Intensity of Verbal Qualifiers Effected by Scale Orientation, Survey Research Methods 1(2), 97–108.
Holm, K. (1975a), Die Frage, in Holm (1975b), pp. 32–90.
Holm, K., ed. (1975b), Die Befragung, Francke Verlag GmbH.
Holtgrewe, U. & Brand, A. (2007), Die Projektpolis bei der Arbeit. Open-Source Softwareentwicklung und der Neue Geist des Kapitalismus, Österreichische Zeitschrift für Soziologie (3), 25–45.
Jackob, N. & Zerback, T. (2006), Improving Quality by Lowering Non-Response - A Guideline for Online Surveys, Paper presented at the WAPOR-Seminar Quality Criteria in Survey Research VI, Cadenabbia, Italy.
Joinson, A., McKenna, K., Postmes, T. & Reips, U.-D., eds (2007), Oxford Handbook of Internet Psychology.
Kaczmirek, L. (2005), Web Surveys. A Brief Guide on Usability and Implementation Issues, in Hassenzahl & Peissner (2005), pp. 102–105.
Kaczmirek, L. & Schulze, N. (2005), Standards in Online Surveys. Sources for Professional Codes of Conduct, Ethical Guidelines and Quality of Online Surveys. A Guide of the Web Survey Methodology Site. Online; accessed 21-May-2009. URL: http://www.websm.org/2009/05/Home/Community/Guides/
Karg, M. & Krebs, S. (2008), Der saubere Weg. Herstellerunabhängiges Reporting mit XSL und Co., iX. Magazin für professionelle Informationstechnik (6), 106–109.
Kokemüller, J. (2007), Online-Umfragen: Acht Softwarelösungen, iX. Magazin für professionelle Informationstechnik (11), 110–115.
Kreuter, F., Presser, S. & Tourangeau, R. (2008), Social Desirability Bias in CATI, IVR, and Web Surveys. The Effects of Mode and Question Sensitivity, Public Opinion Quarterly 72(5), 847–865.
Krysan, M. & Couper, M. P. (2006), Race of Interviewer Effects: What Happens on the Web?, International Journal of Internet Science 1(1), 17–28.
Ligges, U. (2007), Programmieren mit R, second edn, Springer Verlag, Berlin, Heidelberg, New York.
Litwin, M. S. (1995), How To Measure Survey Reliability and Validity, SAGE Publications, Inc., Thousand Oaks, California.
Lohr, S. L. (2008), Coverage and Sampling, in de Leeuw et al. (2008), pp. 97–112.
Loosveldt, G. & Sonck, N. (2008), An Evaluation of the Weighting Procedures for an Online Access Panel Survey, Survey Research Methods 2(2), 93–105.
Lumsden, J. & Morgan, W. (2005), Online-Questionnaire Design: Establishing Guidelines and Evaluating Existing Support, Presentation at the 16th Annual International Conference of the Information Resources Management Association.
Lütters, H., Westphal, D. & Heublein, F. (2007), SniperScale: Graphical Scaling in Data Collection and its Effect on the Response Behaviour of Participants in Online Studies, Paper Presentation at the 9th General Online Research (GOR) Conference, Leipzig, Germany.
Lynn, P. (2008), The Problem of Nonresponse, in de Leeuw et al. (2008), pp. 35–55.
Malhotra, N. (2008), Completion Time and Response Order Effects in Web Surveys, Public Opinion Quarterly 72(5), 914–934.
Manfreda, K. L. (2001), Web Survey Errors, PhD thesis, University of Ljubljana.
Manfreda, K. L. & Vehovar, V. (2008), Internet Surveys, in de Leeuw et al. (2008), pp. 264–284.
Mathieson, K. & Doane, D. P. (2003), Using Fine-Grained Likert Scales in Web Surveys, Alliance Journal of Business Research 1(1), 27–34.
Mayntz, R., Holm, K. & Hübner, P. (1978), Einführung in die Methoden der empirischen Soziologie, fifth edn, Westdeutscher Verlag, Opladen.
McCalla, R. A. (2003), Getting Results from Online Surveys - Reflections on a Personal Journey, Electronic Journal of Business Research Methods 2(1), 55–62.
McDonald, H. & Adam, S. (2003), A Comparison of Online and Postal Data Collection Methods in Marketing Research, Marketing Intelligence & Planning 21(2), 85–95.
Meckel, M., Walters, D. & Baugh, P. (2005), Mixed-Mode Surveys Using Mail and Web Questionnaires, Electronic Journal of Business Research Methods 3(1), 69–80.
Mercay, J. & Bouzeid, G. (2002), Boost Struts with XSLT and XML. Online; accessed 21-May-2009. URL: http://www.javaworld.com/javaworld/jw-02-2002/jw-0201-strutsxslt.html
Mummendey, H. D. (2003), Die Fragebogen-Methode, fourth edn, Hogrefe-Verlag GmbH, Göttingen.
Murrell, P. (2006), R Graphics, Chapman & Hall/CRC, Boca Raton.
Noelle-Neumann, E. & Petersen, T. (2000), Alle, Nicht Jeder, third edn, Springer Verlag, Berlin.
Peter Ph. Mohler and Paul Luettinger, eds (2000), Querschnitt - Festschrift für Max Kaase, ZUMA, Mannheim.
Peytchev, A., Couper, M. P. & McCabe, S. E. (2006), Web Survey Design. Paging Versus Scrolling, Public Opinion Quarterly 70(4), 596–607.
Peytchev, A. & Crawford, S. (2005), A Typology of Real-Time Validations in Web-Based Surveys, Social Science Computer Review 23(2), 235–249.
Pocknee, C. & Robbie, D. (2002), Surveyor: A Case Study of a Web-Based Survey Tool for Academics, Paper Presentation at ASCILITE 2002, December 8-11, 2002, Auckland, New Zealand.
Potaka, L. (2008), Comparability and Usability: Key Issues in the Design of Internet Forms for New Zealand's 2006 Census of Populations and Dwellings, Survey Research Methods 2(1), 1–10.
Quintano, C., Castellano, R. & D'Agostino, A. (2006), The Transition from University to Work: Web Survey Process Quality, Metodoloski zveski 3(2), 335–354.
R Development Core Team (2006), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. Online; accessed 21-May-2009. URL: http://www.R-project.org
Rager, M. (2001), Sozialforschung im Internet. Fragebogenuntersuchungen im World Wide Web, Master's thesis, Institut für Kultursoziologie, Universität Salzburg.
Reips, U.-D. (2002a), Context Effects in Web Surveys, in Batinic et al. (2002).
Reips, U.-D. (2002b), Internet-Based Psychological Experimenting. Five Dos and Five Don'ts, Social Science Computer Review 20(3), 241–249.
Reips, U.-D. (2002c), Standards for Internet-Based Experimenting, Experimental Psychology 49(4), 243–256.
Reips, U.-D. & Bosnjak, M., eds (2001), Dimensions of Internet Science, Pabst Science Publishers, Lengerich, Germany.
Reips, U.-D. & Funke, F. (2007), VAS Generator - A Web-Based Tool for Creating Visual Analogue Scales, Paper Presentation at the 9th General Online Research (GOR) Conference, Leipzig, Germany.
Reips, U.-D. & Funke, F. (2008), Interval-Level Measurement with Visual Analogue Scales in Internet-Based Research: VAS Generator, Behavior Research Methods 40(3), 699–704.
Reips, U.-D. & Neuhaus, C. (2002), WEXTOR: A Web-based Tool for Generating and Visualizing Experimental Designs and Procedures, Behavior Research Methods, Instruments, & Computers 34(2), 234–240.
Reips, U.-D. & Stieger, S. (2004), Scientific LogAnalyzer: A Web-based Tool for Analyses of Server Log Files in Psychological Research, Behavior Research Methods, Instruments, & Computers 36(2), 304–311.
Rousseeuw, P. J. & Leroy, A. M. (2003), Robust Regression and Outlier Detection, Wiley & Sons, Hoboken, New Jersey.
Scheffler, H. (2003), Online-Erhebungen in der Marktforschung, in Statistisches Bundesamt (2003), pp. 31–41.
Schünemann, H. J., Griffith, L., Jaeschke, R., Goldstein, R., Stubbing, D. & Guyatt, G. H. (2003), Evaluation of the Minimal Important Difference for the Feeling Thermometer and the St. George's Respiratory Questionnaire in Patients with Chronic Airflow Obstruction, Journal of Clinical Epidemiology 56, 1170–1176.
Schwarz, N., Knauper, B., Hippler, H.-J., Noelle-Neumann, E. & Clark, L. (1991), Rating Scales: Numeric Values May Change the Meaning of Scale Labels, Public Opinion Quarterly 55(4), 570–582.
Schwarz, N., Knäuper, B., Oyserman, D. & Stich, C. (2008), The Psychology of Asking Questions, in de Leeuw et al. (2008), pp. 18–34.
Schwarz, S. & Reips, U.-D. (2001), CGI versus JavaScript: A Web Experiment on the Reversed Hindsight Bias, in Reips & Bosnjak (2001), pp. 75–90.
Shih, T.-H. & Fan, X. (2007), Response Rates and Mode Preferences in Web-Mail Mixed-Mode Surveys: A Meta-Analysis, International Journal of Internet Science 2(1), 59–82.
Sikkel, D. & Hoogendoorn, A. (2008), Panel Surveys, in de Leeuw et al. (2008), pp. 479–499.
Sills, S. J. & Song, C. (2002), Innovations in Survey Research. An Application of Web-Based Surveys, Social Science Computer Review 20(1), 22–30.
Smyth, J. D., Christian, L. M. & Dillman, D. A. (2008), Does yes or no on the Telephone Mean the Same as check-all-that-apply on the Web?, Public Opinion Quarterly 72(1), 103–113.
Smyth, J. D., Dillman, D. A. & Christian, L. M. (2007), Context Effects in Internet Surveys: New Issues and Evidence, in Joinson et al. (2007).
Smyth, J. D., Dillman, D. A., Christian, L. M. & Stern, M. J. (2004), How Visual Grouping Influences Answers to Internet Surveys, Revision of a paper presented at the 2004 Annual Meeting of the American Association for Public Opinion Research, Phoenix, AZ.
Smyth, J. D., Dillman, D. A., Christian, L. M. & Stern, M. J. (2006a), Comparing Check-All and Forced-Choice Question Formats in Web Surveys, Public Opinion Quarterly 70(1), 66–77.
Smyth, J. D., Dillman, D. A., Christian, L. M. & Stern, M. J. (2006b), Effect of Using Visual Design Principles to Group Response Options in Web Surveys, International Journal of Internet Science 1(1), 6–16.
Spector, P. E. (1992), Summated Rating Scale Construction, Quantitative Applications in the Social Sciences, SAGE Publications Inc., Newbury Park, California.
Statistisches Bundesamt, ed. (2003), Online-Erhebungen. 5. Wissenschaftliche Tagung der ADM, Bonn, Vol. 7, ADM, Informationszentrum Sozialwissenschaften.
Steiger, D. M. & Conroy, B. (2008), IVR: Interactive Voice Response, in de Leeuw et al. (2008), pp. 285–298.
Stern, M. J., Dillman, D. A. & Smyth, J. D. (2007), Visual Design, Order Effects and Respondent Characteristics in a Self-Administered Survey, Survey Research Methods 1(3), 121–138.
St. Laurent, A. M. (2004), Understanding Open Source and Free Software Licensing, O'Reilly, Gravenstein Highway North, Sebastopol.
Svensson, E. (2000), Comparison of the Quality of Assessments Using Continuous and Discrete Ordinal Rating Scales, Biometrical Journal (4), 417–434.
Thomas M. Archer (2007), Characteristics Associated with Increasing the Response Rates of Web-Based Surveys, Practical Assessment, Research & Evaluation 12(12), 1–9. Online; accessed 21-May-2009. URL: http://pareonline.net/getvn.asp?v=12&n=12
Thomas, R. K. & Couper, M. P. (2007), A Comparison of Visual Analog and Graphic Rating Scales, Paper Presentation at the 9th General Online Research (GOR) Conference, Leipzig, Germany.
Thomas, R. K., Klein, J. D., Benhnke, S. & Terhanian, G. (2007), The Best of Intentions: Response Format Effects on Measures of Behavioral Intention, Paper Presentation at the 9th General Online Research (GOR) Conference, Leipzig, Germany.
Toepoel, V., Das, M. & Soest, A. V. (2008), Effects of Design in Web Surveys. Comparing Trained and Fresh Respondents, Public Opinion Quarterly 72(5), 985–1007.
Tourangeau, R., Couper, M. P. & Conrad, F. (2004), Spacing, Position, and Order. Interpretive Heuristics for Visual Features of Survey Questions, Public Opinion Quarterly 68(3), 368–393.
Tourangeau, R., Couper, M. P. & Conrad, F. (2007), Color, Labels, and Interpretive Heuristics for Response Scales, Public Opinion Quarterly 71(1), 91–112.
Tourangeau, R., Couper, M. P. & Steiger, D. M. (2003), Humanizing Self-Administered Surveys: Experiments on Social Presence in Web and IVR Surveys, Computers in Human Behavior 19, 1–24.
Tourangeau, R., Couper, M. P. & Conrad, F. (2003), The Impact of the Visible: Images, Spacing, and Other Visual Cues in Web Surveys, Paper Presented at the WSS/FCSM Seminar on the Funding Opportunity in Survey Methodology.
Tourangeau, R., Rips, L. J. & Rasinski, K. (2000), The Psychology of Survey Response, first edn, Cambridge University Press, New York.
Truell, A. D. (2003), Use of Internet Tools for Survey Research, Information Technology, Learning, and Performance Journal 21(1), 31–37.
van Schaik, P. & Ling, J. (2007), Design Parameters of Rating Scales for Web Sites, ACM Transactions on Computer-Human Interaction 14(1), 1–35.
van Selm, M. & Jankowski, N. W. (2006), Conducting Online Surveys, Quality and Quantity (40), 435–456.
Vehovar, V., Bagatelj, Z., Manfreda, K. L. & Zaletel, M. (2002), Nonresponse in Web Surveys, in Groves et al. (2002), pp. 229–242.
von Kirschhofer-Bozenhardt, A. & Kaplitza, G. (1975), Der Fragebogen, in K. Holm, ed., Die Befragung, Francke Verlag GmbH, München, pp. 92–126.
Voogt, R. J. & Saris, W. E. (2005), Mixed Mode Designs: Finding the Balance Between Nonresponse Bias and Mode Effects, Journal of Official Statistics 21(3), 367–387.
Walston, J. T., Lissitz, R. W. & Rudner, L. M. (2006), The Influence of Web-based Questionnaire Presentation Variations on Survey Cooperation and Perceptions of Survey Quality, Journal of Official Statistics 22(2), 271–291.
Weichbold, M. (2003), Befragtenverhalten bei Touchscreen-Befragungen, Österreichische Zeitschrift für Soziologie 28(4), 71–92.
Weichbold, M. (2005), Touchscreen-Befragungen. Neue Wege in der empirischen Sozialforschung, Peter Lang Verlag, Frankfurt am Main.
Weisberg, H. F. (2005), The Total Survey Error Approach. A Guide to the New Science of Survey Research, University of Chicago Press, Chicago.
Welker, M. & Wenzel, O., eds (2007), Online-Forschung 2007. Grundlagen und Fallstudien, Halem Verlag, Köln.
Welker, M., Werner, A. & Scholz, J. (2005), Online-Research. Markt- und Sozialforschung mit dem Internet, dpunkt Verlag GmbH, Heidelberg.
Whitcomb, M. E. & Porter, S. R. (2004), E-Mail Contacts: A Test of Complex Graphical Designs in Survey Research, Social Science Computer Review 22(3), 370–376.
Witmer, D. F., Colman, R. W. & Katzman, S. L. (1999), From Paper-and-Pencil to Screen-and-Keyboard, in S. Jones, ed., Doing Internet Research. Critical Issues and Methods for Examining the Net, SAGE Publications, Thousand Oaks, London, New Delhi, part 7, pp. 145–161.
Witte, J. C., Pargas, R. P., Mobley, C. & Hawdon, J. (2004), Instrument Effects of Images in Web Surveys, Social Science Computer Review 22(3), 363–369.
Wolfgang Bandilla (2003), Die Internet-Gemeinde als Grundgesamtheit, in Statistisches Bundesamt (2003), pp. 71–82.
Wood, S. N. (2006), Generalized Additive Models. An Introduction with R, Chapman & Hall/CRC Texts in Statistical Science Series, Boca Raton, Florida.
Yan, T. (2005), Gricean Effects on Self-Administered Surveys, PhD thesis, Faculty of the Graduate School of the University of Maryland, College Park.
Zunic, R. R. & Clemente, V. A. (2007), Research on Internet Use by Spanish-Speaking Users with Blindness and Partial Sight, Universal Access in the Information Society (1), 103–110.
List of Figures
4.1 Transformation with reduced extremes . . . . . . . . . . . . . . . . . . . . . . . . 21
10.1 User patterns in Web surveys . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
11.1 Screenshot of a sample of a radio question . . . . . . . . . . . . . . . . . . . . . . 76
11.2 Screenshot of a sample of a button question . . . . . . . . . . . . . . . . . . . . . 77
11.3 Screenshot of a sample of a click-VAS question . . . . . . . . . . . . . . . . . . . 78
11.4 Screenshot of a sample of a slider-VAS question . . . . . . . . . . . . . . . . . . . 79
11.5 Screenshot of a sample of a text question . . . . . . . . . . . . . . . . . . . . . . . 80
11.6 Screenshot of a sample of a dropdown question . . . . . . . . . . . . . . . . . . . 81
15.1 Age distribution of respondents - tourism survey . . . . . . . . . . . . . . . . . . 94
15.2 Semester distribution of respondents - tourism survey . . . . . . . . . . . . . . . 94
15.3 Smileys used for indicating mood of interviewees - tourism survey . . . . . . . . 94
15.4 Age distribution of respondents - snowboard survey . . . . . . . . . . . . . . . . . 96
16.1 Comparison of results of feedback question 1 (boring=1 vs. interesting=10) -
webpage survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
16.2 Comparison of results of feedback question 1 (boring=1 vs. interesting=10) -
tourism survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
16.3 Line diagram comparing results of feedback question 1 (boring=1 vs. interesting=10) for all three surveys . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
16.4 Comparison of results of feedback question 2 (sufficient=1 vs. insufficient=10 number of scale points) - webpage survey . . . . . . . . . . . . . . . . . . . . . . . 101
16.5 Comparison of results of feedback question 2 (sufficient=1 vs. insufficient=10 number of scale points) - tourism survey . . . . . . . . . . . . . . . . . . . . . . . 102
16.6 Line diagram comparing results of feedback question 2 (sufficient=1 vs. insufficient=10 number of scale points) for all three surveys . . . . . . . . . . . . . . . . 102
16.7 Comparison of results of feedback question 3 (easy=1 to use vs. complicated=10)
- webpage survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
16.8 Comparison of results of feedback question 3 (easy=1 to use vs. complicated=10)
- tourism survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
16.9 Line diagram comparing results of feedback question 3 (easy=1 to use vs. complicated=10) for all three surveys . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
17.1 Sample density plot of question number 17 with outliers - tourism survey . . . . 108
17.2 Density plot with weights of question number 17 - tourism survey . . . . . . . . . 109
17.3 Boxplots comparing the response times across input controls for question 17 of
the tourism survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
17.4 Percentage of time needed for each control per question - mean values - tourism survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
17.5 Percentage of time needed for each control per question - mean values - webpage survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
18.1 Survival times for the tourism survey comparing input controls . . . . . . . . . . 116
18.2 Survival times for the webpage survey comparing input controls . . . . . . . . . . 118
18.3 Survival times for the snowboard survey comparing input controls . . . . . . . . . 119
19.1 Significant differences between input controls for each scale item - tourism survey 124
19.2 Significant differences between input controls for each scale item - webpage survey 125
20.1 Comparison of the categories slider-VAS, click-VAS and others - tourism survey . 129
20.2 Compare calculated categorization with linear categorization - tourism survey . . 132
20.3 Compare linear categorization points of simple-VAS with 10-scale controls - tourism
survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
20.4 Compare Boxplots for all cut-points - tourism survey . . . . . . . . . . . . . . . . 134
21.1 A sample view on the questionnaire editor . . . . . . . . . . . . . . . . . . . . . . 141
22.1 Component model of the whole QSYS-system . . . . . . . . . . . . . . . . . . . . 144
22.2 Question classes diagram showcase . . . . . . . . . . . . . . . . . . . . . . . . . . 146
22.3 Answer class diagram showcase . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
22.5 MVC-2.x Model as taken from http://www.javaworld.com . . . . . . . . . . . . 154
22.4 Export class diagram showcase . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
22.6 Base Action classes of the StruXSLT-framework . . . . . . . . . . . . . . . . . . . 157
22.7 QSYS-web view class diagram showcase . . . . . . . . . . . . . . . . . . . . . . . 162
22.8 QSYS-web Process class diagram showcase . . . . . . . . . . . . . . . . . . . . . 163
List of Tables
11.1 Different properties of all input controls . . . . . . . . . . . . . . . . . . . . . . . 82
11.2 Technical preconditions and interaction mechanisms for all input controls . . . . 82
11.3 Experimental variables - tourism . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
11.4 Experimental variables - webpage . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
11.5 Experimental variables - snowboard . . . . . . . . . . . . . . . . . . . . . . . . . . 84
13.1 Overall response for all 3 surveys . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
13.2 Input control distribution for the tourism survey . . . . . . . . . . . . . . . . . . 87
13.3 Input control distribution for the webpage survey . . . . . . . . . . . . . . . . . . 88
13.4 Input control distribution for the snowboard survey . . . . . . . . . . . . . . . . . 88
14.1 Use of operating systems for all three surveys . . . . . . . . . . . . . . . . . . . . 90
14.2 Use of browser agents for all three surveys . . . . . . . . . . . . . . . . . . . . . . 90
14.3 Screen resolutions as used by the respondents for all three surveys . . . . . . . . . 91
14.4 Distribution of browser settings for all three surveys (c. = completed; n.c. = not
completed) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
15.1 Overall portion of Mood changes for all controls - tourism survey . . . . . . . . . 95
15.2 Respondents' relation to the university (multiple answers possible) - webpage survey 95
15.3 Gender distribution - webpage survey . . . . . . . . . . . . . . . . . . . . . . . . . 95
15.4 Age distribution - webpage survey . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
17.1 Basic parameters concerning duration - tourism survey (in seconds) . . . . . . . . 105
17.2 Basic parameters concerning duration - webpage survey (in seconds) . . . . . . . 106
17.3 Basic parameters concerning duration - snowboard survey (in seconds) . . . . . . 106
17.4 Basic parameters concerning duration for question 17 of the tourism survey (in
seconds) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
17.5 Overview of duration comparisons for all three surveys . . . . . . . . . . . . . . . 111
18.1 Dropout questions - tourism survey (in percent for each control) . . . . . . . . . 116
18.2 Dropout questions - webpage survey . . . . . . . . . . . . . . . . . . . . . . . . . 118
18.3 Dropout questions - snowboard survey . . . . . . . . . . . . . . . . . . . . . . . . 120
18.4 Overview of dropout for all three surveys - paired comparisons . . . . . . . . . . 120
18.5 Overview of dropout for all three surveys - dropout rates . . . . . . . . . . . . . . 121
19.1 Deviations from mean per sub-question (unit: normalized (0-1) scale point) . . . 122
19.2 Example of a 2x2 table for questionnaire tourism, question 9, sub question 11,
comparison of dropdown and slider-VAS, category 5 . . . . . . . . . . . . . . . . . 123
19.3 Median values of ratio of use of 5 over 6 for all three surveys, compared by input
controls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
20.1 Ratios of the adjacent categories for the click-VAS control . . . . . . . . . . . . . 130
20.2 Number of significant differences when using different categorization strategies of the slider-VAS control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
Listings
22.1 A simplified example of a tree-view for files stored for one group in QSYS . . . . 151
22.2 A simplified example of an answer document . . . . . . . . . . . . . . . . . . . . . 152
22.3 Command for the respondents' overview of one questionnaire . . . . . . . . . . . . 152
22.4 Command for finding out the distribution of the last filled out question . . . . . . 152
22.5 Command for querying the results of an open-ended question . . . . . . . . . . . . 152
22.6 Command for querying the results of a closed-ended question . . . . . . . . . . . . 153
22.7 An example of a simple XSLT map header . . . . . . . . . . . . . . . . . . . . . . 159
22.8 An example of a simple XSLT map entry . . . . . . . . . . . . . . . . . . . . . . . 159
22.9 An example of a random XSLT map entry . . . . . . . . . . . . . . . . . . . . . . 160
22.10 Settings within qsys.properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
22.11 Settings within fsdb.properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
22.12 An example of XDoclet metadata attributes (for the login-process) . . . . . . . . 165