© 1999 Kluwer Academic Publishers. Printed in the Netherlands.
Abstract. The aim of the current research is to examine the ability of people to learn a
computer system by exploration and to assess the efficacy of a user interface with properties
that are supposed to support exploration. The study described in this paper used the think-
aloud method to obtain detailed information about the goals of the user and their realization
during the initial learning phase. The focus here will be on discussing the role of thinking
aloud and reflection during the performance of complex cognitive activities. In one condition of the
experiment described here an interface was used with exploration-supportive properties. In
the other condition these properties were removed from the interface. Subjects (university
students) solved a number of (e-mail) tasks with these interfaces, without training, and had to
think aloud during the first half. After solving the tasks a knowledge test about the interface was
presented. The results on the three kinds of measures (think aloud measures, task performance
and performance on the knowledge test) were broadly in favor of the exploration-supportive
interface. In the discussion, attention is paid to the positive influence of thinking aloud, probably
occurring because subjects were encouraged to use the available information on screen in the
exploration-supportive condition. The consequences and potential disadvantages of display-
based, exploratory learning in relation to planning are also discussed.
Key words: display-based problem-solving, exploratory learning, think-aloud method
Introduction
Make a letter folder titled ‘January’. Put the letter ‘Conclusion’, which you can find
in the letter folder ‘Read’, in this new letter folder. Throw away the folder ‘Read’.
Correct sequence of operations:
Create object (letter folder ‘January’); Open object (letter folder ‘January’); Edit
object (letter folder ‘January’); Open object (letter folder ‘Read’); Get object
(letter ‘Conclusion’) from folder (‘Read’); Put object (letter ‘Conclusion’) in
folder (letter folder ‘January’); Close object (letter folder ‘Read’); and Trash
object (letter folder ‘Read’).
Write a letter to your ‘Bank’ about a mistake at your disadvantage. Copy that letter,
send one copy to your ‘Bank’, and store one copy in your folder ‘Finances’.
Correct sequence of operations:
Create object (letter); Open object (letter); Edit object (text and title of letter
‘Mistake’); Close object (letter ‘Mistake’); Copy object (letter ‘Mistake’); Send
letter (‘Mistake’) to person (‘Bank’); and Put object (letter ‘Mistake’) in folder
(‘Finances’).
most actions can be undone. The control version of the interface lacks these
exploration-supportive features.
In a previous study (for details see De Mul, Van Oostendorp and White,
1994) we examined the effect of the exploration-supportive interface by
comparing it to an interface that lacked these properties. In addition, we
studied the effect of a manual by adding a third condition in which subjects,
working with this ‘bare’ interface without exploration-support, could consult
a one-page user guide. The results of this experiment, however, showed no
significant differences in task performance between the three conditions. See
Figure 2 for a summary of the main results.
In the previous experiment the subjects solved three blocks of 9 tasks. Sub-
jects who worked with the interface that lacked the exploration-supportive
properties performed no worse in terms of accuracy than the subjects who
used the other version of the interface. The one-page manual did not seem to
have an effect on performance: in the condition without the manual the per-
formance was approximately equal (for details see De Mul, Van Oostendorp
and White, 1994). Neither were there any significant differences in speed of
tasks correctly solved between the three conditions (not shown here).
Although in the previous experiment the performance on the tasks in the
different conditions was found to be approximately equal, we do not know
whether the process of exploration that led to this performance was equal as
well. In the study that will be presented here, a thinking aloud method was used
to obtain more detailed information on this process during the initial learning
phase. The protocols should give an indication of the ‘explorability’ of various
parts of the interface. For practical reasons, we left out one condition – the
user-guide condition. Thus, in the current study we compared the condition
with the exploration-supportive interface to the condition with the control
interface, which lacked these properties. Subjects had to think aloud during the
first half of the tasks. During the second half of the tasks they worked silently.
Afterwards they received a knowledge test consisting of items measuring
declarative and procedural knowledge, respectively (Anderson, 1983).
In the rest of this article we briefly describe the general characteristics of
the method and results of this experiment (for details see De Mul and Van
Oostendorp, 1996). We want to focus here on the method of thinking aloud
during the performance of complex activities, and we will critically discuss the role of
thinking aloud and reflection during display-based problem solving. In the last
part of the Discussion Section we will address the consequences and potential
disadvantages of display-based, exploratory learning in relation to planning.
Here we will argue that problem solving that is highly display-oriented and
display-supported may have its restrictions.
Method
Subjects
Materials
The subjects solved 12 tasks (see Table 1 for some examples) and had to
think aloud during the first 6 tasks. A knowledge test about the interface
was also presented. This knowledge test contained two kinds of items. The
first category of questions (12 items) asked for specific details regarding
the appearance of the interface. For instance, the subjects were shown two
icons that differed in color only. One of these icons actually appeared in the
interface, and the subjects had to indicate which one they thought they had
seen before. The second category (24 items) concerned the actions required
to solve a particular task, and the order of these actions. These questions
were of the form: ‘You want to do X. Action A1 comes before action A2.
True or False?’ These categories are supposed to measure ‘declarative’ and
‘procedural’ knowledge, respectively (Anderson, 1983).
Equipment
Procedure
The subjects did not receive any training, nor did they have a manual at their
disposal. They were asked to solve a series of 12 tasks and to think aloud
while solving the first 6 tasks. The computerized knowledge test about the
interface was presented after completion of the tasks.
In the analysis of the think-aloud protocols, we adopted a method suggested
by Draper and Barton (1993). In our e-mail application, thirteen elementary
actions or ‘categories’, such as ‘Copy object’ or ‘Open object’, were dis-
tinguished. In most cases these actions consisted of a simple drag-and-drop
operation. The protocols of all six tasks for each subject were analyzed by
first dividing them into units. Each unit, often, but not always, a sentence, in
the think-aloud protocols was judged in terms of these (13) categories, and it
was determined whether the action was successful (+) or not (–). See Table 2
for an example fragment of this analysis. Notice that more than one action
can be assigned to one sentence, and also that one action can be assigned to
more than one sentence.
To obtain an impression of the reliability of judging the think-aloud units,
Cohen’s kappa was determined for assigning the units to the 13 different types
of actions or functions. The intracoder reliability (‘stability’, with a 3-month
interval) was 0.996, while the intercoder reliability also appeared to be very
high, 0.992.
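As an illustration of the reliability figure, the following minimal sketch computes Cohen's kappa for two codings of the same protocol units. The function name, the variable names and the six example codings are assumptions made for illustration only; they are not the study's data or analysis code.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long lists of category labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: proportion of units given the same category.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the marginal proportions, summed over categories.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both codings use one and the same single category
    return (observed - expected) / (1 - expected)

# Hypothetical codings of six protocol units (NOT data from the study).
coder_1 = ["Open object", "Copy object", "Open object",
           "Send letter", "Copy object", "Open object"]
coder_2 = ["Open object", "Copy object", "Open object",
           "Send letter", "Open object", "Open object"]
print(round(cohens_kappa(coder_1, coder_2), 3))
```

The same function would serve for intracoder stability by passing one coder's first and second pass over the units.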
Next, the protocol units and their categorizations were split up into thirteen
separate groups according to the type of action, e.g., the category ‘Open
object’. Within each of these 13 sets the events for all six tasks were sorted
chronologically, with the event with the lowest number at the top, i.e., the
event occurring first. This resulted in one table for each of the thirteen action
categories.
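The regrouping step can be sketched as follows. The tuple layout (event number, category, success flag) and the example events are assumptions for illustration; the original analysis was presumably done by hand on the transcribed protocols.

```python
from collections import defaultdict

# Each coded event: (event number, action category, success flag).
# Hypothetical values for illustration; not taken from the study's protocols.
events = [
    (3, "Open object", False),
    (1, "Create object", True),
    (2, "Open object", False),
    (5, "Copy object", True),
    (4, "Open object", True),
]

def group_by_category(coded_events):
    """Return {category: [(event_number, success), ...]}, earliest event first."""
    tables = defaultdict(list)
    for number, category, success in coded_events:
        tables[category].append((number, success))
    for rows in tables.values():
        rows.sort(key=lambda row: row[0])  # lowest event number at the top
    return dict(tables)

tables = group_by_category(events)
for category, rows in tables.items():
    marks = " ".join("+" if ok else "-" for _, ok in rows)
    print(f"{category}: {marks}")
```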
Table 2. Example of a think-aloud protocol fragment corresponding to the second example
task, division into units (indicated by /<number>/), and the categorization in terms of
success (+) or failure (–) (translated from Dutch)
From these categorizations of actions, judged in terms of success or
failure, three kinds of figures were calculated. First, for each type of
action the failures occurring before the first successful operation were
counted; this gives an idea of the learnability of the system functions. It
indicates how easily a function, such as ‘Open object’, can
be found, or how successful it is in suggesting its meaning when tried out
by the user. The lower this figure, the more easily this type of action can be
understood or found. Second, for each type of action the number of failures
after the first success was computed. This provides an indication of
how consistent the knowledge about a function is. The lower this figure, the
higher the consistency. Finally, by dividing the number of successes by the
sum of successes and failures, a more general figure is obtained that reflects
the general success of a function. The minimum and maximum values are 0
and 1.
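The three figures can be written down compactly. The sketch below assumes that the attempts for one action category are available as a chronological list of booleans; the function name and the handling of a category with no success at all are assumptions, since the text does not say how that case was treated.

```python
def exploration_measures(outcomes):
    """Compute the three think-aloud figures for one action category.

    outcomes: chronological list of booleans
              (True = successful attempt, False = failed attempt).
    Returns (failures_before_first_success, failures_after_first_success,
             success_ratio).
    """
    if True in outcomes:
        first_success = outcomes.index(True)
        # Everything before the first success is, by definition, a failure.
        failures_before = first_success
        failures_after = outcomes[first_success + 1:].count(False)
    else:
        # Assumption: with no success at all, every failure counts as "before".
        failures_before = len(outcomes)
        failures_after = 0
    successes = outcomes.count(True)
    # Successes divided by successes plus failures; undefined for no attempts.
    ratio = successes / len(outcomes) if outcomes else None
    return failures_before, failures_after, ratio

# Hypothetical sequence for 'Open object': - - + - + +
print(exploration_measures([False, False, True, False, True, True]))
```

For the hypothetical sequence shown, the function reports two failures before the first success, one failure afterwards, and a success ratio of 0.5.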
Table 3. Means of the thinking aloud measures
The pairs of figures that differ significantly (p < 0.05, Mann–Whitney test) are shown in bold.
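For readers unfamiliar with the Mann–Whitney test, the sketch below shows how its U statistic is obtained from two independent samples. The per-subject values are invented purely for illustration and do not reproduce any figure from Table 3; the function name is likewise an assumption.

```python
def mann_whitney_u(sample_x, sample_y):
    """U statistic for sample_x: the number of (x, y) pairs with x > y,
    where tied pairs count as one half."""
    u = 0.0
    for x in sample_x:
        for y in sample_y:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# Hypothetical failure counts per subject (NOT data from the study).
supportive = [0, 1, 1, 2]
control = [1, 3, 2, 4]

u_supportive = mann_whitney_u(supportive, control)
u_control = mann_whitney_u(control, supportive)
# The two U values always sum to the number of pairs.
print(u_supportive, u_control, len(supportive) * len(control))
```

The smaller of the two U values is then compared with a critical value for the given sample sizes, or passed to a statistics package such as `scipy.stats.mannwhitneyu`, to obtain the p-value.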
Results
Task performance
We also analyzed how well the tasks were solved (Figure 3a). In the first
half of the tasks (while thinking aloud) there was no significant difference
between conditions. However, in the second half the exploration-supportive
condition solved significantly (p < 0.05) more tasks correctly. The subjects in
the exploration-supportive condition also took less time (p < 0.05) working
on these tasks (not shown here).
After completion of the tasks, subjects received the knowledge test. In the
exploration-supportive condition subjects scored significantly (p < 0.05)
higher on the procedural questions; the scores on the declarative questions
did not differ significantly (see Figure 3b).
Figure 3. (a) Mean percentage scores for task performance (1st half thinking aloud; 2nd half
not thinking aloud), and (b) the knowledge test.
Not only was there a difference between conditions during the first half
of the tasks in the current study, but the experimental condition also outper-
formed the control condition in the second half (and on the knowledge test),
when the subjects were no longer thinking aloud. Apparently, positive
(learning) transfer occurs from the first half to the second.
Thinking aloud seems to force subjects to watch the screen more closely in
order to describe what they are doing, increasing the chance that they notice
and use the facilities that support the exploration of the system, or – by the
addition of text labels to the objects (one of the exploration-supportive facil-
ities) – to become better aware of the metaphor that is used. The procedural
knowledge about the interface was also higher in the exploration-supportive
condition. Verbalization in the exploration-supportive condition might have
led to a better explicit, procedural knowledge, because the interface used
in this condition mainly helped the users with determining their next step
by referring to task-like units in the support texts that were provided at the
bottom of the screen. The better procedural knowledge could in turn have
resulted in a better task performance.
It seems that thinking aloud positively influenced the metacognition of our
subjects, in the sense of monitoring and controlling one’s own problem-solving
behavior. It is, however, worth noting that this factor alone is not
a sufficient condition: the subjects in the control condition also verbalized
their thoughts, and still we find a significant difference in task performance
and procedural knowledge. So it seems that the facilitation results from a
combination of two factors: increased attention paid to the environment (the
interface), and use of the information that the environment provides through
the higher degree of externalization available in the exploration-supportive
interface.
This result confirms notions that thinking aloud may indeed encourage users
to focus on their own behavior, leading to more reflection on that behavior
and to a higher investment in cognitive monitoring and control and, finally, to
an improvement of performance (Reither, 1977; Berry, 1983; Van Someren
and Elshout, 1985; for a review see Ericsson and Simon, 1984). The study by
Van Someren and Elshout (1985) showed, for instance, that learning by doing
was fostered by reflecting on one’s own problem solving process during task
performance. The tasks consisted of solving end situations in chess games.
Half of the subjects were instructed to verbalize after each move how they
came to their move. This group of subjects performed better than the non-
verbalization group on transfer problems. The acquisition of knowledge and
skills relevant to solving problems can apparently be enhanced by reflection.
In the context of human-computer interaction, similar results were obtained
by Trudel and Payne (1995). They showed that the acquisition of knowledge
derived from interaction with the system can be improved by encouraging
users to think about actions and their consequences. Reflection
was manipulated by imposing a key-stroke limit on subjects, that is, a limit
on the amount of physical interaction with the device (a digital watch).
They found that, despite spending less total time interacting with the device,
subjects on whom a key-stroke limit had been imposed learned better.
Summarizing, it seems rather clear from these studies on the role of reflection
that exploratory learning of a complex system can be positively affected
by the degree to which subjects reflect on their behavior and use available
supportive information. This means that thinking aloud can influence ‘normal’
processing. In this case it happened to be a positive influence, probably
because subjects were encouraged to use the available information on screen
in the exploration-supportive condition. This side effect of thinking aloud also
means that, in general, ‘normal’ processing of complex cognitive problems
can be altered by thinking aloud, in particular if paying attention to visual
information in the environment plays an important role in problem solving
(Ericsson and Simon, 1984).
The main question of this study was whether the exploration-support indeed
supports exploration and leads to better task performance.
The effect on exploration itself is rather difficult to evaluate, because explo-
ration can be seen from two perspectives. If we consider exploration simply
as the ability to gather useful information in an unfamiliar situation, then
exploration is not necessarily linked to task performance, at least not directly.
Efficiency does not need to be the primary goal of someone who just started to
learn an interface. Information may be gathered that is not directly relevant for
the task at hand. On the other hand, if we see exploration as the ability to cope
successfully with a situation that is partly unknown to us, then exploration
is more or less linked to performance. The better the task performance, the
better the exploration must have been. The thinking aloud data showed that
a number of the system’s functions were understood or found more quickly
and remembered for a longer time in the exploration-supportive condition.
These data, reflecting primarily the process of exploration, support the notion
that better task performance is indeed caused by more efficient exploratory
behavior.
The question whether there is an effect of exploration-support on task
performance (and knowledge acquisition) is easier to answer. As mentioned
in the introduction, we conceive the function of exploration-support as being
formally equivalent to providing an external representation on the level of
navigating. The representation is continually updated and offers constraints
on how to proceed during problem solving. Other research has shown that
externalization can considerably enhance performance (Larkin, 1989; Zhang
and Norman, 1994). Also in the current experiment we find a positive effect
of externalization on the level of navigation (by the exploration-support) on
performance and knowledge acquisition, though it appeared to be dependent
on thinking aloud, which somehow forced subjects to process what is
shown on screen more deeply. The fact that we did not find a significant
difference between the conditions with external and internal representations
in the previous experiment could be due to the small difference between the
‘external’ and ‘internal’ conditions, or to the fact that subjects paid too
little attention to the external representation in the exploration-supportive
condition.
The findings discussed here may have significant practical consequences,
which are not restricted to the specific email application we studied: they might
apply to complex computerized information systems in general. Users can
learn these systems by themselves when they receive exploration-support, at
least if that support sufficiently attracts their attention. This might
be achieved by making the support features more prominent, for instance, by
using more distinctive colors.
At this point we want to make two qualifications concerning exploratory
learning. First, in the introduction we approached exploratory learning and
exploration-support as an alternative to learning from a manual. Exploration
and exploration-support may, however, also be seen as complementary to,
rather than an alternative for, a manual of a system. The manual may, for
instance, provide support for the high-level planning of tasks, while the more
detailed operations are learned by exploration.
The second qualification concerning exploratory learning, especially in
the context of display-based problem solving, might be that the resulting
knowledge is relatively volatile. Draper and Barton (1993a, b) indeed found
that users tend to forget what they discovered, even within the same session.
Payne and Howes (1992) identify “exploration traps” into which learners
frequently fall: for instance, they may accomplish a goal but forget how they
did so, or they discover nonideal methods and use them thereafter.
Too strong a reliance on visual displays could also constrain, or negatively
influence, the planning of behavior and the transfer of skills. It is
worthwhile to introduce here the distinction between situation-based
or display-based problem solving and plan-based problem solving (Vera and
Simon, 1993). The first is characterized by the execution of procedures
triggered by information in the external situation, in our context often perceptual
information on the display (or screen), while the second is characterized
by the use of an elaborated internal representation to determine a future sequence
of actions.
O’Hara and Payne (in preparation) report a series of experiments which
manipulated the user interface to computerized versions of classic problems
like the Tower of Hanoi. When the cost of making a move in the problem
space is high, because the computer command is clumsy, subjects solve these
problems in significantly fewer moves, and acquire problem solving skills
more rapidly. The clumsiness of commands was manipulated here by varying
the number of keystrokes per command.
O’Hara and Payne explain their findings by assuming that learners evaluate
the costs and benefits of planning, and plan more when they have more
to gain, in terms of saved efforts. These studies show that, within limits,
learners can decide how much to reflect on the problems they are solving,
and that the more they reflect, the more they will learn. In other words,
they demonstrated that increasing the cost of interaction at a user interface
can improve performance (fewer errors) and learning (fewer trials to achieve a
certain criterion). They also showed that high-cost training interfaces may
lead to better subsequent performance (transfer) with a low-cost interface.
They assume that when the mental cost associated with an operator was
relatively high, problem solving strategy shifted towards the plan-based end
of the continuum, resulting in less error-prone performance. Conversely, when
the cost associated with the operator was relatively low, subjects shifted to
the situation-based end of the continuum, to a strategy which was essentially
reactive and display-based. The implication of this work is that it may be
desirable to increase interface costs when the number of errors has to be
reduced, longer lasting, efficient plans have to be acquired, and transfer is
important. Under these circumstances it could be preferable to encourage
subjects to pay more attention to, and to think harder about, the interaction
with the interface.
References
Anderson, J.R. (1983). The Architecture of Cognition. Cambridge, MA: Harvard University
Press.
Berry, D.C. (1983). Metacognitive experience and transfer of logical reasoning. Quarterly
Journal of Experimental Psychology 35A: 39–49.
Briggs, P. (1990). Do they know what they’re doing? An evaluation of word-processor users’
implicit and explicit task-relevant knowledge, and its role in self-directed learning.
International Journal of Man-Machine Studies 32: 385–398.
Carroll, J.M. (1990). The Nurnberg Funnel: Designing Minimalist Instruction for Practical
Computer Skill. Cambridge, MA: The MIT Press.
Carroll, J., Mack, R., Lewis, C., Grischkowsky, N. and Robertson, S. (1985). Exploring a word
processor. Human-Computer Interaction 1: 283–307.
Charney, D., Reder, L. and Kusbit, G.W. (1990). Goal setting and procedure selection in acquir-
ing computer skills: A comparison of tutorials, problem solving and learner exploration.
Cognition and Instruction 7: 323–342.
De Mul, S., Van Oostendorp, H. and White, T. (1994). Learning user interfaces by exploration.
In R. Oppermann, S. Bagnara and D. Benyon, eds, ECCE-7 Conference Proceedings.
September 5–8, Bonn.
De Mul, S. and Van Oostendorp, H. (1996). Learning user interfaces by exploration. Acta
Psychologica 91: 325–344.
Draper, S.W. and Barton, B. (1993a). Learning by exploration, and affordance bugs. In S.
Ashlund, K. Mullet, A. Henderson, E. Hollnagel and T. White, eds, Adjunct Proceedings
of INTERCHI’93, pp. 75–76. Amsterdam, Netherlands: ACM.
Draper, S.W. and Barton, B. (1993b). Detecting bugs with learning by exploration. Department
of Psychology. University of Glasgow, unpublished report.
Ericsson, K.A. and Simon, H.A. (1984). Protocol Analysis: Verbal Reports as Data. Cam-
bridge: MIT Press.
Gaver, W.W. (1991). Technology Affordances. In Proceedings of the CHI’91 Conference.
ACM Press.
Kerr, M.P. and Payne, S.J. (1994). Learning to use a spreadsheet by doing and watching.
Interacting with Computers 6: 3–22.
Larkin, J.H. (1989). Display-based problem solving. In D. Klahr and K. Kotovsky, eds.,
Complex Information Processing. Hillsdale, NJ: Erlbaum.
Lewis, C.H. and Polson, P.G. (1990). Theory-based design for easily learned interfaces. Special
Issue: Foundations of human-computer interaction. Human Computer Interaction 5: 191–
220.
Newell, A. and Simon, H.A. (1972). Human Problem Solving. Englewood Cliffs, NJ: Prentice-
Hall.
O’Hara, K.P. and Payne, S.J. (in preparation). Cost of operations affects planfullness of problem
solving. Department of Psychology, University of Glasgow.
Payne, S.J. and Howes, A. (1992). A task-action trace for exploratory learners. Behaviour &
Information Technology 11: 63–70.
Reither, F. (1977). Der Einfluss der Selbstreflexion auf die Strategie und Qualität des
Problemlösens. In H.K. Garten, ed., Diagnose von Lernprozessen. Braunschweig: Westermann.
Shneiderman, B. (1983). Direct manipulation: A step beyond programming languages. IEEE
Computer 16: 57–69.
Trudel, C.I. and Payne, S.J. (1995). Reflection and goal management in exploratory learning.
International Journal of Human-Computer Studies 42: 307–339.
Van Oostendorp, H. and Walbeehm, B.J. (1995). Towards modelling exploratory learning in
the context of direct manipulation interfaces. Interacting with Computers 7: 3–25.
Van Someren, M.W. and Elshout, J.J. (1985). Het effekt van zelfreflektie op leren
probleemoplossen (The effect of self-reflection on learning to solve problems). In J.G.L.C. Lodewijks
and P.R.J. Simons, eds, Zelfstandig Leren (Independent Learning). Proceedings Educa-
tional Research Conference, 1985. Lisse, The Netherlands: Swets & Zeitlinger.
Vera, A.H. and Simon, H.A. (1993). Situated action: a symbolic interpretation. Cognitive
Science 17: 7–48.
Zhang, J. and Norman, D.A. (1994). Representations in distributed cognitive tasks. Cognitive
Science 18: 87–122.