
CHAPTER FIVE

Reinforcement Learning
and Tourette Syndrome
Stefano Palminteri*,¹, Mathias Pessiglione

*Laboratoire des Neurosciences Cognitives (LNC), École Normale Supérieure (ENS), Paris, France
Motivation Brain and Behaviour Team (MBB), Institut du Cerveau et de la Moelle (ICM), Paris, France
¹Corresponding author: e-mail address: stefano.palminteri@gmail.com

Contents
1. Reinforcement Learning: Concepts and Paradigms
2. Neural Correlates of Reinforcement Learning
   2.1 Electrophysiological correlates in monkeys
   2.2 Functional magnetic resonance imaging correlates in humans
   2.3 Parkinson's disease and reinforcement learning
3. Tourette Syndrome and Reinforcement Learning
   3.1 Experimental study 1: Tourette syndrome and subliminal instrumental learning (Palminteri, Lebreton, et al., 2009)
   3.2 Experimental study 2: Tourette syndrome and reinforcement of motor skill learning (Palminteri et al., 2011)
   3.3 Experimental study 3: Tourette syndrome and probabilistic reinforcement learning (Worbe et al., 2011)
4. Conclusions and Perspectives
Acknowledgments
References

Abstract
In this chapter, we report the first experimental explorations of reinforcement learning in Tourette syndrome, carried out by our team over the last few years. This report is preceded by an introduction that provides the reader with the state of the art concerning the neural bases of reinforcement learning at the time of these studies, and the scientific rationale behind them.
In short, reinforcement learning is learning by trial and error to maximize rewards and minimize punishments. This decision-making and learning process implicates the dopaminergic system projecting to the frontal cortex–basal ganglia circuits. A large body of evidence suggests that dysfunction of the same neural systems is implicated in the pathophysiology of Tourette syndrome. Our results show that the Tourette condition, as well as the most common pharmacological treatments (dopamine antagonists), affects reinforcement learning performance in these patients.

Specifically, the results suggest a deficit in negative reinforcement learning, possibly underpinned by a functional hyperdopaminergia, which could explain the persistence of tics despite their evidently maladaptive (negative) value. This idea, together with the implications of these results for Tourette therapy and future perspectives, is discussed in Section 4 of this chapter.

ABBREVIATIONS
ADHD attention deficit-hyperactivity disorder
DA dopamine
DBS deep brain stimulation
fMRI functional magnetic resonance imaging
OCD obsessive–compulsive disorder
PD Parkinson's disease
RL reinforcement learning
TD temporal difference (learning)
TS Tourette syndrome
VPFC ventral prefrontal cortex
VS ventral striatum

1. REINFORCEMENT LEARNING: CONCEPTS AND PARADIGMS
Reinforcement learning (RL) refers to the ability to learn associations between stimuli, actions, and the occurrence of pleasant events, called rewards, or unpleasant events, called punishments. The term reinforcement denotes the process of forming and strengthening these associations through the reinforcer, a category encompassing both rewards (positive reinforcers) and punishments (negative reinforcers). These associations affect the learner's behavior in a variety of ways: they shape vegetative and automatic responses as a function of reward and punishment anticipation, and they also bias the learner's actions. RL has an evident adaptive value, and it is unsurprising that it has been observed in extremely distant zoological phyla, such as Nematoda, Arthropoda, Mollusca and, of course, Chordata (Brembs, 2003; Murphey, 1967; Rankin, 2004).
Modern neurocomputational accounts of RL sit at the convergence of two scientific threads of the twentieth century: animal learning and artificial intelligence (Dickinson, 1980; Sutton & Barto, 1998). The heritage of the first thread includes behavioral paradigms and psychological concepts; the heritage of the second lies in the mathematical formalization of these concepts and paradigms.
The computational and the psychological views share the basic idea that the learner (the animal or the automaton) wants something (goal-directedness). This feature distinguishes RL from other learning processes, such as procedural or observational learning (Dayan & Abbott, Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems, MIT Press, 2005). From this standpoint two features emerge: RL is selectional (the agent must try and select among several alternative choices) and associative (these choices must be associated with a particular state).
In the animal learning literature, RL was originally referred to as conditioning. The experimental paradigms in conditioning fall into two main classes: classical conditioning and instrumental conditioning.
The minimal conditioning processes imply the building up of associations between a reinforcer and a stimulus or an action. In classical conditioning, the reinforcer is delivered irrespective of the learner's behavior, and the observed response consists of innate preparatory responses. The typical example is Pavlov's dog learning to salivate (innate response) in response to a bell (stimulus), which announced the delivery of food (reinforcer) (Pavlov, 1927). In instrumental conditioning, the reinforcer's delivery is contingent on a behavioral response. This feature was already apparent in the early experimental observations of this process, provided by Thorndike and Skinner: an animal enclosed in a box had to learn to perform specific actions (string pulling, lever pressing) in order to escape captivity or obtain food (Skinner, 1938; Thorndike, 1911).
Turning to the causal forces of conditioning, several conditions have been shown to be necessary: temporal contiguity (an action or a stimulus must be temporally close to the outcome for an association to be established), contingency (the probability of the outcome should be higher after the action or the stimulus, i.e., the action or stimulus should be a predictor of the outcome), and prediction error (an action or a stimulus is associated with an outcome only if that outcome was not already fully predicted by the learner) (Rescorla, 1967).
Rescorla and Wagner first introduced the latter idea (Rescorla & Wagner, 1972). They were interested in understanding a particular conditioning effect called the blocking effect (Kamin, 1967). In Kamin's blocking paradigm, an animal is exposed to a first conditioned stimulus (e.g., a bell ring), which predicts the occurrence of a reinforcer (e.g., food). After learning the association between the bell and the food, another stimulus (e.g., a light) is presented together with the food. Hence, both the bell and the light are stimuli that predict the food. However, when tested, the animal has not learned the association between the light and the food, as if it were blocked by the first association. Rescorla and Wagner proposed that conditioning occurs not merely because two events co-occur, but because that co-occurrence is unanticipated on the basis of current knowledge. In the example above, the occurrence of food is already fully predicted by the bell, so no novel association with the light is learned. The primitive learning signal of their model is a prediction error, defined as the difference between the obtained and the predicted reinforcer. The reinforcer (reward or punishment) prediction error is a measure of the prediction's accuracy, and the Rescorla and Wagner model is an error minimization model.
From the artificial intelligence perspective, RL is a field of machine learning aimed at finding computational solutions to a class of problems closely related to the psychological paradigms described for instrumental conditioning (Sutton & Barto, 1998). The agent is conceived as navigating through states of the environment, selecting actions and collecting a quantitative reward,¹ which should be maximized. From this learning perspective, two main functions arise as necessary for an RL agent: predicting the expected reward in a given state (reward prediction) and optimally selecting actions for reward maximization (choice).
The most influential modern RL models incorporate temporal difference (TD) learning. The TD learning algorithm builds accurate reward predictions from delayed rewards; the learning rule of this model is not dissimilar to that used in the Rescorla and Wagner model, and it is based on a reward prediction error term. Q-learning is an extension of TD learning that separately learns the reward to be expected following each available action. The optimal choice is then simply the action with the highest reward expectation (Watkins & Dayan, 1992). Q-learning is also based on a TD error.
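As an illustration of these two ingredients, value update and choice, the following Python sketch implements Q-learning with a softmax choice rule on a two-armed bandit, the kind of probabilistic instrumental task discussed later in this chapter. It is a toy construction under our own assumptions, not the exact model fitted in the studies below.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(q, beta=3.0):
    """Turn action values into choice probabilities (beta = inverse temperature)."""
    p = np.exp(beta * (q - q.max()))   # subtract max for numerical stability
    return p / p.sum()

def q_learning_bandit(p_reward=(0.8, 0.2), n_trials=200, alpha=0.2):
    """Q-learning on a two-armed bandit: each action carries its own reward
    expectation q[a], updated by a TD-like prediction error after each outcome."""
    q = np.zeros(2)
    for _ in range(n_trials):
        a = rng.choice(2, p=softmax(q))        # probabilistic choice
        r = float(rng.random() < p_reward[a])  # probabilistic reward
        q[a] += alpha * (r - q[a])             # prediction-error update
    return q

print(q_learning_bandit())  # q approaches the true reward probabilities
```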
Thus, thanks to RL algorithms, the experimenter can extract the key computational variables of these models and make quantitative predictions about how neural and behavioral data should evolve under the assumptions of the model. These computational constructs are referred to as hidden variables, as opposed to the experimental observables (choices, reaction times) from which they are derived. In the next section, we shall see where these computational hidden variables, focusing on prediction errors, have been mapped in the primate brain.

¹ The reward of computational modeling is a quantitative term that can take negative values and therefore represent punishments as well.

2. NEURAL CORRELATES OF REINFORCEMENT LEARNING²
2.1. Electrophysiological correlates in monkeys
In this section, we review the most significant contributions to the understanding of the neural substrates of RL, coming from electrophysiological studies in monkeys and from functional neuroimaging and neuropsychology in humans.
A large series of experiments performed by Wolfram Schultz and colleagues in the 1990s provided the first evidence of a neural system representing RL variables in the primate brain. At that time, dopamine (DA) dysfunction was mainly associated with debilitating conditions including Parkinson's disease, Tourette syndrome, schizophrenia, attention deficit-hyperactivity disorder (ADHD), and addictions (Kienast & Heinz, 2006). It soon appeared that, rather than motor parameters, the behavioral variables most prominently associated with the dopaminergic response were reward related (Mirenowicz & Schultz, 1996).
In a seminal paper published in Science in 1997, Schultz and colleagues showed that, during a classical conditioning task, the activity of midbrain dopaminergic neurons at the moment of the outcome encoded the discrepancy between the reward and its prediction, such that an unpredicted reward elicits an increase in activity (positive prediction error), a fully predicted reward elicits no response (no prediction error), and the omission of a predicted reward induces a depression (negative prediction error; Fig. 5.1A) (Schultz, Dayan, & Montague, 1997). The prediction error hypothesis of the dopaminergic neurons' response during learning has since been replicated with other paradigms and by other groups (Bayer & Glimcher, 2005; Fiorillo, Tobler, & Schultz, 2003; Morris, Nevet, Arkadir, Vaadia, & Bergman, 2006).
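These three response profiles fall out of a minimal temporal difference simulation. The sketch below is our own toy construction (a one-cue, one-step task with illustrative parameter values, not a model of the recorded neurons): the prediction error migrates from the reward to the cue over conditioning, and turns negative when the predicted reward is omitted.

```python
# Toy TD-style account of the Schultz, Dayan, and Montague (1997) pattern.
# One cue predicts a reward delivered one step later; the cue itself arrives
# at unpredictable times, so the baseline prediction before the cue stays ~0.
ALPHA = 0.2   # learning rate (illustrative value)
v_cue = 0.0   # learned reward prediction attached to the cue

def run_trial(v_cue, reward):
    delta_cue = v_cue - 0.0         # error at cue onset (vs. flat baseline)
    delta_outcome = reward - v_cue  # error at outcome time
    v_cue += ALPHA * delta_outcome  # update the cue's reward prediction
    return v_cue, delta_cue, delta_outcome

for _ in range(100):  # conditioning: the cue is always followed by reward
    v_cue, d_cue, d_out = run_trial(v_cue, reward=1.0)
# After learning: d_cue ~ +1 (burst at the cue), d_out ~ 0 (predicted reward).

v_cue, d_cue, d_out = run_trial(v_cue, reward=0.0)
# Omission probe: d_out ~ -1, the depression (negative prediction error).
```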
In summary, single-cell recording studies in monkeys consistently showed that during RL, dopaminergic neurons represent a theoretical learning signal (a hidden variable): the reward prediction error. In the next section, we will see that the same learning signal underpins human RL.

² A vast and rich literature exists concerning the neural bases of reinforcement learning in rodents. We opted to restrict this chapter to primate studies, because they were the first to test and adopt computational concepts and models. Note that recent studies strongly suggest that the same neurobiological and computational models are valid for both orders (Steinberg et al., 2013).

Figure 5.1 (A) A schematic representing dopaminergic signals (gray) following positive and negative outcomes, compared to the baseline level, in healthy subjects. The green and the red lines, respectively, represent the level to reach (either above or below the baseline) for a signal strong enough to induce reward or punishment learning. This schematic is based on the results originally reported in Schultz, Dayan, and Montague (1997). (B) The same processes are represented for unmedicated and medicated PD and TS patients, where the DA baselines are supposed to be modified by the clinical and the pharmacological condition. When the dopaminergic signal does not reach the green line, reward learning does not occur (this is the case for unmedicated PD and medicated TS). When the dopaminergic signal does not reach the red line, punishment learning does not occur (this is the case for unmedicated TS and medicated PD). This schematic represents a possible interpretation of the results obtained in experimental study 1 (Palminteri, Lebreton, et al., 2009).

2.2. Functional magnetic resonance imaging correlates in humans
The findings from electrophysiological studies in nonhuman primates presented earlier motivated subsequent functional magnetic resonance imaging (fMRI) experiments in humans, aimed at finding corresponding neural representations of the reward prediction error in the human brain. The first evidence for reward prediction error encoding in the human brain was provided by Berns, McClure, Pagnoni, and Montague (2001). In this study, they gave squirts of juice and water either in a predictable or in an unpredictable manner, and they found that unpredictable reward sequences selectively induced activation in the ventral striatum (VS) and in the ventral prefrontal cortex (VPFC) (both target structures of the midbrain dopaminergic neurons) compared to predictable reward sequences, indicating that positive prediction errors, rather than reward itself, induced increased activity in these areas. These results were later replicated (O'Doherty, Deichmann, Critchley, & Dolan, 2002).
These first studies used the so-called categorical approach to fMRI data analysis. Though this approach has the advantage of being easy to implement and explain, it has the great disadvantage of preventing one from capturing the online temporal evolution of RL signals (Friston et al., 1996). This is crucial for RL variables, such as reward predictions and reward prediction errors, which are supposed to change radically over time. In fact, as learning occurs, reward prediction signals increase and prediction errors decrease: a feature completely missed by cognitive subtraction. A second wave of studies used a different approach, called model-based fMRI, which allows one to follow learning-related changes in reward prediction and prediction error encoding (O'Doherty, Hampton, & Kim, 2007). This approach begins by computing the model estimates of the hidden computational variables according to the RL algorithm (most often a simple TD learning model for classical conditioning tasks and a Q-learning model for instrumental conditioning tasks), from subjects' behavioral data. The fMRI data analysis then consists of searching for brain areas whose neural activity covaries with the model's estimates of the computational variables.
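To make this logic concrete, here is a schematic Python sketch of a model-based analysis on synthetic data (our simplified illustration: a real pipeline would fit the learning rate to the subject's choices and convolve the regressor with a hemodynamic response function before the GLM).

```python
import numpy as np

rng = np.random.default_rng(1)

# Step 1: simulate outcomes and compute trial-by-trial prediction errors
# with a Rescorla-Wagner learner (alpha would normally be fitted to choices).
outcomes = (rng.random(60) < 0.7).astype(float)
alpha, v = 0.2, 0.0
pe = np.empty_like(outcomes)
for t, r in enumerate(outcomes):
    pe[t] = r - v          # hidden variable: the prediction error
    v += alpha * pe[t]

# Step 2: synthetic "regional signal" that scales with the prediction error.
signal = 0.8 * pe + rng.normal(0.0, 0.5, size=pe.size)

# Step 3: GLM - does the signal covary with the model's hidden variable?
X = np.column_stack([pe, np.ones_like(pe)])   # regressor + intercept
beta_hat, *_ = np.linalg.lstsq(X, signal, rcond=None)
print(beta_hat[0])  # estimated coupling between signal and prediction error
```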
Following this model-based approach, a study from O'Doherty and colleagues using a classical conditioning procedure revealed that responses in the VS and in the VPFC were significantly correlated with this error signal (O'Doherty, Dayan, Friston, Critchley, & Dolan, 2003). Similar results were obtained by the same group in a subsequent experiment in which they contrasted a classical conditioning and an instrumental conditioning procedure (O'Doherty et al., 2004). These first results have since been replicated consistently with different kinds of rewards (primary, as well as secondary), different paradigms (classical, as well as instrumental conditioning), and by different groups (see, e.g., Abler, Walter, Erk, Kammerer, & Spitzer, 2006; Kim, Shimojo, & O'Doherty, 2006; Palminteri, Boraud, Lafargue, Dubois, & Pessiglione, 2009; Rutledge, Dean, Caplin, & Glimcher, 2010).
Thus, reward prediction errors have been reported consistently in the basal ganglia (VS) and in the VPFC, which are the main projection sites of the dopaminergic neurons (Draganski et al., 2008). The consensual interpretation of these results, built in analogy with the electrophysiological studies in nonhuman primates, has been that these signals reflect the midbrain dopaminergic input to these areas. This idea has been further supported by another experiment in which the authors utilized a special MRI sequence to enhance the sensitivity in the midbrain. They reported that the responses of dopaminergic nuclei were compatible with the reward prediction error hypothesis (D'Ardenne, McClure, Nystrom, & Cohen, 2008).
However, functional imaging only provides us with functional correlates, which, in principle, could be merely epiphenomenal. This limitation is not specific to fMRI; it is also common to electrophysiological techniques. To assess causal relations between a neural system and a behavior, neuroscientists must observe the behavioral output of the system's perturbation (Siebner et al., 2009). The perturbation can be the administration of a given molecule or an accidental brain injury.
The causal implication of dopaminergic transmission in fMRI prediction error signals was established by a pharmacological perturbation fMRI study in which subjects performed an instrumental learning task with probabilistic monetary rewards under a dopaminergic treatment. The treatment was either a DA enhancer (levodopa), a DA antagonist (haloperidol), or a placebo (Pessiglione, Seymour, Flandin, Dolan, & Frith, 2006). fMRI results showed again that reward prediction errors were represented in the VS; furthermore, they showed that the DA treatments modified the amplitude of these signals, such that l-dopa amplified prediction errors and haloperidol blunted them, establishing a direct link between dopaminergic transmission and fMRI prediction error signals. Moreover, these medications affected learning performances in accordance with their neural effects (enhancement under l-dopa, impairment under haloperidol), suggesting a causal role of DA modulation in reward learning.
In summary, the study of the neural bases of RL in humans has consistently shown that (1) reward prediction errors are represented in the striatum and in the prefrontal cortex (mainly in their ventral parts, VS and VPFC) and that (2) dopaminergic pharmacological manipulations significantly affect these signals and, consequently, behavioral performance.

2.3. Parkinson's disease and reinforcement learning


Given the prominent implication of the dopaminergic system in RL, it is unsurprising that the first neuropsychological investigations involved Parkinson's disease (PD). In a seminal study, Frank and colleagues administered an instrumental learning task to a cohort of PD patients either medicated (on) or unmedicated (off) with l-dopa (Frank, Seeberger, & O'Reilly, 2004). Their results showed that off patients were impaired in learning from positive outcomes, whereas on patients were impaired in learning from negative outcomes. This result is consistent with the idea that reward and punishment learning are driven by dopaminergic positive and negative prediction errors, respectively. According to this interpretation, the level of dopaminergic transmission cannot increase enough to produce positive prediction errors in off patients, because of the neural loss in their midbrain DA nuclei, so positive outcomes are not able to induce learning. On the contrary, in on patients, where the level of DA has been artificially increased by the treatment with l-dopa, negative prediction errors (pauses in DA transmission) are not possible, leading to an impairment in learning from negative outcomes (Fig. 5.1B). These results have since been replicated (partially or totally) by our group and others (Bodi et al., 2009; Frank, Samanta, Moustafa, & Sherman, 2007; Palminteri, Lebreton, et al., 2009; Rutledge et al., 2009; Voon et al., 2010).
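A common way to formalize this valence asymmetry in a model (a convention we adopt here purely for illustration; it is not the specific model fitted in these studies) is to give the learner distinct learning rates for positive and negative prediction errors; shrinking one of them reproduces the corresponding on/off profile.

```python
def asymmetric_update(v, outcome, alpha_pos=0.3, alpha_neg=0.3):
    """Value update with valence-specific learning rates. A low alpha_pos
    mimics the hypothesized off state (too little DA for effective bursts);
    a low alpha_neg mimics the on state (too much DA for effective dips).
    Names and values are illustrative only."""
    delta = outcome - v  # prediction error
    alpha = alpha_pos if delta > 0 else alpha_neg
    return v + alpha * delta

# An "on-like" learner (alpha_neg = 0) never devalues a punished option:
v = 0.5
for _ in range(20):
    v = asymmetric_update(v, outcome=-1.0, alpha_neg=0.0)
print(v)  # still 0.5: without effective dips, punishment teaches nothing
```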
In summary, these results indicated that a dopaminergic motor disease such as PD can display nonmotor symptoms in a fashion that is fully compatible with the hypothesis of dopaminergic encoding of prediction errors during RL. On this basis, a natural extension of these studies was to investigate RL in Tourette syndrome (TS).

3. TOURETTE SYNDROME AND REINFORCEMENT LEARNING
In 1885, Georges Albert Édouard Brutus Gilles de la Tourette, working at the Pitié-Salpêtrière in Paris, published an article reporting an accurate clinical description of nine patients presenting a tic disorder characterized by onset in childhood, multiple physical (motor) tics, and at least one vocal (phonic) tic. This article is still considered the first accurate description of this particular tic disorder, which has since been named TS after its author (Rickards & Cavanna, 2009).
There is no cure for TS and no medication that works universally for all individuals without significant adverse effects (Hartmann & Worbe, 2013). The classes of medication with the most proven efficacy in treating tics are typical and atypical neuroleptics (DA antagonists), including risperidone, haloperidol, and pimozide. They can have long-term and short-term adverse effects, at the motor level (DA antagonist-induced parkinsonian syndrome), at the cognitive level (executive dysfunction), and at the affective level (blunted affect). Recently, a promising new molecule, aripiprazole, a DA partial agonist, has been proposed for the symptoms of TS. For drug-resistant, adulthood-persistent TS patients, deep brain stimulation (DBS) treatment is under study in several clinical centers (McNaught & Mink, 2011).
The precise etiology of TS is unknown. An influential pathophysiological hypothesis implicates a condition of elevated DA levels. This hypothesis was first suggested by the observation of the beneficial effects of DA antagonists on TS symptoms. Consistent with this hypothesis, several genetic, postmortem, and neuroimaging studies have supported a dopaminergic hyperfunctioning, though these observations have recently been challenged (Gilbert et al., 2006; Malison et al., 1995; McNaught & Mink, 2011; Tarnok et al., 2007; Yoon, Gause, Leckman, & Singer, 2007; Yoon, Rippel, et al., 2007).
TS displays frequent comorbidities with other psychiatric diseases. Among them, the most common are obsessive–compulsive disorder (OCD) and ADHD, which have also been associated with cortico-striatal dysfunction (Worbe, Mallet, et al., 2010). Recent anatomical and functional connectivity studies suggest that different phenotypes of TS (in terms of tic complexity or psychiatric comorbidities) could be associated with the impairment of distinct cortico-striatal circuits (Worbe et al., 2012; Worbe, Gerardin, et al., 2010).
In summary, TS tics are believed to result from DA-induced dysfunction in cortical and subcortical regions. More precisely, even if still debated, TS seems to be associated with a functional hyperdopaminergia. In addition, TS is treated with DA antagonists. These observations, together with the well-established implication of DA and frontostriatal circuits in RL, motivated a series of experiments aimed at exploring RL in this pathology, described below.

3.1. Experimental study 1: Tourette syndrome and subliminal instrumental learning (Palminteri, Lebreton, et al., 2009)
The first investigation of RL in TS patients employed a subliminal instrumental learning task, with monetary gains and losses, which had already been shown to activate the VS in a previous fMRI study (Pessiglione et al., 2008). The learning task was subliminal in that the contextual cues associated with gains and losses were presented for a very short duration (50 ms), sandwiched between two visual masks (Kouider & Dehaene, 2007). The advantage of using subliminal visual presentation is to ensure that basic RL processes are not perturbed by high-level cognitive processes (Dehaene, Changeux, Naccache, Sackur, & Sergent, 2006).
In addition to TS patients, in this experimental study we tested a cohort of PD patients. The PD patients were tested twice: once on and once off dopaminergic medication. TS patients were split into two groups according to their medication status: unmedicated or medicated with DA antagonists. Behavioral data analysis showed the following pattern: unmedicated PD patients were impaired in reward learning, as were medicated TS patients, while medicated PD patients and unmedicated TS patients were impaired in punishment learning (Fig. 5.2A).
Several significant conclusions could be drawn from these results: (1) the double dissociation between medication status (on vs. off l-dopa) and outcome valence (reward vs. punishment), first shown by Frank and colleagues in PD patients (Frank et al., 2004), is robust across tasks, and extends in particular to the unconscious case; (2) this double dissociation is robust across pathologies and pharmacological models, since it was replicated in TS patients. These results support the hypothesis that bursts in DA transmission encode positive prediction errors and therefore drive reward learning, whereas dips in DA transmission encode negative prediction errors and therefore drive punishment learning (Fig. 5.1). Thus, this study provides the first experimental evidence of a functional hyperdopaminergia outside the motor domain in TS, by showing that unmedicated TS patients behave in a similar way to l-dopa-medicated PD patients.

3.2. Experimental study 2: Tourette syndrome and reinforcement of motor skill learning (Palminteri et al., 2011)
In a subsequent study, we investigated the effect of reward-based reinforcement on motor skill learning and the role of DA in this process. RL theory has been extensively used to understand choice behavior: DA signals reward prediction errors in order to update action values and ensure better choices in the future. However, educators may share the intuitive idea that reinforcers not only affect choices but also motor skills, such as playing piano or football. Here, we employed a novel paradigm to demonstrate that monetary rewards can improve motor skill learning in humans. Indeed, healthy participants got progressively faster at executing sequences of key presses that were repeatedly rewarded with 10 euros compared with 1 cent. Interestingly, control tests revealed that the effect of reinforcement on motor skill learning was independent of subjects' awareness of the sequence–reward associations: a result reminiscent of what we showed in experimental study 1 concerning the possibility of unconscious instrumental learning.

Figure 5.2 (A) A schematic summarizing the behavioral results of experimental study 1 (Palminteri, Lebreton, et al., 2009). The graphs show the interaction between reinforcement valence (positive or negative) and medication status. The same pattern can be observed in PD and TS (ON, medicated; OFF, unmedicated). (B) This schematic summarizes the main results of experimental study 2 (Palminteri et al., 2011). Motor skill learning is impaired in TS, compared to controls, irrespective of medication status, whereas the reinforcement effect on motor learning follows a completely different pattern: it is exacerbated in unmedicated TS patients (TS OFF) compared to healthy controls, and absent in medicated TS patients (TS ON). (C) This schematic summarizes the main results of experimental study 3 (Worbe et al., 2011). Reward prediction error (RPE) encoding was found in the VS (among other areas, such as the VPFC). Learning performances were blunted in DA antagonist-medicated patients (TS AA) compared to unmedicated patients (TS OFF) and partial agonist-medicated patients (TS PA). Note that all the graphs represent ideal values meant to illustrate the pattern of the experimental results, not the experimental results themselves (except for the ventral striatal activation).


TS patients, who were either medicated or unmedicated with DA antagonists as in the previous study, performed the same behavioral task. We also included patients with focal dystonia, as an example of a hyperkinetic motor disorder unrelated to DA. The behavioral data analysis, based on computational modeling, showed the following dissociation: while motor skills were affected in all patient groups, the reinforcement effect was selectively enhanced in unmedicated TS patients and impaired by DA antagonists (Fig. 5.2B). These results support the idea that RL has multiple behavioral effects, which are all mediated by DA transmission (Niv, Daw, Joel, & Dayan, 2007; Suri & Schultz, 1998). Clinically, the results further support the hypothesis that overactive DA transmission leads to excessive reinforcement of motor sequences, which might explain the formation of tics in TS (see Section 4 of this chapter).

3.3. Experimental study 3: Tourette syndrome and probabilistic reinforcement learning (Worbe et al., 2011)
In the last study, we investigated instrumental learning with a task involving probabilistic monetary gains, which had already been shown to activate the VPFC and the VS as a function of reward prediction and prediction error (Palminteri, Boraud, et al., 2009). In this study, we investigated the effect of different clinical phenotypes in terms of tic complexity and psychiatric comorbidity. Indeed, it has been suggested that the heterogeneity of clinical phenotypes in TS may relate to the dysfunction of distinct frontal cortex–basal ganglia circuits (Worbe et al., 2012; Worbe, Gerardin, et al., 2010).
To assess RL performances across various clinical phenotypes and different pharmacological treatments, we recruited a large cohort of TS patients. Subjects (patients and controls) were scanned using functional MRI while they performed the probabilistic instrumental learning task. fMRI data analysis confirmed the implication of the VPFC and the VS in reward encoding. Reward-related activation in limbic circuits was independently reduced by two factors: the presence of associated obsessive–compulsive symptoms and medication with DA antagonists. Computational modeling with standard RL algorithms indicated that, for both factors, the diminished reward-related activation could account for the impaired choice performance. Furthermore, RL performance and related brain activations were not affected by aripiprazole, a recent medication that acts as a DA partial agonist (Kawohl, Schneider, Vernaleken, & Neuner, 2009). These results support the hypothesis that the heterogeneity of clinical phenotypes in TS patients relates to the dysfunction of distinct frontal cortex–basal ganglia circuits, and suggest that, unlike DA antagonists, DA partial agonists may preserve reward sensitivity and hence avoid blunting motivational drives. In summary, this study (1) replicated the finding of a reward learning impairment in TS patients associated with DA antagonist medication and (2) extended this finding to comorbid OCD. This experiment also showed that RL performances and reward-related brain activations were significantly correlated (Fig. 5.2C).

4. CONCLUSIONS AND PERSPECTIVES

Long-standing experimental research in cognitive neuroscience indicates that RL is based on a teaching signal called the prediction error, the difference between the obtained and the expected outcome, and that this signal is represented by DA neurons projecting to frontal cortex–basal ganglia circuits. This has been confirmed by different techniques, such as electrophysiology, fMRI, and neuropharmacology, in nonhuman primates, healthy subjects, PD patients and, more recently, TS patients.
The blunting of positive prediction errors observed in off PD patients might provide a mechanism for the expressed symptoms of PD at both the motor and the cognitive–psychiatric level. For instance, if an action is not reinforced when rewarded, selection of that action will not be facilitated in the future, and the consequent deficit in movement selection could account for some motor symptoms, such as akinesia and rigidity (Bar-Gad & Bergman, 2001). At another level, a reduced reward sensitivity could account for psychiatric symptoms, such as depression or apathy (Agid et al., 2003; Weintraub, Comella, & Horn, 2008). Conversely, the impairment in punishment avoidance observed under l-dopa, explained by the impossibility of expressing negative prediction errors, may account for the DA dysregulation syndrome, which encompasses the manifestation of different impulse control deficits (addictions, pathological gambling, hypersexuality) secondary to DA replacement therapy (Lawrence, Evans, & Lees, 2003; Voon et al., 2009). An abnormally high DA level blunts negative prediction errors, and therefore makes the punishing consequences of these maladaptive behaviors (e.g., losing money in the case of gambling) ineffective at deterring them in the future.
In experimental study 1, we found that TS patients mirrored PD patients with respect to reward and punishment learning, since unmedicated TS patients were impaired in punishment learning. A parsimonious explanation of their deficit is to hypothesize an impairment in coding negative prediction errors, which is compatible with the idea of overactive DA transmission in TS (Fig. 5.1B). The idea of a functional hyperdopaminergia in TS patients was also supported by experimental study 2, in which we showed that the TS condition, in unmedicated patients, was associated with an exacerbated effect of reinforcement on motor learning performances compared to healthy controls, an effect captured by enhanced reward prediction errors in the computational analysis.
Traditionally, the pathological hyperactivity of the DA system has been linked to tic generation through a role in the disinhibition of inappropriate motor patterns (the tics) and their positive reinforcement (Leckman, 2002; Mink, 2003). Against this view, we speculate that it is the absence of negative reinforcement, rather than an excess of positive reinforcement, that should be linked to the tics. Accordingly, the most plausible scenario is that the absence (or reduction) of negative reinforcement, due to an excessive dopaminergic state, impedes the negative selection (the extinction) of inappropriate motor patterns. In healthy subjects, an inappropriate movement may occasionally be emitted during the lifetime, but it is rapidly inhibited by negative reinforcement (tics are in fact very frequent during childhood). On the contrary, in subjects with abnormally high dopaminergic levels (TS patients), this negative selection process would fail, and the tic would persist. This would also consistently explain the beneficial effects of DA antagonist treatment on TS symptoms: by reducing the DA level, these molecules allow the tic to be negatively reinforced and finally suppressed. At the cognitive level, a pathophysiological process similar to the one proposed for PD could account for the impulse control disorders, whose frequency is enhanced in TS (Frank, Piedad, Rickards, & Cavanna, 2011).
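This extinction account can be caricatured in a few lines of Python (a toy simulation of the speculation above, with arbitrary parameters; it is not a quantitative model of tics): an always-punished action is rapidly abandoned when negative prediction errors are effective, but keeps being emitted when they are blunted.

```python
import numpy as np

rng = np.random.default_rng(2)

def tic_frequency(alpha_neg, n_trials=300, alpha_pos=0.3, beta=3.0):
    """Softmax agent choosing between an inappropriate movement ("tic",
    always punished) and doing nothing. alpha_neg scales learning from
    negative prediction errors; alpha_neg ~ 0 caricatures the blunted dips
    of a hyperdopaminergic state. All parameters are arbitrary."""
    q = np.array([0.5, 0.0])  # initial action values: tic vs. rest
    tics = 0
    for _ in range(n_trials):
        p = np.exp(beta * q)
        p /= p.sum()
        a = rng.choice(2, p=p)
        r = -1.0 if a == 0 else 0.0  # the tic is always punished
        delta = r - q[a]
        q[a] += (alpha_pos if delta > 0 else alpha_neg) * delta
        tics += int(a == 0)
    return tics / n_trials

print(tic_frequency(alpha_neg=0.3))  # extinction: the tic rate collapses
print(tic_frequency(alpha_neg=0.0))  # blunted dips: the tic keeps being emitted
```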
In all three experimental studies, we showed that DA antagonist administration in TS patients blunted reward prediction errors at both the computational–behavioral and the neural level. This observation had also been reported in healthy subjects (Pessiglione et al., 2006). A possible mechanism for this reward learning inhibition is the blunting of dopaminergic positive prediction errors, which reduces reward sensitivity (Fig. 5.1B). This property would also explain certain cognitive and affective side effects of DA antagonist treatment, such as apathy, a general lack of motivation (Hartmann & Worbe, 2013). Interestingly, the DA partial agonist has been shown to preserve reward sensitivity at both the behavioral and the neural level. This preserved reward sensitivity could then explain why this molecule displays reduced side effects.

In experimental study 3, we also found an instrumental learning deficit in TS patients with OCD comorbidity, which correlated with blunted activity in the VPFC. This finding is consistent with evidence describing RL deficits in OCD patients (Cavanagh, Grundler, Frank, & Allen, 2010; Chamberlain et al., 2008; Figee et al., 2011; Nielen, den Boer, & Smid, 2009; Palminteri, Clair, Mallet, & Pessiglione, 2012; Remijnse et al., 2006). Thus, since it has been shown with a variety of behavioral tasks and clinical models, an RL deficit may represent a neuropsychological feature of OCD. These findings of a neural and behavioral reward processing impairment are consistent with the alleged dysfunction of ventral frontal cortex–basal ganglia loops that has been reported in OCD and in comorbid TS-OCD (Aouizerate et al., 2004; Rotge et al., 2010; Worbe, Gerardin, et al., 2010). Although the connection between RL impairment and obsessive–compulsive symptoms remains to be articulated, we speculate here that repetitive behaviors or thoughts might arise from aberrant reinforcement processes in a way similar to that described for tics in TS. Future research should also focus on studying reinforcement in other pathologies of the TS spectrum, such as ADHD, which is characterized by monoaminergic dysfunction and treated with dopaminergic medication (Biederman & Faraone, 2005).
In summary, RL is a process whose dysfunction could be partly responsible for the behavioral manifestations of TS at different levels (from lower motor symptoms to higher cognitive and psychiatric symptoms). Thus, the formal framework of RL can provide fundamental insights for the comprehension of neuropsychiatric disorders. From this perspective, the experimental studies presented here can be considered part of a newborn and promising discipline, computational psychiatry, which aims to explain neuropsychiatric diseases with formal and quantitative behavioral models (Maia & Frank, 2011; Montague, Dolan, Friston, & Dayan, 2012).
Beyond their interest for the physiopathology of TS, these data also have implications for the implementation of current treatments and the development of new ones, at the pharmacological, the surgical, and the behavioral therapy level (Hartmann & Worbe, 2013; McNaught & Mink, 2011). We have already shown that different kinds of pharmacological treatment differentially affect RL, possibly explaining their different profiles of side effects. On the other hand, behavioral therapy, which is largely based on conditioning procedures, should take into account the medication status of the patient. For instance, on the basis of our results, negative reinforcement is not likely to be effective in unmedicated TS patients, whereas the opposite may be true for medicated ones. Concerning surgical approaches, the known implication of different subcortical nuclei in RL could inform and affect the choice of new target nuclei.
As previously mentioned, the sensitivity of RL paradigms to dopaminergic status has proven very robust across neuropsychiatric pathologies and treatments. Accordingly, RL tasks, such as the probabilistic RL task with monetary gains and losses, with proven sensitivity to dopaminergic status and well-established dopaminergic subcortical neural correlates, could potentially be adapted and standardized for use in daily neuropsychological assessment as a proxy of dopaminergic functioning, and therefore be used to assess a patient's propensity to display particular psychiatric symptoms or treatment side effects (Murray et al., 2008; Palminteri, Justo, et al., 2012; Pessiglione et al., 2006; Voon et al., 2010).
Short- and mid-term experimental perspectives include the study of the effect of DBS on RL, as has already been done in PD but not yet in TS (Frank et al., 2007; Palminteri et al., 2013). DBS is increasingly studied as a treatment for TS. Targeted nuclei have so far included the globus pallidus and the VS (Viswanathan, Jimenez-Shahed, Baizabal Carvallo, & Jankovic, 2012). All these structures have previously been implicated in RL (see the previous sections). Studying the effect of DBS on RL performances should extend the studies presented earlier. From the fundamental perspective, local field potential recordings in these subjects will give us a unique opportunity to study RL-related electrophysiological signals in humans (Cohen et al., 2009; Priori et al., 2013).
Studying reinforcement learning performances in TS from a developmental perspective would represent another interesting avenue. As a matter of fact, brain circuits mature at different speeds; for instance, the motor circuit matures before the VPFC (Giedd et al., 1999; Gogtay et al., 2004). TS being a developmental disorder, it would be interesting to map the RL capabilities of TS patients over the time course of brain maturation.
Finally, the decision-making and learning community has recently witnessed a blossoming of studies directed toward the understanding of model-based RL ("model-based" is used here in a different sense than in the model-based fMRI described earlier). This learning approach, though more precise and flexible, is computationally more complex, because it requires mentally simulating alternative courses of action (Daw, Niv, & Dayan, 2005; Samejima & Doya, 2007). Recently, model-based computation has been shown to be underpinned by the dorsal prefrontal cortex, a region classically associated with cognitive control (Glascher, Daw, Dayan, & O'Doherty, 2010; Koechlin & Summerfield, 2007; Wunderlich, Dayan, & Dolan, 2012). Further research should investigate model-based learning performances in TS patients, as well as the effect of DA antagonists on this process.

ACKNOWLEDGMENTS
Mael Lebreton took a very important and active part in experimental studies 1 and 2. Yulia Worbe designed and conducted experimental study 3. Yulia Worbe and Andreas Hartmann took care of the TS patients and provided clinical data. David Grabli had a similar role for the PD and dystonic patients. S. P. received a PhD fellowship from the Neuropôle de Recherche Francilien (NeRF). The studies were funded by the Fyssen Fondation (FF), the École de Neuroscience de Paris (ENP), the Agence Nationale de la Recherche (ANR), and the Association Française du Syndrome de Gilles de la Tourette (AFSGT).

REFERENCES
Abler, B., Walter, H., Erk, S., Kammerer, H., & Spitzer, M. (2006). Prediction error as a linear function of reward probability is coded in human nucleus accumbens. NeuroImage, 31(2), 790–795. http://dx.doi.org/10.1016/j.neuroimage.2006.01.001.
Agid, Y., Arnulf, I., Bejjani, P., Bloch, F., Bonnet, A. M., Damier, P., et al. (2003). Parkinson's disease is a neuropsychiatric disorder. Advances in Neurology, 91, 365–370. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/12442695.
Aouizerate, B., Guehl, D., Cuny, E., Rougier, A., Bioulac, B., Tignol, J., et al. (2004). Pathophysiology of obsessive-compulsive disorder: A necessary link between phenomenology, neuropsychology, imagery and physiology. Progress in Neurobiology, 72(3), 195–221. http://dx.doi.org/10.1016/j.pneurobio.2004.02.004.
Bar-Gad, I., & Bergman, H. (2001). Stepping out of the box: Information processing in the neural networks of the basal ganglia. Current Opinion in Neurobiology, 11(6), 689–695. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/11741019.
Bayer, H. M., & Glimcher, P. W. (2005). Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47(1), 129–141. http://dx.doi.org/10.1016/j.neuron.2005.05.020.
Berns, G. S., McClure, S. M., Pagnoni, G., & Montague, P. R. (2001). Predictability modulates human brain response to reward. The Journal of Neuroscience, 21(8), 2793–2798. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/11306631.
Biederman, J., & Faraone, S. V. (2005). Attention-deficit hyperactivity disorder. Lancet, 366(9481), 237–248. http://dx.doi.org/10.1016/S0140-6736(05)66915-2.
Bodi, N., Keri, S., Nagy, H., Moustafa, A., Myers, C. E., Daw, N., et al. (2009). Reward-learning and the novelty-seeking personality: A between- and within-subjects study of the effects of dopamine agonists on young Parkinson's patients. Brain, 132(Pt. 9), 2385–2395. http://dx.doi.org/10.1093/brain/awp094.
Brembs, B. (2003). Operant conditioning in invertebrates. Current Opinion in Neurobiology, 13(6), 710–717. http://dx.doi.org/10.1016/j.conb.2003.10.002.
Cavanagh, J. F., Grundler, T. O. J., Frank, M. J., & Allen, J. J. B. (2010). Altered cingulate sub-region activation accounts for task-related dissociation in ERN amplitude as a function of obsessive-compulsive symptoms. Neuropsychologia, 48(7), 2098–2109. http://dx.doi.org/10.1016/j.neuropsychologia.2010.03.031.

Chamberlain, S. R., Menzies, L., Hampshire, A., Suckling, J., Fineberg, N. A., del Campo, N., et al. (2008). Orbitofrontal dysfunction in patients with obsessive-compulsive disorder and their unaffected relatives. Science, 321(5887), 421–422. http://dx.doi.org/10.1126/science.1154433.
Cohen, M. X., Axmacher, N., Lenartz, D., Elger, C. E., Sturm, V., & Schlaepfer, T. E. (2009). Neuroelectric signatures of reward learning and decision-making in the human nucleus accumbens. Neuropsychopharmacology, 34(7), 1649–1658. http://dx.doi.org/10.1038/npp.2008.222.
D'Ardenne, K., McClure, S. M., Nystrom, L. E., & Cohen, J. D. (2008). BOLD responses reflecting dopaminergic signals in the human ventral tegmental area. Science, 319(5867), 1264–1267. http://dx.doi.org/10.1126/science.1150605.
Daw, N. D., Niv, Y., & Dayan, P. (2005). Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neuroscience, 8(12), 1704–1711. http://dx.doi.org/10.1038/nn1560.
Dehaene, S., Changeux, J.-P., Naccache, L., Sackur, J., & Sergent, C. (2006). Conscious, preconscious, and subliminal processing: A testable taxonomy. Trends in Cognitive Sciences, 10(5), 204–211. http://dx.doi.org/10.1016/j.tics.2006.03.007.
Dickinson, A. (1980). Contemporary animal learning theory. Cambridge: Cambridge University Press.
Draganski, B., Kherif, F., Kloppel, S., Cook, P. A., Alexander, D. C., Parker, G. J. M., et al. (2008). Evidence for segregated and integrative connectivity patterns in the human basal ganglia. The Journal of Neuroscience, 28(28), 7143–7152. http://dx.doi.org/10.1523/JNEUROSCI.1486-08.2008.
Figee, M., Vink, M., de Geus, F., Vulink, N., Veltman, D. J., Westenberg, H., et al. (2011). Dysfunctional reward circuitry in obsessive-compulsive disorder. Biological Psychiatry, 69(9), 867–874. http://dx.doi.org/10.1016/j.biopsych.2010.12.003.
Fiorillo, C. D., Tobler, P. N., & Schultz, W. (2003). Discrete coding of reward probability and uncertainty by dopamine neurons. Science, 299(5614), 1898–1902. http://dx.doi.org/10.1126/science.1077349.
Frank, M. C., Piedad, J., Rickards, H., & Cavanna, A. E. (2011). The role of impulse control disorders in Tourette syndrome: An exploratory study. Journal of the Neurological Sciences, 310(1–2), 276–278. http://dx.doi.org/10.1016/j.jns.2011.06.032.
Frank, M. J., Samanta, J., Moustafa, A. A., & Sherman, S. J. (2007). Hold your horses: Impulsivity, deep brain stimulation, and medication in parkinsonism. Science, 318(5854), 1309–1312. http://dx.doi.org/10.1126/science.1146157.
Frank, M. J., Seeberger, L. C., & O'Reilly, R. C. (2004). By carrot or by stick: Cognitive reinforcement learning in parkinsonism. Science, 306(5703), 1940–1943. http://dx.doi.org/10.1126/science.1102941.
Friston, K. J., Price, C. J., Fletcher, P., Moore, C., Frackowiak, R. S., & Dolan, R. J. (1996). The trouble with cognitive subtraction. NeuroImage, 4(2), 97–104. http://dx.doi.org/10.1006/nimg.1996.0033.
Giedd, J. N., Blumenthal, J., Jeffries, N. O., Castellanos, F. X., Liu, H., Zijdenbos, A., et al. (1999). Brain development during childhood and adolescence: A longitudinal MRI study. Nature Neuroscience, 2(10), 861–863. http://dx.doi.org/10.1038/13158.
Gilbert, D. L., Christian, B. T., Gelfand, M. J., Shi, B., Mantil, J., & Sallee, F. R. (2006). Altered mesolimbocortical and thalamic dopamine in Tourette syndrome. Neurology, 67(9), 1695–1697. http://dx.doi.org/10.1212/01.wnl.0000242733.18534.2c.
Glascher, J., Daw, N., Dayan, P., & O'Doherty, J. P. (2010). States versus rewards: Dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron, 66(4), 585–595. http://dx.doi.org/10.1016/j.neuron.2010.04.016.
Gogtay, N., Giedd, J. N., Lusk, L., Hayashi, K. M., Greenstein, D., Vaituzis, A. C., et al. (2004). Dynamic mapping of human cortical development during childhood through early adulthood. Proceedings of the National Academy of Sciences of the United States of America, 101(21), 8174–8179. http://dx.doi.org/10.1073/pnas.0402680101.
Hartmann, A., & Worbe, Y. (2013). Pharmacological treatment of Gilles de la Tourette syndrome. Neuroscience and Biobehavioral Reviews, 37(6), 1157–1161. http://dx.doi.org/10.1016/j.neubiorev.2012.10.014.
Kamin, L. J. (1967). Attention-like processes in classical conditioning. Hamilton, Ontario: Department of Psychology, McMaster University.
Kawohl, W., Schneider, F., Vernaleken, I., & Neuner, I. (2009). Aripiprazole in the pharmacotherapy of Gilles de la Tourette syndrome in adult patients. The World Journal of Biological Psychiatry, 10(4 Pt. 3), 827–831. http://dx.doi.org/10.1080/15622970701762544.
Kienast, T., & Heinz, A. (2006). Dopamine and the diseased brain. CNS & Neurological Disorders–Drug Targets, 5(1), 109–131. http://dx.doi.org/10.2174/187152706784111560.
Kim, H., Shimojo, S., & O'Doherty, J. P. (2006). Is avoiding an aversive outcome rewarding? Neural substrates of avoidance learning in the human brain. PLoS Biology, 4(8), e233. http://dx.doi.org/10.1371/journal.pbio.0040233.
Koechlin, E., & Summerfield, C. (2007). An information theoretical approach to prefrontal executive function. Trends in Cognitive Sciences, 11(6), 229–235. http://dx.doi.org/10.1016/j.tics.2007.04.005.
Kouider, S., & Dehaene, S. (2007). Levels of processing during non-conscious perception: A critical review of visual masking. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 362(1481), 857–875. http://dx.doi.org/10.1098/rstb.2007.2093.
Lawrence, A. D., Evans, A. H., & Lees, A. J. (2003). Compulsive use of dopamine replacement therapy in Parkinson's disease: Reward systems gone awry? Lancet Neurology, 2(10), 595–604. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/14505581.
Leckman, J. F. (2002). Tourette's syndrome. Lancet, 360(9345), 1577–1586. http://dx.doi.org/10.1016/S0140-6736(02)11526-1.
Maia, T. V., & Frank, M. J. (2011). From reinforcement learning models to psychiatric and neurological disorders. Nature Neuroscience, 14(2), 154–162. http://dx.doi.org/10.1038/nn.2723.
Malison, R. T., McDougle, C. J., van Dyck, C. H., Scahill, L., Baldwin, R. M., Seibyl, J. P., et al. (1995). [123I]beta-CIT SPECT imaging of striatal dopamine transporter binding in Tourette's disorder. The American Journal of Psychiatry, 152(9), 1359–1361. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/7653693.
McNaught, K. S. P., & Mink, J. W. (2011). Advances in understanding and treatment of Tourette syndrome. Nature Reviews Neurology, 7(12), 667–676. http://dx.doi.org/10.1038/nrneurol.2011.167.
Mink, J. W. (2003). The basal ganglia and involuntary movements. Archives of Neurology, 60, 1365–1368.
Mirenowicz, J., & Schultz, W. (1996). Preferential activation of midbrain dopamine neurons by appetitive rather than aversive stimuli. Nature, 379(6564), 449–451. http://dx.doi.org/10.1038/379449a0.
Montague, P. R., Dolan, R. J., Friston, K. J., & Dayan, P. (2012). Computational psychiatry. Trends in Cognitive Sciences, 16(1), 72–80. http://dx.doi.org/10.1016/j.tics.2011.11.018.
Morris, G., Nevet, A., Arkadir, D., Vaadia, E., & Bergman, H. (2006). Midbrain dopamine neurons encode decisions for future action. Nature Neuroscience, 9(8), 1057–1063. http://dx.doi.org/10.1038/nn1743.
Murphey, R. M. (1967). Instrumental conditioning of the fruit fly, Drosophila melanogaster. Animal Behaviour, 15(1), 153–161. http://dx.doi.org/10.1016/S0003-3472(67)80027-7.
Murray, G. K., Corlett, P. R., Clark, L., Pessiglione, M., Blackwell, A. D., Honey, G., et al. (2008). Substantia nigra/ventral tegmental reward prediction error disruption in psychosis. Molecular Psychiatry, 13(3), 267–276. http://dx.doi.org/10.1038/sj.mp.4002058.

Nielen, M. M., den Boer, J. A., & Smid, H. G. O. M. (2009). Patients with obsessivecompulsive disorder are impaired in associative learning based on external feedback. Psychological Medicine, 39(9), 15191526. http://dx.doi.org/10.1017/S0033291709005297.
Niv, Y., Daw, N. D., Joel, D., & Dayan, P. (2007). Tonic dopamine: Opportunity costs and
the control of response vigor. Psychopharmacology, 191(3), 507520. http://dx.doi.org/
10.1007/s00213-006-0502-4.
ODoherty, J. P., Dayan, P., Friston, K., Critchley, H., & Dolan, R. J. (2003). Temporal
difference models and reward-related learning in the human brain. Neuron, 38(2),
329337. Retrieved from, http://www.ncbi.nlm.nih.gov/pubmed/12718865.
ODoherty, J. P., Dayan, P., Schultz, J., Deichmann, R., Friston, K., & Dolan, R. J. (2004).
Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science (New
York, N.Y.), 304(5669), 452454. http://dx.doi.org/10.1126/science.1094285.
ODoherty, J. P., Deichmann, R., Critchley, H. D., & Dolan, R. J. (2002). Neural responses
during anticipation of a primary taste reward. Neuron, 33(5), 815826. Retrieved from,
http://www.ncbi.nlm.nih.gov/pubmed/11879657.
ODoherty, J. P., Hampton, A., & Kim, H. (2007). Model-based fMRI and its application to
reward learning and decision making. Annals of the New York Academy of Sciences, 1104,
3553. http://dx.doi.org/10.1196/annals.1390.022.
Palminteri, S., Boraud, T., Lafargue, G., Dubois, B., & Pessiglione, M. (2009). Brain hemispheres selectively track the expected value of contralateral options. The Journal of Neuroscience, 29(43), 13465–13472. http://dx.doi.org/10.1523/JNEUROSCI.1500-09.2009.
Palminteri, S., Clair, A.-H., Mallet, L., & Pessiglione, M. (2012). Similar improvement of reward and punishment learning by serotonin reuptake inhibitors in obsessive–compulsive disorder. Biological Psychiatry, 72(3), 244–250. http://dx.doi.org/10.1016/j.biopsych.2011.12.028.
Palminteri, S., Justo, D., Jauffret, C., Pavlicek, B., Dauta, A., Delmaire, C., et al. (2012). Critical roles for anterior insula and dorsal striatum in punishment-based avoidance learning. Neuron, 76(5), 998–1009. http://dx.doi.org/10.1016/j.neuron.2012.10.017.
Palminteri, S., Lebreton, M., Worbe, Y., Grabli, D., Hartmann, A., & Pessiglione, M. (2009). Pharmacological modulation of subliminal learning in Parkinson's and Tourette's syndromes. Proceedings of the National Academy of Sciences of the United States of America, 106(45), 19179–19184. http://dx.doi.org/10.1073/pnas.0904035106.
Palminteri, S., Lebreton, M., Worbe, Y., Hartmann, A., Lehericy, S., Vidailhet, M., et al. (2011). Dopamine-dependent reinforcement of motor skill learning: Evidence from Gilles de la Tourette syndrome. Brain, 134(8), 2287–2301. http://dx.doi.org/10.1093/brain/awr147.
Palminteri, S., Serra, G., Buot, A., Schmidt, L., Welter, M.-L., & Pessiglione, M. (2013). Hemispheric dissociation of reward processing in humans: Insights from deep brain stimulation. Cortex: A Journal Devoted to the Study of the Nervous System and Behavior, pii: S0010-9452(13)00072-5. http://dx.doi.org/10.1016/j.cortex.2013.02.014. [Epub ahead of print].
Pavlov, I. P. (1927). Conditioned reflexes: An investigation of the physiological activity of the cerebral cortex. London: Oxford University Press. Retrieved from http://psychclassics.yorku.ca/Pavlov/.
Pessiglione, M., Petrovic, P., Daunizeau, J., Palminteri, S., Dolan, R. J., & Frith, C. D. (2008). Subliminal instrumental conditioning demonstrated in the human brain. Neuron, 59(4), 561–567. http://dx.doi.org/10.1016/j.neuron.2008.07.005.
Pessiglione, M., Seymour, B., Flandin, G., Dolan, R. J., & Frith, C. D. (2006). Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans. Nature, 442(7106), 1042–1045. http://dx.doi.org/10.1038/nature05051.
Priori, A., Giannicola, G., Rosa, M., Marceglia, S., Servello, D., Sassi, M., et al. (2013). Deep brain electrophysiological recordings provide clues to the pathophysiology of Tourette syndrome. Neuroscience and Biobehavioral Reviews, 37(6), 1063–1068. http://dx.doi.org/10.1016/j.neubiorev.2013.01.011.
Rankin, C. H. (2004). Invertebrate learning: What can't a worm learn? Current Biology, 14(15), R617–R618. http://dx.doi.org/10.1016/j.cub.2004.07.044.
Remijnse, P. L., Nielen, M. M., van Balkom, A. J., Cath, D. C., van Oppen, P., Uylings, H. B. M., et al. (2006). Reduced orbitofrontal-striatal activity on a reversal learning task in obsessive–compulsive disorder. Archives of General Psychiatry, 63(11), 1225–1236. http://dx.doi.org/10.1001/archpsyc.63.11.1225.
Rescorla, R. A. (1967). Pavlovian conditioning and its proper control procedures. Psychological Review, 74(1), 71–80.
Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning II: Current research and theory (pp. 64–99). New York: Appleton-Century-Crofts.
Rickards, H., & Cavanna, A. E. (2009). Gilles de la Tourette: The man behind the syndrome. Journal of Psychosomatic Research, 67(6), 469–474. http://dx.doi.org/10.1016/j.jpsychores.2009.07.019.
Rotge, J.-Y., Langbour, N., Guehl, D., Bioulac, B., Jaafari, N., Allard, M., et al. (2010). Gray matter alterations in obsessive–compulsive disorder: An anatomic likelihood estimation meta-analysis. Neuropsychopharmacology: Official Publication of the American College of Neuropsychopharmacology, 35(3), 686–691. http://dx.doi.org/10.1038/npp.2009.175.
Rutledge, R. B., Dean, M., Caplin, A., & Glimcher, P. W. (2010). Testing the reward prediction error hypothesis with an axiomatic model. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 30(40), 13525–13536. http://dx.doi.org/10.1523/JNEUROSCI.1747-10.2010.
Rutledge, R. B., Lazzaro, S. C., Lau, B., Myers, C. E., Gluck, M. A., & Glimcher, P. W. (2009). Dopaminergic drugs modulate learning rates and perseveration in Parkinson's patients in a dynamic foraging task. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 29(48), 15104–15114. http://dx.doi.org/10.1523/JNEUROSCI.3524-09.2009.
Samejima, K., & Doya, K. (2007). Multiple representations of belief states and action values in corticobasal ganglia loops. Annals of the New York Academy of Sciences, 1104, 213–228. http://dx.doi.org/10.1196/annals.1390.024.
Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275(5306), 1593–1599. http://dx.doi.org/10.1126/science.275.5306.1593.
Siebner, H. R., Bergmann, T. O., Bestmann, S., Massimini, M., Johansen-Berg, H., Mochizuki, H., et al. (2009). Consensus paper: Combining transcranial stimulation with neuroimaging. Brain Stimulation, 2(2), 58–80. http://dx.doi.org/10.1016/j.brs.2008.11.002.
Skinner, B. F. (1938). The behavior of organisms: An experimental analysis. Oxford, England: Appleton-Century. Retrieved from http://en.wikipedia.org/wiki/The_Behavior_of_Organisms.
Steinberg, E. E., Keiflin, R., Boivin, J. R., Witten, I. B., Deisseroth, K., & Janak, P. H. (2013). A causal link between prediction errors, dopamine neurons and learning. Nature Neuroscience, 16(7), 966–973. http://dx.doi.org/10.1038/nn.3413.
Suri, R. E., & Schultz, W. (1998). Learning of sequential movements by neural network model with dopamine-like reinforcement signal. Experimental Brain Research. Experimentelle Hirnforschung. Expérimentation cérébrale, 121(3), 350–354. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/9746140.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.
Tarnok, Z., Ronai, Z., Gervai, J., Kereszturi, E., Gadoros, J., Sasvari-Szekely, M., et al. (2007). Dopaminergic candidate genes in Tourette syndrome: Association between tic severity and 3′ UTR polymorphism of the dopamine transporter gene. American Journal of Medical Genetics Part B, Neuropsychiatric Genetics: The Official Publication of the International Society of Psychiatric Genetics, 144B(7), 900–905. http://dx.doi.org/10.1002/ajmg.b.30517.
Dayan, P., & Abbott, L. F. (2005). Theoretical neuroscience: Computational and mathematical modeling of neural systems. Cambridge, MA: The MIT Press. ISBN: 0262541858.
Thorndike, E. L. (1911). Animal intelligence: Experimental studies. New York: The Macmillan Company. Retrieved from http://www.psycontent.com/content/xg6830n4711655l1/.
Viswanathan, A., Jimenez-Shahed, J., Baizabal Carvallo, J. F., & Jankovic, J. (2012). Deep brain stimulation for Tourette syndrome: Target selection. Stereotactic and Functional Neurosurgery, 90(4), 213–224. http://dx.doi.org/10.1159/000337776.
Voon, V., Fernagut, P.-O., Wickens, J., Baunez, C., Rodriguez, M., Pavon, N., et al. (2009). Chronic dopaminergic stimulation in Parkinson's disease: From dyskinesias to impulse control disorders. Lancet Neurology, 8(12), 1140–1149. http://dx.doi.org/10.1016/S1474-4422(09)70287-X.
Voon, V., Pessiglione, M., Brezing, C., Gallea, C., Fernandez, H. H., Dolan, R. J., et al. (2010). Mechanisms underlying dopamine-mediated reward bias in compulsive behaviors. Neuron, 65(1), 135–142. http://dx.doi.org/10.1016/j.neuron.2009.12.027.
Watkins, C. J. C. H., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3–4), 279–292. http://dx.doi.org/10.1007/BF00992698.
Weintraub, D., Comella, C. L., & Horn, S. (2008). Parkinson's disease – Part 3: Neuropsychiatric symptoms. The American Journal of Managed Care, 14(2 Suppl.), S59–S69. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/18402509.
Worbe, Y., Gerardin, E., Hartmann, A., Valabregue, R., Chupin, M., Tremblay, L., et al. (2010). Distinct structural changes underpin clinical phenotypes in patients with Gilles de la Tourette syndrome. Brain: A Journal of Neurology, 133(Pt. 12), 3649–3660. http://dx.doi.org/10.1093/brain/awq293.
Worbe, Y., Malherbe, C., Hartmann, A., Pelegrini-Issac, M., Messe, A., Vidailhet, M., et al. (2012). Functional immaturity of cortico-basal ganglia networks in Gilles de la Tourette syndrome. Brain: A Journal of Neurology, 135(Pt. 6), 1937–1946. http://dx.doi.org/10.1093/brain/aws056.
Worbe, Y., Mallet, L., Golmard, J.-L., Behar, C., Durif, F., Jalenques, I., et al. (2010). Repetitive behaviours in patients with Gilles de la Tourette syndrome: Tics, compulsions, or both? PLoS One, 5(9), e12959. http://dx.doi.org/10.1371/journal.pone.0012959.
Worbe, Y., Palminteri, S., Hartmann, A., Vidailhet, M., Lehericy, S., & Pessiglione, M. (2011). Reinforcement learning and Gilles de la Tourette syndrome. Archives of General Psychiatry, 68(12), 1257–1266.
Wunderlich, K., Dayan, P., & Dolan, R. J. (2012). Mapping value based planning and extensively trained choice in the human brain. Nature Neuroscience, 15(5), 786–791. http://dx.doi.org/10.1038/nn.3068.
Yoon, D. Y., Gause, C. D., Leckman, J. F., & Singer, H. S. (2007). Frontal dopaminergic abnormality in Tourette syndrome: A postmortem analysis. Journal of the Neurological Sciences, 255(1–2), 50–56. http://dx.doi.org/10.1016/j.jns.2007.01.069.
Yoon, D. Y., Rippel, C. A., Kobets, A. J., Morris, C. M., Lee, J. E., Williams, P. N., et al. (2007). Dopaminergic polymorphisms in Tourette syndrome: Association with the DAT gene (SLC6A3). American Journal of Medical Genetics Part B, Neuropsychiatric Genetics: The Official Publication of the International Society of Psychiatric Genetics, 144B(5), 605–610. http://dx.doi.org/10.1002/ajmg.b.30466.
