Frank H. Guenther
Department of Cognitive and Neural Systems, Boston University; Division of Health Sciences and Technology, Harvard University / M.I.T.; Research Laboratory of Electronics, Massachusetts Institute of Technology
Collaborators
Satrajit Ghosh, Alfonso Nieto-Castanon, Jason Tourville, Oren Civier, Kevin Reilly, Jason Bohland, Jonathan Brumberg, Michelle Hampson, Joseph Perkell, Virgilio Villacorta, Majid Zandipour, Melanie Matthies, Shinji Maeda
Talk Outline
- Overview of the DIVA model
- Mirror neurons in the model
- Learning in the model
- Simulating a hemodynamic response from the model
- Feedback control subsystem
- Auditory perturbation fMRI experiment
- Somatosensory perturbation fMRI experiment
Boxes in the schematic correspond to maps of neurons; arrows correspond to synaptic projections. The model controls movements of a virtual vocal tract, or articulatory synthesizer. Video shows random movements of the articulators in this synthesizer. Production of a speech sound in the model starts with activation of a speech sound map cell in left ventral premotor cortex (BA 44/6), which in turn activates feedforward and feedback control subsystems that converge on primary motor cortex.
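The convergence of the two subsystems on primary motor cortex can be caricatured in a few lines of code. This is a toy illustration only: the dimensions, values, and gains below are made-up assumptions, not the published model equations. A speech sound map cell gates a stored feedforward command, and the feedback subsystem adds a correction proportional to sensory error.

```python
# Toy sketch of the control scheme described above.
# All names, dimensions, and gains here are illustrative assumptions.

feedforward_command = [0.5, -0.2, 0.1]  # learned command for one sound (toy values)

def motor_command(sound_active, sensory_error, fb_gain=0.8):
    """Motor cortex output: feedforward command plus feedback correction.

    sound_active: whether the speech sound map cell is firing
    sensory_error: sensory error signal per articulator dimension
    """
    if not sound_active:
        return [0.0] * len(sensory_error)
    return [ff + fb_gain * err
            for ff, err in zip(feedforward_command, sensory_error)]

# No sensory error: the output is the pure feedforward command.
print(motor_command(True, [0.0, 0.0, 0.0]))
# A sensory error adds a corrective component on top of it.
print(motor_command(True, [0.0, 0.1, 0.0]))
```

When the speech sound map cell is silent, no command is issued; when it fires, feedback corrections ride on top of the stored feedforward command.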
HST 722 Speech Motor Control 5
[Figure: DIVA model schematic; labels include Somatosensory State, Feedforward Command, To Muscles]
(1) This is done with babbling movements of the vocal tract, which provide paired sensory and motor signals that can be used to tune these transformations.
Model projections tuned during the imitation process are shown in red.
(2) The model practices production of the sound to tune the feedforward commands and learn a somatosensory target.
Then it tries to repeat the target, initially under auditory feedback control. With each repetition, the model relies less on feedback control and more on feedforward control, resulting in better and better productions.
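This practice loop can be sketched in a few lines. The 1-D plant, gains, and learning rate below are illustrative assumptions, not the model's published equations; the point is only the qualitative shift from feedback to feedforward control.

```python
# Toy practice loop: feedback corrections are folded into the feedforward
# command, so successive repetitions need less feedback control.
# The 1-D plant, gains, and learning rate are illustrative assumptions.

target = 1.0       # auditory target for the sound (e.g., a normalized formant)
ff = 0.0           # feedforward command, initially untrained
lr = 0.5           # learning rate for feedforward tuning
fb_gain = 0.9      # auditory feedback control gain

for rep in range(8):
    error = target - ff                  # auditory error heard during the attempt
    correction = fb_gain * error         # feedback-based corrective command
    produced = ff + correction           # this repetition's production
    ff += lr * correction                # fold the correction into feedforward
    print(f"rep {rep}: produced={produced:.3f}, feedback correction={correction:.3f}")

# The production converges on the target while the feedback correction
# shrinks: control shifts from feedback to feedforward with practice.
```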
Top panel: spectrogram of the target utterance presented to the model. Remaining panels: spectrograms of the model's first few attempts to produce the utterance.
Note the improvement of the auditory trajectories with each practice iteration, due to improved feedforward commands.
[Figure: anatomical locations of model components; labels include Lip, Larynx, Aud, DS, DA, Lat Cbm]
The anatomical locations of the model's components have been fine-tuned by comparison to the results of previous neurophysiological and neuroimaging studies (Guenther, Ghosh, and Tourville, 2006).
The model's cell activities during simulations can be directly compared to the results of fMRI and PET studies.
The model also predicts that this auditory error cell activation will give rise to increased activity in motor areas, where corrective articulator commands are generated.
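An auditory error cell of this kind can be sketched minimally as follows, assuming a single formant dimension and an interval-shaped target region (the model's actual targets are multidimensional and time-varying, so this is only an illustration):

```python
# Sketch of an auditory error cell with an interval target region.
# The single formant dimension and region bounds are illustrative assumptions.

def auditory_error(auditory_state, target_lo, target_hi):
    """Return a corrective signal; zero whenever the state is inside the region."""
    if auditory_state < target_lo:
        return target_lo - auditory_state   # positive: push the formant up
    if auditory_state > target_hi:
        return target_hi - auditory_state   # negative: push the formant down
    return 0.0

# Inside the target region: no error, so no corrective motor activity.
print(auditory_error(1.00, 0.9, 1.1))   # 0.0
# A downward-shifted formant drives a positive (upward) corrective command:
print(auditory_error(0.82, 0.9, 1.1) > 0)   # True
```

A nonzero output of this function is what, in the model, recruits the motor areas that generate the corrective articulator commands.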
Unexpectedly shifting the feedback caused subjects to compensate within the same syllable as the shift (gray 95% confidence intervals):
[Figure: normalized F1 vs. time (sec); panels show the response to downshift and the response to upshift]
DIVA model productions in response to unexpected upward (dashed line) and downward (solid line) perturbations of F1 fall within the distribution of productions of the speakers in the fMRI study (shaded regions).
This in turn should lead to increased activity in motor areas where corrective commands are generated.
On 1 in 7 utterances, a small balloon was rapidly inflated between the teeth during the initial vowel. The balloon inhibits upward jaw movement for the consonant and final vowel, causing the subject to compensate with larger tongue and/or lip movements.
These commands are encoded in synaptic projections from premotor cortex to primary motor cortex, including both corticocortical (blue) and transcerebellar (purple) projections.
The interactions between the feedforward and feedback control subsystems in the model lead to the following predictions:
- If a speaker's auditory feedback is perturbed consistently over many consecutive productions of a syllable, corrective commands issued by the auditory feedback control subsystem will become incorporated into the feedforward commands for that syllable.
- Speakers with better hearing (auditory acuity) will adapt more than speakers with worse hearing.
- If the perturbation is then removed, the speaker will show after-effects due to these adjustments to the feedforward command.
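These predictions can be illustrated with a toy adaptation simulation. The 1-D plant, the learning rate, and the way acuity enters the correction are all assumptions made for illustration, not the model's published formulation.

```python
# Toy simulation of adaptation to a sustained auditory perturbation.
# The plant, gains, and acuity parameterization are illustrative assumptions.

def simulate(acuity, n_shifted=30, n_post=10, shift=0.3, lr=0.1):
    target = 1.0   # auditory target (e.g., normalized F1)
    ff = 1.0       # feedforward command, already tuned to the target
    productions = []
    for trial in range(n_shifted + n_post):
        perturb = shift if trial < n_shifted else 0.0
        heard = ff + perturb              # perturbed auditory feedback
        error = target - heard            # auditory error
        correction = acuity * error       # sharper hearing -> stronger correction
        ff += lr * correction             # corrections leak into the feedforward command
        productions.append(ff)
    return productions

good = simulate(acuity=0.9)   # high-acuity speaker
poor = simulate(acuity=0.3)   # low-acuity speaker

# Higher acuity -> more adaptation by the end of the shifted trials:
print(abs(good[29] - 1.0) > abs(poor[29] - 1.0))  # True
# After the perturbation is removed, productions still deviate (after-effect):
print(abs(good[30] - 1.0) > 0.1)  # True
```

In this caricature, the perturbation drags the feedforward command away from its original value; when the shift is removed, the command takes several trials to relax back, which is the after-effect.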
Results for 20 subjects are shown by lines with standard error bars. The shaded region is the 95% confidence interval for model simulation results (one simulation per speaker, with target region size determined by the speaker's auditory acuity).
Summary
The DIVA model elucidates several types of learning in speech acquisition, e.g.:
- Learning of relationships between articulations and their acoustic and somatosensory consequences
- Learning of auditory targets for speech sounds in the native language from externally presented examples
- Learning of feedforward commands for new sounds through practice
The model elucidates the interactions between motor, somatosensory, and auditory areas responsible for speech motor control.
The model spans behavioral and neural levels and makes predictions that are being tested using a variety of experimental techniques.
An event-triggered paradigm was used to avoid movement artifacts and scanner noise issues.
[Figure: onset of stuttering as a function of age]
[Figure: speech sequencing schematic; labels: Pre-SMA (Frame Rep, Frame Signals), SMA (Trigger Cells, Trigger Signals, Next Sound; overt speech only), IFS (Sequence WM), BA 44 (Speech Sound Map)]
Carryover Coarticulation
Schematized at right is the model's explanation of carryover coarticulation and economy of effort during production of /k/ in "luke" and "leak" (plotted dimension: tongue body horizontal position).
Two factors that could influence target region size:
(1) Perceptual acuity of the speaker: better perceptual acuity => smaller regions
(2) Speaking condition: clear speech (vs. fast speech) => smaller regions
[Figure: acoustic contrast distance (Hz) for the word pairs whod-hood, cod-cud, and said-shed across speaking conditions (F, N, C), plotted separately for speakers with high (HI) vs. low (LO) discrimination scores]

(1) Speakers with high perceptual acuity show greater contrast distance in production of neighboring sound categories.

(2) General tendency for greater contrast distance in clear speech, less in fast speech.

These results support the predictions on the preceding slide. Perkell et al. (2004a,b).
Ellipses indicating the range of formant frequencies (+/-1 s.d.) used by a speaker to produce five vowels (iy, eh, aa, uh, uw) during fast speech (light gray) and clear speech (dark gray) in a variety of phonetic contexts.
Despite large articulatory variability, the key acoustic cue for /r/ remains relatively stable across phonetic contexts. Boyce and Espy-Wilson (1997):
EMMA/Modeling Study:
(1) Collect EMMA data from speakers producing /r/ in different contexts
(2) Build speaker-specific vocal tract models (articulatory synthesizers) for two of the speakers
(3) Train the DIVA model to produce sounds with the speaker-specific vocal tracts
(4) Compare the model's /r/ productions to those of the EMMA subjects
[Figure: midsagittal vocal tract tracings of /r/ productions for subjects S1-S7, showing lips and tongue constriction; 1 cm scale; BACK/FRONT orientation]

Acoustic effect of the larger front cavity (blue) is compensated by the effect of the longer and narrower constriction (red). This yields similar acoustics for bunched (red) and retroflex (blue) tongue configurations for /r/ (Stevens, 1998; Boyce & Espy-Wilson, 1997). All seven subjects in the EMMA study utilized similar trading relations (Guenther et al., 1999).
[Figure: articulatory changes producing changes in F1, F2, and F3 for Subject 1 and Subject 2]
Comparison of the model's articulations using speaker-specific vocal tracts to those speakers' actual articulations:
[Figure: subject data vs. DIVA simulations for /ar/, /dr/, and /gr/ for Subjects 1 and 2]
[Nieto-Castanon, Guenther, Perkell, and Curtin (2005), J Acoust Soc Am.]