Transcript of HST 722 – Speech Motor Control: Auditory, Somatosensory, and Motor Interactions in Speech Production

HST 722 – Speech Motor Control 1

Auditory, Somatosensory, and Motor Interactions in Speech Production

Supported by NIDCD and NSF.

Frank H. Guenther

Department of Cognitive and Neural Systems, Boston University; Division of Health Sciences and Technology, Harvard University / M.I.T.

Research Laboratory of Electronics, Massachusetts Institute of Technology

Satrajit Ghosh
Alfonso Nieto-Castanon
Jason Tourville
Oren Civier
Kevin Reilly
Jason Bohland
Jonathan Brumberg
Michelle Hampson
Joseph Perkell
Virgilio Villacorta
Majid Zandipour
Melanie Matthies
Shinji Maeda

Collaborators


HST 722 – Speech Motor Control 2

CNS Speech Lab at Boston University

Primary goal is to elucidate the neural processes underlying:

• Learning of speech in children
• Normal speech in adults
• Breakdowns of speech in disorders such as stuttering and apraxia of speech

Methods of investigation include:
• Neural network modeling
• Functional brain imaging
• Motor and auditory psychophysics

These studies are organized around the DIVA model, a neural network model of speech acquisition and production developed in our lab.


HST 722 – Speech Motor Control 3

Talk Outline

Overview of the DIVA model
• Mirror neurons in the model
• Learning in the model
• Simulating a hemodynamic response from the model

Feedback control subsystem
• Auditory perturbation fMRI experiment
• Somatosensory perturbation fMRI experiment

Feedforward control subsystem
• Sensorimotor adaptation to F1 perturbation

Summary


HST 722 – Speech Motor Control 4

Schematic of the DIVA Model


HST 722 – Speech Motor Control 5

Boxes in the schematic correspond to maps of neurons; arrows correspond to synaptic projections.

The model controls movements of a “virtual vocal tract”, or articulatory synthesizer. Video shows random movements of the articulators in this synthesizer.

Production of a speech sound in the model starts with activation of a speech sound map cell in left ventral premotor cortex (BA 44/6), which in turn activates feedforward and feedback control subsystems that converge on primary motor cortex.


HST 722 – Speech Motor Control 6

Speech Sound Map Mirror Neurons

Since its inception in 1992, the DIVA model has included a speech sound map containing cells that are active during both perception and production of a particular speech sound (phoneme or syllable).

During perception, these neurons enable learning of an auditory target, or goal, for the sound and, to a lesser degree, of somatosensory targets (limited to visible articulators such as the lips).

[Figure: DIVA model schematic — the speech sound map (premotor cortex) projects to articulator velocity and position cells (motor cortex) and reads out auditory and somatosensory goal regions; auditory and somatosensory error maps (auditory and somatosensory cortex) compare these goals with the current auditory and somatosensory states and issue feedback-based commands, which combine with the feedforward command sent to the muscles.]

Speech sound map during perception


HST 722 – Speech Motor Control 7

Speech Sound Map Mirror Neurons

After a sound has been learned (described next), activating the speech sound map cells for the sound leads to readout of the learned feedforward commands (“gestures”) and auditory and somatosensory targets for the sound (red arrows at right).

These targets are compared to incoming sensory signals to generate corrective commands if needed (blue).

The overall motor command (purple) combines feedforward and feedback components.

[Figure: DIVA model schematic (speech sound map, motor cortex cells, auditory and somatosensory error maps, goal regions, feedforward and feedback-based commands).]

Speech sound map during production


HST 722 – Speech Motor Control 8

Learning in the Model – Stage 1

In the first learning stage, the model learns the relationships between motor commands, somatosensory feedback, and auditory feedback.

In particular, the model needs to learn how to transform sensory error signals into corrective motor commands.

This is done with babbling movements of the vocal tract which provide paired sensory and motor signals that can be used to tune these transformations.
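The babbling stage can be illustrated with a minimal sketch (an illustration only, not the model's actual implementation): the motor-to-auditory mapping is replaced by a hypothetical linear plant, and the sensory-to-motor transformation is tuned by least squares over the paired babbling data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linearized plant: articulator movements -> auditory changes.
# (The real model uses a nonlinear articulatory synthesizer.)
plant = rng.standard_normal((2, 4))  # 2 auditory dims, 4 articulators

# Babbling: random articulator movements paired with their auditory results.
delta_motor = rng.standard_normal((500, 4))
delta_auditory = delta_motor @ plant.T

# Tune the inverse transform (auditory error -> corrective motor command)
# from the babbled pairs by least squares.
inverse_map, *_ = np.linalg.lstsq(delta_auditory, delta_motor, rcond=None)

# Pushing a corrective command back through the plant should now
# reproduce (and hence cancel) the auditory error.
auditory_error = np.array([0.5, -0.2])
corrective_command = auditory_error @ inverse_map
print(np.allclose(corrective_command @ plant.T, auditory_error))  # True
```

With enough babbled pairs the tuned transform maps any auditory error to an articulator command whose acoustic consequence matches that error, which is exactly what a feedback controller needs.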

[Figure: DIVA model schematic (speech sound map, motor cortex cells, auditory and somatosensory error maps, goal regions, feedforward and feedback-based commands).]


HST 722 – Speech Motor Control 9

Learning in the Model – Stage 2 (Imitation)

The model then needs to learn auditory and somatosensory targets for individual speech sounds, and feedforward motor commands (“gestures”) for these sounds. This is done through an imitation process involving the speech sound map cells.

[Figure: DIVA model schematic (speech sound map, motor cortex cells, auditory and somatosensory error maps, goal regions, feedforward and feedback-based commands).]

Model projections tuned during the imitation process are shown in red.


HST 722 – Speech Motor Control 10

The Imitation Process

(1) The model learns an auditory target from a sound sample provided by a fluent speaker; this target is stored in the synaptic weights projecting from the speech sound map to the higher-order auditory cortical areas.

(2) The model practices production of the sound to tune the feedforward commands and learn a somatosensory target.

[Figure: auditory target for “ba”.]


HST 722 – Speech Motor Control 11

Simulation – Learning Feedforward Commands

The model first learns the auditory target for the sound by listening to someone produce it (a sound sample is presented to the model).

Then it tries to repeat the target, initially under auditory feedback control. With each repetition, the model relies less on feedback control and more on feedforward control, resulting in better and better productions.


HST 722 – Speech Motor Control 12

Top panel: Spectrogram of target utterance presented to the model.

Remaining panels: Spectrograms of the model’s first few attempts to produce the utterance.

Note improvement of auditory trajectories with each practice iteration due to improved feedforward commands.


HST 722 – Speech Motor Control 13

[Figure: estimated anatomical locations of the model components on the cortical surface and cerebellum — tongue, lip, jaw, larynx, and respiratory (†) motor areas; speech sound map (SSM); somatosensory (S) and auditory (A, Aud) areas; lateral cerebellum (Lat Cbm); palate (*).]

Estimated Anatomical Locations of Model Components

The anatomical locations of the model’s components have been fine-tuned by comparison to the results of previous neurophysiological and neuroimaging studies (Guenther, Ghosh, and Tourville, 2006).

Each model component corresponds to a particular region of the brain:

Simulating a Hemodynamic Response in the Model


HST 722 – Speech Motor Control 14

The model’s cell activities during simulations can be directly compared to the results of fMRI and PET studies.


HST 722 – Speech Motor Control 15

Talk Outline

Overview of the DIVA model
• Mirror neurons in the model
• Learning in the model
• Simulating a hemodynamic response from the model

Feedback control subsystem
• Auditory perturbation fMRI experiment
• Somatosensory perturbation fMRI experiment

Feedforward control subsystem
• Sensorimotor adaptation to F1 perturbation

Summary


HST 722 – Speech Motor Control 16

The model’s feedback control subsystem compares learned auditory and somatosensory target regions for the current speech sound to incoming sensory information.

If the current auditory or somatosensory state is outside the target region for the sound, error signals arise in higher-order auditory and/or somatosensory areas in the superior temporal lobe and parietal lobe.

[Figure: DIVA model schematic (speech sound map, motor cortex cells, auditory and somatosensory error maps, goal regions, feedforward and feedback-based commands).]

Feedback Control Subsystem


HST 722 – Speech Motor Control 17

[Figure: DIVA model schematic (speech sound map, motor cortex cells, auditory and somatosensory error maps, goal regions, feedforward and feedback-based commands).]

Feedback Control Subsystem (continued)

Auditory and somatosensory error signals are then transformed into corrective motor commands via projections from the sensory areas to the motor cortex.
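As a toy illustration of this feedback pathway (the goal-region bounds and the gain are hypothetical values, not the model's), the error is zero inside the goal region, grows with distance outside it, and a learned transform — here stood in for by a simple scalar gain — turns it into a corrective command:

```python
import numpy as np

def region_error(state, lo, hi):
    # Zero inside the goal region; otherwise the signed distance to the
    # nearest edge of the region, per dimension.
    return np.clip(state - hi, 0.0, None) + np.clip(state - lo, None, 0.0)

# Hypothetical auditory goal region for a vowel: (F1, F2) bounds in Hz.
lo = np.array([500.0, 1600.0])
hi = np.array([650.0, 1900.0])

inside = region_error(np.array([600.0, 1700.0]), lo, hi)   # zeros: no error
outside = region_error(np.array([700.0, 1500.0]), lo, hi)  # nonzero error

# The corrective command opposes the error via a learned sensory-to-motor
# transform; a placeholder scalar gain stands in for that transform here.
feedback_gain = 0.1
corrective_command = -feedback_gain * outside
print(inside, outside, corrective_command)
```

Because the error vanishes anywhere inside the region, the controller intervenes only when production strays outside the learned target bounds.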


HST 722 – Speech Motor Control 18

Prediction: Auditory Error Cells

The model hypothesizes an auditory error map in the higher-order auditory cortex of the posterior superior temporal gyrus and planum temporale.

Cells in this map should become active if a subject’s auditory feedback of his/her own speech is perturbed so that it mismatches the subject’s auditory target.

The model also predicts that this auditory error cell activation will give rise to increased activity in motor areas, where corrective articulator commands are generated.


HST 722 – Speech Motor Control 19

fMRI Study of Unexpected Auditory Perturbation During Speech

To test these predictions, we performed an fMRI study involving 11 subjects in which the first formant frequency (an important acoustic cue for speech) was unexpectedly perturbed upward or downward by 30% in ¼ of the production trials.

The perturbed feedback trials were randomly interspersed with normal feedback trials so the subject could not anticipate the perturbations.

Perturbations were applied using a DSP device developed with colleagues at MIT (Villacorta, Perkell, & Guenther, 2004), which feeds the modified speech signal back to the subject in near real-time (~16 ms delay, not noticeable to the subject).

[Audio: the sound before and after the F1 shift.]


HST 722 – Speech Motor Control 20

[Figure: normalized F1 vs. time (sec), with panels showing the response to the downward shift and the response to the upward shift.]

Unexpectedly shifting the feedback caused subjects to compensate within the same syllable as the shift (gray regions: 95% confidence intervals).

DIVA model productions in response to unexpected upward (dashed line) and downward (solid line) perturbations of F1 fall within the distribution of productions of the speakers in the fMRI study (shaded regions).


HST 722 – Speech Motor Control 21

=> Auditory feedback control is right-lateralized in the frontal cortex.


HST 722 – Speech Motor Control 22

[Figure: DIVA model schematic (speech sound map, motor cortex cells, auditory and somatosensory error maps, goal regions, feedforward and feedback-based commands).]

The model also predicts that a sudden, unexpected perturbation of the jaw should cause an increase in error cell activity in somatosensory and (perhaps) auditory cortical areas.

This in turn should lead to increased activity in motor areas where corrective commands are generated.

Prediction: Somatosensory Error Cells


HST 722 – Speech Motor Control 23

fMRI Study of Unexpected Jaw Perturbation During Speech

13 subjects produced /aCi/ utterances while in the MRI scanner (e.g., “abi”, “ani”, “agi”).

An event-triggered paradigm was used to avoid movement artifacts and scanner noise issues:

On 1 in 7 utterances, a small balloon was rapidly inflated between the teeth during the initial vowel.

The balloon inhibits upward jaw movement for the consonant and final vowel, causing the subject to compensate with larger tongue and/or lip movements.


HST 722 – Speech Motor Control 24

Perturbed – Unperturbed Speech (p < 0.001):

[Figure: activation maps rendered on the left (L) and right (R) hemispheres.]


HST 722 – Speech Motor Control 25

Talk Outline

Overview of the DIVA model
• Mirror neurons in the model
• Learning in the model
• Simulating a hemodynamic response from the model

Feedback control subsystem
• Auditory perturbation fMRI experiment
• Somatosensory perturbation fMRI experiment

Feedforward control subsystem
• Sensorimotor adaptation to F1 perturbation

Summary


HST 722 – Speech Motor Control 26

Feedforward Control in the Model

In addition to activating the feedback control subsystem, activating speech sound map cells also causes the readout of feedforward commands for the sound to be produced.

These commands are encoded in synaptic projections from premotor cortex to primary motor cortex, including both cortico-cortical (blue) and trans-cerebellar (purple) projections.

Feedforward commands in the DIVA model


HST 722 – Speech Motor Control 27

The commands generated by the feedforward system (red) and feedback system (blue) are combined in motor cortex to form the overall motor command to the speech articulators (purple).

Combining Feedforward and Feedback Commands

[Figure: DIVA model schematic (speech sound map, motor cortex cells, auditory and somatosensory error maps, goal regions, feedforward and feedback-based commands).]


HST 722 – Speech Motor Control 28

Early in development, the feedforward commands are poorly tuned, so the feedback control subsystem is needed to “correct” the commands.

On each attempt to produce a sound, the feedforward controller incorporates these feedback-based corrections into the feedforward command for the next attempt, resulting in better and better feedforward control with practice.
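A minimal one-dimensional sketch of this tuning loop (the plant, gains, and units are invented for illustration, not taken from the model):

```python
# Toy 1-D plant: the produced F1 (arbitrary units) equals the motor command.
target_f1 = 600.0
feedforward = 400.0     # poorly tuned initial feedforward command
feedback_gain = 0.5     # fraction of the sensed error corrected within a trial
learning_rate = 1.0     # fraction of the correction copied into feedforward

produced = []
for attempt in range(8):
    error = target_f1 - feedforward             # detected via sensory feedback
    feedback_command = feedback_gain * error    # within-trial correction
    produced.append(feedforward + feedback_command)
    # Incorporate the feedback-based correction into the next attempt's
    # feedforward command.
    feedforward += learning_rate * feedback_command

print([round(f1, 1) for f1 in produced])
```

Each attempt halves the residual error here, so production converges on the target while the within-trial feedback contribution shrinks toward zero — the feedback-to-feedforward handoff described above.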

Tuning Feedforward Commands


HST 722 – Speech Motor Control 29

The interactions between the feedforward and feedback control subsystems in the model lead to the following predictions:

• If a speaker’s auditory feedback is perturbed consistently over many consecutive productions of a syllable, corrective commands issued by the auditory feedback control subsystem will become incorporated into the feedforward commands for that syllable.

• Speakers with better hearing (auditory acuity) will adapt more than speakers with worse hearing.

• If the perturbation is then removed, the speaker will show “after-effects” due to these adjustments to the feedforward command.

This was investigated by Villacorta, Perkell, & Guenther (2004).


HST 722 – Speech Motor Control 30

Sensorimotor Adaptation Study – F1 Perturbation

In each epoch of the adaptation study, the subject read a short list of words involving the vowel “eh” (e.g., “bet”, “peck”).

After a baseline phase of 15 epochs of reading with normal feedback, a shift of F1 was gradually applied to the subject’s auditory feedback during the next 5 epochs.

The shift was then held at the maximum level (30% shift) for 25 epochs.

Finally, feedback was returned to normal in a 20-epoch post-test phase.

The entire experimental session lasted approximately 60-90 minutes.


HST 722 – Speech Motor Control 31

• Results for 20 subjects shown by lines with standard error bars.
• Shaded region is the 95% confidence interval for model simulation results (one simulation per speaker; target region size determined by the speaker’s auditory acuity).


HST 722 – Speech Motor Control 32

Sensorimotor Adaptation Study Results

• Sustained auditory perturbation leads to adjustments in feedforward commands for speech in order to cancel out the effects of the perturbation.

• Amount of adaptation is correlated with the speaker’s auditory acuity: high-acuity speakers adapt more completely to the perturbation.

• When perturbation is removed, speech only gradually returns to normal values; i.e., there is an after-effect in the first few trials after hearing returns to normal (evidence for feedforward command adaptation).

• The model provides a close quantitative fit to these processes.


HST 722 – Speech Motor Control 33

Summary

The DIVA model elucidates several types of learning in speech acquisition, e.g.:

• Learning of relationships between articulations and their acoustic and somatosensory consequences

• Learning of auditory targets for speech sounds in the native language from externally presented examples

• Learning of feedforward commands for new sounds through practice

The model elucidates the interactions between motor, somatosensory, and auditory areas responsible for speech motor control.

The model spans behavioral and neural levels and makes predictions that are being tested using a variety of experimental techniques.


HST 722 – Speech Motor Control 34

Reconciling Gestural and Auditory Views of Speech Production

In our view, the “gestural score” is the feedforward component of speech production: an optimized set of motor programs for the most frequently produced phonemes, syllables, and syllable strings (in Levelt’s terms, the “syllabary”), stored in the projections from premotor to motor cortex.

These gestural scores are shaped by auditory experience in order to adhere to acceptable auditory bounds of speech sounds in the native language(s).

They are supplemented by auditory and somatosensory feedback control systems that constantly adjust the gestures when they detect errors in performance, e.g. due to growth of the vocal tract, addition of a false palate, or auditory feedback perturbation.


HST 722 – Speech Motor Control 35

Alfonso Nieto-Castanon
Satrajit Ghosh
Jason Tourville
Kevin Reilly
Oren Civier
Jonathan Brumberg
Jason Bohland
Michelle Hampson
Joseph Perkell
Majid Zandipour
Virgilio Villacorta
Melanie Matthies
Shinji Maeda

Collaborators

Supported by NSF and NIDCD


HST 722 – Speech Motor Control 36

Simulating a Hemodynamic Response from the Model

Model cell activities during simulations of speech are convolved with an idealized hemodynamic response function, generated using default settings of the function ‘spm_hrf’ from the SPM toolbox. This function was designed to characterize the transformation from cell activity to hemodynamic response in the brain.

A brain volume is then constructed with the appropriate hemodynamic response values at each position and smoothed with a Gaussian kernel (FWHM = 12 mm). This smoothing approximates the smoothing carried out during standard SPM analysis of fMRI data from human subjects.

The resultant volume is then rendered using routines from the SPM toolbox.
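A minimal sketch of this pipeline's first step (a NumPy-only approximation: the double-gamma shape below mimics spm_hrf's default response/undershoot form but is not the SPM implementation, and the sampling rate and burst are invented):

```python
import numpy as np
from math import gamma as gamma_fn, sqrt, log

def gamma_pdf(t, shape):
    # Gamma density with unit scale, used as an HRF building block.
    return t ** (shape - 1) * np.exp(-t) / gamma_fn(shape)

def canonical_hrf(dt=1.0, duration=32.0):
    # Double-gamma shape approximating SPM's default HRF: a positive
    # response peaking a few seconds in, minus a later undershoot (ratio 1/6).
    t = np.arange(0.0, duration, dt)
    h = gamma_pdf(t, 6) - gamma_pdf(t, 16) / 6.0
    return h / h.sum()

# Simulated activity of one model cell: a brief burst during a production
# trial (1 s sampling; arbitrary units).
activity = np.zeros(100)
activity[10:15] = 1.0

# Predicted BOLD time course: cell activity convolved with the HRF.
bold = np.convolve(activity, canonical_hrf())[: len(activity)]
print(int(np.argmax(bold)))  # the BOLD peak lags the neural burst

# Gaussian sigma corresponding to the FWHM = 12 mm smoothing kernel.
sigma_mm = 12.0 / (2.0 * sqrt(2.0 * log(2.0)))
```

The spatial step then places such time courses at each component's anatomical location and smooths the volume with a Gaussian of the sigma computed above.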


HST 722 – Speech Motor Control 37

An event-triggered paradigm was used to avoid movement artifacts and scanner noise issues:


HST 722 – Speech Motor Control 38

Hypothesis regarding onset of stuttering:

[Figure: error signal magnitude vs. age for a stuttering and a non-stuttering individual, relative to the threshold for motor reset due to sensory error signals; the stuttering individual’s error signal crosses the threshold at the onset of stuttering.]


HST 722 – Speech Motor Control 39

Brain regions active during cued production of 3-syllable strings

[Figure: schematic of the sequencing circuit — sequence working memory in IFS (BA 44); frame representation and trigger cells in pre-SMA and SMA; frame and trigger signals select the next sound in the speech sound map, which projects to motor cortex (DIVA model); overt speech only.]


HST 722 – Speech Motor Control 40

[Figure: tongue body height vs. tongue body horizontal position, showing the target region for /k/ and the vocal tract configurations for /u/ and /i/.]

The model’s use of sensory target regions provides a unified account for a number of speech production phenomena, including aspects of:

• Economy of effort (cf. Lindblom)
• Articulatory variability
• Anticipatory coarticulation
• Carryover coarticulation
• Speaking rate effects

Schematized at right is the model’s explanation of carryover coarticulation and economy of effort during production of /k/ in “luke” and “leak”.

Carryover Coarticulation

Auditory Target Regions


HST 722 – Speech Motor Control 41

[Figure: second formant frequency vs. first formant frequency, showing target regions for /i/ and /e/ and the contrast distance between them.]

Two factors that could influence target region size:

(1) Perceptual acuity of speaker: better perceptual acuity => smaller regions

(2) Speaking condition: clear speech (vs. fast speech) => smaller regions


HST 722 – Speech Motor Control 42

[Figure: results of the EMMA studies — articulatory contrast distance (mm) and acoustic contrast distance (Hz) for the “who’d–hood” and “cod–cud” contrasts, and acoustic contrast distance (Hz) for the “sod–shod” and “said–shed” contrasts, plotted across speaking conditions (F, N, C) for high- (H/HI) and low- (L/LO) discrimination groups; asterisks mark significant differences.]

Perkell et al. (2004a,b).

Results of EMMA studies:

(1) Speakers with high perceptual acuity show greater contrast distance in production of neighboring sound categories.

(2) General tendency for greater contrast distance in clear speech, less in fast speech.

These results support the predictions on the preceding slide.
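Contrast distance itself is a simple quantity: the distance between the mean productions of two neighboring categories, computed in formant space (Hz) for acoustic contrast or in articulator position space (mm) for articulatory contrast. A minimal sketch (the token values below are invented, not data from the study):

```python
import numpy as np

def contrast_distance(tokens_a, tokens_b):
    """Distance between the mean productions of two neighboring categories.

    tokens_*: arrays of shape (n_tokens, n_dims), e.g. (F1, F2) in Hz for
    acoustic contrast distance, or articulator positions in mm for
    articulatory contrast distance.
    """
    tokens_a = np.asarray(tokens_a, dtype=float)
    tokens_b = np.asarray(tokens_b, dtype=float)
    return float(np.linalg.norm(tokens_a.mean(axis=0) - tokens_b.mean(axis=0)))

# Hypothetical (F1, F2) tokens for "cod" vs. "cud" (values are made up):
cod = [[750.0, 1100.0], [730.0, 1080.0]]
cud = [[640.0, 1190.0], [660.0, 1210.0]]
d = contrast_distance(cod, cud)   # Euclidean distance between mean formants
```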

Page 43

Ellipses indicating the range of formant frequencies (±1 s.d.) used by a speaker to produce five vowels (iy, eh, aa, uh, uw) during fast speech (light gray) and clear speech (dark gray) in a variety of phonetic contexts.
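Such an ellipse can be summarized by a center and ±1 s.d. radii per formant. A minimal sketch (axis-aligned radii from per-dimension standard deviations; the actual plots may use covariance ellipses, and the token values below are invented):

```python
import numpy as np

def formant_ellipse(tokens):
    """Mean and +/-1 s.d. axes of a vowel's (F1, F2) distribution.

    Returns (center, radii): center = mean formants across tokens,
    radii = one sample standard deviation along F1 and along F2.
    """
    tokens = np.asarray(tokens, dtype=float)
    return tokens.mean(axis=0), tokens.std(axis=0, ddof=1)

# Made-up /iy/ tokens (F1, F2) in Hz:
center, radii = formant_ellipse([[300, 2300], [310, 2250], [290, 2350]])
# center is (300, 2300); radii are (10, 50) Hz.
```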

Page 44

Motor Equivalence in American English /r/ Production

It has long been known that the American English phoneme /r/ is produced with a large amount of articulatory variability, both within a subject and between subjects. Delattre and Freeman (1968):

Page 45

Despite large articulatory variability, the key acoustic cue for /r/ remains relatively stable across phonetic contexts. Boyce and Espy-Wilson (1997):

Page 46

Motor Equivalence in the DIVA Model

The model’s use of an auditory target for /r/, combined with a directional mapping between auditory and articulatory spaces, leads to different articulatory gestures, and different vocal tract shapes, for /r/ depending on phonetic context:
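One standard way to implement such a directional mapping is a damped pseudoinverse of the articulatory-to-auditory Jacobian (a sketch, not the model's actual learned mapping; the Jacobian values below are invented). Because one auditory dimension is shared across several articulators, the resulting gesture depends on the starting configuration: motor equivalence.

```python
import numpy as np

def articulator_velocity(jacobian, auditory_error, damping=1e-3):
    """Map a desired change in auditory space to articulator velocities.

    Damped least squares: v = J^T (J J^T + lambda I)^-1 * error.
    The one auditory error is distributed over all articulators, so
    different starting configurations yield different gestures that
    achieve the same auditory goal.
    """
    J = np.asarray(jacobian, dtype=float)
    e = np.asarray(auditory_error, dtype=float)
    JJt = J @ J.T + damping * np.eye(J.shape[0])
    return J.T @ np.linalg.solve(JJt, e)

# Toy Jacobian: 1 auditory dimension (e.g., F3) vs. 3 articulators:
J = np.array([[1.0, 0.5, -0.25]])
v = articulator_velocity(J, [-1.0])  # command that lowers F3 by ~1 unit
```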

Producing /r/ after /g/: Producing /r/ after /a/:

EMMA/Modeling Study:

(1) Collect EMMA data from speakers producing /r/ in different contexts

(2) Build speaker-specific vocal tract models (articulatory synthesizers) for two of the speakers

(3) Train the DIVA model to produce sounds with the speaker-specific vocal tracts

(4) Compare the model’s /r/ productions to those of the EMMA subjects

Page 47

Sketch of hypothesized trading relations for /r/:

The acoustic effect of the larger front cavity (blue) is compensated for by the effect of the longer, narrower constriction (red).

This yields similar acoustics for “bunched” (red) and “retroflex” (blue) tongue configurations for /r/ (Stevens, 1998; Boyce & Espy-Wilson, 1997).

All seven subjects in the EMMA study utilized similar trading relations (Guenther et al., 1999).
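The front-cavity side of this trading relation can be sketched with a quarter-wavelength approximation (a standard simplification, cf. Stevens, 1998; the cavity lengths below are invented): the lowest resonance of a tube closed at one end is F = c / 4L, so a longer front cavity lowers the F3-related resonance, and the longer, narrower constriction must compensate to keep F3 near its /r/ value.

```python
def quarter_wave_resonance(length_m, c=354.0):
    """Lowest resonance (Hz) of a tube closed at one end, open at the other.

    F = c / (4 * L), with c the speed of sound in warm, moist air
    (~354 m/s). Treating the front cavity as such a tube is a
    simplification used only to show the direction of the effect.
    """
    return c / (4.0 * length_m)

# Made-up front-cavity lengths for two /r/ configurations:
f3_short = quarter_wave_resonance(0.050)  # 5.0 cm front cavity -> 1770 Hz
f3_long = quarter_wave_resonance(0.055)   # 5.5 cm: lower resonance
```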

[Figure: Tongue constriction and lip measurements for /r/ in /wagrav/ vs. /warav/ or /wabrav/ contexts, subjects S1–S7, shown back-to-front; scale bar 1 cm.]

Page 48

Building Speaker-Specific Vocal Tract Models from MRI Images

[Figure: Vocal tract shapes for Subject 1 and Subject 2 when changing F1, F2, and F3.]

Page 49

Comparison of the model’s articulations using speaker-specific vocal tracts to those speakers’ actual articulations:

[Figure: Subject data and DIVA simulations of /ar/, /dr/, and /gr/ for Subject 1 and Subject 2.]

[Nieto-Castanon, Guenther, Perkell, and Curtin (2005), J Acoust Soc Am.]

Page 50

[DIVA model block diagram. The Speech Sound Map (premotor cortex) sends a Feedforward Command to Articulator Velocity and Position Cells (motor cortex), which project To Muscles. It also sends an Auditory Goal Region to Auditory Error cells (auditory cortex) and a Somatosensory Goal Region to Somatosensory Error cells (somatosensory cortex). These error cells compare the goal regions against the current Auditory State and Somatosensory State, and their outputs drive the Auditory Feedback-Based Command and Somatosensory Feedback-Based Command back to motor cortex.]
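The control structure in the diagram can be summarized as a feedforward command plus sensory-feedback corrections. The sketch below is a minimal caricature with invented gains, not the model's equations; it also assumes the sensory errors have already been mapped into motor coordinates (in the model that mapping is itself learned).

```python
import numpy as np

def motor_command(feedforward, aud_error, somat_error,
                  aud_gain=0.5, somat_gain=0.5):
    """Combine the three command pathways shown in the diagram.

    The total motor command is the feedforward command plus
    feedback-based commands driven by the auditory and somatosensory
    errors (goal region minus current sensory state). The gains are
    illustrative placeholders for how strongly each pathway is weighted.
    """
    return (np.asarray(feedforward, dtype=float)
            + aud_gain * np.asarray(aud_error, dtype=float)
            + somat_gain * np.asarray(somat_error, dtype=float))

# With no sensory error (well-learned speech), the feedforward command
# passes through unchanged:
cmd = motor_command([0.2, -0.1], aud_error=[0.0, 0.0], somat_error=[0.0, 0.0])
```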