Neural Representations of Speech at the “Cocktail Party” in Human Auditory Cortex
Jonathan Z. Simon
Department of Electrical & Computer Engineering
Department of Biology
Institute for Systems Research
University of Maryland
Peking University / NYU Shanghai / Zhejiang University 2016
http://www.isr.umd.edu/Labs/CSSL/simonlab
Acknowledgements
Current (Simon Lab & Affiliates): Christian Brodbeck, Francisco Cervantes, David Nahmias, Mahshid Najafi, Krishna Puvvada, Peng Zan
Past (Simon Lab & Affiliate Labs): Nayef Ahmar, Murat Aytekin, Claudia Bonin, Maria Chait, Marisel Villafane Delgado, Kim Drnec, Nai Ding, Victor Grau-Serrat, Julian Jenkins, Natalia Lapinskaya, Kai Sum Li, Huan Luo, Ling Ma, Alex Presacco, Raul Rodriguez, Ben Walsh, Juanjuan Xiang, Jiachen Zhuo
Collaborators: Pamela Abshire, Samira Anderson, Behtash Babadi, Catherine Carr, Monita Chatterjee, Alain de Cheveigné, Didier Depireux, Mounya Elhilali, Bernhard Englitz, Jonathan Fritz, Stefanie Kuchinsky, Cindy Moss, David Poeppel, Shihab Shamma
Past Postdocs & Visitors: Sahar Akram, Aline Gesualdi Manhães, Dan Hertz, Jonas Vanthornhout, Yadong Wang
Undergraduate Students: Abdulaziz Al-Turki, Nicholas Asendorf, Sonja Bohr, Elizabeth Camenga, Corinne Cameron, Julien Dagenais, Katya Dombrowski, Kevin Hogan, Kevin Kahn, Alexandria Miller, Andrea Shome, Sandra Soltz, Madeleine Varmer, James Williams
Funding: NIH (NIDCD, NIA, NIBIB); USDA; UMD
Outline
• Cortical Representations of Speech (via MEG)
‣ Encoding vs. Decoding
• Cortical Representations of the “Cocktail Party”
• Recent Results
‣ Attentional Dynamics
‣ Aging & Cortical Representations of Speech
‣ Higher Level Interference & Noise
Functional Brain Imaging = Non-invasive recording from human brain
Hemodynamic techniques
• fMRI: functional magnetic resonance imaging
• PET: positron emission tomography
• Excellent Spatial Resolution (~1 mm)
• Poor Temporal Resolution (~1 s)
Electromagnetic techniques
• EEG: electroencephalography
• MEG: magnetoencephalography
• Poor Spatial Resolution (~1 cm)
• Excellent Temporal Resolution (~1 ms)
fMRI & MEG can capture effects in single subjects
MEG & Auditory Cortex
• Non-invasive, Passive, Silent Neural Recordings
• MEG Response Patterns Time-Locked to Stimulus Events
• Robust
• Strongly Lateralized
• Cortical Origin Only
[Figure: MEG responses to a Pure Tone and to Broadband Noise; x-axis: time (ms)]
MEG Responses to Speech Modulations
[Figure label: Auditory Model]
Ding & Simon, J Neurophysiol (2012) “Spectro-Temporal Response Function”
(up to ~10 Hz)
MEG Responses Predicted by STRF Model
Linear Kernel = STRF
Long duration speech: ~60 s
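As a rough illustration of what "Linear Kernel = STRF" means, the sketch below predicts a response by convolving each frequency band of a stimulus spectrogram with its own kernel over time lags and summing across bands. The function name `predict_response`, the list-of-lists data layout, and the toy numbers are assumptions made for illustration; this is not the analysis code behind Ding & Simon (2012).

```python
def predict_response(spectrogram, strf):
    """Predict a response as a sum of causal convolutions, one per frequency band.

    spectrogram: list of bands, each a list of T stimulus samples (hypothetical layout)
    strf:        list of bands, each a list of kernel weights indexed by time lag
    """
    n_samples = len(spectrogram[0])
    predicted = [0.0] * n_samples
    for band, kernel in zip(spectrogram, strf):
        for t in range(n_samples):
            for lag, weight in enumerate(kernel):
                if t - lag >= 0:  # causal: only past and present stimulus samples
                    predicted[t] += weight * band[t - lag]
    return predicted


# Toy example: two bands, each with a single impulse, and two-lag kernels.
toy_spectrogram = [[1, 0, 0, 0],
                   [0, 1, 0, 0]]
toy_strf = [[1, 2],
            [3, 0]]
print(predict_response(toy_spectrogram, toy_strf))  # -> [1.0, 5.0, 0.0, 0.0]
```

With an impulse as input, each band simply reproduces its own kernel, which is why the STRF can be read as the response to a brief burst of energy at each frequency.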
Neural Reconstruction of Speech Envelope
[Figure: MEG Responses ... → Decoder → Speech Envelope; traces: stimulus speech envelope and reconstructed stimulus speech envelope (up to ~10 Hz); scale bar: 2 s]
• Reconstruction accuracy comparable to single unit & ECoG recordings
Ding & Simon, J Neurophysiol (2012); Zion-Golumbic et al., Neuron (2013)
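The decoding direction can be sketched as a linear backward model: the envelope at time t is estimated as a weighted sum of sensor samples at t and a few later samples, and reconstruction accuracy is the correlation between the estimate and the true envelope. The sketch below is a minimal version under stated assumptions: ridge-regularized least squares stands in for whatever estimator the cited papers used, the single delayed "sensor" is synthetic, and all names (`fit_decoder`, `reconstruct`, `lagged_features`, `ridge`) are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def lagged_features(channels, n_lags):
    """Row t holds every channel at times t, t+1, ..., t+n_lags-1 (zero past the end)."""
    n = len(channels[0])
    return [[ch[t + lag] if t + lag < n else 0.0
             for ch in channels for lag in range(n_lags)]
            for t in range(n)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_decoder(channels, envelope, n_lags, ridge=1e-3):
    """Ridge least squares: weights = (X'X + ridge*I)^-1 X'y."""
    X = lagged_features(channels, n_lags)
    d = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * e for r, e in zip(X, envelope)) for i in range(d)]
    return solve(xtx, xty)


def reconstruct(channels, weights, n_lags):
    """Apply decoder weights to the lagged sensor data."""
    return [sum(w * v for w, v in zip(weights, row))
            for row in lagged_features(channels, n_lags)]


# Synthetic demo (assumption): one "sensor" that is the envelope delayed by 2 samples.
rng = random.Random(0)
envelope = [rng.gauss(0, 1) for _ in range(200)]
channel = [0.0, 0.0] + envelope[:-2]
weights = fit_decoder([channel], envelope, n_lags=4)
accuracy = pearson(envelope, reconstruct([channel], weights, n_lags=4))
print(round(accuracy, 3))
```

In practice the decoder would be fit on some trials and evaluated on held-out trials; the demo fits and evaluates on the same data only to keep the sketch short.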
Neural Representation of Speech: Temporal
Speech in Stationary Noise
Ding & Simon, J Neuroscience (2013)
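For readers unfamiliar with how a speech-in-noise stimulus at a given SNR in dB is built, the sketch below scales a noise signal so that the ratio of speech power to noise power matches a target value. It is a generic construction under stated assumptions (mean-square power, hypothetical function names `power` and `mix_at_snr`, toy signals), not a description of the stimulus preparation in Ding & Simon (2013).

```python
import math


def power(signal):
    """Mean-square power of a signal."""
    return sum(v * v for v in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale noise so that 10*log10(P_speech / P_noise) equals snr_db, then add."""
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    mixture = [s + gain * n for s, n in zip(speech, noise)]
    return mixture, gain


# Toy signals (assumption): a +6 dB mixture, i.e. speech power about 4x the noise power.
toy_speech = [1.0, -1.0, 1.0, -1.0]
toy_noise = [2.0, 2.0, -2.0, -2.0]
toy_mixture, toy_gain = mix_at_snr(toy_speech, toy_noise, 6.0)
```

Negative SNR values, such as −6 dB, mean the noise carries more power than the speech.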
Speech in Noise: Results
[Figure, three panels]
A. Neural Reconstruction of Underlying Speech Envelope (examples at +6 dB and −6 dB; scale bar: 1 s)
B. Reconstruction Accuracy (x-axis: SNR (dB): Q, +6, +2, −3, −6, −9; y-axis: correlation, 0 to 0.2)
C. Correlation with Intelligibility across Subjects (x-axis: intelligibility (%), 0 to 100; y-axis: reconstruction accuracy, 0 to 0.3)
Ding & Simon, J Neuroscience (2013)
Noise-Vocoded Speech
Ding, Chatterjee & Simon, NeuroImage (2014)
Intelligibility Reflected only in Delta Band (1–4 Hz)
Multiple Cortical Speech Representations?
Di Liberto, et al. (2015) Low-Frequency Cortical Entrainment to Speech Reflects Phoneme-Level Processing
Kayser et al. (2015) Irregular Speech Rate Dissociates Auditory Cortical Entrainment, Evoked Responses, and Frontal Alpha
Ding et al. (2015) Cortical tracking of hierarchical linguistic structures in connected speech
Cortical Speech Representations
• Neural Representations: Encoding & Decoding
• Linear models: Useful & Robust
• Speech Envelope only (as seen in MEG)
• Envelope Rates: ~1–10 Hz
• Intelligibility linked to lower range of frequencies (Delta)
Alex Katz, The Cocktail Party
Listening to Speech at the Cocktail Party
[Figure labels: speech; competing speech]
Competing Speech Streams
Selective Neural Encoding
Unselective vs. Selective Neural Encoding
Selective Encoding: Results
[Figure: representative subject; attended speech envelopes reconstructed from MEG, while attending to speaker 1 and while attending to speaker 2]
Identical Stimuli!
Ding & Simon, PNAS (2012)
Single Trial Speech Reconstruction
Ding & Simon, PNAS (2012)
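One simple way to read a single-trial reconstruction is to correlate it with each talker's envelope and see which correlation is larger. The sketch below shows that comparison under stated assumptions: the function name `attended_speaker`, the tie-breaking rule, and the toy envelopes are hypothetical, and this is not presented as the exact procedure of Ding & Simon (2012).

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def attended_speaker(reconstruction, envelope_1, envelope_2):
    """Return (label, r1, r2); label is 1 or 2, ties go to speaker 1 (arbitrary choice)."""
    r1 = pearson(reconstruction, envelope_1)
    r2 = pearson(reconstruction, envelope_2)
    return (1 if r1 >= r2 else 2), r1, r2


# Toy data (assumption): the reconstruction resembles speaker 1's envelope.
toy_reconstruction = [0.1, 0.9, 0.2, 0.8, 0.1]
toy_envelope_1 = [0.0, 1.0, 0.0, 1.0, 0.0]
toy_envelope_2 = [1.0, 0.0, 1.0, 0.0, 1.0]
label, r1, r2 = attended_speaker(toy_reconstruction, toy_envelope_1, toy_envelope_2)
```

Because the acoustic mixture is identical in both attention conditions, any change in which envelope the reconstruction resembles has to come from the listener rather than from the stimulus.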
STRF Results
• STRF separable (time, frequency)
• 300 Hz - 2 kHz dominant carriers
• M50_STRF positive peak
• M100_STRF negative peak
• M100_STRF strongly modulated by attention, but not M50_STRF
[Figure: STRFs for Attended and Background speech (x-axis: time (ms), 0 to 200; y-axis: frequency (kHz), 0.2 to 3), with TRFs for attended and background]
Neural Sources
[Figure: source locations of M50_STRF, M100_STRF and M100 in Left and Right hemispheres; directions: anterior, posterior, medial; scale bar: 5 mm]
• M100_STRF source near (same as?) M100 source: Planum Temporale
• M50_STRF source is anterior and medial to M100 (same as M50?): Heschl’s Gyrus
• PT strongly modulated by attention, but not HG
Recent Results
• Attentional Dynamics
• Aging & Cortical Representations of Speech
• High Level Interference & Noise
Attentional Dynamics
[Figure: Probability of attending Speaker 1 over time; conditions: Attend to Speaker 1, Attend to Speaker 2, Switch Attention]
Akram et al., NeuroImage (2016)
Aging & Auditory Cortex
Average Responses to Pure Tone
[Figure: Evoked response to the 500 Hz tone burst, Older vs. Younger; x-axis: Time (ms), −200 to 800; y-axis: Standardized amplitude (fT), −5 to 3]
[Figure: M100 Power by Subject, Older vs. Younger; x-axis: Subject; y-axis: z-score, 0 to 6]
Over-Representation
Speech Over-Representation
[Figure: Speech Reconstruction by Subject; y-axis: Reconstruction Accuracy]
[Figure: Speech Reconstruction by SNR, Older vs. Younger; x-axis: Quiet, +3 dB, 0 dB, −3 dB, −6 dB; y-axis: Reconstruction Accuracy (r), 0 to 0.4]
[Figure: Reconstruction Accuracy (r), 0 to 0.4, by Integration time (ms): 150, 350, 500; panels: Quiet, Noise; with significance markers]
Presacco et al., J Neurophysiol (2016a)
Aging & Integration Time
[Figure: Reconstruction Accuracy vs. Integration window (ms), Older Adults vs. Younger Adults; panels: In Quiet, with Competing Speaker]
Presacco et al., J Neurophysiol (2016a)
Neural vs Inhibitory Control
[Figure: Reconstruction Accuracy vs. Flanker Score; Older: ρ = –0.62; Younger: ρ = 0.43]
Presacco et al., J Neurophysiol (2016b)
High Level Interference Effects
[Figure: Speech Reconstruction by SNR; y-axis: Reconstruction Accuracy; labels: In Quiet, Unfamiliar]
• Unfamiliarity of Background
- Boosts Intelligibility of Attended Speech
- Also Boosts Cortical Reconstruction of Attended Speech
Presacco et al., J Neurophysiol (2016b)
Summary
• Cortical representations of speech
- representation of envelope (up to ~10 Hz)
- robust against a variety of noise types
- neural representation of perceptual object
• Object-based representation at 100 ms latency (PT), but not by 50 ms (HG)
• Robust Dynamical Foreground Monitoring
• Over-Representation with Aging
- Reconstruction depends on integration time
- Over-Representation tracks inhibitory control
• Background familiarity: neural tracks behavior
Thank You