Dynamic Aspects of the Cocktail Party Listening Problem
Douglas S. Brungart
Air Force Research Laboratory
2
Credits
AFOSR Sponsored Research
Team:
Brian Simpson
Alex Kordik
Rich McKinley
Mark Ericson
Collaborators:
Chris Darwin
Gerald Kidd
3
Introduction
1) Energetic and Informational Masking:
Speech in Noise vs Speech in Speech
2) Monaural speech segregation
3) Binaural and Dichotic speech segregation
4) Dynamic aspects of cocktail party problem
5) Audio-Visual cocktail party effects
4
Energetic Masking
In classic speech-in-noise masking, only one type of masking occurs: Energetic Masking
In Energetic Masking:
- The masking sound is more intense than the target in one or more critical bands
- Some portion of the target signal is inaudible at the periphery
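As a minimal illustration of this definition, the sketch below flags the bands in which a masker is more intense than the target, given per-band levels in dB. The function name `energetically_masked_bands`, the band layout, and the example levels are hypothetical; a real analysis would first derive band levels by critical-band filtering of the actual signals.

```python
def energetically_masked_bands(target_db, masker_db):
    """Return indices of bands where the masker level exceeds the target level.

    target_db, masker_db: per-band levels in dB, assumed to share one band layout.
    """
    if len(target_db) != len(masker_db):
        raise ValueError("target and masker must use the same band layout")
    return [i for i, (t, m) in enumerate(zip(target_db, masker_db)) if m > t]


# Hypothetical 6-band levels (dB), for illustration only.
target = [60, 58, 55, 50, 45, 40]
masker = [50, 55, 57, 52, 40, 30]
print(energetically_masked_bands(target, masker))  # [2, 3]
```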
5
Energetic Masking: Articulation Theory
Energetic masking in speech was studied for years by Fletcher and others at Bell Labs
- Articulation Theory
- Articulation Index (AI)
Allows accurate prediction of intelligibility:
- For any phonetically balanced vocabulary
- For any continuous noise source
- Plus numerous correction factors (high amplitudes, reverberation, peak clipping, etc.)
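The core idea behind an articulation-index calculation can be sketched as an importance-weighted sum of per-band audibility. This is a simplified sketch, not Fletcher's original formulation: it borrows the 30-dB band-audibility range (band SNR clipped between -15 and +15 dB) used by the later Speech Intelligibility Index (ANSI S3.5-1997), assumes the band-importance weights are already known, and omits the correction factors listed above. Function names and example values are hypothetical.

```python
def band_audibility(snr_db):
    """Map a band SNR (dB) to audibility in [0, 1] over a 30-dB range (-15 to +15 dB)."""
    return min(1.0, max(0.0, (snr_db + 15.0) / 30.0))


def articulation_index(snrs_db, weights):
    """Importance-weighted sum of band audibilities; weights must sum to 1."""
    if len(snrs_db) != len(weights):
        raise ValueError("one importance weight per band is required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("band-importance weights must sum to 1")
    return sum(w * band_audibility(s) for s, w in zip(snrs_db, weights))


# Hypothetical example: five equally important bands.
snrs = [20.0, 10.0, 0.0, -10.0, -20.0]
weights = [0.2] * 5
print(round(articulation_index(snrs, weights), 3))  # 0.5
```

A band at or above +15 dB SNR contributes its full weight, and a band at or below -15 dB contributes nothing.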
6
Informational Masking
Energetic Masking also occurs in Speech-on-Speech masking
- Where signals overlap within a critical band
However, informational masking also occurs:
• Listeners hear two or more audible sounds, but can’t segregate them into separate messages
• Classic example: multi-tone complexes
- No energetic overlap in stimuli, but substantial masking is observed (Kidd, Neff)
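Multi-tone maskers of this kind are often built by drawing random tone frequencies while keeping them out of a "protected region" around the target frequency, so the masker places little energy in the target's critical band. The sketch below illustrates that idea; the function name, the 1/3-octave protected width, the 200 to 5000 Hz range, and the tone count are arbitrary assumptions, not the parameters of the Kidd or Neff studies.

```python
import math
import random


def multitone_masker_frequencies(n_tones, target_hz, protected_octaves=1 / 3,
                                 lo_hz=200.0, hi_hz=5000.0, rng=None):
    """Draw masker tone frequencies log-uniformly between lo_hz and hi_hz,
    rejecting any that fall inside a protected band centred on the target."""
    if rng is None:
        rng = random.Random()
    half = protected_octaves / 2.0
    band_lo = target_hz * 2.0 ** (-half)
    band_hi = target_hz * 2.0 ** half
    freqs = []
    while len(freqs) < n_tones:
        f = math.exp(rng.uniform(math.log(lo_hz), math.log(hi_hz)))
        if not (band_lo <= f <= band_hi):  # keep only tones outside the protected region
            freqs.append(f)
    return sorted(freqs)


freqs = multitone_masker_frequencies(8, target_hz=1000.0, rng=random.Random(1))
print([round(f) for f in freqs])
```

Redrawing the frequencies on every trial is what makes the masker uncertain to the listener even though it leaves the target band largely untouched.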
7
Methods: The Coordinate Response Measure (CRM)
Data collected with the Coordinate Response Measure
- CRM originally developed by Moore & McKinley (1980)
- Format: Ready (Call Sign) go to (Color) (Number) now.
- Target is indicated by call sign Baron
- Maskers indicated by other call signs
- Complete CRM corpus is available (Bolia et al., 2001)
- 8 Talkers in corpus (4 M, 4 F), 2048 Phrases
- 8 Talkers × 4 Colors × 8 Numbers × 8 Call Signs
- Embedded call sign is ideal for multitalker studies
- Similar to many multichannel monitoring tasks
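The corpus size follows directly from the factorial design above, which a short enumeration makes concrete. The call-sign, color, and number sets follow the published description of the public CRM corpus; the talker IDs are hypothetical placeholders, and the phrase strings are text stand-ins for the recorded utterances.

```python
from itertools import product

CALL_SIGNS = ["Arrow", "Baron", "Charlie", "Eagle", "Hopper", "Laker", "Ringo", "Tiger"]
COLORS = ["Blue", "Red", "White", "Green"]
NUMBERS = list(range(1, 9))
TALKERS = [f"M{i}" for i in range(1, 5)] + [f"F{i}" for i in range(1, 5)]  # hypothetical IDs


def crm_phrase(call_sign, color, number):
    """Fill the CRM carrier phrase: Ready (Call Sign) go to (Color) (Number) now."""
    return f"Ready {call_sign} go to {color} {number} now."


corpus = [
    (talker, crm_phrase(cs, color, num))
    for talker, cs, color, num in product(TALKERS, CALL_SIGNS, COLORS, NUMBERS)
]
print(len(corpus))     # 2048
print(corpus[0][1])    # Ready Arrow go to Blue 1 now.

# Chance of guessing both the color and the number correctly.
chance = 1 / (len(COLORS) * len(NUMBERS))
print(chance)          # 0.03125
```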
8
Methods: The Coordinate Response Measure
Listeners respond by selecting the appropriate colored digit with the computer mouse
9
Methods: Pros and Cons of CRM
Advantages of CRM:
Rapid data collection: training and scoring
Sentences are reusable
Embedded call sign to designate target
- does not require a priori designation
Disadvantages of CRM:
Limited vocabulary
- partially offset by lack of context
- not phonetically balanced
Not “conversationally” realistic
CRM emphasizes “speech on speech” masking
CRM emphasizes “informational” masking
12
Two-Talker Diotic Listening: Results
TM = Modulated Noise Masker
TN = Continuous Noise Masker
TD = Different-Sex Masker
TS = Same-Sex Masker
TT = Same-Talker Masker
13
Two-Talker Diotic Listening: Error Distribution
Most errors match the color and number spoken by the masking talker...
This is indicative of informational masking
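One way to tabulate this error pattern is to compare each response's color and number with those of the target and of each masker phrase. The function `classify_response` and its three categories are hypothetical; the sketch assumes that target and masker phrases always use different colors and numbers, so a response cannot match both, and it lumps mixed errors (e.g., the target's color with a masker's number) into "other".

```python
def classify_response(response, target, maskers):
    """Classify a CRM response given as a (color, number) pair.

    Returns "correct" if it matches the target, "masker" if it matches the
    color and number of any masker phrase, and "other" otherwise.
    """
    if response == target:
        return "correct"
    if response in maskers:
        return "masker"
    return "other"


target = ("Blue", 5)
maskers = [("Red", 2)]
print(classify_response(("Blue", 5), target, maskers))  # correct
print(classify_response(("Red", 2), target, maskers))   # masker
print(classify_response(("Red", 5), target, maskers))   # other
```

A high proportion of "masker" responses among all errors is the signature of informational masking described above.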
14
Three-Talker Diotic Listening: Results
T = Target Talker
M = Modulated Noise Masker
D = Different-Sex Masker
S = Same-Sex Masker
T = Same-Talker Masker
15
Four-Talker Diotic Listening: Results
T = Target Talker
M = Modulated Noise Masker
D = Different-Sex Masker
S = Same-Sex Masker
T = Same-Talker Masker
16
3-4 Talker Listening: Results
17
Dichotic Listening: Introduction
To this point, all stimuli have been diotic
• Spatial separation is known to play a role
- Cherry’s “Cocktail Party Problem”
• Dichotic masking is pure informational masking
- No contralateral energetic masking occurs
• Previous results have suggested:
- Almost perfect segregation across ears
- Cherry, Broadbent, Treisman, Kidd, Neff, etc.
18
Dichotic Listening: Procedure
Dichotic listening procedure was similar to the earlier procedure, except:
1) Talkers were known a priori
- 1 male, 1 female target talker
2) 2 talkers were presented in the right ear (T and M)
3) Masking signal was presented in the left ear
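The stimulus layout above can be sketched as a two-channel signal: target plus masker in the right ear and a separate masking signal alone in the left ear. In this sketch, signals are plain lists of samples, the right-ear target-to-masker ratio (TMR) is defined on RMS level, and the left-ear masker is passed through at whatever level it arrives; the function names and the RMS-based TMR are assumptions, not the study's documented procedure.

```python
import math


def rms(x):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def scale_to_tmr(target, masker, tmr_db):
    """Scale the masker so that 20*log10(rms(target) / rms(scaled masker)) == tmr_db."""
    gain = rms(target) / (rms(masker) * 10 ** (tmr_db / 20.0))
    return [gain * v for v in masker]


def dichotic_stimulus(target, right_masker, left_masker, tmr_db):
    """Return (left, right): target plus scaled masker on the right, masker alone on the left."""
    n = min(len(target), len(right_masker), len(left_masker))
    scaled = scale_to_tmr(target[:n], right_masker[:n], tmr_db)
    right = [t + m for t, m in zip(target[:n], scaled)]
    left = list(left_masker[:n])  # left-ear level set independently of the right-ear TMR
    return left, right


# Hypothetical test signals: 1 s of two sine tones at an 8-kHz sample rate.
fs = 8000
target_sig = [math.sin(2 * math.pi * 500 * i / fs) for i in range(fs)]
masker_sig = [0.3 * math.sin(2 * math.pi * 700 * i / fs) for i in range(fs)]
left, right = dichotic_stimulus(target_sig, masker_sig, masker_sig, tmr_db=-6.0)
```

A negative TMR makes the right-ear target quieter than the right-ear masker, which is the condition in which reversed speech in the left ear was reported to interfere.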
19
Dichotic Listening: Results
With 2 talkers in the right ear:
Noise in the left ear doesn’t interfere (even when loud)
Speech in the left ear interferes substantially (even when quiet)
Reversed speech in the left ear interferes, but only when the target in the right ear is lower in level than the masker in the right ear
20
Binaural Listening: Spatial Separation in Azimuth
From the classic “cocktail party effect”:
Spatial separation improves segregation
Diotic vs. 45° separation, same-sex talkers
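To give a rough sense of the binaural timing cue that a 45° azimuth separation introduces, Woodworth's spherical-head approximation estimates the interaural time difference (ITD) of a distant source as (a/c)(θ + sin θ). The head radius and speed of sound below are typical textbook values and are assumptions; this is only an approximation of one cue, not a description of how the stimuli in this study were spatialized.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius (m)
SPEED_OF_SOUND = 343.0   # m/s, roughly room temperature


def woodworth_itd(azimuth_deg, radius=HEAD_RADIUS_M, c=SPEED_OF_SOUND):
    """ITD in seconds for a distant source under a spherical-head approximation.

    Intended for azimuths from 0 (straight ahead) to 90 degrees (directly to one side).
    """
    theta = math.radians(azimuth_deg)
    return (radius / c) * (theta + math.sin(theta))


for az in (0, 45, 90):
    print(az, round(woodworth_itd(az) * 1e6), "microseconds")
```

With these values, a source at 45° arrives at the far ear a few hundred microseconds later than at the near ear.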
21
Binaural Listening: Spatial Separation in Distance
22
Binaural Listening: Spatial Separation in Distance
With natural better-ear SNR cues, both speech and noise maskers benefit from separation in distance
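The better-ear SNR referred to here is simply the larger of the two ears' target-to-masker ratios. The sketch below computes it from broadband (left, right) levels; the function name and the example levels are hypothetical, and a finer-grained analysis would apply the same idea band by band.

```python
def better_ear_snr(target_db, masker_db):
    """Better-ear SNR (dB): the larger of the left- and right-ear target-to-masker ratios.

    target_db, masker_db: (left, right) levels in dB at each ear.
    """
    snr_left = target_db[0] - masker_db[0]
    snr_right = target_db[1] - masker_db[1]
    return max(snr_left, snr_right)


# Hypothetical levels: a nearby target on the right and a more distant masker.
print(better_ear_snr(target_db=(62.0, 70.0), masker_db=(58.0, 60.0)))  # 10.0
```

Bringing a talker closer to one side raises its level at the near ear much more than at the far ear, which is why separation in distance can change the better-ear SNR.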
23
Binaural Listening: Spatial Separation in Distance
With normalization, performance with speech maskers still improves with separation, but performance with noise maskers does not
24
Dynamic Aspects of Multitalker Listening
Most cocktail-party listening experiments assume either:
1) Target talker is known (“Selective Attention”)
2) Target talker is unknown (“Divided Attention”)
Real-world listening falls in between these extremes:
- Attention focused primarily on one talker
- Other talkers monitored for “important” info
How do listeners adapt to conversational dynamics?
25
Dynamic Cocktail Party Effects: Multitalker Transition Probability
Experiment: 3-Talker Condition
1) Standard CRM task
2) 2, 3, or 4 Spatially Separated Same-Sex Talkers
- Close or Far separation for 2 and 3 talkers
3) 5 Transition Probabilities (0-1)
4) 3 Talker Configurations
- Talkers selected randomly
- Each location assigned a talker
- Target talker follows target location
5) Total of 106,200 Trials
- Balanced by Target Talker and Target Location
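A target-location sequence with a given transition probability can be sketched as a simple Markov chain. The slide does not specify how transitions were scheduled, so this assumes that on each trial after the first the target moves, with probability p, to a uniformly chosen different location, and otherwise stays put; the function name is hypothetical. Because the target talker follows the target location, each location change here is also a change of target talker.

```python
import random


def target_location_sequence(n_trials, n_locations, p_transition, rng=None):
    """Generate per-trial target locations (0 .. n_locations - 1).

    After the first trial, the target moves to a different, randomly chosen
    location with probability p_transition; otherwise it stays where it was.
    """
    if not 0.0 <= p_transition <= 1.0:
        raise ValueError("p_transition must be between 0 and 1")
    if rng is None:
        rng = random.Random()
    seq = [rng.randrange(n_locations)]
    for _ in range(n_trials - 1):
        current = seq[-1]
        if rng.random() < p_transition:
            seq.append(rng.choice([loc for loc in range(n_locations) if loc != current]))
        else:
            seq.append(current)
    return seq


print(target_location_sequence(12, 3, 0.25, rng=random.Random(7)))
```

At p = 0 the target never moves (pure selective attention to one location); at p = 1 it moves on every trial.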
26
Dynamic Cocktail Party Effects: Multitalker Transition Probability
Overall Performance Improves Gradually After Transitions
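A recovery curve of this kind can be sketched by grouping trials by how many trials have elapsed since the target location last changed, then averaging accuracy within each group. The function name, the pooling of long lags into a final bin, and the choice to skip trials before the first observed transition are assumptions for illustration.

```python
from collections import defaultdict


def accuracy_since_transition(locations, correct, max_lag=10):
    """Mean accuracy by number of trials since the last target-location change.

    Lag 0 is the first trial at a new location; lags above max_lag are pooled
    into max_lag. Trials before the first transition are skipped.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    lag = None
    for i, (loc, ok) in enumerate(zip(locations, correct)):
        if i > 0 and loc != locations[i - 1]:
            lag = 0
        elif lag is not None:
            lag += 1
        if lag is None:
            continue  # no transition observed yet
        key = min(lag, max_lag)
        totals[key] += 1
        hits[key] += int(ok)
    return {k: hits[k] / totals[k] for k in sorted(totals)}


locations = [0, 0, 1, 1, 1, 2, 2]
correct = [1, 1, 0, 1, 1, 0, 1]
print(accuracy_since_transition(locations, correct))  # {0: 0.0, 1: 1.0, 2: 1.0}
```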
27
Conclusions
1) Speech-on-Speech ≠ Speech-in-Noise
- Deployment of Auditory Attention is Important
- Signal “similarity” is a major factor
- Spatial separation is particularly beneficial
2) Multitalker Listening is a Dynamic Process
- Listeners adapt to source location changes over 5-8 trials
- Listeners learn new situations quickly (10 trials)
- Listeners adopt optimal listening strategies