- Slide 1
- Slide 2
- Email: {ikeno, John.Hansen}@utdallas.edu Slide 1 IAFPA-2006
Center for Robust Speech Systems SLIDES by John H.L. Hansen, 2006
Ayako Ikeno and John H.L. Hansen IAFPA-2006 July 23-26, 2006 Center
for Robust Speech Systems (CRSS) Erik Jonsson School of Engineering
& Computer Science University of Texas at Dallas Richardson,
Texas 75083-0688, U.S.A.
- Slide 3
- CRSS & Speech Processing Overview; Previous Studies on Stress
& Lombard Effect; Perceptual Speaker ID with Lombard Speech;
Speech Corpus - UTScope; Experimental Setup; Results; Summary &
Impact
- Slide 4
- Overview of CRSS-Hansen Research: Spoken Document Retrieval
(http://SpeechFind.utdallas.edu); Speech Under Stress; Speech
Enhancement; UTDrive & CU-Move: In-Vehicle Voice Navigation;
Dialect & Accent; In-Set / Out-of-Set Speaker Detection;
Normalization: Speaker, Environment, Language. Regions shown: UAE,
Egypt, Palestine, etc.; Cuba, Peru, Puerto Rico; Cambridge, Irish,
Welsh, etc.
- Slide 5
- Why Speech Systems Break?
Environmental Based: Acoustic Noise; Room Reverberation; Physical
Task Demands.
Communication Based: Microphone; Voice Compression; Channel/Mobile
Cellular.
Speaker Based Problems: Stress & Emotion; Lombard Effect / Noise;
Psychological Task Demands; Accent/Language; Speaker Differences
(age, sex, vocal tract); Spontaneous Speech.
Context Based Effects: Homonyms (English +10,000; Japanese 120);
Confusable: (Take, Stake, Straight; Cake, Kate); Ambiguous: Jeet
yet? "it's ours" vs. "it sours"; "nice guys" vs. "nice skies".
"Um, I just wanna, I just want to say, I don't know what I want to
say."
Diagram labels: SPEECH; STRESS; ENVIRONMENT NOISE; ACCENT; LANGUAGE;
SPEECH RECOGNITION; HUMAN (AUDITORY) RECOGNITION; VOICE
COMMUNICATIONS; CHANNEL NOISE; AMERICAN ENGLISH SPEAKER; LOMBARD
EFFECT; SPEAKER RECOGNITION.
- Slide 6
- Speech Production: Phonetics & Acoustics. Diagram labels: Noise;
Stress; Microphone; Speaker; Speech Physiology; Acoustic Speech
Waveform; Neutral; Stress
- Slide 7
- DOES STRESS VARIABILITY IMPACT SPEAKER RECOGNITION? Limited
research on speaker recognition over stress, Lombard effect, etc.
The NATO RSG.10 report showed probe experimental results with the
SUSAS corpus (NATO, 2000).
- Slide 8
- Pitch; Glottal Spectral Slope; Formant Location (earlier studies
by Hansen (1988): 200 speech features, 10,000 stat. tests)
- Slide 9
- Phone Duration; RMS Intensity
- Slide 10
- STRESS DETECTION USING PITCH
Conditional Gaussian fit (Zhou, Hansen 1997).
Classification error rate:
Neutral vs. Loud: 7.24% (Neutral), 8.28% (Loud)
Neutral vs. Lombard: 20.69% (Neutral), 19.31% (Lombard)
Figures: probability distribution; detection (ROC) curves.
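A minimal sketch of how a Gaussian-based pitch classifier yields per-class error rates like those quoted above. It fits one 1-D Gaussian per speaking style and picks the style with the higher log-likelihood; this is a generic maximum-likelihood classifier, not necessarily the conditional Gaussian fit of Zhou and Hansen (1997). All pitch values (in Hz) and all function names are hypothetical, chosen only for illustration.

```python
import math
from statistics import mean, pstdev


def fit_gaussian(values):
    """Return (mean, standard deviation) of a list of pitch values."""
    return mean(values), pstdev(values)


def log_likelihood(x, mu, sigma):
    """Log of the 1-D Gaussian density N(x; mu, sigma^2)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def classify(x, models):
    """Pick the label whose Gaussian gives x the highest log-likelihood."""
    return max(models, key=lambda label: log_likelihood(x, *models[label]))


def error_rates(test_data, models):
    """Fraction of misclassified tokens for each true label."""
    rates = {}
    for label, values in test_data.items():
        wrong = sum(classify(v, models) != label for v in values)
        rates[label] = wrong / len(values)
    return rates


# Hypothetical mean-pitch values in Hz (NOT taken from the slides).
train = {
    "neutral": [110, 115, 120, 125, 130],
    "lombard": [140, 150, 155, 160, 170],
}
test = {
    "neutral": [112, 122, 138],
    "lombard": [132, 152, 166],
}

models = {label: fit_gaussian(values) for label, values in train.items()}
rates = error_rates(test, models)
```

Reporting the error separately for each true class, as the slide does, shows whether the detector is biased toward one speaking style.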
- Slide 11
- STRESS DETECTION: ROC CURVES
- Slide 12
- PAST STRESS DETECTION STUDIES USING TRADITIONAL FEATURES
Stress/Neutral Error Rates
Individual Feature:
Pitch: 6-21%
Glottal Spectral Slope: 18-36%
Intensity: 28-46%
Phone Duration: 38-46%
Formant Location, 1st formant: 50%
Formant Location, 2nd formant: 58%
Feature Fusion:
Duration + Intensity + mean Pitch: 0-17%
- Slide 13
- TEAGER ENERGY OPERATOR
Discrete-time and continuous-time TEO [equations not preserved in
transcript].
TEO-CB-Auto-Env: Critical Band based TEO AUTOcorrelation ENVelope.
Critical Frequency 17 Band Partition, based on Auditory Perception.
Ref: Zhou, Hansen, Kaiser, IEEE Transactions on Speech & Audio
Processing, vol. 9(2): 201-216, March 2001.
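For reference, the discrete-time Teager Energy Operator is commonly defined as Psi[x(n)] = x(n)^2 - x(n-1) x(n+1), and the continuous-time form as Psi[x(t)] = (dx/dt)^2 - x(t) d^2x/dt^2. The sketch below implements only the discrete operator; it does not attempt the critical-band filtering or autocorrelation-envelope steps of TEO-CB-Auto-Env. The function name and test signal are assumptions made for illustration.

```python
import math


def teager_energy(x):
    """Discrete TEO, psi[n] = x[n]^2 - x[n-1] * x[n+1], for interior samples only."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# For a pure tone x[n] = A * cos(omega * n), the operator output is the
# constant A^2 * sin(omega)^2, so it tracks both amplitude and frequency.
amplitude, omega = 2.0, 0.3
tone = [amplitude * math.cos(omega * n) for n in range(50)]
psi = teager_energy(tone)
expected = (amplitude * math.sin(omega)) ** 2
```

The output has two fewer samples than the input because the first and last samples lack a neighbor on one side.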
- Slide 14
- Neutral HMM Model vs. Stress trained HMM Model: Assessment for
NATO SUSC-0 Military Cockpit Recordings
- Slide 15
- Detection of Speech Under Stress: WRAIR
GOAL: (1) Identify, Model, and Classify Speech Under Stress in
Military-Related Task Conditions, and (2) Improve Automatic Speech
Coding under Stress.
APPROACH: Effective Soldier of the Quarter Board Paradigm. Monitor
and Track Biometrics of Stress: heart rate, blood pressure, stress
hormones, psychometrics. Engineering: Focus on NONLINEAR Air
Turbulent Model, Teager Energy Operator; Identify Stress Dependent
Performance across Speakers, phonemes.
Ref: Rahurkar, Hansen, Meyerhoff, Saviolakis, Koenig, "Frequency
Distribution based Weighted Sub-Band Approach for Classification of
Emotional/Stressful Content in Speech," Interspeech, pp. 721-724,
Geneva, Switzerland, Sept. 2003 (another paper at Interspeech-2005).
- Slide 16
- First observed by Etienne Lombard in 1911.
Change in speech production in response to noise to increase
communication performance.
Lombard Test: standard test for hearing loss in U.S. (ASHA);
measures dB-SPL change in speech production.
Hansen (1988): evaluation of 200 features with +10,000 statistical
tests on 11 different stressed speech conditions to quantify changes
in speech production.
- Slide 17
- IAFPA-06: focus on Lombard Effect. Audio samples for the
perceptual experiment were extracted from the UTScope corpus.
UTScope: Speech under COgnitive and Physical stress & Emotion.
Consists of 4 Domains:
Lombard Effect: noise levels & types
Physical Stress: stair climbing/stepper
Cognitive Stress: driving (simulator & actual)
Emotion: Angry, Fear, Anxiety, Frustration
- Slide 18
- Goal: obtain Lombard Speech at different noise levels. Quantify
ground truth with biometric analysis.
Lombard Effect Speech: 9 conditions (3 noise, 3 levels), 1 sec.
duration
Pink Noise: 65, 75, 85 dB-SPL
Highway Noise (windows open): 70, 80, 90 dB-SPL
Large Crowd Noise: 70, 80, 90 dB-SPL
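The nine Lombard conditions are simply every pairing of a noise type with one of its three playback levels. A small sketch of that enumeration follows; the dictionary keys and variable names are hypothetical labels, not identifiers from the UTScope corpus.

```python
# Noise types and playback levels in dB-SPL, as listed on the slide.
# The key names are illustrative labels only.
NOISE_LEVELS_DB_SPL = {
    "pink": (65, 75, 85),
    "highway": (70, 80, 90),
    "large_crowd": (70, 80, 90),
}

# One (noise type, level) pair per recording condition.
conditions = [
    (noise, level)
    for noise, levels in NOISE_LEVELS_DB_SPL.items()
    for level in levels
]
```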
- Slide 19
- UTScope
PINK NOISE: 65, 75, 85 dB-SPL
HIGHWAY DRIVING, WINDOWS HALF OPEN: 70, 80, 90 dB-SPL
LARGE CROWD NOISE: 70, 80, 90 dB-SPL
PURETONE HEARING SCREENING
OPEN-AIR HEADPHONES FOR SPEECH FEEDBACK
NOISE LEVELS CALIBRATED WITH QUEST SLM
- Slide 20
- UTScope: 20 TIMIT SENTENCES; 5 DIGIT STRINGS; 1 MINUTE SPONTANEOUS
SPEECH; 100 SPEAKERS; 8-CHANNEL DAT RECORDER; P-MIC; CLOSE-TALKING
MIC; FAR-FIELD MIC
- Slide 21
- The ASHA-certified sound booth and recording equipment
- Slide 22
- Figure labels: Male Lombard; Male Neutral.
Lombard Effect impacts Temporal and Spectral Structure (as
expected).
Evaluation: Perceptual Experiments to assess Speaker Recognition
- Slide 23
- Listener Test
Speakers: Corpus: UTScope; Native US English speakers; Female
speakers only
Speech Conditions (Reference / Test):
NL-LD: Neutral / Lombard
LD-LD: Lombard / Lombard
NL-NL: Neutral / Neutral
Noise Type: Highway driving
Noise Level: 90 dB-SPL
- Slide 24
- Speech Materials
Read speech; TIMIT sentences: phonetically balanced; 3 sentences per
audio sample (.wav, 16 kHz)
Ref: Basketball can be an entertaining sport. My problem is, the
cat's meow always hurts my ears. The causeway ended abruptly at the
shore.
Test: Youngsters commonly love chocolate and candies as treats.
December and January are nice months to spend in Miami. There were
other farmhouses nearby.
- Slide 25
- Listener Test
Listeners (12: 2f/10m May 06, -- 41 as of July 06): India(4),
China(1), Korea(1), Mexico(1), Pakistan(1), Thai(1), Turkey(1),
US(1), Vietnam(1)
Task: In-set vs. Out-of-set Speaker Identification
Reference/Training: 12 In-set Female speakers
Test: 8 In-Set speakers, 4 Out-of-Set speakers
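A rough sketch of how one listener's responses on such a task could be scored, assuming each test trial is reduced to a yes/no decision on whether the test speaker belongs to the reference set. The `Trial` type, the `accuracy` helper, and the response pattern are hypothetical; the slides do not describe the actual scoring procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    is_in_set: bool        # ground truth: speaker is among the reference speakers
    answered_in_set: bool  # listener's decision


def accuracy(trials):
    """Fraction of trials where the listener's decision matches the truth."""
    correct = sum(t.is_in_set == t.answered_in_set for t in trials)
    return correct / len(trials)


# Hypothetical responses for one listener: 8 in-set and 4 out-of-set test
# speakers, matching the counts on the slide (the answers are made up).
trials = (
    [Trial(True, True)] * 6 + [Trial(True, False)] * 2
    + [Trial(False, False)] * 3 + [Trial(False, True)] * 1
)

in_set_accuracy = accuracy([t for t in trials if t.is_in_set])
out_of_set_accuracy = accuracy([t for t in trials if not t.is_in_set])
overall_accuracy = accuracy(trials)
```

Keeping in-set and out-of-set accuracy separate matters because a listener who always answers "in-set" would still score 8/12 overall on this design.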
- Slide 26
- Reference audio: Neutral; Lombard. Test audio: Neutral; Lombard.
- Slide 27
- The effect of speech condition: significant (p=.0024).
Mismatched condition (NL-LD) accuracy: chance level (52%).
Lombard speech (LD-LD, 79%): higher accuracy than neutral speech
(NL-NL, 67%).
Lombard effect may emphasize the speech characteristics, and
improve accuracy on perceptual speaker ID.
- Slide 28
- Automated System Performance (SUSAS Corpus) (See Hansen, et al.,
The Impact of Speech Under `Stress' on Military Speech Technology,
NATO Research & Tech. Org. RTO-TR-10, March 2000).
Emotion/stress: Mismatched Training / Matched Training
Neutral: 96
Angry: 34 / 75
Lombard: 48 / 99
Fast: 91 / 90
Slow: 90 / 98
Soft: 73 / 89
Loud: 22 / 81
Loss: Angry 62%, Lombard 48%, Loud 74% (5-74% LOSS).
The trend holds for the automated system as well.
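The loss figures on this slide appear to be the Neutral score (96) minus each style's mismatched-training score; that reading is inferred from the numbers and is not stated in the slides. A short sketch of the arithmetic, with illustrative variable names:

```python
# Scores from the slide: mismatched-training results per speaking style,
# and the single Neutral value used here as the baseline (an assumption).
neutral_baseline = 96
mismatched = {
    "Angry": 34,
    "Lombard": 48,
    "Fast": 91,
    "Slow": 90,
    "Soft": 73,
    "Loud": 22,
}

# Loss = drop relative to the neutral baseline, in percentage points.
loss = {style: neutral_baseline - score for style, score in mismatched.items()}
smallest_loss = min(loss.values())
largest_loss = max(loss.values())
```

Under this reading the smallest drop (Fast) and the largest (Loud) reproduce the 5-74% range quoted on the slide.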
- Slide 29
- In-Set accuracy: affected by the speech condition significantly
(p