2007 IEEE Automatic Speech Recognition and Understanding Workshop, Kyoto, 12/10/2007

Voice Search – Information Access Via Voice Queries
Ye-Yi Wang, IEEE ASRU, Kyoto, Japan, December 10, 2007
Why the Topic of Voice Search
- It's a hot and important application of speech understanding technology.
- It's a challenging problem and provides fertile ground for research: typical automation rates are only 30%~60%.
Voice Search Applications (1995–2007)
- Residential DA: French Telecom, Bell Canada, Tele…
- Auto-attendant: IBM, AT&T, Microsoft
- Stock quote: Tellme
- Business DA: Tellme, Nuance, Jingle Networks, Google, Microsoft
- Product rating: Microsoft
- Music search: Daimler, Microsoft
- Conference papers: Carnegie Mellon; AT&T, ICSI, Edinburgh Univ., etc.
Spoken Language Understanding: Form Filling, Call Routing, Voice Search
- Form filling (ATIS): "Show flights from Seattle to Boston on December 23" → ShowFlights: From=Seattle, To=Boston, Date=12/23
- Call routing (AT&T HMIHY): "My Outlook does not show any new messages for about 5 hours" → route to Networking, OS, or Hardware
- Voice search (directory assistance): "I need the number of Big 5" → listings such as:
  - Big 5 Sporting Goods, 2200 148th Ave, Redmond, WA, (425) 644-6526
  - Sears Roebuck & Company, 132 Lake St S, Kirkland, WA, (425) 822-7350
  - Calabria Ristorante Italiano, 4315 University Way, Seattle, WA, (206) 547-2445
Spoken Language Understanding: user input utterances (naturalness, input space) vs. target semantics (resolution, semantic space)

- Form filling / directed dialog:  naturalness low;      input space small; resolution low;  semantic space small
- Form filling / mixed initiative: naturalness low-med;  input space small; resolution high; semantic space small
- Call routing:                    naturalness high;     input space large; resolution low;  semantic space small
- Voice search:                    naturalness med-high; input space large; resolution low;  semantic space med-large
Voice Search Architecture
User speech flows through ASR (backed by the pronunciation model, acoustic model, and language model), then SLU/Search against the listing database (DB), then the dialog manager (using confidence measures and disambiguation), with TTS speaking the response back to the user; a speech database (Sp DB) supports the loop.
ASR in Voice Search
Sources of error in large-vocabulary name recognition: noise-related, normal ASR, pronunciation, and spelling/chopped speech.
Y. Gao, et al., "Innovative approaches for large vocabulary name recognition," ICASSP 2001
Challenges (ASR)
- Acoustic model: noisy environments; different channel conditions; speaker variance
- Pronunciation model: foreign names; unseen words; pronunciation variance
- Language model: large vocabulary; linguistic variance; little training data from users
Challenges
- SLU/Search: huge semantic space; linguistic variance
- Dialog management: multi-source uncertainty; high-level confusability
- TTS: large vocabulary; foreign names; unseen words
- Feedback loop: system tuning is a necessity; high maintenance cost
Acoustic Modeling
- Acoustic model clustering: more precise modeling of noisy environments and channel conditions
- Massive adaptation: more precise modeling of speaker variance
- Self-adaptation: better models for unseen speakers
Y. Gao, et al., "Innovative approaches for large vocabulary name recognition," ICASSP 2001
Pronunciation Model
Pronunciation variants – the common approach: ASR produces automatic phonetic transcriptions, which are aligned against canonical pronunciations from the word dictionary to learn variation rules/models. This requires a canonical pronunciation (no OOV).
Alternative: derive pronunciations from acoustics only.
B. Ramabhadran, L. R. Bahl, P. V. deSouza, and M. Padmanabhan, "Acoustics-only based automatic phonetic baseform generation," ICASSP 1998
N-Best Rescoring with Pronunciation Distortion

$$W^* = \arg\max_W p(W|A) = \arg\max_W \sum_{\tau_w} p(W,\tau_w|A) \approx \arg\max_{W,\tau_w} p(W,\tau_w|A) \approx \arg\max_{W,\tau_w} p(\tau_w)\,p(A|\tau_w)\,p(W|\tau_w)$$

$$p(\tau_w) = p(\eta_w \delta_w) = p(\delta_w|\eta_w)\,p(\eta_w)$$

$p(A|\tau_w)$: from the acoustic model. $p(W|\tau_w)$: from data/knowledge sources.

F. Béchet, R. De Mori, and G. Subsol, "Dynamic generation of proper name pronunciations for directory assistance," ICASSP 2002
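The rescoring rule above can be sketched as follows; the n-best entries, their field names, and the probability values are all hypothetical, and the three factors are combined in log space:

```python
import math

def rescore_nbest(nbest):
    """Pick the best (word, pronunciation) pair from an n-best list.

    nbest: list of dicts with hypothetical keys:
      'word'             - the hypothesized word W
      'p_tau'            - pronunciation prior p(tau_w)
      'p_acoustic'       - acoustic likelihood p(A | tau_w)
      'p_word_given_tau' - p(W | tau_w), from data/knowledge sources

    Scores are combined in log space, mirroring
      W* ~= argmax_{W, tau_w} p(tau_w) p(A|tau_w) p(W|tau_w).
    """
    def log_score(h):
        return (math.log(h['p_tau'])
                + math.log(h['p_acoustic'])
                + math.log(h['p_word_given_tau']))
    return max(nbest, key=log_score)['word']
```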
Finite State Transducer LMs
Listings such as "Calabria Ristorante Italiano", "Calabria Jack J DO", and "Calabria Electric" are compiled into a transducer that accepts a listing's word sequence and outputs its phone number ((425) 822-7350, (425) 783-2005, (206) 782-9530).
Problems:
- DB listings don't match users' expressions
- Exact rule matching – lack of robustness
- High perplexity due to the lack of probabilities
Finite State Signature LMs
Signature: a subsequence of a listing that uniquely identifies the listing.
Listing 1: "3-L Hair World on North 3rd Street". Listing 2: "Suzie's Hair World on Main Street".
Signatures: "3-L", "Hair 3rd", and "Hair Main". Non-signatures: "Hair World" and "World on".
FST LM ("?" marks an optional word; ":1"/":2" output the listing):
S := 3-L Hair World? On? North? 3rd? Street? :1
   | 3-L Hair? World? On? North 3rd? Street? :1
   | 3-L Hair? World on? North? 3rd? Street? :1
   | 3-L? Hair World? On? North 3rd? Street? :1
   | Suzie's? Hair World? On? Main Street? :2
   | Suzie's Hair World? On? Main? Street? :2
   | Suzie's Hair? World on? Main? Street? :2
   | Suzie's? Hair? World on? Main Street? :2
Finite State Signature LMs
The LM is constructed from the listing database; no training data from users is required.
Presumption: users are more likely to say the words that avoid ambiguity and skip the rest.
This may not hold: users may not know about confusable entries, e.g. saying "Calabria" for "Calabria Ristorante Italiano" when "Calabria Electric" is in the DB.
E. E. Jan, B. Maison, L. Mangu, and G. Zweig, "Automatic Construction of Unique Signatures and Confusable Sets for Natural Language Directory Assistance Applications," Eurospeech 2003
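A brute-force sketch of signature extraction under the definition above (a signature is a word subsequence that matches exactly one listing); the listings are the slide's example, and the `max_len` cap is an assumption to keep the candidate search small:

```python
from itertools import combinations

def is_subsequence(words, listing):
    """True if `words` occur in `listing` in order (not necessarily adjacent)."""
    it = iter(listing)
    return all(w in it for w in words)

def signatures(listings, max_len=2):
    """Map each signature (as a space-joined string) to the index of the
    one listing it identifies. Candidates are all word subsequences of
    up to max_len words; those matching more than one listing are dropped."""
    candidates = set()
    for listing in listings:
        for n in range(1, max_len + 1):
            candidates.update(combinations(listing, n))
    sigs = {}
    for cand in candidates:
        matches = [i for i, l in enumerate(listings) if is_subsequence(cand, l)]
        if len(matches) == 1:
            sigs[' '.join(cand)] = matches[0]
    return sigs
```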
Statistical N-gram LMs
Modeling Variations in LM
SLU/Search
- Finite state transducer LM: search is not an issue. It may be good enough for residential DA, but poor coverage makes it not robust enough for business DA.
- Statistical n-gram LM: an SLU/Search component is required; a statistical model provides robustness.
SLU/Search – Channel Model

$$L^* = \arg\max_L p(L|C,Q) = \arg\max_L p(C,Q|L)\,p(L) = \arg\max_L p(C|L)\,p(Q|L)\,p(L)$$

$p(L)$: static rank.

$$p(Q|L) = \prod_{w \in Q} \big(a_0\,p(w|GE) + a_1\,p(w|L)\big)$$

$p(C|L)$: directly estimated from data.
Designed for frequently requested names (FRNs).
P. Natarajan, R. Prasad, R. M. Schwartz, and J. Makhoul, "A scalable architecture for directory assistance automation," ICASSP 2002
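The interpolated query likelihood p(Q|L) can be sketched as below; the unigram estimate for p(w|L), the floor probability for unseen words, and the interpolation weights are illustrative assumptions, not the paper's settings:

```python
import math

def listing_score(query, listing, general_lm, a0=0.3, a1=0.7):
    """Log of the interpolated query likelihood
        p(Q|L) = prod_{w in Q} (a0 * p(w|GE) + a1 * p(w|L)),
    where p(w|L) is a unigram estimate from the listing text and p(w|GE)
    comes from a general-English unigram model (`general_lm`, a dict;
    unseen words get a small assumed floor probability)."""
    words = listing.lower().split()
    score = 0.0
    for w in query.lower().split():
        p_w_given_l = words.count(w) / len(words)
        p = a0 * general_lm.get(w, 1e-6) + a1 * p_w_given_l
        score += math.log(p)
    return score
```

In a full system this term would be combined with the confusion model p(C|L) and the static rank p(L) before taking the argmax over listings.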
SLU/Search – Tf*Idf VSM
Tf*Idf weighted vector space model:
- Represent queries and listings as vectors.
- Each dimension represents the importance of a term (e.g., a word or word bigram) in a query/document.
- The importance is proportional to the term's frequency in the query/document (TF), and it decreases as the term occurs in many different documents – proportional to the logarithm of the inverse document frequency (IDF).
- Measure similarity as the cosine of the angle between the vectors.
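A minimal sketch of the Tf*Idf vector space model described above, using single words as terms; the `+1` IDF smoothing is an assumed choice so that terms occurring in every listing still carry some weight:

```python
import math
from collections import Counter

def build_idf(listings):
    """IDF from the listing database: log(N / df) with +1 smoothing
    (an assumed choice) so ubiquitous terms keep nonzero weight."""
    n = len(listings)
    df = Counter(t for l in listings for t in set(l))
    return {t: math.log(n / df[t]) + 1.0 for t in df}

def tfidf_vector(terms, idf):
    """Sparse TF*IDF vector: term frequency times inverse document frequency."""
    tf = Counter(terms)
    return {t: tf[t] * idf.get(t, 0.0) for t in tf}

def cosine(u, v):
    """Cosine of the angle between two sparse vectors (dicts)."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def best_listing(query_terms, listings, idf):
    """The listing whose TF*IDF vector is closest to the query's."""
    qv = tfidf_vector(query_terms, idf)
    return max(listings, key=lambda l: cosine(qv, tfidf_vector(l, idf)))
```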
SLU/Search – Tf*Idf VSM Enhancements
- Word duplication for short documents (listings) and queries, where term frequencies are not reliable: the query "Big 5" retrieves "Big 5 Sporting Goods" instead of "5 star 5".
- Category smoothing: "Calabria Ristorante Italiano" is favored over "Calabria Electric" for the query "Calabria Restaurant".
- Character n-grams as terms, robust to ASR errors such as "Lime Wire" vs. "Dime Wired":
  $Lim Lime ime_ me_W e_Wi _Wir Wire ire$
  $Dim Dime ime_ me_W e_Wi _Wir Wire ired red$
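The character n-gram terms for this example can be generated as below (a sketch assuming 4-grams, `$` boundary markers, and `_` for spaces, as in the slide):

```python
def char_ngrams(text, n=4):
    """Character n-gram terms, with '$' marking string boundaries and
    '_' replacing spaces (as in the '$Lim Lime ime_ ...' example)."""
    s = '$' + text.replace(' ', '_') + '$'
    return [s[i:i + n] for i in range(len(s) - n + 1)]
```

The two strings share five of their n-grams, so the vector space model can still match the intended listing despite the ASR error.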
Dialog Management
Disambiguation Strategies
Summarization in Disambiguation
User: Tell me about restaurants in London.
System: I know of 596 restaurants in London. All price ranges are represented. Some of the cuisine options are Italian, British, European, and French.
User: I'm interested in Chinese food.
System: I know of 27 restaurants in London that serve Chinese cuisine. All price ranges are represented. Some are near the Leicester Square tube station.
User: How about a cheap one?
System: I know of 14 inexpensive restaurants that serve Chinese cuisine. Some are near the Leicester Square tube station. Some are in Soho.
J. Polifroni and M. Walker, "An Analysis of Automatic Content Selection Algorithms for Spoken Dialogue System Summaries," SLT 2006
Confidence Measures
Confidence Measures
Features from the ASR, Search, and DM components feed a MaxEnt classifier that labels each hypothesis as correct or incorrect.
Y.-Y. Wang, D. Yu, Y.-C. Ju, G. Zweig, and A. Acero, "Confidence Measures for Voice Search Applications," INTERSPEECH 2007
Features
- ASR: ASR confidence score; ASR semantic confidence; [no ASR-internal features]
- Search: Tf*Idf VSM scores with and without category smoothing; VSM score gap from the best hypothesis; normalized number of character matches between ASR output and hypothesis; IDF ratio of covered/uncovered words
- DM: same hypothesis made in the previous turn; dialog turn; city match (application-specific)
- Combined: ASR confidence on the word with the maximum IDF value; joint ASR confidence score/Tf*Idf
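Since a binary MaxEnt model is equivalent to logistic regression, combining such features into a correct/incorrect classifier can be sketched as follows; the toy feature vectors and the plain gradient-ascent training loop are illustrative, not the paper's setup:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_maxent(examples, epochs=200, lr=0.5):
    """Binary MaxEnt (logistic regression) over real-valued feature vectors.
    examples: list of (features, label) pairs, label 1 = correct, 0 = incorrect.
    Returns a weight vector whose last entry is the bias."""
    dim = len(examples[0][0])
    w = [0.0] * (dim + 1)
    for _ in range(epochs):
        for x, y in examples:
            xs = list(x) + [1.0]           # append bias feature
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, xs)))
            for i, xi in enumerate(xs):
                w[i] += lr * (y - p) * xi  # gradient ascent on log-likelihood
    return w

def confidence(w, x):
    """P(correct | features) under the trained model."""
    xs = list(x) + [1.0]
    return sigmoid(sum(wi * xi for wi, xi in zip(w, xs)))
```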
Multilingual Text-to-Speech
Learning in the Feedback Loop
Multimodal Voice Search
- "Two sweet features added... gas and voice... this software rocks!"
- "Isn't this program great! I feel like the NSA guys in Enemy of the State."
- "Hey Google map user, put up your stylus and check out Live Search. No stylus needed. Now that's handy, or should I say one-handed."
Summary
- A confidence measure that is generally applicable for voice search applications.
- Roadmap to the goal: start with as many easy-to-obtain features as possible, then eliminate features with significance testing.
- A confidence measure with 5 domain-independent features shows good performance in acceptance/rejection rate, confidence distribution, and correlation with end-to-end accuracy.