Page 1

CMI

SPOKEN LANGUAGE SYSTEMS
Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology

SPEECH GROUP
Machine Intelligence Laboratory
Information Engineering Division
Cambridge University

SCILL: Spoken Conversational Interaction for Language Learning

Stephanie Seneff, Jim Glass
Spoken Language Systems Group
MIT Computer Science and Artificial Intelligence Lab

Steve Young
Speech Group
CUED Machine Intelligence Lab

Page 2

SLS
MIT Computer Science and Artificial Intelligence Laboratory

MIL Speech Group
CUED Machine Intelligence Laboratory

Conversational Interfaces

[Diagram: components of a conversational interface: Speech Recognition, Language Understanding, Context Resolution, Dialogue Management, Language Generation, Speech Synthesis, Audio, Database]

Page 3

Conversational Interfaces: Galaxy Architecture

[Diagram: a central Hub surrounded by the components Speech Recognition, Language Understanding, Context Resolution, Dialogue Management, Language Generation, Speech Synthesis, Audio, Database]
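
As a rough illustration of the hub-centered layout, the sketch below passes a message through a chain of components by way of a central hub, so that components never call each other directly. It is not the Galaxy API: the `Hub` class, its `register` and `run` methods, the dict-based frame, and the three toy servers are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List

# A "frame" here is just a dict passed between components (an assumption
# made for illustration; it is not the actual Galaxy frame format).
Frame = Dict[str, object]
Server = Callable[[Frame], Frame]


class Hub:
    """Toy hub: components never call each other, only the hub does."""

    def __init__(self) -> None:
        self.servers: Dict[str, Server] = {}

    def register(self, name: str, server: Server) -> None:
        self.servers[name] = server

    def run(self, route: List[str], frame: Frame) -> Frame:
        # Pass the frame to each named server in turn.
        for name in route:
            frame = self.servers[name](frame)
        return frame


def recognize(frame: Frame) -> Frame:
    # Stand-in for speech recognition: pretend the audio was transcribed.
    return {**frame, "text": "will it rain tomorrow in boston"}


def understand(frame: Frame) -> Frame:
    # Stand-in for language understanding: crude keyword spotting.
    words = str(frame["text"]).split()
    return {**frame, "topic": "rain" if "rain" in words else "unknown"}


def generate(frame: Frame) -> Frame:
    # Stand-in for language generation.
    return {**frame, "reply": f"Looking up {frame['topic']} forecast."}


hub = Hub()
hub.register("recognition", recognize)
hub.register("understanding", understand)
hub.register("generation", generate)

result = hub.run(["recognition", "understanding", "generation"], {"audio": b""})
print(result["reply"])
```

Keeping the route in the hub means a component can be swapped (for example, a different recognizer) without touching the others.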

Page 4

Bilingual Weather Domain: Video Clip

Page 5

Computer Aids through Conversational Interaction

• Language teachers have limited time to interact with students in dialogue exchanges
• Computers provide non-threatening environment in which to practice communicating
• Three-phase interaction framework is envisioned:
  – Preparation: practice phrases, simulated dialogues
  – Conversational Interaction
    * Telephone conversation with graphical support
    * Seamless translation aid
  – Assessment
    * Review dialog interaction
    * Feedback and fluency scores

Page 6

SCILL: A Spoken Computer Interface for Language Learning

Conversational systems for an interactive language learning environment

Domain Expert
• Speaks only target language
• Has access to information sources

Tutor
• Can provide translations for both user queries and system responses

MIT SLS: Bilingual Conversational Dialogue Systems
CU Speech Group: Speech Recognition and Pronunciation Scoring

Page 7

Technology Requirements

• Robust recognition and understanding of foreign-accented speech
  – If recognition is too poor, student may become frustrated
  – Customize vocabulary and linguistic constructs to lesson plans
• High quality cross-lingual language generation
• Natural and fluent speech synthesis
• Ability to automatically generate simulated dialogues
  – System should be able to generate multiple dialogues on the fly based on a given lesson topic
  – Allows the student to see example sentence constructs for a particular lesson
• Ability to reconfigure quickly and easily to new lessons
• Automatic scoring for fluency, pronunciation, tone quality, use of vocabulary, etc.

Page 8

SCILL System Overview

[Diagram labels: USER INTERFACE, WEBSERVER]

Page 9

Bilingual Spoken Dialogue Interaction: Current Status

• Initial version of end-to-end system is in place for the weather domain
  – Rain, snow, wind, temperature, warnings (e.g., tornado), etc.
• MIT recognizer supports both English and Mandarin
  – Seamless language switching
• English queries are translated into Mandarin
• Mandarin queries are answered in Mandarin
  – User can ask for a translation into English of the response at any time
• Currently using off-the-shelf Mandarin synthesizer from ITRI
  – Plan to develop high quality domain-dependent Mandarin synthesis using our Envoice tools
• System can be configured as telephone-only or as telephone augmented with a Web-based GUI

Page 10

Bilingual Recognizer Construction

• Create Mandarin corpus by automatically translating existing English corpus
• Automatically induce language model for both English and Mandarin recognizers using NL grammar
• Two recognizers compete in common search space

[Diagram labels: English corpus, Parse, Interlingua, Generate, Chinese corpus, English Recognizer Language Model, Chinese Recognizer Language Model, Recognizer (English Network, Chinese Network)]
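
One way to picture two recognizers competing in a common search space is to pool the hypotheses from both language networks and keep the best-scoring one, so the language decision falls out of the search itself. The sketch below simplifies this to choosing from a merged list of finished hypotheses; the `Hypothesis` type, the `best_hypothesis` helper, and all scores are hypothetical and do not come from the MIT recognizer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    language: str      # which network produced the hypothesis
    words: str         # recognized word string
    log_score: float   # combined acoustic + language model log score


def best_hypothesis(candidates: List[Hypothesis]) -> Hypothesis:
    # Both languages are scored on the same scale, so a single max
    # picks the winning word string and its language at the same time.
    return max(candidates, key=lambda h: h.log_score)


# Illustrative scores only; they are not taken from any real recognizer.
candidates = [
    Hypothesis("english", "will it rain tomorrow", -120.5),
    Hypothesis("mandarin", "ming2 tian1 hui4 xia4 yu3 ma5", -98.2),
    Hypothesis("english", "we'll it ran tomorrow", -131.0),
]

winner = best_hypothesis(candidates)
print(winner.language, "->", winner.words)
```

Because no separate language-identification step is needed, a speaker can switch languages from one utterance to the next.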

Page 11

HTK Mandarin Speech Recognizer

Standard HTK LVCSR Setup:
• PLP front-end with 1st/2nd/3rd derivatives transformed using HLDA
• 3-state cross-word hidden Markov models
• Decision tree clustered context dependent triphones
• N-gram language model smoothed with class-based language model

Except:
• Standard PLP front-end augmented with F0 + derivatives (F0 added after HLDA transformation)
• 46-phone acoustic model set with long final phones split, e.g., uang -> ua ng
• Questions about tone added to decision tree context clustering

Page 12

HMM-Based Pronunciation Scoring

Basic approach:
• estimate posterior probabilities (i.e., confidence scores) of each phone or syllable given acoustics
• map confidence scores to good/bad decision using data labelled by experts

A simple approximation:

$$P(p \mid A) \approx \frac{P(A \mid p)}{\sum_{p'} P(A \mid p')}$$

[Diagram: posterior computed for each phone in a sequence, e.g., sh ih d ax ...]

Relates confidence scores to human perception

[Plot: P(p | A) against Expert Rankings, each divided into Good and Bad]
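
The two steps of the basic approach can be sketched as follows: normalize per-phone acoustic log-likelihoods into posteriors (assuming equal phone priors), then threshold the posterior of the intended phone. The function names, the 0.5 threshold, and the log-likelihood values are illustrative assumptions; in the approach described, the decision boundary would be fitted to expert-labelled data.

```python
import math
from typing import Dict


def phone_posteriors(log_likelihoods: Dict[str, float]) -> Dict[str, float]:
    """Normalize per-phone log-likelihoods log P(A|p) into P(p|A),
    assuming equal phone priors."""
    peak = max(log_likelihoods.values())
    # Subtract the peak before exponentiating for numerical stability.
    scaled = {p: math.exp(ll - peak) for p, ll in log_likelihoods.items()}
    total = sum(scaled.values())
    return {p: v / total for p, v in scaled.items()}


def is_good(posterior: float, threshold: float = 0.5) -> bool:
    # The threshold is a placeholder; in practice it would be tuned on
    # data labelled by experts.
    return posterior >= threshold


# Made-up log-likelihoods for one segment where the target phone is "sh".
scores = {"sh": -42.0, "s": -44.0, "ch": -47.0}
posteriors = phone_posteriors(scores)
print(round(posteriors["sh"], 3), is_good(posteriors["sh"]))
```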

Page 13

Multilingual Translation Framework

Common meaning representation: semantic frame

[Diagram labels: Speech Corpora, Models, Recognition, NLU, Parsing Rules, Semantic Frame, Generation Rules, NLG, Synthesis; languages on both the input and output sides: English, Chinese, Spanish, Japanese]

Page 14

Content Understanding and Translation

English: Some thunderstorms may be accompanied by gusty winds and hail

clause: weather_event
  topic: precip_act, name: thunderstorm, num: pl
    quantifier: some
  pred: accompanied_by
    adverb: possibly
    topic: wind, num: pl, pred: gusty
    and: precip_act, name: hail

Frame indexed under weather, wind, rain, storm, and hail

Japanese:

Spanish: Algunas tormentas posiblemente acompañadas por vientos racheados y granizo

Chinese: 一些雷雨可能會伴有陣風和冰雹
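
To make the semantic frame and its indexing concrete, the sketch below writes the thunderstorm frame as a nested dict and files it under the index keys named on the slide. The nesting, the `type` and `name` field names, the frame id `f1`, and the `build_index` helper are assumptions made for illustration, not a documented frame format.

```python
from collections import defaultdict
from typing import Dict, List

# The frame from the slide, written as a nested dict. The nesting is one
# reading of the slide layout, not a documented frame format.
frame = {
    "clause": "weather_event",
    "topic": {
        "type": "precip_act",
        "name": "thunderstorm",
        "num": "pl",
        "quantifier": "some",
    },
    "pred": {
        "name": "accompanied_by",
        "adverb": "possibly",
        "topic": {"type": "wind", "num": "pl", "pred": "gusty"},
        "and": {"type": "precip_act", "name": "hail"},
    },
}

# Index keys as listed on the slide.
index_keys = ["weather", "wind", "rain", "storm", "hail"]


def build_index(frames: Dict[str, dict], keys: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each index key to the ids of the frames filed under it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for frame_id in frames:
        for key in keys[frame_id]:
            index[key].append(frame_id)
    return dict(index)


index = build_index({"f1": frame}, {"f1": index_keys})
print(index["hail"])
```

A question about wind or hail can then retrieve the same frame, which is generated into whichever language the user needs.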

Page 15

Audio Demonstration

• User asks: “Will it rain tomorrow in Boston?”
• System paraphrases query, then responds in Chinese
• “Please repeat that” in English or Chinese interpreted identically
• System repeats response in Chinese
• User speaks query in English: seamless language switching
• System paraphrases, then translates query into Chinese
• User attempts to repeat translation
  – Recognition error: hallucinates an erroneous date (February 30) which will be remembered
• System supplies known cities in England
• User chooses London
• System has no weather for London on February 30
• User asks “how about today?”
• System provides London’s weather today
• User asks for a translation into English, which is provided

Page 16

Proposed Translation Procedure

[Diagram labels: English query, parse, Linguistic Frame, transfer, Linguistic Frame, generate, Chinese query, parse, Key-value Representation, generate]

Example: “what is your name” -> “ni3 jiao4 shen2_me5 ming2_zi4”

Linguistic frame (English):

{c wh_question
   :topic {q name
      :poss “you” }
   :auxil “link”
   :complement {q object :trace “what” } }

Linguistic frame (after transfer):

{c wh_question
   :topic {q name }
   :pro “you”
   :verb “call”
   :complement {q object :trace “what” } }

Key-value representation:

{c eform
   :attribute “name”
   :person “you” }

If generated query fails to parse, simplify interlingua and generation
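
The fallback rule on this slide can be sketched as a small control loop: generate from the full frame, check that the output parses, and otherwise retry from a simplified key-value frame. Everything here is a toy stand-in: `translate`, `toy_generate`, `toy_parses`, and `toy_simplify` are hypothetical names, and the dict frames only loosely mirror the frames shown.

```python
from typing import Callable, Optional

Frame = dict


def translate(
    frame: Frame,
    generate: Callable[[Frame], str],
    parses: Callable[[str], bool],
    simplify: Callable[[Frame], Frame],
) -> Optional[str]:
    """Generate from the full frame; if the output does not parse,
    retry from a simplified (key-value) frame."""
    output = generate(frame)
    if parses(output):
        return output
    fallback = generate(simplify(frame))
    return fallback if parses(fallback) else None


# Toy components, all hypothetical.
def toy_generate(frame: Frame) -> str:
    # Only the simplified key-value frame is handled by this toy generator.
    if frame.get("c") == "eform" and frame.get("attribute") == "name":
        return "ni3 jiao4 shen2_me5 ming2_zi4"
    return "<ungrammatical output>"


def toy_parses(sentence: str) -> bool:
    return not sentence.startswith("<")


def toy_simplify(frame: Frame) -> Frame:
    return {"c": "eform", "attribute": "name", "person": "you"}


full_frame = {"c": "wh_question", "topic": "name", "poss": "you"}
print(translate(full_frame, toy_generate, toy_parses, toy_simplify))
```

Parsing the generated sentence acts as a self-check: an output the target-language grammar cannot parse is treated as a failed translation rather than shown to the student.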

Page 17

Proposed Exercise using Typed Inputs

Before submission (Next: Dallas rain tomorrow)
  Type-in Window
    Input: Da2 la2 si4 hui4 xia4 yu3 ming2 tian1 ma5?
  Reply Window
    Query:
    Response:

After submission (Next: Los Angeles wind Saturday)
  Type-in Window
    Input:
  Reply Window
    Query: Da2 la1 si1 ming2 tian1 hui4 xia4 yu3 ma5?
    Response: Da2 la1 si1 ming2 tian1 xia4 wu3 xia4 te4 da4 yu3

• System color codes errors in tone and in syntactic constructs
• System is able to parse query in spite of tone errors and (limited) syntax errors
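
A minimal sketch of how tone and word-order errors in the typed example could be flagged, by comparing the student's tone-numbered pinyin with the corrected query. This is not the system's method (which parses the input); the syllable-matching heuristic and the names `split_syllables`, `tone_errors`, and `has_order_error` are assumptions made for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# A syllable is a run of letters followed by a tone digit 1-5.
SYLLABLE = re.compile(r"([A-Za-z]+)([1-5])")


def split_syllables(text: str) -> List[Tuple[str, int]]:
    return [(base.lower(), int(tone)) for base, tone in SYLLABLE.findall(text)]


def tone_errors(typed: str, reference: str) -> List[Tuple[str, List[int]]]:
    """Return typed syllables whose tone differs from the reference,
    paired with the tone(s) the reference uses for that syllable."""
    ref_tones: Dict[str, Set[int]] = defaultdict(set)
    for base, tone in split_syllables(reference):
        ref_tones[base].add(tone)
    errors = []
    for base, tone in split_syllables(typed):
        if base in ref_tones and tone not in ref_tones[base]:
            errors.append((f"{base}{tone}", sorted(ref_tones[base])))
    return errors


def has_order_error(typed: str, reference: str) -> bool:
    """True when the same syllables appear in a different order."""
    typed_bases = [b for b, _ in split_syllables(typed)]
    ref_bases = [b for b, _ in split_syllables(reference)]
    return sorted(typed_bases) == sorted(ref_bases) and typed_bases != ref_bases


typed = "Da2 la2 si4 hui4 xia4 yu3 ming2 tian1 ma5?"
corrected = "Da2 la1 si1 ming2 tian1 hui4 xia4 yu3 ma5?"
print(tone_errors(typed, corrected))
print(has_order_error(typed, corrected))
```

On the slide's example this flags the tones of "la2" and "si4" and reports that "ming2 tian1" is out of place, which are the two kinds of error the system color codes.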

Page 18

Testing the Effectiveness of Training on Typed Input: Proposed Measures

• Compare the quality of spoken dialogue recorded before and after a Web-based training session
• Measures of fluency:
  – Syntactic well-formedness
  – Tone production accuracy
  – Frequency of pauses, edits, and filler words
  – Phonetic quality, etc.
• Measures of communication success:
  – Frequency of usage of translation assistance
  – Understanding error rate
  – Task completion
  – Time to completion, etc.

Page 19

Technology Goal: Automated Language Understanding

• Once translation ability exists from English to target language, can create reverse system almost effortlessly
• Utilizes English parse tree and Mandarin generation lexicon to induce Mandarin parse tree

[Diagram labels: English Sentence, parse, Interlingual Representation, generate, Mandarin Sentence, Corpus Pairs, Grammar Induction, Mandarin Parsing Grammar]

Page 20

Building NxN Translation Efficiently

[Diagram: English, Japanese, Mandarin, French, Arabic, Korean, Spanish, and Urdu arranged around a shared Interlingua; label: Automatic Grammar Induction]
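
The efficiency argument for an interlingua is a matter of counting. With N languages, direct translation needs one system per ordered language pair, whereas an interlingua needs only one parser and one generator per language. The sketch below works this out for the eight languages named on the slide; the function names are illustrative.

```python
def direct_systems(n: int) -> int:
    # One translation system per ordered (source, target) language pair.
    return n * (n - 1)


def interlingua_components(n: int) -> int:
    # One parser into the interlingua and one generator out of it per language.
    return 2 * n


languages = ["English", "Japanese", "Mandarin", "French",
             "Arabic", "Korean", "Spanish", "Urdu"]
n = len(languages)
print(direct_systems(n), interlingua_components(n))  # prints "56 16"
```

Adding a ninth language would cost two new components with an interlingua, compared with sixteen new directed pairs without one.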

Page 21

Future Plans (Near Term and Long Term)

• Install current version of system at Cambridge University

• Incorporate CU Mandarin recognizer

• Add support for audio input at the computer

• Build high quality synthesis capability

• Improve understanding, dialogue, and translation performance

• Collect and transcribe data from language learners and assess both system and students

• Develop various scoring algorithms for student fluency

• Refine all aspects of system based on collected data
