Speech-to-Speech MT Design and Engineering
![Page 1: Speech-to-Speech MT Design and Engineering](https://reader035.fdocuments.in/reader035/viewer/2022062321/5681342d550346895d9b1ab0/html5/thumbnails/1.jpg)
Speech-to-Speech MT Design and Engineering
Alon Lavie and Lori Levin
MT Class
April 5 2000
![Page 2: Speech-to-Speech MT Design and Engineering](https://reader035.fdocuments.in/reader035/viewer/2022062321/5681342d550346895d9b1ab0/html5/thumbnails/2.jpg)
Outline
• Design and Engineering of the JANUS/C-STAR speech-to-speech MT system
• The C-STAR Travel Domain Interlingua (IF)
• Evaluation and User Studies
• Open Problems, Current and Future Research
![Page 3: Speech-to-Speech MT Design and Engineering](https://reader035.fdocuments.in/reader035/viewer/2022062321/5681342d550346895d9b1ab0/html5/thumbnails/3.jpg)
Overview
• Fundamentals of our approach
• System overview
• Engineering a multi-domain system
• Evaluations and user studies
• Alternative translation approaches
• Current and future research
![Page 4: Speech-to-Speech MT Design and Engineering](https://reader035.fdocuments.in/reader035/viewer/2022062321/5681342d550346895d9b1ab0/html5/thumbnails/4.jpg)
JANUS Speech Translation
• Translation via an interlingua representation
• Main translation engine is rule-based
• Semantic grammars
• Modular grammar design
• System engineered for multiple domains
• Incorporate alternative translation engines
![Page 5: Speech-to-Speech MT Design and Engineering](https://reader035.fdocuments.in/reader035/viewer/2022062321/5681342d550346895d9b1ab0/html5/thumbnails/5.jpg)
The C-STAR Travel Planning Domain
General Scenario:
• Dialogue between one traveler and one or more travel agents
• Focus on making travel arrangements for a personal leisure trip (not business)
• Free spontaneous speech
![Page 6: Speech-to-Speech MT Design and Engineering](https://reader035.fdocuments.in/reader035/viewer/2022062321/5681342d550346895d9b1ab0/html5/thumbnails/6.jpg)
The C-STAR Travel Planning Domain
Natural breakdown into several sub-domains:
• Hotel Information and Reservation
• Transportation Information and Reservation
• Information about Sights and Events
• General Travel Information
• Cross Domain
![Page 7: Speech-to-Speech MT Design and Engineering](https://reader035.fdocuments.in/reader035/viewer/2022062321/5681342d550346895d9b1ab0/html5/thumbnails/7.jpg)
Semantic Grammars
• Describe structure of semantic concepts instead of syntactic constituency of phrases
• Well suited for task-oriented dialogue containing many fixed expressions
• Appropriate for spoken language - often disfluent and syntactically ill-formed
• Faster to develop reasonable coverage for limited domains
![Page 8: Speech-to-Speech MT Design and Engineering](https://reader035.fdocuments.in/reader035/viewer/2022062321/5681342d550346895d9b1ab0/html5/thumbnails/8.jpg)
Semantic Grammars
Hotel Reservation Example:
Input: we have two hotels available
Parse Tree:
[give-information+availability+hotel]
(we have [hotel-type]
    ([quantity=] (two)
     [hotel] (hotels))
 available)
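The parse tree above can be held in a small nested data structure. The following is a minimal sketch, not the JANUS representation: the `Node` class and its methods are hypothetical names introduced here for illustration, and the nesting of `[quantity=]` and `[hotel]` under `[hotel-type]` is read off the bracketing on the slide.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """A semantic-grammar node: a concept label plus its children in order."""
    label: str
    children: List[Union["Node", str]] = field(default_factory=list)

    def words(self) -> List[str]:
        """Return the input words covered by this node, left to right."""
        out: List[str] = []
        for child in self.children:
            if isinstance(child, Node):
                out.extend(child.words())
            else:
                out.append(child)
        return out

    def bracketed(self) -> str:
        """Render the tree in a one-line version of the slide's notation."""
        parts = [c.bracketed() if isinstance(c, Node) else c
                 for c in self.children]
        return f"[{self.label}] ({' '.join(parts)})"


# The hotel reservation example: concepts are nodes, words are leaves.
tree = Node("give-information+availability+hotel", [
    "we", "have",
    Node("hotel-type", [
        Node("quantity=", ["two"]),
        Node("hotel", ["hotels"]),
    ]),
    "available",
])

print(" ".join(tree.words()))
print(tree.bracketed())
```

Note that the node labels name semantic concepts (a speech act plus concepts, an argument such as `quantity=`) rather than syntactic categories such as NP or VP.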
![Page 9: Speech-to-Speech MT Design and Engineering](https://reader035.fdocuments.in/reader035/viewer/2022062321/5681342d550346895d9b1ab0/html5/thumbnails/9.jpg)
The JANUS-III Translation System
![Page 10: Speech-to-Speech MT Design and Engineering](https://reader035.fdocuments.in/reader035/viewer/2022062321/5681342d550346895d9b1ab0/html5/thumbnails/10.jpg)
The JANUS-III Translation System
![Page 11: Speech-to-Speech MT Design and Engineering](https://reader035.fdocuments.in/reader035/viewer/2022062321/5681342d550346895d9b1ab0/html5/thumbnails/11.jpg)
The SOUP Parser
• Specifically designed to parse spoken language using domain-specific semantic grammars
• Robust - can skip over disfluencies in input
• Stochastic - probabilistic CFG encoded as a collection of RTNs with arc probabilities
• Top-Down - parses from top-level concepts of the grammar down to matching of terminals
• Chart-based - dynamic matrix of parse DAGs indexed by start and end positions and head category
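Two of the properties listed above, robustness to disfluencies and a chart indexed by start position, end position and head category, can be illustrated with a toy. This is a sketch under strong assumptions and is not the SOUP algorithm: it uses flat word patterns instead of RTNs, has no probabilities, and the names `PATTERNS`, `match_from` and `build_chart` as well as the `max_skip` limit are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy patterns: concept -> required word sequence.
PATTERNS: Dict[str, List[str]] = {
    "[greeting]": ["hello"],
    "[request-room]": ["i", "want", "room"],
}


def match_from(words: List[str], start: int, pattern: List[str],
               max_skip: int) -> Optional[Tuple[int, int]]:
    """Match `pattern` beginning exactly at `start`, skipping at most
    `max_skip` words in total between pattern tokens.
    Returns (end, skipped) on success, None otherwise."""
    if words[start] != pattern[0]:
        return None
    pos, skipped = start + 1, 0
    for token in pattern[1:]:
        # Skip over words that do not match the next required token.
        while pos < len(words) and words[pos] != token:
            pos += 1
            skipped += 1
        if pos >= len(words) or skipped > max_skip:
            return None
        pos += 1
    return pos, skipped


def build_chart(words: List[str],
                max_skip: int = 2) -> Dict[Tuple[int, int, str], int]:
    """Chart keyed by (start, end, concept); value = number of skipped words."""
    chart: Dict[Tuple[int, int, str], int] = {}
    for start in range(len(words)):
        for concept, pattern in PATTERNS.items():
            hit = match_from(words, start, pattern, max_skip)
            if hit is not None:
                end, skipped = hit
                chart[(start, end, concept)] = skipped
    return chart


utterance = "hello uh i want a a room".split()
for key, skipped in build_chart(utterance).items():
    print(key, "skipped:", skipped)
```

The filler "uh" is left uncovered between the two concepts, and the repeated "a a" is skipped inside `[request-room]`, which is the kind of input a syntax-driven parser would tend to reject.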
![Page 12: Speech-to-Speech MT Design and Engineering](https://reader035.fdocuments.in/reader035/viewer/2022062321/5681342d550346895d9b1ab0/html5/thumbnails/12.jpg)
The SOUP Parser
• Supports parsing with large multiple domain grammars
• Produces a lattice of parse analyses headed by top-level concepts
• Disambiguation heuristics rank the analyses in the parse lattice and select a single best path through the lattice
• Graphical grammar editor
![Page 13: Speech-to-Speech MT Design and Engineering](https://reader035.fdocuments.in/reader035/viewer/2022062321/5681342d550346895d9b1ab0/html5/thumbnails/13.jpg)
SOUP Disambiguation Heuristics
• Maximize coverage (of input)
• Minimize number of parse trees (fragmentation)
• Minimize number of parse tree nodes
• Minimize the number of wild-card matches
• Maximize the probability of parse trees
• Find sequence of domain tags with maximal probability given the input words: P(T|W), where T = t1, t2, …, tn is a sequence of domain tags
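One simple way to combine the first five heuristics is a lexicographic sort key. The sketch below assumes that the heuristics are applied in the order listed, with each one only breaking ties left by the previous one; the slide does not state how SOUP actually weighs them, so that ordering, the `Analysis` fields and the example numbers are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Analysis:
    """Summary statistics for one candidate analysis of an utterance."""
    name: str
    covered_words: int   # input words covered by the parse trees
    num_trees: int       # number of parse trees (fragmentation)
    num_nodes: int       # total parse tree nodes
    wildcards: int       # wild-card matches used
    log_prob: float      # log probability of the parse trees


def rank_key(a: Analysis) -> Tuple[int, int, int, int, float]:
    """Smaller key = better. Quantities to maximize are negated."""
    return (-a.covered_words, a.num_trees, a.num_nodes,
            a.wildcards, -a.log_prob)


def best_analysis(candidates: List[Analysis]) -> Analysis:
    return min(candidates, key=rank_key)


# Made-up candidates for a five-word utterance.
candidates = [
    Analysis("two-fragments", 5, 2, 5, 0, -1.9),
    Analysis("partial", 3, 1, 2, 1, -0.5),
    Analysis("single-tree", 5, 1, 4, 0, -2.3),
]

print(best_analysis(candidates).name)
print([a.name for a in sorted(candidates, key=rank_key)])
```

Under this ordering a full-coverage single tree wins even though its probability is lower than that of the partial analysis, because coverage and fragmentation are compared first.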
![Page 14: Speech-to-Speech MT Design and Engineering](https://reader035.fdocuments.in/reader035/viewer/2022062321/5681342d550346895d9b1ab0/html5/thumbnails/14.jpg)
JANUS Generation Modules
Two alternative generation modules:
• Top-Down context-free based generator - fast, used for English and Japanese
• GenKit - unification-based generator augmented with Morphe morphology module - used for German
![Page 15: Speech-to-Speech MT Design and Engineering](https://reader035.fdocuments.in/reader035/viewer/2022062321/5681342d550346895d9b1ab0/html5/thumbnails/15.jpg)
Modular Grammar Design
• Grammar development separated into modules corresponding to sub-domains (Hotel, Transportation, Sights, General Travel, Cross Domain)
• Shared core grammar for lower-level concepts that are common to the various sub-domains (e.g. times, prices)
• Grammars can be developed independently (using shared core grammar)
• Shared and Cross-Domain grammars significantly reduce effort in expanding to new domains
• Separate grammar modules facilitate associating parses with domain tags - useful for multi-domain integration within the parser
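The modular layout described above can be sketched as a shared core grammar plus per-domain modules that are merged at load time, with every rule keeping a tag for the module it came from. The rule contents, the `"shared"` tag and the `load_grammars` function are hypothetical; only the module structure itself is taken from the slide.

```python
from typing import Dict, List, Tuple

Rules = Dict[str, List[str]]  # concept -> right-hand-side alternatives

# Shared core grammar: lower-level concepts common to all sub-domains.
CORE: Rules = {
    "[time]": ["[hour] o'clock", "[hour] [minute]"],
    "[price]": ["[amount] dollars"],
}

# Independently developed sub-domain modules that reuse core concepts.
MODULES: Dict[str, Rules] = {
    "hotel": {
        "[hotel-type]": ["[quantity=] [hotel]"],
        "[give-information+price+room]": ["the room costs [price]"],
    },
    "transportation": {
        "[give-information+departure]": ["the train leaves at [time]"],
    },
}


def load_grammars(core: Rules,
                  modules: Dict[str, Rules]) -> Dict[str, Tuple[str, List[str]]]:
    """Merge all modules into one table: concept -> (domain tag, alternatives)."""
    merged: Dict[str, Tuple[str, List[str]]] = {
        concept: ("shared", rhs) for concept, rhs in core.items()
    }
    for domain, rules in modules.items():
        for concept, rhs in rules.items():
            if concept in merged:
                # Two modules defining the same concept would make tags ambiguous.
                raise ValueError(f"{concept} defined in both "
                                 f"{merged[concept][0]} and {domain}")
            merged[concept] = (domain, rhs)
    return merged


grammar = load_grammars(CORE, MODULES)
for concept, (domain, _) in grammar.items():
    print(f"{domain:15s} {concept}")
```

Adding a new sub-domain in this sketch means adding one more entry to `MODULES`; the core concepts such as `[time]` and `[price]` are reused rather than rewritten.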
![Page 16: Speech-to-Speech MT Design and Engineering](https://reader035.fdocuments.in/reader035/viewer/2022062321/5681342d550346895d9b1ab0/html5/thumbnails/16.jpg)
Translation with Multiple Domain Grammars
• Parser is loaded with all domain grammars
• Domain tag attached to grammar rules of each domain
• Previously developed grammars for other domains can also be incorporated
• Parser creates a parse lattice consisting of multiple analyses of the input into sequences of top-level domain concepts
• Parser disambiguation heuristics rank the analyses in the parse lattice and select a single best sequence of concepts
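Selecting a single best sequence of concepts from a lattice can be sketched as dynamic programming over word positions. The version below scores a path only by coverage and then by fragmentation, which is a simplification of the heuristics the parser uses; the edge format `(start, end, concept, domain)`, the example lattice and the function name `best_path` are assumptions made for illustration.

```python
from typing import List, Tuple

Edge = Tuple[int, int, str, str]  # (start, end, concept, domain tag)


def best_path(n_words: int, edges: List[Edge]) -> List[Edge]:
    """Pick non-overlapping edges that maximize covered words and,
    among equal coverage, minimize the number of edges (fragments)."""
    # best[i] = (covered, -fragments, path) for the first i words.
    best: List[Tuple[int, int, List[Edge]]] = [(0, 0, [])]
    for i in range(1, n_words + 1):
        # Default option: leave word i-1 uncovered (skipped).
        cand = best[i - 1]
        for edge in edges:
            start, end = edge[0], edge[1]
            if end == i:
                cov, neg_frag, path = best[start]
                option = (cov + (end - start), neg_frag - 1, path + [edge])
                if option[:2] > cand[:2]:
                    cand = option
        best.append(cand)
    return best[n_words][2]


# Made-up lattice over a six-word utterance.
lattice: List[Edge] = [
    (0, 1, "[greeting]", "cross-domain"),
    (1, 4, "[request-information]", "hotel"),
    (4, 6, "[give-information+location]", "travel"),
    (1, 6, "[request-reservation+hotel]", "hotel"),
]

path = best_path(6, lattice)
print([concept for _, _, concept, _ in path])
print([domain for _, _, _, domain in path])
```

Both complete paths cover all six words, so the two-concept path is preferred over the three-concept one; the domain tags of the chosen edges give the tag sequence for the utterance.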
![Page 17: Speech-to-Speech MT Design and Engineering](https://reader035.fdocuments.in/reader035/viewer/2022062321/5681342d550346895d9b1ab0/html5/thumbnails/17.jpg)
Translation with Multiple Domain Grammars
![Page 18: Speech-to-Speech MT Design and Engineering](https://reader035.fdocuments.in/reader035/viewer/2022062321/5681342d550346895d9b1ab0/html5/thumbnails/18.jpg)
A SOUP Parse Lattice
![Page 19: Speech-to-Speech MT Design and Engineering](https://reader035.fdocuments.in/reader035/viewer/2022062321/5681342d550346895d9b1ab0/html5/thumbnails/19.jpg)
User Studies
• We conducted three sets of user tests
• Travel agent played by experienced system user
• Traveler is played by a novice and given five minutes of instruction
• Traveler is given a general scenario - e.g., plan a trip to Heidelberg
• Communication only via ST system, multi-modal interface and muted video connection
• Data collected used for system evaluation, error analysis and then grammar development
![Page 20: Speech-to-Speech MT Design and Engineering](https://reader035.fdocuments.in/reader035/viewer/2022062321/5681342d550346895d9b1ab0/html5/thumbnails/20.jpg)
System Evaluation Methodology
• End-to-end evaluations conducted at the SDU (sentence) level
• Multiple bilingual graders compare the input with translated output and assign a grade of: Perfect, OK or Bad
• OK = meaning of SDU comes across
• Perfect = OK + fluent output
• Bad = translation incomplete or incorrect
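The two figures reported in this kind of evaluation, "OK+Perfect" and "Perfect", follow directly from the grade definitions above. The sketch below computes them per grader and then averages over graders; the slides do not say how grades from multiple graders are combined, so the averaging step is an assumption, and the grades shown are made up.

```python
from collections import Counter
from typing import Dict, List


def score(grades: List[str]) -> Dict[str, float]:
    """Percentages for one grader's list of per-SDU grades."""
    counts = Counter(grades)
    total = len(grades)
    perfect = counts["perfect"]
    acceptable = perfect + counts["ok"]  # Perfect implies OK
    return {
        "ok+perfect": 100.0 * acceptable / total,
        "perfect": 100.0 * perfect / total,
    }


def average_over_graders(per_grader: List[List[str]]) -> Dict[str, float]:
    """Assumed combination rule: mean of the per-grader percentages."""
    scores = [score(grades) for grades in per_grader]
    return {
        key: sum(s[key] for s in scores) / len(scores)
        for key in ("ok+perfect", "perfect")
    }


# Made-up grades from two graders for the same four SDUs.
grader_a = ["perfect", "ok", "bad", "perfect"]
grader_b = ["perfect", "bad", "bad", "ok"]

print(score(grader_a))
print(average_over_graders([grader_a, grader_b]))
```

Because Perfect is defined as OK plus fluent output, the "OK+Perfect" figure is always at least as large as the "Perfect" figure.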
![Page 21: Speech-to-Speech MT Design and Engineering](https://reader035.fdocuments.in/reader035/viewer/2022062321/5681342d550346895d9b1ab0/html5/thumbnails/21.jpg)
August-99 Evaluation
• Data from latest user study - traveler planning a trip to Japan
• 132 utterances containing one or more SDUs, from six different users
• SR word error rate 14.7%
• 40.2% of utterances contain recognition error(s)
![Page 22: Speech-to-Speech MT Design and Engineering](https://reader035.fdocuments.in/reader035/viewer/2022062321/5681342d550346895d9b1ab0/html5/thumbnails/22.jpg)
Evaluation Results
Method Output Language OK+Perfect Perfect
SOUP-Transcribed English 74% 54%
SOUP-Recognition English 59% 42%
SOUP-Transcribed Japanese 77% 59%
SOUP-Recognition Japanese 62% 45%
SOUP-Transcribed German 70% 39%
SOUP-Recognition German 58% 34%
![Page 23: Speech-to-Speech MT Design and Engineering](https://reader035.fdocuments.in/reader035/viewer/2022062321/5681342d550346895d9b1ab0/html5/thumbnails/23.jpg)
Evaluation - Progress Over Time
Method OK+Perfect Perfect
Jan-99 Transcribed 69% 46%
Apr-99 Transcribed 70% 49%
Aug-99 Transcribed 74% 54%
Jan-99 Recognition 55% 36%
Apr-99 Recognition 57% 38%
Aug-99 Recognition 59% 42%
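The table can be summarized as gains in percentage points between the first and last evaluation, and as the gap between transcribed and recognized input. The numbers below are copied from the table; the dictionary layout and the function names are just one convenient arrangement.

```python
from typing import Dict, Tuple

# (evaluation, input condition) -> (OK+Perfect %, Perfect %), from the table.
RESULTS: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("Jan-99", "Transcribed"): (69, 46),
    ("Apr-99", "Transcribed"): (70, 49),
    ("Aug-99", "Transcribed"): (74, 54),
    ("Jan-99", "Recognition"): (55, 36),
    ("Apr-99", "Recognition"): (57, 38),
    ("Aug-99", "Recognition"): (59, 42),
}


def gain(condition: str, start: str = "Jan-99",
         end: str = "Aug-99") -> Tuple[int, int]:
    """Change in percentage points for (OK+Perfect, Perfect)."""
    first, last = RESULTS[(start, condition)], RESULTS[(end, condition)]
    return (last[0] - first[0], last[1] - first[1])


def recognition_gap(evaluation: str) -> Tuple[int, int]:
    """Points lost when moving from transcribed to recognized input."""
    clean = RESULTS[(evaluation, "Transcribed")]
    noisy = RESULTS[(evaluation, "Recognition")]
    return (clean[0] - noisy[0], clean[1] - noisy[1])


print("Transcribed gain:", gain("Transcribed"))
print("Recognition gain:", gain("Recognition"))
print("Aug-99 gap:", recognition_gap("Aug-99"))
```

Read this way, the share of acceptable translations rose by 4 to 5 points over the three evaluations, while speech recognition errors cost about 15 points of acceptable output in the August evaluation.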
![Page 24: Speech-to-Speech MT Design and Engineering](https://reader035.fdocuments.in/reader035/viewer/2022062321/5681342d550346895d9b1ab0/html5/thumbnails/24.jpg)
Alternative Approaches: SALT
SALT - Statistical Analyzer for Language Translation
• Combines ML trainable and rule-based analysis methods for robustness and portability
• Rule-based parsing restricted to well-defined set of argument-level phrases and fragments
• Trainable classifiers (NN, Decision Trees, etc.) used to derive the DA (speech-act and concepts) from the sequence of argument concepts
• Phrase-level grammars are more robust and portable to new domains
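The second stage described above, deriving the DA from the argument concepts found by the phrase-level grammars, can be sketched with any trainable classifier. The example below swaps in a small naive Bayes classifier with add-one smoothing, written with the standard library only; it is not the SALT implementation, which the slide describes as using classifiers such as neural networks and decision trees, and it treats the concepts as a bag rather than a sequence. The training pairs, the concept names and the class name `NaiveBayesDA` are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical training data: (argument concepts, DA label).
TRAIN: List[Tuple[List[str], str]] = [
    (["[hotel-type]", "[quantity=]"], "give-information+availability+hotel"),
    (["[hotel-type]", "[time]"], "give-information+availability+hotel"),
    (["[price]", "[room-type]"], "give-information+price+room"),
    (["[price]", "[time]", "[room-type]"], "give-information+price+room"),
]


class NaiveBayesDA:
    """Bag-of-concepts naive Bayes with add-one smoothing."""

    def fit(self, examples: List[Tuple[List[str], str]]) -> "NaiveBayesDA":
        self.label_counts: Counter = Counter()
        self.feature_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: Set[str] = set()
        for concepts, label in examples:
            self.label_counts[label] += 1
            for concept in concepts:
                self.feature_counts[label][concept] += 1
                self.vocab.add(concept)
        return self

    def log_score(self, concepts: List[str], label: str) -> float:
        total = sum(self.label_counts.values())
        score = math.log(self.label_counts[label] / total)
        denom = sum(self.feature_counts[label].values()) + len(self.vocab)
        for concept in concepts:
            score += math.log((self.feature_counts[label][concept] + 1) / denom)
        return score

    def predict(self, concepts: List[str]) -> str:
        return max(self.label_counts,
                   key=lambda label: self.log_score(concepts, label))


model = NaiveBayesDA().fit(TRAIN)
print(model.predict(["[price]", "[room-type]"]))
print(model.predict(["[hotel-type]"]))
```

The division of labor is the point of the sketch: hand-written rules only need to recognize argument-level phrases, and the mapping from those phrases to a DA is learned from labeled examples.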
![Page 25: Speech-to-Speech MT Design and Engineering](https://reader035.fdocuments.in/reader035/viewer/2022062321/5681342d550346895d9b1ab0/html5/thumbnails/25.jpg)
Alternative Approaches: Pangloss
Glossary-based Translation
• Translates directly into target language (no IF)
• Based on Pangloss translation system developed at CMU
• Uses a combination of EBMT, phrase glossaries and a bilingual dictionary
• English/German system operational
• Good fall-back for uncovered utterances
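Direct translation with a phrase glossary backed by a word dictionary can be sketched as a greedy longest-match lookup. This toy covers only the glossary and dictionary parts, leaves out EBMT entirely, and is not the Pangloss algorithm; the English/German entries and the names `GLOSSARY`, `DICTIONARY` and `translate` are made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical phrase glossary (multi-word entries) and word dictionary.
GLOSSARY: Dict[Tuple[str, ...], str] = {
    ("good", "morning"): "guten Morgen",
    ("thank", "you"): "danke",
}
DICTIONARY: Dict[str, str] = {
    "hotel": "Hotel",
    "room": "Zimmer",
}


def translate(words: List[str]) -> str:
    """Greedy left-to-right translation: longest glossary phrase first,
    then single-word dictionary lookup, else pass the word through."""
    longest = max(len(phrase) for phrase in GLOSSARY)
    out: List[str] = []
    i = 0
    while i < len(words):
        for n in range(min(longest, len(words) - i), 0, -1):
            phrase = tuple(words[i:i + n])
            if phrase in GLOSSARY:
                out.append(GLOSSARY[phrase])
                i += n
                break
        else:
            # No glossary phrase starts here: fall back to the dictionary.
            out.append(DICTIONARY.get(words[i], words[i]))
            i += 1
    return " ".join(out)


print(translate("good morning hotel".split()))
print(translate("thank you xyz room".split()))
```

Since there is no interlingua and no grammar to fail, a method of this kind always produces some output, which is what makes it usable as a fall-back for utterances the grammars do not cover.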
![Page 26: Speech-to-Speech MT Design and Engineering](https://reader035.fdocuments.in/reader035/viewer/2022062321/5681342d550346895d9b1ab0/html5/thumbnails/26.jpg)
Current and Future Work
• Expanding the travel domain: covering descriptive as well as task-oriented sentences
• Development of the SALT statistical approach and expanding it to other domains
• Full integration of multiple MT approaches: SOUP, SALT, Pangloss
• Task-based evaluation
• Disambiguation: improved sentence-level disambiguation; applying discourse contextual information for disambiguation
![Page 27: Speech-to-Speech MT Design and Engineering](https://reader035.fdocuments.in/reader035/viewer/2022062321/5681342d550346895d9b1ab0/html5/thumbnails/27.jpg)
Students Working on the Project
• Chad Langley: improved SALT approach
• Dorcas Wallace: DA disambiguation using decision trees, English grammars
• Taro Watanabe: DA correction and disambiguation using Transformation-based Learning, Japanese grammars
• Ariadna Font-Llitjos: Spanish Generation
![Page 28: Speech-to-Speech MT Design and Engineering](https://reader035.fdocuments.in/reader035/viewer/2022062321/5681342d550346895d9b1ab0/html5/thumbnails/28.jpg)
The JANUS/C-STAR Team
• Project Leaders: Lori Levin, Alon Lavie, Monika Woszczyna, Alex Waibel
• Grammar and Component Developers: Donna Gates, Dorcas Wallace, Taro Watanabe, Boris Bartlog, Marsal Gavalda, Chad Langley, Marcus Munk, Klaus Ries, Klaus Zechner, Detlef Koll, Michael Finke, Eric Carraux, Celine Morel, Alexandra Slavkovic, Susie Burger, Laura Tomokiyo, Takashi Tomokiyo, Kavita Thomas, Mirella Lapata, Matthew Broadhead, Cortis Clark, Christie Watson, Daniella Mueller, Sondra Ahlen