The MEI Team August 23, 2000
Mandarin-English Information (MEI): Investigating Translingual Speech Retrieval
Johns Hopkins University Center for Language and Speech Processing
Summer Workshop 2000
The MEI Team
August 23, 2000
MEI Team
• Senior Members
– Helen Meng, Chinese University of Hong Kong
– Erika Grams, Advanced Analytic Tools
– Sanjeev Khudanpur, Johns Hopkins University
– Gina-Anne Levow, University of Maryland
– Douglas Oard, University of Maryland
– Patrick Schone, US Department of Defense
– Hsin-Min Wang, Academia Sinica, Taiwan
• Students
– Berlin Chen, National Taiwan University
– Wai-Kit Lo, Chinese University of Hong Kong
– Karen Tang, Princeton University
– Jianqiang Wang, University of Maryland
Outline
• Motivation
• Background
• The Multi-scale Paradigm
– multi-scale query processing
– multi-scale document indexing
– multi-scale retrieval
• The Perfect Retrieval Myth
• Experiments and Findings
• Conclusions and Future Work
Motivation
• Monolingual speech retrieval applications are emerging, e.g.
– http://speechbot.research.compaq.com
[Figure: Internet-accessible radio and television stations, English vs. other languages; values shown: 529 and 1367. Source: www.real.com, Feb 2000]
Motivation (cont): Internet User Population
[Figure: Internet user population by language (English, Chinese), 2000 vs. 2005. Source: Global Reach]
MEI: The Big Picture
Interactive Refinement
Speech-to-Speech Translation
English Spoken Documents
Retrieval Engine
English Text Query (Exemplar)
English-to-Chinese Translation
Mandarin Audio News Broadcasts
Mandarin Audio Indexing (ASR)
Ranked List of Mandarin Spoken Documents
Concept Demo
Karen, Erika
Two Prevailing Problems in CL-SDR
• Translation problem
– out-of-vocabulary (OOV) in translation
– too many translations
• Recognition problem
– OOV in recognition
– acoustic confusions
• Solution: subword units may help
– transliteration, e.g. Northern Ireland /bei3 ai4 er3 lan2/ (in query)
– recognition of subword units, e.g. Iraq --> a rock (in document)
Background for Mandarin Speech Recognition
• 400 syllables
– full phonological coverage in Mandarin Chinese
• 6,800 characters
– full textual coverage in written Chinese (GB-coded)
– each character pronounced as a syllable
• Unknown number of Chinese words
– one to several characters per word
– character combinations create different meanings
– ambiguity in word tokenization
OOV and Acoustic Confusions in Mandarin SDR
Query: …Iraq...
Subwords for Retrieval
Pros
• Character n-grams
– robust to word-level mismatches due to different tokenization
• Syllable n-grams
– robust to word/character-level mismatches due to homophones
• Partial matches possible
Con
• Subwords contain reduced lexical knowledge cf. words
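The tokenization point can be illustrated with a minimal sketch. The two segmentations below are hypothetical, and forming overlapping bigrams across word boundaries is an assumption made for illustration rather than a statement of how the MEI indexer worked.

```python
def char_bigrams(words):
    """Overlapping character bigrams over the concatenated text,
    ignoring word boundaries (an assumption of this sketch)."""
    text = "".join(words)
    return [text[i:i + 2] for i in range(len(text) - 1)]

# Two hypothetical tokenizations of the same character string.
seg_a = ["美国", "总统", "克林顿"]
seg_b = ["美", "国总统", "克林", "顿"]

bigrams_a = char_bigrams(seg_a)
bigrams_b = char_bigrams(seg_b)

# The word-level views share no terms, the bigram views are identical.
word_overlap = set(seg_a) & set(seg_b)
print(word_overlap, bigrams_a == bigrams_b)
```

Word-level matching fails here because the segmenters disagree, while the character bigram representation is unaffected by the disagreement.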
The MEI Investigation
• Use of a multi-scale representation for crosslingual spoken document retrieval (CL-SDR)
• Words and subwords
Research Challenges
• Multi-scale query translation
• Multi-scale audio indexing
• Multi-scale retrieval
Query by Example
English Newswire Exemplars
Mandarin Audio Stories
President Bill Clinton and Chinese President Jiang Zemin engaged in a spirited, televised debate Saturday over human rights and the Tiananmen Square crackdown, and announced a string of agreements on arms control, energy and environmental matters. There were no announced breakthroughs on American human rights concerns, including Tibet, but both leaders accentuated the positive …
美国总统克林顿的助手赞扬中国官员允许电视现场直播克林顿和江泽民在首脑会晤后举行的联合记者招待会。。特别是一九八九镇压民主运动的决定。他表示镇压天安门民主运动是错误的 , 他还批评了中国对西藏精神领袖达 国家安全事务助理伯格表示 , 这次直播让中国人第一次在种公开的论坛上听到围绕敏感的人权问题的讨论。在记者招待会上 …
Evaluation Collection
• Development Collection: TDT-2
– 2265 manually segmented stories
– 17 topics, variable number of exemplars
• Evaluation Collection: TDT-3
– 3371 manually segmented stories
– 56 topics, variable number of exemplars
• Timeline marks: Jan 98, Mar 98, Jun 98, Oct 98, Dec 98
• English text topic exemplars: Associated Press, New York Times
• Mandarin audio broadcast news: Voice of America
• Exhaustive relevance assessment based on event overlap
Abstract Task Model
American English Text Exemplar
Cross-Language Speech Retrieval
Mandarin Chinese Broadcast News
Ranked List of News Stories
Evaluation of Ranked Lists
Ranked list      Relevance judgment
VOA 0427.22      Relevant
VOA 0521.14      Not
VOA 0604.39      Not
VOA 0419.12      Relevant
VOA 0527.13      Not
VOA 0513.17      Relevant
…                …
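A minimal sketch of uninterpolated average precision for one ranked list follows. It scores only the six judgments visible above and assumes, for illustration, that no further relevant stories exist beyond them; the real list continues, so the value is not a result from the workshop.

```python
def average_precision(judgments, total_relevant=None):
    """Uninterpolated average precision for one ranked list.

    judgments: booleans in rank order (True = relevant).
    total_relevant: number of relevant documents in the collection;
    defaults to the number found in the list (an assumption).
    """
    hits = 0
    precisions = []
    for rank, relevant in enumerate(judgments, start=1):
        if relevant:
            hits += 1
            precisions.append(hits / rank)  # precision at this relevant rank
    denom = total_relevant if total_relevant is not None else hits
    return sum(precisions) / denom if denom else 0.0

# Relevant, Not, Not, Relevant, Not, Relevant (visible prefix only).
visible = [True, False, False, True, False, True]
ap = average_precision(visible)
print(round(ap, 3))
```

Averaging this quantity over exemplars and then over topics gives the mean uninterpolated average precision used in the later comparisons.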
Recall-Precision Graph
[Figure: interpolated precision vs. recall, both axes 0.0 to 1.0]
Variation Across Exemplars
[Figure: interpolated precision vs. recall, both axes 0.0 to 1.0]
Average Across Exemplars
[Figure: mean interpolated precision vs. recall, both axes 0.0 to 1.0; value shown: 0.353]
Variation Across Topics
[Figure: mean uninterpolated average precision (axis 0.0 to 1.0) by topic]
Comparing Two Systems
[Figure: mean uninterpolated average precision (axis 0.0 to 1.0) by topic, System A vs. System B]
Significance Testing
• Statistical significance
– Null hypothesis: mean average precision across topics is drawn from the same distribution
– Paired 2-tailed t-test, significant if p<0.05 (for System A vs. System B, p=0.94)
• Meaningful differences
– Rule of thumb: 5-10% relative (for System A vs. System B, relative difference is <1%)
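The two checks can be sketched with the standard library alone. The per-topic scores below are made up for illustration and are not the System A and System B figures from the slides. The sketch stops at the t statistic; a p-value would come from the t distribution with the returned degrees of freedom (for example via `scipy.stats.ttest_rel`, if SciPy is available).

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Paired t statistic and degrees of freedom for per-topic scores."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    return mean(diffs) / (sd / math.sqrt(n)), n - 1

def relative_difference(a, b):
    """Relative difference in mean score of a over b."""
    return (mean(a) - mean(b)) / mean(b)

# Hypothetical per-topic average precision for two systems.
system_a = [0.50, 0.30, 0.70, 0.40, 0.60]
system_b = [0.48, 0.33, 0.69, 0.41, 0.58]

t_stat, dof = paired_t(system_a, system_b)
rel = relative_difference(system_a, system_b)

# Two-tailed critical value at p = 0.05 for 4 degrees of freedom is 2.776.
print(abs(t_stat) > 2.776, abs(rel) >= 0.05)
```

In this made-up case neither test is passed: the difference is not statistically significant and falls well under the 5% rule of thumb.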
Translingual and Multi-Scale Query Processing
Mandarin Audio
Term Translation
President Bill Clinton and…
English Exemplar
Term Selection
Bilingual Term List
Query Construction
Mandarin IR System
Story Boundaries
Evaluation
Named Entity Tagging
Document Construction
Speech Recognition
Relevance Judgments
Ranked List
BBN
U Mass
LDC
Cornell
Dragon
LDC
LDC
LDC 000100010000010100
Mean Uninterpolated Average Precision
Multi-Scale Query Translation
• Words and Phrases
(Gina, Sanjeev)
• Subwords
(Helen, Wai-Kit, Berlin, Karen)
Bilingual Term List
• Combination of
– LDC English-Chinese bilingual term list
– Chinese-English Translation Assistance File (CETA) [inverted]
Total English Terms: 199,444
Total Translation Pairs: 395,216
Phrasal Terms: 81,127
Phrasal Translation Pairs: 105,750
Term             # translations
human            7
right(s)         30
human rights     1
Query Term Selection
• Tagged named entities (BBN IdentiFinder)
– Person: partners of Goldman, Sachs, & Co.
– Organization: UN Security Council
• Dictionary-based “phrases”
– translatable multi-word units, e.g. “Wall Street”, “best interests”, “guiding principles”, “human rights”
– automatic tagging: greedy, left-to-right, max match
• Chi-squared filtering
– Compared to English background model
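The greedy, left-to-right, maximum-match tagging of dictionary phrases can be sketched as follows. The function name, the `max_len` cap and the lowercased lookup are illustrative assumptions; the phrase set reuses the examples from the slide.

```python
def tag_phrases(tokens, phrase_dict, max_len=4):
    """Greedy left-to-right longest-match grouping of dictionary phrases.

    phrase_dict holds lowercased multi-word phrases; max_len is an
    assumed cap on phrase length in tokens.
    """
    out = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = " ".join(tokens[i:i + n]).lower()
            if candidate in phrase_dict:
                match = tokens[i:i + n]
                break
        if match:
            out.append(" ".join(match))
            i += len(match)
        else:
            out.append(tokens[i])
            i += 1
    return out

phrases = {"wall street", "best interests", "guiding principles", "human rights"}
tokens = "concerns over human rights on Wall Street".split()
tagged = tag_phrases(tokens, phrases)
print(tagged)
```

Because the scan never backtracks, an early long match can hide a later overlapping phrase; that is the usual trade-off of the greedy strategy.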
Query Term Translation
• Named entities
– if absent from dictionary, translate individual terms (e.g. “Security Council” versus “First Bank of Siam”)
• Numeric expressions
– special processing for digits (e.g. “12:30 pm, June 15, 1969”)
• Remaining terms
– consult bilingual term list, lemmatize if necessary (e.g. “televised” translates as “television”)
Query Construction
• Unbalanced queries
– Use all plausible translations for each term
• Balanced queries
– Pseudo-term weight: average of translations’ weights
• Structured queries
– Recompute pseudo-term weight from translations’ term frequency and document frequency
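The difference between unbalanced and balanced queries can be sketched with placeholder data. The translation identifiers and document weights below are hypothetical, and summing per-term contributions into one document score is an assumption of the sketch; structured queries, which recompute the weight from term and document frequencies, are not shown.

```python
def unbalanced_score(translations, doc_weights):
    """Every translation of every query term contributes its own weight."""
    return sum(doc_weights.get(t, 0.0)
               for alts in translations.values() for t in alts)

def balanced_score(translations, doc_weights):
    """Each query term contributes the average weight of its translations."""
    total = 0.0
    for alts in translations.values():
        if alts:
            total += sum(doc_weights.get(t, 0.0) for t in alts) / len(alts)
    return total

# Hypothetical: "human" has four candidate translations, "rights" has one.
translations = {"human": ["h1", "h2", "h3", "h4"], "rights": ["r1"]}
# Hypothetical term weights of one document.
doc_weights = {"h1": 1.0, "h2": 1.0, "h3": 1.0, "h4": 1.0, "r1": 2.0}

u = unbalanced_score(translations, doc_weights)
b = balanced_score(translations, doc_weights)
print(u, b)
```

In the unbalanced score the term with many translations dominates; averaging restores one pseudo-term per English query term.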
Strategies in Query Translation
• Phrase-based translation is significantly better
• Named entity and numeral translations are (barely) helpful
• Balanced translation matches structured queries
– also extends easily to subword units
[Figure: mean average precision (axis 0.2 to 0.8) by strategy: Words, Phrases, NE/NUMEX, Balanced]
Untranslatable Terms
Term (# of occurrences): suharto 97, netanyahu 88, starr 62, arafat 50, bjp 45, vajpayee 44, estrada 44, …, hsu 19, zemin 7
Terms     # (by token)     # (by type)
total     87,004           12,402
OOV       3,028            1,122
Subword Transliteration
English Query Exemplar
Mandarin Audio Document
……..Kosovo…...
…../ke-suo-fo/….
Sound alike --> match in phonetic space?
Kosovo (/ke1-suo3-wo4/, /ke1-suo3-fo2/, /ke1-suo3-fu1/, /ke1-suo3-fu2/)
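As a rough illustration of matching in phonetic space, the sketch below compares each Kosovo candidate with the document form /ke-suo-fo/ by counting shared syllable bigrams. Dropping the tone digits and scoring by bigram overlap are assumptions made for this example, not the documented MEI matching procedure.

```python
import re

def strip_tones(syllables):
    """Remove tone digits, e.g. 'suo3' -> 'suo' (an assumption of this sketch)."""
    return tuple(re.sub(r"\d", "", s) for s in syllables)

def syllable_bigrams(syllables):
    """Set of adjacent syllable pairs."""
    return {syllables[i:i + 2] for i in range(len(syllables) - 1)}

candidates = ["ke1-suo3-wo4", "ke1-suo3-fo2", "ke1-suo3-fu1", "ke1-suo3-fu2"]
document = "ke-suo-fo"

doc_bigrams = syllable_bigrams(tuple(document.split("-")))
scores = {
    c: len(syllable_bigrams(strip_tones(c.split("-"))) & doc_bigrams)
    for c in candidates
}
print(scores)
```

Every candidate shares at least the bigram ke-suo with the document, so even the imperfect transliterations yield a partial match.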
Subword Transliteration Procedure (1)
Named Entities
• Pinyin / Wade-Giles spellings, e.g. Wang Jianqiang, Wang Hsinmin
– Syllables, e.g. wang jian qiang, wang xin min
• Acquire English pronunciation, e.g. christopher
– PRONLEX lookup
– Spelling-to-pronunciation generation
– English phones, e.g. /kk rr ih ss tt aa ff er/
• Transformation-based error-driven learning [Brill 1994]
– PRONLEX, 85K (train), 4.5K (test)
– 82% (phoneme), 45% (word)
Subword Transliteration Procedure (2)
English phones, e.g. /kk rr ih ss tt aa ff er/
Cross-lingual Phonological Rules
• Syllable nuclei insertion
• Handle consonant clusters, word-final consonants, etc.
• e.g. /kk ax rr ih ss ax tt aa ff er/
Cross-lingual Phonetic Mapping: English phones to Chinese “phones”
• Transformation-based error-driven learning
• 4800 words (train) [Chen H. H., NTU; WWW]
• FST aligns Eng / Chin phones
• e.g. /k e l i s i t uo f u/
Chinese phone lattice generation
• Syllable bigram language model
• N-best syllable sequence hypotheses
• N=1 (one-best hypothesis): /ji li si te fu/ (hyp), /ke li si tuo fu/ (ref)
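The syllable nuclei insertion step can be sketched as a single rule: place a reduced vowel between two adjacent consonants. The vowel inventory below is an assumed subset that covers only this example, and word-final consonant handling is deliberately left out.

```python
# Assumed vowel subset, sufficient for the "christopher" example only.
VOWELS = {"ih", "aa", "er", "ax"}

def insert_nuclei(phones, vowels=VOWELS, nucleus="ax"):
    """Insert a nucleus after any consonant that is directly followed by
    another consonant, breaking up consonant clusters."""
    out = []
    for i, phone in enumerate(phones):
        out.append(phone)
        nxt = phones[i + 1] if i + 1 < len(phones) else None
        if phone not in vowels and nxt is not None and nxt not in vowels:
            out.append(nucleus)
    return out

phones = "kk rr ih ss tt aa ff er".split()
result = insert_nuclei(phones)
print(" ".join(result))
```

On the slide's input this reproduces /kk ax rr ih ss ax tt aa ff er/, giving each consonant a vowel to attach to before the mapping to Chinese phones.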
Cross Lingual Phonetic Matching
• Documents are indexed with syllable bigrams (in addition to words and character bigrams if necessary)
• Query terms are translated as words where possible, phonetically where necessary
[Figure: mean average precision (axis 0.2 to 0.8) by indexing terms (Word, Char, Syllable), no CLPM vs. CLPM]
Multi-Scale Query Construction
Helen
Multi-Scale Query Construction: Objectives
Bag of English query terms (selected) --> Query Construction --> Multi-scale query representation in Chinese
Multi-scale representation integrates:
• translated phrases, named entities, numeric expressions, translated terms
• transliterated syllables
• words, characters and syllable n-grams
Multi-Scale Query Construction Procedures
English Bag of Terms
Israeli <Ph>Prime Minister</Ph> <NE>Benjamin Netanyahu</NE>
Chinese Translations and Transliteration (words + syl bigrams)
ne-tan tan-ya ya-hu
Character bigrams and Transliterations (char + syl bigrams)
ne-tan tan-ya ya-hu
Syllable bigrams and Transliterations (syl bigrams)
yi-se se-lie shou-xiang ben-jie jie-ming ne-tan tan-ya ya-hu
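The syllable bigram line of this example can be reproduced with a short sketch. It assumes that bigrams are formed inside each translated or transliterated term and never across term boundaries, which is what the slide example appears to show; the per-term syllable lists are typed in by hand from that example.

```python
def term_syllable_bigrams(term_syllables):
    """Overlapping syllable bigrams within a single term."""
    return ["-".join(term_syllables[i:i + 2])
            for i in range(len(term_syllables) - 1)]

# Syllables per term, taken from the slide example:
# Israeli, Prime Minister, Benjamin, Netanyahu.
terms = [
    ["yi", "se", "lie"],
    ["shou", "xiang"],
    ["ben", "jie", "ming"],
    ["ne", "tan", "ya", "hu"],
]

query = [bigram for term in terms for bigram in term_syllable_bigrams(term)]
print(" ".join(query))
```

A single-syllable term would contribute no bigram under this scheme, so a real system would need a fallback for such terms.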
Multi-Scale Audio Document Indexing
Hsin-min, Helen, Berlin, and Wai-kit
Previous Chinese Example
Audio Document Indexing Objectives
• Augment words with subword-based indexing
• Dragon word recognition outputs are provided
• Character-based indexing
– Characters derived from Dragon’s recognized words
• Syllable-based indexing
– Syllables derived by pronunciation lookup using Dragon’s recognized words
• Address Dragon’s ASR errors
– Augment with alternative (word/char/syl) hypotheses, e.g. syllable lattice [Chen & Wang, ICASSP-2000]
Syllable Lattice Development
[Diagram labels: Dragon’s syl, Alternative syl]
• Dragon’s recognition accuracies
– Evaluated against anchor scripts
– 82.0% (word), 87.9% (char), 92.1% (syl)
– Syllable substitution errors (5.2%)
• MEI’s syllable recognition accuracy
– Trained on Hub4 Mandarin (VOA, 11 hours, 1997)
– 70.2% (syl) !!!
• Develop a syllable recognizer to produce lattice representation
Strategy
• Improve MEI’s syllable recognizer
• Design a structure for document indexing which incorporates
– Dragon’s word / character / syllable hypotheses
– MEI’s syllable hypotheses (hopefully complementary to Dragon’s syllables)
MEI Syllable Recognizer: Improve Acoustic Models
[Diagram labels: VOA Audio for Doc i, Dragon Outputs for Doc i, Forced Alignment, Speaker Adaptation, Baseline Acoustic Models, Speaker-Adapted Acoustic Models, Syllable Recognition, MEI Syllables for Doc i]
• Forced alignment with Dragon’s output for each document
• Blind speaker adaptation with Dragon’s syllables
• MEI syllable accuracy: 70.2% (original) --> 87.7% !!!
MEI Syllable Recognizer: Incorporate Language Model
[Diagram labels: VOA Audio for Doc i, Dragon Outputs for Doc i, Forced Alignment, Speaker Adaptation, Baseline Acoustic Models, Speaker-Adapted Acoustic Models, 1998 Xinhua Language Models, Syllable Recognition, MEI Syllables for Doc i]
• Syllable trigram language model
• MEI syllable accuracy: 70.2% --> 87.7% --> 90.0% !!!
Audio Document Indexing with Multiple Syllable Recognition Outputs
[Diagram: two separate recognition outputs (Dragon’s syl, MEI’s syl) and the revised syllable lattice]
Multi-scale Audio Document Indexing
[Diagram labels: Dragon’s word, Dragon’s chr, Dragon’s syl, MEI’s syl]
Fusion of Words and Subwords in Multi-Scale Retrieval
Wai-Kit Lo, Pat Schone
Loose Coupling
• Merging ranked lists from separate runs
• For each query and document pair, the score is recalculated as a weighted combination of the per-run scores, where
– wk are the weights for different retrieval runs
– k denotes a retrieval run at some scale (words, characters, syllables, combinations)
– Sk(Qi, Dj) is a rank-based score between query i and document j in retrieval run k
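A minimal sketch of loose coupling follows. It assumes that the recalculated score is a linear weighted sum of the per-run scores and that the rank-based score is the reciprocal rank; both are illustrative assumptions, as are the run names, document identifiers and weights.

```python
from collections import defaultdict

def rank_score(rank):
    """Rank-based score; reciprocal rank is an assumption of this sketch."""
    return 1.0 / rank

def fuse(runs, weights):
    """Merge ranked lists from separate runs into one ranked list.

    runs: {run_name: [doc ids in rank order]}
    weights: {run_name: weight w_k}
    A document missing from a run contributes nothing for that run.
    """
    fused = defaultdict(float)
    for name, ranked_docs in runs.items():
        for rank, doc in enumerate(ranked_docs, start=1):
            fused[doc] += weights[name] * rank_score(rank)
    # Highest fused score first; ties broken by document id.
    return sorted(fused, key=lambda d: (-fused[d], d))

# Hypothetical runs at two scales for one query.
runs = {"word": ["d1", "d2", "d3"], "char2": ["d2", "d1", "d4"]}
weights = {"word": 0.4, "char2": 0.6}

merged = fuse(runs, weights)
print(merged)
```

Because fusion happens after retrieval, each run can use its own index and scoring, and only the weights need tuning.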
Loose Coupling
[Figure: bar chart (axis 0.2 to 0.8) of fusion results for combinations of Word, Char2, Syl2 and MEISyl2 runs]
Tight Coupling
• Unified indexing of words and subword n-grams
• For query and documents
– Combine terms at different scales to form a multi-scale query/document representation, e.g. yi-se se-lie shou-xiang ben-jie jie-ming ne-tan tan-ya ya-hu
• Multi-scale retrieval produces a single ranked list
Loose vs Tight Coupling
• Tight coupling combines document scores before ranking
– may need weight optimization
• Loose coupling combines lists post-hoc
– outperforms individual lists
[Figure: mean average precision (axis 0.2 to 0.8) by retrieval method: Words, Char2, Loose, Tight]
The Perfect Retrieval Myth
Erika, Helen, Hsin-Min, Jian Qiang, Berlin
Differences in News Sources
The Perfect Retrieval Myth
• 100% Average Precision = ALL relevant docs and ZERO non-relevant docs retrieved
• Is corrupted by ...
– Query Processing (English Newswire Article): term selection, translation errors, translation ambiguity, OOV
– Document Processing (Mandarin Audio Files): speech recognition errors, word tokenization ambiguity, OOV
“Bounds” on Word-Based Systems
• Using Mandarin VOA documents as exemplars
– matched condition
• Using Xinhua text documents as exemplars
– source mismatch
• Using manual translations of NYT documents as exemplars
[Figure: mean average precision (axis 0.2 to 0.8) by query exemplars (VOA, Xinhua, NYT), ASR output vs. anchor scripts]
“Bounds” on Subword-Based Systems
• Character bigrams for indexing
– marginally outperform word-based systems
• Syllable bigrams
– are quite competitive, though somewhat behind
• Mean average precision ~0.6 is a good CL-SDR target
[Figure: mean average precision (axis 0.2 to 0.8) by query exemplars (Xinhua, NYT) for Words, Char, Syllable]
TDT-2 Results
Retrieval Performance on TDT2
[Figure: mean average precision (axis 0.2 to 0.8) on TDT2 for Words, Char2, Syllables (CLPM), Words (CLPM), Char2 (CLPM), Char1+2+3, Words+Char (L), Words+Char (T), MEI Syllables Lattice]
TDT-3 Results
Retrieval Performance on TDT3
[Figure: mean average precision (axis 0.2 to 0.8), TDT2 vs. TDT3, for Words, Char2, Syl2 (CLPM), Words (CLPM), Char2 (CLPM), Char1+2+3, Words+Char (L), Words+Char (T), MEI Syllables Lattice, Char2 (SR Err)]
Summary and Conclusions
• Novel multi-scale paradigm for CL-SDR
– ameliorates the translation and recognition OOV problems
• Multi-scale query and document processing
– cross-lingual subword transliteration procedure (CLPM)
– query and document construction embeds words / characters / syllables
– balanced and structured queries
• Multi-scale retrieval
– tight and loose coupling strategies to fuse words and subwords for retrieval
Summary and Conclusions (2)
• Extensive experiments on TDT-2, TDT-3
– character bigrams typically outperform words or syllable bigrams in retrieval
– fusion of word and subword units shows potential in multi-scale retrieval
– syllable lattice needs further investigation
Future Work
• Word-subword fusion techniques merit further investigation
• Multi-scale query expansion for retrieval performance improvement (Wai-Kit)
• Incorporation of acoustic scores in syllable lattice representation for documents
END