The MEI Team August 23, 2000

Mandarin-English Information (MEI): Investigating Translingual Speech Retrieval

Johns Hopkins University Center for Language and Speech Processing

Summer Workshop 2000

MEI Team

• Senior Members
– Helen Meng, Chinese University of Hong Kong
– Erika Grams, Advanced Analytic Tools
– Sanjeev Khudanpur, Johns Hopkins University
– Gina-Anne Levow, University of Maryland
– Douglas Oard, University of Maryland
– Patrick Schone, US Department of Defense
– Hsin-Min Wang, Academia Sinica, Taiwan

• Students
– Berlin Chen, National Taiwan University
– Wai-Kit Lo, Chinese University of Hong Kong
– Karen Tang, Princeton University
– Jianqiang Wang, University of Maryland

Outline

• Motivation
• Background
• The Multi-scale Paradigm
– multi-scale query processing
– multi-scale document indexing
– multi-scale retrieval
• The Perfect Retrieval Myth
• Experiments and Findings
• Conclusions and Future Work

Motivation

• Monolingual speech retrieval applications are emerging, e.g.
– http://speechbot.research.compaq.com

[Figure: Internet-accessible radio and television stations, English vs. other languages; values shown: 529 and 1367. Source: www.real.com, Feb 2000]

Motivation (cont): Internet User Population

[Figure: Internet user population by language, 2000 vs. 2005, with English and Chinese labeled. Source: Global Reach]

MEI: The Big Picture

[Diagram components: English Text Query (Exemplar); English-to-Chinese Translation; Retrieval Engine; Mandarin Audio News Broadcasts; Mandarin Audio Indexing (ASR); Ranked List of Mandarin Spoken Documents; Interactive Refinement; Speech-to-Speech Translation; English Spoken Documents]

Concept Demo

Karen, Erika

Two Prevailing Problems in CL-SDR

• Translation problem
– out-of-vocabulary (OOV) in translation
– too many translations
• Recognition problem
– OOV in recognition
– acoustic confusions
• Solution: subword units may help
– transliteration, e.g. Northern Ireland /bei3 ai4 er3 lan2/ (in query)
– recognition of subword units, e.g. Iraq --> a rock (in document)

Background for Mandarin Speech Recognition

• 400 syllables
– full phonological coverage in Mandarin Chinese
• 6,800 characters
– full textual coverage in written Chinese (GB-coded)
– each character pronounced as a syllable
• Unknown number of Chinese words
– one to several characters per word
– character combinations create different meanings
– ambiguity in word tokenization

OOV and Acoustic Confusions in Mandarin SDR

Query: …Iraq...

Subwords for Retrieval

Pros
• Character n-grams
– robust to word-level mismatches due to different tokenization
• Syllable n-grams
– robust to word/character-level mismatches due to homophones
• Partial matches possible

Con
• Subwords contain reduced lexical knowledge cf. words
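The tokenization point can be illustrated with a minimal sketch. The two segmentations of "北爱尔兰" (Northern Ireland) below are hypothetical, as is the helper name `ngrams`; the sketch only shows that overlapping character bigrams still match when word-level terms do not.

```python
def ngrams(units, n):
    """Return the overlapping n-grams of a sequence of units, as tuples."""
    return [tuple(units[i:i + n]) for i in range(len(units) - n + 1)]

# Two hypothetical segmentations of the same text (assumed for illustration).
query_words = ["北爱尔兰"]       # segmented as one word
doc_words = ["北", "爱尔兰"]     # segmented as two words

# Word-level matching fails because the tokenizations differ.
word_overlap = set(query_words) & set(doc_words)

# Character bigrams ignore word boundaries, so they still match.
query_chars = list("".join(query_words))
doc_chars = list("".join(doc_words))
char_overlap = set(ngrams(query_chars, 2)) & set(ngrams(doc_chars, 2))

print(len(word_overlap), len(char_overlap))
```

The same `ngrams` helper applies to syllable sequences, where homophonous characters collapse onto the same syllable.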

The MEI Investigation

• Use of a multi-scale representation for crosslingual spoken document retrieval (CL-SDR)
• Words and subwords

Research Challenges

• Multi-scale query translation
• Multi-scale audio indexing
• Multi-scale retrieval

Query by Example

English Newswire Exemplars

Mandarin Audio Stories

President Bill Clinton and Chinese President Jiang Zemin engaged in a spirited, televised debate Saturday over human rights and the Tiananmen Square crackdown, and announced a string of agreements on arms control, energy and environmental matters. There were no announced breakthroughs on American human rights concerns, including Tibet, but both leaders accentuated the positive …

美国总统克林顿的助手赞扬中国官员允许电视现场直播克林顿和江泽民在首脑会晤后举行的联合记者招待会。。特别是一九八九镇压民主运动的决定。他表示镇压天安门民主运动是错误的 , 他还批评了中国对西藏精神领袖达国家安全事务助理伯格表示 , 这次直播让中国人第一次在种公开的论坛上听到围绕敏感的人权问题的讨论。在记者招待会上 …

Evaluation Collection

• Development collection: TDT-2
– 2265 manually segmented stories
– 17 topics, variable number of exemplars
• Evaluation collection: TDT-3
– 3371 manually segmented stories
– 56 topics, variable number of exemplars
• Timeline marks: Jan 98, Mar 98, Jun 98, Oct 98, Dec 98
• English text topic exemplars: Associated Press, New York Times
• Mandarin audio broadcast news: Voice of America
• Exhaustive relevance assessment based on event overlap

Abstract Task Model

[Diagram: American English Text Exemplar --> Cross-Language Speech Retrieval (over Mandarin Chinese Broadcast News) --> Ranked List of News Stories]

Evaluation of Ranked Lists

Ranked list     Relevance judgment
VOA 0427.22     Relevant
VOA 0521.14     Not
VOA 0604.39     Not
VOA 0419.12     Relevant
VOA 0527.13     Not
VOA 0513.17     Relevant
…               …
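As a worked illustration, uninterpolated average precision for one ranked list can be sketched as follows. The function name and the default for `num_relevant` are assumptions; the example treats the six judgments listed on the slide as the complete list, which the slide does not state.

```python
def average_precision(relevance, num_relevant=None):
    """Uninterpolated average precision of one ranked list.

    relevance: booleans, one per ranked document, top rank first.
    num_relevant: total relevant documents in the collection; defaults to
    the number found in the list (an assumption of this sketch).
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    total = num_relevant if num_relevant is not None else hits
    return precision_sum / total if total else 0.0

# Relevant at ranks 1, 4 and 6: (1/1 + 2/4 + 3/6) / 3
judgments = [True, False, False, True, False, True]
ap = average_precision(judgments)
print(round(ap, 3))
```

Mean average precision is then the mean of this value over exemplars and topics.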

Recall-Precision Graph

[Figure: interpolated precision (axis 0.0 to 1.0) vs. recall (axis 0.0 to 1.0)]

Variation Across Exemplars

[Figure: interpolated precision (axis 0.0 to 1.0) vs. recall (axis 0.0 to 1.0)]

Average Across Exemplars

[Figure: mean interpolated precision (axis 0.0 to 1.0) vs. recall (axis 0.0 to 1.0); value shown: 0.353]

Variation Across Topics

[Figure: mean uninterpolated average precision (axis 0.0 to 1.0) by topic]

Comparing Two Systems

[Figure: mean uninterpolated average precision (axis 0.0 to 1.0) by topic, System A vs. System B]

Significance Testing

• Statistical significance
– Null hypothesis: mean average precision across topics is drawn from same distribution
– Paired 2-tailed t-test, significant if p<0.05
– For System A vs. System B, p=0.94
• Meaningful differences
– Rule of thumb: 5-10% relative
– For System A vs. System B, relative difference is <1%
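A minimal sketch of both checks, using only the Python standard library. The per-topic scores are made-up numbers, not MEI results, and the p-value comes from numerically integrating the Student t density (an implementation choice for this sketch that suits small topic counts; a statistics package would normally be used).

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=20000):
    """Two-tailed p-value via Simpson's rule on the t density over [0, |t|]."""
    b = abs(t)
    if b == 0.0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)

def paired_t_test(ap_a, ap_b):
    """Return (t statistic, two-tailed p) for per-topic scores of two systems."""
    diffs = [a - b for a, b in zip(ap_a, ap_b)]
    sd = stdev(diffs)
    if sd == 0.0:
        raise ValueError("all per-topic differences are identical")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, two_tailed_p(t, len(diffs) - 1)

def relative_difference(ap_a, ap_b):
    """Relative difference in mean average precision, A versus B."""
    return (mean(ap_a) - mean(ap_b)) / mean(ap_b)

# Hypothetical per-topic average precision for two systems.
system_a = [0.52, 0.31, 0.68, 0.44]
system_b = [0.50, 0.33, 0.67, 0.45]
t, p = paired_t_test(system_a, system_b)
print(p < 0.05, abs(relative_difference(system_a, system_b)) < 0.05)
```

With these numbers the difference is neither statistically significant nor meaningful under the 5-10% rule of thumb.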

Translingual and Multi-Scale Query Processing

[Diagram: system architecture. Query side: English Exemplar ("President Bill Clinton and…"), Named Entity Tagging, Term Selection, Term Translation (using the Bilingual Term List), Query Construction. Document side: Mandarin Audio, Speech Recognition, Story Boundaries, Document Construction. The Mandarin IR System produces a Ranked List; Evaluation against Relevance Judgments yields Mean Uninterpolated Average Precision. Component sources labeled in the diagram: BBN, U Mass, LDC, Cornell, Dragon.]

Multi-Scale Query Translation

• Words and Phrases

(Gina, Sanjeev)

• Subwords

(Helen, Wai-Kit, Berlin, Karen)

Bilingual Term List

• Combination of
– LDC English-Chinese bilingual term list
– Chinese-English Translation Assistance File (CETA) [inverted]

                   Terms      Translation pairs
Total English      199,444    395,216
Phrasal            81,127     105,750

Term             # translations
human            7
right(s)         30
human rights     1

Query Term Selection

• Tagged named entities (BBN IdentiFinder)
– Person: partners of Goldman, Sachs, & Co.
– Organization: UN Security Council
• Dictionary-based “phrases”
– translatable multi-word units, e.g. “Wall Street”, “best interests”, “guiding principles”, “human rights”
– automatic tagging: greedy, left-to-right, max match
• Chi-squared filtering
– Compared to English background model
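The greedy, left-to-right, max-match tagging of dictionary phrases can be sketched as below. The function name, the `max_len` cap, and the tiny phrase set are assumptions for illustration; the actual MEI tagger and dictionary are not shown on the slide.

```python
def tag_phrases(tokens, phrase_dict, max_len=4):
    """Greedy, left-to-right, longest-match grouping of tokens into phrases.

    phrase_dict holds lowercased multi-word units; unmatched tokens are
    emitted as single-word terms.
    """
    units = []
    i = 0
    while i < len(tokens):
        match_len = 1
        # Try the longest candidate first, shrinking down to two words.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if " ".join(tokens[i:i + n]).lower() in phrase_dict:
                match_len = n
                break
        units.append(" ".join(tokens[i:i + match_len]))
        i += match_len
    return units

# Hypothetical phrase dictionary built from the slide's examples.
phrases = {"wall street", "human rights", "guiding principles", "best interests"}
tokens = "debate over human rights on Wall Street".split()
units = tag_phrases(tokens, phrases)
print(units)
```

Because matching is greedy, a longer dictionary entry always wins over a shorter one that starts at the same position.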

Query Term Translation

• Named entities
– if absent from dictionary, translate individual terms
– e.g. “Security Council” versus “First Bank of Siam”
• Numeric expressions
– special processing for digits
– e.g. “12:30 pm, June 15, 1969”
• Remaining terms
– Consult bilingual term list, lemmatize if necessary
– e.g. “televised” translates as “television”

Query Construction

• Unbalanced queries
– Use all plausible translations for each term
• Balanced queries
– Pseudo-term weight: average of translations’ weights
• Structured queries
– Recompute pseudo-term weight from translations’ term frequency and document frequency
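The difference between unbalanced and balanced queries can be sketched with toy numbers. The translation identifiers and per-document weights below are hypothetical, and the scoring functions are simplified stand-ins for what a full IR system would compute; structured queries are not covered by this sketch.

```python
def unbalanced_score(term_translations, doc_weights):
    """Sum the document weights of every translation of every query term."""
    return sum(doc_weights.get(t, 0.0)
               for translations in term_translations.values()
               for t in translations)

def balanced_score(term_translations, doc_weights):
    """Average the weights within each term's translations, then sum over terms."""
    score = 0.0
    for translations in term_translations.values():
        if translations:
            total = sum(doc_weights.get(t, 0.0) for t in translations)
            score += total / len(translations)
    return score

# Hypothetical query: "rights" has four translations, "clinton" has one.
term_translations = {"rights": ["r1", "r2", "r3", "r4"], "clinton": ["c1"]}
# Hypothetical weights of those translations in one document.
doc_weights = {"r1": 0.5, "r2": 0.5, "r3": 0.5, "r4": 0.5, "c1": 1.0}

print(unbalanced_score(term_translations, doc_weights))
print(balanced_score(term_translations, doc_weights))
```

In the unbalanced case the term with many translations dominates the score; balancing gives each English term one pseudo-term's worth of influence.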

Strategies in Query Translation

• Phrase-based translation is significantly better
• Named entity and numeral translations are (barely) helpful
• Balanced translation matches structured queries
– also extends easily to subword units

[Figure: mean average precision (axis 0.2 to 0.8) by strategy: Words, Phrases, NE/NUMEX, Balanced]

Untranslatable Terms

Terms    # (by token)    # (by type)
total    87,004          12,402
OOV      3,028           1,122

Example OOV terms (# of occurrences): suharto 97, netanyahu 88, starr 62, arafat 50, bjp 45, vajpayee 44, estrada 44, …, hsu 19, zemin 7

Subword Transliteration

• English query exemplar: …Kosovo…
• Mandarin audio document: …/ke-suo-fo/…
• Sound alike --> match in phonetic space?
• Kosovo (/ke1-suo3-wo4/, /ke1-suo3-fo2/, /ke1-suo3-fu1/, /ke1-suo3-fu2/)

Subword Transliteration Procedure (1)

• Named entities in PinYin / Wade-Giles spellings, e.g. Wang Jianqiang, Wang Hsinmin
– map to syllables, e.g. wang jian qiang, wang xin min
• Other named entities: acquire English pronunciation, e.g. christopher
– PRONLEX lookup
– spelling-to-pronunciation generation
– English phones, e.g. /kk rr ih ss tt aa ff er/
• Transformation-based error-driven learning [Brill 1994]
– PRONLEX, 85K (train), 4.5K (test)
– 82% (phoneme), 45% (word)

Subword Transliteration Procedure (2)

• English phones, e.g. /kk rr ih ss tt aa ff er/
• Cross-lingual phonological rules
– syllable nuclei insertion
– handle consonant clusters, word-final consonants, etc.
– e.g. /kk ax rr ih ss ax tt aa ff er/
• Cross-lingual phonetic mapping: English phones to Chinese “phones”
– transformation-based error-driven learning
– 4800 words (train) [Chen H. H., NTU; WWW]
– FST aligns Eng / Chin phones
– e.g. /k e l i s i t uo f u/
• Chinese phone lattice generation
– syllable bigram language model
– N-best syllable sequence hypotheses
– N=1 (one-best hypothesis): /ji li si te fu/ (hyp) vs. /ke li si tuo fu/ (ref)
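The syllable nuclei insertion step can be illustrated with a toy rule: insert a filler vowel after a consonant that is followed by another consonant or that ends the word. The vowel inventory, the function name, and the single rule are assumptions for this sketch, not the actual MEI rule set; the rule does reproduce the slide's "christopher" example.

```python
# Hypothetical vowel inventory covering only the phones in this example.
VOWELS = {"ih", "aa", "er", "ax"}

def insert_nuclei(phones, filler="ax"):
    """Insert a filler vowel to break consonant clusters and close
    word-final consonants, so every consonant is followed by a nucleus."""
    out = []
    for i, phone in enumerate(phones):
        out.append(phone)
        is_consonant = phone not in VOWELS
        at_word_end = i + 1 == len(phones)
        next_is_consonant = not at_word_end and phones[i + 1] not in VOWELS
        if is_consonant and (next_is_consonant or at_word_end):
            out.append(filler)
    return out

phones = "kk rr ih ss tt aa ff er".split()
result = insert_nuclei(phones)
print(" ".join(result))
```

The output sequence has the consonant-vowel shape that Mandarin syllables require, which is what makes the later phone-to-phone mapping feasible.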

Cross Lingual Phonetic Matching

• Documents are indexed with syllable bigrams (in addition to words and character bigrams if necessary)
• Query terms are translated as words where possible, phonetically where necessary

[Figure: mean average precision (axis 0.2 to 0.8) by indexing terms (Word, Char, Syllable), without CLPM vs. with CLPM]

Multi-Scale Query Construction

Helen

Multi-Scale Query Construction: Objectives

• Input: bag of English query terms (selected)
• Output: multi-scale query representation in Chinese
• Multi-scale representation integrates:
– translated phrases, named entities, numeric expressions, translated terms
– transliterated syllables
– words, characters and syllable n-grams

Multi-Scale Query Construction: Procedures

• English bag of terms: Israeli <Ph>Prime Minister</Ph> <NE>Benjamin Netanyahu</NE>
• Chinese translations and transliteration (words + syl bigrams): … ne-tan tan-ya ya-hu
• Character bigrams and transliterations (char + syl bigrams): … ne-tan tan-ya ya-hu
• Syllable bigrams and transliterations (syl bigrams): yi-se se-lie shou-xiang ben-jie jie-ming ne-tan tan-ya ya-hu
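The syllable-bigram form of the query can be sketched as follows. The sketch assumes bigrams are formed within each word and do not cross word boundaries, which is what the slide's example suggests; the function name and the handling of single-syllable words are assumptions.

```python
def within_word_bigrams(words):
    """Build overlapping syllable bigrams inside each word.

    words: list of words, each given as a list of syllables.
    Single-syllable words are kept as unigrams (an assumption).
    """
    terms = []
    for syllables in words:
        if len(syllables) == 1:
            terms.append(syllables[0])
        for first, second in zip(syllables, syllables[1:]):
            terms.append(f"{first}-{second}")
    return terms

# Syllables of the translated terms (Israeli, Prime Minister, Benjamin).
translated = [["yi", "se", "lie"], ["shou", "xiang"], ["ben", "jie", "ming"]]
# Syllables of the transliterated term (Netanyahu).
transliterated = [["ne", "tan", "ya", "hu"]]

query = within_word_bigrams(translated + transliterated)
print(" ".join(query))
```

Translated and transliterated terms end up in the same representation, so the retrieval engine does not need to know which route a term took.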

Multi-Scale Audio Document Indexing

Hsin-min, Helen, Berlin, and Wai-kit

Previous Chinese Example

Audio Document Indexing: Objectives

• Augment words with subword-based indexing
• Dragon word recognition outputs are provided
• Character-based indexing
– Characters derived from Dragon’s recognized words
• Syllable-based indexing
– Syllables derived by pronunciation lookup using Dragon’s recognized words
• Address Dragon’s ASR errors
– Augment with alternative (word/char/syl) hypotheses, e.g. syllable lattice [Chen & Wang, ICASSP-2000]
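Deriving character and syllable index terms from recognized words can be sketched as below. The two-entry lexicon, the function name, and the choice to let document-side bigrams run across word boundaries are all assumptions for illustration; the recognizer output is represented as a plain list of words.

```python
# Hypothetical pronunciation lexicon: word -> syllables (tones omitted).
LEXICON = {
    "北爱尔兰": ["bei", "ai", "er", "lan"],
    "和平": ["he", "ping"],
}

def index_terms(recognized_words, lexicon):
    """Derive word, character-bigram and syllable-bigram index terms
    from a sequence of recognized words."""
    chars = [c for word in recognized_words for c in word]
    # Words missing from the lexicon contribute no syllables in this sketch.
    syllables = [s for word in recognized_words for s in lexicon.get(word, [])]
    return {
        "word": list(recognized_words),
        "char2": ["".join(pair) for pair in zip(chars, chars[1:])],
        "syl2": ["-".join(pair) for pair in zip(syllables, syllables[1:])],
    }

terms = index_terms(["北爱尔兰", "和平"], LEXICON)
print(terms["syl2"])
```

All three term lists come from the same one-best word output, so they share its recognition errors; alternative hypotheses such as a syllable lattice would be added on top.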

Syllable Lattice Development

• Develop a syllable recognizer to produce lattice representation
• Dragon’s recognition accuracies
– Evaluated against anchor scripts
– 82.0% (word), 87.9% (char), 92.1% (syl)
– Syllable substitution errors (5.2%)
• MEI’s syllable recognition accuracy
– Trained on Hub4 Mandarin (VOA, 11 hours, 1997)
– 70.2% (syl) !!!

[Diagram: lattice combining Dragon’s syl with alternative syl]

Strategy

• Improve MEI’s syllable recognizer
• Design a structure for document indexing which incorporates
– Dragon’s word / character / syllable hypotheses
– MEI’s syllable hypotheses (hopefully complementary to Dragon’s syllables)

MEI Syllable Recognizer: Improve Acoustic Models

[Diagram: VOA Audio for Doc i and Dragon Outputs for Doc i feed Forced Alignment and Speaker Adaptation, turning Baseline Acoustic Models into Speaker-Adapted Acoustic Models; Syllable Recognition then produces MEI Syllables for Doc i]

• Forced alignment with Dragon’s output for each document
• Blind speaker adaptation with Dragon’s syllables
• MEI syllable accuracy: 70.2% (original) --> 87.7% !!!

MEI Syllable Recognizer: Incorporate Language Model

[Diagram: VOA Audio for Doc i and Dragon Outputs for Doc i feed Forced Alignment and Speaker Adaptation, turning Baseline Acoustic Models into Speaker-Adapted Acoustic Models; Syllable Recognition, now also using 1998 Xinhua Language Models, produces MEI Syllables for Doc i]

• Syllable trigram language model
• MEI syllable accuracy: 70.2% --> 87.7% --> 90.0% !!!

Audio Document Indexing with Multiple Syllable Recognition Outputs

[Diagram: two separate recognition outputs (Dragon’s syl, MEI’s syl) merged into the revised syllable lattice]

Multi-scale Audio Document Indexing

[Diagram: index built from Dragon’s word, Dragon’s chr, Dragon’s syl, and MEI’s syl]

Fusion of Words and Subwords in Multi-Scale Retrieval

Wai-Kit Lo, Pat Schone

Loose Coupling

• Merging ranked lists from separate runs
• For each query and document pair, the score is recalculated as

$S(Q_i, D_j) = \sum_k w_k \, S_k(Q_i, D_j)$

– $w_k$ are the weights for different retrieval runs
– $k$ denotes a retrieval run at some scale (words, characters, syllables, combinations)
– $S_k(Q_i, D_j)$ is a rank-based score between query i and document j in retrieval run k
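A minimal sketch of loose coupling as a weighted sum of rank-based scores. The slide does not define the rank-based score, so the linear form below (top rank scores 1.0) is an assumption, as are the document identifiers, run names and weights.

```python
def rank_scores(ranked_docs):
    """Map each document to a rank-based score in (0, 1]; top rank gets 1.0.
    The linear form is an assumption, not the MEI definition."""
    n = len(ranked_docs)
    return {doc: (n - i) / n for i, doc in enumerate(ranked_docs)}

def fuse(runs, weights):
    """Weighted sum of rank-based scores across runs; returns the fused ranking."""
    fused = {}
    for name, ranked_docs in runs.items():
        for doc, score in rank_scores(ranked_docs).items():
            fused[doc] = fused.get(doc, 0.0) + weights[name] * score
    # Sort by descending fused score, breaking ties by document id.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))

# Hypothetical ranked lists from three retrieval runs at different scales.
runs = {
    "word": ["d1", "d2", "d3"],
    "char2": ["d2", "d1", "d3"],
    "syl2": ["d2", "d3", "d1"],
}
weights = {"word": 0.4, "char2": 0.4, "syl2": 0.2}
fused = fuse(runs, weights)
print(fused)
```

Because only ranks are used, runs with incomparable raw score ranges can be merged without normalizing their scores.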

Loose Coupling

[Figure: mean average precision (axis 0.2 to 0.8) for fusion of paired runs drawn from Word, Char2, Syl2, and MEISyl2]

Tight Coupling

• Unified indexing of words and subword n-grams
• For query and documents
– Combine terms at different scales to form a multi-scale query/document representation, e.g. yi-se se-lie shou-xiang ben-jie jie-ming ne-tan tan-ya ya-hu
• Multi-scale retrieval produces a single ranked list

Loose vs Tight Coupling

• Tight coupling combines document scores before ranking
– may need weight optimization
• Loose coupling combines lists post-hoc
– outperforms individual lists

[Figure: mean average precision (axis 0.2 to 0.8) by retrieval method: Words, Char2, Loose, Tight]

The Perfect Retrieval Myth

Erika, Helen, Hsin-Min, Jian Qiang, Berlin

Differences in News Sources

The Perfect Retrieval Myth

• 100% Average Precision = ALL relevant docs and ZERO non-relevant docs retrieved

• Query processing (English newswire article) is corrupted by ...
– term selection
– translation errors
– translation ambiguity
– OOV
• Document processing (Mandarin audio files) is corrupted by ...
– speech recognition errors
– word tokenization ambiguity
– OOV

“Bounds” on Word-Based Systems

• Using Mandarin VOA documents as exemplars
– matched condition
• Using Xinhua text documents as exemplars
– source mismatch
• Using manual translations of NYT documents as exemplars

[Figure: mean average precision (axis 0.2 to 0.8) by query exemplars (VOA, Xinhua, NYT), on ASR output vs. anchor scripts]

“Bounds” on Subword-Based Systems

• Character bigrams for indexing
– marginally outperform word-based systems
• Syllable bigrams
– are quite competitive, though somewhat behind
• Mean average precision ~0.6 is a good CL-SDR target

[Figure: mean average precision (axis 0.2 to 0.8) by query exemplars (Xinhua, NYT) for Words, Char, and Syllable indexing]

TDT-2 Results

Retrieval Performance on TDT2

[Figure: mean average precision (axis 0.2 to 0.8) on TDT2 for Words, Char2, Syllables (CLPM), Words (CLPM), Char2 (CLPM), Char 1+2+3, Words+Char (L), Words+Char (T), MEI Syllables, Lattice]

TDT-3 Results

Retrieval Performance on TDT3

[Figure: mean average precision (axis 0.2 to 0.8), TDT2 vs. TDT3, for Words, Char2, Syl2 (CLPM), Words (CLPM), Char2 (CLPM), Char 1+2+3, Words+Char (L), Words+Char (T), MEI Syllables, Lattice, Char2 (SR Err)]

Summary and Conclusions

• Novel multi-scale paradigm for CL-SDR
– ameliorates the translation and recognition OOV problems
• Multi-scale query and document processing
– cross-lingual subword transliteration procedure (CLPM)
– query and document construction embeds words / characters / syllables
– balanced and structured queries
• Multi-scale retrieval
– tight and loose coupling strategies to fuse words and subwords for retrieval

Summary and Conclusions (2)

• Extensive experiments on TDT-2, TDT-3
– character bigrams typically outperform words or syllable bigrams in retrieval
– fusion of word and subword units shows potential in multi-scale retrieval
– syllable lattice needs further investigation

Future Work

• Word-subword fusion techniques merit further investigation

• Multi-scale query expansion for retrieval performance improvement (Wai-Kit)

• Incorporation of acoustic scores in syllable lattice representation for documents
