Transcript of GBO Presentation: Cross-Lingual Language Modeling (clsp.jhu.edu/~woosung/pdf/gbo.pdf)

GBO : Nov. 14, 2003 Cross-Lingual Language Modeling for ASR – p. 1/19

GBO Presentation

Cross-Lingual Language Modeling

for Automatic Speech Recognition

November 14, 2003

Woosung Kim (woosung@cs.jhu.edu)

Center for Language and Speech Processing, Dept. of Computer Science

The Johns Hopkins University, Baltimore, MD 21218, USA

•Title

•Introduction

•Language Models

•Information Retrieval

•Cross-Lingual LM for ASR

•Model Estimation

•Cross-Lingual Triggers

•LSA for CLIR

•Corpora

•Experimental Results

•Conclusions


Introduction

Motivation: success of statistical modeling techniques

Development of modeling and automatic learning techniques
A large amount of data for training is available
Most resources are in English, French, and German

How to construct stochastic models in resource-deficient languages? ➔ Bootstrap from other languages, e.g.

Universal phone set for ASR (Schultz & Waibel, 98; Byrne et al., 00)

Exploit parallel texts to project morphological analyzers, POS taggers, etc. (Yarowsky, Ngai & Wicentowski, 01)

Language modeling (this talk)


Introduction

We present:

An approach to sharpen an LM in a resource-deficient language using comparable text from resource-rich languages

Story-specific language models from contemporaneous text

Integration of machine translation (MT), cross-language information retrieval (CLIR), and language modeling (LM)


Language Models

Ex1: Optical Character Recognition

All I am a student

Ex2: Speech Recognition

It’s [tu:] cold    He is [tu:]-years-old
([tu:] ➔ too? two?)

Many speech & NLP applications deal with ungrammatical sentences/phrases

Need to suppress ungrammatical outputs

Assign a probability to a given sequence of words:

P(“He is two-years-old”) ≫ P(“He is too-years-old”)


Language Models: The ASR Problem

ASR problem: finding the word string W = w1, w2, ..., wn given the acoustic evidence A

W = argmax_W P(W|A)   (1)

Bayes’ rule:

P(W|A) = P(W) P(A|W) / P(A)   (2)

W = argmax_W P(W) P(A|W)   (3)

where P(W) is the language model and P(A|W) is the acoustic model.

GBO : Nov. 14, 2003 Cross-Lingual Language Modeling for ASR – p. 6/19

Language Models: Formulation

P(w1, w2, ..., wn)
  = P(w1) P(w2|w1) P(w3|w1, w2) ... P(wn|w1, ..., wn−1)
  ≈ P(w1) P(w2|w1) P(w3|w1, w2) ... P(wn|wn−2, wn−1)
  = P(w1) P(w2|w1) ∏_{i=3}^{n} P(wi|wi−2, wi−1)   (4)

P(wi|wi−2, wi−1) = N(wi−2, wi−1, wi) / N(wi−2, wi−1)   ⇒ Trigram (5)

P(wi|wi−1) = N(wi−1, wi) / N(wi−1)   ⇒ Bigram (6)

P(wi) = N(wi) / Σ_{w∈V} N(w)   ⇒ Unigram (7)

where N(wi−2, wi−1, wi) denotes the number of times the entry (wi−2, wi−1, wi) appears in the training data and V is the vocabulary.
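As a sketch, the relative-frequency estimates (5)–(7) can be computed directly from n-gram counts; the toy corpus and function names below are illustrative, not from the talk.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def trigram_prob(tokens, w1, w2, w3):
    """Maximum-likelihood trigram estimate N(w1,w2,w3) / N(w1,w2), as in (5)."""
    tri = ngram_counts(tokens, 3)
    bi = ngram_counts(tokens, 2)
    if bi[(w1, w2)] == 0:
        return 0.0
    return tri[(w1, w2, w3)] / bi[(w1, w2)]

corpus = "he is two years old he is too old".split()
print(trigram_prob(corpus, "he", "is", "two"))  # 0.5: "he is" occurs twice, once followed by "two"
```

In practice these raw counts would be smoothed (many trigrams never occur in training data), but the unsmoothed estimate matches the slide's formulas.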


Language Models: Evaluation

Word Error Rate (WER): performance of ASR given the LM

REFERENCE:     UP UPSTATE NEW YORK SOMEWHERE UH OVER OVER     HUGE AREAS
HYPOTHESIS:       UPSTATE NEW YORK SOMEWHERE UH ALL  ALL  THE HUGE AREAS
COR(0)/ERR(1): 1  0       0   0    0         0  1    1    1   0    0

4 errors per 10 words in the reference; WER = 40%

➔ The ultimate measure, but expensive to compute

Perplexity (PPL): based on the cross entropy of test data D w.r.t. LM M

H(P_D; P_M) = −Σ_{w∈V} P_D(w) log2 P_M(w)   (8)
            = −(1/N) Σ_{i=1}^{N} log2 P_M(wi|wi−2, wi−1)   (9)

PPL_M(D) = 2^{H(P_D; P_M)} = [P_M(w1, ..., wN)]^{−1/N}   (10)

For both WER and PPL, the lower, the better!
There is a close correlation between WER and PPL.
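WER as used above is the word-level Levenshtein (minimum edit) distance normalized by the reference length; a minimal dynamic-programming sketch (not the NIST scoring tool itself):

```python
def wer(ref, hyp):
    """Word error rate: (substitutions + deletions + insertions) / len(ref),
    via Levenshtein distance over words."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

ref = "UP UPSTATE NEW YORK SOMEWHERE UH OVER OVER HUGE AREAS"
hyp = "UPSTATE NEW YORK SOMEWHERE UH ALL ALL THE HUGE AREAS"
print(wer(ref, hyp))  # 0.4, matching the 40% in the slide
```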


Information Retrieval

Given a query doc, find relevant docs in the collection

Vector-based IR:
Represent each doc as a bag of words (BOW)
Convert the BOW into a vector of terms
All terms are considered independent (orthogonal)
Measure similarity between a query and docs

          air  automobile  bank  car ...
query : (  0       1         2    0  ... )
doc   : (  2       0         5    3  ... )

Similarity measure: cosine similarity ➔ inner product

sim(d_j, q) = Σ_{i=1}^{t} w_ij w_iq / ( √(Σ_{i=1}^{t} w_ij²) √(Σ_{i=1}^{t} w_iq²) )   (11)

where d_j = (w_1j, ..., w_tj) and q = (w_1q, ..., w_tq)
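Equation (11) over sparse BOW vectors can be sketched with dicts mapping term → weight (illustrative code, not the system's implementation):

```python
import math

def cosine_sim(query, doc):
    """Cosine similarity between two bag-of-words term-weight vectors, as in (11).
    Vectors are dicts mapping term -> weight; missing terms have weight 0."""
    dot = sum(w * doc.get(t, 0.0) for t, w in query.items())
    nq = math.sqrt(sum(w * w for w in query.values()))
    nd = math.sqrt(sum(w * w for w in doc.values()))
    return dot / (nq * nd) if nq and nd else 0.0

# The query/doc vectors from the slide's table:
query = {"automobile": 1, "bank": 2}
doc = {"air": 2, "bank": 5, "car": 3}
print(round(cosine_sim(query, doc), 3))  # 0.725
```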


Information Retrieval

Retrieval performance evaluation:

Precision = No. {Retrieved docs ∩ Relevant docs} / No. Retrieved docs   (12)

Recall = No. {Retrieved docs ∩ Relevant docs} / No. Relevant docs (in the collection)   (13)

Problems:
Terms are not orthogonal
Mismatches between queries and docs
  Polysemy: e.g., bank (❶ river, ❷ money, ...) ➔ recall ⇓
  Synonymy: e.g., car, automobile, vehicle, ... ➔ precision ⇓

➔ Latent Semantic Analysis (LSA) has been proposed

Cross-Lingual IR: the query & docs are in different languages ➔ query-translation approach
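Equations (12)–(13) in code, treating retrieved and relevant results as sets of doc ids (an illustrative sketch):

```python
def precision_recall(retrieved, relevant):
    """Precision and recall as in (12)-(13); inputs are sets of doc ids."""
    hits = len(retrieved & relevant)       # retrieved AND relevant
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

p, r = precision_recall({"d1", "d2", "d3", "d4"}, {"d2", "d4", "d7"})
print(p, r)  # precision 0.5, recall 2/3
```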


Cross-Lingual LM for ASR

[System diagram: a Mandarin story is decoded by the baseline ASR system (Chinese acoustic model, dictionary/vocabulary, and language model). The automatic transcription serves as a CLIR query against contemporaneous English articles; the English article d^E_i aligned with the Mandarin story, together with the translation lexicons P_T(c|e) from statistical machine translation, yields the cross-language unigram model P_CL-unigram(c|d^E_i).]


Model Estimation

Assume the document correspondence d^E_i ↔ d^C_i is known for the Chinese test doc d^C_i:

P_CL-unigram(c|d^E_i) = Σ_{e∈E} P_T(c|e) P(e|d^E_i),  ∀c ∈ C   (14)

Cross-language LM construction:
Build story-specific cross-language LMs P(c|d^E_i)
Linear interpolation with the baseline trigram LM:

P_CL-interpolated(ck|ck−1, ck−2, d^E_i) = λ P_CL-unigram(ck|d^E_i) + (1 − λ) P(ck|ck−1, ck−2)   (15)

λ is optimized to minimize the PPL of held-out data via the EM algorithm
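Equations (14)–(15) can be sketched as follows; the translation table, document posterior, baseline trigram, and λ below are toy stand-ins, not the talk's trained models.

```python
def cl_unigram(c, doc_e, p_t):
    """P_CL-unigram(c | d^E_i) = sum_e P_T(c|e) P(e|d^E_i), eq. (14).
    doc_e: dict e -> P(e | d^E_i); p_t: dict (c, e) -> P_T(c|e)."""
    return sum(p_t.get((c, e), 0.0) * p_e for e, p_e in doc_e.items())

def cl_interpolated(c_k, history, doc_e, p_t, trigram, lam):
    """Eq. (15): interpolate the story-specific unigram with the baseline trigram."""
    return lam * cl_unigram(c_k, doc_e, p_t) + (1 - lam) * trigram(c_k, history)

# Toy example (all names and numbers illustrative):
doc_e = {"bank": 0.7, "money": 0.3}                          # P(e | d^E_i)
p_t = {("yinhang", "bank"): 0.6, ("yinhang", "money"): 0.2}  # P_T(c|e)
trigram = lambda c, hist: 0.01                               # baseline P(ck | ck-1, ck-2)
p = cl_interpolated("yinhang", ("ta", "qu"), doc_e, p_t, trigram, lam=0.3)
print(round(p, 4))  # 0.3*0.48 + 0.7*0.01 = 0.151
```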


Model Estimation

Document correspondence ➔ obtained by CLIR:
For each Chinese test doc d^C_i, create an English BOW
Find the English doc with the highest cosine similarity:

d^E_i = argmax_{d^E_j ∈ D^E} sim_CL(P(e|d^C_i), P(e|d^E_j))   (16)

Estimation of P_T(c|e) and P_T(e|c):
GIZA++: a statistical MT tool based on IBM Model 4
Input: the Hong Kong News Chinese-English sentence-aligned parallel corpus ➔ 18K docs, 200K sents, 4M wds each
We only need the translation tables P_T(e|c) and P_T(c|e)

Alternatives: mutual-information-based CL triggers; CL-LSA (Latent Semantic Analysis)


Cross-Lingual Lexical Triggers: Identification

Monolingual triggers: e.g., “either ... or”

Cross-lingual setting: translation lexicons

Based on average mutual information I(e; c). Let

P(e, c) = #d(e, c) / N   and   P(e, c̄) = #d(e, c̄) / N   (17)

where #d(e) denotes the number of English articles in which e occurs, #d(e, c) the number of aligned article pairs in which e and c co-occur, and N the number of article pairs, and let

P(e) = #d(e) / N   and   P(c|e) = P(e, c) / P(e)   (18)

I(e; c) = P(e, c) log [P(c|e) / P(c)] + P(e, c̄) log [P(c̄|e) / P(c̄)]
        + P(ē, c) log [P(c|ē) / P(c)] + P(ē, c̄) log [P(c̄|ē) / P(c̄)]   (19)
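A sketch of the average mutual information (19) computed from document co-occurrence counts; the counts in the example are invented for illustration. Each term P(x, y) log [P(y|x)/P(y)] equals P(x, y) log [P(x, y)/(P(x)P(y))], which is what the code evaluates.

```python
import math

def avg_mutual_info(n_ec, n_e, n_c, n):
    """Average mutual information I(e;c), eq. (19), from document counts:
    n_ec = #aligned pairs containing both e and c, n_e = #pairs with e,
    n_c = #pairs with c, n = total number of aligned document pairs."""
    def term(p_joint, p_x, p_y):
        # p_joint * log2(p_joint / (p_x * p_y)), with 0 log 0 = 0
        if p_joint <= 0.0:
            return 0.0
        return p_joint * math.log2(p_joint / (p_x * p_y))
    p_e, p_c = n_e / n, n_c / n
    cells = [
        (n_ec / n, p_e, p_c),                            # e present, c present
        ((n_e - n_ec) / n, p_e, 1 - p_c),                # e present, c absent
        ((n_c - n_ec) / n, 1 - p_e, p_c),                # e absent,  c present
        ((n - n_e - n_c + n_ec) / n, 1 - p_e, 1 - p_c),  # both absent
    ]
    return sum(term(*cell) for cell in cells)

print(round(avg_mutual_info(n_ec=40, n_e=50, n_c=60, n=100), 4))  # 0.1245
```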


Cross-Lingual Lexical Triggers: Estimation

We estimate the trigger-based CL unigram probabilities with

P_Trig(c|e) = I(e; c) / Σ_{c′∈C} I(e; c′)   (20)

Analogous to (14),

P_Trig-unigram(c|d^E_i) = Σ_{e∈E} P_Trig(c|e) P(e|d^E_i)   (21)

Again, we build the interpolated model:

P_Trig-interpolated(ck|ck−1, ck−2, d^E_i) = λ P_Trig-unigram(ck|d^E_i) + (1 − λ) P(ck|ck−1, ck−2)   (22)



Latent Semantic Analysis for CLIR

Singular Value Decomposition (SVD) of the parallel corpus

[Diagram: the word-document matrix W (M×N; each column is a paired document (d^C_j ; d^E_j), with Chinese and English word rows stacked) is factored as W = U S V^T, with U (M×R), S (R×R), and V^T (R×N).]

Input: word-document frequency matrix W

Reduce the dimension into a smaller but adequate subspace ➔ Singular Value Decomposition: U, V, and S

S: diagonal matrix with diagonal entries σ1, ..., σk where σ1 ≥ σ2 ≥ ... ≥ σk (k ≥ R)

Remove noisy entries by setting σi = 0 for i > R


Latent Semantic Analysis for CLIR

Folding-in a monolingual corpus

[Diagram: a monolingual corpus W (M×P; columns d^E_1, ..., d^E_P, with zeros in the other language’s rows) is folded in as W = U S V^T, reusing the training-time U and S.]

Given a monolingual corpus W in either language:

Use the same matrices U and S

Project into the low-dimensional space: V^T = S^{−1} U^T W

Compare a query and a document in the reduced-dimensional space
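The truncated SVD and the folding-in step can be sketched with NumPy (a toy matrix, not the Hong Kong News corpus); note that folding in one of the training columns recovers its latent coordinates exactly.

```python
import numpy as np

# Toy word-document matrix: rows are terms (both languages stacked),
# columns are paired documents (d^C_j ; d^E_j).
W = np.array([[2., 0., 1.],
              [0., 3., 1.],
              [1., 1., 0.],
              [0., 2., 2.]])

# Truncated SVD: W ~= U S V^T, keeping the top R singular values
# ("remove noisy entries", sigma_i = 0 for i > R).
R = 2
U, s, Vt = np.linalg.svd(W, full_matrices=False)
U, s, Vt = U[:, :R], s[:R], Vt[:R, :]

def fold_in(w):
    """Project a new document vector into the latent space: v = S^-1 U^T w."""
    return (U.T @ w) / s

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# A new monolingual document: zeros in the other language's rows.
new_doc = np.array([1., 0., 1., 0.])
v = fold_in(new_doc)
sims = [cosine(v, Vt[:, j]) for j in range(W.shape[1])]
print(np.argmax(sims))  # index of the most similar training document
```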


Training and Test Corpora

Acoustic model training:
HUB4-NE Mandarin training data (96K wds), ~10 hours

Chinese monolingual language model training:
XINHUA: 13M wds
HUB4-NE: 96K wds

ASR test set: NIST HUB4-NE test data (F0 portion only):
1263 sents, 9.8K wds (1997–1998)

English CLIR corpus: NAB-TDT:
NAB (1997 LA, WP) + TDT-2 (1998 APW, NYT)
45K docs, 30M wds


ASR Experimental Results

Vocab: 51K for Chinese
300-best list rescoring

Oracle best/worst WER: 33.4%/94.4% for Xinhua and 39.7%/95.5% for HUB4-NE

Corpus    Language Model     Perp   WER     CER     p-value
Xinhua    Trigram            426    49.9%   28.8%   –
          Trig-interpolated  367    49.1%   28.6%   0.004
          LSA-interpolated   364    49.3%   28.9%   0.043
          CL-interpolated    346    48.8%   28.4%   < 0.001
HUB4-NE   Trigram            1195   60.1%   44.1%   –
          Trig-interpolated  727    58.8%   43.3%   < 0.001
          LSA-interpolated   695    58.6%   43.1%   < 0.001
          CL-interpolated    630    58.8%   43.1%   < 0.001


Conclusions

Exploits side-information from contemporaneous articles ➔ useful for resource-deficient languages

Statistically significant improvements in ASR WER

Use of CL triggers & CL-LSA ➔ a document-aligned corpus suffices, rather than a sentence-aligned corpus

Future work:
Extensions to higher-order N-grams (e.g., bigrams)
Discriminative LMs for word sense disambiguation ➔ story-specific translation models
Applications to other languages (e.g., Arabic) and other tasks (e.g., MT)