
Topic Models for Dynamic Translation Model Adaptation

Vladimir Eidelman

Jordan Boyd-Graber

Philip Resnik

(Typical) Domain Adaptation

[Slide figure, built up over several animation steps: a training corpus of documents (doc1–doc4) is split into collections — Newswire, Web, Europarl. Each collection is labeled in-domain ("in") or out-of-domain ("out") relative to a dev set, and weights w tuned on dev are applied at test time.]

Motivation

[Slide figure: the same corpus split, but the test documents need not match the domain of the dev set used to tune the weights w, so a single dev-tuned weighting is a poor fit for the test data.]

Aims

• Model domain

– Induce soft, unsupervised domains as latent topics

• Apply to MT

– Bias the translation model

– Introduce topic-dependent lexical weighting

Lexical Weighting

• Estimate phrase pair quality word-by-word

粉丝 很多
fěnsī hěnduō
"noodles" or "fans"   "a lot of"

(粉丝 is ambiguous: literally "vermicelli noodles", colloquially "fans".)
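The standard word-by-word estimate (Koehn et al., 2003) scores a phrase pair by averaging word translation probabilities over the word alignment. Below is a minimal Python sketch; the translation table `w` and the alignment are toy values for the 粉丝很多 example, not numbers from the paper.

```python
# A minimal sketch of word-by-word lexical weighting for a phrase pair,
# following Koehn et al. (2003). Toy translation table and alignment.

def lexical_weight(target, source, alignment, w):
    """lex(e|f) = product over target words of the average
    word translation probability of their aligned source words."""
    score = 1.0
    for i, e in enumerate(target):
        aligned = [j for (ti, j) in alignment if ti == i]
        if aligned:
            score *= sum(w.get((e, source[j]), 1e-9) for j in aligned) / len(aligned)
        else:
            score *= w.get((e, None), 1e-9)  # unaligned: score against NULL
    return score

# Toy example: 粉丝 很多 -> "lots of fans"
w = {("lots", "很多"): 0.5, ("of", "很多"): 0.4, ("fans", "粉丝"): 0.3}
alignment = [(0, 1), (1, 1), (2, 0)]  # (target index, source index)
print(lexical_weight(["lots", "of", "fans"], ["粉丝", "很多"], alignment, w))
```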

Topic Models

• Used MALLET (McCallum, 2002)

• Latent Dirichlet Allocation (Blei, Ng, and Jordan, 2003)

• Trained only on the source side

• Topic distribution is the same for every sentence in a document
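The slides specify MALLET for LDA training; as a rough stand-in, here is a minimal sketch with gensim showing how per-document topic distributions could be obtained (toy corpus, illustrative parameters):

```python
# A minimal sketch of inducing document-level topic distributions with LDA.
# The paper used MALLET; gensim is used here only as a stand-in.
from gensim import corpora
from gensim.models import LdaModel

docs = [["粉丝", "很多", "新闻"],      # toy tokenized source-side documents
        ["粉丝", "演唱会", "很多"]]
dictionary = corpora.Dictionary(docs)
bows = [dictionary.doc2bow(d) for d in docs]

lda = LdaModel(bows, num_topics=3, id2word=dictionary, passes=10)

# p(topic | document); every sentence in the document shares this distribution
for bow in bows:
    print(lda.get_document_topics(bow, minimum_probability=0.0))
```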

Standard Lexical Weighting

A single translation table is estimated over the full training corpus:

Translation Table
Source     Target            P(e|f)
粉丝很多    lots of noodles    .45
粉丝很多    lots of fans       .33

Domain Lexical Weighting (Chiang 2011)

A separate translation table is estimated for each collection:

Translation Table: nw
Source     Target            Ps=nw(e|f)
粉丝很多    lots of noodles    .41
粉丝很多    lots of fans       .32

Translation Table: web
Source     Target            Ps=wb(e|f)
粉丝很多    lots of noodles    .30
粉丝很多    lots of fans       .58
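A minimal sketch of how such per-collection tables could be estimated, as domain-conditioned relative frequencies of aligned word pairs. This is deliberately simplified with toy counts; the published method's smoothing details may differ.

```python
# A minimal sketch of domain-conditioned word translation probabilities:
# relative frequencies of aligned word pairs, computed separately per
# collection. Toy counts, no smoothing.
from collections import defaultdict

def domain_tables(aligned_pairs):
    """aligned_pairs: iterable of (domain, source_word, target_word)."""
    pair_counts = defaultdict(float)   # (domain, f, e) -> count
    src_counts = defaultdict(float)    # (domain, f) -> count
    for dom, f, e in aligned_pairs:
        pair_counts[(dom, f, e)] += 1.0
        src_counts[(dom, f)] += 1.0
    return {(dom, f, e): c / src_counts[(dom, f)]
            for (dom, f, e), c in pair_counts.items()}

data = [("nw", "粉丝", "noodles")] * 4 + [("nw", "粉丝", "fans")] * 3 + \
       [("wb", "粉丝", "fans")] * 6 + [("wb", "粉丝", "noodles")] * 3
tables = domain_tables(data)
print(tables[("nw", "粉丝", "noodles")])  # P_nw(noodles|粉丝) ≈ 0.57
print(tables[("wb", "粉丝", "fans")])     # P_wb(fans|粉丝) ≈ 0.67
```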

Lexical Weighting with Topic Models

A separate translation table is estimated for each latent topic:

Translation Table: Topic 1
Source     Target            Ptopic=1(e|f)
粉丝很多    lots of noodles    .71
粉丝很多    lots of fans       .15

Translation Table: Topic 2
Source     Target            Ptopic=2(e|f)
粉丝很多    lots of noodles    .41
粉丝很多    lots of fans       .47

Translation Table: Topic 3
Source     Target            Ptopic=3(e|f)
粉丝很多    lots of noodles    .21
粉丝很多    lots of fans       .68
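One way to build per-topic tables, consistent with the soft-membership idea, is to weight each aligned pair's count by its document's topic posterior. The sketch below illustrates this with toy data; the estimation details are an assumption here, not taken verbatim from the paper.

```python
# A minimal sketch of per-topic translation tables via expected counts:
# each aligned pair occurrence contributes p(topic=k | its document) to
# topic k's counts.
from collections import defaultdict

def topic_tables(occurrences, num_topics):
    """occurrences: iterable of (source, target, doc_topic_dist),
    where doc_topic_dist is p(topic | document) for that occurrence's doc."""
    pair = defaultdict(float)  # (k, f, e) -> expected count
    src = defaultdict(float)   # (k, f) -> expected count
    for f, e, theta in occurrences:
        for k in range(num_topics):
            pair[(k, f, e)] += theta[k]
            src[(k, f)] += theta[k]
    return {key: c / src[(key[0], key[1])] for key, c in pair.items()}

occs = [("粉丝", "noodles", [0.9, 0.1]),   # occurrence in a topic-1-heavy doc
        ("粉丝", "fans",    [0.2, 0.8])]   # occurrence in a topic-2-heavy doc
t = topic_tables(occs, 2)
print(t[(0, "粉丝", "noodles")])  # ≈ 0.82 under the first topic
print(t[(1, "粉丝", "fans")])     # ≈ 0.89 under the second topic
```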

Lexical Weighting Adaptation Features

For a test sentence, each per-topic probability is weighted by the sentence's topic distribution:

ƒk(e|f) = Ptopic=k(e|f) × p(topic=k | test sentence)

e.g., with p(topic=1 | test sentence) = 0.65:
ƒ1(e|f) = 0.71 × 0.65 ≈ .46 for "lots of noodles"
ƒ1(e|f) = 0.15 × 0.65 ≈ .09 for "lots of fans"

Translation Table: Topic 1
Source     Target            Ptopic(e|f)   ƒ1(e|f)
粉丝很多    lots of noodles    .71           .46
粉丝很多    lots of fans       .15           .09

Translation Table: Topic 2
Source     Target            Ptopic(e|f)   ƒ2(e|f)
粉丝很多    lots of noodles    .41           .09
粉丝很多    lots of fans       .47           .10

Translation Table: Topic 3
Source     Target            Ptopic(e|f)   ƒ3(e|f)
粉丝很多    lots of noodles    .21           .02
粉丝很多    lots of fans       .68           .08

The resulting features are attached to each rule:

粉丝很多 ||| lots of fans ||| ƒ1(e|f)=.46 ƒ2(e|f)=.09 ƒ3(e|f)=.02 ƒ1(f|e) ƒ2(f|e) ƒ3(f|e) …
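A minimal sketch that reproduces the computation above. Only the topic-1 weight of 0.65 appears in the slides; the weights for topics 2 and 3 are back-solved approximations, so the rounded outputs match the tables only approximately.

```python
# Topic-adapted features: each per-topic lexical probability is multiplied
# by the test sentence's topic posterior.
p_topic = {  # Ptopic=k(e|f) for 粉丝很多, from the slides' tables
    1: {"lots of noodles": 0.71, "lots of fans": 0.15},
    2: {"lots of noodles": 0.41, "lots of fans": 0.47},
    3: {"lots of noodles": 0.21, "lots of fans": 0.68},
}
sentence_topics = {1: 0.65, 2: 0.22, 3: 0.13}  # topics 2-3 assumed

for k, table in p_topic.items():
    for target, p in table.items():
        print(f"f{k}(e|f) = {p:.2f} * {sentence_topics[k]:.2f} "
              f"= {p * sentence_topics[k]:.2f}  ({target})")
```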

Experiments

• Chinese-English

• Two settings

– Small (FBIS)

• 300k sentence pairs

• Document boundaries

– Large (~NIST)

• 1.6m sentence pairs

• No document boundaries

• NIST MT06 tune, MT03 & MT05 test

• MIRA optimizer

Unsupervised Domain Induction

• What is a document (for topic modeling)?

• Only some MT data have document boundaries

• Alternative: treat each sentence as its own document (see the sketch below)
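Continuing the earlier gensim stand-in sketch, treating a sentence as its own document just means inferring a topic distribution from a one-sentence bag of words. The `lda` and `dictionary` names reused here come from that sketch and are assumptions, not the paper's code.

```python
# Sentence-as-document: when no document boundaries exist, infer a topic
# distribution per sentence, both in training and at test time.
test_sentence = ["粉丝", "很多"]
bow = dictionary.doc2bow(test_sentence)

# p(topic | sentence), used to weight the per-topic translation tables
print(lda.get_document_topics(bow, minimum_probability=0.0))
```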

FBIS: Document v. Sentence Results

[Results chart, built up over several animation steps; the scores are not preserved in this transcript.]

Large Setting

[Results chart; the scores are not preserved in this transcript.]

Future Work

• Improve Topic Model

– Multilingual Topic Modeling

– More monolingual and multilingual data

– Hierarchical models

• Other languages

Conclusions

• Extend domain adaptation

– No reliance on collection/genre annotation

– Finer-grained topic distributions

• Bias translation toward topic

– Lexical weighting adaptation with soft membership

• Add Ptopic(e|f) and Ptopic(f|e) features to every rule

• Thank You!

• Questions?

Feature Representation

• Topic identity

– Features tied to specific topics: probability under topic 1, under topic 2, …

– Cross-domain

• Topic distribution

– Features tied to rank: probability under the most probable topic, the second most probable, …

– Dynamic

(A sketch contrasting the two schemes follows.)
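A minimal sketch of the two schemes (feature names are illustrative, not the paper's): identity features key on which topic a probability belongs to; distribution features key on its rank within the sentence's topic posterior.

```python
# "Identity" vs. "distribution" topic features over the same posterior.
def identity_features(theta):
    # f_topic1, f_topic2, ... : tied to specific induced topics
    return {f"f_topic{k + 1}": p for k, p in enumerate(theta)}

def distribution_features(theta):
    # f_rank1, f_rank2, ... : tied to how probable a topic is,
    # regardless of which topic it happens to be
    ranked = sorted(theta, reverse=True)
    return {f"f_rank{r + 1}": p for r, p in enumerate(ranked)}

theta = [0.65, 0.13, 0.22]  # assumed p(topic | sentence)
print(identity_features(theta))      # {'f_topic1': 0.65, 'f_topic2': 0.13, ...}
print(distribution_features(theta))  # {'f_rank1': 0.65, 'f_rank2': 0.22, ...}
```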

Global vs. Local Topic Model

[Backup slide: comparison chart on the large corpus; values not preserved in this transcript.]