
Slides: denero.org/content/talks/denero_thesis_slides.pdf

Large-Context Models for Large-Scale Machine Translation

John DeNero
Dissertation Talk


Statistical Language Systems are Working

“People use [Google Translate] hundreds of millions of times a week.” [2]

Google spent $5.6 billion on infrastructure in the last 3 years. [1]

How? Data!

[1] Google.com annual report of capital expenditure, “the majority of which was related to IT infrastructure investments.”
[2] “Google’s Computing Power Refines Translation Tool,” New York Times, 9 March 2010, Technology Section.


The Many Use(r)s of Machine Translation

Assimilation
• Document translation
• Broadcast monitoring
• Intelligence gathering

Dissemination
• 73%: most Internet users can’t read the English Web
• [Image label: machine translated]

Communication
• John: I’m in Berkeley
• Juan: Estoy en Berkeley
• Emergency room triage
• Military deployments
• Multilingual education
• 9-1-1 Response
• Commerce with tourists
• 婆婆 [Grandma-in-law] and John


Data-Driven Machine Translation

Parallel corpus gives translation examples:
• Yo lo haré de muy buen grado / I will do it gladly
• Después lo verás / You will see later

Target language corpus gives examples of well-formed sentences:
• I will get to it later
• See you later
• He will do it

Machine translation system: Model of translation
• Source language (novel sentence): Yo lo haré después
• Target language: I will do it later


Stitching Together Fragments

Parallel corpus gives translation examples:
• Yo lo haré de muy buen grado / I will do it gladly
• Después lo verás / You will see later

Machine translation system: Model of translation
• Yo lo haré después / I will do it later

[Figure: parse trees over the English example sentences, with node labels S, NP, VP, PRP, MD, VB and ADV; successive builds mark the ADV and S fragments.]


An Example Syntax-Based Translation

New Arabic v5.1 base system - sentence 211 (generated by Jens-S. Vöckler, 2008-04-10 21:29)

Arabic source sentence:
foreign: [Arabic script not preserved]
tac-lang: urfD albaz aladla’ baá tSryHat fur uSulh alá almqaT‘e .
bckwltr: wrfD AlbAz AlAdlA’ bAY tSryHAt fwr wSwlh AlY AlmqATEp .

Reference translation from a human translator:
Al-baz declined to make any statements upon his arrival in the province

Tune.nw.0: al @-@ baz declined to make any statements upon his arrival in the province .
Tune.nw.1: al @-@ baz refused to give any statements on arriving at al @-@ muqataah .
Tune.nw.2: immediately upon his arrival in the area , al @-@ baz declined to give any statements .
Tune.nw.3: al @-@ baz refused to make any statement upon his arrival at the moqata’ah .
1-best: al @-@ baz declined to give any statements upon his arrival in the province .

Features:
[ara-tune4600:211] 1-best Dot Product

feature              weight   value   product
derivation-size        0.41    8        3.30
glue-rule              3.89    2        7.78
green                 -0.08    0        0
gt_prob                0.40   36.18    14.43
identity              -9.97    0        0
is_lexicalized        -0.65    6       -3.91
lex_pef                1.02    5.47     5.60
lex_pfe                0.31    4.44     1.39
lm1                    1      22.76    22.76
lm1-unk               30.08    0        0
lm2                    0.74   26.66    19.79
lm2-unk              -39.18    0        0
missingWord           -1.29    0        0
model1inv              1.02   10.60    10.81
model1nrm              1.35   11.29    15.22
nonmonotone            4.17    0        0
olive                  1.95    0        0
psm1n                  0.50   24.65    12.30
text-length           -3.87   15      -58.05
trivial_cond_prob      0.41    3.34     1.38
unk-rule              19.28    0        0

reported totalcost: 52.82
v · w: 52.82

[ara-tune4600:211] 1-best PoS-Tree

[Figure: syntax tree of the 1-best translation. Tagged words: al/NNP @-@/HYPH baz/NNP declined/VBD to/TO give/VB any/DT statements/NNS upon/IN his/PRP$ arrival/NN in/IN the/DT province/NN ./. Constituent labels include NML, NPB, NP-C, PP, VP, VP-C, SG-C, S-BAR, S, GLUE and TOP.]
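The "1-best Dot Product" on this slide can be checked with a few lines of Python. This is a minimal sketch, not the system's own code: the `FEATURES` list copies only the rows with a nonzero value (zero-valued features contribute nothing to the sum), and `model_score` is a hypothetical helper name. Because the slide shows weights rounded to two decimals, the recomputed total matches the reported 52.82 only approximately.

```python
# (feature, weight, value) rows copied from the slide; zero-valued features omitted.
FEATURES = [
    ("derivation-size", 0.41, 8),
    ("glue-rule", 3.89, 2),
    ("gt_prob", 0.40, 36.18),
    ("is_lexicalized", -0.65, 6),
    ("lex_pef", 1.02, 5.47),
    ("lex_pfe", 0.31, 4.44),
    ("lm1", 1.0, 22.76),
    ("lm2", 0.74, 26.66),
    ("model1inv", 1.02, 10.60),
    ("model1nrm", 1.35, 11.29),
    ("psm1n", 0.50, 24.65),
    ("text-length", -3.87, 15),
    ("trivial_cond_prob", 0.41, 3.34),
]


def model_score(features):
    """Return the dot product of weights and values over (name, weight, value) rows."""
    return sum(weight * value for _, weight, value in features)


score = model_score(FEATURES)
print(f"recomputed score: {score:.2f} (slide reports 52.82)")
```

The single large negative term comes from `text-length`, which offsets the positive language model and translation model terms.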


The Steps in a Modern Translation System

1. Learn a model
   • Parallel text (pictured: United Nations Proceedings)
   • Example rule, labeled VP: lo haré ADV / will do it ADV

2. Apply the model
   • Novel sentence: Yo lo haré después
   • Candidate translations:
     Later do it I will
     I will later do it
     That I’ll do later
     Later that I’ll do
     I will do it later
     ...

3. Choose a translation


The Alignment Problem in Translation

Thank you , I will do it gladly .
Gracias , lo haré de muy buen grado .

About the task:
• A lot can be inferred from lexical statistics
• Correct alignments are not one-to-one
• Some cases are tricky, even for people

About solutions:
• Word-to-word links
• Learning driven by conditional word distributions, e.g. P(gracias | you)

[Figure: alignment grid for the sentence pair; labeled spans are "Thank you", "Gracias" and "muy buen".]
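"Learning driven by conditional word distributions" can be made concrete with a small expectation-maximization loop in the style of IBM Model 1. The slides do not name a specific word alignment model, so using Model 1 is an assumption here, as are the function name `train_word_model`, the three-pair toy corpus (assembled from phrases on these slides), and the omission of a NULL word.

```python
from collections import defaultdict


def train_word_model(pairs, iterations=10):
    """Estimate t[(s, e)] ~ P(s | e) with IBM Model 1 style EM (no NULL word)."""
    spanish_vocab = {s for _, spanish in pairs for s in spanish}
    uniform = 1.0 / len(spanish_vocab)
    t = defaultdict(lambda: uniform)  # start from a uniform distribution
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of s linked to e
        total = defaultdict(float)  # expected count of e linked to anything
        for english, spanish in pairs:
            for s in spanish:
                # E-step: share one unit of count for s among the English words.
                norm = sum(t[(s, e)] for e in english)
                for e in english:
                    share = t[(s, e)] / norm
                    count[(s, e)] += share
                    total[e] += share
        # M-step: renormalize so the probabilities for each English word sum to 1.
        t = defaultdict(float, {(s, e): c / total[e] for (s, e), c in count.items()})
    return t


corpus = [
    ("thank you".split(), "gracias".split()),
    ("i will do it gladly".split(), "lo haré de muy buen grado".split()),
    ("you will see later".split(), "después lo verás".split()),
]
t = train_word_model(corpus)
print(round(t[("gracias", "thank")], 3), round(t[("gracias", "you")], 3))
```

In this toy corpus "thank" only ever co-occurs with "gracias", so all of its probability mass goes there, while "you" must also account for words in the third pair. Either way, a word-to-word model of this kind has no way to say that "Thank you" as a unit corresponds to "Gracias".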


Large-Context Alignment Challenges

Thank you , I will do it gladly .
Gracias , lo haré de muy buen grado .

Goal: Model multi-word structures during alignment

Challenge 1: P(gracias | you) → P(gracias, Thank you)
• Jointly infer phrase boundaries and alignments
• Boundaries depend on both languages

Challenge 2: phrase pair (lo haré, I will do it)
• Capture context
• Compose phrases


Modeling Phrasal Correspondence (1)

Thank you , I will do it gladly
Gracias , lo haré de muy buen grado

Paradigm: Train a generative model that explains observed translations via latent structure

Process: Phrase pairs are generated independently

Optimization: Explain all translations with shared parameters


Modeling Phrasal Correspondence

P(A = a) = �(Thank you,Gracias) · �(I will do,hare) · �(it, lo) · · ·

Thank you , I will do it gladly .

Gracias

,

lo

haré

de

muy

buen

grado

.

For each sentence pair:For each alignment:

1

We learn , a multinomial distribution over phrase pairs �

L(�) =�

d�D

��

a�A(d)

P (A = a)

P (A = a) =�

(e,s)�a

�(e, s)

* Terms omitted: Phrase pair count and phrase permutation

*

Page 61: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9

Modeling Phrasal Correspondence

P(A = a) = �(Thank you,Gracias) · �(I will do,hare) · �(it, lo) · · ·

Thank you , I will do it gladly .

Gracias

,

lo

haré

de

muy

buen

grado

.

For each sentence pair:For each alignment:

Maximizing likelihood gives a degenerate solution: huge phrases!

1

We learn , a multinomial distribution over phrase pairs �

L(�) =�

d�D

��

a�A(d)

P (A = a)

P (A = a) =�

(e,s)�a

�(e, s)

* Terms omitted: Phrase pair count and phrase permutation

*

Page 62

Modeling Phrasal Correspondence

Thank you , I will do it gladly .

Gracias , lo haré de muy buen grado .

$P(A = a) = \theta(\text{Thank you}, \text{Gracias}) \cdot \theta(\text{I will do}, \text{haré}) \cdot \theta(\text{it}, \text{lo}) \cdots$

We learn $\theta$, a multinomial distribution over phrase pairs

$P(A = a) = \prod_{(e,s) \in a} \theta(e, s)$ *

* Terms omitted: Phrase pair count and phrase permutation

$L(\theta) = \prod_{d \in D} \sum_{a \in A(d)} P(A = a)$

The product ranges over each sentence pair $d$; the sum ranges over each alignment $a$ of that pair.

Maximizing likelihood gives a degenerate solution: huge phrases!
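The product formula for a single alignment can be sketched in a few lines of Python. This is a minimal illustration, not the system's implementation: the `theta` values are invented, the helper names are hypothetical, and the omitted terms (phrase pair count and phrase permutation) are left out here as well.

```python
from math import prod

# Hypothetical phrase-pair parameters theta(e, s); the numbers are made up.
theta = {
    ("Thank you", "Gracias"): 0.02,
    ("I will do", "haré"): 0.01,
    ("it", "lo"): 0.05,
}

def alignment_probability(alignment, theta):
    """P(A = a): product of theta(e, s) over the phrase pairs in alignment a."""
    return prod(theta.get(pair, 0.0) for pair in alignment)

def sentence_likelihood(alignments, theta):
    """Inner sum of L(theta): total probability of one sentence pair's alignments."""
    return sum(alignment_probability(a, theta) for a in alignments)

a = [("Thank you", "Gracias"), ("I will do", "haré"), ("it", "lo")]
p = alignment_probability(a, theta)
```

Every extra phrase pair multiplies in another factor below one, which is one way to see why unconstrained maximum likelihood drifts toward a few huge phrases.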

Page 71

Guiding Phrasal Correspondence Models

Thank you , I will do it gladly .

Gracias , lo haré de muy buen grado .

$\theta \sim \mathrm{DP}(\theta_0, \alpha)$

Base distribution $\theta_0$: Prefers short phrases

Dirichlet process $\mathrm{DP}(\cdot, \alpha)$: Non-parametric cache model

Phrase Pair Cache (c): English-Spanish phrase pair, Count

(Thank you, Gracias)

(Thanks, Gracias)

(Thank you, Muchas gracias)

...

$P(z \mid c) = \frac{c(z) + \alpha \cdot \theta_0(z)}{|c| + \alpha}$

The count terms $c(z)$ and $|c|$ grow with the data; the prior terms $\alpha \cdot \theta_0(z)$ and $\alpha$ stay fixed.

Iterative realignment of all the data by sampling

Consistent, efficient estimation
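The cache formula can be tried out directly. The sketch below is illustrative only: the cache counts and `alpha` are invented, and `base_prob` is a hypothetical stand-in for the base distribution that simply decays with phrase length (it is not normalized over all phrase pairs).

```python
from collections import Counter

def base_prob(pair, stop=0.5):
    """Hypothetical base score: decays geometrically with total phrase length."""
    e, s = pair
    n = len(e.split()) + len(s.split())
    return stop * (1 - stop) ** (n - 1)

def predictive_prob(pair, cache, alpha, base):
    """P(z | c) = (c(z) + alpha * base(z)) / (|c| + alpha)."""
    total = sum(cache.values())
    return (cache[pair] + alpha * base(pair)) / (total + alpha)

# Invented cache counts for illustration.
cache = Counter({("Thank you", "Gracias"): 3, ("Thanks", "Gracias"): 1})
alpha = 1.0

seen = predictive_prob(("Thank you", "Gracias"), cache, alpha, base_prob)
unseen = predictive_prob(("Thank you", "Muchas gracias"), cache, alpha, base_prob)
```

A pair already in the cache is dominated by its count, while an unseen pair receives only the prior mass, which shrinks relative to the counts as the cache grows.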

Page 73

What Happens in Practice

Thank you , I shall do so gladly .

Gracias , lo haré de muy buen grado .

[Figure: two alignment grids for this sentence pair, one showing a state-of-the-art word-level alignment and one showing a sampled phrase alignment from our system]

Page 74

Performance Results

Translation performance in a phrase-based system (Moses) for Spanish-to-English parliamentary proceedings (Europarl)

Systems compared: Word-level baseline; Phrase-level model [DeNero et al. EMNLP '08]*

[Figure: bar chart of translation quality (BLEU, axis 28 to 31, higher is better) with bars labeled 30.1 and 29.8]

[Figure: bar chart of millions of phrase pairs (axis 0 to 6, lower is cheaper) with bars labeled 3.1 and 4.4]

* John DeNero, Alex Bouchard-Côté, and Dan Klein. Sampling Alignment Structure under a Bayesian Translation Model, EMNLP 2008.

Page 75

Subsequent Work

We described a non-parametric Bayesian prior and a consistent sampling procedure (EMNLP 2008)

• Phil Blunsom, Trevor Cohn, Chris Dyer, and Miles Osborne. A Gibbs Sampler for Phrasal Synchronous Grammar Induction, ACL 2009.

• Abhishek Arun, Chris Dyer, Barry Haddow, Phil Blunsom, Adam Lopez, and Philipp Koehn. Monte Carlo Inference and Maximization for Phrase-Based Translation, CoNLL 2009.

• Ding Liu and Daniel Gildea. Bayesian Learning of Phrasal Tree-to-String Templates, EMNLP 2009.

• Matt Post and Daniel Gildea. Bayesian Learning of a Tree Substitution Grammar, ACL 2009.

• Phil Blunsom and Trevor Cohn. Inducing Synchronous Grammars with Slice Sampling, NAACL 2010.

• Trevor Cohn and Phil Blunsom. A Bayesian Model of Syntax-Directed Tree to String Grammar Induction, EMNLP 2009.

Page 82

A Model of Composed Phrases

过去 [past] 两 [two] 年 [year] 中 [in]

In the past two years

A model can predict the whole analysis above, including minimal links and composed phrase pairs.

Features on word links: < P(in|中) = 0.8, InDictionary = 1.0, ... >

Features on phrase pairs: < Count(the past two, 过去 两) = 7, Size(3,2) = 1, ... >

Page 88

Learning from Supervised Data

过去 [past] 两 [two] 年 [year] 中 [in]

Guess: Model Prediction

In the past two years

Gold: Human Annotation

In the past two years

Loss function: Number of differing rounded rectangles

Online learning (MIRA) adjusts model parameters to prefer the gold over the guess by a margin of the loss
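A generic 1-best MIRA step can be sketched as follows. This is a textbook-style update under stated assumptions, not necessarily the exact variant used in this work: analyses are represented as sets of hypothetical item names, each item is an indicator feature, the loss is the number of items on which guess and gold differ, and `C` is an assumed step-size cap.

```python
def dot(w, feats):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(w.get(k, 0.0) * v for k, v in feats.items())

def mira_update(w, feats_gold, feats_guess, loss, C=1.0):
    """Smallest change to w (step capped by C) so gold outscores guess by `loss`."""
    keys = set(feats_gold) | set(feats_guess)
    delta = {k: feats_gold.get(k, 0.0) - feats_guess.get(k, 0.0) for k in keys}
    norm_sq = sum(v * v for v in delta.values())
    if norm_sq == 0.0:
        return dict(w)  # guess and gold have identical features
    violation = loss - dot(w, delta)
    tau = max(0.0, min(C, violation / norm_sq))
    new_w = dict(w)
    for k, v in delta.items():
        new_w[k] = new_w.get(k, 0.0) + tau * v
    return new_w

# Hypothetical analyses: each string names one link or phrase pair.
gold_items = {"link:in|中", "phrase:the past two|过去 两"}
guess_items = {"link:in|中", "phrase:past|过去"}
loss = len(gold_items ^ guess_items)  # number of differing items

feats_gold = {item: 1.0 for item in gold_items}
feats_guess = {item: 1.0 for item in guess_items}
new_w = mira_update({}, feats_gold, feats_guess, loss)
```

After the update, the gold analysis scores higher than the guess by exactly the loss in this example, because the cap `C` is not exceeded.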

Page 99

Finding the Optimal Correspondence

过去 [past] 两 [two] 年 [year] 中 [in]

In the past two years

$\arg\max_{y \in \mathrm{ITG}(x)} \; \theta \cdot [\, \phi_{\text{word}}(x, y) + \phi_{\text{phrase}}(x, y) \,]$

Hierarchical decomposition

ITG parser with a state space that tracks peripheral alignments for each region
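The objective is a linear score over word-link and phrase-pair features. The sketch below shows only the scoring and the argmax over an explicit, hand-written candidate list; the ITG parser performs that search with dynamic programming, which is not reproduced here. The weights, feature values, and candidate names are invented for illustration.

```python
def score(theta, phi_word, phi_phrase):
    """theta . [phi_word + phi_phrase] for sparse feature dicts."""
    combined = dict(phi_word)
    for k, v in phi_phrase.items():
        combined[k] = combined.get(k, 0.0) + v
    return sum(theta.get(k, 0.0) * v for k, v in combined.items())

def best_candidate(theta, candidates):
    """Brute-force argmax over an explicit list standing in for ITG(x)."""
    return max(candidates, key=lambda c: score(theta, c["phi_word"], c["phi_phrase"]))

# Hypothetical weights and two hypothetical candidate analyses.
theta = {"P(in|中)": 1.0, "InDictionary": 0.5, "Size(3,2)": -0.2}
candidates = [
    {"name": "A",
     "phi_word": {"P(in|中)": 0.8, "InDictionary": 1.0},
     "phi_phrase": {"Size(3,2)": 1.0}},
    {"name": "B",
     "phi_word": {"P(in|中)": 0.8},
     "phi_phrase": {}},
]
best = best_candidate(theta, candidates)
```

Because the score is a sum of local feature contributions, it decomposes over regions of the alignment, which is what makes a chart-based search possible.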

Page 100

Experimental Results

Systems compared: Unsupervised word model baseline; Supervised word model [Haghighi, Blitzer, DeNero, and Klein. ACL '09]*; Composed Phrase Pair Model [DeNero and Klein. In submission]**

[Figure: bar chart of alignment quality relative to human-annotated data (Phrase Pair F1, axis 55 to 75) with bars labeled 71.6, 68.4, and 64.1]

[Figure: bar chart of translation quality for Chinese-to-English (BLEU, axis 33 to 36) with bars labeled 35.9, 34.7, and 34.5]

* Aria Haghighi, John Blitzer, John DeNero, and Dan Klein. Better Word Alignments with Supervised ITG Models, ACL 2009.

** John DeNero and Dan Klein. Supervised Modeling of Extraction Sets for Machine Translation, in submission.
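Phrase pair F1 compares the set of phrase pairs a model licenses against the set licensed by the human annotation. A minimal sketch of the standard F1 computation over two sets, with invented placeholder phrase pairs; how the phrase pair sets are extracted from an alignment is not shown here.

```python
def phrase_pair_f1(predicted, gold):
    """Harmonic mean of precision and recall over two sets of phrase pairs."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)

# Hypothetical phrase-pair sets, using single letters as placeholders.
predicted = {"a", "b", "c"}
gold = {"a", "b", "d", "e"}
f1 = phrase_pair_f1(predicted, gold)
```

The function returns a fraction between 0 and 1; the slide reports the same quantity as a percentage.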

Page 104

The Steps in a Modern Translation System

Learn a model

Apply the model

Choose a translation

Non-parametric models scale with the data

Large data sets provide statistics for larger structures

The more context we incorporate, the better we do

Page 112

Extracting Translation Rules

Thank you , I will do it gladly .

Gracias , lo haré de muy buen grado .

[Figure: parse tree over the English sentence with node labels S, VP, NP, PRP, VB, MD, and ADV, aligned to the Spanish sentence]

Extracted rule: VP → ⟨ will do it ADV , lo haré ADV ⟩

Frequency statistics on these rules guide translation

Page 123

Synchronous Context-Free Grammars

Grammar

NN → ⟨ dormitorio , bedroom ⟩

JJ → ⟨ nuevo , new ⟩

JJ → ⟨ grande , big ⟩

JJ → ⟨ pequeño , small ⟩

NP → ⟨ Mi NN JJ , My JJ NN ⟩

S → ⟨ NP no es ni JJ ni JJ , NP is neither JJ nor JJ ⟩

Derivation

Source: Mi dormitorio nuevo no es ni grande ni pequeño

Translation: My new bedroom is neither big nor small
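The way a synchronous derivation yields both strings at once can be sketched in Python. The rule encoding and names below are assumptions made for illustration: each rule stores a source side and a target side, and integers index the rule's child non-terminals so the target side can reorder them. The derivation tree is written by hand; no parsing is performed.

```python
# Each rule: (source side, target side). Integers refer to child non-terminals.
RULES = {
    "NN_dormitorio": (["dormitorio"], ["bedroom"]),
    "JJ_nuevo": (["nuevo"], ["new"]),
    "JJ_grande": (["grande"], ["big"]),
    "JJ_pequeno": (["pequeño"], ["small"]),
    "NP": (["Mi", 0, 1], ["My", 1, 0]),  # NN JJ on the source, JJ NN on the target
    "S": ([0, "no", "es", "ni", 1, "ni", 2],
          [0, "is", "neither", 1, "nor", 2]),
}

def yield_side(tree, side):
    """Expand a derivation tree (rule_name, children); side 0 = source, 1 = target."""
    name, children = tree
    words = []
    for symbol in RULES[name][side]:
        if isinstance(symbol, int):
            words.extend(yield_side(children[symbol], side))
        else:
            words.append(symbol)
    return words

derivation = ("S", [
    ("NP", [("NN_dormitorio", []), ("JJ_nuevo", [])]),
    ("JJ_grande", []),
    ("JJ_pequeno", []),
])

source = " ".join(yield_side(derivation, 0))
target = " ".join(yield_side(derivation, 1))
```

The NP rule is the only one that reorders its children, which is how the adjective moves in front of the noun on the English side.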

Page 125

The Size of the Grammar

A grammar learned from 220 million words of Arabic-to-English example translations:

[Figure: histogram of rule count (axis 0 to 90,000) by size of the source-side yield (terminals + non-terminals), in bins 1 through 10+]

Example rule: S → ⟨ NP no es ni JJ ni JJ . , NP is neither JJ nor JJ . ⟩

332,000 rules match a 30-word sentence to be translated

John DeNero, Adam Pauls, Mohit Bansal, and Dan Klein. Efficient Parsing for Transducer Grammars, NAACL 2009.
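The quantity on the horizontal axis, the size of a rule's source-side yield, counts terminals and non-terminals together. A minimal sketch of how such a histogram could be tallied, assuming rules are given as whitespace-separated source sides; the toy rule list is invented and the `10+` bin is modeled with a cap.

```python
from collections import Counter

def source_yield_size(source_side):
    """Number of symbols (terminals + non-terminals) on a rule's source side."""
    return len(source_side.split())

def size_histogram(source_sides, cap=10):
    """Count rules per source-side size, pooling sizes >= cap into one bin."""
    histogram = Counter()
    for source_side in source_sides:
        histogram[min(source_yield_size(source_side), cap)] += 1
    return histogram

# Toy rule list for illustration only.
source_sides = ["dormitorio", "nuevo", "Mi NN JJ", "NP no es ni JJ ni JJ ."]
histogram = size_histogram(source_sides)
```

The example rule on the slide has eight source-side symbols, so it falls in bin 8.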

Page 130: Large-Context Models for Large-Scale Machine Translationdenero.org/content/talks/denero_thesis_slides.pdf · An Example Syntax-Based TranslationNew ic 1 e m - e 211 d by . r 10 9

The Structure of the Grammar

Mi dormitorio nuevo no es ni grande ni pequeño

S → NP no es ni JJ ni JJ

ADJP → ni JJ ni JJ
VP → es ADJP
S → NP no VP
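The relationship between the rules on this slide can be checked mechanically: substituting the two smaller rules into S → NP no VP reproduces the source side of the large rule. The sketch below assumes rules are plain lists of symbols; the `substitute` helper is a hypothetical name used only for illustration.

```python
def substitute(rhs, nonterminal, expansion):
    """Replace the first occurrence of `nonterminal` in `rhs` with `expansion`."""
    i = rhs.index(nonterminal)
    return rhs[:i] + expansion + rhs[i + 1:]


# Right-hand sides of the small rules shown on the slide.
s_small = "NP no VP".split()      # S    -> NP no VP
vp = "es ADJP".split()            # VP   -> es ADJP
adjp = "ni JJ ni JJ".split()      # ADJP -> ni JJ ni JJ

# Right-hand side of the large rule shown on the slide.
s_large = "NP no es ni JJ ni JJ".split()

composed = substitute(substitute(s_small, "VP", vp), "ADJP", adjp)
print(" ".join(composed))
```

Composing the three small rules gives exactly the source side of the large rule, which is why a parse built from small rules can indicate where large rules may apply.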

Coarse-to-Fine Translation

1 Apply a subset of the grammar with only small rules

Mi dormitorio nuevo no es ni grande ni pequeño

[Figure: parse of the sentence built from small rules, with nodes S, NP, VP, ADJP, NN, JJ, JJ, JJ]

Coarse-to-Fine Translation

1 Apply a subset of the grammar with only small rules

2 Prune away unlikely portions of the search space

3 Apply the full translation grammar to the pruned space

Mi dormitorio nuevo no es ni grande ni pequeño
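The three steps above can be sketched on the example sentence. This is a toy illustration under several assumptions, not the algorithm from the paper: the part-of-speech tags and the rule NP → PRP$ NN JJ are invented for the example, parsing is source-side only, and pruning keeps every span that the coarse pass built rather than scoring spans by likelihood.

```python
def is_terminal(symbol):
    # Convention for this sketch: lowercase symbols are words.
    return symbol.islower()


def tiles(rhs, i, j, words, chart):
    """True if the symbols of rhs can cover words[i:j] exactly."""
    if not rhs:
        return i == j
    first, rest = rhs[0], rhs[1:]
    if is_terminal(first):
        return i < j and words[i] == first and tiles(rest, i + 1, j, words, chart)
    return any((first, i, k) in chart and tiles(rest, k, j, words, chart)
               for k in range(i + 1, j + 1))


def parse(words, tags, rules, allowed_spans=None):
    """Bottom-up chart parse; returns the chart and the number of rule attempts."""
    n = len(words)
    chart = {(tag, i, i + 1) for i, tag in enumerate(tags) if tag}
    attempts = 0
    for width in range(1, n + 1):
        for i in range(n - width + 1):
            j = i + width
            if allowed_spans is not None and (i, j) not in allowed_spans:
                continue  # pruned away by the coarse pass
            for lhs, rhs in rules:
                attempts += 1
                if tiles(rhs, i, j, words, chart):
                    chart.add((lhs, i, j))
    return chart, attempts


SMALL_RULES = [
    ("NP", ("PRP$", "NN", "JJ")),        # assumed rule, not on the slide
    ("ADJP", ("ni", "JJ", "ni", "JJ")),
    ("VP", ("es", "ADJP")),
    ("S", ("NP", "no", "VP")),
]
LARGE_RULE = ("S", ("NP", "no", "es", "ni", "JJ", "ni", "JJ"))
FULL_GRAMMAR = SMALL_RULES + [LARGE_RULE]

words = "mi dormitorio nuevo no es ni grande ni pequeño".split()
tags = ["PRP$", "NN", "JJ", None, None, None, "JJ", None, "JJ"]  # assumed tags

# 1. Apply a subset of the grammar with only small rules.
coarse_chart, _ = parse(words, tags, SMALL_RULES)
# 2. Prune: keep only spans the coarse pass could build.
allowed = {(i, j) for _, i, j in coarse_chart}
# 3. Apply the full grammar to the pruned space.
fine_chart, pruned_attempts = parse(words, tags, FULL_GRAMMAR, allowed)

# For comparison: the full grammar with no pruning.
_, full_attempts = parse(words, tags, FULL_GRAMMAR)
print(pruned_attempts, full_attempts)
```

In this toy setting the pruned pass still finds an S over the whole sentence while trying rules on far fewer spans. A real system would prune using model scores rather than simple reachability.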

Experimental Results

* John DeNero, Adam Pauls, Mohit Bansal, and Dan Klein. Efficient Parsing for Transducer Grammars, NAACL 2009.

Minutes required to analyze a 300 sentence test set:

Baseline          264
+Optimization     181
+Transformation   104
+Coarse-to-Fine    50

5x speed-up with the largest translation grammars in use today (ISI Syntax-Based MT System) [DeNero et al. NAACL ’09]*
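As a quick check on the headline number, and assuming the largest reported time (264 minutes) belongs to the baseline and the smallest (50 minutes) to the full system with coarse-to-fine, the ratio works out as follows:

```latex
\[
\text{speed-up} = \frac{264\ \text{min}}{50\ \text{min}} = 5.28 \approx 5\times
\]
```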

The Steps in a Modern Translation System

Learn a model

Apply the model

Choose a translation

Fully exploiting large data sets requires searching over very large spaces

Coarse-to-fine inference is a powerful technique for doing so

Even the Best Models are Wrong

NOVEL SENTENCE: Yo lo haré después

Model outputs:

I will later do it
Later do it I will
That I’ll do later
Later that I’ll do
...

[Figure: translation quality (BLEU, y-axis, 50.0 to 50.8) against total model score for 1000 sentences (x-axis, 511660 to 514830). Legend: samples from output space; samples near maximum; highest scoring translation.]

Consensus by Averaging Over Sentences

Intuition: “Happy families are all alike; every unhappy family is unhappy in its own way.” [Tolstoy. 1877]*

Idea: Average over sentences to find the phrases that are alike. [DeNero et al. ACL ’09]**

NOVEL SENTENCE: Yo lo haré después

Model outputs, with probabilities and phrase indicators:

Output               Prob.   “Later”  “do”  “do it”  “I’ll”  “do later”  “do it I will”  ...
Later do it I will   0.12    1        1     1        0       0           1
I will later do it   0.10    1        1     1        0       0           0
That I’ll do later   0.07    1        1     0        1       1           0
Later that I’ll do   0.07    1        1     0        1       0           0
...                  ...
Expected output      1.00    0.97     0.98  0.54     0.41    0.34        0.12

* Leo Tolstoy. Анна Каренина. 1877.

** John DeNero, David Chiang, and Kevin Knight. Fast Consensus Decoding over Translation Forests, ACL 2009.
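The averaging step can be sketched over the four outputs listed on the slide. This is an illustration under stated assumptions: the pairing of each probability with a sentence is inferred from the indicator rows, the helper names are hypothetical, and the four probabilities are renormalized because the rest of the output space (the "...") is not available. The numbers it prints therefore differ from the expected values on the slide, which average over the full distribution.

```python
def contains_phrase(sentence, phrase):
    """True if `phrase` occurs as a contiguous word sequence (case-insensitive)."""
    words = sentence.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def expected_phrase_features(candidates, phrases):
    """Probability-weighted average of phrase indicators over a candidate list."""
    total = sum(p for _, p in candidates)  # renormalize over the list
    return {
        phrase: sum(p for s, p in candidates if contains_phrase(s, phrase)) / total
        for phrase in phrases
    }


# Sentence/probability pairing is an assumption based on the indicator rows.
candidates = [
    ("Later do it I will", 0.12),
    ("I will later do it", 0.10),
    ("That I'll do later", 0.07),
    ("Later that I'll do", 0.07),
]
phrases = ["Later", "do", "do it", "I'll", "do later", "do it I will"]

features = expected_phrase_features(candidates, phrases)
for phrase in phrases:
    print(f"{phrase!r}: {features[phrase]:.2f}")
```

Phrases shared by many likely outputs ("Later", "do") get expectations near 1, while phrases that appear in only one output ("do it I will") get low expectations, which is the sense in which the good phrases "are all alike".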

Phrase Expectations from Forests

Yo lo haré después

Label  Source side   Target side   Yield
NP     Yo            I             I
VP     lo haré       shall do it   shall do it
PRP    lo            it            it
VP     PRP haré      will do PRP   will do it
S      NP VP         NP VP         I {will, shall} do it
S      S después     S later       I {will, shall} do it later

Phrases found in the forest: “do it”, “I will”, “I shall”, “it later”
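One way to compute phrase expectations without enumerating every sentence is to run inside-outside over the forest and sum the posterior of each hyperedge that introduces a phrase. The sketch below is a toy version under loud assumptions: the edge weights (0.4 and 0.6) are invented, the VP and S nodes are split by auxiliary verb so that each hyperedge introduces a fixed set of bigrams, and the function and node names are hypothetical rather than taken from the paper.

```python
from collections import defaultdict
from math import prod

# Hyperedges as (head, tails, weight, bigrams introduced), listed bottom-up:
# every edge that builds a node appears before any edge that uses that node.
EDGES = [
    ("NP", [], 1.0, []),                                # Yo -> I
    ("PRP", [], 1.0, []),                               # lo -> it
    ("VP_shall", [], 0.4, ["shall do", "do it"]),       # lo haré -> shall do it
    ("VP_will", ["PRP"], 0.6, ["will do", "do it"]),    # PRP haré -> will do PRP
    ("S_shall", ["NP", "VP_shall"], 1.0, ["I shall"]),  # NP VP -> NP VP
    ("S_will", ["NP", "VP_will"], 1.0, ["I will"]),
    ("TOP", ["S_shall"], 1.0, ["it later"]),            # S después -> S later
    ("TOP", ["S_will"], 1.0, ["it later"]),
]


def expected_ngram_counts(edges, root):
    """Expected count of each n-gram under the forest's derivation distribution."""
    # Inside pass: total weight of all derivations below each node.
    inside = defaultdict(float)
    for head, tails, weight, _ in edges:
        inside[head] += weight * prod(inside[t] for t in tails)

    # Outside pass: total weight of all contexts above each node.
    outside = defaultdict(float)
    outside[root] = 1.0
    for head, tails, weight, _ in reversed(edges):
        for k, tail in enumerate(tails):
            siblings = prod(inside[t] for m, t in enumerate(tails) if m != k)
            outside[tail] += outside[head] * weight * siblings

    # Each edge contributes its posterior to the n-grams it introduces.
    expected = defaultdict(float)
    for head, tails, weight, ngrams in edges:
        posterior = (outside[head] * weight
                     * prod(inside[t] for t in tails) / inside[root])
        for gram in ngrams:
            expected[gram] += posterior
    return dict(expected)


counts = expected_ngram_counts(EDGES, "TOP")
for gram, value in sorted(counts.items()):
    print(f"{gram!r}: {value:.2f}")
```

With these invented weights, "do it" and "it later" appear in every derivation and get expectation 1.0, while "I will" and "I shall" split the mass 0.6 to 0.4. The cost is linear in the number of hyperedges, not in the number of sentences the forest encodes.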

Single System Translation Results

Translation quality in ISI’s Full-Scale Arabic-to-English Hierarchical Translation System

Translation Quality (BLEU):

Model-Only Baseline                                52.0
Consensus from a List [DeNero et al. ACL ’09]*     52.2
Consensus from a Forest [DeNero et al. ACL ’09]*   53.0

* John DeNero, David Chiang, and Kevin Knight. Fast Consensus Decoding over Translation Forests, ACL 2009.

Translating Using Multiple Systems

NOVEL SENTENCE: Yo lo haré después

Models: Model 1, Model 2, Model 3

Expected phrase features, one vector per model:

0.96 0.99 0.57 0.44 0.30 0.00
0.94 0.92 0.81 0.67 0.11 0.09
0.97 0.98 0.54 0.41 0.34 0.12

Output: I will do it later

Consensus Modeling FAQ

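Q: How do we combine different models?

A: Train a linear consensus model scoring a derivation d:

\[
\sum_{i=1}^{I} \left[ w_i^{(\alpha)} \alpha_i(d) + \sum_{n=1}^{4} w_i^{(n)} v_i^{(n)}(d) \right] + w^{(b)} \cdot b(d) + w^{(\ell)} \cdot \ell(d)
\]

where i ranges over the models, \alpha_i(d) indicates which model d came from, n ranges over phrase lengths, v_i^{(n)}(d) are the expected counts, b(d) is the model score, and \ell(d) is the length.

Q: What output sentences are considered?

A: The union of output spaces of models: R, containing R_{pb} and R_h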
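The linear consensus model is a weighted sum of a few features per model plus two global features. The sketch below shows that sum for one derivation; the class names, field names, and every number are hypothetical and chosen only so the arithmetic is easy to follow.

```python
from dataclasses import dataclass


@dataclass
class DerivationFeatures:
    alpha: list    # alpha[i] = 1 if the derivation came from model i, else 0
    v: list        # v[i][n - 1] = expected-count feature for phrase length n, model i
    b: float       # model score of the derivation
    length: float  # length feature


@dataclass
class ConsensusWeights:
    alpha: list
    v: list
    b: float
    length: float


def consensus_score(f, w):
    """Linear score: per-model terms plus model-score and length terms."""
    total = w.b * f.b + w.length * f.length
    for i in range(len(f.alpha)):
        total += w.alpha[i] * f.alpha[i]
        total += sum(w.v[i][n] * f.v[i][n] for n in range(4))
    return total


# Two models, invented feature values and weights.
features = DerivationFeatures(
    alpha=[1, 0],
    v=[[4.0, 3.0, 2.0, 1.0], [3.5, 2.5, 1.5, 0.5]],
    b=-10.0,
    length=5.0,
)
weights = ConsensusWeights(
    alpha=[0.5, 0.2],
    v=[[0.1] * 4, [0.2] * 4],
    b=1.0,
    length=0.3,
)
score = consensus_score(features, weights)
print(round(score, 2))
```

Because the score is linear in the weights, the weights can be tuned with the same machinery used for the underlying translation models.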

Multi-System Translation Results

Google’s Full-Scale Research Translation System for Arabic-to-English

Translation quality (BLEU):

Best Single-System Model-Only Baseline                           43.9
Multi-System Forest-Based Consensus [DeNero et al. NAACL ’10]*   45.3

* John DeNero, Shankar Kumar, Ciprian Chelba, and Franz Och. Model Combination for Machine Translation, NAACL 2010.

The Steps in a Modern Translation System

Learn a model

Apply the model

Choose a translation

Statistical models provide distributions over outputs

Leveraging those distributions improves performance

Compact representations can enable large-scale computation

Summary of Translation Research

Learn a model: Non-parametric models; Large-context models [DeNero et al. EMNLP ’08] [DeNero & Klein. ACL ’10]

Apply the model: Coarse-to-fine [DeNero et al. NAACL ’09]

Choose a translation: Compact encodings; Full distributions [DeNero et al. ACL ’09] [DeNero et al. NAACL ’10]

Are we done yet?

• Morphology in alignment modeling
• Unsupervised composed phrase learning
• Adding information to consensus models

Acknowledgements

Thank you!

and many thanks to my excellent coauthors on this work:

Berkeley: Mohit Bansal, John Blitzer, Alex Bouchard-Côté, Aria Haghighi, Dan Klein, and Adam Pauls

Information Sciences Institute: David Chiang and Kevin Knight

Google: Ciprian Chelba, Shankar Kumar, and Franz Och

John

Juan Gracias!