CS60057 Speech & Natural Language Processing

CS60057 Speech & Natural Language Processing, Autumn 2008, Lecture 16, 3 Sep 2008

Description

CS60057 Speech & Natural Language Processing. Autumn 2008. Lecture 16, 3 Sep 2008. Outline for MT: intro and a little history; language similarities and divergences; three classic MT approaches (transfer, interlingua, direct); modern statistical MT; evaluation. What is MT?

Transcript of CS60057 Speech & Natural Language Processing

Page 1: CS60057 Speech & Natural Language Processing

CS60057 Speech & Natural Language Processing

Autumn 2008

Lecture 16

3 Sep 2008

Page 2: CS60057 Speech & Natural Language Processing

Outline for MT

Intro and a little history
Language Similarities and Divergences
Three classic MT approaches: Transfer, Interlingua, Direct
Modern Statistical MT
Evaluation

Page 3: CS60057 Speech & Natural Language Processing

What is MT?

Translating a text from one language to another automatically.

Page 4: CS60057 Speech & Natural Language Processing

Machine Translation

Chinese (pinyin): dai yu zi zai chuang shang gan nian bao chai you ting jian chuang wai zhu shao xiang ye zhe shang, yu sheng xi li, qing han tou mu, bu jue you di xia lei lai.

Word-for-word gloss: Dai-yu alone on bed top think-of-with-gratitude Bao-chai again listen to window outside bamboo tip plantain leaf of on-top rain sound sigh drop clear cold penetrate curtain not feeling again fall down tears come

English translation: As she lay there alone, Dai-yu's thoughts turned to Bao-chai… Then she listened to the insistent rustle of the rain on the bamboos and plantains outside her window. The coldness penetrated the curtains of her bed. Almost without noticing it she had begun to cry.

Page 5: CS60057 Speech & Natural Language Processing

Machine Translation

Page 6: CS60057 Speech & Natural Language Processing

Machine Translation

The Story of the Stone = The Dream of the Red Chamber (Cao Xueqin, 1792)

Issues:
Word segmentation
Sentence segmentation: 4 English sentences to 1 Chinese
Grammatical differences: Chinese rarely marks tense ('As', 'turned to', 'had begun'; tou -> 'penetrated')
Zero anaphora
No articles
Stylistic and cultural differences: bamboo tip plantain leaf -> 'bamboos and plantains'; ma 'curtain' -> 'curtains of her bed'; rain sound sigh drop -> 'insistent rustle of the rain'

Page 7: CS60057 Speech & Natural Language Processing

Not just literature

Hansards: Canadian parliamentary proceedings

Page 8: CS60057 Speech & Natural Language Processing

What is MT not good for?

Really hard stuff: literature, natural spoken speech (meetings, court reporting)

Really important stuff: medical translation in hospitals, 911 calls

Page 9: CS60057 Speech & Natural Language Processing

What is MT good for?

Tasks for which a rough translation is fine: web pages, email

Tasks for which MT can be post-edited: MT as a first pass, "computer-aided human translation"

Tasks in sublanguage domains where high-quality MT is possible: FAHQT (fully automatic high-quality translation)

Page 10: CS60057 Speech & Natural Language Processing

Sublanguage domain

Weather forecasting: "Cloudy with a chance of showers today and Thursday", "Low tonight 4"

Such a domain can be modeled completely enough to use raw MT output, with word classes and semantic features like MONTH, PLACE, DIRECTION, TIME POINT

Page 11: CS60057 Speech & Natural Language Processing

MT History

1946 Booth and Weaver discuss MT at the Rockefeller Foundation in New York
1947-48 Idea of dictionary-based direct translation
1949 Weaver memorandum popularized the idea
1952 All 18 MT researchers in the world meet at MIT
1954 IBM/Georgetown demo of Russian-English MT
1955-65 Lots of labs take up MT

Page 12: CS60057 Speech & Natural Language Processing

History of MT: Pessimism

1959/1960: Bar-Hillel, "Report on the state of MT in US and GB"
Argued that FAHQT is too hard (semantic ambiguity, etc.); we should work on semi-automatic instead of automatic translation
His argument:

Little John was looking for his toy box. Finally, he found it. The box was in the pen. John was very happy.

Only human knowledge lets us know that 'playpens' are bigger than boxes, but 'writing pens' are smaller

His claim: we would have to encode all of human knowledge

Page 13: CS60057 Speech & Natural Language Processing

History of MT: Pessimism

The ALPAC report (1966), headed by John R. Pierce of Bell Labs
Conclusions:
The supply of human translators exceeds demand
All the Soviet literature is already being translated
MT has been a failure: all current MT output had to be post-edited
Sponsored evaluations showed that MT intelligibility and informativeness were worse than human translations
Results:
MT research suffered: funding was lost, the number of research labs declined, and the Association for Machine Translation and Computational Linguistics dropped MT from its name

Page 14: CS60057 Speech & Natural Language Processing

History of MT

1976 Météo: weather forecasts translated from English to French
Systran (Babelfish) has been in use for 40 years
1970s: European focus in MT; mainly ignored in the US
1980s: ideas of using AI techniques in MT (KBMT, CMU)
1990s: commercial MT systems, statistical MT, speech-to-speech translation

Page 15: CS60057 Speech & Natural Language Processing

Language Similarities and Divergences

Some aspects of human language are universal or near-universal, others diverge greatly.

Typology: the study of systematic cross-linguistic similarities and differences

What are the dimensions along which human languages vary?

Page 16: CS60057 Speech & Natural Language Processing

Morphological Variation

Isolating languages (Cantonese, Vietnamese): each word generally has one morpheme

vs. polysynthetic languages (Siberian Yupik, 'Eskimo'): a single word may have very many morphemes

Agglutinative languages (Turkish): morphemes have clean boundaries

vs. fusion languages (Russian): a single affix may conflate many morphemes

Page 17: CS60057 Speech & Natural Language Processing

Syntactic Variation

SVO (Subject-Verb-Object) languages: English, German, French, Mandarin

SOV languages: Japanese, Hindi

VSO languages: Irish, Classical Arabic

SVO languages generally have prepositions: 'to Yuriko'
SOV languages generally have postpositions: 'Yuriko ni'

Page 18: CS60057 Speech & Natural Language Processing

Segmentation Variation

Not every writing system has word boundaries marked Chinese, Japanese, Thai, Vietnamese

Some languages tend to have sentences that are quite long, closer to English paragraphs than sentences: Modern Standard Arabic, Chinese

Page 19: CS60057 Speech & Natural Language Processing

Inferential Load: cold vs. hot lgs

Some 'cold' languages require the hearer to do more "figuring out" of who the various actors in the various events are: Japanese, Chinese

Other 'hot' languages are pretty explicit about saying who did what to whom: English

Page 20: CS60057 Speech & Natural Language Processing

Inferential Load (2)

All noun phrases in blue do not appear in the Chinese text … but they are needed for a good translation

Page 21: CS60057 Speech & Natural Language Processing

Lexical Divergences

Word to phrases: English "computer science" = French "informatique"

POS divergences:
Eng. 'she likes/VERB to sing' = Ger. 'Sie singt gerne/ADV'
Eng. 'I'm hungry/ADJ' = Sp. 'tengo hambre/NOUN'

Page 22: CS60057 Speech & Natural Language Processing

Lexical Divergences: Specificity

Grammatical constraints:
English has gender on pronouns, Mandarin does not. So when translating a third-person pronoun from Chinese to English, we need to figure out the gender of the person! Similarly from English "they" to French "ils/elles".

Semantic constraints:
English 'brother': Mandarin 'gege' (older) versus 'didi' (younger)
English 'wall': German 'Wand' (inside) versus 'Mauer' (outside)
German 'Berg': English 'hill' or 'mountain'

Page 23: CS60057 Speech & Natural Language Processing

Lexical Divergence: many-to-many

Page 24: CS60057 Speech & Natural Language Processing

Lexical Divergence: lexical gaps

Japanese: no word for 'privacy'
English: no word for Cantonese 'haauseun' or Japanese 'oyakoko' (something like 'filial piety')
English distinguishes 'cow' versus 'beef'; Cantonese 'ngau' covers both

Page 25: CS60057 Speech & Natural Language Processing

Event-to-argument divergences

English: The bottle floated out.
Spanish: La botella salió flotando. ('The bottle exited floating')

Verb-framed languages mark the direction of motion on the verb: Spanish, French, Arabic, Hebrew, Japanese, Tamil; the Polynesian, Mayan, and Bantu families

Satellite-framed languages mark the direction of motion on the satellite: crawl out, float off, jump down, walk over to, run after; the rest of Indo-European, Hungarian, Finnish, Chinese

Page 26: CS60057 Speech & Natural Language Processing

Structural divergences

G: Wir treffen uns am Mittwoch (lit. 'We meet ourselves on Wednesday')
E: We'll meet on Wednesday

Page 27: CS60057 Speech & Natural Language Processing

Head Swapping

E: X swim across Y
S: X cruzar Y nadando (lit. 'X cross Y swimming')

E: I like to eat
G: Ich esse gern (lit. 'I eat gladly')

E: I'd prefer vanilla
G: Mir wäre Vanille lieber (lit. 'To me vanilla would be dearer')

Page 28: CS60057 Speech & Natural Language Processing

Thematic divergence

S: Me gusta Y
E: I like Y (lit. 'Y pleases me')

G: Mir fällt der Termin ein
E: I remember the date (lit. 'To me occurs the date')

Page 29: CS60057 Speech & Natural Language Processing

Divergence counts from Bonnie Dorr: divergences occur in 32% of sentences in a UN Spanish/English corpus (5K sentences)

Categorial: X tener hambre / Y have hunger (98%)
Conflational: X dar puñaladas a Z / X stab Z (83%)
Structural: X entrar en Y / X enter Y (35%)
Head swapping: X cruzar Y nadando / X swim across Y (8%)
Thematic: X gustar a Y / Y likes X (6%)

Page 31: CS60057 Speech & Natural Language Processing

3 methods for MT

Direct
Transfer
Interlingua

Page 32: CS60057 Speech & Natural Language Processing

Three MT Approaches: Direct, Transfer, Interlingual

Page 33: CS60057 Speech & Natural Language Processing

Direct Translation

Proceed word-by-word through the text, translating each word
No intermediate structures except morphology
Knowledge is in the form of a huge bilingual dictionary with word-to-word translation information
After word translation, simple reordering can be done, e.g., adjective ordering English -> French/Spanish (a toy sketch follows)
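A toy sketch of the direct strategy, to make "dictionary lookup plus local reordering" concrete. The dictionary entries and the adjective list are invented for illustration; all new code in this transcript is Python.

# Direct MT sketch: word-by-word dictionary lookup, then local reordering.
# The tiny English->Spanish dictionary is invented for illustration.
BILINGUAL_DICT = {"the": "el", "green": "verde", "witch": "bruja"}
TARGET_ADJECTIVES = {"verde"}

def direct_translate(sentence):
    # 1. Word-by-word lookup, no syntactic analysis.
    out = [BILINGUAL_DICT.get(w, w) for w in sentence.split()]
    # 2. Simple reordering rule: Adjective Noun -> Noun Adjective.
    i = 0
    while i < len(out) - 1:
        if out[i] in TARGET_ADJECTIVES:
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return " ".join(out)

print(direct_translate("the green witch"))  # "el bruja verde"
# Note the gender-agreement error ("el" should be "la"): exactly the kind
# of problem a word-by-word system cannot fix.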

Page 34: CS60057 Speech & Natural Language Processing

Direct MT Dictionary entry

Page 35: CS60057 Speech & Natural Language Processing

Direct MT

Page 36: CS60057 Speech & Natural Language Processing

Problems with direct MT

(The German and Chinese examples were shown as figures, not reproduced in this transcript.)

Page 37: CS60057 Speech & Natural Language Processing

The Transfer Model

Idea: apply contrastive knowledge, i.e., knowledge about the differences between the two languages

Steps:
Analysis: syntactically parse the source language
Transfer: apply rules to turn this parse into a parse for the target language
Generation: generate the target sentence from the parse tree

Page 38: CS60057 Speech & Natural Language Processing

English to French

Generally English: Adjective Noun; French: Noun Adjective
Note: not always true:
route mauvaise 'bad road, badly-paved road'
mauvaise route 'wrong road'
But it is a reasonable first approximation

Rule: Nominal -> Adj Noun (English) becomes Nominal -> Noun Adj (French); a sketch of applying it follows
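A hedged sketch of how such a transfer rule might be applied to a parse tree. The (label, children) tuple representation is an assumption of this illustration, not the lecture's notation:

# Sketch: applying the Adj Noun -> Noun Adj transfer rule to a parse tree.
# Trees are (label, children) tuples; leaves are (POS, word) pairs.
def transfer(node):
    label, children = node
    if isinstance(children, str):      # leaf: (POS, word)
        return node
    children = [transfer(c) for c in children]
    # If a Nominal expands to Adj Noun, swap to Noun Adj for French.
    if label == "Nominal" and [c[0] for c in children] == ["Adj", "Noun"]:
        children = [children[1], children[0]]
    return (label, children)

tree = ("NP", [("Det", "the"),
               ("Nominal", [("Adj", "green"), ("Noun", "witch")])])
print(transfer(tree))
# ('NP', [('Det', 'the'), ('Nominal', [('Noun', 'witch'), ('Adj', 'green')])])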

Page 39: CS60057 Speech & Natural Language Processing

Transfer rules

Page 40: CS60057 Speech & Natural Language Processing

Lexical transfer

Transfer-based systems also need lexical transfer rules: a bilingual dictionary (as for direct MT)

English 'home' has several German translations:
nach Hause (going home)
Heim (home game)
Heimat (homeland, home country)
zu Hause (at home)

Can list "at home <-> zu Hause", or do word sense disambiguation

Page 41: CS60057 Speech & Natural Language Processing

Systran: combining direct and transfer

Analysis: morphological analysis, POS tagging; chunking of NPs, PPs, and other phrases; shallow dependency parsing

Transfer: translation of idioms; word sense disambiguation; assigning prepositions based on governing verbs

Synthesis: apply a rich bilingual dictionary; deal with reordering; morphological generation

Page 42: CS60057 Speech & Natural Language Processing

Transfer: some problems

N² sets of transfer rules (one per language pair)! Grammar and lexicon full of language-specific material; hard to build, hard to maintain

Page 43: CS60057 Speech & Natural Language Processing

Interlingua

Intuition: instead of language-to-language knowledge rules, use the meaning of the sentence to help

Steps:
1) Translate the source sentence into a meaning representation
2) Generate the target sentence from the meaning

Page 44: CS60057 Speech & Natural Language Processing

Interlingua for Mary did not slap the green witch

Page 45: CS60057 Speech & Natural Language Processing

Interlingua

The idea is that some of the MT work we need to do is shared with other NLP tasks

E.g., disambiguating English 'book' = Spanish 'libro' from English 'book' = Spanish 'reservar'

So we could have concepts like BOOKVOLUME and RESERVE and solve this disambiguation problem once for each language

Page 46: CS60057 Speech & Natural Language Processing

Direct MT: pros and cons (Bonnie Dorr)

Pros: fast; simple; cheap; no translation rules hidden in the lexicon

Cons: unreliable; not powerful; rule proliferation; requires lots of context; major restructuring needed after lexical substitution

Page 47: CS60057 Speech & Natural Language Processing

Interlingual MT: pros and cons (B. Dorr)

Pros: avoids the N² problem; easier to write rules

Cons: semantics is HARD; useful information is lost (paraphrase)

Page 48: CS60057 Speech & Natural Language Processing

The impossibility of translation

Hebrew "adonoi roi" (Psalm 23, 'The Lord is my shepherd') for a culture without sheep or shepherds

Something fluent and understandable, but not faithful:
"The Lord will look after me"

Something faithful, but not fluent and natural:
"The Lord is for me like somebody who looks after animals with cotton-like hair"

Page 49: CS60057 Speech & Natural Language Processing

What makes a good translation

Translators often talk about two factors we want to maximize:

Faithfulness or fidelity: how close the meaning of the translation is to the meaning of the original (even better: does the translation cause the reader to draw the same inferences as the original would have?)

Fluency or naturalness: how natural the translation is, considering only its fluency in the target language

Page 50: CS60057 Speech & Natural Language Processing

Statistical MT Systems

[Pipeline diagram: Spanish -> Broken English -> English. Statistical analysis of Spanish/English bilingual text gives the translation step; statistical analysis of English text gives the language-modeling step.

Example: "Que hambre tengo yo" -> candidates "What hunger have I", "Hungry I am so", "I am so hungry", "Have I that hunger", … -> "I am so hungry"]

Slide from Kevin Knight

Page 51: CS60057 Speech & Natural Language Processing

Statistical MT Systems

[The same pipeline with its components named: the Translation Model P(s|e) is learned from Spanish/English bilingual text; the Language Model P(e) is learned from English text; the decoding algorithm computes argmax_e P(e) * P(s|e).

Example: "Que hambre tengo yo" -> "I am so hungry"]

Slide from Kevin Knight

Page 52: CS60057 Speech & Natural Language Processing

Statistical MT: Faithfulness and Fluency formalized!

The best translation T-hat of a source sentence S:

T-hat = argmax_T fluency(T) * faithfulness(T, S)

Developed by researchers who were originally in speech recognition at IBM; called the IBM model

Page 53: CS60057 Speech & Natural Language Processing

Three Problems for Statistical MT

Language model: given an English string e, assigns P(e) by formula; a good English string gets high P(e), a random word sequence gets low P(e)

Translation model: given a pair of strings <f,e>, assigns P(f | e) by formula; if <f,e> look like translations, high P(f | e); if they don't, low P(f | e)

Decoding algorithm: given a language model, a translation model, and a new sentence f, find the translation e maximizing P(e) * P(f | e)

Slide from Kevin Knight

Page 54: CS60057 Speech & Natural Language Processing

The IBM model

Hmm, those two factors might look familiar…

Yup, it's Bayes' rule:

T-hat = argmax_T fluency(T) * faithfulness(T, S)
T-hat = argmax_T P(T) * P(S | T)

Page 55: CS60057 Speech & Natural Language Processing

More formally

Assume we are translating from a foreign-language sentence F to an English sentence E: F = f_1, f_2, f_3, …, f_m

We want to find the best English sentence E-hat = e_1, e_2, e_3, …, e_n:

E-hat = argmax_E P(E|F) = argmax_E P(F|E) P(E) / P(F) = argmax_E P(F|E) P(E)

where P(F|E) is the Translation Model and P(E) is the Language Model
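A minimal sketch of this argmax over an explicit candidate list, assuming we already have a language model lm(e) and a translation model tm(f, e) that return log probabilities (both function names are placeholders for this illustration):

def best_translation(f, candidates, lm, tm):
    # Noisy-channel objective: argmax_e log P(e) + log P(f | e).
    # P(f) is the same for every candidate, so it can be ignored.
    return max(candidates, key=lambda e: lm(e) + tm(f, e))

A real decoder builds candidates incrementally instead of enumerating them, but the quantity being maximized is exactly this sum of log probabilities.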

Page 56: CS60057 Speech & Natural Language Processing

The noisy channel model for MT

Page 57: CS60057 Speech & Natural Language Processing

Fluency: P(T)

How do we measure that this sentence:
That car was almost crash onto me
is less fluent than this one:
That car almost hit me.

Answer: language models (N-grams!). For example, P(hit | almost) > P(was | almost)

But we can use any other, more sophisticated model of grammar

Advantage: this is monolingual knowledge!

Page 58: CS60057 Speech & Natural Language Processing

Faithfulness: P(S|T)

French: ça me plaît [that me pleases]
English: that pleases me (most fluent), I like it, I'll take that one

How do we quantify this?
Intuition: the degree to which words in one sentence are plausible translations of words in the other sentence
Product of the probabilities that each word in the target sentence would generate each word in the source sentence

Page 59: CS60057 Speech & Natural Language Processing

Faithfulness P(S|T)

We need to know, for every target-language word, the probability of it mapping to every source-language word

How do we learn these probabilities? Parallel texts!
Often we have two texts that are translations of each other
If we knew which word in the source text mapped to each word in the target text, we could just count!

Page 60: CS60057 Speech & Natural Language Processing

Faithfulness P(S|T)

Sentence alignment: figuring out which source-language sentence maps to which target-language sentence

Word alignment: figuring out which source-language word maps to which target-language word

Page 61: CS60057 Speech & Natural Language Processing

Big Point about Faithfulness and Fluency

The job of the faithfulness model P(S|T) is just to model a "bag of words": which words map from, say, English to Spanish

P(S|T) doesn't have to worry about internal facts about Spanish word order: that's the job of P(T)

P(T) can do bag generation: put the following words in order (from Kevin Knight):
have programming a seen never I language better
actual the hashing is since not collision-free usually the is less perfectly the of somewhat capacity table

Page 62: CS60057 Speech & Natural Language Processing

P(T) and bag generation: the answer

“Usually the actual capacity of the table is somewhat less, since the hashing is not collision-free”

How about: loves Mary John

Page 63: CS60057 Speech & Natural Language Processing

Three Problems for Statistical MT

Language model: given an English string e, assigns P(e) by formula; a good English string gets high P(e), a random word sequence gets low P(e)

Translation model: given a pair of strings <f,e>, assigns P(f | e) by formula; if <f,e> look like translations, high P(f | e); if they don't, low P(f | e)

Decoding algorithm: given a language model, a translation model, and a new sentence f, find the translation e maximizing P(e) * P(f | e)

Slide from Kevin Knight

Page 64: CS60057 Speech & Natural Language Processing

The Classic Language Model: Word N-Grams

Goal of the language model: choose among alternatives:

He is on the soccer field
He is in the soccer field

Is table the on cup the
The cup is on the table

Rice shrine
American shrine
Rice company
American company

Slide from Kevin Knight
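A minimal bigram language model sketch (maximum-likelihood counts, no smoothing) to make "choose among alternatives" concrete; the two-sentence training corpus is invented for illustration:

from collections import Counter
import math

def train_bigram_lm(sentences):
    # Count histories and bigrams over boundary-padded sentences.
    uni, bi = Counter(), Counter()
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]
        uni.update(words[:-1])
        bi.update(zip(words[:-1], words[1:]))
    return uni, bi

def logprob(sentence, uni, bi):
    words = ["<s>"] + sentence.split() + ["</s>"]
    lp = 0.0
    for w1, w2 in zip(words[:-1], words[1:]):
        if bi[(w1, w2)] == 0:
            return float("-inf")   # unseen bigram; a real LM would smooth
        lp += math.log(bi[(w1, w2)] / uni[w1])
    return lp

uni, bi = train_bigram_lm(["the cup is on the table",
                           "he is on the soccer field"])
print(logprob("the cup is on the table", uni, bi))   # finite log probability
print(logprob("is table the on cup the", uni, bi))   # -inf: word salad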

Page 65: CS60057 Speech & Natural Language Processing

Intuition of phrase-based translation (Koehn et al. 2003)

The generative story has three steps:
1) Group words into phrases
2) Translate each phrase
3) Move the phrases around

Page 66: CS60057 Speech & Natural Language Processing

Generative story again

1) Group English source words into phrases e_1, e_2, …, e_n

2) Translate each English phrase e_i into a Spanish phrase f_j; the probability of doing this is the phrase translation probability φ(f_j | e_i)

3) Then (optionally) reorder each Spanish phrase. We do this with a distortion probability: a measure of the distance between the positions of a corresponding phrase in the two languages. "What is the probability that a phrase at position X in the English sentence moves to position Y in the Spanish sentence?"

Page 67: CS60057 Speech & Natural Language Processing

Distortion probability

The distortion probability is parameterized by a_i - b_{i-1},
where a_i is the start position of the foreign (Spanish) phrase generated by the i-th English phrase e_i,
and b_{i-1} is the end position of the foreign (Spanish) phrase generated by the (i-1)-th English phrase e_{i-1}.

We'll call the distortion probability d(a_i - b_{i-1}), and we'll use a really stupid model:

d(a_i - b_{i-1}) = α^|a_i - b_{i-1}|

where α is some small constant.

Page 68: CS60057 Speech & Natural Language Processing

Final translation model for phrase-based MT

Let's look at a simple example with no distortion:

P(F | E) = Π_{i=1..l} φ(f_i, e_i) d(a_i - b_{i-1})

where f_i and e_i here range over the l phrase pairs.
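A small sketch of scoring one segmented translation under this model. The phrase table entries and the value of α are invented for illustration:

import math

ALPHA = 0.5   # distortion constant (illustrative value)

# Toy phrase table phi(spanish_phrase, english_phrase): invented numbers.
PHI = {("Maria", "Mary"): 0.9,
       ("no", "did not"): 0.8,
       ("dio una bofetada", "slap"): 0.7}

def phrase_logscore(pairs, spans):
    # pairs: list of (spanish_phrase, english_phrase) in English order;
    # spans: (start, end) positions of each Spanish phrase in the output.
    logp, prev_end = 0.0, 0
    for (f, e), (start, end) in zip(pairs, spans):
        logp += math.log(PHI[(f, e)])                    # phi(f_i, e_i)
        logp += abs(start - prev_end) * math.log(ALPHA)  # d = ALPHA ** |a_i - b_{i-1}|
        prev_end = end
    return logp

pairs = [("Maria", "Mary"), ("no", "did not"), ("dio una bofetada", "slap")]
print(phrase_logscore(pairs, [(1, 1), (2, 2), (3, 5)]))   # monotone order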

Page 69: CS60057 Speech & Natural Language Processing

Phrase-based MT

Language model: P(E)
Translation model: P(F|E): the model itself, and how to train it
Decoder: finding the sentence E that is most probable

Page 70: CS60057 Speech & Natural Language Processing

Training P(F|E)

What we mainly need to train is the phrase translation probability φ(f_j | e_i)

Suppose we had a large bilingual training corpus (a bitext), in which each English sentence is paired with a Spanish sentence

And suppose we knew exactly which phrase in the Spanish was the translation of which phrase in the English; we call this a phrase alignment

If we had this, we could just count and divide:

φ(f, e) = count(f, e) / Σ_f' count(f', e)
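A sketch of that count-and-divide estimate, assuming we somehow already had phrase-aligned pairs (the three pairs below are invented):

from collections import Counter

def train_phi(phrase_pairs):
    # phrase_pairs: (spanish_phrase, english_phrase) tuples observed
    # in a phrase-aligned bitext. Returns phi(f | e) as a dict.
    pair_counts, e_counts = Counter(), Counter()
    for f, e in phrase_pairs:
        pair_counts[(f, e)] += 1
        e_counts[e] += 1
    return {(f, e): c / e_counts[e] for (f, e), c in pair_counts.items()}

aligned = [("la bruja verde", "the green witch"),
           ("la bruja verde", "the green witch"),
           ("la bruja", "the witch")]
print(train_phi(aligned))   # relative frequencies per English phrase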

Page 71: CS60057 Speech & Natural Language Processing

But we don’t have phrase alignments

What we have instead are word alignments:

Page 72: CS60057 Speech & Natural Language Processing

Getting phrase alignments

To get phrase alignments:
1) We first get word alignments
2) Then we "symmetrize" the word alignments into phrase alignments

Page 73: CS60057 Speech & Natural Language Processing

How to get Word Alignments

Word alignment: a mapping between the source words and the target words in a set of parallel sentences

Restriction: each foreign word comes from exactly one English word

Advantage: we can represent an alignment by the index of the English word that each French word comes from

The alignment in the figure (not reproduced here) is thus 2,3,4,5,6,6,6

Page 74: CS60057 Speech & Natural Language Processing

One addition: spurious words

A word in the foreign sentence that doesn't align with any word in the English sentence is called a spurious word. We model these by pretending they are generated by a special English word e_0 (NULL):

Page 75: CS60057 Speech & Natural Language Processing

More sophisticated models of alignment

Page 76: CS60057 Speech & Natural Language Processing

Computing word alignments: IBM Model 1

For phrase-based machine translation, we want a word alignment in order to extract a set of phrases

A word-alignment algorithm gives us P(F,E); we want this to train our phrase probabilities φ(f_j | e_i) as part of P(F|E)

But a word-alignment algorithm can also be part of a mini translation model itself.

Page 77: CS60057 Speech & Natural Language Processing

IBM Model 1

Page 78: CS60057 Speech & Natural Language Processing

IBM Model 1

Page 79: CS60057 Speech & Natural Language Processing

How does the generative story assign P(F|E) for a Spanish sentence F?

Terminology:

Suppose we had done steps 1 and 2, i.e., we already knew the Spanish length J and the alignment A (and the English source E):

Page 80: CS60057 Speech & Natural Language Processing

Let’s formalize steps 1 and 2

We want P(A|E): the probability of an alignment A (of length J) given an English sentence E

IBM Model 1 makes the (very) simplifying assumption that each alignment is equally likely

How many possible alignments are there between an English sentence of length I and a Spanish sentence of length J?
Hint: each Spanish word must come from one of the English source words (or the NULL word)

Answer: (I+1)^J

Let's assume the probability of choosing length J is a small constant ε

Page 81: CS60057 Speech & Natural Language Processing

Model 1 continued

Probability of choosing a length and then one of the possible alignments:

P(A | E) = ε / (I+1)^J

Combining with step 3, generating each Spanish word from its aligned English word:

P(F, A | E) = ε / (I+1)^J * Π_{j=1..J} t(f_j | e_{a_j})

The total probability of a given foreign sentence F:

P(F | E) = Σ_A P(F, A | E)
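A small sketch of these Model 1 quantities, assuming a translation table t(f | e) is given (the two entries below are invented). It uses the standard rearrangement of the sum over all (I+1)^J alignments into a per-word product of sums:

def model1_pfe(f_words, e_words, t, epsilon=1.0):
    # P(F|E) = eps/(I+1)^J * prod_j sum_i t(f_j | e_i), with e_0 = NULL.
    es = ["NULL"] + e_words
    I, J = len(e_words), len(f_words)
    prob = epsilon / (I + 1) ** J
    for f in f_words:
        prob *= sum(t.get((f, e), 0.0) for e in es)
    return prob

t = {("verde", "green"): 0.8, ("bruja", "witch"): 0.9}   # invented values
print(model1_pfe(["bruja", "verde"], ["green", "witch"], t))   # 0.72/9 = 0.08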

Page 82: CS60057 Speech & Natural Language Processing

Decoding

How do we find the best alignment A? In Model 1 the Spanish words are generated independently, so each word can pick its best English source separately: a_j = argmax_i t(f_j | e_i)

Page 83: CS60057 Speech & Natural Language Processing

Training alignment probabilities

Step 1: get a parallel corpus, e.g., the Hansards (Canadian parliamentary proceedings, in French and English) or the Hong Kong Hansards (English and Chinese)

Step 2: sentence alignment

Step 3: use EM (Expectation-Maximization) to train word alignments

Page 84: CS60057 Speech & Natural Language Processing

Step 1: Parallel corpora

English: Diverging opinions about planned tax reform
German: Unterschiedliche Meinungen zur geplanten Steuerreform

English: The discussion around the envisaged major tax reform continues .
German: Die Diskussion um die vorgesehene grosse Steuerreform dauert an .

English: The FDP economics expert , Graf Lambsdorff , today came out in favor of advancing the enactment of significant parts of the overhaul , currently planned for 1999 .
German: Der FDP - Wirtschaftsexperte Graf Lambsdorff sprach sich heute dafuer aus , wesentliche Teile der fuer 1999 geplanten Reform vorzuziehen .

Example from DE-News (8/1/1996)

Slide from Christof Monz

Page 85: CS60057 Speech & Natural Language Processing

Step 2: Sentence Alignment

English: The old man is happy. He has fished many times. His wife talks to him. The fish are jumping. The sharks await.

Spanish: El viejo está feliz porque ha pescado muchos veces. Su mujer habla con él. Los tiburones esperan.

Intuition:
- use length in words or characters, together with dynamic programming
- or use a simpler MT model

(A toy dynamic-programming sketch follows.)

Slide from Kevin Knight
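A toy sketch of length-based sentence alignment with dynamic programming. The cost of a bead here is just the difference in character counts; the real Gale & Church method uses a probabilistic model of the length ratio, so this is a deliberate simplification:

def align_sentences(src, tgt):
    # DP over alignment "beads": 1-1, 2-1, and 1-2 sentence groups.
    n, m = len(src), len(tgt)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == INF:
                continue
            for di, dj in ((1, 1), (2, 1), (1, 2)):
                if i + di <= n and j + dj <= m:
                    # Bead cost: mismatch in total character length.
                    c = abs(sum(map(len, src[i:i + di]))
                            - sum(map(len, tgt[j:j + dj])))
                    if cost[i][j] + c < cost[i + di][j + dj]:
                        cost[i + di][j + dj] = cost[i][j] + c
                        back[i + di][j + dj] = (i, j)
    beads, ij = [], (n, m)
    while ij != (0, 0):          # trace back the best path of beads
        pi, pj = back[ij[0]][ij[1]]
        beads.append((src[pi:ij[0]], tgt[pj:ij[1]]))
        ij = (pi, pj)
    return beads[::-1]

Handling sentences that should be thrown out entirely, as on the next slides, would additionally need 1-0 and 0-1 beads with their own cost; they are omitted here for brevity.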

Page 86: CS60057 Speech & Natural Language Processing

Sentence Alignment

1. The old man is happy.
2. He has fished many times.
3. His wife talks to him.
4. The fish are jumping.
5. The sharks await.

El viejo está feliz porque ha pescado muchos veces. Su mujer habla con él. Los tiburones esperan.

Slide from Kevin Knight

Page 87: CS60057 Speech & Natural Language Processing

Sentence Alignment

1. The old man is happy.

2. He has fished many times.

3. His wife talks to him.

4. The fish are jumping.

5. The sharks await.

El viejo está feliz porque ha pescado muchos veces.

Su mujer habla con él.

Los tiburones esperan.

Slide from Kevin Knight

Page 88: CS60057 Speech & Natural Language Processing

Sentence Alignment

1. The old man is happy. He has fished many times.

2. His wife talks to him.

3. The sharks await.

El viejo está feliz porque ha pescado muchos veces.

Su mujer habla con él.

Los tiburones esperan.

Note that unaligned sentences are thrown out, and sentences are merged in n-to-m alignments (n, m > 0).

Slide from Kevin Knight

Page 89: CS60057 Speech & Natural Language Processing

Step 3: word alignments

It turns out we can bootstrap alignments from a sentence-aligned bilingual corpus
The algorithm we use is Expectation-Maximization (EM)

Page 90: CS60057 Speech & Natural Language Processing

EM for training alignment probs

… la maison … la maison bleue … la fleur …

… the house … the blue house … the flower …

All word alignments equally likely

All P(french-word | english-word) equally likely

Slide from Kevin Knight

Page 91: CS60057 Speech & Natural Language Processing

EM for training alignment probs

… la maison … la maison bleue … la fleur …

… the house … the blue house … the flower …

“la” and “the” are observed to co-occur frequently, so P(la | the) is increased.

Slide from Kevin Knight

Page 92: CS60057 Speech & Natural Language Processing

EM for training alignment probs

… la maison … la maison bleue … la fleur …

… the house … the blue house … the flower …

“house” co-occurs with both “la” and “maison”, but P(maison | house) can be raised without limit, to 1.0, while P(la | house) is limited because of “the” (pigeonhole principle)

Slide from Kevin Knight

Page 93: CS60057 Speech & Natural Language Processing

EM for training alignment probs

… la maison … la maison bleue … la fleur …

… the house … the blue house … the flower …

settling down after another iteration

Slide from Kevin Knight

Page 94: CS60057 Speech & Natural Language Processing

EM for training alignment probs

… la maison … la maison bleue … la fleur …

… the house … the blue house … the flower …

Inherent hidden structure is revealed by EM training! For details, see:
• Section 24.6.1 in the chapter
• "A Statistical MT Tutorial Workbook" (Knight, 1999)
• "The Mathematics of Statistical Machine Translation" (Brown et al., 1993)
• Software: GIZA++

Slide from Kevin Knight
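A compact sketch of EM for the Model 1 translation table on exactly this kind of corpus. It starts from uniform t(f | e) and iterates expected-count collection and renormalization; the NULL word is omitted to keep it short:

from collections import defaultdict

def em_model1(bitext, iterations=10):
    # bitext: list of (foreign_words, english_words) pairs.
    f_vocab = {f for fs, _ in bitext for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))   # uniform initialization
    for _ in range(iterations):
        counts = defaultdict(float)   # expected count(f, e)
        totals = defaultdict(float)   # expected count(e)
        for fs, es in bitext:
            for f in fs:
                z = sum(t[(f, e)] for e in es)      # normalize over this sentence
                for e in es:
                    p = t[(f, e)] / z               # P(f aligned to e | sentence)
                    counts[(f, e)] += p
                    totals[e] += p
        t = defaultdict(float,
                        {(f, e): c / totals[e] for (f, e), c in counts.items()})
    return t

bitext = [("la maison".split(), "the house".split()),
          ("la maison bleue".split(), "the blue house".split()),
          ("la fleur".split(), "the flower".split())]
t = em_model1(bitext)
print(round(t[("la", "the")], 2), round(t[("maison", "house")], 2))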

Page 95: CS60057 Speech & Natural Language Processing

Statistical Machine Translation

… la maison … la maison bleue … la fleur …

… the house … the blue house … the flower …

P(juste | fair) = 0.411
P(juste | correct) = 0.027
P(juste | right) = 0.020
…

New French sentence -> possible English translations, to be rescored by the language model

Slide from Kevin Knight

Page 96: CS60057 Speech & Natural Language Processing

A more complex model: IBM Model 3 (Brown et al., 1993)

Generative approach:
Mary did not slap the green witch
Mary not slap slap slap the green witch (fertility: n(3 | slap))
Mary not slap slap slap NULL the green witch (NULL insertion: P-Null)
Maria no dió una bofetada a la verde bruja (translation: t(la | the))
Maria no dió una bofetada a la bruja verde (distortion: d(j | i))

Probabilities can be learned from raw bilingual text.

Page 97: CS60057 Speech & Natural Language Processing

How do we evaluate MT? Human tests for fluency

Rating tests: give the raters a scale (1 to 5) and ask them to rate, either overall or on distinct scales for clarity, naturalness, and style

Or check for specific problems:
Cohesion (lexical chains, anaphora, ellipsis): hand-checking for cohesion
Well-formedness: 5-point scale of syntactic correctness

Comprehensibility tests: noise test, multiple-choice questionnaire

Readability tests: cloze

Page 98: CS60057 Speech & Natural Language Processing

How do we evaluate MT? Human tests for fidelity

Adequacy: does it convey the information in the original? Ask raters to rate on a scale:
Bilingual raters: give them the source and target sentences, ask how much information is preserved
Monolingual raters: give them the target plus a good human translation

Informativeness: task-based; is there enough information to do some task? Give raters multiple-choice questions about content

Page 99: CS60057 Speech & Natural Language Processing

Evaluating MT: Problems

Asking humans to judge sentences on a 5-point scale for 10 factors takes time and money (weeks or months!)

We can't build language-engineering systems if we can only evaluate them once every quarter!

We need a metric that we can run every time we change our algorithm.

It would be OK if it weren't perfect, as long as it tended to correlate with the expensive human metrics, which we could still run quarterly.

Bonnie Dorr

Page 100: CS60057 Speech & Natural Language Processing

Automatic evaluation (an idea going back to Miller and Beebe-Center, 1958)

Assume we have one or more human translations of the source passage
Compare the automatic translation to these human translations
Metrics: BLEU, NIST, METEOR, Precision/Recall

Page 101: CS60057 Speech & Natural Language Processing

BiLingual Evaluation Understudy (BLEU; Papineni, 2001)

An automatic technique, but it requires the pre-existence of human (reference) translations
Approach:
Produce a corpus of high-quality human translations
Judge "closeness" numerically (cf. word-error rate)
Compare n-gram matches between the candidate translation and 1 or more reference translations

http://www.research.ibm.com/people/k/kishore/RC22176.pdf

Slide from Bonnie Dorr

Page 102: CS60057 Speech & Natural Language Processing

Reference (human) translation: The U.S. island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself the Saudi Arabian Osama bin Laden and threatening a biological/chemical attack against public places such as the airport .

Machine translation: The American [?] international airport and its the office all receives one calls self the sand Arab rich business [?] and so on electronic mail , which sends out ; The threat will be able after public place and so on the airport to start the biochemistry attack , [?] highly alerts after the maintenance.

BLEU Evaluation Metric (Papineni et al., ACL-2002)

• N-gram precision (score is between 0 and 1)
– What percentage of machine n-grams can be found in the reference translation?
– An n-gram is a sequence of n words
– Not allowed to use the same portion of the reference translation twice (can't cheat by typing out "the the the the the")

• Brevity penalty
– Can't just type out the single word "the" (precision 1.0!)

*** Amazingly hard to “game” the system (i.e., find a way to change machine output so that BLEU goes up, but quality doesn’t)

Slide from Bonnie Dorr

Page 103: CS60057 Speech & Natural Language Processing

Reference (human) translation: The U.S. island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself the Saudi Arabian Osama bin Laden and threatening a biological/chemical attack against public places such as the airport .

Machine translation: The American [?] international airport and its the office all receives one calls self the sand Arab rich business [?] and so on electronic mail , which sends out ; The threat will be able after public place and so on the airport to start the biochemistry attack , [?] highly alerts after the maintenance.

BLEU Evaluation Metric (Papineni et al., ACL-2002)

• BLEU4 formula (counts n-grams up to length 4):

score = exp(1.0 * log p1 + 0.5 * log p2 + 0.25 * log p3 + 0.125 * log p4 - max(words-in-reference / words-in-machine - 1, 0))

where p1 = 1-gram precision, p2 = 2-gram precision, p3 = 3-gram precision, p4 = 4-gram precision

Page 104: CS60057 Speech & Natural Language Processing

Multiple Reference Translations

Reference translation 1: The U.S. island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself the Saudi Arabian Osama bin Laden and threatening a biological/chemical attack against public places such as the airport .

Reference translation 2: Guam International Airport and its offices are maintaining a high state of alert after receiving an e-mail that was from a person claiming to be the wealthy Saudi Arabian businessman Bin Laden and that threatened to launch a biological and chemical attack on the airport and other public places .

Reference translation 3: The US International Airport of Guam and its office has received an email from a self-claimed Arabian millionaire named Laden , which threatens to launch a biochemical attack on such public places as airport . Guam authority has been on alert .

Reference translation 4: US Guam International Airport and its office received an email from Mr. Bin Laden and other rich businessman from Saudi Arabia . They said there would be biochemistry air raid to Guam Airport and other public places . Guam needs to be in high precaution about this matter .

Machine translation: The American [?] international airport and its the office all receives one calls self the sand Arab rich business [?] and so on electronic mail , which sends out ; The threat will be able after public place and so on the airport to start the biochemistry attack , [?] highly alerts after the maintenance.

Slide from Bonnie Dorr

Page 105: CS60057 Speech & Natural Language Processing

Bleu Comparison

Chinese-English Translation Example:

Candidate 1: It is a guide to action which ensures that the military always obeys the commands of the party.
Candidate 2: It is to insure the troops forever hearing the activity guidebook that party direct.

Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.

Slide from Bonnie Dorr

Page 106: CS60057 Speech & Natural Language Processing

How Do We Compute Bleu Scores?

Intuition: "What percentage of words in the candidate occurred in some human translation?"

Proposal: count up the number of candidate translation words (unigrams) that occur in any reference translation, and divide by the total number of words in the candidate translation

But we can't just count the total number of overlapping n-grams!
Candidate: the the the the the the
Reference 1: The cat is on the mat

Solution: a reference word should be considered exhausted after a matching candidate word is identified.

Slide from Bonnie Dorr

Page 107: CS60057 Speech & Natural Language Processing

“Modified n-gram precision”

For each word, compute:
(1) the total number of times it occurs in any single reference translation
(2) the number of times it occurs in the candidate translation

Instead of using count (2) directly, use the minimum of (2) and (1), i.e., clip the counts at the maximum for any single reference translation

Now use that modified count, and divide by the number of candidate words.

Slide from Bonnie Dorr
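A compact sketch of modified n-gram precision and the BLEU combination. The weights and the brevity-penalty form follow the formula slide above (standard BLEU uses uniform 0.25 weights and the length of the closest reference, so both choices here are simplifications):

import math
from collections import Counter

def ngrams(words, n):
    return Counter(zip(*[words[i:] for i in range(n)]))

def modified_precision(candidate, references, n):
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    # Clip each candidate n-gram at its max count in any single reference.
    max_ref = Counter()
    for ref in references:
        for g, c in ngrams(ref, n).items():
            max_ref[g] = max(max_ref[g], c)
    return sum(min(c, max_ref[g]) for g, c in cand.items()) / sum(cand.values())

def bleu(candidate, references, weights=(1.0, 0.5, 0.25, 0.125)):
    ps = [modified_precision(candidate, references, n)
          for n in range(1, len(weights) + 1)]
    if 0.0 in ps:
        return 0.0
    log_score = sum(w * math.log(p) for w, p in zip(weights, ps))
    ref_len = min(len(r) for r in references)   # simplification (see lead-in)
    log_score -= max(ref_len / len(candidate) - 1, 0)   # brevity penalty
    return math.exp(log_score)

cand = "the the the the the the the".split()
refs = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
print(modified_precision(cand, refs, 1))   # 2/7, as in the cheating example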

Page 108: CS60057 Speech & Natural Language Processing

Modified Unigram Precision: Candidate #1

Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.

It(1) is(1) a(1) guide(1) to(1) action(1) which(1) ensures(1) that(2) the(4) military(1) always(1) obeys(0) the commands(1) of(1) the party(1)

What’s the answer???

17/18

Slide from Bonnie Dorr

Page 109: CS60057 Speech & Natural Language Processing

Modified Unigram Precision: Candidate #2

It(1) is(1) to(1) insure(0) the(4) troops(0) forever(1) hearing(0) the activity(0) guidebook(0) that(2) party(1) direct(0)

What’s the answer????

8/14

Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.

Slide from Bonnie Dorr

Page 110: CS60057 Speech & Natural Language Processing

Modified Bigram Precision: Candidate #1

It is(1) is a(1) a guide(1) guide to(1) to action(1) action which(0) which ensures(0) ensures that(1) that the(1) the military(1) military always(0) always obeys(0) obeys the(0) the commands(0) commands of(0) of the(1) the party(1)

What’s the answer????

10/17

Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.

Slide from Bonnie Dorr

Page 111: CS60057 Speech & Natural Language Processing

Modified Bigram Precision: Candidate #2

Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.

It is(1) is to(0) to insure(0) insure the(0) the troops(0) troops forever(0) forever hearing(0) hearing the(0) the activity(0) activity guidebook(0) guidebook that(0) that party(0) party direct(0)

What’s the answer????

1/13

Slide from Bonnie Dorr

Page 112: CS60057 Speech & Natural Language Processing

Catching Cheaters

Reference 1: The cat is on the mat
Reference 2: There is a cat on the mat

Candidate: the the the the the the the
'the' is clipped at 2, its maximum count in any single reference

What's the unigram answer? 2/7
What's the bigram answer? 0/6 (the bigram "the the" never occurs in a reference)

Slide from Bonnie Dorr

Page 113: CS60057 Speech & Natural Language Processing

Bleu distinguishes human from machine translations

Slide from Bonnie Dorr

Page 114: CS60057 Speech & Natural Language Processing

Bleu problems with sentence length

Candidate: of the
Problem: its modified unigram precision is 2/2, bigram 1/1!

Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.

Solution: brevity penalty, which prefers candidate translations that are the same length as one of the references

Slide from Bonnie Dorr

Page 115: CS60057 Speech & Natural Language Processing

BLEU Tends to Predict Human Judgments

R² = 88.0% (Adequacy), R² = 90.2% (Fluency)

[Scatter plot: NIST score (a variant of BLEU) against human judgments of adequacy and fluency, both axes from -2.5 to 2.5, with linear fits for adequacy and fluency.]

slide from G. Doddington (NIST)

Page 116: CS60057 Speech & Natural Language Processing

Summary

Intro and a little history
Language Similarities and Divergences
Four main MT approaches: Transfer, Interlingua, Direct, Statistical
Evaluation

Page 117: CS60057 Speech & Natural Language Processing

Classes

LINGUIST 139M/239M. Human and Machine Translation. (Martin Kay)

CS 224N. Natural Language Processing (Chris Manning)