CS60057 Speech & Natural Language Processing

CS60057 Speech & Natural Language Processing, Autumn 2008, Lecture 16, 3 Sep 2008

Description

CS60057 Speech & Natural Language Processing. Autumn 2008. Lecture 16, 3 Sep 2008. Outline for MT: intro and a little history; language similarities and divergences; three classic MT approaches (transfer, interlingua, direct); modern statistical MT; evaluation. What is MT?

Transcript of CS60057 Speech & Natural Language Processing

Page 1: CS60057 Speech & Natural Language Processing

CS60057 Speech & Natural Language Processing

Autumn 2008

Lecture 16

3 Sep 2008

Page 2: CS60057 Speech & Natural Language Processing

Outline for MT

Intro and a little history
Language Similarities and Divergences
Three classic MT approaches: Transfer, Interlingua, Direct
Modern Statistical MT
Evaluation

Page 3: CS60057 Speech & Natural Language Processing

What is MT?

Translating a text from one language to another automatically.

Page 4: CS60057 Speech & Natural Language Processing

Machine Translation

Chinese (pinyin): dai yu zi zai chuang shang gan nian bao chai you ting jian chuang wai zhu shao xiang ye zhe shang, yu sheng xi li, qing han tou mu, bu jue you di xia lei lai.

Word-for-word gloss: Dai-yu alone on bed top think-of-with-gratitude Bao-chai again listen to window outside bamboo tip plantain leaf of on-top rain sound sigh drop clear cold penetrate curtain not feeling again fall down tears come

English translation: As she lay there alone, Dai-yu's thoughts turned to Bao-chai… Then she listened to the insistent rustle of the rain on the bamboos and plantains outside her window. The coldness penetrated the curtains of her bed. Almost without noticing it she had begun to cry.

Page 5: CS60057 Speech & Natural Language Processing

Machine Translation

Page 6: CS60057 Speech & Natural Language Processing

Machine Translation

The Story of the Stone = The Dream of the Red Chamber (Cao Xueqin, 1792)

Issues:
Word segmentation
Sentence segmentation: 4 English sentences to 1 Chinese
Grammatical differences: Chinese rarely marks tense ('As', 'turned to', 'had begun'; tou -> 'penetrated')
Zero anaphora
No articles
Stylistic and cultural differences: bamboo tip plantain leaf -> 'bamboos and plantains'; ma 'curtain' -> 'curtains of her bed'; rain sound sigh drop -> 'insistent rustle of the rain'

Page 7: CS60057 Speech & Natural Language Processing

Not just literature

Hansards: Canadian parliamentary proceedings

Page 8: CS60057 Speech & Natural Language Processing

What is MT not good for?

Really hard stuff: literature, natural spoken speech (meetings, court reporting)

Really important stuff: medical translation in hospitals, 911 calls

Page 9: CS60057 Speech & Natural Language Processing

What is MT good for?

Tasks for which a rough translation is fine: web pages, email

Tasks for which MT can be post-edited: MT as a first pass, "computer-aided human translation"

Tasks in sublanguage domains where high-quality MT is possible: FAHQT (fully automatic high-quality translation)

Page 10: CS60057 Speech & Natural Language Processing

Sublanguage domain

Weather forecasting: "Cloudy with a chance of showers today and Thursday", "Low tonight 4"

Such a domain can be modeled completely enough to use raw MT output, with word classes and semantic features like MONTH, PLACE, DIRECTION, TIME POINT

Page 11: CS60057 Speech & Natural Language Processing

MT History

1946 Booth and Weaver discuss MT at the Rockefeller Foundation in New York
1947-48 Idea of dictionary-based direct translation
1949 Weaver memorandum popularized the idea
1952 All 18 MT researchers in the world meet at MIT
1954 IBM/Georgetown demo of Russian-English MT
1955-65 Lots of labs take up MT

Page 12: CS60057 Speech & Natural Language Processing

History of MT: Pessimism

1959/1960: Bar-Hillel, "Report on the state of MT in US and GB"
Argued that FAHQT is too hard (semantic ambiguity, etc.); we should work on semi-automatic instead of automatic translation
His argument:

Little John was looking for his toy box. Finally, he found it. The box was in the pen. John was very happy.

Only human knowledge lets us know that 'playpens' are bigger than boxes, but 'writing pens' are smaller

His claim: we would have to encode all of human knowledge

Page 13: CS60057 Speech & Natural Language Processing

History of MT: Pessimism

The ALPAC report (1966), headed by John R. Pierce of Bell Labs
Conclusions:
The supply of human translators exceeds demand
All the Soviet literature is already being translated
MT has been a failure: all current MT output had to be post-edited
Sponsored evaluations showed that MT intelligibility and informativeness were worse than human translations
Results:
MT research suffered: funding was lost, the number of research labs declined, and the Association for Machine Translation and Computational Linguistics dropped MT from its name

Page 14: CS60057 Speech & Natural Language Processing

History of MT

1976 Météo: weather forecasts translated from English to French
Systran (Babelfish) has been in use for 40 years
1970s: European focus in MT; mainly ignored in the US
1980s: ideas of using AI techniques in MT (KBMT, CMU)
1990s: commercial MT systems, statistical MT, speech-to-speech translation

Page 15: CS60057 Speech & Natural Language Processing

Language Similarities and Divergences

Some aspects of human language are universal or near-universal, others diverge greatly.

Typology: the study of systematic cross-linguistic similarities and differences

What are the dimensions along which human languages vary?

Page 16: CS60057 Speech & Natural Language Processing

Morphological Variation

Isolating languages (Cantonese, Vietnamese): each word generally has one morpheme

vs. polysynthetic languages (Siberian Yupik, 'Eskimo'): a single word may have very many morphemes

Agglutinative languages (Turkish): morphemes have clean boundaries

vs. fusion languages (Russian): a single affix may conflate many morphemes

Page 17: CS60057 Speech & Natural Language Processing

Syntactic Variation

SVO (Subject-Verb-Object) languages: English, German, French, Mandarin

SOV languages: Japanese, Hindi

VSO languages: Irish, Classical Arabic

SVO languages generally have prepositions: 'to Yuriko'
SOV languages generally have postpositions: 'Yuriko ni'

Page 18: CS60057 Speech & Natural Language Processing

Segmentation Variation

Not every writing system has word boundaries marked Chinese, Japanese, Thai, Vietnamese

Some languages tend to have sentences that are quite long, closer to English paragraphs than sentences: Modern Standard Arabic, Chinese

Page 19: CS60057 Speech & Natural Language Processing

Inferential Load: cold vs. hot lgs

Some 'cold' languages require the hearer to do more "figuring out" of who the various actors in the various events are: Japanese, Chinese

Other 'hot' languages are pretty explicit about saying who did what to whom: English

Page 20: CS60057 Speech & Natural Language Processing

Inferential Load (2)

All noun phrases in blue do not appear in the Chinese text … but they are needed for a good translation

Page 21: CS60057 Speech & Natural Language Processing

Lexical Divergences

Word to phrases: English "computer science" = French "informatique"

POS divergences:
Eng. 'she likes/VERB to sing' = Ger. 'Sie singt gerne/ADV'
Eng. 'I'm hungry/ADJ' = Sp. 'tengo hambre/NOUN'

Page 22: CS60057 Speech & Natural Language Processing

Lexical Divergences: Specificity

Grammatical constraints:
English has gender on pronouns, Mandarin does not. So when translating a third-person pronoun from Chinese to English, we need to figure out the gender of the person! Similarly from English "they" to French "ils/elles".

Semantic constraints:
English 'brother': Mandarin 'gege' (older) versus 'didi' (younger)
English 'wall': German 'Wand' (inside) versus 'Mauer' (outside)
German 'Berg': English 'hill' or 'mountain'

Page 23: CS60057 Speech & Natural Language Processing

Lexical Divergence: many-to-many

Page 24: CS60057 Speech & Natural Language Processing

Lexical Divergence: lexical gaps

Japanese: no word for 'privacy'
English: no word for Cantonese 'haauseun' or Japanese 'oyakoko' (something like 'filial piety')
English distinguishes 'cow' versus 'beef'; Cantonese 'ngau' covers both

Page 25: CS60057 Speech & Natural Language Processing

Event-to-argument divergences

English: The bottle floated out.
Spanish: La botella salió flotando. ('The bottle exited floating')

Verb-framed languages mark the direction of motion on the verb: Spanish, French, Arabic, Hebrew, Japanese, Tamil; the Polynesian, Mayan, and Bantu families

Satellite-framed languages mark the direction of motion on the satellite: crawl out, float off, jump down, walk over to, run after; the rest of Indo-European, Hungarian, Finnish, Chinese

Page 26: CS60057 Speech & Natural Language Processing

Structural divergences

G: Wir treffen uns am Mittwoch (lit. 'We meet ourselves on Wednesday')
E: We'll meet on Wednesday

Page 27: CS60057 Speech & Natural Language Processing

Head Swapping

E: X swim across Y
S: X cruzar Y nadando (lit. 'X cross Y swimming')

E: I like to eat
G: Ich esse gern (lit. 'I eat gladly')

E: I'd prefer vanilla
G: Mir wäre Vanille lieber (lit. 'To me vanilla would be dearer')

Page 28: CS60057 Speech & Natural Language Processing

Thematic divergence

S: Me gusta Y
E: I like Y (lit. 'Y pleases me')

G: Mir fällt der Termin ein
E: I remember the date (lit. 'To me occurs the date')

Page 29: CS60057 Speech & Natural Language Processing

Divergence counts from Bonnie Dorr: divergences occur in 32% of sentences in a UN Spanish/English corpus (5K sentences)

Categorial: X tener hambre / Y have hunger (98%)
Conflational: X dar puñaladas a Z / X stab Z (83%)
Structural: X entrar en Y / X enter Y (35%)
Head swapping: X cruzar Y nadando / X swim across Y (8%)
Thematic: X gustar a Y / Y likes X (6%)

Page 31: CS60057 Speech & Natural Language Processing

3 methods for MT

Direct
Transfer
Interlingua

Page 32: CS60057 Speech & Natural Language Processing

Three MT Approaches: Direct, Transfer, Interlingual

Page 33: CS60057 Speech & Natural Language Processing

Direct Translation

Proceed word-by-word through the text, translating each word
No intermediate structures except morphology
Knowledge is in the form of a huge bilingual dictionary with word-to-word translation information
After word translation, simple reordering can be done, e.g., adjective ordering English -> French/Spanish (a toy sketch follows)
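A toy sketch of the direct strategy, to make "dictionary lookup plus local reordering" concrete. The dictionary entries and the adjective list are invented for illustration; all new code in this transcript is Python.

# Direct MT sketch: word-by-word dictionary lookup, then local reordering.
# The tiny English->Spanish dictionary is invented for illustration.
BILINGUAL_DICT = {"the": "el", "green": "verde", "witch": "bruja"}
TARGET_ADJECTIVES = {"verde"}

def direct_translate(sentence):
    # 1. Word-by-word lookup, no syntactic analysis.
    out = [BILINGUAL_DICT.get(w, w) for w in sentence.split()]
    # 2. Simple reordering rule: Adjective Noun -> Noun Adjective.
    i = 0
    while i < len(out) - 1:
        if out[i] in TARGET_ADJECTIVES:
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2
        else:
            i += 1
    return " ".join(out)

print(direct_translate("the green witch"))  # "el bruja verde"
# Note the gender-agreement error ("el" should be "la"): exactly the kind
# of problem a word-by-word system cannot fix.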

Page 34: CS60057 Speech & Natural Language Processing

Direct MT Dictionary entry

Page 35: CS60057 Speech & Natural Language Processing

Direct MT

Page 36: CS60057 Speech & Natural Language Processing

Problems with direct MT

(The German and Chinese examples were shown as figures, not reproduced in this transcript.)

Page 37: CS60057 Speech & Natural Language Processing

The Transfer Model

Idea: apply contrastive knowledge, i.e., knowledge about the differences between the two languages

Steps:
Analysis: syntactically parse the source language
Transfer: apply rules to turn this parse into a parse for the target language
Generation: generate the target sentence from the parse tree

Page 38: CS60057 Speech & Natural Language Processing

English to French

Generally English: Adjective Noun; French: Noun Adjective
Note: not always true:
route mauvaise 'bad road, badly-paved road'
mauvaise route 'wrong road'
But it is a reasonable first approximation

Rule: Nominal -> Adj Noun (English) becomes Nominal -> Noun Adj (French); a sketch of applying it follows
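A hedged sketch of how such a transfer rule might be applied to a parse tree. The (label, children) tuple representation is an assumption of this illustration, not the lecture's notation:

# Sketch: applying the Adj Noun -> Noun Adj transfer rule to a parse tree.
# Trees are (label, children) tuples; leaves are (POS, word) pairs.
def transfer(node):
    label, children = node
    if isinstance(children, str):      # leaf: (POS, word)
        return node
    children = [transfer(c) for c in children]
    # If a Nominal expands to Adj Noun, swap to Noun Adj for French.
    if label == "Nominal" and [c[0] for c in children] == ["Adj", "Noun"]:
        children = [children[1], children[0]]
    return (label, children)

tree = ("NP", [("Det", "the"),
               ("Nominal", [("Adj", "green"), ("Noun", "witch")])])
print(transfer(tree))
# ('NP', [('Det', 'the'), ('Nominal', [('Noun', 'witch'), ('Adj', 'green')])])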

Page 39: CS60057 Speech & Natural Language Processing

Transfer rules

Page 40: CS60057 Speech & Natural Language Processing

Lexical transfer

Transfer-based systems also need lexical transfer rules: a bilingual dictionary (as for direct MT)

English 'home' has several German translations:
nach Hause (going home)
Heim (home game)
Heimat (homeland, home country)
zu Hause (at home)

Can list "at home <-> zu Hause", or do word sense disambiguation

Page 41: CS60057 Speech & Natural Language Processing

Systran: combining direct and transfer

Analysis: morphological analysis, POS tagging; chunking of NPs, PPs, and other phrases; shallow dependency parsing

Transfer: translation of idioms; word sense disambiguation; assigning prepositions based on governing verbs

Synthesis: apply a rich bilingual dictionary; deal with reordering; morphological generation

Page 42: CS60057 Speech & Natural Language Processing

Transfer: some problems

N² sets of transfer rules (one per language pair)! Grammar and lexicon full of language-specific material; hard to build, hard to maintain

Page 43: CS60057 Speech & Natural Language Processing

Interlingua

Intuition: instead of language-to-language knowledge rules, use the meaning of the sentence to help

Steps:
1) Translate the source sentence into a meaning representation
2) Generate the target sentence from the meaning

Page 44: CS60057 Speech & Natural Language Processing

Interlingua for Mary did not slap the green witch

Page 45: CS60057 Speech & Natural Language Processing

Interlingua

The idea is that some of the MT work we need to do is shared with other NLP tasks

E.g., disambiguating English 'book' = Spanish 'libro' from English 'book' = Spanish 'reservar'

So we could have concepts like BOOKVOLUME and RESERVE and solve this disambiguation problem once for each language

Page 46: CS60057 Speech & Natural Language Processing

Direct MT: pros and cons (Bonnie Dorr)

Pros: fast; simple; cheap; no translation rules hidden in the lexicon

Cons: unreliable; not powerful; rule proliferation; requires lots of context; major restructuring needed after lexical substitution

Page 47: CS60057 Speech & Natural Language Processing

Interlingual MT: pros and cons (B. Dorr)

Pros: avoids the N² problem; easier to write rules

Cons: semantics is HARD; useful information is lost (paraphrase)

Page 48: CS60057 Speech & Natural Language Processing

The impossibility of translation

Hebrew "adonoi roi" (Psalm 23, 'The Lord is my shepherd') for a culture without sheep or shepherds

Something fluent and understandable, but not faithful:
"The Lord will look after me"

Something faithful, but not fluent and natural:
"The Lord is for me like somebody who looks after animals with cotton-like hair"

Page 49: CS60057 Speech & Natural Language Processing

What makes a good translation

Translators often talk about two factors we want to maximize:

Faithfulness or fidelity: how close the meaning of the translation is to the meaning of the original (even better: does the translation cause the reader to draw the same inferences as the original would have?)

Fluency or naturalness: how natural the translation is, considering only its fluency in the target language

Page 50: CS60057 Speech & Natural Language Processing

Statistical MT Systems

[Pipeline diagram: Spanish -> Broken English -> English. Statistical analysis of Spanish/English bilingual text gives the translation step; statistical analysis of English text gives the language-modeling step.

Example: "Que hambre tengo yo" -> candidates "What hunger have I", "Hungry I am so", "I am so hungry", "Have I that hunger", … -> "I am so hungry"]

Slide from Kevin Knight

Page 51: CS60057 Speech & Natural Language Processing

Statistical MT Systems

[The same pipeline with its components named: the Translation Model P(s|e) is learned from Spanish/English bilingual text; the Language Model P(e) is learned from English text; the decoding algorithm computes argmax_e P(e) * P(s|e).

Example: "Que hambre tengo yo" -> "I am so hungry"]

Slide from Kevin Knight

Page 52: CS60057 Speech & Natural Language Processing

Statistical MT: Faithfulness and Fluency formalized!

The best translation T-hat of a source sentence S:

T-hat = argmax_T fluency(T) * faithfulness(T, S)

Developed by researchers who were originally in speech recognition at IBM; called the IBM model

Page 53: CS60057 Speech & Natural Language Processing

Three Problems for Statistical MT

Language model: given an English string e, assigns P(e) by formula; a good English string gets high P(e), a random word sequence gets low P(e)

Translation model: given a pair of strings <f,e>, assigns P(f | e) by formula; if <f,e> look like translations, high P(f | e); if they don't, low P(f | e)

Decoding algorithm: given a language model, a translation model, and a new sentence f, find the translation e maximizing P(e) * P(f | e)

Slide from Kevin Knight

Page 54: CS60057 Speech & Natural Language Processing

The IBM model

Hmm, those two factors might look familiar…

Yup, it's Bayes' rule:

T-hat = argmax_T fluency(T) * faithfulness(T, S)
T-hat = argmax_T P(T) * P(S | T)

Page 55: CS60057 Speech & Natural Language Processing

More formally

Assume we are translating from a foreign-language sentence F to an English sentence E: F = f_1, f_2, f_3, …, f_m

We want to find the best English sentence E-hat = e_1, e_2, e_3, …, e_n:

E-hat = argmax_E P(E|F) = argmax_E P(F|E) P(E) / P(F) = argmax_E P(F|E) P(E)

where P(F|E) is the Translation Model and P(E) is the Language Model
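A minimal sketch of this argmax over an explicit candidate list, assuming we already have a language model lm(e) and a translation model tm(f, e) that return log probabilities (both function names are placeholders for this illustration):

def best_translation(f, candidates, lm, tm):
    # Noisy-channel objective: argmax_e log P(e) + log P(f | e).
    # P(f) is the same for every candidate, so it can be ignored.
    return max(candidates, key=lambda e: lm(e) + tm(f, e))

A real decoder builds candidates incrementally instead of enumerating them, but the quantity being maximized is exactly this sum of log probabilities.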

Page 56: CS60057 Speech & Natural Language Processing

The noisy channel model for MT

Page 57: CS60057 Speech & Natural Language Processing

Fluency: P(T)

How do we measure that this sentence:
That car was almost crash onto me
is less fluent than this one:
That car almost hit me.

Answer: language models (N-grams!). For example, P(hit | almost) > P(was | almost)

But we can use any other, more sophisticated model of grammar

Advantage: this is monolingual knowledge!

Page 58: CS60057 Speech & Natural Language Processing

Faithfulness: P(S|T)

French: ça me plaît [that me pleases]
English: that pleases me (most fluent), I like it, I'll take that one

How do we quantify this?
Intuition: the degree to which words in one sentence are plausible translations of words in the other sentence
Product of the probabilities that each word in the target sentence would generate each word in the source sentence

Page 59: CS60057 Speech & Natural Language Processing

Faithfulness P(S|T)

We need to know, for every target-language word, the probability of it mapping to every source-language word

How do we learn these probabilities? Parallel texts!
Often we have two texts that are translations of each other
If we knew which word in the source text mapped to each word in the target text, we could just count!

Page 60: CS60057 Speech & Natural Language Processing

Faithfulness P(S|T)

Sentence alignment: figuring out which source-language sentence maps to which target-language sentence

Word alignment: figuring out which source-language word maps to which target-language word

Page 61: CS60057 Speech & Natural Language Processing

Big Point about Faithfulness and Fluency

The job of the faithfulness model P(S|T) is just to model a "bag of words": which words map from, say, English to Spanish

P(S|T) doesn't have to worry about internal facts about Spanish word order: that's the job of P(T)

P(T) can do bag generation: put the following words in order (from Kevin Knight):
have programming a seen never I language better
actual the hashing is since not collision-free usually the is less perfectly the of somewhat capacity table

Page 62: CS60057 Speech & Natural Language Processing

P(T) and bag generation: the answer

“Usually the actual capacity of the table is somewhat less, since the hashing is not collision-free”

How about: loves Mary John

Page 63: CS60057 Speech & Natural Language Processing

Three Problems for Statistical MT

Language model: given an English string e, assigns P(e) by formula; a good English string gets high P(e), a random word sequence gets low P(e)

Translation model: given a pair of strings <f,e>, assigns P(f | e) by formula; if <f,e> look like translations, high P(f | e); if they don't, low P(f | e)

Decoding algorithm: given a language model, a translation model, and a new sentence f, find the translation e maximizing P(e) * P(f | e)

Slide from Kevin Knight

Page 64: CS60057 Speech & Natural Language Processing

The Classic Language Model: Word N-Grams

Goal of the language model: choose among alternatives:

He is on the soccer field
He is in the soccer field

Is table the on cup the
The cup is on the table

Rice shrine
American shrine
Rice company
American company

Slide from Kevin Knight
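A minimal bigram language model sketch (maximum-likelihood counts, no smoothing) to make "choose among alternatives" concrete; the two-sentence training corpus is invented for illustration:

from collections import Counter
import math

def train_bigram_lm(sentences):
    # Count histories and bigrams over boundary-padded sentences.
    uni, bi = Counter(), Counter()
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]
        uni.update(words[:-1])
        bi.update(zip(words[:-1], words[1:]))
    return uni, bi

def logprob(sentence, uni, bi):
    words = ["<s>"] + sentence.split() + ["</s>"]
    lp = 0.0
    for w1, w2 in zip(words[:-1], words[1:]):
        if bi[(w1, w2)] == 0:
            return float("-inf")   # unseen bigram; a real LM would smooth
        lp += math.log(bi[(w1, w2)] / uni[w1])
    return lp

uni, bi = train_bigram_lm(["the cup is on the table",
                           "he is on the soccer field"])
print(logprob("the cup is on the table", uni, bi))   # finite log probability
print(logprob("is table the on cup the", uni, bi))   # -inf: word salad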

Page 65: CS60057 Speech & Natural Language Processing

Intuition of phrase-based translation (Koehn et al. 2003)

The generative story has three steps:
1) Group words into phrases
2) Translate each phrase
3) Move the phrases around

Page 66: CS60057 Speech & Natural Language Processing

Generative story again

1) Group English source words into phrases e_1, e_2, …, e_n

2) Translate each English phrase e_i into a Spanish phrase f_j; the probability of doing this is the phrase translation probability φ(f_j | e_i)

3) Then (optionally) reorder each Spanish phrase. We do this with a distortion probability: a measure of the distance between the positions of a corresponding phrase in the two languages. "What is the probability that a phrase at position X in the English sentence moves to position Y in the Spanish sentence?"

Page 67: CS60057 Speech & Natural Language Processing

Distortion probability

The distortion probability is parameterized by a_i - b_{i-1},
where a_i is the start position of the foreign (Spanish) phrase generated by the i-th English phrase e_i,
and b_{i-1} is the end position of the foreign (Spanish) phrase generated by the (i-1)-th English phrase e_{i-1}.

We'll call the distortion probability d(a_i - b_{i-1}), and we'll use a really stupid model:

d(a_i - b_{i-1}) = α^|a_i - b_{i-1}|

where α is some small constant.

Page 68: CS60057 Speech & Natural Language Processing

Final translation model for phrase-based MT

Let's look at a simple example with no distortion:

P(F | E) = Π_{i=1..l} φ(f_i, e_i) d(a_i - b_{i-1})

where f_i and e_i here range over the l phrase pairs.
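A small sketch of scoring one segmented translation under this model. The phrase table entries and the value of α are invented for illustration:

import math

ALPHA = 0.5   # distortion constant (illustrative value)

# Toy phrase table phi(spanish_phrase, english_phrase): invented numbers.
PHI = {("Maria", "Mary"): 0.9,
       ("no", "did not"): 0.8,
       ("dio una bofetada", "slap"): 0.7}

def phrase_logscore(pairs, spans):
    # pairs: list of (spanish_phrase, english_phrase) in English order;
    # spans: (start, end) positions of each Spanish phrase in the output.
    logp, prev_end = 0.0, 0
    for (f, e), (start, end) in zip(pairs, spans):
        logp += math.log(PHI[(f, e)])                    # phi(f_i, e_i)
        logp += abs(start - prev_end) * math.log(ALPHA)  # d = ALPHA ** |a_i - b_{i-1}|
        prev_end = end
    return logp

pairs = [("Maria", "Mary"), ("no", "did not"), ("dio una bofetada", "slap")]
print(phrase_logscore(pairs, [(1, 1), (2, 2), (3, 5)]))   # monotone order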

Page 69: CS60057 Speech & Natural Language Processing

Phrase-based MT

Language model: P(E)
Translation model: P(F|E): the model itself, and how to train it
Decoder: finding the sentence E that is most probable

Page 70: CS60057 Speech & Natural Language Processing

Training P(F|E)

What we mainly need to train is the phrase translation probability φ(f_j | e_i)

Suppose we had a large bilingual training corpus (a bitext), in which each English sentence is paired with a Spanish sentence

And suppose we knew exactly which phrase in the Spanish was the translation of which phrase in the English; we call this a phrase alignment

If we had this, we could just count and divide:

φ(f, e) = count(f, e) / Σ_f' count(f', e)
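A sketch of that count-and-divide estimate, assuming we somehow already had phrase-aligned pairs (the three pairs below are invented):

from collections import Counter

def train_phi(phrase_pairs):
    # phrase_pairs: (spanish_phrase, english_phrase) tuples observed
    # in a phrase-aligned bitext. Returns phi(f | e) as a dict.
    pair_counts, e_counts = Counter(), Counter()
    for f, e in phrase_pairs:
        pair_counts[(f, e)] += 1
        e_counts[e] += 1
    return {(f, e): c / e_counts[e] for (f, e), c in pair_counts.items()}

aligned = [("la bruja verde", "the green witch"),
           ("la bruja verde", "the green witch"),
           ("la bruja", "the witch")]
print(train_phi(aligned))   # relative frequencies per English phrase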

Page 71: CS60057 Speech & Natural Language Processing

But we don’t have phrase alignments

What we have instead are word alignments:

Page 72: CS60057 Speech & Natural Language Processing

Getting phrase alignments

To get phrase alignments:
1) We first get word alignments
2) Then we "symmetrize" the word alignments into phrase alignments

Page 73: CS60057 Speech & Natural Language Processing

How to get Word Alignments

Word alignment: a mapping between the source words and the target words in a set of parallel sentences

Restriction: each foreign word comes from exactly one English word

Advantage: we can represent an alignment by the index of the English word that each French word comes from

The alignment in the figure (not reproduced here) is thus 2,3,4,5,6,6,6

Page 74: CS60057 Speech & Natural Language Processing

One addition: spurious words

A word in the foreign sentence that doesn't align with any word in the English sentence is called a spurious word. We model these by pretending they are generated by a special English word e_0 (NULL):

Page 75: CS60057 Speech & Natural Language Processing

More sophisticated models of alignment

Page 76: CS60057 Speech & Natural Language Processing

Computing word alignments: IBM Model 1

For phrase-based machine translation, we want a word alignment in order to extract a set of phrases

A word-alignment algorithm gives us P(F,E); we want this to train our phrase probabilities φ(f_j | e_i) as part of P(F|E)

But a word-alignment algorithm can also be part of a mini translation model itself.

Page 77: CS60057 Speech & Natural Language Processing

IBM Model 1

Page 78: CS60057 Speech & Natural Language Processing

IBM Model 1

Page 79: CS60057 Speech & Natural Language Processing

How does the generative story assign P(F|E) for a Spanish sentence F?

Terminology:

Suppose we had done steps 1 and 2, i.e., we already knew the Spanish length J and the alignment A (and the English source E):

Page 80: CS60057 Speech & Natural Language Processing

Let’s formalize steps 1 and 2

We want P(A|E): the probability of an alignment A (of length J) given an English sentence E

IBM Model 1 makes the (very) simplifying assumption that each alignment is equally likely

How many possible alignments are there between an English sentence of length I and a Spanish sentence of length J?
Hint: each Spanish word must come from one of the English source words (or the NULL word)

Answer: (I+1)^J

Let's assume the probability of choosing length J is a small constant ε

Page 81: CS60057 Speech & Natural Language Processing

Model 1 continued

Probability of choosing a length and then one of the possible alignments:

P(A | E) = ε / (I+1)^J

Combining with step 3, generating each Spanish word from its aligned English word:

P(F, A | E) = ε / (I+1)^J * Π_{j=1..J} t(f_j | e_{a_j})

The total probability of a given foreign sentence F:

P(F | E) = Σ_A P(F, A | E)
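A small sketch of these Model 1 quantities, assuming a translation table t(f | e) is given (the two entries below are invented). It uses the standard rearrangement of the sum over all (I+1)^J alignments into a per-word product of sums:

def model1_pfe(f_words, e_words, t, epsilon=1.0):
    # P(F|E) = eps/(I+1)^J * prod_j sum_i t(f_j | e_i), with e_0 = NULL.
    es = ["NULL"] + e_words
    I, J = len(e_words), len(f_words)
    prob = epsilon / (I + 1) ** J
    for f in f_words:
        prob *= sum(t.get((f, e), 0.0) for e in es)
    return prob

t = {("verde", "green"): 0.8, ("bruja", "witch"): 0.9}   # invented values
print(model1_pfe(["bruja", "verde"], ["green", "witch"], t))   # 0.72/9 = 0.08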

Page 82: CS60057 Speech & Natural Language Processing

Decoding

How do we find the best alignment A? In Model 1 the Spanish words are generated independently, so each word can pick its best English source separately: a_j = argmax_i t(f_j | e_i)

Page 83: CS60057 Speech & Natural Language Processing

Training alignment probabilities

Step 1: get a parallel corpus, e.g., the Hansards (Canadian parliamentary proceedings, in French and English) or the Hong Kong Hansards (English and Chinese)

Step 2: sentence alignment

Step 3: use EM (Expectation-Maximization) to train word alignments

Page 84: CS60057 Speech & Natural Language Processing

Step 1: Parallel corpora

English: Diverging opinions about planned tax reform
German: Unterschiedliche Meinungen zur geplanten Steuerreform

English: The discussion around the envisaged major tax reform continues .
German: Die Diskussion um die vorgesehene grosse Steuerreform dauert an .

English: The FDP economics expert , Graf Lambsdorff , today came out in favor of advancing the enactment of significant parts of the overhaul , currently planned for 1999 .
German: Der FDP - Wirtschaftsexperte Graf Lambsdorff sprach sich heute dafuer aus , wesentliche Teile der fuer 1999 geplanten Reform vorzuziehen .

Example from DE-News (8/1/1996)

Slide from Christof Monz

Page 85: CS60057 Speech & Natural Language Processing

Step 2: Sentence Alignment

English: The old man is happy. He has fished many times. His wife talks to him. The fish are jumping. The sharks await.

Spanish: El viejo está feliz porque ha pescado muchos veces. Su mujer habla con él. Los tiburones esperan.

Intuition:
- use length in words or characters, together with dynamic programming
- or use a simpler MT model

(A toy dynamic-programming sketch follows.)

Slide from Kevin Knight
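A toy sketch of length-based sentence alignment with dynamic programming. The cost of a bead here is just the difference in character counts; the real Gale & Church method uses a probabilistic model of the length ratio, so this is a deliberate simplification:

def align_sentences(src, tgt):
    # DP over alignment "beads": 1-1, 2-1, and 1-2 sentence groups.
    n, m = len(src), len(tgt)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == INF:
                continue
            for di, dj in ((1, 1), (2, 1), (1, 2)):
                if i + di <= n and j + dj <= m:
                    # Bead cost: mismatch in total character length.
                    c = abs(sum(map(len, src[i:i + di]))
                            - sum(map(len, tgt[j:j + dj])))
                    if cost[i][j] + c < cost[i + di][j + dj]:
                        cost[i + di][j + dj] = cost[i][j] + c
                        back[i + di][j + dj] = (i, j)
    beads, ij = [], (n, m)
    while ij != (0, 0):          # trace back the best path of beads
        pi, pj = back[ij[0]][ij[1]]
        beads.append((src[pi:ij[0]], tgt[pj:ij[1]]))
        ij = (pi, pj)
    return beads[::-1]

Handling sentences that should be thrown out entirely, as on the next slides, would additionally need 1-0 and 0-1 beads with their own cost; they are omitted here for brevity.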

Page 86: CS60057 Speech & Natural Language Processing

Sentence Alignment

1. The old man is happy.
2. He has fished many times.
3. His wife talks to him.
4. The fish are jumping.
5. The sharks await.

El viejo está feliz porque ha pescado muchos veces. Su mujer habla con él. Los tiburones esperan.

Slide from Kevin Knight

Page 87: CS60057 Speech & Natural Language Processing

Sentence Alignment

1. The old man is happy.

2. He has fished many times.

3. His wife talks to him.

4. The fish are jumping.

5. The sharks await.

El viejo está feliz porque ha pescado muchos veces.

Su mujer habla con él.

Los tiburones esperan.

Slide from Kevin Knight

Page 88: CS60057 Speech & Natural Language Processing

Sentence Alignment

1. The old man is happy. He has fished many times.

2. His wife talks to him.

3. The sharks await.

El viejo está feliz porque ha pescado muchos veces.

Su mujer habla con él.

Los tiburones esperan.

Note that unaligned sentences are thrown out, and sentences are merged in n-to-m alignments (n, m > 0).

Slide from Kevin Knight

Page 89: CS60057 Speech & Natural Language Processing

Step 3: word alignments

It turns out we can bootstrap alignments from a sentence-aligned bilingual corpus
The algorithm we use is Expectation-Maximization (EM)

Page 90: CS60057 Speech & Natural Language Processing

EM for training alignment probs

… la maison … la maison bleue … la fleur …

… the house … the blue house … the flower …

All word alignments equally likely

All P(french-word | english-word) equally likely

Slide from Kevin Knight

Page 91: CS60057 Speech & Natural Language Processing

EM for training alignment probs

… la maison … la maison bleue … la fleur …

… the house … the blue house … the flower …

“la” and “the” are observed to co-occur frequently, so P(la | the) is increased.

Slide from Kevin Knight

Page 92: CS60057 Speech & Natural Language Processing

EM for training alignment probs

… la maison … la maison bleue … la fleur …

… the house … the blue house … the flower …

“house” co-occurs with both “la” and “maison”, but P(maison | house) can be raised without limit, to 1.0, while P(la | house) is limited because of “the” (pigeonhole principle)

Slide from Kevin Knight

Page 93: CS60057 Speech & Natural Language Processing

EM for training alignment probs

… la maison … la maison bleue … la fleur …

… the house … the blue house … the flower …

settling down after another iteration

Slide from Kevin Knight

Page 94: CS60057 Speech & Natural Language Processing

EM for training alignment probs

… la maison … la maison bleue … la fleur …

… the house … the blue house … the flower …

Inherent hidden structure is revealed by EM training! For details, see:
• Section 24.6.1 in the chapter
• "A Statistical MT Tutorial Workbook" (Knight, 1999)
• "The Mathematics of Statistical Machine Translation" (Brown et al., 1993)
• Software: GIZA++

Slide from Kevin Knight
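A compact sketch of EM for the Model 1 translation table on exactly this kind of corpus. It starts from uniform t(f | e) and iterates expected-count collection and renormalization; the NULL word is omitted to keep it short:

from collections import defaultdict

def em_model1(bitext, iterations=10):
    # bitext: list of (foreign_words, english_words) pairs.
    f_vocab = {f for fs, _ in bitext for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))   # uniform initialization
    for _ in range(iterations):
        counts = defaultdict(float)   # expected count(f, e)
        totals = defaultdict(float)   # expected count(e)
        for fs, es in bitext:
            for f in fs:
                z = sum(t[(f, e)] for e in es)      # normalize over this sentence
                for e in es:
                    p = t[(f, e)] / z               # P(f aligned to e | sentence)
                    counts[(f, e)] += p
                    totals[e] += p
        t = defaultdict(float,
                        {(f, e): c / totals[e] for (f, e), c in counts.items()})
    return t

bitext = [("la maison".split(), "the house".split()),
          ("la maison bleue".split(), "the blue house".split()),
          ("la fleur".split(), "the flower".split())]
t = em_model1(bitext)
print(round(t[("la", "the")], 2), round(t[("maison", "house")], 2))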

Page 95: CS60057 Speech & Natural Language Processing

Statistical Machine Translation

… la maison … la maison bleue … la fleur …

… the house … the blue house … the flower …

P(juste | fair) = 0.411
P(juste | correct) = 0.027
P(juste | right) = 0.020
…

New French sentence -> possible English translations, to be rescored by the language model

Slide from Kevin Knight

Page 96: CS60057 Speech & Natural Language Processing

A more complex model: IBM Model 3 (Brown et al., 1993)

Generative approach:
Mary did not slap the green witch
Mary not slap slap slap the green witch (fertility: n(3 | slap))
Mary not slap slap slap NULL the green witch (NULL insertion: P-Null)
Maria no dió una bofetada a la verde bruja (translation: t(la | the))
Maria no dió una bofetada a la bruja verde (distortion: d(j | i))

Probabilities can be learned from raw bilingual text.

Page 97: CS60057 Speech & Natural Language Processing

How do we evaluate MT? Human tests for fluency

Rating tests: give the raters a scale (1 to 5) and ask them to rate, either overall or on distinct scales for clarity, naturalness, and style

Or check for specific problems:
Cohesion (lexical chains, anaphora, ellipsis): hand-checking for cohesion
Well-formedness: 5-point scale of syntactic correctness

Comprehensibility tests: noise test, multiple-choice questionnaire

Readability tests: cloze

Page 98: CS60057 Speech & Natural Language Processing

How do we evaluate MT? Human tests for fidelity

Adequacy: does it convey the information in the original? Ask raters to rate on a scale:
Bilingual raters: give them the source and target sentences, ask how much information is preserved
Monolingual raters: give them the target plus a good human translation

Informativeness: task-based; is there enough information to do some task? Give raters multiple-choice questions about content

Page 99: CS60057 Speech & Natural Language Processing

Evaluating MT: Problems

Asking humans to judge sentences on a 5-point scale for 10 factors takes time and money (weeks or months!)

We can't build language-engineering systems if we can only evaluate them once every quarter!

We need a metric that we can run every time we change our algorithm.

It would be OK if it weren't perfect, as long as it tended to correlate with the expensive human metrics, which we could still run quarterly.

Bonnie Dorr

Page 100: CS60057 Speech & Natural Language Processing

Automatic evaluation (an idea going back to Miller and Beebe-Center, 1958)

Assume we have one or more human translations of the source passage
Compare the automatic translation to these human translations
Metrics: BLEU, NIST, METEOR, Precision/Recall

Page 101: CS60057 Speech & Natural Language Processing

BiLingual Evaluation Understudy (BLEU; Papineni, 2001)

An automatic technique, but it requires the pre-existence of human (reference) translations
Approach:
Produce a corpus of high-quality human translations
Judge "closeness" numerically (cf. word-error rate)
Compare n-gram matches between the candidate translation and 1 or more reference translations

http://www.research.ibm.com/people/k/kishore/RC22176.pdf

Slide from Bonnie Dorr

Page 102: CS60057 Speech & Natural Language Processing

Reference (human) translation: The U.S. island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself the Saudi Arabian Osama bin Laden and threatening a biological/chemical attack against public places such as the airport .

Machine translation: The American [?] international airport and its the office all receives one calls self the sand Arab rich business [?] and so on electronic mail , which sends out ; The threat will be able after public place and so on the airport to start the biochemistry attack , [?] highly alerts after the maintenance.

BLEU Evaluation Metric (Papineni et al., ACL-2002)

• N-gram precision (score is between 0 and 1)
– What percentage of machine n-grams can be found in the reference translation?
– An n-gram is a sequence of n words
– Not allowed to use the same portion of the reference translation twice (can't cheat by typing out "the the the the the")

• Brevity penalty
– Can't just type out the single word "the" (precision 1.0!)

*** Amazingly hard to “game” the system (i.e., find a way to change machine output so that BLEU goes up, but quality doesn’t)

Slide from Bonnie Dorr

Page 103: CS60057 Speech & Natural Language Processing

Reference (human) translation: The U.S. island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself the Saudi Arabian Osama bin Laden and threatening a biological/chemical attack against public places such as the airport .

Machine translation: The American [?] international airport and its the office all receives one calls self the sand Arab rich business [?] and so on electronic mail , which sends out ; The threat will be able after public place and so on the airport to start the biochemistry attack , [?] highly alerts after the maintenance.

BLEU Evaluation Metric (Papineni et al., ACL-2002)

• BLEU4 formula (counts n-grams up to length 4):

score = exp(1.0 * log p1 + 0.5 * log p2 + 0.25 * log p3 + 0.125 * log p4 - max(words-in-reference / words-in-machine - 1, 0))

where p1 = 1-gram precision, p2 = 2-gram precision, p3 = 3-gram precision, p4 = 4-gram precision

Page 104: CS60057 Speech & Natural Language Processing

Multiple Reference Translations

Reference translation 1: The U.S. island of Guam is maintaining a high state of alert after the Guam airport and its offices both received an e-mail from someone calling himself the Saudi Arabian Osama bin Laden and threatening a biological/chemical attack against public places such as the airport .

Reference translation 2: Guam International Airport and its offices are maintaining a high state of alert after receiving an e-mail that was from a person claiming to be the wealthy Saudi Arabian businessman Bin Laden and that threatened to launch a biological and chemical attack on the airport and other public places .

Reference translation 3: The US International Airport of Guam and its office has received an email from a self-claimed Arabian millionaire named Laden , which threatens to launch a biochemical attack on such public places as airport . Guam authority has been on alert .

Reference translation 4: US Guam International Airport and its office received an email from Mr. Bin Laden and other rich businessman from Saudi Arabia . They said there would be biochemistry air raid to Guam Airport and other public places . Guam needs to be in high precaution about this matter .

Machine translation: The American [?] international airport and its the office all receives one calls self the sand Arab rich business [?] and so on electronic mail , which sends out ; The threat will be able after public place and so on the airport to start the biochemistry attack , [?] highly alerts after the maintenance.

Slide from Bonnie Dorr

Page 105: CS60057 Speech & Natural Language Processing

Bleu Comparison

Chinese-English Translation Example:

Candidate 1: It is a guide to action which ensures that the military always obeys the commands of the party.
Candidate 2: It is to insure the troops forever hearing the activity guidebook that party direct.

Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.

Slide from Bonnie Dorr

Page 106: CS60057 Speech & Natural Language Processing

How Do We Compute Bleu Scores?

Intuition: "What percentage of words in the candidate occurred in some human translation?"

Proposal: count up the number of candidate translation words (unigrams) that occur in any reference translation, and divide by the total number of words in the candidate translation

But we can't just count the total number of overlapping n-grams!
Candidate: the the the the the the
Reference 1: The cat is on the mat

Solution: a reference word should be considered exhausted after a matching candidate word is identified.

Slide from Bonnie Dorr

Page 107: CS60057 Speech & Natural Language Processing

“Modified n-gram precision”

For each word, compute:
(1) the total number of times it occurs in any single reference translation
(2) the number of times it occurs in the candidate translation

Instead of using count (2) directly, use the minimum of (2) and (1), i.e., clip the counts at the maximum for any single reference translation

Now use that modified count, and divide by the number of candidate words.

Slide from Bonnie Dorr
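A compact sketch of modified n-gram precision and the BLEU combination. The weights and the brevity-penalty form follow the formula slide above (standard BLEU uses uniform 0.25 weights and the length of the closest reference, so both choices here are simplifications):

import math
from collections import Counter

def ngrams(words, n):
    return Counter(zip(*[words[i:] for i in range(n)]))

def modified_precision(candidate, references, n):
    cand = ngrams(candidate, n)
    if not cand:
        return 0.0
    # Clip each candidate n-gram at its max count in any single reference.
    max_ref = Counter()
    for ref in references:
        for g, c in ngrams(ref, n).items():
            max_ref[g] = max(max_ref[g], c)
    return sum(min(c, max_ref[g]) for g, c in cand.items()) / sum(cand.values())

def bleu(candidate, references, weights=(1.0, 0.5, 0.25, 0.125)):
    ps = [modified_precision(candidate, references, n)
          for n in range(1, len(weights) + 1)]
    if 0.0 in ps:
        return 0.0
    log_score = sum(w * math.log(p) for w, p in zip(weights, ps))
    ref_len = min(len(r) for r in references)   # simplification (see lead-in)
    log_score -= max(ref_len / len(candidate) - 1, 0)   # brevity penalty
    return math.exp(log_score)

cand = "the the the the the the the".split()
refs = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
print(modified_precision(cand, refs, 1))   # 2/7, as in the cheating example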

Page 108: CS60057 Speech & Natural Language Processing

Modified Unigram Precision: Candidate #1

Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.

It(1) is(1) a(1) guide(1) to(1) action(1) which(1) ensures(1) that(2) the(4) military(1) always(1) obeys(0) the commands(1) of(1) the party(1)

What’s the answer???

17/18

Slide from Bonnie Dorr

Page 109: CS60057 Speech & Natural Language Processing

Modified Unigram Precision: Candidate #2

It(1) is(1) to(1) insure(0) the(4) troops(0) forever(1) hearing(0) the activity(0) guidebook(0) that(2) party(1) direct(0)

What’s the answer????

8/14

Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.

Slide from Bonnie Dorr

Page 110: CS60057 Speech & Natural Language Processing

Modified Bigram Precision: Candidate #1

It is(1) is a(1) a guide(1) guide to(1) to action(1) action which(0) which ensures(0) ensures that(1) that the(1) the military(1) military always(0) always obeys(0) obeys the(0) the commands(0) commands of(0) of the(1) the party(1)

What’s the answer????

10/17

Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.

Slide from Bonnie Dorr

Page 111: CS60057 Speech & Natural Language Processing

Modified Bigram Precision: Candidate #2

Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.

It is(1) is to(0) to insure(0) insure the(0) the troops(0) troops forever(0) forever hearing(0) hearing the(0) the activity(0) activity guidebook(0) guidebook that(0) that party(0) party direct(0)

What’s the answer????

1/13

Slide from Bonnie Dorr

Page 112: CS60057 Speech & Natural Language Processing

Catching Cheaters

Reference 1: The cat is on the mat
Reference 2: There is a cat on the mat

Candidate: the the the the the the the
'the' is clipped at 2, its maximum count in any single reference

What's the unigram answer? 2/7
What's the bigram answer? 0/6 (the bigram "the the" never occurs in a reference)

Slide from Bonnie Dorr

Page 113: CS60057 Speech & Natural Language Processing

Bleu distinguishes human from machine translations

Slide from Bonnie Dorr

Page 114: CS60057 Speech & Natural Language Processing

Bleu problems with sentence length

Candidate: of the
Problem: its modified unigram precision is 2/2, bigram 1/1!

Reference 1: It is a guide to action that ensures that the military will forever heed Party commands.
Reference 2: It is the guiding principle which guarantees the military forces always being under the command of the Party.
Reference 3: It is the practical guide for the army always to heed the directions of the party.

Solution: brevity penalty, which prefers candidate translations that are the same length as one of the references

Slide from Bonnie Dorr

Page 115: CS60057 Speech & Natural Language Processing

BLEU Tends to Predict Human Judgments

R² = 88.0% (Adequacy), R² = 90.2% (Fluency)

[Scatter plot: NIST score (a variant of BLEU) against human judgments of adequacy and fluency, both axes from -2.5 to 2.5, with linear fits for adequacy and fluency.]

slide from G. Doddington (NIST)

Page 116: CS60057 Speech & Natural Language Processing

Summary

Intro and a little history
Language Similarities and Divergences
Four main MT approaches: Transfer, Interlingua, Direct, Statistical
Evaluation

Page 117: CS60057 Speech & Natural Language Processing

Classes

LINGUIST 139M/239M. Human and Machine Translation. (Martin Kay)

CS 224N. Natural Language Processing (Chris Manning)