Page 1:

CS460/626 : Natural Language Processing/Speech, NLP and the Web

(Lecture 18– Training and Decoding in SMT System)

Kushal Ladha

M.Tech Student

CSE Dept., IIT Bombay

17th Feb, 2011

Page 2:

Training Process

It is a nine-step process:
1. Prepare data
2. Run GIZA++
3. Align words
4. Get lexical translation table
5. Extract phrases
6. Score phrases
7. Build lexicalized reordering model
8. Build generation models
9. Create configuration file

Page 3:

Preparing Data (1/2)

Sentence-aligned data in two files: one file containing the foreign sentences, the other containing the English sentences

Everything should be lowercased

Sentence length should be less than 100 words

For a factored model, the training data should be word0factor0|word0factor1|word0factor2 and so on, instead of the un-factored word0 word1 word2

Cleaning the corpus drops empty lines and redundant spaces, and eliminates sentence pairs that violate the 9-1 sentence length ratio limit (a sketch of these steps follows below)
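As an illustration, here is a minimal Python sketch of these cleaning steps (the file names corpus.fr/corpus.en and clean.fr/clean.en are assumptions; in a real Moses setup the clean-corpus-n.perl script does this job):

MAX_LEN = 100   # drop sentences of 100 words or more
RATIO = 9       # the 9-1 sentence length ratio limit

with open("corpus.fr") as f_src, open("corpus.en") as f_tgt, \
     open("clean.fr", "w") as o_src, open("clean.en", "w") as o_tgt:
    for src, tgt in zip(f_src, f_tgt):
        # lowercase and collapse redundant whitespace
        s = src.lower().split()
        t = tgt.lower().split()
        # drop empty lines and over-long sentences
        if not s or not t or len(s) >= MAX_LEN or len(t) >= MAX_LEN:
            continue
        # drop pairs that violate the 9-1 length ratio limit
        if len(s) > RATIO * len(t) or len(t) > RATIO * len(s):
            continue
        o_src.write(" ".join(s) + "\n")
        o_tgt.write(" ".join(t) + "\n")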

Page 4:

Preparing Data (2/2)

The input to GIZA++ is two vocabulary files

Vocabulary files contain words, integer word identifiers and word count information, one word per line:

1 UNK 0
2 and 7
3 , 6
4 irian 4

Each sentence pair has 3 entries: a frequency count, the source sentence and the target sentence, both encoded as word IDs:

1 47 74 65 56 70 5 81 7 84 5 75 2 24 69 35 7 55 44 3 86 5 46 2 36 15 16 22 4 71 49 39 11 60 48 57 17 68 53 55 29 13 44 3 25 61 2 36 70 5

GIZA++ requires each word to be placed into a word class; this is done by mkcls (a vocabulary-building sketch follows below)
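A sketch of how such a vocabulary file could be produced (file names are assumptions; in a real run GIZA++'s plain2snt tool builds these files):

from collections import Counter

# Build a GIZA++-style vocabulary file: "id word count" per line.
# Id 1 is reserved for UNK, as in the example above; the remaining ids
# are assigned in decreasing order of frequency (an assumption).
counts = Counter()
with open("clean.en") as f:
    for line in f:
        counts.update(line.split())

with open("clean.en.vcb", "w") as out:
    out.write("1 UNK 0\n")
    for i, (word, n) in enumerate(counts.most_common(), start=2):
        out.write(f"{i} {word} {n}\n")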

Page 5:

Run GIZA++

Used to establish word alignments

Word alignments are taken from the intersection of bidirectional runs of GIZA++

For each word in each sentence, it marks the possible alignment points in both directions

Page 6:

Align Words: Heuristics

GROW-DIAG-FINAL(e2f, f2e):
    neighboring = ((-1,0),(0,-1),(1,0),(0,1),(-1,-1),(-1,1),(1,-1),(1,1))
    alignment = intersect(e2f, f2e)
    GROW-DIAG()
    FINAL(e2f)
    FINAL(f2e)

GROW-DIAG():
    iterate until no new points added
        for english word e = 0 ... en
            for foreign word f = 0 ... fn
                if ( e aligned with f )
                    for each neighboring point ( e-new, f-new ):
                        if ( ( e-new, f-new ) in union( e2f, f2e ) and
                             ( e-new not aligned and f-new not aligned ) )
                            add alignment point ( e-new, f-new )

FINAL(a):
    for english word e-new = 0 ... en
        for foreign word f-new = 0 ... fn
            if ( ( ( e-new, f-new ) in alignment a ) and
                 ( e-new not aligned or f-new not aligned ) )
                add alignment point ( e-new, f-new )
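The same heuristic as runnable Python, assuming the two directional alignments are given as sets of (e, f) index pairs (a sketch, not the Moses implementation):

def grow_diag_final(e2f, f2e):
    """Symmetrize two directional word alignments (sets of (e, f) pairs),
    following the GROW-DIAG-FINAL pseudocode above."""
    neighboring = [(-1, 0), (0, -1), (1, 0), (0, 1),
                   (-1, -1), (-1, 1), (1, -1), (1, 1)]
    union = e2f | f2e
    alignment = set(e2f & f2e)          # start from the intersection
    e_aligned = {e for e, _ in alignment}
    f_aligned = {f for _, f in alignment}

    # GROW-DIAG: iterate until no new points are added
    added = True
    while added:
        added = False
        for e, f in sorted(alignment):
            for de, df in neighboring:
                ne, nf = e + de, f + df
                if ((ne, nf) in union
                        and ne not in e_aligned and nf not in f_aligned):
                    alignment.add((ne, nf))
                    e_aligned.add(ne)
                    f_aligned.add(nf)
                    added = True

    # FINAL(e2f); FINAL(f2e): add remaining directional points that
    # attach a still-unaligned word on either side
    for a in (e2f, f2e):
        for ne, nf in sorted(a):
            if ((ne, nf) not in alignment
                    and (ne not in e_aligned or nf not in f_aligned)):
                alignment.add((ne, nf))
                e_aligned.add(ne)
                f_aligned.add(nf)
    return alignment

# e.g. grow_diag_final({(0, 0), (1, 1)}, {(0, 0), (1, 2)})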

Page 7:

Get Lexical Translation Table

Given the alignment, a maximum-likelihood lexical translation table can be estimated (a sketch follows below)

Two files, model/lex.0-0.f2e and model/lex.0-0.e2f, contain the lexical translation probabilities
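A sketch of the maximum-likelihood estimate behind these tables: w(e|f) is simply count(e, f) divided by count(f) over the aligned word pairs (function and variable names are illustrative):

from collections import Counter

def lexical_table(aligned_pairs):
    """Maximum-likelihood lexical translation probabilities w(e|f)
    from a list of aligned (e, f) word pairs across the corpus."""
    pair_count = Counter(aligned_pairs)
    f_count = Counter(f for _, f in aligned_pairs)
    return {(e, f): c / f_count[f] for (e, f), c in pair_count.items()}

# e.g. lexical_table([("house", "haus"), ("house", "haus"), ("home", "haus")])
# -> {("house", "haus"): 2/3, ("home", "haus"): 1/3}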

Extract Phrases: model/extract contains all the phrases, with their translations in the target language and their alignment points

An inverted file named model/extract.inv contains the inverse mapping

Page 8:

Score Phrases

To calculate the phrase translation probabilities:

Sort the extract file, so that the translations for a particular foreign phrase are next to each other in the file

Calculate counts for each foreign phrase

Do the same for the inverted file to calculate counts for each English phrase

Each phrase table entry consists of 5 scores (a scoring sketch follows below):

Phrase translation probability φ(f|e)
Lexical weighting lex(f|e)
Phrase translation probability φ(e|f)
Lexical weighting lex(e|f)
Phrase penalty (always exp(1) = 2.718)
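The two phrase translation probabilities are relative frequencies over the extracted phrase pairs; a minimal sketch (function and variable names are assumptions):

from collections import Counter

def score_phrases(extracted):
    """Relative-frequency phrase translation probabilities from a list of
    extracted (f_phrase, e_phrase) pairs: phi(e|f) and phi(f|e)."""
    pair = Counter(extracted)
    f_marg = Counter(f for f, _ in extracted)
    e_marg = Counter(e for _, e in extracted)
    phi_e_given_f = {(f, e): c / f_marg[f] for (f, e), c in pair.items()}
    phi_f_given_e = {(f, e): c / e_marg[e] for (f, e), c in pair.items()}
    return phi_e_given_f, phi_f_given_e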

Page 9:

Sample from Phrase-table

b o ||| b aa ||| (0) (1) ||| (0) (1) ||| 1 0.666667 1 0.181818 2.718
b ||| b ||| (0) ||| (0) ||| 1 1 1 1 2.718
c o m p o ||| aa m p ||| (2) (0,1) (1) (0) (1) ||| (1,3) (1,2,4) (0) ||| 1 0.0486111 1 0.154959 2.718
c ||| p ||| (0) ||| (0) ||| 1 1 1 1 2.718
d w ||| d w ||| (0) (1) ||| (0) (1) ||| 1 0.75 1 1 2.718
d ||| d ||| (0) ||| (0) ||| 1 1 1 1 2.718
e b ||| ah b ||| (0) (1) ||| (0) (1) ||| 1 1 1 0.6 2.718
e l l ||| ah l ||| (0) (1) (1) ||| (0) (1,2) ||| 1 1 0.5 0.5 2.718
e l l ||| eh l ||| (0) (0) (1) ||| (0,1) (2) ||| 1 0.111111 0.5 0.111111 2.718
e l ||| eh ||| (0) (0) ||| (0,1) ||| 1 0.111111 1 0.133333 2.718
e ||| ah ||| (0) ||| (0) ||| 1 1 0.666667 0.6 2.718
h e ||| hh ah ||| (0) (1) ||| (0) (1) ||| 1 1 1 0.6 2.718
h ||| hh ||| (0) ||| (0) ||| 1 1 1 1 2.718
l e b ||| l ah b ||| (0) (1) (2) ||| (0) (1) (2) ||| 1 1 1 0.5 2.718
l e ||| l ah ||| (0) (1) ||| (0) (1) ||| 1 1 1 0.5 2.718
l l o ||| l ow ||| (0) (0) (1) ||| (0,1) (2) ||| 0.5 1 1 0.227273 2.718
l l ||| l ||| (0) (0) ||| (0,1) ||| 0.25 1 1 0.833333 2.718
l o ||| l ow ||| (0) (1) ||| (0) (1) ||| 0.5 1 1 0.227273 2.718
l ||| l ||| (0) ||| (0) ||| 0.75 1 1 0.833333 2.718
m ||| m ||| (0) ||| (0) ||| 1 0.5 1 1 2.718
n d ||| n d ||| (0) (1) ||| (0) (1) ||| 1 1 1 1 2.718
n e ||| eh n iy ||| (1) (2) ||| () (0) (1) ||| 1 1 0.5 0.3 2.718
n e ||| n iy ||| (0) (1) ||| (0) (1) ||| 1 1 0.5 0.3 2.718
n ||| eh n ||| (1) ||| () (0) ||| 1 1 0.25 1 2.718
o o m ||| uw m ||| (0) (0) (1) ||| (0,1) (2) ||| 1 0.5 1 0.181818 2.718
o o ||| uw ||| (0) (0) ||| (0,1) ||| 1 1 1 0.181818 2.718
o ||| aa ||| (0) ||| (0) ||| 1 0.666667 0.2 0.181818 2.718
o ||| ow eh ||| (0) ||| (0) () ||| 1 1 0.2 0.272727 2.718
o ||| ow ||| (0) ||| (0) ||| 1 1 0.6 0.272727 2.718
w o r ||| w er ||| (0) (1) (1) ||| (0) (1,2) ||| 1 0.1875 1 0.424242 2.718
w ||| w ||| (0) ||| (0) ||| 1 0.75 1 1 2.718
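Each entry is '|||'-separated: source phrase, target phrase, two alignment fields, and the five scores. A minimal parser for one such line (a sketch; the direction of the two alignment fields is assumed):

def parse_phrase_table_line(line):
    """Split one phrase-table line into its '|||'-separated fields."""
    src, tgt, a_f2e, a_e2f, scores = [f.strip() for f in line.split("|||")]
    return src.split(), tgt.split(), a_f2e, a_e2f, [float(s) for s in scores.split()]

# e.g. parse_phrase_table_line("d w ||| d w ||| (0) (1) ||| (0) (1) ||| 1 0.75 1 1 2.718")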

Page 10:

Build Reordering and Generation Models

A distance-based reordering model is included

A generation model is built in the case of a factored model

It is estimated on the target side (a toy sketch follows below), for example:

root/lemma at target side + suffix at target side → surface word at target side
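A toy sketch of estimating such a generation table by relative frequency on the target side (the data and names are purely illustrative):

from collections import Counter

# Toy generation table for a factored model: (lemma, suffix) -> surface form,
# with relative-frequency generation probabilities (illustrative data).
observations = [("play", "ed", "played"), ("play", "ed", "played"),
                ("play", "s", "plays")]
pair = Counter(((l, s), w) for l, s, w in observations)
marg = Counter((l, s) for l, s, _ in observations)
gen_prob = {k: c / marg[k[0]] for k, c in pair.items()}
# gen_prob[(("play", "ed"), "played")] == 1.0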

Page 11:

Decoding

It uses a beam search algorithm

Translation options: given an input, a number of phrase translations can be applied to it

Translation options are collected before any decoding takes place

The following information is stored with each translation option (see the record sketch below):

First foreign word covered
Last foreign word covered
English phrase translation
Phrase translation probability
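This record maps naturally onto a small data type (a sketch; the field names are assumptions, not Moses internals):

from dataclasses import dataclass

@dataclass(frozen=True)
class TranslationOption:
    first_foreign: int     # first foreign word covered
    last_foreign: int      # last foreign word covered
    english: str           # English phrase translation
    prob: float            # phrase translation probability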

Page 12:

Algorithm

Start with the null hypothesis

For all the foreign words, look up the translation options and expand the hypothesis with each of them, along with its probability

The hypothesis with the best probability is selected, and the foreign words it covers are marked as translated

We maintain back pointers between hypotheses so that partial translations of the sentence can be read off

The probability is simply the cost of the new state, i.e. the cost of the original state multiplied by the translation, distortion and language model costs of the added phrasal translation

The final state is the state where all the foreign words are covered (a simplified sketch of this loop follows below)
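A much-simplified sketch of this loop, scoring with the translation probability only (no distortion or language model cost, and no per-coverage stacks as in a real decoder; options_for is a hypothetical helper returning (english_phrase, probability) options for the foreign span i..j):

def decode(n_words, options_for, beam_size=100):
    """Toy beam-search decoder. A hypothesis is a tuple
    (probability, covered foreign words, English phrases so far)."""
    beam = [(1.0, frozenset(), ())]          # the null hypothesis
    best_final = None
    while beam:
        next_beam = []
        for prob, covered, eng in beam:
            for i in range(n_words):
                for j in range(i, n_words):
                    span = frozenset(range(i, j + 1))
                    if span & covered:       # words already translated
                        continue
                    for phrase, p in options_for(i, j):
                        # new cost = old cost * translation probability
                        h = (prob * p, covered | span, eng + (phrase,))
                        if len(h[1]) == n_words:   # final state: all covered
                            if best_final is None or h[0] > best_final[0]:
                                best_final = h
                        else:
                            next_beam.append(h)
        # histogram pruning: keep only the best beam_size hypotheses
        next_beam.sort(key=lambda h: -h[0])
        beam = next_beam[:beam_size]
    return " ".join(best_final[2]) if best_final else None

# e.g. decode(2, lambda i, j: {(0, 0): [("vinod", 0.8)],
#                              (1, 1): [("stabbed", 0.7)]}.get((i, j), []))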

Page 13:

Recombining Hypotheses

Two hypotheses can be recombined if they agree in:

Foreign words covered so far
The last two English words generated
The end of the last foreign phrase covered

We keep the cheaper hypothesis and discard the other one (see the sketch below)

Pruning takes into account not only the cost of each hypothesis so far, but also a future cost estimate for the remaining part of the sentence
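The three agreement conditions form a natural recombination key (a sketch over hypothesis objects with assumed fields covered, english, last_foreign_end and prob):

def recombination_key(hyp):
    """Two hypotheses may be recombined when all three parts agree."""
    return (hyp.covered,            # foreign words covered so far
            hyp.english[-2:],       # last two English words generated
            hyp.last_foreign_end)   # end of the last foreign phrase covered

def recombine(hypotheses):
    best = {}
    for h in hypotheses:
        k = recombination_key(h)
        # keep the cheaper (better-scoring) hypothesis, discard the other
        if k not in best or h.prob > best[k].prob:
            best[k] = h
    return list(best.values())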

Page 14:

Pruning

Two types of pruning:

Threshold pruning: a hypothesis whose score is less than α times that of the best hypothesis is discarded

Histogram pruning: keeps only a certain number of hypotheses (e.g. n = 100)

Unlike recombining hypotheses, this type of pruning is not risk-free (a sketch of both rules follows below)
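Both rules in a few lines (a sketch; hypotheses are assumed to carry a prob score, and the α value is illustrative):

def prune(stack, alpha=1e-5, n=100):
    """Threshold pruning followed by histogram pruning on one stack."""
    best = max(h.prob for h in stack)
    # threshold pruning: drop hypotheses scoring less than alpha * best
    stack = [h for h in stack if h.prob >= alpha * best]
    # histogram pruning: keep at most n hypotheses (e.g. n = 100)
    stack.sort(key=lambda h: -h.prob)
    return stack[:n]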

Page 15:

Example

Hindi sentence: विनोद ने सचिन को छुरे से मारा । (literally, "Vinod stabbed Sachin with a knife")

Required English sentence: Vinod stabbed Sachin.

At each level the best translation option is selected, i.e. the one whose probability is highest of all.