Pushpak Bhattacharyya, CSE Dept., IIT Bombay, 15th Feb, 2011
CS460/626 : Natural Language Processing/Speech, NLP and the Web
(Lecture 18: Alignment in SMT and Tutorial on Giza++ and Moses)
Pushpak Bhattacharyya, CSE Dept., IIT Bombay
15th Feb, 2011
Going forward from word alignment
Word alignment
Phrase alignment (going to bigger units of correspondence)
Decoding (best possible translation)
Abstract Problem
Given: $e_0 e_1 e_2 e_3 \ldots e_n e_{n+1}$ (Entities)
Goal: $l_0 l_1 l_2 l_3 \ldots l_n l_{n+1}$ (Labels)
The goal is to find the best possible label sequence
Generative Model
$$L^* = \arg\max_L P(L \mid E)$$
$$\arg\max_L P(L \mid E) = \arg\max_L P(L) \cdot P(E \mid L)$$
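The step from P(L|E) to P(L).P(E|L) follows from Bayes' rule. A short derivation, using only the symbols of this slide and the fact that P(E) does not depend on L:

$$\arg\max_L P(L \mid E) = \arg\max_L \frac{P(L)\, P(E \mid L)}{P(E)} = \arg\max_L P(L)\, P(E \mid L)$$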
Simplification
Using the Markov assumption, the language model can be represented using bigrams:
$$P(L) = \prod_{i=0}^{n} P(l_i \mid l_{i-1})$$
Similarly, the translation model can also be represented in the following way:
$$P(E \mid L) = \prod_{i=0}^{n} P(e_i \mid l_i)$$
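A minimal sketch of how the bigram language model and the per-position translation model combine to score one candidate label sequence. The tables, the labels `A`/`B`, the entities `x`/`y` and the start symbol `^` are toy assumptions for illustration, not values from the lecture.

```python
import math

def sequence_log_score(entities, labels, transition, emission, start="^"):
    """Return log P(L) + log P(E|L) under the bigram and per-position assumptions."""
    score = 0.0
    prev = start  # assumed start symbol placed before the first label
    for entity, label in zip(entities, labels):
        score += math.log(transition[(prev, label)])  # bigram term P(label | previous label)
        score += math.log(emission[(label, entity)])  # translation term P(entity | label)
        prev = label
    return score

# Hypothetical toy probabilities
transition = {("^", "A"): 0.6, ("^", "B"): 0.4,
              ("A", "A"): 0.3, ("A", "B"): 0.7,
              ("B", "A"): 0.5, ("B", "B"): 0.5}
emission = {("A", "x"): 0.9, ("A", "y"): 0.1,
            ("B", "x"): 0.2, ("B", "y"): 0.8}

entities = ["x", "y"]
score_ab = sequence_log_score(entities, ["A", "B"], transition, emission)
score_ba = sequence_log_score(entities, ["B", "A"], transition, emission)
print(score_ab > score_ba)  # the labelling A B scores higher on this toy data
```

Working in log space avoids underflow when the products run over long sequences.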
Statistical Machine Translation
Finding the best possible English sentence given the foreign sentence
$$E^* = \arg\max_E P(E \mid F) = \arg\max_E P(E) \cdot P(F \mid E)$$
P(E) = Language Model
P(F|E) = Translation Model
E: English, F: Foreign Language
Problems in the framework
Labels are words of the target language
Very large in number
Who do you want to_go with ? / With whom do you want to go ?
आप किस के_साथ जाना चाहते_हो (Aap kis ke_sath jaana chahate_ho)
[Figure: a column of candidate target words (who, do, you, want, to_go, with, and so on) repeated for each source word]
Each word has multiple translation options.
Preposition stranding
Column of words of target language on the source language words
^ Aap kis ke_sath jaana chahate_ho .
[Figure: trellis with a column of candidate target words (who, do, you, want, to_go, with, and so on) over each source word, between '^' and '.']
Find the best possible path from '^' to '.' using transition and observation probabilities.
Viterbi can be used
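A minimal sketch of the Viterbi search over such a trellis, assuming the transition and observation probabilities are given as dictionaries. The function names and every probability below are hypothetical toy values; only the source words and the boundary symbols '^' and '.' come from the slide.

```python
import math

NEG_INF = float("-inf")

def log_prob(p):
    """Log of a probability, with log(0) mapped to minus infinity."""
    return math.log(p) if p > 0 else NEG_INF

def viterbi(observations, states, transition, emission, start="^", end="."):
    """Return the best state path from `start` to `end` and its log-probability."""
    # best[i][s] = (log-probability of the best path ending in s at position i, previous state)
    best = [{s: (log_prob(transition.get((start, s), 0.0))
                 + log_prob(emission.get((s, observations[0]), 0.0)), None)
             for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            emit = log_prob(emission.get((s, obs), 0.0))
            column[s] = max(
                (best[-1][r][0] + log_prob(transition.get((r, s), 0.0)) + emit, r)
                for r in states
            )
        best.append(column)
    # Close the path with the transition into the end symbol
    score, last = max(
        (best[-1][s][0] + log_prob(transition.get((s, end), 0.0)), s) for s in states
    )
    # Follow the back-pointers from the last column to the first
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, score

# Hypothetical toy tables: transition[(previous, next)], emission[(target word, source word)]
states = ["you", "who", "with"]
observations = ["Aap", "kis", "ke_sath"]
transition = {("^", "you"): 0.7, ("^", "who"): 0.3,
              ("you", "who"): 0.6, ("you", "with"): 0.4,
              ("who", "who"): 0.3, ("who", "with"): 0.7,
              ("with", "with"): 0.1, ("with", "."): 0.9, ("who", "."): 0.5}
emission = {("you", "Aap"): 0.9, ("who", "Aap"): 0.1,
            ("who", "kis"): 0.8, ("with", "kis"): 0.1,
            ("with", "ke_sath"): 0.9, ("who", "ke_sath"): 0.1}

best_path, best_score = viterbi(observations, states, transition, emission)
print(best_path)  # ['you', 'who', 'with']
```

The search costs on the order of (number of positions) x (number of states) squared, which is why the very large label set noted earlier is a practical problem for this framework.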
TUTORIAL ON Giza++ and Moses tools (delivered by Kushal Ladha)
Word-based alignment
For each word in source language, align words from target language that this word possibly produces
Based on IBM models 1-5
Model 1: simplest
As we go from models 1 to 5, models get more complex but more realistic
This is all that Giza++ does
Alignment
A function from target position to source position:
The alignment sequence is: 2,3,4,5,6,6,6
Alignment function A: A(1) = 2, A(2) = 3 ..
A different alignment function will give the sequence: 1,2,1,2,3,4,3,4 for A(1), A(2)..
To allow spurious insertion, allow alignment with word 0 (NULL)
No. of possible alignments: $(I+1)^J$
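A small sketch that enumerates every alignment function for a tiny sentence pair and checks the count against (I+1)^J. It assumes I is the number of source words and J the number of target positions, with source position 0 standing for NULL; the function name is hypothetical.

```python
from itertools import product

def all_alignments(I, J):
    """Every alignment function: one source position in 0..I (0 = NULL) per target position."""
    return list(product(range(I + 1), repeat=J))

alignments = all_alignments(I=2, J=3)
print(len(alignments))  # (2 + 1) ** 3 = 27
```

The count grows exponentially with the target length, so the alignments cannot simply be enumerated for real sentences.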
IBM Model 1: Generative Process
Training Alignment Models
Given a parallel corpus, for each (F,E) learn the best alignment A and the component probabilities:
t(f|e) for Model 1
lexicon probability P(f|e) and alignment probability $P(a_i \mid a_{i-1}, I)$
How to compute these probabilities if all you have is a parallel corpus?
Intuition : Interdependence of Probabilities
If you knew which words are probable translations of each other, then you could guess which alignment is probable and which one is improbable
If you were given alignments with probabilities, then you could compute translation probabilities
Looks like a chicken-and-egg problem
The EM algorithm comes to the rescue
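A minimal sketch of EM for the Model 1 lexicon probabilities t(f|e), to make the alternation concrete: the E-step guesses alignments from the current t, the M-step re-estimates t from those guesses. The function name and the three-sentence German-English corpus are toy assumptions, and the NULL word is left out to keep the sketch short; this is not Giza++'s implementation.

```python
from collections import defaultdict

def train_ibm_model1(corpus, iterations=10):
    """corpus: list of (foreign_words, english_words). Returns t[(f, e)] = t(f|e)."""
    foreign_vocab = {f for foreign, _ in corpus for f in foreign}
    # Start with no knowledge: every t(f|e) is uniform
    t = defaultdict(lambda: 1.0 / len(foreign_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected number of times f is aligned to e
        total = defaultdict(float)  # expected number of times e is aligned to anything
        for foreign, english in corpus:
            for f in foreign:
                # E-step: posterior probability that f is aligned to each e in the sentence
                norm = sum(t[(f, e)] for e in english)
                for e in english:
                    posterior = t[(f, e)] / norm
                    count[(f, e)] += posterior
                    total[e] += posterior
        # M-step: re-estimate t(f|e) from the expected counts
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

# Hypothetical toy parallel corpus
corpus = [("das Haus".split(), "the house".split()),
          ("das Buch".split(), "the book".split()),
          ("ein Buch".split(), "a book".split())]
t = train_ibm_model1(corpus)
print(t[("das", "the")] > t[("Haus", "the")])  # co-occurrence evidence favours das for the
```

Although every word pair starts out equally likely, "das" co-occurs with "the" in two sentence pairs, and that asymmetry is enough for the iterations to pull the probabilities apart.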
Limitation: Only 1->Many Alignments allowed
Phrase-based alignment
More natural
Many-to-one mappings allowed
Giza++ and Moses Package
http://cl.naist.jp/~eric-n/ubuntu-nlp/
Select your Ubuntu version
Browse the nlp folder
Download the Debian packages of giza++, moses, mkcls, srilm
Resolve all the dependencies and they get installed
For alternate installation, refer to http://www.statmt.org/moses_steps.html
Steps
Input: sentence-aligned parallel corpus
Output: target-side tagged data
Training
Tuning
Generate output on test corpus (decoding)
Training
Create a folder named corpus containing the test, train and tuning files
Giza++ is used to generate the alignment
The phrase table is generated after training
Before training, the language model needs to be built on the target side
mkdir lm ;
/usr/bin/ngram-count -order 3 -interpolate -kndiscount -text $PWD/corpus/train_surface.hi -lm lm/train.lm;
/usr/share/moses/scripts/training/train-factored-phrase-model.perl -scripts-root-dir /usr/share/moses/scripts -root-dir . -corpus train.clean -e hi -f en -lm 0:3:$PWD/lm/train.lm:0;
Example
train.en
h e l l o
h e l l o
w o r l d
c o m p o u n d w o r d
h y p h e n a t e d
o n e
b o o m
k w e e z l e b o t t e r
train.pr
hh eh l ow
hh ah l ow
w er l d
k aa m p aw n d w er d
hh ay f ah n ey t ih d
ow eh n iy
b uw m
k w iy z l ah b aa t ah r
Sample from Phrase-table
b o ||| b aa ||| (0) (1) ||| (0) (1) ||| 1 0.666667 1 0.181818 2.718
b ||| b ||| (0) ||| (0) ||| 1 1 1 1 2.718
c o m p o ||| aa m p ||| (2) (0,1) (1) (0) (1) ||| (1,3) (1,2,4) (0) ||| 1 0.0486111 1 0.154959 2.718
c ||| p ||| (0) ||| (0) ||| 1 1 1 1 2.718
d w ||| d w ||| (0) (1) ||| (0) (1) ||| 1 0.75 1 1 2.718
d ||| d ||| (0) ||| (0) ||| 1 1 1 1 2.718
e b ||| ah b ||| (0) (1) ||| (0) (1) ||| 1 1 1 0.6 2.718
e l l ||| ah l ||| (0) (1) (1) ||| (0) (1,2) ||| 1 1 0.5 0.5 2.718
e l l ||| eh l ||| (0) (0) (1) ||| (0,1) (2) ||| 1 0.111111 0.5 0.111111 2.718
e l ||| eh ||| (0) (0) ||| (0,1) ||| 1 0.111111 1 0.133333 2.718
e ||| ah ||| (0) ||| (0) ||| 1 1 0.666667 0.6 2.718
h e ||| hh ah ||| (0) (1) ||| (0) (1) ||| 1 1 1 0.6 2.718
h ||| hh ||| (0) ||| (0) ||| 1 1 1 1 2.718
l e b ||| l ah b ||| (0) (1) (2) ||| (0) (1) (2) ||| 1 1 1 0.5 2.718
l e ||| l ah ||| (0) (1) ||| (0) (1) ||| 1 1 1 0.5 2.718
l l o ||| l ow ||| (0) (0) (1) ||| (0,1) (2) ||| 0.5 1 1 0.227273 2.718
l l ||| l ||| (0) (0) ||| (0,1) ||| 0.25 1 1 0.833333 2.718
l o ||| l ow ||| (0) (1) ||| (0) (1) ||| 0.5 1 1 0.227273 2.718
l ||| l ||| (0) ||| (0) ||| 0.75 1 1 0.833333 2.718
m ||| m ||| (0) ||| (0) ||| 1 0.5 1 1 2.718
n d ||| n d ||| (0) (1) ||| (0) (1) ||| 1 1 1 1 2.718
n e ||| eh n iy ||| (1) (2) ||| () (0) (1) ||| 1 1 0.5 0.3 2.718
n e ||| n iy ||| (0) (1) ||| (0) (1) ||| 1 1 0.5 0.3 2.718
n ||| eh n ||| (1) ||| () (0) ||| 1 1 0.25 1 2.718
o o m ||| uw m ||| (0) (0) (1) ||| (0,1) (2) ||| 1 0.5 1 0.181818 2.718
o o ||| uw ||| (0) (0) ||| (0,1) ||| 1 1 1 0.181818 2.718
o ||| aa ||| (0) ||| (0) ||| 1 0.666667 0.2 0.181818 2.718
o ||| ow eh ||| (0) ||| (0) () ||| 1 1 0.2 0.272727 2.718
o ||| ow ||| (0) ||| (0) ||| 1 1 0.6 0.272727 2.718
w o r ||| w er ||| (0) (1) (1) ||| (0) (1,2) ||| 1 0.1875 1 0.424242 2.718
w ||| w ||| (0) ||| (0) ||| 1 0.75 1 1 2.718
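A small sketch for reading one such phrase-table entry into its five `|||`-separated fields. The function and the dictionary keys are hypothetical labels chosen for this example: the slide does not name the fields, so reading the third and fourth fields as the two alignment directions and the last field as scores is an assumption based on how the entries look.

```python
def parse_phrase_table_line(line):
    """Split one phrase-table entry into its five '|||'-separated fields."""
    fields = [field.strip() for field in line.split("|||")]
    source, target, align_st, align_ts, scores = fields
    return {
        "source": source.split(),               # source-side tokens
        "target": target.split(),               # target-side tokens
        "source_to_target": align_st.split(),   # one alignment group per source token (assumed)
        "target_to_source": align_ts.split(),   # one alignment group per target token (assumed)
        "scores": [float(x) for x in scores.split()],
    }

entry = parse_phrase_table_line(
    "e l l ||| eh l ||| (0) (0) (1) ||| (0,1) (2) ||| 1 0.111111 0.5 0.111111 2.718"
)
print(entry["source"], entry["target"], entry["scores"])
```

Splitting on whitespace is enough here because an alignment group such as `(0,1)` contains no spaces.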
Tuning
Not a compulsory step but will improve the decoding by a small percentage
mkdir tuning;
cp $WDIR/corpus/tun.en tuning/input;
cp $WDIR/corpus/tun.hi tuning/reference;
/usr/share/moses/scripts/training/mert-moses.pl $PWD/tuning/input $PWD/tuning/reference /usr/bin/moses $PWD/model/moses.ini --working-dir $PWD/tuning --rootdir /usr/share/moses/scripts
It will take around 1 hour on a server with 32GB RAM
Testing
mkdir evaluation;
/usr/bin/moses -config $WDIR/tuning/moses.ini -input-file $WDIR/corpus/test.en >evaluation/test.output;
The output will be in the evaluation/test.output file
Sample Output
h o t → hh aa t
p h o n e → p|UNK hh ow eh n iy
b o o k → b uw k