Machine Translation - Stanford University

Machine Translation: Introduction to MT

Transcript of Machine Translation - Stanford University

Page 1: Machine Translation - Stanford University

Machine Translation

Introduction to MT

Page 2

Dan Jurafsky

Machine Translation

• Fully automatic
• Helping human translators

Enter Source Text:

这 不过 是 一 个 时间 的 问题 .

Translation from Stanford’s Phrasal:

This is only a matter of time.

Page 3

Google Translate

• Fried ripe plantains:
• http://laylita.com/recetas/2008/02/28/platanos-maduros-fritos/

Page 4

Machine Translation

• The Story of the Stone (“The Dream of the Red Chamber”)
  • Cao Xueqin 1792

• Chinese gloss: Dai-yu alone at bed on think-of-with-gratitude Bao-chai… again listen to window outside bamboo tip plantain leaf of on, rain sound sigh drop, clear cold penetrate curtain, not feeling again fall down tears come.

• Hawkes translation: As she lay there alone, Dai-yu’s thoughts turned to Bao-chai… Then she listened to the insistent rustle of the rain on the bamboos and plantains outside her window. The coldness penetrated the curtains of her bed. Almost without noticing it she had begun to cry.

Page 5

Difficulties in Chinese to English translation

• Long Chinese sentences: 4 English sentences to 1 Chinese
• Chinese no pronouns or articles (English the, a)
• Chinese has locative post-positions, English prepositions
  • Chinese bed on, window outside; English on the bed, outside the window
• Chinese rarely marks tense:
  • English as, turned to, had begun
  • Chinese tou, ‘penetrate’ -> English penetrated
• Chinese relative clauses are before the noun, English after
  • Chinese: [window outside bamboo on] rain
  • English: rain [on the bamboo outside the window]
• Stylistic and cultural differences
  • Chinese bamboo tip plantain leaf -> bamboos and plantains
  • Chinese rain sound sigh drop -> insistent rustle of the rain
  • Chinese ma ‘curtain’ -> curtains of her bed

Page 6

Alignment in Machine Translation

Page 7

Early MT History

1946 Booth and Weaver discuss MT in New York
1947-48 idea of dictionary-based direct translation
1947 Warren Weaver suggests translation by computer
1949 Weaver memorandum
1952 all 18 MT researchers in world meet at MIT
1954 IBM/Georgetown Demo Russian-English MT
1955-65 lots of labs take up MT

http://www.hutchinsweb.me.uk/PPF-TOC.htm

Page 8

1949 Weaver memorandum

• http://www.mt-archive.info/Weaver-1949.pdf

• “There are certain invariant properties which are… common to all languages”

• “When I look at an article in Russian, I say ‘This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.’”

• “[If] one can see… N words on either side, then, if N is large enough, one can unambiguously decide the meaning of the central word.”

Page 9

The History of MT: Pessimism

• 1959/1960
  • Yehoshua Bar-Hillel “Report on the state of MT in US and GB”
  • FAHQ (fully automatic high quality) MT too hard because we would have to encode all of human knowledge
  • Instead we should work on computer tools for human translators

Page 10

The claim that fully automatic high quality MT is impossible

Yehoshua Bar-Hillel. 1960. A Demonstration of the Nonfeasibility of Fully Automatic High Quality Translation.

• Little John was looking for his toy box. Finally he found it. The box was in the pen. John was very happy.

• Pen1: Enclosure for small children
• Pen2: Writing utensil

→ Pen1: Enclosure for small children

Page 11

• The box was in the pen.

Page 12

The claim that fully automatic high quality MT is impossible

Yehoshua Bar-Hillel, 1960

“I now claim that no existing or imaginable program will enable an electronic computer to determine…”

Page 13

The state of the art in MT

Page 14

The state of the art in MT

Page 15

History of MT: Further Pessimism
The ALPAC report

• Headed by John R. Pierce of Bell Labs
• Conclusions:
  • MT doesn’t work
    • MT a failure: all current MT work had to be post-edited
    • Intelligibility and informativeness worse than human
  • We don’t need MT anyhow
    • Already too many human translators from Russian
• Results: MT research suffered
  • Funding loss
  • Number of research labs declined
  • Association for Machine Translation and Computational Linguistics dropped MT from its name

Page 16

MT in the modern age

• 1975-1985 Resurgence of MT in Europe and Japan
  • Domain-specific rule-based systems
• 1990-present
  • Rise of Statistical Machine Translation

Page 17

Machine Translation

Introduction to MT

Page 18

Machine Translation

Language Divergences

Page 19

Language Similarities and Divergences

• Typology:
  • the study of systematic cross-linguistic similarities and differences
• What are the dimensions along which human languages vary?

Page 20

Syntactic Variation: Basic Word Orders

• SVO (Subject-Verb-Object) languages
  • English, German, French, Mandarin
  • I baked a pizza

• SOV languages
  • Japanese, Hindi
  • English:  He adores listening to music
    Japanese: kare ha ongaku wo kiku no ga daisuki desu
              he music to listening adores

• VSO languages
  • Irish, Classical Arabic, Tagalog

In many languages one word order is more basic

Page 21

Morphology

• Morpheme: “Minimal meaningful unit of language”
  Word = Morpheme + Morpheme + Morpheme + …

• Stems: (base form, root)
  hope+ing → hoping
  hop → hopping

• Affixes
  • Prefixes: Antidisestablishmentarianism (anti-, dis-)
  • Suffixes: Antidisestablishmentarianism (-ment, -arian, -ism)
  • Infixes: hingi (borrow) – humingi (borrower) in Tagalog
  • Circumfixes: sagen (say) – gesagt (said) in German

Page 22

Morphemes per Word

Scale from isolating (about 1 morpheme per word) to synthetic (about 4 morphemes per word):

Language                          Morphemes per word
Vietnamese                        1.06
English                           1.68
Yakut (Turkic)                    2.17
Swahili                           2.55
West Greenlandic (Eskimo-Inuit)   3.72

Joseph Greenberg. 1954. A Quantitative Approach to the Morphological Typology of Language. IJAL 26:3.

Page 23

Few morphemes per word: Cantonese

“He said this was the biggest building in the whole country”

Each word in this sentence has one morpheme (and one syllable):

keui wa chyuhn gwok jeui daaih gaan nguk haih li gaan
he say entire country most big bldg house is this bldg

Page 24

Many Morphemes per word: Turkish

uygarlaştıramadıklarımızdanmışsınızcasına
uygar+laş+tır+ama+dık+lar+ımız+dan+mış+sınız+casına
Behaving as if you are among those whom we could not cause to become civilized
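
As a rough illustration of the morphemes-per-word measure, the sketch below computes the ratio for the Cantonese and Turkish examples. It assumes words are already segmented by hand with "+" between morphemes; the function name `morphemes_per_word` is hypothetical, and two single sentences are of course far too little data to characterize a language.

```python
def morphemes_per_word(segmented_words):
    """Average number of morphemes per word.

    Each item is one word whose morphemes are separated by '+'.
    (Hand segmentation is assumed; no morphological analyzer is used.)
    """
    if not segmented_words:
        raise ValueError("need at least one word")
    total_morphemes = sum(len(word.split("+")) for word in segmented_words)
    return total_morphemes / len(segmented_words)


# Cantonese example: every word is a single morpheme, so no '+' appears.
cantonese = "keui wa chyuhn gwok jeui daaih gaan nguk haih li gaan".split()

# Turkish example: one word, segmented as on the slide.
turkish = ["uygar+laş+tır+ama+dık+lar+ımız+dan+mış+sınız+casına"]

cantonese_ratio = morphemes_per_word(cantonese)
turkish_ratio = morphemes_per_word(turkish)
print(cantonese_ratio, turkish_ratio)
```

The Cantonese sentence sits at the isolating end of the scale, while the single Turkish word lies far beyond the corpus-level averages reported for synthetic languages.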

Page 25

Word Segmentation: Are word boundaries marked in writing?

• Some writing systems: boundaries between words not marked
  • Chinese, Japanese, Thai
  • Word segmentation becomes an important part of text normalization for MT

• Some languages tend to have sentences that are quite long, closer to English paragraphs than sentences:
  • Modern Standard Arabic, Chinese
  • Sentence segmentation may be necessary for MT between these languages and languages like English
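
The slides do not name a segmentation algorithm. One simple baseline is greedy longest-match ("maximum matching") against a word list, sketched below. The lexicon is a hypothetical toy built from the Chinese example sentence earlier in the deck; a real system would need a large dictionary or a statistical model.

```python
def max_match(text, lexicon):
    """Greedy longest-match segmentation of text written without spaces.

    At each position, take the longest lexicon word that matches;
    if nothing matches, emit a single character and move on.
    """
    longest = max(len(word) for word in lexicon)
    words = []
    i = 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Toy lexicon (an assumption for illustration only).
toy_lexicon = {"这", "不过", "是", "一", "个", "时间", "的", "问题"}

segmented = max_match("这不过是一个时间的问题", toy_lexicon)
print(" ".join(segmented))
```

Greedy matching can go wrong when a longer dictionary word overlaps the correct boundary, which is one reason segmentation is treated as a real preprocessing problem rather than a lookup.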

Page 26

Inferential Load: cold vs. hot languages

• Hot languages:
  • Who did what to whom is marked explicitly
  • English

• Cold languages:
  • The hearer has more “figuring out” of who the various actors in the various events are
  • Japanese, Chinese

Balthasar Bickel. 2003. Referential density in discourse and syntactic typology. Language 79:2, 708-36

Page 27

Inferential Load: The bracketed noun phrases (blue on the slide) are not in the Chinese original

飓风丽塔已经减弱为第三级飓风,
Rita weakened and was downgraded to a Category 3 storm;

ø 迫近美国德克萨斯州和路易斯安那州,
[Rita/it/the storm] is moving close to Texas and Louisiana;

当局表示,
the authorities announced;

虽然 ø 在登陆前可能再稍微减弱,
although [Rita/it/the storm] might weaken again before landing,

但 ø 仍然会非常危险,
[Rita/it/the storm] is still very dangerous;

ø 预料 ø 会在当地时间星期六凌晨在德州和路易斯安那州之间登陆,
[the authorities] predict [Rita/it/the storm] will arrive at the Texas-Louisiana border on Saturday morning local time;

ø 直接吹袭休斯敦市东面的主要炼油设施。
[Rita/it/the storm] will directly hit the oil-refining industry east of Houston.

Page 28

Lexical Divergences

• Word to phrases:
  • English  computer science
  • French   informatique

• Part of Speech divergences
  • English  She likes to sing
  • German   Sie singt gerne [She sings likefully]

  • English  I’m hungry
  • Spanish  Tengo hambre [I have hunger]

Page 29

Lexical Specificity Divergences

• Grammatical specificity
  • Spanish: plural pronouns have gender (ellos/ellas)
  • English: plural pronouns no gender (they)

• So translating “they” from English to Spanish, need to figure out gender of the referent!

Page 30

Lexical Divergences: Semantic Specificity

English    brother
Mandarin   gege (older brother), didi (younger brother)

English    wall
German     Wand (inside), Mauer (outside)

English    fish
Spanish    pez (the creature), pescado (fish as food)

Cantonese  ngau
English    cow, beef
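
One way to picture a specificity divergence is as a one-to-many dictionary entry that cannot be resolved without an extra piece of information from context. The sketch below is only an illustration: the `LEXICON` layout, the language codes, and the feature labels ("older", "inside", "food", and so on) are hypothetical choices, not part of the slides.

```python
# Toy lexicon: (source language, target language, source word) maps to the
# candidate target words, keyed by the distinction the target language forces.
LEXICON = {
    ("en", "zh", "brother"): {"older": "gege", "younger": "didi"},
    ("en", "de", "wall"): {"inside": "Wand", "outside": "Mauer"},
    ("en", "es", "fish"): {"creature": "pez", "food": "pescado"},
    ("yue", "en", "ngau"): {"creature": "cow", "food": "beef"},
}


def translate_word(src, tgt, word, feature=None):
    """Return the candidate translations for a word.

    Without a disambiguating feature, every candidate is returned;
    with one, exactly the matching candidate is returned.
    """
    senses = LEXICON[(src, tgt, word)]
    if feature is None:
        return sorted(senses.values())
    return [senses[feature]]


ambiguous = translate_word("en", "es", "fish")
resolved = translate_word("en", "de", "wall", feature="outside")
print(ambiguous, resolved)
```

The feature has to come from somewhere outside the source word itself, such as the surrounding sentence or world knowledge, which is what makes these divergences hard for MT.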

Page 31

Predicate Argument divergences

• English: The bottle floated out.
  Spanish: La botella salió flotando. [The bottle exited floating]

• Satellite-framed languages:
  • direction of motion is marked on the satellite
  • Crawl out, float off, jump down, walk over to, run after
  • Most of Indo-European, Hungarian, Finnish, Chinese

• Verb-framed languages:
  • direction of motion is marked on the verb
  • Spanish, French, Arabic, Hebrew, Japanese, Tamil, Polynesian, Mayan, Bantu families

L. Talmy. 1985. Lexicalization patterns: Semantic Structure in Lexical Form.

Page 32

Predicate Argument divergences: Heads and Argument swapping

Heads:
  English: X swim across Y
  Spanish: X cruzar Y nadando

  English: I like to eat
  German: Ich esse gern

  English: I’d prefer vanilla
  German: Mir wäre Vanille lieber

Arguments:
  Spanish: Y me gusta
  English: I like Y

  German: Der Termin fällt mir ein
  English: I remember the date [The date occurs to me]

Dorr, Bonnie J. 1994. "Machine Translation Divergences: A Formal Description and Proposed Solution," Computational Linguistics, 20:4, 597-633

Page 33

Predicate-Argument Divergence Counts

Found divergences in 32% of sentences in UN Spanish/English Corpus

Divergence type     Spanish               English           Percentage
Part of Speech      X tener hambre        Y have hunger     98%
Phrase/Light verb   X dar puñaladas a Z   X stab Z          83%
Structural          X entrar en Y         X enter Y         35%
Heads swap          X cruzar Y nadando    X swim across Y   8%
Arguments swap      X gustar a Y          Y likes X         6%

B. Dorr et al. 2002. DUSTer: A Method for Unraveling Cross-Language Divergences for Statistical Word-Level Alignment

Page 34

Machine Translation

Language Divergences

Page 35

Machine Translation

Three classical methods for MT

Page 36

3 Classical methods for MT

• Direct
• Transfer
• Interlingua

Page 37

Three MT Approaches: Direct, Transfer, Interlingual

Page 38

Direct Translation

• Proceed word-by-word through text
• Translating each word
• No intermediate structures except morphology
• Knowledge is in the form of
  • Huge bilingual dictionary
  • word-to-word translation information
• After word translation, can do simple reordering
  • Adjective ordering English -> French/Spanish
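
The two stages described above (dictionary lookup, then simple local reordering) can be sketched in a few lines. This is a toy under stated assumptions: `BILINGUAL_DICT` is a hypothetical three-word English-to-Spanish dictionary with hand-assigned part-of-speech tags, and the only reordering rule is adjective-noun swapping.

```python
# Hypothetical toy dictionary: English word -> (Spanish word, part of speech).
BILINGUAL_DICT = {
    "the": ("la", "DET"),
    "green": ("verde", "ADJ"),
    "witch": ("bruja", "NOUN"),
}


def direct_translate(sentence):
    """Word-by-word translation followed by local ADJ NOUN -> NOUN ADJ reordering."""
    # Stage 1: look up each word; unknown words pass through unchanged.
    tagged = [BILINGUAL_DICT.get(w, (w, "UNK")) for w in sentence.lower().split()]

    # Stage 2: swap each adjacent adjective-noun pair once.
    i = 0
    while i < len(tagged) - 1:
        if tagged[i][1] == "ADJ" and tagged[i + 1][1] == "NOUN":
            tagged[i], tagged[i + 1] = tagged[i + 1], tagged[i]
            i += 2
        else:
            i += 1
    return " ".join(word for word, _ in tagged)


result = direct_translate("the green witch")
print(result)
```

Because the sketch never builds a parse, it has no way to handle reorderings that span more than adjacent words, which is the kind of limitation the direct approach runs into.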

Page 39

Direct MT Dictionary entry

Page 40

Direct MT

Page 41

Problems with direct MT

• German

• Chinese

Page 42

The Transfer Model

• Idea: apply contrastive knowledge, i.e., knowledge about the difference between two languages

• Steps:
  • Analysis: Syntactically parse source language
  • Transfer: Rules to turn this parse into parse for target language
  • Generation: Generate target sentence from parse tree
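
The three steps can be sketched on a single noun phrase, using an English adjective-noun to French noun-adjective reordering as the only transfer rule. Everything here is a simplification for illustration: `analyze` returns a hand-built parse instead of calling a parser, `LEXICON` is a hypothetical three-word dictionary, and word translation is folded into generation for brevity.

```python
# Hypothetical toy English -> French lexicon used at generation time.
LEXICON = {"the": "la", "green": "verte", "witch": "sorcière"}


def analyze():
    """Analysis: stand-in for a parser; a hand-built parse of 'the green witch'.

    Trees are nested tuples (label, child, ...); leaves are (tag, word).
    """
    return ("NP", ("DET", "the"), ("ADJ", "green"), ("NOUN", "witch"))


def transfer(tree):
    """Transfer: rewrite ADJ NOUN as NOUN ADJ at every level of the tree."""
    label, *children = tree
    if isinstance(children[0], str):  # leaf node: (tag, word)
        return tree
    children = [transfer(child) for child in children]
    i = 0
    while i < len(children) - 1:
        if children[i][0] == "ADJ" and children[i + 1][0] == "NOUN":
            children[i], children[i + 1] = children[i + 1], children[i]
            i += 2
        else:
            i += 1
    return (label, *children)


def generate(tree):
    """Generation: read off the leaves left to right, translating each word."""
    label, *children = tree
    if isinstance(children[0], str):
        return [LEXICON.get(children[0], children[0])]
    return [word for child in children for word in generate(child)]


french = " ".join(generate(transfer(analyze())))
print(french)
```

Unlike word-by-word translation, the rule here operates on tree nodes, so the same rule applies wherever an adjective-noun pair occurs inside a larger parse.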

Page 43

English to French

English: Adjective Noun
French: Noun Adjective

• This is not always true
  Route mauvaise   ‘bad road, badly-paved road’
  Mauvaise route   ‘wrong road’

• But is a reasonable first approximation

• Rule: Adjective Noun -> Noun Adjective

Page 44

Transfer rules

Page 45

Transferring the green witch…

Page 46

Interlingua

• Instead of N² sets of transfer rules
  • Use meaning as a representation language

1. Parse source sentence into meaning representation
2. Generate target sentence from meaning.

• Intuition: Use other NLP applications to do MT work
  • English book to Spanish: libro or reservar
  • Disambiguate book into concepts BOOKVOLUME and RESERVE

• Need 2N systems (a parser and generator for each language)
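
The scaling argument is simple arithmetic, sketched below. The count for transfer uses ordered language pairs, N(N-1), which the slide rounds to N²; the function names are hypothetical.

```python
def transfer_rule_sets(n_languages):
    """One set of transfer rules per translation direction (ordered pair)."""
    return n_languages * (n_languages - 1)


def interlingua_components(n_languages):
    """One parser into the interlingua plus one generator out of it, per language."""
    return 2 * n_languages


# Compare how the two designs grow with the number of languages.
for n in (2, 5, 10, 20):
    print(n, transfer_rule_sets(n), interlingua_components(n))
```

For two languages the counts are close, but the gap widens quickly: the transfer count grows quadratically while the interlingua count grows linearly.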

Page 47

Interlingua for Mary did not slap the green witch

Page 48

Machine Translation

Three classical methods for MT