Josef van Genabith & Andy Way

• TransBooster (2003-2006)

• LaDEva: Labelled Dependency-Based MT Evaluation (2006-2008)

• GramLab (2001-2008)

Previous MT Work & GramLab

TransBooster

TransBooster (2003-2006)

Enterprise Ireland funded Basic Research Project

PI: Josef van Genabith, Col: Andy Way

Students: Bart Mellebeek, Anna Khasin, Karolina Owczarzak

TransBooster Basic Idea:

• MT systems are better on short (= simple) sentences than on longer ones.

• Capitalise on this!

• Divide up long sentences (automatically) into shorter components

• Feed those components to MT system

• Translate (get better results for shorter components)

• Put (better) translations together in target (= get better translation)

A bit like Controlled Language, but automatic and without the restrictions (to particular syntax etc.)!
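The wrapper idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `split_into_chunks`, `boosted_translate` and `toy_engine` are hypothetical names, the splitter cuts naively at punctuation (the slides say TransBooster relies on parsers and head/argument/adjunct rules instead), and the "MT system" is a toy function that upper-cases its input.

```python
import re
from typing import Callable, List

# Hypothetical stand-in for a real MT engine: any function from a source
# string to a target string can be wrapped.
Translator = Callable[[str], str]


def split_into_chunks(sentence: str) -> List[str]:
    """Naively split a sentence at commas and semicolons.

    Assumption for illustration only: a real decomposition would be driven
    by a syntactic parse, not by punctuation.
    """
    parts = re.split(r"\s*[,;]\s*", sentence.strip().rstrip("."))
    return [part for part in parts if part]


def boosted_translate(sentence: str, translate: Translator) -> str:
    """Translate each chunk separately, then join the results in order."""
    chunks = split_into_chunks(sentence)
    translated = [translate(chunk) for chunk in chunks]
    return ", ".join(translated) + "."


def toy_engine(text: str) -> str:
    """Toy 'MT system' that upper-cases its input so the data flow is visible."""
    return text.upper()


result = boosted_translate(
    "The chairman, a long-time rival of the founder, likes fast cars.",
    toy_engine,
)
print(result)
```

Because the engine is passed in as a plain callable, the same wrapper could in principle sit in front of any MT system without changing it, which is the point of the wrapper design.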

TransBooster Example

• Wrapper technology

• Tricks MT system to produce better results …

TransBooster needs

• Good parsers

• Head and argument/adjunct finding rules

TransBooster with

• Rule-Based MT (Systran, Logomedia)

• Example-Based MT (DCU system)

• Statistical MT (standard Aachen PBSMT)

• Multi-engine MT

Improves results! => full details in Bart Mellebeek’s PhD & publications

Bart Mellebeek’s PhD dissertation 2007

LaDEva

LaDEva: Labelled Dependency-Based Evaluation for MT (2005-2008)

Microsoft Ireland funded Basic Research Project

PIs: Josef van Genabith/Andy Way

Students: Karolina Owczarzak

Basic Idea:

• Automatic evaluation methods extremely important for MT

• String-based MT evaluation (BLEU etc.) unfairly penalises perfectly valid

  - lexical variation/paraphrases

  - syntactic variation/paraphrases

• Compare:

John resigned yesterday.

Yesterday, John quit.

• Use labelled dependencies (instead of surface strings) for automatic evaluation
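A minimal Python sketch of the contrast, using the two sentences above. Everything in it is an assumption made for illustration: the dependency triples are written by hand rather than produced by a parser, the label names are invented, and the one-entry synonym table merely stands in for a real lexical resource. It scores the candidate by F-score over matching triples, which is one plausible way to compare dependency sets and not necessarily the project's exact formula.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (relation label, head, dependent)

# Hand-written, hypothetical dependency triples; a real system would obtain
# them by parsing the reference and the MT output.
reference: Set[Triple] = {
    ("subj", "resign", "john"),
    ("adjunct", "resign", "yesterday"),
}
candidate: Set[Triple] = {
    ("subj", "quit", "john"),
    ("adjunct", "quit", "yesterday"),
}

# Hypothetical synonym table standing in for a lexical resource.
SYNONYMS: Dict[str, str] = {"quit": "resign"}


def normalise(triples: Set[Triple]) -> Set[Triple]:
    """Map each word to a canonical synonym so lexical paraphrases can match."""
    return {
        (label, SYNONYMS.get(head, head), SYNONYMS.get(dep, dep))
        for label, head, dep in triples
    }


def f_score(cand: Set[Triple], ref: Set[Triple]) -> float:
    """Harmonic mean of precision and recall over matching triples."""
    matches = len(cand & ref)
    if matches == 0:
        return 0.0
    precision = matches / len(cand)
    recall = matches / len(ref)
    return 2 * precision * recall / (precision + recall)


def bigrams(sentence: str) -> Set[Tuple[str, str]]:
    """Surface bigrams after lower-casing and stripping punctuation."""
    tokens = sentence.lower().replace(",", "").replace(".", "").split()
    return set(zip(tokens, tokens[1:]))


# The two strings share no bigram, so a string-based metric sees no match.
shared_bigrams = bigrams("John resigned yesterday.") & bigrams("Yesterday, John quit.")

exact_score = f_score(candidate, reference)                          # lemmas differ
paraphrase_score = f_score(normalise(candidate), normalise(reference))
print(len(shared_bigrams), exact_score, paraphrase_score)
```

Word order never enters the triples, so the syntactic variation costs nothing; the lexical variation is only forgiven once the synonym mapping is applied.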

LaDEva example (syntactic variation):

Use WordNet and PBSMT alignments for lexical variation …

LaDEva needs

• Very (!) robust dependency parsers that can parse MT output (as opposed to grammatical language)

• DCU GramLab treebank-based LFG parsers

• Microsoft Parsers

• WordNet, PBSMT alignments

Evaluate LaDEva against

• BLEU

• NIST

• GTM

• Meteor

in terms of correlation with human judgments
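Correlation with human judgments can be computed in several ways; the slides do not say which coefficient was used. The sketch below assumes Pearson's r as one common choice. The score lists are invented toy numbers that exist only to exercise the function and are not results from the project.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Invented toy numbers, one entry per translated segment (NOT real data).
human_scores = [1.0, 2.0, 3.0, 4.0]
metric_a = [0.2, 0.4, 0.6, 0.8]  # rises in step with the human scores
metric_b = [0.5, 0.1, 0.6, 0.3]  # unrelated to the human scores

r_a = pearson(human_scores, metric_a)
r_b = pearson(human_scores, metric_b)
print(r_a > r_b)
```

Under this setup the metric with the higher coefficient is the one that tracks human judgments more closely.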

Karolina Owczarzak’s PhD thesis 2008

GramLab

GramLab (2001 – 2008)

- Automatic Annotation of Penn-II Treebank with LFG F-Structures (2001-2004) Enterprise Ireland funded Basic Research Project

Team: PI: Josef van Genabith, Col: Andy Way, Aoife Cahill, Mairead McCarthy, Mick Burke, Ruth O’Donovan

- GramLab: Chinese, Japanese, Arabic, Spanish, French, German, English (2004-2008) Science Foundation Ireland funded Principal Investigatorship

Team: PI: Josef van Genabith, Grzegorz Chrupala, Natalie Schluter, Ines Rehbein, Yuqing Guo, Masanori Oya, Amine Akrout, Dr. Aoife Cahill, Dr. Yaffa Al-Raheb, Dr. Deirdre Hogan, Dr. Sisay Adafre, Dr. Lamia Tounsi, Dr. Mohammed Attia

Basic Idea:

Handcrafting deep, wide-coverage grammars is time-consuming, expensive and difficult to scale to unrestricted text.

Acquire grammars automatically from treebanks => shallow grammars

New: acquire deep grammars automatically from treebanks
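The established shallow step, reading a grammar off a treebank, can be sketched as follows. This is a generic illustration of treebank grammar extraction with relative-frequency rule probabilities, not the GramLab pipeline: the function names are hypothetical, the two trees are hand-made toys rather than real treebank entries, and the deep part (annotating trees with f-structure information) is not covered.

```python
from collections import Counter
from typing import Dict, List, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # (left-hand side, right-hand side)


def parse_bracketed(text: str) -> list:
    """Parse a Penn-style bracketed tree into nested lists [label, child, ...]."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> list:
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        node = [tokens[pos]]  # category label
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(read())  # nested constituent
            else:
                node.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # consume the closing bracket
        return node

    return read()


def productions(node: list) -> List[Rule]:
    """List every rule LHS -> RHS used in the tree, lexical rules included."""
    rhs = tuple(child[0] if isinstance(child, list) else child for child in node[1:])
    rules: List[Rule] = [(node[0], rhs)]
    for child in node[1:]:
        if isinstance(child, list):
            rules.extend(productions(child))
    return rules


def estimate_pcfg(trees: List[str]) -> Dict[Rule, float]:
    """Relative-frequency estimate: count(LHS -> RHS) / count(LHS)."""
    rule_counts: Counter = Counter()
    lhs_counts: Counter = Counter()
    for text in trees:
        for lhs, rhs in productions(parse_bracketed(text)):
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


# Two hand-made toy trees, not taken from any real treebank.
toy_treebank = [
    "(S (NP (NNP John)) (VP (VBD resigned)))",
    "(S (NP (NNP John)) (VP (VBD quit) (NP (NN yesterday))))",
]
grammar = estimate_pcfg(toy_treebank)
for (lhs, rhs), prob in sorted(grammar.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{prob:.2f}]")
```

No rule is written by hand here: coverage grows with the treebank, which is what makes the approach cheaper to scale than handcrafting.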

Shallow Grammar: defines language as set of strings and associates syntactic structure to string

Deep Grammar: shallow grammar + maps strings to information (meaning, dependencies, predicate argument structure – “who did what to whom”) + non-local dependency resolution
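The difference can be made concrete with a toy example in Python. Both analyses below are hand-written assumptions for the sentence "John promised to resign", with invented attribute names in the style of an attribute-value structure; they are not output from any GramLab parser. The shallow analysis is just a bracketed tree over the string, while the deep one also records that John is the understood subject of "resign", a dependency that is not visible in the surface string.

```python
from typing import Any, Dict, List, Tuple

# Shallow analysis: constituent structure over the string, nothing more.
shallow_tree = "(S (NP John) (VP (V promised) (S (VP (TO to) (V resign)))))"

# Deep analysis: a hypothetical attribute-value structure. The same object is
# shared between the two SUBJ slots, which encodes the understood subject.
john: Dict[str, Any] = {"PRED": "John"}
deep_structure: Dict[str, Any] = {
    "PRED": "promise",
    "SUBJ": john,
    "XCOMP": {
        "PRED": "resign",
        "SUBJ": john,  # not overt in the string, recovered by the deep analysis
    },
}


def who_did_what(fs: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Collect (predicate, role, filler) triples from a nested structure."""
    triples: List[Tuple[str, str, str]] = []
    pred = fs["PRED"]
    for attr, value in fs.items():
        if attr == "PRED":
            continue
        if isinstance(value, dict):
            triples.append((pred, attr, value["PRED"]))
            triples.extend(who_did_what(value))
    return triples


triples = who_did_what(deep_structure)
for triple in triples:
    print(triple)
```

The triple `("resign", "SUBJ", "John")` answers "who did what" for the embedded verb, and it can only be read off the deep structure, not off the bracketed tree.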

• Probabilistic Parsing & Probabilistic Generation

• Used in MT Evaluation (Karo), Question Answering System (Sisay)

• Outperforms best hand-crafted resources (XLE, RASP) for English

• Lots of publications, including 2 Computational Linguistics Journal Papers, 6 ACL, COLING, EMNLP Papers (2004-2008)

• Aoife Cahill, Michael Burke, Ruth O'Donovan, Stefan Riezler, Josef van Genabith and Andy Way (2008) Wide-Coverage Deep Statistical Parsing using Automatic Dependency Structure Annotation, Computational Linguistics, 2008

• Ruth O'Donovan, Michael Burke, Aoife Cahill, Josef van Genabith and Andy Way (2005) Large-Scale Induction and Evaluation of Lexical Resources from the Penn-II and Penn-III Treebanks, Computational Linguistics, 2005

• Transfer-based probabilistic data-driven MT … (Yvette Graham)

• LORG industry-strength parsers and generators for IE/IR & QA (Jennifer & Deirdre)