1 Josef van Genabith & Andy Way TransBooster (2003-2006) LaDEva: Labelled Dependency-Based MT...

1

Josef van Genabith & Andy Way

• TransBooster (2003-2006)

• LaDEva: Labelled Dependency-Based MT Evaluation (2006-2008)

• GramLab (2001-2008)

Previous MT Work & GramLab

2

TransBooster

TransBooster (2003-2006)

Enterprise Ireland funded Basic Research Project

PI: Josef van Genabith, CoI: Andy Way

Students: Bart Mellebeek, Anna Khasin, Karolina Owczarzak

3

TransBooster

TransBooster Basic Idea:

• MT systems are better on short (= simple) sentences than on longer ones.

• Capitalise on this!

• Divide up long sentences (automatically) into shorter components

• Feed those components to MT system

• Translate (get better results for shorter components)

• Put (better) translations together in target (= get better translation)

A bit like Controlled Language, but automatic and without the restrictions (to particular syntax etc.)!
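The divide / translate / recombine loop above can be sketched in a few lines. This is only an illustration under strong assumptions: the chunks are supplied by hand (the actual decomposition would need a parser), the `toy_translate` function and its phrase table are hypothetical stand-ins for a real MT engine, and the recombination simply keeps source order, which a real system could not assume in general.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for an MT engine: a tiny English-to-Spanish phrase table.
TOY_PHRASE_TABLE: Dict[str, str] = {
    "the old man": "el hombre viejo",
    "reads": "lee",
    "a long book": "un libro largo",
}


def toy_translate(chunk: str) -> str:
    """Translate one short chunk; unknown chunks are passed through unchanged."""
    return TOY_PHRASE_TABLE.get(chunk, chunk)


def boost_translate(chunks: List[str], translate: Callable[[str], str]) -> str:
    """Send each chunk to the MT engine separately and join the results.

    Assumption: target order equals source order (a simplification).
    """
    return " ".join(translate(chunk) for chunk in chunks)


# Chunks are given by hand here; automatic decomposition is out of scope.
chunks = ["the old man", "reads", "a long book"]
print(boost_translate(chunks, toy_translate))
```

Passing the engine in as a function mirrors the wrapper idea: the same loop could in principle sit in front of any MT system that accepts a string and returns a string.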

4

TransBooster

TransBooster Example

5

TransBooster

• Wrapper technology

• Tricks MT system to produce better results …

6

TransBooster

TransBooster needs

• Good parsers

• Head and argument/adjunct finding rules

TransBooster with

• Rule-Based MT (Systran, Logomedia)

• Example-Based MT (DCU system)

• Statistical MT (standard Aachen PBSMT)

• Multi-engine MT

Improves results! => full details in Bart Mellebeek’s PhD & publications

7

TransBooster

Bart Mellebeek’s PhD dissertation, 2007

8

LaDEva

LaDEva: Labelled Dependency-Based Evaluation for MT (2005-2008)

Microsoft Ireland funded Basic Research Project

PIs: Josef van Genabith/Andy Way

Students: Karolina Owczarzak

9

LaDEva

Basic Idea:

• Automatic evaluation methods extremely important for MT

• String-based MT evaluation (BLEU etc.) unfairly penalises perfectly valid:

- lexical variation/paraphrases

- syntactic variation/paraphrases

• Compare:

John resigned yesterday.

Yesterday, John quit.

• Use labelled dependencies (instead of surface strings) for automatic evaluation
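To make the contrast with string matching concrete, here is a minimal sketch that scores a candidate against a reference by comparing sets of labelled dependency triples. Everything in it is an assumption made for illustration: the triples are written by hand rather than produced by a parser, the relation names are invented, the f-score is one plausible way to compare the sets, and a hand-written synonym table stands in for a real lexical resource.

```python
from typing import Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (relation, head, dependent)


def normalise(triples: Set[Triple], synonyms: Dict[str, str]) -> Set[Triple]:
    """Map every word to its canonical synonym, if the table lists one."""
    return {(rel, synonyms.get(h, h), synonyms.get(d, d)) for rel, h, d in triples}


def dependency_fscore(candidate: Set[Triple], reference: Set[Triple],
                      synonyms: Optional[Dict[str, str]] = None) -> float:
    """Harmonic mean of precision and recall over matching triples."""
    table = synonyms or {}
    cand = normalise(candidate, table)
    ref = normalise(reference, table)
    matched = len(cand & ref)
    if matched == 0:
        return 0.0
    precision = matched / len(cand)
    recall = matched / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hand-written triples for "John resigned yesterday." / "Yesterday, John quit."
reference = {("subj", "resign", "john"), ("adjunct", "resign", "yesterday")}
candidate = {("subj", "quit", "john"), ("adjunct", "quit", "yesterday")}

print(dependency_fscore(candidate, reference))
print(dependency_fscore(candidate, reference, synonyms={"quit": "resign"}))
```

Because the triples carry no word-order information, fronting "yesterday" costs nothing. The remaining mismatch is purely lexical (quit vs. resigned) and disappears once the synonym table is supplied.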

10

LaDEva

LaDEva example (syntactic variation):

Use WordNet and PBSMT alignments for lexical variation …

11

LaDEva

LaDEva needs

• Very (!) robust dependency parsers that can parse MT output (as opposed to grammatical language)

• DCU GramLab treebank-based LFG parsers

• Microsoft Parsers

• WordNet, PBSMT alignments

Evaluate LaDEva using

• BLEU

• NIST

• GTM

• Meteor

in terms of correlation with human judgments
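Correlation with human judgments can be computed as below. This is a sketch only: the slides do not say which correlation coefficient was used, so Pearson's r is an assumption, and the score lists are made-up placeholder numbers, not results from any experiment.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Placeholder numbers for illustration only (one score per translated segment).
human_scores = [1.0, 2.0, 3.0, 4.0]
metric_scores = [0.1, 0.2, 0.3, 0.4]

print(pearson(metric_scores, human_scores))
```

Running the same function over each metric's scores for the same segments gives one correlation figure per metric, which is what makes the metrics comparable.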

12

LaDEva

13

LaDEva

Karolina Owczarzak’s PhD thesis 2008

14

GramLab

GramLab (2001 – 2008)

- Automatic Annotation of Penn-II Treebank with LFG F-Structures (2001-2004)

Enterprise Ireland funded Basic Research Project

Team: PI: Josef van Genabith, CoI: Andy Way, Aoife Cahill, Mairead McCarthy, Mick Burke, Ruth O’Donovan

- GramLab: Chinese, Japanese, Arabic, Spanish, French, German, English (2004-2008)

Science Foundation Ireland funded Principal Investigatorship

Team: PI: Josef van Genabith, Grzegorz Chrupala, Natalie Schluter, Ines Rehbein, Yuqing Guo, Masanori Oya, Amine Akrout, Dr. Aoife Cahill, Dr. Yaffa Al-Raheb, Dr. Deirdre Hogan, Dr. Sisay Adafre, Dr. Lamia Tounsi, Dr. Mohammed Attia

15

GramLab

GramLab (2001 – 2008)

Basic Idea:

Handcrafting deep wide coverage grammars is time-consuming, expensive and difficult to scale to unrestricted text.

Acquire grammars automatically from treebanks => shallow grammars

New: acquire deep grammars automatically from treebanks

16

GramLab

Shallow Grammar: defines language as set of strings and associates syntactic structure to string

Deep Grammar: shallow grammar + maps strings to information (meaning, dependencies, predicate argument structure – “who did what to whom”) + non-local dependency resolution
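The "who did what to whom" layer can be pictured as a nested attribute-value structure. The sketch below is a loose illustration, not the project's actual representation: the f-structure is written by hand as a Python dict, the attribute names (`pred`, `subj`, `adjunct`, `tense`) are assumed, and `dependencies` is a hypothetical helper that flattens the structure into labelled triples.

```python
from typing import Any, Dict, List, Tuple

FStructure = Dict[str, Any]
Triple = Tuple[str, str, str]  # (relation, head predicate, dependent predicate)

# Hand-written structure for "John resigned yesterday." (attribute names assumed).
fstructure: FStructure = {
    "pred": "resign",
    "subj": {"pred": "john"},
    "adjunct": [{"pred": "yesterday"}],
    "tense": "past",
}


def dependencies(fs: FStructure) -> List[Triple]:
    """Collect (relation, head, dependent) triples from nested sub-structures.

    Atomic values such as tense are skipped; only embedded structures
    with their own predicate count as dependents.
    """
    head = fs["pred"]
    triples: List[Triple] = []
    for attr, value in fs.items():
        children = value if isinstance(value, list) else [value]
        for child in children:
            if isinstance(child, dict):
                triples.append((attr, head, child["pred"]))
                triples.extend(dependencies(child))
    return triples


print(dependencies(fstructure))
```

A shallow grammar would stop at the tree over the string; the extra layer here is what lets a downstream component ask for the subject of "resign" directly.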

17

GramLab

18

GramLab

• Probabilistic Parsing & Probabilistic Generation

• Used in MT Evaluation (Karo), Question Answering System (Sisay)

• Outperforms best hand-crafted resources (XLE, RASP) for English

• Lots of publications, including 2 Computational Linguistics Journal Papers, 6 ACL, COLING, EMNLP Papers (2004-2008)

• Aoife Cahill, Michael Burke, Ruth O'Donovan, Stefan Riezler, Josef van Genabith and Andy Way, Wide-Coverage Deep Statistical Parsing using Automatic Dependency Structure Annotation, Computational Linguistics, 2008

• Ruth O'Donovan, Michael Burke, Aoife Cahill, Josef van Genabith and Andy Way, Large-Scale Induction and Evaluation of Lexical Resources from the Penn-II and Penn-III Treebanks, Computational Linguistics, 2005

• Transfer-based probabilistic data-driven MT … (Yvette Graham)

• LORG industry strength parsers and generators for IE/IR & QA (Jennifer & Deirdre)
