Josef van Genabith & Andy Way
• TransBooster (2003-2006)
• LaDEva: Labelled Dependency-Based MT Evaluation (2006-2008)
• GramLab (2001-2008)
Previous MT Work & GramLab
TransBooster
TransBooster (2003-2006)
Enterprise Ireland funded Basic Research Project
PI: Josef van Genabith Col: Andy Way
Students: Bart Mellebeek, Anna Khasin, Karolina Owczarzak
TransBooster Basic Idea:
• MT systems are better on short (= simple) sentences than on longer ones.
• Capitalise on this!
• Divide up long sentences (automatically) into shorter components
• Feed those components to MT system
• Translate (get better results for shorter components)
• Put (better) translations together in target (= get better translation)
A bit like Controlled Language, but automatic and without the restrictions (to particular syntax etc.)!
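The divide, translate and recombine steps above can be sketched in a few lines. This is only an illustration, not the project's algorithm: TransBooster decomposes sentences with a parser, while the comma/semicolon split below is a deliberately crude substitute. The names `split_into_chunks`, `boosted_translate`, `toy_mt` and `TOY_LEXICON` are hypothetical, and the "MT engine" is a toy word lookup.

```python
import re
from typing import Callable, List

def split_into_chunks(sentence: str) -> List[str]:
    """Crude stand-in for parser-based decomposition (an assumption):
    split on commas and semicolons and drop empty pieces."""
    return [c.strip() for c in re.split(r"[,;]", sentence) if c.strip()]

def boosted_translate(sentence: str, translate: Callable[[str], str]) -> str:
    """Translate each short chunk on its own, then rejoin in source order."""
    chunks = split_into_chunks(sentence)
    return ", ".join(translate(chunk) for chunk in chunks)

# Hypothetical toy "MT engine": word-for-word lookup, for illustration only.
TOY_LEXICON = {
    "the": "le", "cat": "chat", "sleeps": "dort",
    "and": "et", "dog": "chien", "barks": "aboie",
}

def toy_mt(text: str) -> str:
    """Look each word up in the toy lexicon; unknown words pass through."""
    return " ".join(TOY_LEXICON.get(w.lower(), w) for w in text.split())

result = boosted_translate("the cat sleeps, and the dog barks", toy_mt)
print(result)
```

Because `boosted_translate` only needs a callable from string to string, the same wrapper could sit in front of any engine, which matches the wrapper idea on the slides. Rejoining chunks in source order is a simplification that would not hold for language pairs with different constituent order.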
TransBooster Example
• Wrapper technology
• Tricks MT system to produce better results …
TransBooster needs
• Good parsers
• Head and argument/adjunct finding rules
TransBooster with
• Rule-Based MT (Systran, Logomedia)
• Example-Based MT (DCU system)
• Statistical MT (standard Aachen PBSMT)
• Multi-engine MT
Improves results! => full details in Bart Mellebeek’s PhD & publications
Bart Mellebeek’s PhD dissertation, 2007
LaDEva
LaDEva: Labelled Dependency Based Evaluation for MT (2005-2008)
Microsoft Ireland funded Basic Research Project
PIs: Josef van Genabith/Andy Way
Student: Karolina Owczarzak
Basic Idea:
• Automatic evaluation methods extremely important for MT
• String-based MT evaluation (BLEU etc.) unfairly penalises perfectly valid
  - lexical variation/paraphrases
  - syntactic variation/paraphrases
• Compare:
John resigned yesterday.
Yesterday, John quit.
• Use labelled dependencies (instead of surface strings) for automatic evaluation
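The John resigned / John quit pair can be used to illustrate the contrast between string matching and dependency matching. The sketch below is an assumption-laden toy, not the LaDEva implementation: the dependency triples are written by hand rather than produced by a parser, the labels `subj` and `adjunct` are illustrative, and the `SYNONYMS` table is a hypothetical stand-in for a lexical resource.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (label, head, dependent)

# Hand-written labelled dependencies (an assumption; a parser would supply these).
reference: Set[Triple] = {("subj", "resign", "john"), ("adjunct", "resign", "yesterday")}
candidate: Set[Triple] = {("subj", "quit", "john"), ("adjunct", "quit", "yesterday")}

# Hypothetical synonym table standing in for a lexical resource.
SYNONYMS: Dict[str, str] = {"quit": "resign"}

def normalise(triples: Set[Triple]) -> Set[Triple]:
    """Map each head and dependent to a canonical synonym where one is listed."""
    return {(lab, SYNONYMS.get(h, h), SYNONYMS.get(d, d)) for lab, h, d in triples}

def f_score(cand: Set[Triple], ref: Set[Triple]) -> float:
    """Harmonic mean of precision and recall over exactly matching triples."""
    matched = len(cand & ref)
    if matched == 0:
        return 0.0
    precision = matched / len(cand)
    recall = matched / len(ref)
    return 2 * precision * recall / (precision + recall)

def bigram_precision(cand: List[str], ref: List[str]) -> float:
    """Share of candidate bigrams that also occur in the reference."""
    cand_bigrams = list(zip(cand, cand[1:]))
    ref_bigrams = set(zip(ref, ref[1:]))
    if not cand_bigrams:
        return 0.0
    return sum(b in ref_bigrams for b in cand_bigrams) / len(cand_bigrams)

string_score = bigram_precision(["yesterday", "john", "quit"],
                                ["john", "resigned", "yesterday"])
dep_score_plain = f_score(candidate, reference)
dep_score_syn = f_score(normalise(candidate), normalise(reference))
print(string_score, dep_score_plain, dep_score_syn)
```

In this toy setting the word order difference disappears once the sentences are reduced to dependencies, and the remaining lexical mismatch (quit vs. resign) is what the synonym step addresses.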
LaDEva example (syntactic variation):
Use WordNet and PBSMT alignments for lexical variation …
LaDEva needs
• Very (!) robust dependency parsers that can parse MT output (as opposed to grammatical language)
• DCU GramLab treebank-based LFG parsers
• Microsoft Parsers
• WordNet, PBSMT alignments
Evaluate LaDEva against
• BLEU
• NIST
• GTM
• Meteor
in terms of correlation with human judgments
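Correlation with human judgments can be computed as below. The slides do not say which coefficient was used, so Pearson's r is an assumption here, and every number in the example is an invented toy value, not a result from the project. The names `pearson`, `human`, `metric_a` and `metric_b` are hypothetical.

```python
from math import sqrt
from typing import Sequence

def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)

# Invented toy scores for four segments (NOT data from the project).
human = [1.0, 2.0, 3.0, 4.0]      # hypothetical human judgments
metric_a = [0.1, 0.2, 0.3, 0.4]   # hypothetical metric that tracks the judgments
metric_b = [0.4, 0.1, 0.3, 0.2]   # hypothetical metric that does not

r_a = pearson(human, metric_a)
r_b = pearson(human, metric_b)
print(r_a, r_b)
```

A metric whose scores rank and space the segments the way the human judges do gets a coefficient near 1; comparing the coefficients of several metrics on the same judged segments is one way to decide which metric is more trustworthy.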
Karolina Owczarzak’s PhD thesis 2008
GramLab
GramLab (2001 – 2008)
- Automatic Annotation of Penn-II Treebank with LFG F-Structures (2001-2004)
Enterprise Ireland funded Basic Research Project
Team: PI: Josef van Genabith, Col: Andy Way, Aoife Cahill, Mairead McCarthy, Mick Burke, Ruth O’Donovan
- GramLab: Chinese, Japanese, Arabic, Spanish, French, German, English (2004-2008)
Science Foundation Ireland funded Principal Investigatorship
Team: PI: Josef van Genabith, Grzegorz Chrupala, Natalie Schluter, Ines Rehbein, Yuqing Guo, Masanori Oya, Amine Akrout, Dr. Aoife Cahill, Dr. Yaffa Al-Raheb, Dr. Deirdre Hogan, Dr. Sisay Adafre, Dr. Lamia Tounsi, Dr. Mohammed Attia
GramLab (2001 – 2008)
Basic Idea:
Handcrafting deep, wide-coverage grammars is time-consuming, expensive and difficult to scale to unrestricted text.
Acquire grammars automatically from treebanks => shallow grammars
New: acquire deep grammars automatically from treebanks
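Acquiring a shallow grammar from a treebank can be illustrated by reading context-free rules off bracketed trees and estimating their probabilities by relative frequency. The sketch below covers only that shallow step; it does not attempt the f-structure annotation that makes the acquired grammars deep. The two-tree treebank and the names `parse_tree`, `rules` and `estimate_pcfg` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterator, List, Tuple

Rule = Tuple[str, Tuple[str, ...]]  # (left-hand side, right-hand side)

def parse_tree(text: str):
    """Parse a bracketed tree such as "(S (NP (NNP John)) ...)" into
    nested (label, children) tuples; leaves are plain strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1
            label = tokens[pos]
            pos += 1
            children = []
            while tokens[pos] != ")":
                children.append(read())
            pos += 1  # consume the closing bracket
            return (label, children)
        word = tokens[pos]
        pos += 1
        return word

    return read()

def rules(tree) -> Iterator[Rule]:
    """Yield one rule per internal node, top-down."""
    label, children = tree
    yield (label, tuple(c if isinstance(c, str) else c[0] for c in children))
    for child in children:
        if not isinstance(child, str):
            yield from rules(child)

def estimate_pcfg(treebank: List[str]) -> Dict[Rule, float]:
    """Relative-frequency estimate: count(lhs -> rhs) / count(lhs)."""
    counts = Counter(r for t in treebank for r in rules(parse_tree(t)))
    lhs_totals: Counter = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

# Hypothetical two-tree "treebank", for illustration only.
TOY_TREEBANK = [
    "(S (NP (NNP John)) (VP (VBD resigned)))",
    "(S (NP (NNP Mary)) (VP (VBD slept)))",
]

grammar = estimate_pcfg(TOY_TREEBANK)
for (lhs, rhs), prob in sorted(grammar.items()):
    print(lhs, "->", " ".join(rhs), prob)
```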
Shallow Grammar: defines a language as a set of strings and associates a syntactic structure with each string
Deep Grammar: shallow grammar + maps strings to information (meaning, dependencies, predicate argument structure – “who did what to whom”) + non-local dependency resolution
• Probabilistic Parsing & Probabilistic Generation
• Used in MT Evaluation (Karo), Question Answering System (Sisay)
• Outperforms best hand-crafted resources (XLE, RASP) for English
• Lots of publications, including 2 Computational Linguistics Journal Papers, 6 ACL, COLING, EMNLP Papers (2004-2008)
• Aoife Cahill, Michael Burke, Ruth O'Donovan, Stefan Riezler, Josef van Genabith and Andy Way (2008) Wide-Coverage Deep Statistical Parsing using Automatic Dependency Structure Annotation, Computational Linguistics
• Ruth O'Donovan, Michael Burke, Aoife Cahill, Josef van Genabith and Andy Way (2005) Large-Scale Induction and Evaluation of Lexical Resources from the Penn-II and Penn-III Treebanks, Computational Linguistics
• Transfer-based probabilistic data-driven MT … (Yvette Graham)
• LORG industry-strength parsers and generators for IE/IR & QA (Jennifer & Deirdre)