Evaluating the Waspbench
Transcript of Evaluating the Waspbench
A Lexicography Tool Incorporating Word Sense Disambiguation
Rob Koeling, Adam Kilgarriff,
David Tugwell, Roger Evans
ITRI, University of Brighton
Credits: UK EPSRC grant WASPS, M34971
Lexicographers need NLP
NLP needs lexicography
Word senses: nowhere truer
– Lexicography: the second hardest part
– NLP: word sense disambiguation (WSD)
  SENSEVAL-1 (1998): 77% (Hector); SENSEVAL-2 (2001): 64% (WordNet)
– Machine Translation: main cost is lexicography
Synergy
The WASPBENCH
Inputs and outputs
Inputs
– Corpus (processed)
– Lexicographic expertise
Outputs
– Analysis of meaning/translation repertoire
– Implemented: a word expert that can disambiguate
A “disambiguating dictionary”
Inputs and outputs
MT needs rules of the form: in context C, S => T
– Major determinant of MT quality
– Manual production: expensive
– Eng oil => Fr huile or pétrole?
SYSTRAN: 400 rules
Waspbench output: thousands of rules
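The “in context C, S => T” rule form can be sketched as a context-triggered lookup. This is an illustrative sketch only, not the actual Waspbench rule format; the clue-word lists and the `translate_oil` function are invented for the oil => huile/pétrole example above.

```python
# Illustrative sketch of a "in context C, S => T" translation rule
# for English "oil". Clue lists are invented, not from Waspbench.

def translate_oil(context_words):
    """Pick a French translation for 'oil' from nearby clue words."""
    mineral_clues = {"crude", "barrel", "drilling", "pipeline", "opec"}
    culinary_clues = {"olive", "cooking", "frying", "salad", "vegetable"}

    words = {w.lower() for w in context_words}
    if words & mineral_clues:
        return "pétrole"   # mineral oil
    if words & culinary_clues:
        return "huile"     # cooking oil
    return "huile"         # default sense when no clue fires

print(translate_oil(["crude", "prices", "rose"]))   # pétrole
print(translate_oil(["olive", "and", "vinegar"]))   # huile
```

A hand-built MT lexicon encodes a few hundred such rules; the point of the talk is that a tool like Waspbench can induce thousands of them from corpus evidence plus lexicographer input.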
Evaluation: hard
– Three communities
– No precedents
– The art and craft of lexicography
– MT personpower budgets
Five threads
– as WSD: SENSEVAL
– for lexicography: MED expert reports
– quantitative experiments with human subjects (India)
– within-group consistency (Leeds)
– comparison with commercial MT
Method
– Human1 creates word experts
– Computer uses word experts to disambiguate test instances
– MT system translates the same test instances
– Human2 evaluates computer and MT performance on each instance:
  good / bad / unsure / preferred / alternative
Words
– mid-frequency: 1,500–20,000 instances in the BNC
– at least two clearly distinct meanings
  (checked with reference to translations into French/German/Dutch)
– 33 words: 16 nouns, 10 verbs, 7 adjectives
– around 40 test instances per word
Words
Nouns: bank, chest, coat, fit, line, lot, mass, party, policy, record, seal, step, term, volume
Verbs: charge, float, move, observe, offend, post, pray, toast, undermine
Adjectives: bright, free, funny, hot, moody, strong
Human subjects
– translation studies students, University of Leeds (thanks: Tony Hartley)
– native/near-native in English and their other language
– twelve people, working with: Chinese (4), French (3), German (2), Italian (1), Japanese (2) (no MT system for Japanese)
– circa four days’ work: introduction/training, two days to create word experts, two days to evaluate output
Method
– Human1 creates word experts, average 30 mins/word
– Computer uses word experts to disambiguate test instances
– MT system (Babelfish via AltaVista) translates the same test instances
– Human2 evaluates computer and MT performance on each instance:
  good / bad / unsure / preferred / alternative
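One plausible way to turn the per-instance judgements into the headline percentages is sketched below. This is an assumption about the scoring, not the paper’s documented procedure: here an output judged good, preferred, or alternative counts as acceptable, and “both”/“neither” mean both or neither of the two systems produced an acceptable output for that instance.

```python
# Sketch of an assumed scoring scheme for paired judgements.
# Each test instance yields (waspbench_judgement, mt_judgement).

OK = {"good", "preferred", "alternative"}  # assumed "acceptable" set

def summarise(judgements):
    """Return rounded percentages: Waspbench ok, MT ok, both, neither."""
    n = len(judgements)
    wasps = sum(w in OK for w, m in judgements)
    mt = sum(m in OK for w, m in judgements)
    both = sum(w in OK and m in OK for w, m in judgements)
    neither = sum(w not in OK and m not in OK for w, m in judgements)
    pct = lambda k: round(100 * k / n)
    return pct(wasps), pct(mt), pct(both), pct(neither)

data = [("good", "bad"), ("good", "good"),
        ("bad", "bad"), ("preferred", "good")]
print(summarise(data))  # (75, 50, 50, 25)
```

Under this reading, the result tables that follow report, per language and per part of speech, how often each system’s output was judged acceptable.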
Results (%)
Lang Wasps MT both neither unsure
Ger 60 28 19 26 5
Fr 61 45 37 28 4
Ch 68 42 37 23 3
It 67 29 23 22 5
All 64 36 29 25 4
Results by POS (%)
POS Wasps MT both neither
Nouns 69 40 35 24
Verbs 61 38 32 27
Adjs 63 41 31 24
Observations
– grad student users, 4-hour training
– 30 mins per (not-too-complex) word
– ‘fuzzy’ words intrinsically harder
– no great inter-subject disparities (it’s the words that vary, not the people)
Conclusion
WSD can improve MT (using a tool like WASPS)
Future work
– multiwords
– n > 2
– thesaurus
– other source languages
– new corpora, bigger corpora: the web