Robert's Drawers (and other variations on GRE shared tasks) Gatt, Belz, Reiter, Viethen.


Available resources

● TUNA Corpus (Gatt et al.; ca. 2500 refs)

one-shot references

balanced

2500 refs to furniture or people

● Robert's drawers (Viethen and Dale; ca. 140 refs)

one-shot references

not yet balanced

● GREC (“GRE in Context”) (Belz and Varges)

2000 introductory passages from Wikipedia

1000 annotated, rest in progress

annotated for reference to the main subject (“topic”)

different NP types: subjects, objects, possessives

● COCONUT (Jordan)

goes beyond just identification

● (possibly another corpus of newspaper texts)

Short-term additions to resources

● Add comprehension data: carry out experiments in which people identify referents, and pair the results with the corpus descriptions. Data include:

● reaction time

● error rate

● self-paced reading for GREC-type corpora

Long-term additions to resources

● Eye-tracking data

● Situated reference in virtual environments (Koller et al, this Workshop)

● In progress: small multimodal corpus (Bangerter, van der Sluis, Gatt)

Task definition

● Task structure:

provide a data source

have a small set of clearly defined tasks but ALSO:

have an open category

● Evaluation:

default metric

call for proposals for evaluation metrics

correlate metrics with human judgments/performance

● Scope for variation:

Task: content determination, realisation, lexical choice

Type of reference: full definite, anaphoric, singular/plural

Goal: model production or enhance comprehension
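The evaluation items above propose correlating candidate metrics with human judgments or performance. A minimal sketch of that step follows, assuming one automatic score and one human measure (for example, mean identification time) per system. The slides do not name a correlation statistic; Pearson's r is used here only as an illustration, and the function name `pearson` is hypothetical.

```python
from math import sqrt


def pearson(metric_scores, human_scores):
    """Pearson correlation between an automatic metric and a human measure.

    Both arguments are equal-length sequences of numbers, one value per
    system (or per item). This is an illustrative choice of statistic,
    not one prescribed by the slides.
    """
    if len(metric_scores) != len(human_scores) or len(metric_scores) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    n = len(metric_scores)
    mean_m = sum(metric_scores) / n
    mean_h = sum(human_scores) / n

    # Covariance and variances (unnormalised; the 1/n factors cancel).
    cov = sum((m - mean_m) * (h - mean_h)
              for m, h in zip(metric_scores, human_scores))
    var_m = sum((m - mean_m) ** 2 for m in metric_scores)
    var_h = sum((h - mean_h) ** 2 for h in human_scores)

    if var_m == 0 or var_h == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_m * var_h)
```

A metric whose scores correlate strongly with the human measure (in the expected direction) would be a reasonable candidate for the default metric; a rank-based statistic could be swapped in if the relationship is not expected to be linear.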

(Sub-)communities

● GRE people (the usual suspects)

● CoNLL/EMNLP community

● Psycholinguists: advice/expertise, computational psycholinguistic modelling

Aims

● “Community” aims:

Have fun!

Get people working together, consolidate the community

Broaden the community

● Broader aims:

Have a test-bed to see if NLG STECs actually work

GRE is probably the best initial candidate

● Scientific aims:

Hothouse effect

Evaluation:

● Use different methods

● Evaluate the methods

Execution: Logistics

● Dry run to pilot the idea

Possibly at UCNLG (September)

Shared competitive task: Content Determination

● singular definites, furniture

Production evaluation, using TUNA

Include a call for evaluation metrics

Also include open track

● Main event (larger scale & wider scope)

Co-located with INLG?

Several shared tasks + open category

Evaluation:

● Production: match between algorithm & human

● Comprehension: ease of identification, etc.
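For the production evaluation, "match between algorithm & human" can be made concrete as an overlap score between the attributes a system selects and those a human used for the same referent. The sketch below uses the Dice coefficient as one possible overlap measure; the slides do not fix a measure, and the function name `dice` and the attribute names in the example are hypothetical.

```python
def dice(system_attrs, human_attrs):
    """Dice overlap between two attribute sets, in the range [0, 1].

    1.0 means the system chose exactly the attributes the human used;
    0.0 means the two descriptions share no attributes.
    """
    system_set, human_set = set(system_attrs), set(human_attrs)
    if not system_set and not human_set:
        # Two empty descriptions are treated as identical (a convention
        # assumed here to avoid division by zero).
        return 1.0
    shared = len(system_set & human_set)
    return 2 * shared / (len(system_set) + len(human_set))


# Illustrative attribute sets for one furniture referent (invented values).
score = dice({"type", "colour"}, {"type", "colour", "size"})
print(score)  # 0.8
```

Averaging such scores over all corpus descriptions of a referent set would give a per-system production score; a comprehension score would instead need human data such as identification time or error rate.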

Evaluation: £££

● Sources of expense:

Human evaluations

Adding comprehension data to the corpora

Organisational costs (web site, etc.)

● Who's paying?

Community effort

Aberdeen platform grant

Brighton Prodigy project funds

No special funding (yet)
