Robert's Drawers (and other variations on GRE shared tasks) Gatt, Belz, Reiter, Viethen.

Robert's Drawers

(and other variations on GRE shared tasks)

Gatt, Belz, Reiter, Viethen

Available resources

● TUNA Corpus (Gatt et al.; ca. 2500 refs)

one-shot references

balanced

2500 refs to furniture or people

● Robert's drawers (Viethen and Dale; ca. 140 refs)

one-shot references

not yet balanced

● GREC (“GRE in Context”) (Belz and Varges)

2000 introductory passages from Wikipedia

1000 annotated, rest in progress

annotated for reference to the main subject (“topic”)

different NP types: subjects, objects, possessives

● COCONUT (Jordan)

goes beyond just identification

● (possibly another corpus of newspaper texts)

Short-term additions to resources

● Add comprehension data: Carry out experiments to get people to identify referents and pair results with corpus descriptions. Data include:

● reaction time

● error rate

● self-paced reading for GREC-type corpora

Long-term additions to resources

● Eye-tracking data

● Situated reference in virtual environments (Koller et al., this Workshop)

● In progress: small multimodal corpus (Bangerter, van der Sluis, Gatt)

Task definition

● Task structure:

provide a data source

have a small set of clearly defined tasks but ALSO:

have an open category

● Evaluation:

default metric

call for proposals for evaluation metrics

correlate metrics with human judgments/performance

● Scope for variation:

Task: content determination, realisation, lexical choice

Type of reference: full definite, anaphoric, singular/plural

Goal: model production or enhance comprehension
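The evaluation point about correlating metrics with human judgments/performance can be sketched as a plain Pearson correlation. This is only an illustration: the slides do not name a correlation measure, and the function name `pearson` and the sample numbers below are hypothetical, not corpus data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-system values, invented for illustration only:
# an automatic metric score and a mean identification time in ms.
metric_scores = [0.42, 0.55, 0.61, 0.70]
identification_ms = [2100, 1950, 1800, 1750]

print(pearson(metric_scores, identification_ms))
```

A strongly negative value here would suggest that higher metric scores go with faster identification; with so few systems, any such figure would need to be read cautiously.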

(Sub-)communities

● GRE people (the usual suspects)

● CoNLL/EMNLP community

● Psycholinguists: advice/expertise on computational psycholinguistic modelling

Aims

● “Community” aims:

Have fun!

Get people working together, consolidate the community

Broaden the community

● Broader aims:

Have a test-bed to see if NLG STECs actually work

GRE is probably the best initial candidate

● Scientific aims:

Hothouse effect

Evaluation:

● Use different methods

● Evaluate the methods

Execution: Logistics

● Dry run to pilot the idea

Possibly at UCNLG (September)

Shared competitive task: Content Determination

● singular definites, furniture

Production evaluation, using TUNA

Include a call for evaluation metrics

Also include open track

● Main event (larger scale & wider scope)

Co-located with INLG?

Several shared tasks + open category

Evaluation:

● Production: match between algorithm & human

● Comprehension: ease of identification, etc.
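One way to make "match between algorithm & human" concrete for a content determination task is a set-overlap score over selected attributes. The sketch below uses the Dice coefficient as an assumed example; the slides do not fix a metric, and the function name `dice` and the attribute-value pairs are hypothetical.

```python
def dice(predicted, reference):
    """Dice coefficient between two collections of attribute-value pairs.

    Returns 1.0 for identical sets and 0.0 for disjoint ones.
    """
    predicted, reference = set(predicted), set(reference)
    if not predicted and not reference:
        return 1.0  # two empty descriptions are treated as a perfect match
    return 2 * len(predicted & reference) / (len(predicted) + len(reference))


# Hypothetical example: attributes chosen by an algorithm versus those
# used in a human-authored description of the same furniture item.
algorithm_attrs = {("type", "chair"), ("colour", "red")}
human_attrs = {("type", "chair"), ("size", "large")}

print(dice(algorithm_attrs, human_attrs))  # prints 0.5
```

Averaging this score over all corpus descriptions for a domain would give a single production score per system; other overlap measures could be swapped in through the call for evaluation metrics.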

Evaluation: £££

● Sources of expense:

Human evaluations

Adding comprehension data to the corpora

Organisational costs (web site, etc.)

● Who's paying?

Community effort

Aberdeen platform grant

Brighton Prodigy project funds

No special funding (yet)