Robert's Drawers (and other variations on GRE shared tasks) Gatt, Belz, Reiter, Viethen.
Robert's Drawers
(and other variations on GRE shared tasks)
Gatt, Belz, Reiter, Viethen
Available resources
● TUNA Corpus (Gatt et al.; ca. 2500 refs)
one-shot references
balanced
2500 refs to furniture or people
● Robert's drawers (Viethen and Dale; ca. 140 refs)
one-shot references
not yet balanced
● GREC (“GRE in Context”) (Belz and Varges)
2000 introductory passages from Wikipedia
1000 annotated, rest in progress
annotated for reference to the main subject (“topic”)
different NP types: subjects, objects, possessives
● COCONUT (Jordan)
goes beyond just identification
● (possibly another corpus of newspaper texts)
Short-term additions to resources
● Add comprehension data: Carry out experiments to get people to identify
referents and pair results with corpus descriptions. Data include:
● reaction time
● error rate
● self-paced reading for GREC-type corpora
Long-term additions to resources
● Eye-tracking data
● Situated reference in virtual environments (Koller et al., this Workshop)
● In progress: small multimodal corpus (Bangerter, van der Sluis, Gatt)
Task definition
● Task structure:
provide a data source
have a small set of clearly defined tasks but ALSO:
have an open category
● Evaluation:
default metric
call for proposals for evaluation metrics
correlate metrics with human judgments/performance
● Scope for variation:
Task: content determination, realisation, lexical choice
Type of reference: full definite, anaphoric, singular/plural
Goal: model production or enhance comprehension
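The evaluation point above, correlating metrics with human judgments/performance, can be sketched as follows. This is a minimal illustration that assumes Pearson's r as the correlation measure; the slides do not name one. The function name `pearson` and the idea of one score per system are assumptions made for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers.

    xs: automatic metric scores, one per system (hypothetical input)
    ys: human judgments or performance measures for the same systems
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

A metric whose scores rise and fall with the human measure gives a value near 1 (or near -1 if the human measure is a cost such as reaction time); a value near 0 suggests the metric does not track that human measure.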
(Sub-)communities
● GRE people (the usual suspects)
● CoNLL/EMNLP community
● Psycholinguists: advice/expertise on computational psycholinguistic modelling
Aims
● “Community” aims:
Have fun!
Get people working together, consolidate the community
Broaden the community
● Broader aims:
Have a test-bed to see if NLG STECs actually work
GRE is probably the best initial candidate
● Scientific aims:
Hothouse effect
Evaluation:
● Use different methods
● Evaluate the methods
Execution: Logistics
● Dry run to pilot the idea
Possibly at UCNLG (September)
Shared competitive task: Content Determination
● singular definites, furniture
Production evaluation, using TUNA
Include a call for evaluation metrics
Also include open track
● Main event (larger scale & wider scope)
Co-located with INLG?
Several shared tasks + open category
Evaluation:
● Production: match between algorithm & human
● Comprehension: ease of identification, etc.
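The production evaluation above, the match between algorithm and human, could be scored as set overlap between the attributes each one selects for a referent. The sketch below assumes the Dice coefficient as the overlap measure; the slides do not specify a metric. The function name `dice` and the `attribute:value` string encoding are hypothetical choices for the example.

```python
def dice(human_attrs, algo_attrs):
    """Dice overlap between two collections of attributes.

    Returns 1.0 for identical sets and 0.0 for disjoint ones.
    Two empty descriptions are treated as a perfect match (an assumption).
    """
    human, algo = set(human_attrs), set(algo_attrs)
    if not human and not algo:
        return 1.0
    # Twice the shared attributes, divided by the total size of both sets.
    return 2 * len(human & algo) / (len(human) + len(algo))


# Invented toy description of a furniture item, for illustration only.
human = {"type:chair", "colour:red"}
algorithm = {"type:chair", "colour:red", "size:large"}
score = dice(human, algorithm)  # 2 * 2 / (2 + 3) = 0.8
```

Averaging such scores over all corpus descriptions would give one production score per algorithm. A comprehension evaluation would need human data, such as identification time or error rate, instead.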
Evaluation: £££
● Sources of expense:
Human evaluations
Adding comprehension data to the corpora
Organisational costs (web site, etc.)
● Who's paying?
Community effort
Aberdeen platform grant
Brighton Prodigy project funds
No special funding (yet)