1 Workshop Goals DELAMAN and DAM-LR Peter Wittenburg MPI for Psycholinguistics Access Management Nijmegen November 2004

1

Workshop Goals: DELAMAN and DAM-LR

Peter Wittenburg, MPI for Psycholinguistics

Access Management, Nijmegen, November 2004


2

When did we start?

• it is only 5 years since our discipline started speaking about

– large digital online collections

– standardizing the formats
  • XML was new and users were very skeptical
  • MPEG was, and still is, not well understood

– open metadata to arrive at browsable and searchable domains

– using metadata to create well-organized archives

– interoperability

• LREC Athens 2000

– first workshop on these issues

– start of the ISLE project (linguistic concepts, lexicon, metadata, …)

– start of the IMDI work

• also in 2000, the first LDC workshop, with OLAC as its focus

• a little later DOBES was funded and E-Meld started

• this is a very short time when you want to convince a community


3

What did we achieve?

• we have “large” online digital archives/collections/Digital Libraries

– MPI: ~40,000 session bundles / ~10 TB

– DOBES: ~1,500 session bundles / 1,500 h

– AILLA

– PARADISEC

– Lund corpora

– also in the HLT domain: LDC, ELRA, BAS

– also “traditional” archives (Phonogramm Archiv, NAA, …)

– etc.

• some of us became “archivists” through practice

• the idea of web visibility and online accessibility is spreading

• despite archiving attempts, according to D. Schüller ~80% of the digitized material is endangered


4

What did we achieve?

• much evangelization and agreement about standards

– DOBES workshops and documents

– LDC workshops and documents

– E-Meld workshops and excellent website

– ISLE workshops, resulting in IMDI

– PARADISEC workshop, resulting in DELAMAN

– HRELP workshops

– LREC workshops and contributions

– ACL workshops and contributions

– IASA/IAML conference

– etc.

• “everyone” agrees on XML, UNICODE and linear PCM

• “everyone” understands the relevance of schemas for making linguistic structure and encoding explicit

• with respect to JPEG and MPEG we are shooting at a moving target, but we don’t yet have real alternatives


5

What did we achieve?

• created awareness of the need for metadata to achieve visibility

• created operational metadata infrastructures within 4 years

– structured IMDI for discovery and management

– OLAC for overall discovery

– gateways between the two domains

• however, the situation is still not satisfying

– more than 50 institutions are using IMDI (as far as we know)

– ?? institutions are providing OLAC records

– still only a small fraction of the language resources are visible

– metadata creation is hard
  • it is work for others, although this is increasingly often not true
  • it means cleaning up your own holdings and figuring out what is available
  • it means writing “correct” scripts and learning new software
  • it means being disciplined

• we have done our development job, but have to continue dissemination

• despite the limitations, we hope that people stick to what is out there


6

What did we achieve?

• interoperability is still a dream, however …

– we have metadata gateways in our discipline (OLAC-IMDI)

– tools increasingly often produce correct XML, UNICODE, …

– we have filters for character encodings and formats, although we lack well-designed and comprehensive services

– we have started ontological work to tackle the linguistic aspects
  • GOLD ontology from E-Meld
  • ISO TC37/SC4 Data Category Registry
  • TDS (Dutch Typology Project) meta-language
  • EAGLES/ISLE/TEI specifications

• we are at the beginning

• we cannot yet speak of fully operational infrastructures, but there are islands like FIELD, LEXUS, ONTO-ELAN, …


7

Changing role of Language Archives

(Diagram: The Archive at the centre; different groups of people contribute; different groups of people use the content; specialists maintain, unify, check quality, etc.)

• at the MPI it is understood that the archive is the capital to build on

• in the DOBES programme the point is to make results explicit and accessible

• this only works if we don’t have “inert, dusty” archives, which are not an attractive perspective; hear more about this from D. Schüller


8

Vision for a single archive

(Diagram: vision for a single archive. Components shown: The Archive; Domain of Descriptive Metadata; Domain of Registered Primary and Secondary Resources; Primary Resources: Texts, Images, Sound, Movies; Archive Utility Layer; Metadata Tools; Data Ingestion & Management; User Authentication; Access Rights; Web-based Archive Exploration; Annotation Exploration; Lexicon Exploration; Text Exploration; Ontological Knowledge; Media Annotation; (Web-based) Archive Enrichment; Lexical Encoding; Web Commentary; User. Legend: done, in progress, to start.)


9

Everything ok – so let’s go home …

• what about the following scenario?

(Diagram: archive A and archive B, each holding raw data and metadata, exchange data for data survival reasons.)


10

Everything ok – so let’s go home …

• what about the following scenario?

(Diagram: the DOBES Archive and the AILLA Archive, each with raw data and metadata; “my personal Trumai archive”, “AILLA Trumai” and “DOBES Trumai”, which are not just copies but the result of one’s own creative process.)


11

DELAMAN

Digital Endangered Languages and Music Archive Network

• a loose network of “archives” sharing a set of visions, such as

– exchanging data automatically (list-driven)

– allowing people to create integrated virtual working spaces

– having an integrated access management domain

• first talks in Nijmegen and at the HRELP workshops in 2003

• founded at the PARADISEC meeting in Sydney in 2003

• no in-depth discussions yet about detailed wishes and implementation

• therefore this workshop in Nijmegen

• it’s about future usage scenarios with distributed archives


12

DELAMAN / DAM-LR Map

• DELAMAN is an international network

• DAM-LR

– Distributed Access Management for Language Resources

– 3-year EU project starting on 1 January 2005: yes, we have money to start

– centered around the DELAMAN intentions

(Map showing: MPI, AILLA, EMELD, ANLC, LACITO, ELAR, PARADISEC, AMPM, Lund, INL, AIATSIS)


13

Workshop

• want to get a deeper understanding of what “we” want

• need good requirements specifications

• want to get a deeper understanding of what others are doing

– our ideas are not new; we share them with others

– Digital Library initiatives (FEDORA, …)

– GRID initiative(s) (SRB, GTK, …)

– compute/function/data GRID

• therefore we invited

– linguists who know about potential and real user wishes

– “archivists” who know about maintaining large repositories

– technologists who know about current and future developments

– some of us looked into the legal and ethical aspects

• by the end we should be ready to start


14

Programme, Day 1

29.11. Setting the Framework

9.00 W. Klein – Welcome

9.10 P. Wittenburg – DELAMAN and Workshop Goals

9.40 D. Schüller – Audiovisual archiving: Visions, Challenges, Strategies

10.15 Discussion

10.30 Coffee Break

Researcher Requirements (Kamp)

11.00 T. Aristar/H. Dry – Linguist Wishes

11.30 P. Austin/D. Nathan – Linguist Wishes

12.00 G. Holton/H. Johnson – Legal & Ethical Aspects

12.30 Lunch Break

Archivist Requirements (Strömquist)

13.30 H. Johnson – AILLA Setup and Implications

14.00 L. Barwick – Paradisec Setup and Implications

14.30 Wittenburg/Skiba/Trilsbeek – DOBES Setup and Implications

15.00 Coffee Break

Summary and Discussion (Strömquist)

15.30 Uneson/Broeder/Strömquist – Summary of Requirements

16.00 Questions and Discussion

17.00 W. Krull – DOBES Program and the VW Foundation

17.15 Soddemann/Neumair/Verharen/Wbg – Technology - Broad View

17.30

18.00 End

20.00 Joint Dinner at Kwok Paw


15

Programme, Day 2

30.11. Technology Components (Nathan)

9.00 T. Soddemann (got the Billing Award) – Web Services

9.40 D. Barry – GRID Components

10.10 B. Kerver – Authentication and Authorization Systems

11.00 Coffee Break

Technology Components, continued (Nathan)

11.30 L. Lannom – Handle System

12.00 R. Moore – Storage Resource Broker

12.45 Discussion

13.00 Lunch Break

Mapping Requirements and Technology (Aristar/Broeder)

14.00 Aristar/Dry/Johnson/Barwick/… – Understanding Technology Linguists/Archivists

14.15 Broeder/Nathan/Jacobson/Neumair/... – Choice and Integration Aspects

14.30 Discussion

15.00 Coffee Break

15.30 Grand Summary and Open Discussion

16.00 Wittenburg – Summary

16.30 Discussion

17.00 End

times are not too strict: it’s a workshop


16

Let’s go …


The MPI team wishes us two interesting and highly interactive days in Nijmegen

Daan, Andreas: Technology

Paul, Roman: Archive

Peter ??