Final Report of Working Group 5 Interoperation G. Simons (chair), H. Aristar-Dry, D. Iannucci, E....


Final Report of Working Group 5

Interoperation

G. Simons (chair), H. Aristar-Dry, D. Iannucci, E. Richter, H. Sicard, N. Thieberger, P. Wittenburg

ELIIP Workshop, Salt Lake City, 12-14 Nov 2009


Interoperation

What is it?
– Interoperability is the ability for two or more systems to exchange information or services and to make satisfactory use of what is exchanged.

What does it take for this to happen?
– The systems agree on standardized definitions of the concepts about which they want to share information.
– The systems use a standardized format and protocol for information interchange.


Why interoperate?

It prevents a centralized service from duplicating the efforts of others.

It maximizes data freshness, since updates are propagated when the owner makes them.

It makes a centralized service more sustainable, since others bear the cost of providing data.

It allows multiple centralized services to add value to the same basic information.


Ways to build a web information service

Centralized database curation
– The service is self-contained: the service defines the database, users edit the data directly, and the service curates the information.

Centralized database aggregation
– The service has no data of its own: it uses an interoperation protocol to populate the database from other sources that curate the desired information.


The hybrid approach

The service uses an interoperation protocol to aggregate all the information it can get from elsewhere.

The service develops a database to handle new information it will curate (whether missing data or alternative values). As a “good citizen,” the service shares its unique data with others via the same protocol.

End users see a combination of the aggregated and the curated data.
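A minimal sketch of the hybrid approach, assuming every value is stored as a claim that records its provenance. The `Claim` type, the field names, and the rule that a curated value counts as "unique" whenever the aggregated data does not already contain it are illustrative assumptions, not part of the report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One value for one field of a language record, plus where it came from."""
    field: str
    value: str
    source: str  # e.g. the name of an aggregated provider, or the service itself


def combined_view(aggregated: list[Claim], curated: list[Claim]) -> dict[str, list[Claim]]:
    """Group aggregated and curated claims by field, as end users would see them."""
    view: dict[str, list[Claim]] = {}
    for claim in aggregated + curated:
        view.setdefault(claim.field, []).append(claim)
    return view


def unique_curated(aggregated: list[Claim], curated: list[Claim]) -> list[Claim]:
    """Curated claims that fill a gap or offer an alternative value.

    These are the claims a "good citizen" service would re-expose to others
    through the same interoperation protocol it harvests with.
    """
    already_known = {(c.field, c.value) for c in aggregated}
    return [c for c in curated if (c.field, c.value) not in already_known]
```

Keeping provenance on every claim is what lets an interface show an aggregated value and a curated alternative side by side instead of silently overwriting one with the other.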


What does this mean for ELIIP?

For each kind of information that the centralized ELIIP service wants to offer, it must decide whether to:
– Aggregate it,
– Curate it, or
– Do both

The answer can be different for different kinds of information.


What kinds of information?

1. Web pages about a language
2. Existing language documentation
3. Summary index of documentation level
4. Projects and people
5. Training and revitalization programs
6. The language situation
7. The genetic classification

OUT OF SCOPE: Interoperation over language data (like dictionaries and interlinear texts)


1. Web pages on languages

Two low-bar approaches to interoperation:

– Microformats: Harvestable metadata is embedded in the HTML coding of a page.

– Predictable URL: A web site that offers information about many languages has a main page for each language with a base URL parameterized by the ISO 639-3 code.
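The microformat approach can be sketched with Python's standard `html.parser`. The report does not define an ELIIP microformat, so the class name `iso639-3` (marking an element whose text is a language code) is a hypothetical convention; a real microformat would probably carry more fields than a bare code.

```python
from html.parser import HTMLParser


class LanguageCodeHarvester(HTMLParser):
    """Collect the text of elements carrying a given (hypothetical) microformat class."""

    def __init__(self, class_name: str = "iso639-3") -> None:
        super().__init__()
        self.class_name = class_name
        self.codes: list[str] = []
        self._depth = 0  # nesting depth inside the current matching element; 0 = outside
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # A child element inside a match: track depth so we know when the match closes.
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.codes.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def harvest_codes(html: str, class_name: str = "iso639-3") -> list[str]:
    """Return the marked-up language codes in one page, in document order.

    Assumes well-formed markup with no unclosed void elements (e.g. <br>) inside a match.
    """
    parser = LanguageCodeHarvester(class_name)
    parser.feed(html)
    parser.close()
    return parser.codes
```

A crawler would run `harvest_codes` over each fetched page; the predictable-URL approach avoids even this parsing step, since the code is already in the address.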


ELIIP could …

Define microformats and provide a service for crawling pages on sites that use them

Identify web sites that should implement predictable URLs and provide funding to incentivize needed changes on those sites

Provide a service for registering base URLs and boilerplate metadata so that OLAC records are generated for all language codes that yield a page
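The registration service in the last proposal could work roughly as follows, assuming a site registers a URL template containing `{code}` plus some boilerplate metadata. The `SiteRegistration` fields, the template syntax, and the `page_exists` check (in practice perhaps an HTTP request, passed in here so the sketch stays offline) are assumptions, and the output is a flat dictionary rather than a full OLAC record.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class SiteRegistration:
    """Boilerplate a site registers once (all field names are hypothetical)."""
    site: str
    publisher: str
    url_template: str    # predictable URL, e.g. "https://example.org/lang/{code}"
    title_template: str  # e.g. "{site}: information on language [{code}]"


def generate_records(
    reg: SiteRegistration,
    codes: Iterable[str],
    page_exists: Callable[[str], bool],
) -> list[dict[str, str]]:
    """Create one metadata record per ISO 639-3 code whose predictable URL yields a page."""
    records = []
    for code in codes:
        url = reg.url_template.format(code=code)
        if not page_exists(url):
            continue  # the site has no page for this language
        records.append({
            "identifier": url,
            "title": reg.title_template.format(site=reg.site, code=code),
            "publisher": reg.publisher,
            # In OLAC terms, the language a resource is *about* is a subject language.
            "subject_language": code,
        })
    return records
```

Running this over the full ISO 639-3 code list once per registered site would let the records be regenerated whenever the site adds pages, with no further work from the site owner.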


2. Existing documentation

A working interoperation infrastructure already exists in OLAC.

ELIIP should aggregate from OLAC to avoid duplicating work.

But there are huge gaps in the OLAC coverage.

Thus ELIIP needs a hybrid approach: as an OLAC data provider to fill the gaps and as an OLAC service provider to aggregate.
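On the service-provider side, aggregating from OLAC means harvesting OAI-PMH repositories. The sketch below pages through a `ListRecords` response by following the protocol's `resumptionToken`; the `fetch` argument stands in for an HTTP GET so the logic can be shown without network access, only record identifiers are extracted, and error handling beyond a missing `ListRecords` element is omitted.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def list_record_ids(
    base_url: str,
    fetch: Callable[[str], str],
    metadata_prefix: str = "olac",
) -> Iterator[str]:
    """Yield the identifier of every record in an OAI-PMH repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        list_records = root.find(f"{OAI}ListRecords")
        if list_records is None:
            return  # e.g. an <error code="noRecordsMatch"/> response
        for header in list_records.iter(f"{OAI}header"):
            yield header.findtext(f"{OAI}identifier")
        token = list_records.findtext(f"{OAI}resumptionToken")
        if not token:
            return  # absent or empty token: this was the last page
        # resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

Passing the fetch function in keeps the paging logic independent of any particular HTTP library and makes it easy to test against canned responses.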


Filling the gaps

Since many language archives don’t participate in OLAC, ELIIP could help those archives become OLAC data providers.

Since many resources are being put in generic OAI-based institutional repositories, ELIIP could run a service that harvests those resources and assigns linguistic metadata to them.

Since many resources are conventionally published or posted directly to the web, ELIIP could curate a database in which linguists can enter metadata for those resources.

Since many linguists don’t have a place to deposit their work, ELIIP could curate a digital repository of language documentation.
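The second proposal, assigning linguistic metadata to resources harvested from generic repositories, could start from a simple matching pass that proposes subject-language codes for a curator to confirm. The sketch assumes records arrive as standard Dublin Core (`oai_dc`) and that ELIIP maintains a name-to-code table (hypothetical here). Plain name matching will miss alternate names and can produce false hits, which is why the output is only a suggestion.

```python
import re
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"


def suggest_subject_languages(dc_record_xml: str, name_to_code: dict[str, str]) -> list[str]:
    """Propose ISO 639-3 codes for a Dublin Core record by matching language names.

    Only dc:title, dc:subject and dc:description are searched; each code is listed once.
    """
    root = ET.fromstring(dc_record_xml)
    texts = [
        element.text or ""
        for tag in ("title", "subject", "description")
        for element in root.iter(DC + tag)
    ]
    suggestions: list[str] = []
    for text in texts:
        for name, code in name_to_code.items():
            # Whole-word, case-insensitive match on the language name.
            if code not in suggestions and re.search(rf"\b{re.escape(name)}\b", text, re.IGNORECASE):
                suggestions.append(code)
    return suggestions
```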


3. Documentation index

A numerical index that summarizes the level of language documentation (as at AUSTLANG) is desirable.

The OLAC aggregator (especially after ELIIP fills the gaps) provides a list of all the resources by linguistic data type.

What’s needed is a way to convert those to a measure of extent.


ELIIP could …

Participate in the OLAC process to refine the linguistic data type vocabulary as needed
– E.g. add “language instruction”

Participate in the OLAC process to add a new recommendation for <dc:extent>
– E.g. lexicon/0, lexicon/1, lexicon/2, lexicon/3

Promote its adoption by all OLAC participants and add curated judgments where that fails

Develop an overall numerical index that combines results over all the data types
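To make the last two proposals concrete, here is one possible way to turn `<dc:extent>` values of the proposed form `lexicon/2` into a single index for a language: take the best level reported for each data type and compute a weighted average scaled to the range 0 to 1. The three data types are the OLAC linguistic-type codes; the 0 to 3 scale follows the example above, but the best-level rule, the equal default weights, and the averaging are assumptions, since the report leaves the formula to be developed.

```python
from collections import defaultdict
from typing import Iterable, Mapping, Optional

DATA_TYPES = ("lexicon", "primary_text", "language_description")  # OLAC linguistic-type codes
MAX_LEVEL = 3  # assumed scale: <type>/0 (nothing) .. <type>/3 (most extensive)


def parse_extent(extent: str) -> tuple[str, int]:
    """Split a proposed extent value such as 'lexicon/2' into ('lexicon', 2)."""
    data_type, _, level_text = extent.partition("/")
    level = int(level_text)
    if data_type not in DATA_TYPES or not 0 <= level <= MAX_LEVEL:
        raise ValueError(f"unrecognized extent: {extent!r}")
    return data_type, level


def documentation_index(
    extents: Iterable[str],
    weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Combine the extents of all resources for one language into a score in [0, 1]."""
    weights = weights or {t: 1.0 for t in DATA_TYPES}
    best: dict[str, int] = defaultdict(int)  # best level seen per data type (default 0)
    for extent in extents:
        data_type, level = parse_extent(extent)
        best[data_type] = max(best[data_type], level)
    total_weight = sum(weights[t] for t in DATA_TYPES)
    return sum(weights[t] * best[t] / MAX_LEVEL for t in DATA_TYPES) / total_weight
```

Taking the best level per type means a language is not rewarded for many small wordlists over one substantial dictionary; a sum or count-based measure would be an equally defensible choice.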


4. People and projects

The OLAC infrastructure can support this.

DCMI Type vocabulary:
– Event: A time-bounded occurrence

A project can be described in an OLAC record using elements like Contributor, Language, Linguistic data type, Description.

An advantage of this approach is that projects appear with all other resources in any OLAC-based service.
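For illustration, a project could be serialized as OLAC 1.1 metadata roughly as below, using `dc:type` with the DCMI Type `Event`, `olac:role` for contributors, `olac:language` for subject languages, and `olac:linguistic-type` for data types. This is a sketch only: it omits the `xsi:schemaLocation` attribute and the OAI-PMH envelope, and the function and argument names are hypothetical.

```python
import xml.etree.ElementTree as ET

OLAC = "http://www.language-archives.org/OLAC/1.1/"
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

for prefix, uri in (("olac", OLAC), ("dc", DC), ("xsi", XSI)):
    ET.register_namespace(prefix, uri)


def project_record(
    title: str,
    contributors: list[tuple[str, str]],  # (name, OLAC role code)
    language_codes: list[str],            # ISO 639-3 codes of the subject languages
    data_types: list[str],                # OLAC linguistic-type codes
    description: str,
) -> str:
    """Serialize a project description as an <olac:olac> metadata element."""
    root = ET.Element(f"{{{OLAC}}}olac")
    # "dcterms" is only used inside an attribute value, which ElementTree does not
    # scan for prefixes, so declare it explicitly.
    root.set("xmlns:dcterms", DCTERMS)

    def add(tag: str, text: str = "", xsi_type: str = "", code: str = "") -> None:
        element = ET.SubElement(root, f"{{{DC}}}{tag}")
        if xsi_type:
            element.set(f"{{{XSI}}}type", xsi_type)
        if code:
            element.set(f"{{{OLAC}}}code", code)
        if text:
            element.text = text

    add("title", title)
    add("type", "Event", xsi_type="dcterms:DCMIType")
    for name, role in contributors:
        add("contributor", name, xsi_type="olac:role", code=role)
    for code in language_codes:
        add("subject", xsi_type="olac:language", code=code)
    for data_type in data_types:
        add("type", xsi_type="olac:linguistic-type", code=data_type)
    add("description", description)
    return ET.tostring(root, encoding="unicode")
```

Because the result is ordinary OLAC metadata, any existing OLAC harvester would index the project alongside documents, recordings, and dictionaries without special handling.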


ELIIP could …

Propose a metadata refinement to distinguish a project from other kinds of “events”

Curate records that allow linguists to describe their own projects

Help players like funding agencies with databases of relevant projects to become OLAC data providers


5. Training and revitalization

The OLAC infrastructure can support this.

A training course or revitalization program can be described in an OLAC record with DCMI Type = “Event” + OLAC resource type = “language instruction” + Language, Description, and Identifier for a URL.

This approach allows these programs to appear with all other resources for the language in any OLAC-based service.
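The benefit described here, that programs show up next to everything else for a language, comes from an OLAC-based service indexing records by subject language. A minimal in-memory sketch follows; the `Record` fields are simplified, and `language_instruction` stands in for the proposed new resource-type value, which is not yet part of the OLAC vocabulary.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A simplified view of one harvested OLAC record."""
    identifier: str
    title: str
    subject_languages: tuple[str, ...]
    dcmi_type: str = "Text"
    linguistic_types: tuple[str, ...] = ()


def index_by_language(records: list[Record]) -> dict[str, list[Record]]:
    """Map each ISO 639-3 code to every record about that language, whatever its type."""
    index: dict[str, list[Record]] = defaultdict(list)
    for record in records:
        for code in record.subject_languages:
            index[code].append(record)
    return dict(index)


def programs_for(index: dict[str, list[Record]], code: str) -> list[Record]:
    """Training or revitalization programs for one language (hypothetical type value)."""
    return [
        r for r in index.get(code, [])
        if r.dcmi_type == "Event" and "language_instruction" in r.linguistic_types
    ]
```

A language page built from `index_by_language` would list a course or revitalization program in the same place as the grammars and recordings for that language, with the filter used only when someone asks for programs specifically.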


ELIIP could …

Curate records that allow these programs to describe themselves

Help players who are curating databases of training events to become OLAC data providers


6. Language situation

No suitable interoperation standard yet exists for population data, etc.

Are there other projects already curating this kind of information such that interoperation is desirable?
– E.g. UNESCO Atlas, Ethnologue, AUSTLANG

But interoperation will only work if all the players agree to do it.


ELIIP could …

During the proposal phase, identify the projects that should interoperate and secure agreement in principle to participate

During the project phase, foster the process among those players to agree on standard definitions, format, and protocol

Could use the OAI protocol
– “olac” payload for the metadata
– “eliip” payload for the language information
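If the OAI protocol were used this way, a data provider would choose the payload from the `metadataPrefix` in each request. The sketch below dispatches between an `olac` payload and an `eliip` payload. The `eliip` format does not exist yet, so its namespace (`http://example.org/eliip/0.1/`) and element names are placeholders; `cannotDisseminateFormat` is the standard OAI-PMH error code for an unsupported prefix.

```python
import xml.etree.ElementTree as ET
from typing import Callable

OAI = "http://www.openarchives.org/OAI/2.0/"
OLAC = "http://www.language-archives.org/OLAC/1.1/"
DC = "http://purl.org/dc/elements/1.1/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
ELIIP = "http://example.org/eliip/0.1/"  # placeholder: no such namespace exists yet


def olac_payload(info: dict[str, str]) -> ET.Element:
    """Minimal OLAC metadata: just the subject language of the record."""
    root = ET.Element(f"{{{OLAC}}}olac")
    subject = ET.SubElement(root, f"{{{DC}}}subject")
    subject.set(f"{{{XSI}}}type", "olac:language")
    subject.set(f"{{{OLAC}}}code", info["code"])
    return root


def eliip_payload(info: dict[str, str]) -> ET.Element:
    """Hypothetical language-situation payload (element names are assumptions)."""
    root = ET.Element(f"{{{ELIIP}}}language", {"code": info["code"]})
    for key in ("population", "population_year", "source"):
        if key in info:
            ET.SubElement(root, f"{{{ELIIP}}}{key}").text = info[key]
    return root


PAYLOADS: dict[str, Callable[[dict[str, str]], ET.Element]] = {
    "olac": olac_payload,
    "eliip": eliip_payload,
}


def metadata_for(info: dict[str, str], metadata_prefix: str) -> ET.Element:
    """Return the payload for one record, or an OAI-PMH <error> element."""
    build = PAYLOADS.get(metadata_prefix)
    if build is None:
        error = ET.Element(f"{{{OAI}}}error", {"code": "cannotDisseminateFormat"})
        error.text = f"unsupported metadataPrefix: {metadata_prefix}"
        return error
    return build(info)
```

Registering formats in a table means a further payload could be added later without changing how requests are answered.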


ELIIP could also …

Provide a feedback mechanism that allows a user to report an error back to the provider of the aggregated data

Provide a publicly viewable tracking mechanism to ensure accountability of the data providers, e.g.
– Is a population in Ethnologue or UNESCO wrong because they won’t fix it when someone reports the right data, or because the person who knows won’t tell them?
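A publicly viewable tracker could start as a log of error reports per provider, each with a status. The sketch below, with hypothetical field and status names, summarizes how many reports each provider has fixed, declined, or left open, and how long the oldest open report has waited. That addresses the first half of the question above (is the provider responsive?); it cannot show whether knowledgeable people are reporting at all.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Status(Enum):
    OPEN = "open"
    FIXED = "fixed"
    DECLINED = "declined"


@dataclass
class ErrorReport:
    """A user's report that an aggregated value is wrong (hypothetical schema)."""
    provider: str
    record_id: str
    field: str
    suggested_value: str
    reported_on: date
    status: Status = Status.OPEN


def provider_summary(reports: list[ErrorReport], today: date) -> dict[str, dict[str, int]]:
    """Per-provider counts by status, plus the age in days of the oldest open report."""
    summary: dict[str, dict[str, int]] = {}
    for report in reports:
        row = summary.setdefault(
            report.provider,
            {"open": 0, "fixed": 0, "declined": 0, "oldest_open_days": 0},
        )
        row[report.status.value] += 1
        if report.status is Status.OPEN:
            age = (today - report.reported_on).days
            row["oldest_open_days"] = max(row["oldest_open_days"], age)
    return summary
```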


Nota Bene

None of the “ELIIP could” proposals up to this point would require the overhead of a governing body or regional captains to vet individual data points (though they would still have a role in recommending and vetting aggregation sources).

That threshold is crossed if ELIIP chooses to:
– Curate its own version of language situation data that it judges to be the most correct


7. Genetic classification

Same story as for “language situation” information

If the set of data providers is the same as for the situation information, then this could be included in the interoperation standard as a kind of situation information.

If there is a different set of players, ELIIP could foster the same process to develop an interoperation standard for classification.


Thought for the day

The aggregator lies at the sweet spot in the value chain of today’s web economy.

– E.g. Google, Amazon, iTunes, Netflix

– Cf. Chris Anderson, The Long Tail (2006)


Conclusion

There are many things that ELIIP could do:
– To exploit the power of interoperation
– For mobilizing our community to share information about endangered languages
– While minimizing what it must centrally curate

The task for the ELIIP planners is to decide which of these things they want to do.