LIFTing LEGO with RELISH: Lexicon Interchange FormaT in Use Helen Aristar-Dry Institute for Language...

Page 1:

LIFTing LEGO with RELISH: Lexicon Interchange FormaT in Use

Helen Aristar-Dry

Institute for Language Information and Technology

Eastern Michigan U

Page 2:

Outline

- Background:
  - The RELISH project
  - The LEGO project
    - Project interface
    - Project workflow
- LIFT: Lexicon Interchange Format
  - What it is
  - Use in the LEGO project (“LL-LIFT”)
  - Sample entry: comparison with LMF-compliant XML output by LEXUS

Page 3:

The RELISH Project

- RELISH: Rendering Endangered Lexicons Interoperable through Standards Harmonization
- Joint project: U of Frankfurt, MPI-Nijmegen, LINGUIST List
- Goal: markup harmonization at two levels
  - Semantic
  - Structural
- Test cases: 6 lexicons from LEGO & LEXUS

Page 4:

RELISH: MPI, ILIT and Lexicon Standards

Page 5:
Page 6:

LEGO & RELISH

LEGO Test Lexicons
- LL-LIFT
- GOLD

LEXUS Test Lexicons
- LMF-compliant XML (various)
- ISOCats

RELISH

Interchange format: TEI? LIFT?

Page 7:

The LEGO Project

- 3-year project sponsored by the NSF
- Participants: ILIT (LINGUIST List) & U at Buffalo
- Goal: a “datanet” of interoperable lexicons
- Interoperability based on:
  - grammatical information mapped to GOLD
  - structure mapped to a common schema (LL-LIFT)
  - output in RDF

Page 8:

The LEGO Project

Initial set of lexical resources:
- 17 EL lexicons prepared by LINGUIST List: Shoshone, Archi, Kayardild, Fulfulde, Mocovi, Biao Mien, Potawatomi, Saliba, W. Pantar, W. Sissala, Wichi, …
- 3000+ wordlists prepared by U at Buffalo: Usher-Whitehouse lists, Loanword Typology lists, Intercontinental Dictionary Series lists

Extensible:
- In practice: to lexicons in LIFT
- Possibly: RDF, TEI (or another official serialization of LMF)

Page 9:

The LEGO Project: Purposes

- Not intended to develop a lexicon creation or lexicon display tool
- Intended to:
  - support multi-lexicon search and comparison
  - demonstrate the value of digital standards in linguistic research

Page 10:

LEGO Workflow

[Workflow diagram; box labels: Lexicon (Access, MS Word, Excel, Toolbox, Filemaker); LL Descriptive XML; LL-LIFT (XML); LEGO db; mapping to GOLD; LEGO Interoperable Lexicon]

Page 11:

LEGO: lego.linguistlist.org

Page 12:

LEGO Browse Lexicons Page

Page 13:

LEGO: Lexicon ‘homepage’

Page 14:

LEGO: Browse Lexicon Entries

Page 15:
Page 16:

Demonstration of Interoperability: LEGO Search

Page 17:

LEGO: Search multiple lexicons by LIFT field (Definition, Example, Variant, etc.)
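
To make the idea of a field-level search concrete, here is a minimal sketch, assuming each lexicon is available as a LIFT XML string. The function name `search_definitions`, the `LEXICONS` dictionary, and the second "demo" lexicon are hypothetical; this is not the LEGO implementation, only an illustration of why a shared schema makes one query work across lexicons.

```python
import xml.etree.ElementTree as ET

# Two tiny lexicons in LIFT-style XML. The first follows the Tuva sample
# entry shown in this talk; the second is invented for illustration.
LEXICONS = {
    "tuva": """<lift><entry id="d1e56244">
        <lexical-unit><form lang="syv"><text>дадагалзаар</text></form></lexical-unit>
        <sense><grammatical-info value="n."/>
          <definition><form lang="eng"><text>doubt</text></form></definition>
        </sense></entry></lift>""",
    "demo": """<lift><entry id="e1">
        <lexical-unit><form lang="und"><text>foo</text></form></lexical-unit>
        <sense><grammatical-info value="v."/>
          <definition><form lang="eng"><text>to doubt</text></form></definition>
        </sense></entry></lift>""",
}


def search_definitions(lexicons, term):
    """Return (lexicon name, headword, definition) for definitions containing term."""
    hits = []
    for name, xml_text in lexicons.items():
        root = ET.fromstring(xml_text)
        for entry in root.iter("entry"):
            headword = entry.findtext("lexical-unit/form/text", default="")
            # Every lexicon stores definitions at the same path, so one
            # query covers all of them.
            for text in entry.iterfind("sense/definition/form/text"):
                if text.text and term.lower() in text.text.lower():
                    hits.append((name, headword, text.text))
    return hits


hits = search_definitions(LEXICONS, "doubt")
```

Because both lexicons use the same element path for definitions, the same loop finds matches in each without per-lexicon code.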

Page 18:

LEGO: Search multiple lexicons by grammatical information (GOLD concept).

Page 19:

Extending LEGO: Upload Lexicon or Wordlist

Page 20:

Map to GOLD: Choose Lexicon to Edit

Page 21:

Mapping 1: View lexicon’s labels for grammatical concepts

Page 22:

Mapping 2: Click to view GOLD concepts, Click ‘Add’ to Map

Page 23:

Mapping 4: Lexicon label mapped to GOLD concept

Page 24:

Mapping 3: Access GOLD interface (if needed)
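
The mapping steps can be pictured as a lookup table per lexicon. The sketch below is only an assumption about how such a table might be used: the labels, the GOLD concept names ("Noun", "Verb"), the `ENTRIES` data, and the function `find_by_gold` are all illustrative and not taken from the LEGO system.

```python
# Per-lexicon mapping from the lexicon's own grammatical labels to GOLD
# concept names. All labels and concept names are illustrative assumptions.
LABEL_TO_GOLD = {
    "tuva": {"n.": "Noun", "v.": "Verb"},
    "demo": {"N": "Noun", "Vtr": "Verb"},
}

# Flattened entries from two lexicons, each keeping its original label.
ENTRIES = [
    {"lexicon": "tuva", "headword": "дадагалзаар", "label": "n."},
    {"lexicon": "demo", "headword": "foo", "label": "N"},
    {"lexicon": "demo", "headword": "bar", "label": "Vtr"},
    {"lexicon": "demo", "headword": "baz", "label": "unmapped"},
]


def find_by_gold(entries, mapping, concept):
    """Return entries whose local label maps to the given GOLD concept."""
    return [
        e for e in entries
        if mapping.get(e["lexicon"], {}).get(e["label"]) == concept
    ]


nouns = find_by_gold(ENTRIES, LABEL_TO_GOLD, "Noun")
```

Entries labelled "n." in one lexicon and "N" in another are returned together, while an unmapped label is simply skipped.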

Page 25:

LIFT

- LIFT = Lexicon Interchange Format
- XML format for storage and exchange of lexical information
- Developed by SIL International
- Designed to be easy to convert into and out of MDF and FieldWorks formats
- Current version: http://code.google.com/p/lift-standard/downloads/detail?name=lift_13.pdf

Page 26:

Programs that support LIFT

- WeSay uses LIFT as its primary format.
- FieldWorks Language Explorer (FLEx) can import and export LIFT files.
- Lexique Pro can open LIFT documents for viewing, printing, and making web pages. It can also save to LIFT format.

(from http://code.google.com/p/lift-standard/)

Page 27:

Utilities for LIFT

- Solid can convert basic SFM (standard format markers, e.g. Toolbox format) to LIFT (see: http://lingtransoft.info/apps/solid)
- LiftTweaker can selectively modify a LIFT file for targeting different audiences

Page 28:

Lexicons in LIFT

- LIFT chosen as upload format for LEGO because of the large number of lexicons potentially available in LIFT
  - About 50 published lexicons in Lexique Pro
  - 180+ lexicons in FieldWorks Language Explorer (FLEx)?
  - 300+ lexicons in Shoebox/Toolbox?
- With the owner’s permission, these could easily be integrated into the LEGO system

Page 29:

LIFT UML Diagram for Entry

Page 30:

Notes

- Grammatical Info is attached to Sense, not to Entry or Form (differs from LMF)
- Variant is attached to Entry, not ‘sense’: can’t add a ‘sense’ to a variant
- Multiple senses and variants allowed
- Highly customizable: Field, Type, and Range can be added to virtually any element (can be defined in the document header)
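
The attachment rules in these notes can be pictured with a small data model. This is a simplified sketch, not the LIFT schema itself: the class and attribute names are hypothetical, and real LIFT elements carry more than is shown here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sense:
    definition: str
    # Grammatical info lives on the sense, not on the entry or form.
    grammatical_info: Optional[str] = None


@dataclass
class Variant:
    # Simplified: in this model a variant has a form but no senses.
    form: str


@dataclass
class Entry:
    lexical_unit: str
    # Multiple senses and variants are allowed, both attached to the entry.
    senses: List[Sense] = field(default_factory=list)
    variants: List[Variant] = field(default_factory=list)


# The Tuva sample entry from this talk, expressed in the sketch model.
entry = Entry(
    lexical_unit="дадагалзаар",
    senses=[Sense(definition="doubt", grammatical_info="n.")],
    variants=[Variant(form="dadagalzaar")],
)
```

In this model the part of speech is reached through `entry.senses[0].grammatical_info`, which mirrors the difference from LMF noted above.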

Page 31:

LL-LIFT

- Lack of constraint on use of Field and Trait constituted a problem for LEGO
- Developed ‘LL-LIFT’, a constrained form of LIFT which still validates against the LIFT schema

Page 32:

LL-LIFT: Major constraints

- No Header
- Grammatical Information confined to a single element
  - Delimited within db field
  - Parsed out during GOLD mapping
- Minor entries, comparison forms, etc.: separate entries unified via ‘relation’ element
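
Two of these constraints lend themselves to a mechanical check. The sketch below is an assumption about how one might test them, not the LEGO validator: it reads "a single element" as at most one grammatical-info per sense, and it assumes a semicolon as the delimiter, which the slides do not specify. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_ll_lift(xml_text):
    """Return a list of constraint violations (empty if none are found)."""
    root = ET.fromstring(xml_text)
    problems = []
    # Constraint: no header element in the document.
    if root.find("header") is not None:
        problems.append("document has a header")
    # Constraint (as read here): at most one grammatical-info per sense.
    for entry in root.iter("entry"):
        for i, sense in enumerate(entry.findall("sense"), start=1):
            n = len(sense.findall("grammatical-info"))
            if n > 1:
                problems.append(
                    f"entry {entry.get('id')}: sense {i} has {n} grammatical-info elements"
                )
    return problems


def split_grammatical_info(value, delimiter=";"):
    """Split a delimited grammatical-info value into labels (delimiter is assumed)."""
    return [part.strip() for part in value.split(delimiter) if part.strip()]


GOOD = '<lift><entry id="a"><sense><grammatical-info value="n.; pl."/></sense></entry></lift>'
BAD = ('<lift><header/><entry id="b"><sense><grammatical-info value="n."/>'
       '<grammatical-info value="pl."/></sense></entry></lift>')
```

Splitting the single delimited value is the step that would precede mapping each label to a GOLD concept.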

Page 33:

Structure of a LIFT Entry

[Diagram; element labels: lexical-unit, dialects, sense, grammatical-info, definition, variant, relation, note, note, example, paradigmatic traits]

Page 34:

<entry id="d1e56244">
  <trait name="original-id" value="2c9090a22632946601267b22f98e5098"/>
  <lexical-unit>
    <form lang="syv">
      <text>дадагалзаар</text>
    </form>
  </lexical-unit>
  <variant>
    <form lang="syv">
      <text>dadagalzaar</text>
    </form>
  </variant>
  <sense>
    <grammatical-info value="n."/>
    <definition>
      <form lang="eng">
        <text>doubt</text>
      </form>
    </definition>
  </sense>
</entry>

Tuva entry in LIFT

Page 35:

<LexicalEntry id="2c9090a22632946601267b22f98e5098">
  <DataCategory type="rank">5155</DataCategory>
  <DataCategory type="lexeme">дадагалзаар</DataCategory>
  <DataCategory type="part of speech">n.</DataCategory>
  <Form id="2c9090a22632946601267b22f9fd50a0" type="form">
    <DataCategory type="image"/>
    <DataCategory type="Audio">tvn_5155.mp3</DataCategory>
  </Form>
  <Sense id="2c9090a22632946601267b22f9fd50a6" type="sense">
    <DataCategory type="gloss">doubt</DataCategory>
    <DataCategory type="transcription">dadagalzaar</DataCategory>
  </Sense>
  <ListOfComponents/>
</LexicalEntry>

Tuva entry exported from LEXUS (LMF-compliant)

Page 36:

<entry id="d1e56244">
  <trait name="original-id" value="2…"/>
  <lexical-unit>
    <form lang="syv">
      <text>дадагалзаар</text>
    </form>
  </lexical-unit>
  <variant>
    <form lang="syv">
      <text>dadagalzaar</text>
    </form>
  </variant>
  <sense>
    <grammatical-info value="n."/>
    <definition>
      <form lang="eng">
        <text>doubt</text>
      </form>
    </definition>
  </sense>
</entry>

LL-LIFT

<LexicalEntry id="2…">
  <DatCat type="rank">5155</DatCat>
  <DatCat type="lexeme">дадагалзаар</DatCat>
  <DatCat type="part of speech">n.</DatCat>
  <Form id="2…" type="form">
    <DatCat type="image"/>
    <DatCat type="Audio">tvn_5155.mp3</DatCat>
  </Form>
  <Sense id="2…" type="sense">
    <DatCat type="gloss">doubt</DatCat>
    <DatCat type="transcription">dadagalzaar</DatCat>
  </Sense>
  <ListOfComponents/>
</LexicalEntry>

LMF

Page 37:

Summary

- LL-LIFT = current XML schema used in LEGO
- Easy to transform (at least one) LMF-compliant XML output of LEXUS to LL-LIFT
- Transformation could be extended to TEI serialization of LMF. May require constraint
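
As a rough illustration of the transformation mentioned in the summary, the sketch below converts the LEXUS sample entry into the shape of the LL-LIFT sample entry. The field correspondences (lexeme to lexical-unit, transcription to variant, part of speech to grammatical-info under sense, gloss to definition, entry id to an original-id trait) are read off the two sample entries in this talk. The function names, the default language codes, and the choice to omit a generated entry id are assumptions, and this is not the project's actual conversion code.

```python
import xml.etree.ElementTree as ET

# LEXUS sample entry from the talk, with the Form block left out for brevity.
LEXUS_XML = """<LexicalEntry id="2c9090a22632946601267b22f98e5098">
  <DataCategory type="rank">5155</DataCategory>
  <DataCategory type="lexeme">дадагалзаар</DataCategory>
  <DataCategory type="part of speech">n.</DataCategory>
  <Sense id="2c9090a22632946601267b22f9fd50a6" type="sense">
    <DataCategory type="gloss">doubt</DataCategory>
    <DataCategory type="transcription">dadagalzaar</DataCategory>
  </Sense>
</LexicalEntry>"""


def datcat(parent, kind):
    """Text of the first direct-child DataCategory of the given type, or None."""
    for dc in parent.findall("DataCategory"):
        if dc.get("type") == kind:
            return dc.text
    return None


def add_form(parent, lang, text):
    """Append <form lang=...><text>...</text></form> to parent."""
    form = ET.SubElement(parent, "form", lang=lang)
    ET.SubElement(form, "text").text = text


def lexus_to_ll_lift(lexical_entry, vernacular="syv", gloss_lang="eng"):
    """Build an LL-LIFT-style <entry> from one LEXUS <LexicalEntry> with a <Sense>."""
    entry = ET.Element("entry")
    ET.SubElement(entry, "trait", name="original-id", value=lexical_entry.get("id"))
    add_form(ET.SubElement(entry, "lexical-unit"), vernacular,
             datcat(lexical_entry, "lexeme"))
    lmf_sense = lexical_entry.find("Sense")
    transcription = datcat(lmf_sense, "transcription")
    if transcription:
        add_form(ET.SubElement(entry, "variant"), vernacular, transcription)
    sense = ET.SubElement(entry, "sense")
    # Part of speech moves from the entry level (LMF) to the sense (LIFT).
    pos = datcat(lexical_entry, "part of speech")
    if pos:
        ET.SubElement(sense, "grammatical-info", value=pos)
    add_form(ET.SubElement(sense, "definition"), gloss_lang,
             datcat(lmf_sense, "gloss"))
    return entry


lift_entry = lexus_to_ll_lift(ET.fromstring(LEXUS_XML))
print(ET.tostring(lift_entry, encoding="unicode"))
```

The main structural change is the relocation of part of speech from the entry to the sense; everything else is a one-to-one renaming of fields.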