Linking Primary Texts to Electronic Dictionaries
COST Workshop “Connecting Textual Corpora and Dictionaries”

Samuel Läubli (1,3), Sabine Tittel (2), Martin-Dietrich Glessgen (1)

1 Institute of Romance Studies, University of Zurich
2 Dictionnaire Étymologique de l’Ancien Français (DEAF), Heidelberg Academy of Sciences and Humanities
3 Institute of Computational Linguistics, University of Zurich

April 26, 2013

Samuel Läubli | 2/23

Contents

1. Introduction

2. Concept & Requirements

3. Interface: Back-End, Front-End

4. Plan of Action

5. Conclusion

1. Introduction

2. Concept & Requirements: Connecting Phoenix2 and DEAFél

Current State: Phoenix2

Phoenix2: Earlier Concept

Current State: DEAF Writing System

Aim

What do we want to do?

⇒ Include references to DocLing texts in DEAFél (attestations)

DocLing chHM 130; DocLing chMe 195; ...

DocLing: Charte chMe 195

Date: October 1266

Document type: charter: enfranchisement

Author: Jean, lord of Joinville and seneschal of Champagne

...

... 44 Cil de Moustier pourront amener en la vile totes fames par mariaige qui n’ a \37 veront suite ne reclain d’ autre seignour · et autre fames non fors mes fames de cors · 45 Et li home de Moustier ne porront marier lour fillies se à mes homes non de ma propre terre · ou à ceus de la juree · 46 Les genz de Moustier ne poent faire lour fyé clers se par moi non · Et cil de \38 Moustier peuvent faire mairiage aus genz de la terre mon frere de Vauquelour, selonc l’ atiremant ...

Requirements

Phoenix2
• Adapt to DEAF lemmatization policy
• Lemmatize texts
• Serve texts / occurrences

DEAF Writing System
• Enhance writing system with GUIs to integrate DocLing attestations
• Fetch texts / occurrences

⇒ The two systems meet at an INTERFACE

3. Interface: Back-End | Front-End

Back-End

“We need some kind of interface to do this”

Back-End: SOAP Service

We decided to implement a SOAP service

• Protocol specification for exchanging data via RPC/HTTP
• Official W3C recommendation
• Uses XML as a transport format
• Fully platform independent
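To make the "XML as a transport format" point concrete, here is a minimal sketch of how a client might wrap a call such as getOccurrences in a SOAP 1.1 envelope. The envelope namespace is the standard SOAP 1.1 one; the service namespace `urn:example:ph2deafel` and the parameter element name `lemma` are assumptions for illustration only, since the actual names are fixed by the service's WSDL.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace; the real one is declared in the WSDL.
SERVICE_NS = "urn:example:ph2deafel"


def build_request(operation: str, param_name: str, value: str) -> bytes:
    """Wrap a single-parameter call in a SOAP 1.1 envelope (UTF-8 bytes)."""
    ET.register_namespace("soapenv", SOAP_ENV_NS)
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    param = ET.SubElement(call, f"{{{SERVICE_NS}}}{param_name}")
    param.text = value
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


# Element name "lemma" is an assumption, not taken from the real schema.
request = build_request("getOccurrences", "lemma", "marïage")
print(request.decode("utf-8"))
```

Such a payload would then be sent as the body of an HTTP POST to the service endpoint; any platform that can produce XML and speak HTTP can act as a client.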

The Phoenix2 SOAP Service provides two functions:

getOccurrences ( Lemma )
getOccurrenceDetails ( OccurrenceID )

[Diagram: DEAFél sends getOccurrences(lemma="marïage") to Phoenix2; Phoenix2 answers with an OccurrenceCollection (XML) holding 0..* occurrences.]
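The 0..* cardinality means a client has to cope with an empty collection as well as with many hits. A minimal sketch of reading such a response follows. The element name `Occurrence`, the `id` and `lemma` attributes, and the ID values are assumptions for illustration, as the real structure is given by the service's XSD. The two word forms are taken from the chMe 195 excerpt.

```python
import xml.etree.ElementTree as ET

# Hypothetical response payload: element and attribute names and the IDs
# are illustrative only; the real structure is defined by the service's XSD.
SAMPLE_RESPONSE = """\
<OccurrenceCollection lemma="marïage">
  <Occurrence id="101">mariaige</Occurrence>
  <Occurrence id="102">mairiage</Occurrence>
</OccurrenceCollection>
"""


def parse_occurrences(xml_text: str) -> list[tuple[int, str]]:
    """Return (occurrence ID, attested form) pairs; empty list if none."""
    root = ET.fromstring(xml_text)
    return [(int(o.get("id")), o.text or "") for o in root.findall("Occurrence")]


occurrences = parse_occurrences(SAMPLE_RESPONSE)
# A lemma without attestations yields an empty collection, not an error.
empty = parse_occurrences('<OccurrenceCollection lemma="unknown"/>')
print(occurrences)
print(empty)
```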

The Phoenix2 SOAP Service thus enables the following functionality:

getOccurrences ( Lemma )

⇒ Show all occurrences, given a lemma

getOccurrenceDetails ( OccurrenceID )

⇒ Show meta information, given the ID (a numeric identifier) of an occurrence
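The two functions are meant to be chained: first look up the occurrence IDs for a lemma, then fetch the details for each ID. The sketch below illustrates that flow with an in-memory stand-in for the Phoenix2 back end rather than a real SOAP call. The Python function names, the fields of `OccurrenceDetails`, and the IDs are assumptions; only the document siglum, the date, and the word forms come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OccurrenceDetails:
    # Field selection is an assumption; the real service defines its own schema.
    occurrence_id: int
    form: str
    document: str
    date: str


# Hypothetical in-memory stand-in for the Phoenix2 back end (IDs are invented).
_DETAILS = {
    101: OccurrenceDetails(101, "mariaige", "DocLing chMe 195", "October 1266"),
    102: OccurrenceDetails(102, "mairiage", "DocLing chMe 195", "October 1266"),
}
_INDEX = {"marïage": [101, 102]}


def get_occurrences(lemma: str) -> list[int]:
    """Step 1: all occurrence IDs for a lemma (possibly none)."""
    return list(_INDEX.get(lemma, []))


def get_occurrence_details(occurrence_id: int) -> OccurrenceDetails:
    """Step 2: meta information for one occurrence ID."""
    return _DETAILS[occurrence_id]


def attestations(lemma: str) -> list[str]:
    """Chain both calls into citation-like strings for a dictionary entry."""
    details = map(get_occurrence_details, get_occurrences(lemma))
    return [f"{d.document}: {d.form} ({d.date})" for d in details]


print(attestations("marïage"))
```

Splitting the interface this way keeps the first response small; details are only transferred for the occurrences an editor actually inspects.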

Try it yourself

SOAP Endpoint (document/literal): http://sa.muel.tv/test/soap/ph2deafel.wsdl

Short WSDL Documentation: http://sa.muel.tv/test/soap/doc/wsdl.html

XML Schema Definitions (XSD): http://sa.muel.tv/test/soap/doc/xsd.html

Front-End

The DEAF is developing a number of graphical user interfaces (GUIs) which

• Build upon the DEAF’s electronic dictionary writing system
• Allow for an integration of DocLing material

⇒ No complete blend: DocLing material will continue to be recognizable as external material

4. Plan of Action

Release

Our joint work is scheduled for release in three steps:

1. All materials without semantic structure
2. All materials with a dedicated semantic structure for DocLing entries
3. Full integration of the DocLing entries into the DEAF article structure

Milestones

Phoenix2
✓ Migrate old lemmata
• Implement SOAP service
• Lemmatize texts

DEAF Writing System
• Implement new GUIs
• Adapt publication format (web edition)

⇒ First version due in autumn 2013

5. Conclusion

Benefits

Both DocLing and Phoenix2 benefit from our cooperation:

Phoenix2
The vocabulary of the DocLing texts is embedded into its natural context of the Old French language and, via the DEAF’s etymological discussion, into the broader context of the Romance languages.

DEAF
A considerable number of digital source texts are added to the dictionary. This new source material will strengthen the foundation of the semantic structure of the DEAF articles and enhance its quality.

Conclusion

Questions? Feedback is always very welcome.

Thank You

These slides are available at www.cl.uzh.ch/people/team/laeubli.html