From Dictionaries to Cross-lingual Lexical Resources


Transcript of From Dictionaries to Cross-lingual Lexical Resources

Page 1: From Dictionaries to Cross-lingual Lexical Resources

Julia Bosque-Gil

From Dictionaries to Cross-lingual Lexical Resources

Guadalupe Aguado-de-Cea, Elena Montiel-Ponsoda, Ilan Kernerman, Noam Ordan

Ontology Engineering Group (OEG) Universidad Politécnica de Madrid (UPM)

Acknowledgements: Supported by the Spanish Ministry of Economy and Competitiveness project 4V (TIN2013-46238-C4-2-R), the Excellence Network ReTeLe (TIN2015-68955-REDT), and the Juan de la Cierva program, and by the Spanish Ministry of Education, Culture and Sports through the FPU program.

{lupe,emontiel}@fi.upm.es

K Dictionaries, Tel Aviv {ilan,noam}@kdictionaries.com

Page 2: From Dictionaries to Cross-lingual Lexical Resources

WHY

MetaForum 2016, 4-5.07.16, Lisboa, Portugal

Page 3: From Dictionaries to Cross-lingual Lexical Resources

Data in proprietary formats:
•  Isolated
•  Not interoperable
•  Multiple access points
•  Duplication


Page 4: From Dictionaries to Cross-lingual Lexical Resources

http://es.wiktionary.org

http://rae.es

http://www.wikilengua.org/index.php/Terminesp:red

http://es.wikipedia.org

http://www.apertium.org

“Red” (computer network)


http://kdictionaries.com

Page 5: From Dictionaries to Cross-lingual Lexical Resources

*Picture attribution: http://commons.wikimedia.org/wiki/User:Gugerell

“Red”

Etymology: “Del latín rete” (from Latin “rete”)

Gender: “f”

Definition: “Conjunto de ordenadores o de equipos informáticos conectados entre sí….” (“A set of computers or computing devices connected to one another…”)

“Red”

translations: “xarxa”(ca), “rede”(ga), …

“Red”

Norm: UNE 21302-131

English: network

German: Netzwerk

“Red”

Pronunciation: [red]

Grammatical category: sustantivo femenino (feminine noun)

Singular: “red”

Plural: “redes”

“Red_de_computadores”

Category: redes informáticas (computer networks)

Image

Complementary information, but not linked


“Red”

translation: “rede” (pt)

Grammatical category: n, f

Definition: “conjunto de computadoras con (...)” (“set of computers with (...)”)

Page 6: From Dictionaries to Cross-lingual Lexical Resources

NIF (NLP Interchange Format)

Linguistic Linked Open Data cloud


Page 7: From Dictionaries to Cross-lingual Lexical Resources

WHAT


Page 8: From Dictionaries to Cross-lingual Lexical Resources

K Dictionaries multi-language Global Series


•  Spanish set of the K Dictionaries (KD) multi-language Global Series (24 languages)
•  For each language, a core vocabulary is compiled as a standalone project and then translated into other languages in further projects.
•  The series is not biased towards any language; each language is represented on its own terms.
•  Translation into another language happens at a later phase, creating a pair-specific, and thus pair-sensitive, interlingual representation.

http://kdictionaries-online.com/


Page 9: From Dictionaries to Cross-lingual Lexical Resources

K Dictionaries XML proprietary format


<SenseGrp identifier="SE00000730" version="1">
  <Synonym>agitado</Synonym>
  <Definition>que es muy animado</Definition>
  <TranslationCluster identifier="TC00001663" text="que es muy animado" type="def">
    <Locale lang="nl">
      <TranslationBlock>
        <TranslationCtn>
          <Translation>verhit</Translation>
        </TranslationCtn>
        <TranslationCtn>
          <Translation>vurig</Translation>
        </TranslationCtn>
      </TranslationBlock>
    </Locale>
    <Locale lang="no">
      <TranslationBlock>
        <TranslationCtn>
          <Translation>ivrig, oppsatt, opphetet</Translation>
        </TranslationCtn>
      </TranslationBlock>
      […]

Multilingual information for the Spanish headword acalorado (heated): Dutch (nl) and Norwegian (no) translations

Page 10: From Dictionaries to Cross-lingual Lexical Resources

K Dictionaries XML proprietary format


  […]
  </TranslationCluster>
  <ExampleCtn type="sid" version="1">
    <Example>sesión acalorada</Example>
    <TranslationCluster identifier="TC00001664" text="sesión acalorada" type="exmp">
      <Locale lang="nl">
        <TranslationBlock>
          <TranslationCtn>
            <Translation>vurige zitting</Translation>
          </TranslationCtn>
        </TranslationBlock>
      </Locale>
      <Locale lang="no">
        <TranslationBlock>
          <TranslationCtn>
            <Translation>et opphetet møte</Translation>
          </TranslationCtn>
        </TranslationBlock>
        […]
    </TranslationCluster>
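As a rough illustration of the data exploration step, the KD fragments above can be read with Python's standard library. This is only a sketch: the sample string repeats the definition cluster for acalorado, with the closing tags after the Norwegian locale added by hand because the slide fragment is truncated, and the function name translations_by_language is an assumption made for this example, not part of any KD tooling.

```python
import xml.etree.ElementTree as ET

# Sample rebuilt from the slide fragment. The closing tags after the
# Norwegian locale are added by hand (assumption): the original is truncated.
SAMPLE = """
<SenseGrp identifier="SE00000730" version="1">
  <Synonym>agitado</Synonym>
  <Definition>que es muy animado</Definition>
  <TranslationCluster identifier="TC00001663" text="que es muy animado" type="def">
    <Locale lang="nl">
      <TranslationBlock>
        <TranslationCtn><Translation>verhit</Translation></TranslationCtn>
        <TranslationCtn><Translation>vurig</Translation></TranslationCtn>
      </TranslationBlock>
    </Locale>
    <Locale lang="no">
      <TranslationBlock>
        <TranslationCtn><Translation>ivrig, oppsatt, opphetet</Translation></TranslationCtn>
      </TranslationBlock>
    </Locale>
  </TranslationCluster>
</SenseGrp>
"""


def translations_by_language(sense_xml: str) -> dict[str, list[str]]:
    """Map each Locale's lang code to the Translation strings found under it."""
    root = ET.fromstring(sense_xml.strip())
    result: dict[str, list[str]] = {}
    for cluster in root.iter("TranslationCluster"):
        for locale in cluster.findall("Locale"):
            lang = locale.get("lang")
            texts = [t.text.strip() for t in locale.iter("Translation") if t.text]
            result.setdefault(lang, []).extend(texts)
    return result


translations = translations_by_language(SAMPLE)
print(translations)
```

Note that the Norwegian entry keeps several equivalents in one comma-separated string, so a real conversion would probably need to decide whether to split such values before generating RDF.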

Page 11: From Dictionaries to Cross-lingual Lexical Resources

HOW


Page 12: From Dictionaries to Cross-lingual Lexical Resources

5 tasks to migrate resources into linked data (Vila Suero et al., 2014):

1.  Data exploration

2.  URI naming strategy definition

3.  Modeling

4.  RDF generation

5.  Linking
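Task 2 in the list above, the URI naming strategy, can be pictured with a small sketch. Everything specific here is an assumption for illustration: the base namespace http://example.org/kd/, the pattern language/lemma-part-of-speech, the "adj" tag, and the helper names entry_uri and sense_uri. The slides do not state the pattern actually adopted.

```python
from urllib.parse import quote

# Hypothetical namespace (assumption); not the project's real base URI.
BASE = "http://example.org/kd/"


def entry_uri(lang: str, lemma: str, pos: str) -> str:
    """Mint a URI for a lexical entry from language, lemma and part of speech."""
    # quote() percent-encodes non-ASCII characters so the result is a valid URI.
    return f"{BASE}{lang}/{quote(lemma)}-{pos}"


def sense_uri(lang: str, lemma: str, pos: str, sense_id: str) -> str:
    """Mint a URI for a sense by appending the source identifier as a fragment."""
    return f"{entry_uri(lang, lemma, pos)}#{quote(sense_id)}"


print(entry_uri("es", "acalorado", "adj"))
print(sense_uri("es", "acalorado", "adj", "SE00000730"))
print(entry_uri("es", "sesión", "n"))
```

Reusing the identifier already present in the source XML (here SE00000730) keeps the generated URIs stable across repeated conversions, which is one common reason to prefer it over a running counter.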


Page 13: From Dictionaries to Cross-lingual Lexical Resources


Page 14: From Dictionaries to Cross-lingual Lexical Resources

Model for representing linguistic information in RDF(s) (semantics by reference)

“An ontology-based semantic lexicon would leave the semantics to the ontology, focusing instead on providing domain-specific terms and object descriptions in the ontology.” (Buitelaar, 2010)

•  Concise and descriptive (external repositories of linguistic categories)
•  Modular and extensible (5 modules)

Modeling

lemon-ontolex
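To make the model more concrete, here is a minimal sketch of how one entry might be written as triples using the core lemon-ontolex vocabulary (LexicalEntry, Form, LexicalSense, canonicalForm, writtenRep, sense). The base URI, the "#form" and "#sense1" fragments, and the helper names are assumptions; attaching skos:definition directly to the sense is a simplification chosen for this example, not necessarily the modeling used for the KD data.

```python
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/kd/es/"  # hypothetical namespace (assumption)


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text: str, lang: str) -> str:
    """Build a language-tagged literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def entry_triples(lemma: str, lang: str, definition: str) -> list[tuple[str, str, str]]:
    """Describe one lexical entry, its canonical form and a single sense."""
    entry = BASE + lemma           # assumes an ASCII lemma, e.g. "acalorado"
    form = entry + "#form"         # hypothetical fragment names
    sense = entry + "#sense1"
    return [
        (iri(entry), iri(RDF_TYPE), iri(ONTOLEX + "LexicalEntry")),
        (iri(entry), iri(ONTOLEX + "canonicalForm"), iri(form)),
        (iri(form), iri(RDF_TYPE), iri(ONTOLEX + "Form")),
        (iri(form), iri(ONTOLEX + "writtenRep"), literal(lemma, lang)),
        (iri(entry), iri(ONTOLEX + "sense"), iri(sense)),
        (iri(sense), iri(RDF_TYPE), iri(ONTOLEX + "LexicalSense")),
        (iri(sense), iri(SKOS + "definition"), literal(definition, lang)),
    ]


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """Serialize triples one per line, each terminated by ' .'."""
    return "\n".join(" ".join(t) + " ." for t in triples)


triples = entry_triples("acalorado", "es", "que es muy animado")
ntriples = to_ntriples(triples)
print(ntriples)
```

Plain string building is used only to keep the sketch free of dependencies; an RDF library would normally handle serialization and escaping.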


Page 15: From Dictionaries to Cross-lingual Lexical Resources

LMF: Lexical Markup Framework (ISO 24613)

XML LexInfo, LIR

Represent lexical information relative to an ontology OWL

SKOS (W3C Standard) Designed for Taxonomy/Vocabulary representation in RDF

Nowadays: the de facto standard for transforming linguistic resources into the linked data format

From McCrae, J., “lemon: The Lexicon Model for Ontologies”, SD-LLOD-15

The origins…

http://linghub.lider-project.eu/


Page 16: From Dictionaries to Cross-lingual Lexical Resources

lemon-ontolex


Page 17: From Dictionaries to Cross-lingual Lexical Resources

The vartrans module


Page 18: From Dictionaries to Cross-lingual Lexical Resources

Modelling example for acalorado
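The diagram for this slide did not survive in the transcript. As a stand-in, the following sketch shows one way the Dutch translations of acalorado could be expressed with the vartrans module, where a vartrans:Translation links a source sense to a target sense. The kd: prefix, the local-name pattern, and the function name translation_turtle are assumptions for illustration and are not taken from the original modelling.

```python
PREFIXES = (
    "@prefix vartrans: <http://www.w3.org/ns/lemon/vartrans#> .\n"
    "@prefix kd: <http://example.org/kd/> .\n"  # hypothetical namespace (assumption)
)


def translation_turtle(src_lang: str, src_lemma: str,
                       tgt_lang: str, tgt_lemmas: list[str]) -> str:
    """Emit one vartrans:Translation per target lemma, in Turtle syntax.

    Assumes lemmas are single ASCII words, so they can be used directly
    inside prefixed names.
    """
    src_sense = f"kd:{src_lang}_{src_lemma}_sense"
    blocks = [PREFIXES]
    for lemma in tgt_lemmas:
        tgt_sense = f"kd:{tgt_lang}_{lemma}_sense"
        trans = f"kd:{src_lang}_{src_lemma}-{tgt_lang}_{lemma}_trans"
        blocks.append(
            f"{trans} a vartrans:Translation ;\n"
            f"    vartrans:source {src_sense} ;\n"
            f"    vartrans:target {tgt_sense} .\n"
        )
    return "\n".join(blocks)


turtle = translation_turtle("es", "acalorado", "nl", ["verhit", "vurig"])
print(turtle)
```

Making each translation a resource of its own, rather than a direct link between two entries, leaves room to attach further information to the translation later on.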


Page 19: From Dictionaries to Cross-lingual Lexical Resources

CONCLUSIONS


Page 20: From Dictionaries to Cross-lingual Lexical Resources

LD paradigm


[Figure: linked language nodes EN, DE, ES, FR, IT]


•  Innovative way of representing and linking data
•  Keeping language specificities as much as possible, in a cross-lingual graph, bottom-up fashion
•  Retrieving language information by linking translations
•  In the KD case, RDF links contribute to automatic growth of lexical resources, at a different pace
•  “One-stop shop”, rather than many shops
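The idea of retrieving information by linking translations can be sketched as a walk over a small graph. The pairs below reuse the translations of “red” shown earlier in the talk (with the language codes as they appear on the slide); the graph representation and the function names build_graph and reachable are assumptions for illustration, and following links transitively only yields translation candidates, not confirmed equivalents.

```python
from collections import defaultdict

# (language, word) pairs taken from the "red" example; codes as on the slide.
LINKS = [
    (("es", "red"), ("ca", "xarxa")),
    (("es", "red"), ("ga", "rede")),
    (("es", "red"), ("en", "network")),
    (("es", "red"), ("de", "Netzwerk")),
    (("es", "red"), ("pt", "rede")),
]


def build_graph(pairs):
    """Build an undirected graph: every translation link works both ways."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def reachable(graph, start):
    """Return every node connected to start through one or more links."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    seen.discard(start)
    return seen


graph = build_graph(LINKS)
candidates = reachable(graph, ("en", "network"))
print(sorted(candidates))
```

Starting from the English word, the walk reaches the Spanish entry and, through it, the other languages, which is the “one-stop shop” effect in miniature.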

Page 21: From Dictionaries to Cross-lingual Lexical Resources

Thank you! See you at the poster session…
