Post on 02-Aug-2022
Julia Bosque-Gil
From Dictionaries to Cross-lingual Lexical Resources
Guadalupe Aguado-de-Cea, Elena Montiel-Ponsoda, Ilan Kernerman, Noam Ordan
Ontology Engineering Group (OEG) Universidad Politécnica de Madrid (UPM)
Acknowledgements: Supported by the Spanish Ministry of Economy and Competitiveness project 4V (TIN2013-46238-C4-2-R), the Excellence Network ReTeLe (TIN2015-68955-REDT), and the Juan de la Cierva program, and by the Spanish Ministry of Education, Culture and Sports through the FPU program.
{lupe,emontiel}@fi.upm.es
K Dictionaries, Tel Aviv {ilan,noam}@kdictionaries.com
WHY
MetaForum 2016, 4-5.07.16, Lisboa, Portugal
Data in proprietary formats:
• Isolated
• Not interoperable
• Multiple access points
• Duplication
http://es.wiktionary.org
http://rae.es
http://www.wikilengua.org/index.php/Terminesp:red
http://es.wikipedia.org
http://www.apertium.org
“Red” (computer network)
http://kdictionaries.com
*Picture attribution: http://commons.wikimedia.org/wiki/User:Gugerell
“Red”
Etymology: Del latín “rete”
Gender: “f”
Definition: “Conjunto de ordenadores o de equipos informáticos conectados entre sí….”
“Red”
translations: “xarxa”(ca), “rede”(ga), …
“Red”
Norm: UNE 21302-131
English: network
German: Netzwerk
“Red”
Pronunciation: [red]
Grammar category: sustantivo femenino
Singular: “red”
Plural: “redes”
“Red_de_computadores”
Category: redes informáticas
Image
Complementary information, but not linked
“Red”
translation:“rede”(pt)
Grammatical category: n, f
Definition:“conjunto de computadoras con (...)
NIF (NLP Interchange Format)
Linguistic Linked Open Data cloud
WHAT
K Dictionaries multi-language Global Series
• Spanish set of the K Dictionaries (KD) multi-language Global Series (24 languages)
• The approach followed in this series is to compile a core vocabulary for each language as a standalone project and have it translated into other languages in further projects.
• Not biased towards any language; each is represented on its own terms.
• Translated into another language at a later phase, creating a pair-specific, and thus pair-sensitive, interlingual representation
http://kdictionaries-online.com/
K Dictionaries XML proprietary format
<SenseGrp identifier="SE00000730" version="1">
  <Synonym>agitado</Synonym>
  <Definition>que es muy animado</Definition>
  <TranslationCluster identifier="TC00001663" text="que es muy animado" type="def">
    <Locale lang="nl">
      <TranslationBlock>
        <TranslationCtn>
          <Translation>verhit</Translation>
        </TranslationCtn>
        <TranslationCtn>
          <Translation>vurig</Translation>
        </TranslationCtn>
      </TranslationBlock>
    </Locale>
    <Locale lang="no">
      <TranslationBlock>
        <TranslationCtn>
          <Translation>ivrig, oppsatt, opphetet</Translation>
        </TranslationCtn>
      </TranslationBlock>
    […]
Multilingual information for the Spanish headword acalorado (heated)
Dutch
Norwegian
K Dictionaries XML proprietary format
  […]
  </TranslationCluster>
  <ExampleCtn type="sid" version="1">
    <Example>sesión acalorada</Example>
    <TranslationCluster identifier="TC00001664" text="sesión acalorada" type="exmp">
      <Locale lang="nl">
        <TranslationBlock>
          <TranslationCtn>
            <Translation>vurige zitting</Translation>
          </TranslationCtn>
        </TranslationBlock>
      </Locale>
      <Locale lang="no">
        <TranslationBlock>
          <TranslationCtn>
            <Translation>et opphetet møte</Translation>
          </TranslationCtn>
        </TranslationBlock>
      […]
    </TranslationCluster>
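As a rough illustration of how the sense-level translations in this XML could be read before any conversion, here is a minimal Python sketch using only the standard library. The fragment in the code is an abridged copy of the slide content: the elided parts are left out and the open elements are closed by hand so that it parses. The function name `translations_by_locale` is hypothetical and not part of any KD tooling.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the slide fragment. The elided parts ([…]) are omitted and
# the open elements are closed by hand so that the snippet is well-formed.
KD_FRAGMENT = """
<SenseGrp identifier="SE00000730" version="1">
  <Synonym>agitado</Synonym>
  <Definition>que es muy animado</Definition>
  <TranslationCluster identifier="TC00001663" text="que es muy animado" type="def">
    <Locale lang="nl">
      <TranslationBlock>
        <TranslationCtn><Translation>verhit</Translation></TranslationCtn>
        <TranslationCtn><Translation>vurig</Translation></TranslationCtn>
      </TranslationBlock>
    </Locale>
    <Locale lang="no">
      <TranslationBlock>
        <TranslationCtn><Translation>ivrig, oppsatt, opphetet</Translation></TranslationCtn>
      </TranslationBlock>
    </Locale>
  </TranslationCluster>
</SenseGrp>
"""


def translations_by_locale(cluster):
    """Map each Locale's lang attribute to the list of its Translation texts."""
    result = {}
    for locale in cluster.findall("Locale"):
        texts = [t.text.strip() for t in locale.iter("Translation") if t.text]
        result[locale.get("lang")] = texts
    return result


sense = ET.fromstring(KD_FRAGMENT.strip())
# Select the cluster that translates the definition (type="def").
definition_cluster = sense.find("TranslationCluster[@type='def']")
sense_translations = translations_by_locale(definition_cluster)
print(sense.get("identifier"), sense_translations)
```

Note that the Norwegian entry holds several equivalents in a single comma-separated string; the sketch keeps it as one value rather than guessing how it should be split.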
HOW
5 tasks to migrate resources into linked data (Vila Suero et al., 2014):
1. Data exploration
2. URI naming strategy definition
3. Modeling
4. RDF generation
5. Linking
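To make tasks 2 and 4 more concrete, the following is a minimal Python sketch of one possible URI naming scheme and the N-Triples it could produce for a single entry. It is an assumption-laden illustration, not the scheme used for the KD data: the base URI `http://example.org/kd/es/`, the helper names `mint_uri`, `nt_literal` and `entry_triples`, and the part-of-speech label `adj` are all hypothetical. Only the ontolex namespace and its terms come from the published OntoLex-Lemon vocabulary.

```python
from urllib.parse import quote

# Hypothetical base URI; the slides do not state the one used for the KD data.
BASE = "http://example.org/kd/es/"
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def mint_uri(kind, *parts):
    """Build a URI from a resource kind and percent-encoded identifying parts."""
    local = "-".join(quote(p, safe="") for p in parts)
    return f"{BASE}{kind}/{local}"


def nt_literal(text, lang):
    """Serialize a language-tagged literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def entry_triples(lemma, pos, sense_id):
    """Return N-Triples lines for one entry, its canonical form and one sense."""
    entry = mint_uri("entry", lemma, pos)
    form = mint_uri("form", lemma, pos)
    sense = mint_uri("sense", sense_id)
    return [
        f"<{entry}> <{RDF_TYPE}> <{ONTOLEX}LexicalEntry> .",
        f"<{entry}> <{ONTOLEX}canonicalForm> <{form}> .",
        f'<{form}> <{ONTOLEX}writtenRep> {nt_literal(lemma, "es")} .',
        f"<{entry}> <{ONTOLEX}sense> <{sense}> .",
        f"<{sense}> <{RDF_TYPE}> <{ONTOLEX}LexicalSense> .",
    ]


# "adj" is an assumed part-of-speech label, used here only for illustration.
lines = entry_triples("acalorado", "adj", "SE00000730")
print("\n".join(lines))
```

Reusing the identifier already present in the source data (here the `SenseGrp` identifier) for the sense URI is one way to keep generated URIs stable across regenerations; whether the original work did so is not stated in the slides.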
Model for representing linguistic information in RDF(s) (semantics by reference)
“An ontology-based semantic lexicon would leave the semantics to the ontology, focusing instead on providing domain-specific terms and object descriptions in the ontology.” (Buitelaar, 2010)
• Concise and descriptive (external repositories of linguistic categories)
• Modular and extensible (5 modules)
Modeling
lemon-ontolex
The origins…
• LMF, Lexical Markup Framework (ISO 24613): XML
• LexInfo, LIR: represent lexical information relative to an ontology (OWL)
• SKOS (W3C Standard): designed for taxonomy/vocabulary representation in RDF
Nowadays… de facto standard for transforming linguistic resources into the linked data format
From McCrae, J., “lemon: The Lexicon Model for Ontologies”, SD-LLOD-15
http://linghub.lider-project.eu/
lemon-ontolex
The vartrans module
Modelling example for acalorado
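The diagram for this slide did not survive extraction, so the following Python sketch only suggests what a vartrans-based representation of the Dutch translations of acalorado might look like in Turtle. It is not a reconstruction of the authors' model. The `http://example.org/kd/` namespace, the URI patterns and the function name `translation_turtle` are hypothetical; `vartrans:Translation`, `vartrans:source` and `vartrans:target` are terms of the OntoLex-Lemon vartrans module.

```python
from urllib.parse import quote

# Hypothetical namespace for the generated resources.
KD = "http://example.org/kd/"

PREFIXES = (
    "@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .\n"
    "@prefix vartrans: <http://www.w3.org/ns/lemon/vartrans#> .\n"
)


def translation_turtle(source_lang, source_sense_id, target_lang, target_lemma):
    """Return Turtle linking a source sense to a target sense via a Translation."""
    lemma = quote(target_lemma, safe="")
    source = f"<{KD}{source_lang}/sense/{source_sense_id}>"
    target = f"<{KD}{target_lang}/sense/{lemma}>"
    trans = f"<{KD}trans/{source_sense_id}-{target_lang}-{lemma}>"
    return (
        f"{target} a ontolex:LexicalSense .\n"
        f"{trans} a vartrans:Translation ;\n"
        f"    vartrans:source {source} ;\n"
        f"    vartrans:target {target} .\n"
    )


# Dutch translations of the sense SE00000730 of "acalorado", as given in the XML.
blocks = [
    translation_turtle("es", "SE00000730", "nl", lemma)
    for lemma in ["verhit", "vurig"]
]
document = PREFIXES + "\n" + "\n".join(blocks)
print(document)
```

Modelling each translation as its own resource, rather than as a direct link between entries, leaves room to attach further information to the translation itself, such as the language pair or the example it came from.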
CONCLUSIONS
LD paradigm
[Diagram labels: EN, DE, ES, FR, IT]
• Innovative way of representing and linking data
• Keeping language specificities as much as possible, in a cross-lingual graph, bottom-up fashion
• Retrieving language information by linking translations
• In the KD case, RDF links contribute to automatic growth of lexical resources, at a different pace
• “One-stop shop”, rather than many shops
Thank you! See you at the poster session…