ISO-PWI 24622 Lexical ontology some loose remarks Thierry Declerck, DFKI GmbH.

Page 1:

ISO-PWI 24622 Lexical ontology: some loose remarks

Thierry Declerck, DFKI GmbH

Page 2:

Possible Topics (Nicoletta)

• Relations between Lexicon and Ontology. Possible issues/questions:
– Is LMF enough to represent ontological links?
– How to connect the work being done in the ISO Lexical group and the ISO Ontology groups?
– Lexicon and ontologies: separation? Or lexicalised ontologies? Or ontologised lexicons?
– Lexicon, ontologies and domains
– Relation to multilinguality
– On a very different dimension: an ontology of lexical/semantic/conceptual categories? Standardised semantic categories, ontology labels?
– A general question: where is the semantics of words/terms etc. best stored? In the linguistic lexicon or in ontologies?
• The items in italic are also central within the MONNET project (http://www.monnet-project.eu/)
• Extension to the Linked Data Cloud for establishing links between lexical/linguistic information and the Semantic Web? See www.lexvo.org as a first start. / Place of ISO standards (Data Categories) in the cloud?

Page 3:

Relevance of the topic for HLT

• Multilingual extraction of and access to information that is encoded in knowledge representation systems. Towards (again?) knowledge-driven NLP applications.
• But we need to clarify how the word „lexicon“ is understood in the Semantic Web (SW) and in the HLT communities:
– In SW, a lexicon is probably the list of all natural language expressions occurring in labels of ontologies
– In HLT, the lexicon is probably the repository of all „lexemes“ of a language
• Relation between both understandings? LMF =>

Page 4:

LMF: Some definitions
http://www.lexicalmarkupframework.org/

• 3.14 form: sequence of morphs
• 3.18 graph: minimal unit in a written language including letters, pictograms, ideograms, numerals and punctuations
• 3.24 lemma (lemmatised form, canonical form): conventional form chosen to represent a lexeme
• 3.25 lexeme: abstract unit generally associated with a set of forms sharing a common meaning
• 3.26 lexical entry: container for managing one or several forms and possibly one or several meanings in order to describe a lexeme
• 3.28 lexicon: resource comprising lexical entries for a given language
  NOTE A special language lexicon or a lexicon prepared for a specific NLP application can comprise a specific subset of language.
• 3.31 morph: sequence of graphs or sequence of phones
  EXAMPLE The word boys consists of two morphs: boy and s.
• 3.38 phone: minimal unit in the sound system of a language
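
To make these definitions concrete, here is a minimal Python sketch of how form, lexical entry and lexicon relate to each other. It is an illustration only, not the LMF data model: the class and field names, the `lookup` helper and the free-text `meanings` list are assumptions chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Form:
    """3.14 form: a sequence of morphs (each morph kept here as a plain string)."""
    morphs: List[str]

    @property
    def written(self) -> str:
        # Concatenating the morphs gives the written form, e.g. boy + s -> boys.
        return "".join(self.morphs)


@dataclass
class LexicalEntry:
    """3.26 lexical entry: container for forms and meanings describing one lexeme."""
    lemma: str  # 3.24: conventional form chosen to represent the lexeme
    forms: List[Form] = field(default_factory=list)
    meanings: List[str] = field(default_factory=list)  # hypothetical free-text glosses


@dataclass
class Lexicon:
    """3.28 lexicon: lexical entries for a given language."""
    language: str
    entries: List[LexicalEntry] = field(default_factory=list)

    def lookup(self, written: str) -> List[LexicalEntry]:
        # Return every entry that has a form with this written representation.
        return [e for e in self.entries
                if any(f.written == written for f in e.forms)]


# The EXAMPLE from 3.31: "boys" consists of the two morphs "boy" and "s".
boy = LexicalEntry(
    lemma="boy",
    forms=[Form(["boy"]), Form(["boy", "s"])],
    meanings=["male child"],
)
lexicon = Lexicon(language="en", entries=[boy])
```

Under this reading, `lexicon.lookup("boys")` returns the entry whose lemma is `boy`: the lexeme groups several forms, which a plain list of ontology labels does not do.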

Page 5:

Question to LMF: Is the list of labels of a KR system a lexicon?

Page 6:

If Yes? The focus of ISO?

Maybe give a clear definition of what is meant by „lexicalisation“ of ontologies?

• Representing the linguistic information that is conveyed by the „entries“ (or labels):
– Apply LMF and LAF/GrAF to the „entries“, or
– Another model, like „lemon“ (see next slides on the MONNET project), related to LMF => This is probably an item for W3C (interoperability on the Web)
• Are we done then?
– I think not: we have to „compactize“ the ontology lexicon and avoid redundancy of entries. We then need to be able to link the „lexicalized“ entries to the larger terms and to the domain concepts => towards a domain-specific lexical network. A possible model is CTL (see next slide)
• A basic question: Would ontologizing the lexicalised ontologies („lexicon“) yield another model of the domain?
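
One way to picture the "compactized" lexicon is an index that stores each word once and links it to every label it occurs in. The Python sketch below is an assumption made for illustration; it is not the CTL model. The function name `build_network` is hypothetical, and the second label is an invented companion to the XBRL label discussed later in these slides.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_network(labels: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each distinct lower-cased word to the set of labels it occurs in."""
    network: Dict[str, Set[str]] = defaultdict(set)
    for label in labels:
        # Crude tokenisation: runs of ASCII letters only, brackets are dropped.
        for word in re.findall(r"[a-z]+", label.lower()):
            network[word].add(label)
    return dict(network)


labels = [
    "Profit (loss) from continuing operations",    # label discussed in these slides
    "Profit (loss) from discontinued operations",  # illustrative label, assumed
]
network = build_network(labels)
```

Here `network["profit"]` links one word to both labels, while `network["continuing"]` points to the first label only. A real lexical network would work on lemmas and terms rather than raw tokens.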

Page 7:

CTL, Declerck & Lendvai, LREC 2010

Page 8:

A concrete Example from XBRL

Page 9:

Issues for lexicalisation

• In the example on the last slide, linguistic information for a lexical item can be distributed over the labels of different classes. How to cope with this?
• Or take the term (label) „Profit (loss) from continuing operations“. Should this be translated into one „ontology lexical entry“ or three:
– 1. Profit from continuing operations
– 2. Loss from continuing operations
– 3. Profit (loss) from continuing operations
• Clearly: lexicalisation can lead to very redundant „ontology lexical entries“ that miss generalisations.
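
The one-or-three question can be made concrete with a small Python sketch that expands a label containing a single bracketed alternative into the three candidates listed above. The function name `candidate_entries` and the pattern are assumptions for this example; the pattern only covers labels of the shape "Head (alternative) rest".

```python
import re
from typing import List

# Matches "<head> (<alternative>) <rest>", e.g. "Profit (loss) from continuing operations".
_ALTERNATIVE = re.compile(r"^(?P<head>\w+)\s+\((?P<alt>\w+)\)\s+(?P<rest>.+)$")


def candidate_entries(label: str) -> List[str]:
    """Return the candidate lexical entries for a label with one bracketed alternative."""
    match = _ALTERNATIVE.match(label)
    if match is None:
        # No bracketed alternative: the label itself is the only candidate.
        return [label]
    head, alt, rest = match.group("head", "alt", "rest")
    # Give the alternative the same initial capitalisation as the head word.
    if head[0].isupper():
        alt = alt.capitalize()
    return [f"{head} {rest}", f"{alt} {rest}", label]


entries = candidate_entries("Profit (loss) from continuing operations")
```

All three candidates share the tail "from continuing operations", which is exactly the redundancy the last bullet points at.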

Page 10:

Or has ISO another focus?

• Ontologised (multilingual) lexicons?
– Rather than “lexicalised ontologies”
– Towards a domain-specific “LinguisticNet” (a combination of WordNet- and FrameNet-like structures)
• Other principles for building a lexicon than just using the terms in labels?
• Are we not too restricted if we consider only the lexicon? Should we extend to syntax etc.?

Page 11:

MONNET project

• The Monnet project is concerned mainly with ontology localisation, i.e., the translation of the lexico-terminological level of ontologies (often referred to as the ‘ontology labels’). The project outcomes, as currently understood by the project members, can be described as a set of software components as follows, all of which can be used in combination as well as stand-alone:
– Ontology Lexicalization (with use of ISO datcats)
– Ontology Localization
– Cross-lingual Ontology-based Information Extraction
– Cross-lingual Knowledge Access & Presentation

Page 12:

MONNET project: Architecture

Page 13:

MONNET: Details

• The core objective of Monnet is the provision of advanced services for the translation of the lexico-terminological level of ontologies, which will be instantiated by the ‘localization service’.
• However, as ontologies often have only a very limited representation of lexico-terminological information, a first step will be to analyze a given ontology and enrich it with appropriate information on
– i) the terminological structure of ontology labels,
– ii) linguistic information on terminology items, and
– iii) analysis of implicit semantics where needed.
• Together we refer to these analysis and enrichment steps as “ontology lexicalisation”, which will be instantiated by the ‘lexicalisation service’ that takes as input an ontology and outputs an ‘ontology-lexicon’ for at least one default language (depending on the language that was used in defining ontology labels). A ‘corpus service’ will enable access to external domain corpus evidence for modelling and analyzing language use in the ontology labels. The ontology-lexicon will be represented on the basis of the so-called ‘lemon’ format[1], a lexicon model for ontologies that has been defined by the Monnet project for the appropriate integration of lexical/linguistic and terminological information in ontologies. The different lexicons will therefore be handled by use of the ‘lemon API’ as shown.
• This aspect of MONNET will possibly be standardized within W3C activities

[1] http://lexinfo.net/

Page 14:

The lemon model, for encoding lexicalized ontologies
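
As a rough idea of what one entry of such an ontology-lexicon might look like, the Python sketch below writes a single lexical entry as Turtle, using the lemon terms `LexicalEntry`, `canonicalForm`, `writtenRep`, `sense` and `reference`. The namespace URI, the `example.org` prefixes, the concept URI and the helper name `lemon_entry` are assumptions made for this example; this is not the Monnet ‘lemon API’.

```python
LEMON = "http://lemon-model.net/lemon#"  # assumed namespace URI for the lemon vocabulary


def lemon_entry(entry_id: str, written_rep: str, lang: str, concept_uri: str) -> str:
    """Serialise one lexical entry, its canonical form and its sense as Turtle."""
    return (
        f"@prefix lemon: <{LEMON}> .\n"
        f"@prefix : <http://example.org/lexicon#> .\n\n"  # hypothetical lexicon prefix
        f":{entry_id} a lemon:LexicalEntry ;\n"
        f'    lemon:canonicalForm [ lemon:writtenRep "{written_rep}"@{lang} ] ;\n'
        f"    lemon:sense [ lemon:reference <{concept_uri}> ] .\n"
    )


turtle = lemon_entry(
    entry_id="profit_loss_from_continuing_operations",
    written_rep="Profit (loss) from continuing operations",
    lang="en",
    # Hypothetical URI standing in for the ontology class that carries the label.
    concept_uri="http://example.org/ontology#ProfitLossFromContinuingOperations",
)
print(turtle)
```

The point of the design is that the label text lives in the lexicon entry, while the sense only points at the ontology concept, so linguistic information and domain knowledge stay in separate resources.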

Page 15:

Role of ISO Standards for Lexicalization

W3C is mainly about the representation and linking of information.

ISO could design some guidelines on how to formulate labels of ontologies/semantic resources in the Linked Data (LD) cloud?