Cross-lingual ontology lexicalisation, translation and information extraction Net2 workshop,...


Page 1: Cross-lingual ontology lexicalisation, translation and information extraction  Net2 workshop, University of South Africa (UNISA)

Cross-lingual ontology lexicalisation, translation and information extraction

Net2 workshop, University of South Africa

Tobias Wunner

Paul Buitelaar

DERI, National University of Ireland, Galway

Copyright 2010 Digital Enterprise Research Institute. All rights reserved.

us-gaap:GainLossOnSaleOfOilAndGasProperty

ifrs:Revenue

de-gaap:BilanzsummeSummeAktiva

be-gaaap:MinderwaardenBijDeRealisatieVanVasteActiva

Page 2:

Multilingual Ontologies for networked knowledge

enabling networked knowledge

Outline

1. Research challenge and motivation

2. Ontology Translation

3. Lexicalization (lemon)

4. CLOBIE (Cross-lingual Ontology-based Information Extraction)

Page 3:

Context and Motivation

• Monnet use case in the financial domain
  – query financial information
    • cross-vocabulary
    • cross-lingual
    • get results in your own language

• Research challenges
  – localization & translation of vocabularies
  – cross-lingual ontology-based information extraction

Page 4:

Finance Terminology is complex!

“minimum finance lease payments receivable, at present value, end of period not later than one year”

• representative term of the financial domain
• 16 words
• complex structure (conceptually & linguistically)

Page 5:

Some insight into finance terminology

• XBRL (IFRS): domain-specific
• Domain terminology (SAPTerm): domain-related
• Dictionary (WordNet): domain-independent

Page 6:

Break down complexity

3-faceted lexical enrichment: semantic, terminological, linguistic

Semantic (is-a hierarchy):
  asset
    financial asset (is-a asset)
      available-for-sale financial asset (is-a financial asset)
    non-financial asset (is-a asset)

Terminological (term decomposition):
  [asset]
  [available-for-sale]
  [financial]
  [financial asset]
  [non-financial asset]
  [available-for-sale financial asset]

Linguistic:
  Noun_Sing: asset
  Noun_Plural: assets
  ?P: available-for-sale
  Adjective: financial
  NP: available-for-sale fin. asset
  VP: to sell financial assets
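The term decomposition on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Monnet implementation: the function name `subterms` and the hard-coded vocabulary are made up for the example, and matching is by exact string only. A real system would look subterms up in resources such as IFRS, SAPTerm or a dictionary.

```python
def subterms(term, vocabulary):
    """Return every contiguous word span of `term` that is a known term."""
    tokens = term.split()
    found = []
    for start in range(len(tokens)):
        for end in range(start + 1, len(tokens) + 1):
            candidate = " ".join(tokens[start:end])
            if candidate in vocabulary:
                found.append(candidate)
    return found


# Toy vocabulary copied from the slide (assumption: exact string match).
VOCABULARY = {
    "asset",
    "available-for-sale",
    "financial",
    "financial asset",
    "non-financial asset",
    "available-for-sale financial asset",
}

if __name__ == "__main__":
    for part in subterms("available-for-sale financial asset", VOCABULARY):
        print(f"[{part}]")
```

Spans are listed by start position and then by length, so the full term appears right after its first word.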

Page 7:

XBRL – Semantic Analysis

Page 8:

XBRL – Semantic Analysis

Page 9:

XBRL – Semantic Analysis

“Enhance semantics to facilitate translation and information extraction.”

Page 10:

XBRL – Terminological Analysis

Term: Minimum finance lease payments receivable, at present value

• Domain-specific: ifrs:MinimumFinanceLeasePaymentsReceivable, ifrs:MinimumFinanceLeasePaymentsReceivableAtPresentValue
• Domain-related: sapTerm:payments, sapTerm:financeLease
• Domain-independent: googleDefine:leasePayments, googleDefine:Finance_lease

Page 11:

XBRL – Linguistic Analysis

• Simple variation: XBRL term “… lease payments …” (plural) vs. financial text “… lease payment …” (singular)
• Complex variation: XBRL term “minimum finance lease payments receivable” (adverb) vs. financial text “… received minimum finance lease payments …” (verb)

Page 12:

Outline

1. Research challenge and motivation

2. Ontology Translation

3. Lexicalization (lemon)

4. CLOBIE (Cross-lingual Ontology-based Information Extraction)

Page 13:

Translation using STL

• Models developed in Monnet
  – English / German / Spanish / Dutch
• … Net2
  – Afrikaans?
  – Zulu?
  – Xhosa?
  – …

ifrs:Revenue
ifrs:ProfitLossBeforeTax
ifrs:MinimumFinanceLeasePaymentsPayable

Page 14:

Application in Machine Translation (in Dutch)

available-for-sale financial assets

1. term analysis using: IFRS, SAPTerm, GoogleDefine
   [available-for-sale] [financial] [assets]

2. translate subterms using: domain TM (IFRS), Linked Open Data (DBPedia), translation services (GoogleTranslate)
   [voor verkoop beschikbare] [financiële] [activa]

3. term synthesis using: grammars (rules, statistical models)
   voor verkoop beschikbare financiële activa

Page 15:

Application in Machine Translation (in Afrikaans)

available-for-sale financial assets

1. term analysis using: IFRS, SAPTerm, GoogleDefine
   [available-for-sale] [financial] [assets]

2. translate subterms using: domain TM (IFRS), Linked Open Data (DBPedia), translation services (GoogleTranslate)
   [beskikbaar vir verkoop] [finansiële] [bates]

3. term synthesis using: grammars (rules, statistical models)
   finansiële bates beskikbaar vir verkoop

Page 16:

Application in Machine Translation (in Spanish)

available-for-sale financial assets

1. term analysis using: IFRS, SAPTerm, GoogleDefine
   [available-for-sale] [financial] [assets]

2. translate subterms using: domain TM (IFRS), Linked Open Data (DBPedia), translation services (GoogleTranslate)
   [disponibles para la venta] [financia] [activos]

3. term synthesis using: grammars (rules, statistical models)
   activos financieros disponibles para la venta
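The three steps on the machine translation slides (term analysis, subterm translation, term synthesis) can be sketched as a small Python pipeline. Everything here is an assumption made for illustration: the lookup table stands in for the domain TM, Linked Open Data and translation services, and the reordering table stands in for the synthesis grammars. Only the Dutch and Afrikaans examples are covered, because the Spanish example also changes the adjective form, which a pure reordering rule cannot express.

```python
# Subterm translations copied from the slides; a real system would query a
# domain translation memory, Linked Open Data or a translation service.
SUBTERM_TRANSLATIONS = {
    "nl": {"available-for-sale": "voor verkoop beschikbare",
           "financial": "financiële", "assets": "activa"},
    "af": {"available-for-sale": "beskikbaar vir verkoop",
           "financial": "finansiële", "assets": "bates"},
}

# Hypothetical synthesis rules: target order of the source subterm positions.
SYNTHESIS_ORDER = {"nl": [0, 1, 2], "af": [1, 2, 0]}


def analyse(term):
    """Step 1: split the term into subterms (here: whitespace tokens)."""
    return term.split()


def translate_subterms(parts, lang):
    """Step 2: translate each subterm independently."""
    return [SUBTERM_TRANSLATIONS[lang][part] for part in parts]


def synthesise(translated, lang):
    """Step 3: reorder the translated subterms for the target language."""
    return " ".join(translated[i] for i in SYNTHESIS_ORDER[lang])


def translate_term(term, lang):
    return synthesise(translate_subterms(analyse(term), lang), lang)


if __name__ == "__main__":
    for lang in ("nl", "af"):
        print(lang, translate_term("available-for-sale financial assets", lang))
```

Keeping the three steps as separate functions mirrors the slides and lets each resource (terminology, translation source, grammar) be swapped independently.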

Page 17:

Outline

1. Research challenge and motivation

2. Ontology Translation

3. Lexicalization (lemon)

4. CLOBIE (Cross-lingual Ontology-based Information Extraction)

Page 18:

Why do we need a lexicon?

“loads of unlinked domain-specific terminology on the web!”

• An interoperable web for … ?
  – re-use
  – enable multilinguality
  – cross-lingual search
  – cross-lingual fact extraction

http://en.wikipedia.org/wiki/Finance_lease
http://www.investopedia.com/terms/l/lease-payments.asp

Page 19:

Lexicon standards overview

• ISO (XML)
  – TEI (Text Encoding Initiative)
  – LMF (Lexical Markup Framework)
• W3C & Semantic Web (RDF / OWL)
  – built-in rdfs:label
  – lightweight linguistic representations (SKOS, SKOS-XL)
  – rich linguistic representations (GOLD, LexInfo)

Page 20:

SKOS – Multilingual Information

[Figure: the concepts ifrs:MinimumFinanceLeasePayments, dbpedia:Lease_payments and dbpedia:Finance_lease, connected by skos:broader, skos:narrower and skos:related relations]

• SKOS concepts with …
  – term relations
  – multilingual labels
  – resource references

Page 21:

SKOS – Multilingual Information

Not much uptake yet? (example from http://data.nytimes.com/)

Page 22:

Ontology-Text Mismatch

‘Edificio-historico’ vs. ‘…edificio, declarado Monumento Histórico…’

>> goes beyond SKOS (monolingual & multilingual term variants)

>> requires representation of lexical information to compute linguistic variants, e.g.

‘edificio historico[apposVP[NP[Adj]]]’

Page 23:

A Lexicon Model for Ontologies

• Requirements for an ‘ontology-lexicon’ model
  – Represent linguistic information relative to the ontology
    • Avoid unnecessary ambiguities by representing only lexical features relevant to the semantics of the underlying application
  – Keep semantics separate from linguistic information
    • Clearly separate ‘world’ knowledge (properties of objects referred to by words) from ‘word’ knowledge (properties of words)
  – Modular, minimal design
    • Provide a simple core model that can easily be extended as needed

Page 24:

Was there a solution already? - SKOS

• Simple Knowledge Organization System – SKOS

– General model for formalizing thesauri, terminologies and related semantic and knowledge resources

– Formalization of terminology in focus - terminology, classification, Semantic Web communities

– Does not address linguistic aspects of terminology, and therefore not the lexicon-ontology interface

– http://www.w3.org/2004/02/skos/

Page 25:

Was there a solution already? - GOLD

• General Ontology for Linguistic Description – GOLD

– Community-based ontology of linguistics

– Linguistic study in focus - linguistics community

– Formal model of linguistics as an ontology, but not about connecting lexical features to ontological semantics

– Other issues: very big, modularity?

– http://linguistics-ontology.org/gold/2010

Page 26:

Was there a solution already? - OWN

• OntoWordNet – OWN

– Formal specification of WordNet through extension and axiomatization of its conceptual relations

– Formal knowledge representation in focus - logic, knowledge representation, Semantic Web communities

– Turns WordNet into an ontology but not about connecting lexical features to ontological semantics

– http://wiki.loa-cnr.it/index.php/LoaWiki:OWN

Page 27:

Was there a solution already? - LMF

• Lexical Markup Framework – LMF

– General model for formalizing and sharing of machine-readable dictionaries

– Lexical knowledge representation in focus - lexicography, NLP communities

– Very close to ontology-lexicon requirements, but no view on how lexical features link to ontological semantics – semantics is limited to a notion of sense based on synsets

– Other issues: incomplete formal model, focus on classes, less on properties/relations

– http://www.lexicalmarkupframework.org/

Page 28:

lemon

• lexicon model for ontologies: ‘lemon’

– General model for formalizing lexical features relative to independently defined ontological semantics

– http://www.monnet-project.eu/lemon

• Two-level modelling

– Abstract level (meta-model): lemon

– Instantiation level (lexicon model): e.g. ‘LexInfo2’

– http://lexinfo.net/

Page 29:

Many solutions…

…with an a priori amount of linguistics or semantics!

Page 30:

lemon: Overview

Page 31:

lemon: Lexicon

LexicalEntry can be a Word, Phrase, or Part - such as an Affix

[Figure: Lexicon “wild animals” with entry links to LE: Kudu and LE: shaped like a Kudu]

Page 32:

lemon: Form

[Figure: lexical entries (LE) of the lexicon “wild animals” linked to forms (F) such as “kudu”, “great” and “greater” via canonicalForm, otherForm and abstractForm]

Page 33:

lemon: Structure

LexicalEntry can be decomposed into one or more Components, and compositional structure can be represented

LE: shaped like a Kudu
  LE: shaped
  LE: like
  LE: a
  LE: Kudu

?

Page 34:

lemon: Structure - Example

[Figure: the LexicalEntry “shaped like a kudu” is decomposed (decomposition, element) into the Components “shaped” (lemma “shape”), “like” (lemma “like”), “a” and “Kudu”, each pointing to its own LexicalEntry; a phrase-structure tree of nodes with the constituents VP, VBN, PP, IN, NP, DT and NNP is connected to these components via edge and leaf links]

Page 35:

lemon: Meaning & Reference

[Figure: LE: kudu linked via sense to a LexicalSense (LS), which links via reference to the ontology; further labels: lexeme, sememe]

Page 36:

lemon: Meaning & Reference

[Figure: LE: kudu and LE: greater kudu each link via sense to an LS and via reference to the ontology; the two senses are related by narrower; further label: preSem]

Page 37:

lemon: Meaning & Reference

[Figure: LE: greater kudu and LE: lesser kudu each link via sense to an LS with a reference; dbpedia:Kudu appears as a reference; the two senses are marked incompatible (lexical incompatibility)]

Page 38:

lemon: Meaning & Reference

[Figure: LE: kudu and LE: goat each link via sense to an LS and via reference to an ontology entity; the two references are related by owl:disjointWith (ontological incompatibility)]
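The chain from lexical entry to sense to ontology reference, and the idea of ontological incompatibility, can be sketched with plain Python data classes. This is a simplified illustration, not the lemon model itself: the class and function names are chosen for the example, `ex:Goat` is a made-up URI for the goat concept, and the disjointness set stands in for owl:disjointWith axioms that would normally live in the ontology.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalSense:
    reference: str  # URI of the ontology entity this sense refers to


@dataclass
class LexicalEntry:
    written_rep: str
    senses: list = field(default_factory=list)


def ontologically_incompatible(a, b, disjoint):
    """True if some reference of `a` is declared disjoint with one of `b`."""
    return any(
        frozenset((sa.reference, sb.reference)) in disjoint
        for sa in a.senses
        for sb in b.senses
    )


kudu = LexicalEntry("kudu", [LexicalSense("dbpedia:Kudu")])
goat = LexicalEntry("goat", [LexicalSense("ex:Goat")])  # ex:Goat is hypothetical

# Stand-in for owl:disjointWith statements in the ontology (unordered pairs).
DISJOINT = {frozenset(("dbpedia:Kudu", "ex:Goat"))}

if __name__ == "__main__":
    print(ontologically_incompatible(kudu, goat, DISJOINT))
```

The incompatibility is derived from the references, not stored on the entries, which keeps the ‘word’ side and the ‘world’ side separate.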

Page 39:

lemon: Lexical Projection

LexicalEntry can introduce a syntactic frame with arguments that are mapped to LexicalSense and indirectly to ontological semantic objects/properties

Page 40:

Lexical projection (Verb Frame)

SAP AG sold long-term fixed rate conventional mortgage loans

syntactic frame: S ( NP VP( VB NP ) )

… with semantic sugar!

Page 41:

… more frames with LexInfo2

SAP AG sold Company X mortgage loans

[Figure: ditransitive : Frame extends verb : Frame and has three synarg links, to subject : Argument, direct object : Argument and indirect object : Argument]

http://lexinfo.net/ontology/2.0/lexinfo#DitransitiveFrame

Page 42:

… more frames with LexInfo2

SAP AG sold mortgage loans to Company X

[Figure: ditransitive_to : Frame extends ditransitive : Frame and has three synarg links, to subject : Argument, direct object : Argument and indirect object : Argument]

http://lexinfo.net/ontology/2.0/lexinfo#DitransitiveFrame_To

Page 43:

Or Zulu morphology…

:zuluNC7_8 a lemon:MorphPattern ;
  lemon:transform [
    lemon:rule "isi(?=[^aeiou])~" ;
    lemon:rule "is(?=[aeiou])~" ;
    lemon:generates [ lexinfo:number lexinfo:singular ]
  ] , [
    lemon:rule "izi(?=[^aeiou])~" ;
    lemon:rule "iz(?=[aeiou])~" ;
    lemon:generates [ lexinfo:number lexinfo:plural ]
  ] .

class = lemon:MorphologicalPattern

[Figure: LE: tolo and LE: angoma each have a sense link and a pattern link to the morphological pattern]

isitolo (shop)
izangoma (witch doctors)
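How the noun class 7/8 pattern on this slide produces the two example forms can be sketched in Python. The reading of the rules is an assumption: `~` is taken to mark the position of the stem, and the lookahead is taken to select the prefix by the first letter of the stem (`isi`/`izi` before a consonant, `is`/`iz` before a vowel). The function and table names are made up for the example.

```python
VOWELS = "aeiou"

# Prefixes per number for noun class 7/8: (before consonant, before vowel).
ZULU_NC7_8 = {
    "singular": ("isi", "is"),
    "plural": ("izi", "iz"),
}


def inflect(stem, number, pattern=ZULU_NC7_8):
    """Attach the class prefix that fits the first letter of the stem."""
    before_consonant, before_vowel = pattern[number]
    prefix = before_vowel if stem[0] in VOWELS else before_consonant
    return prefix + stem


if __name__ == "__main__":
    print(inflect("tolo", "singular"))   # form shown on the slide: isitolo
    print(inflect("angoma", "plural"))   # form shown on the slide: izangoma
```

Storing the pattern once and attaching it to many stems is the point of the slide: the lexical entries for “tolo” and “angoma” only need a link to the shared pattern.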

Page 44:

Lemon Editor and Generator

• http://monnetproject.deri.ie/Lemon-Editor
  – “asset-backed-debts”

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix lemon: <http://www.monnet-project.eu/lemon#> .
@prefix financeV4: <http://fadyart.com/financeV4#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .
@prefix pennbank: <http://www.monnet-project.eu/pennbank#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
…
<file:test#assetbackeddebt>
  lemon:phraseRoot [
    lemon:edge [
      lemon:edge [
        lemon:edge [ lemon:leaf _:n6 ] ;
        lemon:constituent pennbank:NNP
      ] ;
      lemon:constituent pennbank:NP
    ] , [
      lemon:edge [
        lemon:edge [ lemon:leaf _:n88 ] ;
        lemon:constituent pennbank:VBD
      ] , [
        lemon:edge [
          lemon:edge [ lemon:leaf _:n69 ] ;
          lemon:constituent pennbank:NN
        ] ;
        lemon:constituent pennbank:NP
      ] ;
      lemon:constituent pennbank:VP
    ] ;
    lemon:constituent pennbank:S
  ] ;
  lemon:decomposition ( _:n6 _:n88 _:n69 ) ;
  lemon:sense [ lemon:reference financeV4:AssetBackedDebt ] ;
  lemon:canonicalForm [ lemon:writtenRep "Asset backed debt"@en ] .
…

<file:test#back>
  lexinfo:partOfSpeech lexinfo:verb ;
  lemon:canonicalForm [
    lexinfo:tense lexinfo:past ;
    lexinfo:verbFormMood lexinfo:indicative ;
    lemon:writtenRep "backed"@en ;
    lexinfo:aspect lexinfo:perfective
  ] .

Finance Ontology

lemon lexicon

_:n88 rdf:type lemon:Component ;
  lexinfo:tense lexinfo:past ;
  lemon:element <file:test#back> ;
  lexinfo:verbFormMood lexinfo:indicative ;
  lexinfo:aspect lexinfo:perfective .

lemon Lexical Entries

Page 45:

Outline

1. Research challenge and motivation

2. Ontology Translation & Information Extraction

3. Lexicalization (lemon)

4. CLOBIE (Cross-lingual Ontology-based Information Extraction)

Page 46:

What is CLOBIE

• Information Extraction
  – Monolingual
  – No semantics
• Cross-lingual Information Extraction
  – Multilingual
• Ontology-based Information Extraction
  – Semantics in the background

Page 47:

What is CLOBIE

“SAP sold risk securities at a value of 12b EUR.”

• Information extraction (monolingual)
  PATTERN: .*SAP.*(sells|sold|issues).*(risk securities).*[0-9]+b (EUR|USD).*

• Information extraction (multilingual)
  PATTERN_DE: .*SAP.*verkaufte*.*(Risiko Wertpapiere).*[0-9]+b (EUR|USD).*

• Information extraction with semantics
  .*[COMPANY] sell [ASSETS] .*
  PATTERN: .*$COMPANY.*(sells|sold|issues).*$ASSETS.*$MONETARY_VALUE.*

[Figure: asset hierarchy with financial assets (e.g. risk securities) and non-financial assets (e.g. Property, Plant & Equipment)]
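The step from a hand-written pattern to a pattern with semantics can be sketched in Python: the `$COMPANY` and `$ASSETS` placeholders are filled from lists of labels instead of being hard-coded. This is a toy illustration under stated assumptions. The asset labels are the ones shown on the slide, the company list is hypothetical, and a real system would read both from an ontology and its lexicon instead of a dictionary.

```python
import re

# Asset hierarchy from the slide: class label -> labels of subclasses.
ASSETS = {
    "financial assets": ["risk securities"],
    "non-financial assets": ["Property, Plant & Equipment"],
}
COMPANIES = ["SAP"]  # hypothetical list of known company names


def build_pattern(companies, asset_hierarchy):
    """Compile an extraction pattern whose slots are filled from label lists."""
    asset_labels = [
        label
        for cls, subclasses in asset_hierarchy.items()
        for label in [cls, *subclasses]
    ]
    company = "|".join(map(re.escape, companies))
    assets = "|".join(map(re.escape, asset_labels))
    return re.compile(
        rf"(?P<company>{company}).*?(?P<verb>sells|sold|issues).*?"
        rf"(?P<asset>{assets}).*?(?P<value>[0-9]+b (?:EUR|USD))"
    )


PATTERN = build_pattern(COMPANIES, ASSETS)

if __name__ == "__main__":
    match = PATTERN.search("SAP sold risk securities at a value of 12b EUR.")
    if match:
        print(match.groupdict())
```

Because the alternatives come from the hierarchy, adding a new asset class or label changes the pattern without editing the regular expression by hand.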

Page 48:

Application in Information Extraction (IE)

“… The fair value of the Group’s finance lease receivables at 23 February 2008 was £5m …” (Tesco’s Annual Report 2009)

“… As at December 31, 2008, the future minimum lease payments expected to be received was €16 million …” (SAP Annual Report 2008)

linguistic analysis: payments received, receivables

term analysis: Minimum finance lease payments receivable

semantically lifted:

:MinimumFinanceLeasePaymentsReceivable
  rdfs:subClassOf xbrli:monetaryItemType ;
  rdfs:label "Minimum finance lease payments receivable"@en .

Page 49:

CLOBIE Interdisciplinary

• Semantic Web: ontologies, SKOS, lemon, SPARQL queries
• NLP: corpus query, term analysis, POS tagging, morphological analysis
• Machine Translation: statistical MT, rule-based MT, localization
• Information Retrieval: TF-IDF, web query, ranking algorithms, CLIR (ESA, MT-based)
• Information Extraction: term extraction, relation extraction, extraction grammars

All of these feed into CLOBIE.

Page 50:

Why CLOBIE?

• Many unstructured resources (news, financial reports)
• Knowledge in the Semantic Web is often:
  – not dynamic (no regular, only manual updates)
  – not integrated across languages/countries

Page 51:

CLOBIE blackboard architecture

Semantic / Terminological / Linguistic Enrichment Process

Annotators (each reads from and contributes to the blackboard):
• Basic NLP: splitter, tokenizer / POS tagger
• Linguistic Analyzer: morphology, dependency parser
• Term Analyzer: terminology DB
• Semantic Analyzer

Blackboard entries: token_id / POS, token_id / token_id, sent_id / term, sent_id / concept, …

CLOBIE Search (reads from the blackboard)
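The blackboard idea can be sketched in Python: annotators do not call each other, they only read earlier annotations from a shared store and contribute new ones. This is a minimal sketch under stated assumptions, not the CLOBIE code. The class and function names, the layer names and the concept identifier `ex:FinanceLeaseReceivables` are all made up for the example, and term lookup is a plain substring match.

```python
from collections import defaultdict


class Blackboard:
    """Shared store of annotations, grouped by layer name."""

    def __init__(self):
        self._entries = defaultdict(list)

    def contribute(self, layer, key, value):
        self._entries[layer].append((key, value))

    def read(self, layer):
        return list(self._entries[layer])


def basic_nlp(board, sentence, sent_id):
    """Tokenise by whitespace; contribute (sent_id, token_id) -> token."""
    for token_id, token in enumerate(sentence.split()):
        board.contribute("token", (sent_id, token_id), token)


def term_analyzer(board, terminology):
    """Read tokens; contribute sent_id -> term for every known term found."""
    sentences = defaultdict(list)
    for (sent_id, _), token in board.read("token"):
        sentences[sent_id].append(token)
    for sent_id, tokens in sentences.items():
        text = " ".join(tokens).lower()
        for term in terminology:
            if term in text:
                board.contribute("term", sent_id, term)


def semantic_analyzer(board, concepts):
    """Read terms; contribute sent_id -> concept for terms with a concept."""
    for sent_id, term in board.read("term"):
        if term in concepts:
            board.contribute("concept", sent_id, concepts[term])


if __name__ == "__main__":
    board = Blackboard()
    basic_nlp(board, "The fair value of finance lease receivables was 5m", 0)
    term_analyzer(board, ["finance lease receivables"])
    # ex:FinanceLeaseReceivables is a hypothetical concept identifier.
    semantic_analyzer(board, {"finance lease receivables": "ex:FinanceLeaseReceivables"})
    print(board.read("concept"))
```

A search component would only call `read`, which matches the read-only arrow on the slide.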

Page 52:

CLOBIE Data set (Wind Energy)

• 10 companies in the wind energy domain
• Financial reports in
  – German / Spanish / English / Dutch
  – IFRS / DE-GAAP
• Semantics defined by
  – IFRS vocabulary
  – xEBR vocabulary

Page 53:

Next steps…

• Benchmark development and evaluation on the basis of a data set in the finance domain
  – financial reports and news from different companies in the wind energy domain
    • multilingual (German, Dutch, Spanish, English)
    • multi-vocabulary (IFRS, European local GAAPs, DBPedia)
• Cross-lingual ontology-based information retrieval system
• Generate ontology-based information extraction grammars from lemon ontology-lexicons