Workshop on Modular Ontologies (WoMO) 2012 - KR-MED€¦ · Workshop on Modular Ontologies (WoMO)...

Proceedings of the 6th International

Workshop on Modular Ontologies(WoMO) 2012

held at Graz, Austria, on July 24, 2012as a satellite event of ICBO/FOIS 2012

Workshop chairs and proceedings editors:

Thomas Schneider, University of Bremen, GermanyDirk Walther, Universidad Politecnica de Madrid, Spain

Copyright c© 2012 for the individual papers by the papers’ authors. Copyingpermitted for private and academic purposes. This volume is published andcopyrighted by its editors.

Preface

Modularity has been and continues to be one of the central research topics inontology engineering. The number of ontologies available, as well as their size, issteadily increasing. There is a large variation in subject matter, level of specifica-tion and detail, intended purpose and application. Ontologies covering differentdomains are often developed in a distributed manner; contributions from differ-ent sources cover different parts of a single domain. Not only is it difficult todetermine and define interrelations between such distributed ontologies, it is alsochallenging to reconcile ontologies which might be consistent on their own butjointly inconsistent. Further challenges include extracting the relevant parts ofan ontology, re-combining independently developed ontologies in order to formnew ones, determining the modular structure of an ontology for comprehension,and the use of ontology modules to facilitate incremental reasoning and versioncontrol.

Still catching up with 40 years of related research in software engineering,ontological modularity is envisaged to allow mechanisms for easy and flexiblereuse, generalization, structuring, maintenance, collaboration, design patterns,and comprehension. Applied to ontology engineering, modularity is central notonly to reducing the complexity of understanding ontologies, but also to main-taining, querying and reasoning over modules. Distinctions between modulescan be drawn on the basis of structural, semantic, or functional aspects, whichcan also be applied to compositions of ontologies or to indicate links betweenontologies.

In particular, reuse and sharing of information and resources across ontolo-gies depend on purpose-specific, logically versatile criteria. Such purposes include‘tight’ logical integration of different ontologies (wholly or in part), ‘loose’ asso-ciation and information exchange, the detection of overlapping parts, traversingthrough different ontologies, alignment of vocabularies, module extraction pos-sibly respecting privacy concerns and hiding of information, etc. Another impor-tant aspect of modularity in ontologies is the problem of evaluating the quality ofsingle modules or of the achieved overall modularization of an ontology. Again,such evaluations can be based on various (semantic or syntactic) criteria andemploy a variety of statistical/heuristic or logical methods.

Recent research on ontology modularity has produced substantial results andapproaches towards foundations of modularity, techniques of modularization andmodular developments, distributed and incremental reasoning, as well as the useof modules in different application scenarios, providing a foundation for furtherresearch and development. Since the beginning of the WoMO workshop series,there has been growing interest in the modularization of ontologies, modulardevelopment of ontologies, and information exchange across different modularontologies. In real life, however, integration problems are still mostly tackledin an ad-hoc manner, with no clear notion of what to expect from the resultingontological structure. Those methods are not always efficient, and they often lead

ii

to unintended consequences, even if the individual ontologies to be integratedare widely tested and understood.

Topics covered by WoMO include, but are not limited to:

What is Modularity?

– Kinds of modules and their properties– Modules vs. contexts– Design patterns– Granularity of representation

Logical/Foundational Studies

– Conservativity and syntactic approximations for modules– Modular ontology languages– Reconciling inconsistencies across modules– Formal structuring of modules– Heterogeneity

Algorithmic Approaches

– Distributed and incremental reasoning– Modularization and module extraction– Sharing, linking, and reuse– Hiding and privacy– Evaluation of modularization approaches– Complexity of reasoning– Implemented systems

Application Areas

– Modularity in the Semantic Web– Life Sciences– Bio-Ontologies– Natural Language Processing– Ontologies of space and time– Ambient intelligence– Social intelligence– Collaborative ontology development and ontology versioning

Previous events. The WoMO 2012 workshop follows a series of successfulevents that have been an excellent venue for practitioners and researchers todiscuss latest work and current problems. It is intended to consolidate cutting-edge approaches that tackle the problem of ontological modularity and bringtogether researchers from different disciplines who study the problem of modu-larity in ontologies at a fundamental level, develop design tools for distributedontology engineering, and apply modularity in different use cases and applica-tion scenarios. Previous editions of WoMO are listed below. The links refer totheir homepages and proceedings.

iii

WoMO 2006. The 1st workshop on modular ontologies, co-located with ISWC2006, Athens, Georgia, USA. Invited speakers were Alex Borgida (Rutgers)and Frank Wolter (Liverpool). Organizers and program chairs were

http://www.cild.iastate.edu/events/womo.htmlhttp://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-232

WoMO 2007. The 2nd workshop, co-located with K-CAP 2007, Whistler BC,Canada. The invited speaker was Ken Barker (Texas at Austin).

http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-315

WoRM 2008. The 3rd workshop in the series, co-located with ESWC 2008,Tenerife, Spain, entitled ‘Ontologies: Reasoning and Modularity’ had a spe-cial emphasis on reasoning methods.

http://dkm.fbk.eu/worm08http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-348

WoMO 2010. The 4th workshop in the series, co-located with FOIS 2010,Toronto, Canada. Invited speakers were Simon Colton (London) and MarcoSchorlemmer (Barcelona).

http://www.informatik.uni-bremen.de/%7Eokutz/womo4http://www.booksonline.iospress.nl/Content/View.aspx?piid=16268

WoMO 2011. The 5th workshop in the series, co-located with ESSLLI 2011,Ljubljana, Slovenia. Invited speakers were Stefano Borgo (Trento), StefanSchulz (Graz) and Michael Zakharyaschev (London).

http://www.informatik.uni-bremen.de/%7Eokutz/womo5http://www.booksonline.iospress.nl/Content/View.aspx?piid=20369

Organizers of the previous editions and editors of the proceedings were DiegoCalvanese (Bozen-Bolzano) – 2008; Bernardo Cuenca Grau (Manchester, Ox-ford) – 2007, 2008, 2010; Peter Haase (Karlsruhe) – 2006; Jie Bao (Rensselaer)– 2010; Joana Hois (Bremen) – 2010; Vasant Honovar (Iowa State) – 2006, 2007;Oliver Kutz (Manchester, Bremen) – 2006, 2010, 2011; Ulrike Sattler (Manch-ester) – 2008; Anne Schlicht (Mannheim) – 2007; Thomas Schneider (Bremen) –2011; Luciano Serafini (Trento) – 2008; Evren Sirin, (Clark & Parsia LLC, Wash-ington DC) – 2008; York Sure (Karlsruhe) – 2006; Andrei Tamilin (Trento) –2006, 2008; Michael Wessel (Hamburg) – 2008; Frank Wolter (Liverpool) – 2007,2008

This volume contains the papers presented at the 6th International Workshopon Modular Ontologies (WoMO 2012) held on July 24, 2012 in Graz, as a satelliteevent of the joint ICBO/FOIS conferences. There were nine submissions. Eachsubmission was reviewed by three program committee members. The committeedecided to accept eight papers for long or short presentations. The program alsoincluded two invited talks:

iv

– Thomas Eiter (Vienna University of Technology, Austria)

Distribution and Modularity in Nonmonotonic Logic Programming

– Luciano Serafini (Fondazione Bruno Kessler, Trento, Italy)

Multi Context Logics: a Formal Framework for Structuring Knowledge

Acknowledgments. We would like to thank the PC members and the addi-tional reviewers for their timely reviewing work, our invited speakers for deliver-ing keynote presentations at the workshop, and the authors and participants forcontributing to the workshop program. We would also like to thank the organiz-ers of the ICBO/FOIS conferences for hosting the WoMO workshop, the IAOAfor complimentary registration of invited speakers and one organizer, and theEasyChair developers for greatly simplifying the work of the program committee.

July 12, 2012Bremen and Madrid

Thomas SchneiderDirk Walther

v

Table of Contents

Summaries of Invited Talks

Distribution and Modularity in Nonmonotonic Logic Programming . . . . . . 1Thomas Eiter

Multi context logics: a formal framework for structuring knowledge . . . . . . 2Luciano Serafini

Regular Papers

Structure Formation to Modularize Ontologies . . . . . . . . . . . . . . . . . . . . . . . . 4Serge Autexier and Dieter Hutter

Characterizing Modular Ontologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16Sarra Ben Abbes, Andreas Scheuermann, Thomas Meilender and Math-ieu D’Aquin

Combining three Ways of Conveying Knowledge: Modularization ofDomain, Terminological, and Linguistic Knowledge in Ontologies . . . . . . . . 28

Thierry Declerck and Dagmar Gromann

Syntactic vs. Semantic Locality: How Good Is a Cheap Approximation? . . 40Chiara Del Vescovo, Pavel Klinov, Bijan Parsia, Ulrike Sattler, ThomasSchneider and Dmitry Tsarkov

LoLa: A Modular Ontology of Logics, Languages, and Translations . . . . . . 51Christoph Lange, Till Mossakowski and Oliver Kutz

Short Papers

Proposal of a New Approach for Ontology Modularization . . . . . . . . . . . . . . 61Amir Souissi, Walid Chainbi and Khaled Ghedira

Distributed Reasoning with EDDLHQ+ SHIQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

George A. Vouros and George M. Santipantakis

vi

Program Committee

Kenneth Baclawski VIStology, Inc., Framingham, MA, USAStefano Borgo CNR, Trento, ItalyOscar Corcho Universidad Politecnica de Madrid, SpainBernardo Cuenca Grau University of Oxford, UKMike Dean Raytheon BBN Technologies, Ann Arbor, MI, USASilvio Ghilardi Universita degli Studi di Milano, ItalyMichael Gruninger University of Toronto, CanadaJanna Hastings European Bioinformatics Institute, Cambridge, UKRobert Hoehndorf University of Cambridge, UKBoris Konev The University of Liverpool, UKRoman Kontchakov Birkbeck College, London, UKThomas Meyer Meraka Institute, CSIR, Pretoria, South AfricaTill Mossakowski DFKI Lab, Bremen, GermanyImmanuel Normann Birkbeck College, London, UKLeo Obrst MITRE, McLean, VA, USAAdrian Paschke Freie Universitat Berlin, GermanyDavid Perez-Rey Universidad Politecnica de Madrid, SpainUlrike Sattler University of Manchester, UKThomas Schneider (co-chair) University of Bremen, GermanyLuciano Serafini FBK-IRST, Trento, ItalyDirk Walther (co-chair) Universidad Politecnica de Madrid, Spain

Additional Reviewer

Riku Nortje

vii

Author Index

A

Autexier, Serge 4B

Ben Abbes, Sarra 16C

Chainbi, Walid 61D

D’Aquin, Mathieu 16Declerck, Thierry 28Del Vescovo, Chiara 40E

Eiter, Thomas 1G

Ghedira, Khaled 61Gromann, Dagmar 28H

Hutter, Dieter 4K

Klinov, Pavel 40Kutz, Oliver 51L

Lange, Christoph 51M

Meilender, Thomas 16Mossakowski, Till 51P

Parsia, Bijan 40S

Santipantakis, George M. 63Sattler, Ulrike 40

viii

Scheuermann, Andreas 16Schneider, Thomas 40Serafini, Luciano 2Souissi, Amir 61T

Tsarkov, Dmitry 40V

Vouros, George A. 63

ix

Distribution and Modularity inNonmonotonic Logic Programming

Thomas Eiter

Knowledge-Based Systems GroupInstitute of Information Systems,Vienna University of [email protected]

In the recent years, there has been a trend towards considering computation in adistributed setting, due to the fact that increasingly not only data is linked via mediasuch as the internet, but also computational entities which process and exchange dataand knowledge. This leads to the formation of (possibly complex) systems of inter-linked entities, based on possibly heterogenous formalisms, posing challenging issueson semantics and computation. The concept of modularity, which in computer scienceand engineering is a key to structured program development, naturally links to thisas a tool for defining semantics of distributed systems, and has been widely studied,e.g., in the area of ontologies. In line with the general development, distribution andmodularity have been also been receiving increased attention in logic programming, atseveral levels of language expressiveness, from distributed (plain) datalog to advancednonmonotonic logic programming semantics.

In this talk, we shall address the issue of distribution and modularity for logic pro-gramming under the answer set semantics, which is one of the most widely used se-mantics for nonmontonic logic programs do date and at the heart of the Answer SetProgramming paradigm for declarative problem solving. It appeared that the issue ofmodularity for answer set semantics is nontrivial, due to its nonmonotonicity. For thesame reason, also the issue of efficient distributed evaluation, assuming a reasonable be-havior of the semantics for a program composed of distributed modules, is a challengingproblem. We shall discuss these issues, pointing out that modularity and distribution ad-mit different solutions for semantics, depending on the underlying view of a system oflogic programs. We then illustrate this view on particular formalisms that have beendeveloped at the Vienna University of Technology in the last years, including modularnonmonotonic logic programs (Modular ASP) and nonmonotonic multi-context sys-tems (MCS). For these formalisms, various semantics have been developed, as well asexperimental prototype implementations that take local or distributed evaluation intoaccount, adopting different realization schemes. While considerable progress has beenachieved, further work is needed to arrive at highly efficient solvers.

This work is a joint effort with Minh Dao-Tran, Michael Fink, Thomas Krennwall-ner and Tri Kurniawan Wijaya, supported by the project P20841 “Modular HEX-Programs”of the Austrian Science Fund (FWF).

1

Multi context logics: a formal framework forstructuring knowledge

Luciano Serafini

Fondazione Bruno Kesslervia Sommarive 18, I-38123, Trento, Italy

[email protected]

Multi context logics (MCL) is a formalism that allow to integrate multiplelogical theories (contexts) in a more complex structure called multi context sys-tem. In the past 20 years multi context logics have been developed for contextsin propositional logics, first order logics, description logics and temporal logic.The two principles MCL are locality and compatibility. The principle of local-ity states that a context axiomatizes in a logical theory a portion of the world,and that every statement entailed by such a theory is intended to hold withinsuch a portion of the world. The principle of compatibility instead states that,since different contexts can describe overlapping portions of world, the theoriesthey contain must be constrained so that they describe compatible situations.Following these two principles, the formal semantics of an MCL is the result ofa suitable composition of the semantics associated to each single context. Thistakes the name of Local models semantics. The effects of the two principles aboveon the inference engines that can be defined on a multi contextual knowledgebase are the following: Locality principle implies that inference rules appliedto knowledge inside a context (aka, local inference rules) allow to infer localtruths; Compatibility principle instead implies that certain facts in a contextcan be inferred on the base of other facts present on other compatible contexts.This information propagation is formalized via a special type of inter contextualinference rules called bridge rules

In general terms, a multi context logic is defined on a family of logical lan-guages {Li}i∈I where each Li is used to specify what holds in the i-th context.The set I of context indexes (aka context names) can be either a simple set,or a set equipped with some algebraic structure, like total or partial order, andoperations on context indices. The relations and functions defined on I can beused to specify the organization of contexts in terms of an algebraic structure.For instance a partial order ≺ no I, can be used to represent that a context iswider (more general) than another context, e.g., football ≺ sport, means that thecontext of sport is more specific that the context of football. To represent whatis true in a scene in different time stamps, we can enrich I with a total order ≺so that CV Luciano 2010 ≺ CV Luciano 2011 represents the fact that the contextdescribing Luciano’s curriculum vitæat 2010 precedes the context describing hisCV at 2011.

A model for a multi context logic {Li}i∈I is a pair 〈MI , C〉 composed of afamily of local models MI = {Mi}i∈I , where each Mi is a model of Li, anda compatibility relation C among the local models. The formal structure of C

2

can vary depending on the type of local models and the type of constraints itis necessary to impose on the local models of different contexts. For this reasonwe don’t give a general definition of C, which will be completely defined foreach specific multi context logic. Satisfiability of formulas in Li is defined w.r.t1.the local models. Namely If φ is a formula of the language Li, then the a multicontext model satisfies i : φ iff Mi |=i φ, where |=i is the satisfiability relationassociated to the local logic Li.

A Multi context theory in a multi context logic LI = {Li}i∈I is a family oftheories {Ti}i∈I , where each Ti is a set of statements in the logics Li, and a setBR of bridge rules. Intuitively each Ti axiomatizes the constraints on the localmodels Mi, while the bridge rules BR axiomatizes the compatibility relations.Bridge rules are cross logical axioms and their syntactic form depends on thelocal logics, so as in the case of the compatibility relation their syntactic dependson the syntax of each Li.

In my invited talk, I will go through the many possible examples of multicontext logics, starting from the simplest one, the propositional multi contextlogics, going through hierarchical meta logics, multi context logics for beliefs andpropositional attitudes, non-monotonic propositional context logics, distributedfirst order logics, distributed description logics, and logics for semantic importand contextualized knowledge repository for the semantic web.

3

Structure Formation to Modularize Ontologies

Serge Autexier and Dieter Hutter ?

German Research Center for Artificial IntelligenceBibliotheksstr. 1, 28359 Bremen, Germany

{autexier|hutter}@dfki.de

Abstract. It has been well recognized that the structuring of logic-based databases like formal ontologies is an important prerequisite toallow for an efficient reasoning on such databases. Development graphshave been introduced as a formal basis to represent and reason on struc-tured specifications. In this paper we present an initial methodology anda formal calculus to transform unstructured specifications into structuredones. The calculus rules operate on development graphs allowing one toseparate specifications coalesced in one theory into a concisely structuredgraph.

1 Introduction

It has been long recognized that the modularity of specifications is an indis-pensable prerequisite for an efficient reasoning in complex domains. Algebraicspecification techniques provide frameworks for structuring complex specifica-tions and the authors introduced the notion of an development graph [4, 1, 6] asa technical means to work with and reason about such structured specifications.Typically ontologies are large and, even if structured, bear the problem of in-consistencies as any large set of axioms. For instance, the SUMO ontology [7]turned out to be inconsistent [10]. Recently Kurz and Mossakowski presentedan approach [5] to prove the consistency of an ontology in a modular way usingthe (structured) architectural specification in CASL and provide mechanisms tocompose the models of the individual components to a global one. Furthermore,there has been work [9] to (re-)structure ontologies following locality criteria intomodules which cover all aspects about specific concepts. However, since ontologylanguages have simple imports without renaming, duplicated sub-ontologies thatare for instance equal up to renaming remain hidden, thus making the ontolo-gies unnecessarily large, not only from a modeling point of view, but also froma verification point of view as the same derived properties must be derived overan over again.

In this paper we present further steps towards the support for structure for-mation in large specifications that additionally allows to factorize equivalentsub-specification, i.e. sub-ontologies. The idea is to provide a calculus and acorresponding methodology to crystallize intrinsic structures hidden in a speci-fication and represent them explicitly in terms of development graphs. We start

? This work was funded by the German Federal Ministry of Education and Researchunder grants 01 IS 11013B and 01 IW 10002 (projects SIMPLE and SHIP)

4

with a trivial development graph consisting of a single node that contains theentire specification. Step by step, the specification is split to different nodes inthe development graph resulting in an increasingly richer graph. On the oppo-site, common concepts that are scattered in different specifications are identifiedand unified in a common theory (i.e. node).

2 Prerequisites

We base our framework on the notions of development graphs to specify and rea-son about structured specifications. Development graphs D are acyclic, directedgraphs 〈N ,L〉, the nodes N denote individual theories and the links L indicatetheory inclusions with respect to signature morphisms attached to the links.

This approach is based on the notion of institutions [3] (SignI ,SenI ,ModI ,|=I,Σ), where (i) Sign is a category of signatures, (ii) Sen : Sign −→ Setis a functor giving the set of sentences Sen(Σ) over each signature Σ, andfor each signature morphism σ : Σ −→ Σ′, the sentence translation functionSen(σ) : Sen(Σ) −→ Sen(Σ′), where often Sen(σ)(ϕ) is written as σ(ϕ),(iii) Mod : Signop −→ CAT is a functor giving the category of models over agiven signature, and for each signature morphism σ : Σ−→Σ′, the reduct func-tor Mod(σ) : Mod(Σ′) −→Mod(Σ), where often Mod(σ)(M ′) is written asM ′|σ, (iv) |=Σ ⊆ |Mod(Σ)| × Sen(Σ) for each Σ ∈ |Sign| is a satisfaction re-lation,1 such that for each σ : Σ −→Σ′ in Sign, M ′ |=Σ′ σ(ϕ) ⇔ M ′|σ |=Σ ϕholds for each M ′ ∈Mod(Σ′) and ϕ ∈ Sen(Σ) (satisfaction condition).

However, the abstractness of the signatures in the definition of institution(they are just a category) makes it difficult to cope with modularization of signa-tures. Hence, we equip institutions with an additional structure such that signa-tures behave more set-like. Institutions with pre-signatures [2] are defined as in-stitutions equipped with an embedding | | : Sign→ Set, the symbol functor, anda map sym :

⋃Σ∈|Sign| Sen(Σ)→ |Set|, such that ϕ ∈ Sen(Σ) iff sym(ϕ) ⊆ |Σ|

for all ϕ ∈ ⋃Σ∈|Sign| Sen(Σ). The map sym gives the set of symbols used in

a sentence, and sentences are uniform in the sense that a well-formed sentenceis well-formed over a certain signature iff its symbols belong to that signature.Pre-signatures are sets and pre-signature morphisms σ mapping pre-signaturesare related to corresponding signature morphisms σ.

Given an I-institution with pre-signatures, each node N ∈ N of the graphis a tuple (sigN , axN , lemN ) such that sigN is a I-pre-signature called the localsignature of N , axN a set of I-sentences called the local axioms of N , and lemN

a set of I-sentences called the local lemmas of N . L is a set of global definition

links Mσ +3 N. Such a link imports the mapped theory of M (by the pre-

signature σ) as part of the theory of N . Thus we obtain corresponding notionsof global (pre-)signatures, axioms and lemmata that are defined inductively asfollows:

1 |C| is the class of objects of a category C.

5

1. SigD(N) = sigN ∪⋃M

σ +3 N∈Sσ(SigD(M))

2. AxD(N) = axN ∪⋃M

σ +3 N∈Dσ(AxD(M))

3. LemD(N) = lemN ∪⋃M

σ +3 N∈Dσ(LemD(M))

A node N ∈ N is globally reachable from a node M ∈ N via a pre-signature

morphism σ, D ` M _?σ +3 N for short, iff 1. either M = N and σ = id is the

identity pre-signature morphism, or 2. Mσ′+3 K ∈ L, and D ` K _? σ′′

+3 N ,with σ = σ′′ ◦ σ′.

The maximal nodes (root nodes) dDe of a graph D are all nodes withoutoutgoing links. DomD(N) := SigD(N) ∪ AxD(N) ∪ LemD(N) is the set of allsignature symbols, axioms and lemmata visible in a node N . The local domainof N , domN := sigN ∪ axN ∪ lemN is the set of all local signature symbols,axioms and lemmata of N . The imported domain ImportsD(N) of N in D is theset of all signature symbols, axioms and lemmas imported via incoming definitionlinks. DomD =

⋃N∈N DomD(N) is the set of all signature symbols, axioms and

lemmata occurring in D. Analogously we define SigD, AxD, LemD, and AssD.DomdDe =

⋃N∈dDeDomD(N) is the set of all signature symbols, axioms and

lemmata occurring in the maximal nodes of D.

A node N has a well-formed signature iff SigD(N) is a valid IN -signature.A development graph has a well-formed signature iff all its nodes have well-formed signatures. SiglocD (M) := 〈sigM ∪ sym(axM ) ∪ sym(lemM )〉SigD(M) is thelocal signature of N . A node N is well-formed iff it has a well-formed signatureSigD(N) and AxD(N),LemD(N) ⊆ Sen(SigD(N)). A development graph is well-formed, if all its nodes are well-formed.

Given a node N ∈ N with well-formed signature, its associated classModD(N) of models (or N -models for short) consists of those SigD(N)-models

n for which (i) n satisfies the local axioms axN , and (ii) for each Kσ +3 N ∈ S,

n|σ is a K-model. In the following we denote the class of Σ-models that fulfillthe Σ-sentences Ψ by ModΣ(Ψ). A well-formed development graph D := 〈N ,L〉is valid iff for all nodes N ∈ N ModD(N) |= lemN .

Given a signature Σ and Ax,Lem ⊆ Sen(Σ), a support mapping Supp forAx and Lem assigns each lemma ϕ ∈ Lem a subset H ⊆ Ax ∪ Lem such that(i) Mod〈sym(H)∪sym(ϕ)〉Σ (H) |= ϕ (ii) The relation @⊆ (Ax ∪ Lem)× Lem withΦ @ ϕ ⇔ (Φ ∈ Supp(ϕ) ∨ ∃ψ.Φ ∈ Supp(ψ) ∧ ψ @ ϕ) is a well-founded strictpartial order. If D is a development graph, then a support mapping Supp is asupport mapping for D iff for all N ∈ D Supp is a support mapping for AxD(N)and LemD(N).

3 Development Graphs for Structure Formation

As a first step towards structure formation we will formalize requirements ondevelopment graphs that reflect our intuition of an appropriate structuring for

6

(formal) ontologies in particular (and formal specifications in general) in thefollowing principles.

The first principle is semantic appropriateness, saying that the structure ofthe development graph should be a syntactical reflection of the relations betweenthe various concepts in our ontology. This means that different concepts arelocated in different nodes of the graph and the links of the graph reflect thelogical relations between these concepts. The second principle is closure saying,for instance, that deduced knowledge should be located close to the conceptsguaranteeing the proofs. Also the concept defined by the theory of an individualnode of a development graph should have a meaning of its own and providesome source of deduced knowledge. The third principle is minimality sayingthat each concept (or part of it) is only represented once in the graph. Whensplitting a monolithic theory into different concepts common foundations forvarious concepts should be (syntactically) shared between them by being locatedat a unique node of the graph.

In the following we translate these principles into syntactical criteria on devel-opment graphs and also into rules to transform and refactor development graphs.In a first step we formalize technical requirements to enforce the minimality-principle in terms of development graphs. Technically, we demand that eachsignature symbol, each axiom and each lemma has a unique location in the de-velopment graph. When we enrich a development graph with more structure weforbid to have multiple copies of the same definition in different nodes. We there-fore require that we can identify for a given signature entry, axiom or lemma aminimal theory in a development graph and that this minimal theory is unique.We define:

Definition 1 (Providing Nodes). Let 〈N ,L〉 be a development graph. An

entity e is provided in N ∈ N iff e ∈ Dom〈N ,L〉(N) and ∀M σ +3 N. e 6∈Dom〈N ,L〉(M). Furthermore,

1. e is locally provided in N iff additionally e ∈ domN holds.

2. e is provided by a link l : Mσ +3 N if not locally provided in N and

∃e′ ∈ Dom〈N ,L〉(M). σ(e′) = e. In this case we say that l provides e from e′.e is exclusively provided by l iff e is not provided by any other link l′ ∈ L.

Finally, the closure-principle demands that there are no spurious nodes in thegraph that do not contribute anything new to a concept. We combine theserequirements into the notion of location mappings:

Definition 2 (Location Mappings). Let D = 〈N ,L〉 be a development graph.A mapping locD : DomD → N is a location mapping for D iff1. locD is surjective (closure)2. ∀N ∈ N . ∀e ∈ domN . locD(e) = N3. ∀e ∈ DomD. locD(e) is the only node providing e (minimality)

Furthermore, for a given locD we define loc−1D : N → 2DomD by loc−1

D (N) :={e ∈ DomD|locD(e) = N}. We will write loc and loc−1 instead of locD and loc−1

Dif D is clear from the context.

7

Based on the notion of location mappings we formalize our intuition of astructuring. The idea is that the notion of being a structuring constitutes theinvariant of the structure formation process and guarantees both, requirementsimposed by the minimality-principle as well as basic conditions on a developmentgraph to reflect a given formal specification or ontology.

Definition 3 (Structuring). Let D = 〈N ,L〉 be a valid development graph,loc : DomD → N , Σ ∈ |Sign|, Ax,Lem ⊆ Sen(Σ) and Supp be a supportmapping for D. Then (D, loc,Supp) is a structuring of (Σ,Ax,Lem) iff1. loc is a location mapping for D.2. let DomdDe = Σ′ ∪Ax′ ∪ Lem′ then Σ = Σ′, Ax = Ax′ and Lem ⊆ Lem′.

3. ∀φ ∈ LemD . ∀ψ ∈ Supp(φ). ∃σ. loc(ψ) _?σ +3 loc(φ) ∧ σ(ψ) = ψ

4 Refactoring Rules

In the following we present the transformation rules on development graphs thattransform structurings again into structurings. Using these rules we are able tostructure the initially trivial development graph consisting of exactly one nodethat comprises all given concepts step by step. This initial development graphconsisting of exactly one node satisfies the condition of a structuring providedthat we have an appropriate support mapping at hand.

We define four types of structuring-invariant transformations: (i) horizontalsplitting and merging of development graph nodes, (ii) vertical splitting andmerging of development graph nodes, (iii) factorization and multiplication ofdevelopment graph nodes, and (iv) removal and insertion of specific links. Split-ting and merging as well as factorization and multiplication are dual operations.For lack of space and because we are mainly interested in rules increasing thestructure of a development graph we will omit the formal specification of themerging and multiplication rules here.

We illustrate our rules with the help of a running example in mathematics.We start from a flat theory specifying a Ring over two operations + and ×. Astructure (R,+,×) is a Ring, if (R,×) is an abelian group, (R,+) a monoidand × distributes over +. Furthermore, an abelian group is a monoid for whichadditionally commutativity holds and inverse exists. The initial developmentgraph consists of a single node (without any links) containing all the symboldefinitions, axioms and theorems of the example.

Vertical Split. The first refactoring rule aims at the split of specifications indifferent theories. In terms of the development graph a node is replaced by twonodes one of them importing the other; each of them contains a distinct partfrom a partitioning of the specification of the original node. While all outgoinglinks start at the top node, we are free to reallocate incoming links to eithernode. To formalize this rule we need constraints on how to split a specificationin different chunks such that local lemmata are always located in a node whichprovides also the necessary axioms and lemmata to prove it.

8

N

KkK1 M1 Mp

θ1θk σ1 σl

. . .ρ1 ρm

. . . . . .

N2

N1

M1MpKkK1

ρ1 ρm

θ1θk

. . .

. . .

id

σ1 σl

. . .

Vertical Split

Vertical Merge

Fig. 1. Vertical Split and Merge

Definition 4. Let S = (D, loc,Supp) be a structuring of (Σ,Ax,Lem) and N ∈ND. A partitioning P for N is a set {N1, . . . , Nk} with k > 1 such that 1. sigN =sigN1 ] . . . ] sigNk , axN = axN1 ] . . . ] axNk , lemN = lemN1 ] . . . ] lemNk

2. sigNi∪axNi∪lemNi 6= ∅ for i = 1, . . . , k. A node Ni ∈ P is lemma independentiff Supp(ψ) ∩ (axN ∪ lemN ) ⊆ (axNi ∪ lemNi) for all ψ ∈ lemNi .

Definition 5 (Vertical Split). Let S = (〈N ,L〉, loc,Supp) be a structuringof (Σ,Ax,Lem) and P = {N1, N2} be a partitioning for some N ∈ N suchthat N1 is lemma independent. Then, the vertical split S wrt. N and P is S ′ =(D′, loc′,Supp) with D′ = 〈N ′,L′〉 where

N ′ :={N1, N2} ] (N \N)

L′ :={M σ +3 M ′ ∈ L|M 6= N ∧M ′ 6= N} ∪ {N1id +3 N2}

∪ {M σ +3 N1 | M σ +3 N ∈ L} ∪ {N2σ +3 M | N σ +3 M ∈ L}

loc′(e) =

N2 if loc(e) = N and e ∈ DomD′(N2)N1 if loc(e) = N and e 6∈ DomD′(N2)loc(e) otherwise

such that SigD′(Ni), i = 1, 2, are valid signatures and axi, lemi ⊆ Sen(SigD′(Ni)),i = 1, 2. Conversely, S is a vertical merge of N1 and N2 in S ′.

Horizontal Split. Similar to a vertical split we introduce a horizontal split whichdivides a node into two independent nodes. In order to ensure a valid new de-velopment graph, each of the new nodes imports the same theories as the oldnode and contributes to the same theories as the old node did.

Definition 6 (Horizontal Split). Let S = (〈N ,L〉, loc,Supp) be a structuringof (Σ,Ax,Lem), P = {N1, . . . , Nk} be a partitioning for some node N ∈ N suchthat each Ni ∈ P is lemma independent and loc−1(N) = domN . The horizontalsplit of S wrt. N and P is S ′ = (D′, loc′,Supp) with D′ = 〈N ′,L′〉 where1. N ′ := {N1, . . . , Nk} ] (N \N)

9

N

θ1θn

. . .

σ 1σm

. . .

N1 Nk

θ1θnθ1 θn

. . .

σ1|DomN1

σm|DomN

1

σ 1|Dom

Nk

σm|DomNk

. . .

. . .

Horizontal Split

Horizontal Merge

Fig. 2. Horizontal Split and Merge

2. L′ := {M σ +3 M ′ ∈ L|M 6= N ∧M ′ 6= N}∪ {M θ +3 Ni|M θ +3 N ∈ L, i ∈ {1, . . . , k}}

∪ {Niτ|DomNi+3 M|N τ +3 M ∈ L, i ∈ {1, . . . , k}}

3. loc′(e) := Ni if e ∈ domNi for some i ∈ {1, . . . , k} and loc′(e) := loc(e)otherwise.

such that SigD′(Ni) are valid signatures and axi, lemi ⊆ Sen(SigD′(Ni)) fori = 1, . . . , k.

Using the transformation rules, the flat initial theory of our running examplecan be refactored (cf. Fig. 3). We apply the vertical split rule twice to extract Rand the distributivity law, followed by the horizontal-split rule to separate thetwo instances of a monoid. Finally, we apply the vertical-split rule to extractthe extra axioms from Monoid(R,×). We are left with two copies of a monoidwhich we like to generalize to a common abstract monoid. This can be achievedby the following rule.

Factorization The factorization rule allows one to merge equivalent conceptsinto a single generalized concept and then to represent the individual ones asinstantiations of the generalized concept. A precondition of this rule is that allindividual concepts inherit the same (underlying) theories.

Definition 7 (Factorization). Let S = (〈N ,L〉, loc,Supp) be a structuring of(Σ,Ax,Lem). Let K1, . . . ,Kn,M1, . . . ,Mp ∈ N with p > 1 such that sigMj ∪axMj 6= ∅ and ∃σi,j . Ki

σi,j +3 Mj ∈ L for i = 1, . . . , n, j = 1, . . . , p.

Vertical Split (2x)

Horizontal Split

Distrib(*, +), Comm(*), Inverse(*), Monoid(R, +), Monoid(R, *), R

Distrib(*, +)

R

Comm(*), Inverse(*), Monoid(R, +), Monoid(R, *)

Distrib(*, +)

R

Comm(*), Inverse(*), Monoid(R, *)

Monoid(R, +)

Distrib(*, +)

R

Comm(*), Inverse(*),

Monoid(R, +) Monoid(R, *)

Vertical Split

Fig. 3. Applying the split rules to our ring example

10

M1 Mp. . .

K1 Kn

σ1,1 σ

n,1σ 1

,p σn,p

. . .

. . . . . .

N

K1 Kn

N1 Np. . .θ1 θp

σ1σn

. . .

. . . . . .

Factorization

Multiplication

Fig. 4. Factorization and Multiplication (with σi,j := θj ◦ σi)

Suppose there are sets sig, ax and lem with (sig∪ ax∪ lem)∩DomD = ∅ andpre-signature morphisms θ1, . . . , θp and σ1, . . . , σn such that- ∀e ∈ DomD(Ki). θj(σi(e)) = σi,j(e) and σi,j(e) = e ∨ σi,j(e) 6∈ DomD- sigMj ⊆ θj(sig) ⊆ DomD(Mj), axMj ⊆ θj(ax) ⊆ DomD(Mj)

- ∀e ∈ lem holds ∃l ∈ {1, . . . p}. θl(e) ∈ lemMl , θi(e) = θj(e) implies i = j andθj(e) ∈ DomD implies loc(θj(e)) ∈Mj

- there is a support mapping SuppN for ax ∪⋃i=1,...,n σi(DomD(Ki)) and lem.

Then S ′ = (〈N ′,L′〉, loc′,Supp′) is a factorization of S with respect to M1, . . .,Mp and SuppN iff

N ′ :={N} ∪ {Nj |j ∈ {1, . . . p}} ∪ N \ {M1, . . .Mp}with N = 〈sig, ax, lem〉, Nj = 〈∅, ∅, lemMj \ θj(lem)〉

L′ :={K σ +3 K ′ ∈ L|K,K ′ 6∈ {M1, . . .Mp}

∪ {Kiσi +3 N|Ki

σi,j +3 Mj, j ∈ {1, . . . p}, i ∈ {1, . . . n}}

∪ {N θj +3 Nj|j ∈ {1, . . . p}}

∪ {K τ +3 Nj|K τ +3 Mj ∧ (∀i ∈ {1, . . . n}.K 6= Ki ∧ τ 6= σi,j)

∪ {Nj τ +3 K|Mjτ +3 K ∈ L, j ∈ {1, . . . p}}

loc′(x) :=

N if x ∈ DomD′(N) \⋃i=1,...,n DomD′(Ki)

Nj if x ∈ DomD(Nj) and ∀K σ +3 Nj. x 6∈ DomD′(K)

loc(x) otherwise.

Supp′ :=Supp ∪ SuppN .

Applying the factorization rule allows us to introduce a generic concept ofmonoids which is instantiated to obtain both previous copies of a monoid.

The factorization rule only covers a sufficient criterion demanding that eachtheory imported by a definition link to one concept is also imported via definitionlinks by all other concepts. The more complex case in which a theory is imported

11

Distrib(*, +)

R


Monoid(R, +) Monoid(R, *)

Distrib(*, +)

R


Monoid(R, +) Monoid(R, *) Monoid(G, op)

Factorization

Fig. 5. Factorization of our ring example

via a path of links can be handled by allowing one to shortcut a path in a singleglobal link. This results in the following rule.

Definition 8 (Transitive Enrichment). Let S := (〈N ,L〉, loc,Supp) be a

structuring of (Σ,Ax,Lem), K,N ∈ N and there is a path K _?σ +3 N between

both. Then, (〈N ,L ∪ {K σ +3 N}〉, loc,Supp) is a transitive enrichment of S.

Definition links in a development graph can be redundant, if there are al-ternatives paths which have the same morphisms or if they are not used in anyreachable node of the target. We formalize these notions as follows:

Definition 9 (Removable Link). Let S = (D, loc,Supp) (D = 〈N ,L〉) be astructuring of (Σ,Ax,Lem). Let l ∈ L and D′ = 〈N ,L\{l}〉. l is removable fromS and S ′ = (D′, loc,Supp) is a reduction of S iff

1. ∀l′ : Mσ +3 N. if l′ provides exclusively σ(e) from some eDomD(M) then

e ∈ DomD′(N) and l 6= l′;

2. ∀e ∈ DomD.∀M ∈ DGRootsD. if loc(e) _?σ +3M then there exists M ′ ∈

dD′e such that loc(e) _?σ +3M ′;

3. ∀φ ∈ LemD. Supp(φ) ⊆ DomD′(N) and ∀SiglocD (N) ⊆ DomD′(N).

Theorem 1 (Structuring Preservation). Let S := (D, loc,Supp) (D = 〈N ,L〉)be a structuring of (Σ,Ax,Lem). Then1. every horizontal split of S wrt. some N ∈ N and partitioning P of N ,2. every horizontal merge of S wrt. nodes {N1, . . . , Nk} ⊆ N ,3. every vertical split of S wrt. some N ∈ N and partitioning P of N ,4. every factorization of S wrt. nodes M1, . . .Mp ∈ N ,5. every multiplication of S wrt. N ,6. every transitive enrichment of S, and7. every reduction of S

is a structuring of (Σ,Ax,Lem).

5 Refactoring Process

The refactoring rules presented above provide the necessary instruments to ex-ternalize the structure inherent in a given flat theory. Nevertheless we need

12

CharSig. char, A,B, . . . , ZAx. ∀x : char. x = A ∨ · · · ∨ ZLem. —

StringSig. str, strnil, addcAx. ∀c : char, x : str. strnil 6= addc(c, x)

∀x : str. x = strnil ∨ ∃c : char, y : str.x = addc(c, y)∀c, c′ : char, x, x′ : str. addc(c, x) = addc(c′, x′)→ c = c′ ∧ x = x′, . . .}

Lem. —

StringopsSig. strapp, strlenAx. strlen(strnil) = 0

∀c : char, x : str. strlen(addc(c, x)) = succ(x)∀x : str. strapp(strnil, x) = x, . . .

Lem. ∀x, y, z : str. strapp(strapp(x, y), z)= strapp(x, strapp(y, z)), . . .

NatSig. nat, 0, succAx. ∀x : nat. 0 6= succ(x)

∀x : nat. x = 0 ∨ ∃y : nat. x = succ(y), . . .Lem. —

NatlistSig. nlist, natnil, addnAx. ∀n : nat, x : nlist. natnil 6= addn(n, x)

∀x : nlist. x = natnil∨ ∃n : nat, y : nlist. x = addn(n, y)∀n, n′ : nat, x, x′ : nlist. addn(n, x) = addn(n′, x′)→ n = n′ ∧ x = x′, . . .

Lem. —

NatlistopsSig. natapp, nlenAx. nlen(natnil) = 0

∀n : nat, x : nlist. nlen(addn(n, x)) = succ(x)∀x : nlist. natapp(natnil, x) = x, . . .

Lem. ∀x, y, z : nlist. natapp(natapp(x, y), z)= natapp(x, natapp(y, z)) . . .

Fig. 6. Heuristic motivated partitioning of a flat theory

appropriate heuristics to determine suitable partitions for horizontal or verticalsplits, in order to group together strongly related entities and afterwards makeexplicit analogous groupings of entities by factorization. One heuristic is, forinstance, to group together all axioms and lemmas exclusively devoted to a spe-cific sort. E.g. all axioms about characters, or all axioms about natural numbers.From there we carve out those entities about a different sort including an alreadyclassified sorts, and so on. Examples for these are strings or natural numbers,and in the next iteration lists of strings (cf. Fig. 6). These identified subsetscan then be further partitioned into those defining the basic datatype and the(inductive) functions and predicates over these. This separates the definition oflists of natural numbers from the append functions on these, for instance.

Such heuristics guide the application of the horizontal and vertical split rules.From there we can identify nodes with isomorphic local entries and, using thetransitivity and reduction rules together with further applications of the splitrules try to enable the application of the factorization rule.

We illustrate the refactoring process with the help of a further example,where we start with flattened theories, and step by step carve out the intrinsicstructure. The example from Fig. 6 is about lists of natural numbers, stringsformed over characters and length functions operating on such lists. All thesignature symbols, axioms, and lemmas occur in a single node in the initialdevelopment graph. For sake of readability we partition the set of these entitiesinto pairwise disjunct subsets and name them accordingly (cf. [8])

Fig. 7 illustrates the structure formation process for this example. In thefirst step the definitions of Nat and Char are separated from the rest by applyingthe vertical split rule twice. In the next steps we form the individual theories ofString and Stringops respectively. Notice that Stringops includes strlen countingcharacters in a string such that Stringops has to import Nat. After applying againa vertical split we obtain the theories of Natlist and Natlistops. Since the localaxioms of String and Natlist are renamings of each other we factorize these localaxioms to a new theory of generic lists.

13

Char, String, Stringops, Nat, Natlist, Natlistops

Char Nat

String, Stringops, Natlist, Natlistops

String Nat

Stringops, Natlist, Natlistops

Char

Stringops

Nat

Natlist, Natlistops

Char

String

Stringops

Nat

Natlist, Natlistops

Char

String

Stringops

Nat

Natlistops

Char

String Natlist

Stringops

Nat

Natlistops

Char

String Natlist Genlist

Stringops

Nat

Natlistops

Char

[String] [Natlist] Genlist

Stringops

Nat

Natlistops

Char

String Natlist Genlist

Genlistops

1. Initial Graph 2. Vertical Split (2x) 3. Vertical Split 4. Vertical Split

5. Superfluous Link 6. Vertical Split 7. Factorization

8. Transitive Link (2x) 9. Factorization

Fig. 7. Lists

GenlistSig. list, nil, addAx. ∀n : elem, x : list. nil 6= add(n, x)

∀x : list. x = nil∨ ∃n : elem, y : list. x = add(n, y)∀n, n′ : elem, x, x′ : list. add(n, x) = add(n′, x′)→ n = n′ ∧ x = x′, . . .

Lem. —

This node consists of the renamedlocal axioms of String (or Natlist,respectively) plus the necessary sig-nature definitions to obtain a well-formed development graph node.This theory is imported via defini-tion links to String and Natlist using corresponding pre-signature morphismsto map it to list of chars and numbers, respectively. Both, String and Natlisthave now empty local signatures and axioms. In Fig 7 we indicate those nodeswith empty local signature, empty local axioms and empty local lemmata by alight-gray color of the node name.

6 Related Work

There is related work to (re-)structure ontologies (after flattening) following lo-cality criteria into modules containing all knowledge about specific concepts.However, since ontology languages have simple imports without renamings, du-plicated sub-ontologies that are for instance equal up to renaming remain hidden.

14

To excavate these ontologies requires other heuristics than pure logic based onesto guide the structuring process, and therefore we deliberately did not imposespecific criteria how to apply the rules beyond the minimal conditions that thedependency relations among concepts, relations and facts and derived facts andrelations is preserved. It is easy to see that the modularizations of ontologies notincluding derived lemmas obtained by using the locality criteria from [9] canbe constructed with our rules by starting from a single flattened ontology andsingling out the modules.

7 Conclusion

Based on the definition of structurings as concise development graphs, we pre-sented transformation rules that allow one to make explicit common structureshidden in a flat theory in terms of development graphs. It provides a frameworkto modularize flattened ontologies in a useful way as illustrated by two simpleexamples. An implementation is planned to provide an interactive tool to mod-ularize and factorize large ontologies as well as (semi-)automatic procedures.

References

1. S. Autexier and D. Hutter. Mind the gap - maintaining formal developments inMAYA. In Festschrift in Honor of J.H. Siekmann. Springer, LNCS 2605, 2005.

2. S. Autexier, D. Hutter, and T. Mossakowski. Change management for hetero-geneous development graphs. In S. Siegler and N. Wasser, editors, Verification,Induction, Termination Analysis (Festschrift for Christoph Walther), volume 6463of LNAI, pages 54–80. Springer, 2010.

3. J. A. Goguen and R. M. Burstall. Institutions: Abstract model theory for speci-fication and programming. Journal of the Association for Computing Machinery,39:95–146, 1992. Predecessor in: LNCS 164, 221–256, 1984.

4. D. Hutter. Management of change in verification systems. In Proceedings 15thIEEE International Conference on Automated Software Engineering, ASE-2000,pages 23–34. IEEE Computer Society, 2000.

5. O. Kutz and T. Mossakowski. A modular consistency proof for DOLCE. In TheAssociation for the Advancement of Artificial Intelligence (AAAI-2011), 2011.

6. T. Mossakowski, S. Autexier, and D. Hutter. Development graphs - proof manage-ment for structured specifications. Journal of Logic and Algebraic Programming,special issue on Algebraic Specification and Development Techniques, 67(1-2):114–145, april 2006.

7. I. Niles and A. Pease. Towards a standard upper ontology. In Formal Ontology inInformation Systems (FOIS-2001). ACM Press, 2001.

8. I. Normann and M. Kohlhase. Extended formula normalization for ε -retrievaland sharing of mathematical knowledge. In Towards Mechanized MathematicalAssistants (Calculemus/MKM). Springer, LNCS 4573, 2007.

9. C. D. Vescovo, B. Parsia, U. Sattler, and T. Schneider. The modular structureof an ontology: Atomic decomposition. In Proceedings of the 22nd InternationalJoint Conference on Artificial Intelligence (IJCAI’10), pages 2232–2237, 2010.

10. A. Voronkov. Inconsistencies in ontologies. In JELIA 2006. Springer LNCS 4160,2006.

15

Characterizing Modular Ontologies

Sarra Ben Abbes1, Andreas Scheuermann2, Thomas Meilender3, and Mathieud’Aquin4

1Laboratoire d’Informatique de Paris Nord (UMR 7030), CNRS, Paris 13University, Sorbonne Paris Cite, France. E-mail:

[email protected] of Hohenheim, Information Systems 2, 70599 Stuttgart, Germany.

Email: [email protected] 1, LORIA (UMR 7503 CNRS-INPL-INRIA-Nancy 2-UHP),

France. Email: [email protected] Media Institute, The Open University, Walton Hall, Milton

Keynes, MK7 6AA, UK. E-mail: [email protected]

Abstract. Since large monolithic ontologies are difficult to handle andreuse, ontology modularization has attracted increasing attention. Sev-eral approaches and tools have been developed to support ontology mod-ularization. Despite these efforts, a lack of knowledge about character-istics of modularly organized ontologies prevents further development.This work aims at characterizing modular ontologies. Therefore, we ana-lyze existing modular ontologies by applying selected metrics from soft-ware engineering in order to identify recurring structures, i.e. patternsin modularly organized ontologies. The contribution is a set of four pat-terns which characterize modularly organized ontologies.Keywords: modularization, patterns, modules, ontology.

1 Introduction

Difficulties in reusing and maintaining large monolithic ontologies have resultedin an increasing interest in modularizing ontologies. In the past, several ap-proaches and tools (e.g., SWOOP1, NeOn Toolkit2) have been proposed to sup-port the modularization of ontologies. Each of these approaches and tools re-spectively incorporates its own definition and notion of modular ontologies andcriteria underpinning ontology modularization [5]. This proliferation is mainlydue to the fact that the area of ontology modularization appears to be still in itsinfancy. Thus, it lacks the mature, well-defined, well-understood, and commonlyagreed upon definitions and concepts as sociated with modularization in the areaof software engineering [19]. This lack of knowledge about ontology modulariza-tion and, particularly, the lack of knowledge about characteristics of modularontologies prevents its further development. Being able to characterize modular

1 http://www.mindswap.org/2004/2 http://neon-toolkit.org/wiki/Main_Page

16

ontologies would allow for (1) comparing different modularization approaches,(2) assessing the quality of modular ontologies with respect to manually modu-larized ontologies, (3) customizing ontology modularization by providing mod-ularization criteria, and (4) improving the area of ontology modularization as awhole.The goal of this work is to characterize modularly organized ontologies to con-tribute to a better understanding of ontology modularization. For this purpose,we (1) extract a number of existing ontologies from the web, (2) select andadopt metrics from software engineering in order to both (3) identify recurringstructures (patterns) in modularly organized ontologies and (4) characterize theidentified patterns. The contribution is a set of four patterns, which characterizemodularly organized ontologies.The rest of this paper is organized as follows: Section 2 provides an introductionto ontology modularization and reviews related work. Section 3 reports on theresearch design whereas section 4 presents and discusses the results. Section 5draws a conclusion and points to future avenues of research.

2 Ontology Modularization

2.1 Modular Ontologies

The main idea of modular ontologies originates from the general notion of modu-lar software in the area of software engineering. Correspondingly, ontology modu-larization can be interpreted as decomposing potentially large and monolithic on-tologies into (a set of) smaller and interlinked components (modules). Therefore,an ontology module can be considered as a loosely coupled and self-containedcomponent of an ontology maintaining relationships to other ontology modules.Thereby, ontology modules are themselves ontologies [4].In general, ontology modularization aims at providing users of ontologies withthe knowledge they require, reducing the scope as much as possible to what isstrictly necessary. In particular, ontology modules (1) facilitate knowledge reuseacross various applications, (2) are easier to build, maintain, and replace, (3)enable distributed engineering of ontology modules over different locations anddifferent areas of expertise, and (4) enable effective management and browsingof modules [12].

2.2 Approaches for Ontology Modularization

In recent years, the problem of ontology modularization has attracted more andmore attention and, thus, several different approaches for modularizing ontolo-gies appeared. These approaches can be classified in two main categories.The first main category comprises approaches that focus on the composition ofexisting ontologies by means of integrating and mapping ontologies. On the onehand, approaches addressing integration of existing ontologies are owl:import,partial semantic import, e.g., [8, 5], package-based description logics, e.g., [2].

17

On the other hand, mapping approaches basically aim at (inter-)linking sets ofontology modules. The following approaches can be assigned to these two for-malisms: distributed description logics, e.g., [17, 3], ε-connections, e.g., [15, 10].Other approaches establish the relationship between various modular ontologyformalisms [9] in order to have special syntax in the ontology languages for amodeling perspective.The second main category comprises approaches for modularizing ontologies interms of ontology partitioning and ontology module extraction. On the one hand,ontology partitioning aims at splitting up an existing ontology into a set of on-tology modules. Approaches for partitioning ontologies are proposed by [13, 11]whereas [18] proposes a tool. On the other hand, ontology module extraction,which is also called segmentation [16] or traversal view extraction [14], aims atreducing an ontology to its relevant sub-parts. Approaches for ontology moduleextraction are the subject of [14, 16], and the PROMPT tool [14]. More detailsof this category of approaches are discussed in [5].

2.3 Criteria for Ontology Modularization

Criteria for modularizing ontologies generally aim at characterizing modular on-tologies. To the best of our knowledge, only [5] explicitly deals with criteriafor ontology modularization. Therefore, [5] distinguishes between criteria origi-nating in software engineering, logical criteria, local criteria, structural criteria,quality of modules, and relations between modules. First, criteria from soft-ware engineering comprise encapsulation and coherence whereas logical criteriainclude local correctness and local completeness. Structural criteria, which arealso discussed by [6], focus on size and intra-module coherence. It is proposed todetermine the quality of modules in terms of module cohesion, richness of therepresentation, and domain coverage. At least, to assess the relation betweenmodules the criteria of connectedness, redundancy, and inter-module distancecan be applied. Against this background, the evaluation of ontology modular-ization respectively applies a subset of the proposed set of criteria with respectto different scenarios and ontology modularization techniques.Based on best practices in Ontology Engineering, ontology design patterns (ODPs)simplify ontology design by providing a ”modelling solution to solve recurrentontology design problems” [1]. Several types of ODPs has already been iden-tified, e.g., logical patterns that are used to solve problems of expressivity, ornaming patterns that are conventions for naming elements. Among these types,architectural ontology design patterns (AODPs) aim at describing the overallshape of the ontology. More precisely, external AODPs describe the modulararchitecture of an ontology by considering a modular ontology as an ontologynetwork. Involved ontologies are considered as modules and are connected by theimport operation. A semantic web portal3 has been proposed as a repository forODPs. Unfortunately, to our knowledge, no work has been done on proposingand describing external AODPs.

3 http://ontologydesignpatterns.org

18

3 Approach

In order to characterize reoccurring structures in modularly organized ontologies,the following approach establishes a methodological basis to guide the researchprogram of this work. This approach comprises six subsequent steps:

1. Search step: the goal of the first step is to gather modularly organized

ontologies. We use the Semantic Web gateway Watson4 to search for availablemodular ontologies from the WWW. The search query focused on import-relationships between ontologies covering the same domain. The result is aset of 77 modularly organized ontologies.

2. Cleaning up step: the second step aims at cleaning up the initial searchresults in order to establish a thorough basis for further experiments. Thisis necessary because the set of 77 modular ontologies is afflicted with redun-dancies and incompleteness. This results in a set of 38 modular organizedontologies constituting a thorough basis for characterizing ontology modu-larization.

3. Selection of metrics step: the third step selects a set of appropriate met-rics to characterize modularly organized ontologies. The modular ontologiescould be described by various indicators such as the distribution of classes,the network of links between modules, the number of internal links in mod-ules, etc. In general, the literature provides a plethora of various metrics,which could be applied for characterising modular ontologies. As a startingpoint, this work focuses on metrics originating in the area of software en-gineering, due to its maturity. In particular, this work adopts the followingmetrics from software engineering, which are easier to compute, in order tocharacterize modular ontologies [7]: (i) size of the module: the numberof classes and properties (object and datatype properties), (ii) cohesion ofthe module: this metric is a value which is between 0 and 1 and is specifiedas follows:

* Hierarchical Class Cohesion (HCC): the number of direct and indirecthierarchical class links.

HCC = 2∗(NdHC+NidHC)NC2−NC

where:NdHC: Number of direct Hierarchical relationships between Classes,NidHC: Number of indirect Hierarchical relationships between Classes,and NC: Number of Classes.

* Role Cohesion (RC): the number of direct and indirect hierarchical rolelinks.

RC = 2∗(NdR+NidR)NRoles2−NRoles

where: NdR: Number of direct roles between Classes, NidR: Number ofindirect roles between Classes, and NRoles: Number of Roles.

4 http://watson.open.ac.uk/

19

* Object Property Cohesion (OPC): the number of classes which have beenassociated through the particular object property (domain and range).

OPC =2∗∑NRoles

i=1NdC(ri)∗NrC(ri)

NRoles∗(Nc2−NC)

where: NdC(ri): Number of ontology Classes in the domain of the roleri, NrC(ri): Number of ontology Classes in the range of the role ri, NC:Number of Classes, and NRoles: Number of Roles.

The cohesion measure is computed as follows:

Cohesion = α∗HCC+β∗RC+δ∗OPCα+β+δ

where: α, β and δ specify the impact of each type of hierarchical class, roleor object property cohesion. In our case, we choose α = β = δ = 1.(iii) coupling of the module: it takes an estimation of the inter-dependencyof different modules and is specified as follows:

* Hierarchical class dependency (HCD): the number of all direct and indi-rect hierarchical class relationships to foreign ontologies.

HCD = 12 ∗ (NedHCNdHC + NeidHC

NidHC )where: NedHC: Number of direct Hierarchical class dependencies be-tween local classes and external classes, and NeidHC: Number of indi-rect Hierarchical class dependencies between local classes and externalclasses.

* Hierarchical role dependency (HRD): the number of all direct and indi-rect hierarchical role relationships to foreign ontologies.

HRD = 12 ∗ (NdHRNdHR + NeidHR

NidHR )where NdHR: Number of direct roles dependencies between local classesand external classes, and NeidHR: Number of indirect roles dependen-cies between local classes and external classes.

* Object property dependency (OPD): the number of roles that associateexternal classes to local ones.

OPD = NeRolesNRoles

where: NeRoles: Number of all roles that have an external class in theirdomain or range, NRoles: Number of all existing roles in the ontology.

* Axiom dependency (AD) : a role or a class is associated to an externalontological element through an inclusion axiom.

AD =

∑NAxioms

i=1externalAssociationNumber(axmi)∑NAxioms

i=1LS(axmi∗RS(axmi)

where: LS(axm): the size of the left sides of the axiom axm, RS(axm):the size of the right sides of the axiom axm, LSE(axm): the numberof external elements in the left sides of the axiom axm, RSE(axm):the number of external elements in the right sides of the axiom axm, andexternalAssociationNumber(axm): the number of all external ontologi-cal elements that have been associated through the axiom axm to internalelements. externalAssociationNumber(axm) = LSE(axmi)∗RS(axmi)+ LS(axmi) ∗RSE(axmi) - LSE(axmi) ∗RSE(axmi).

20

The coupling measure is computed as follows:

Coupling = α∗HCD+β∗HRD+δ∗OPC+γ∗ADα+β+δ+γ

where, in our case, α = β = δ = γ = 14. Metrics implementation step: the fourth step implements the selected

metrics. The computation was performed by the OWL API5 and the reasonerHermiT6. This step sets up the (technical) evaluation framework.

5. Analysis step: the fifth step is the analysis of the basic population of mod-ularly organized ontologies.

6. Result step: the sixth step involves synthesis and discussion of the resultsfrom the analysis in order to characterize modular ontologies.

4 Results and Discussion

A set of four patterns, which characterize M odular Ontologies MO, are proposedusing previous metrics (size, cohesion and coupling). This section presents anddiscusses the results and the characteristics of each kind of pattern.

4.1 Pattern type 1: 1 module importing n modules

Pattern 1 contains one module which imports n other modules. For instance (Fig-ure 1), the module WildNET.owl imports several modules such as Animal.owl,AnimalSighting.owl, BirdObservers.owl, Birds.owl, etc. The pattern that we pro-pose conforms to an aggregation. This pattern establishes a relationship be-tween a single module and a set of modules in the same ontology. This link isunidirectional. Applying the size metric (Table 1), the first part of the ontol-

Fig. 1. 1 module importing n modules

ogy (one module) is very small (the module WildNET.owl contains 0 concepts)

5 http://owlapi.sourceforge.net/6 http://hermit-reasoner.com/

21

and the second part (N modules) is not structured and is respectively biggerin size. Applying the cohesion and coupling metrics (see Table 1), Pattern 1has a high cohesion compared to the coupling metric. We consider the patterncohesion metric to be an indicator of the degree to which the elements in themodule belong together. The idea of this pattern is that the concepts groupedin an ontology should be conceptually related for a particular domain in orderto achieve common goals.

Metrics Size Cohesion Coupling

MO11

WildNET.owl 0 0,25 0SnakeSightings.owl 18331 1 0,25

Snakes.owl 95 0,5 0,25Sites.owl 8 1 0,25

Observers.owl 7 0,53 0,25Geography.owl 16 0,41 0,25Combined.owl 3497 0,67 0,12

ClimateSensors.owl 255 0,61 0,6ClimateReadings.owl 63 0,5 0,25

Climate.owl 24 0,61 0,15BirdSites.owl 437 1 0,25

BirdSightings.owl 10401 1 0,5Birds.owl 1745 0,5 0,25

BirdObservers.owl 303 0,5 0,25AnimalSighting.owl 26 0,66 0,25

Animal.owl 9 0,82 0,11

Table 1. Results of Pattern 1 to n

4.2 Pattern type 2: n modules importing 1 module

Pattern 2 contains n modules, which respectively import one module. For in-stance (Figure 2), there are three independent modules importing one mod-ule, which containes general knowledge (biopax-level1.owl). The pattern that wepropose corresponds to inheritance. This pattern establishes a correspondencebetween a set of modules and a single module in the same ontology. This corre-spondence is unidirectional. Applying the three metrics (Table 2), the first partof pattern 2 (n modules), biopax-example-ecocyc-glycolysis.owl, biopax-example-Xwnt-b-catenin.owl and Xwnt-b-catenin xref have a high coupling metric withregard to the second part of the pattern (one module) biopax-level1.owl. Thismeans that pattern 2 is characterized by the interconnections between modules.The degree of coupling depends on how complicated the connections are andon the type of connections between modules. As we can see, the second part ofpattern 2 has a high cohesion because it encloses all other modules, which arestrongly related.

22

Fig. 2. n modules importing 1 module


MO21

biopax-example-ecocyc-glycolysis.owl 2139 0,20 0,72biopax-example-Xwnt-b-catenin.owl 236 0 0,2

biopax-level1.owl 285 0,25 0,13Xwnt-b-catenin xref 265 0 0,2

Table 2. Results of Pattern n to 1

4.3 Pattern type 3: n modules importing n-1 modules

Pattern 3 contains n modules, which import n-1 modules. For instance (Figure 3),we have three dependent modules: dublincore.owl, terms.owl and dcmitype.owl.The correspondence between modules is bidirectional. The distinguishing charac-teristic of Pattern 3 is that the n modules each import each other. Applying size,

Fig. 3. n modules importing n-1 modules

cohesion and coupling metrics (Table 3), the module dublincore.owl has a smallsize (0 concepts) with regard to other modules dcmitype.owl and terms.owl. Allmodules have the same degree of relatedness of concepts (cohesion) 20%. The

23

coupling metric of the module dublincore.owl is null. In this case, pattern 3 istransformed to pattern 1 and has the same characteristics.


MO32

dcmitype.owl 21 0,20 0,02dublincore.owl 0 0,20 0

terms.owl 47 0,20 0,25

Table 3. Results of Pattern n to n-1

4.4 Pattern type 4: Pattern mix

Pattern 4 combines all previous patterns (Patterns 1, 2 and 3). For instance(Figure 4), we find partterns 1 (5 * Pattern 1) and 2 (3 * Pattern 2). Theproposed pattern is pattern mix. The correspondence can be undirectionaland bidirectional. The major characteristic of this type of pattern is the highest

Fig. 4. Combination of patterns 1, 2, and 3

coupling metric with regard to the cohesion one. Two modular ontologies iso-metadata and iso-19115, have the same size, cohesion and coupling but they donot have a relationship like import.

4.5 Occurrence of Pattern Types

Having introduced and defined four types of patterns in order to characterizemodularly organized ontologies, we consider how often these types of patterns

24


MO41

iso-metadata 2214 0,02 0,15iso-19108 159 0,08 0,16iso-19103 224 0,15 0,16iso-19115 2214 0,02 0,15

Table 4. Results of Pattern mix

respectively occur. Figure 5 provides an overview of the occurrence of the differ-ent types of patterns in a population of 38 modularly organized ontologies. It is

Fig. 5. Occurrence of Pattern Types

interesting to observe that pattern type 1 accounts for about 37%. The reasonfor this may be the fact that this type of pattern appears to be very intuitive. Itcould therefore be concluded that it implicitly constitutes the rationale under-lying a large part of (semi-)automatic or manual approaches for modularizingontologies. Similarly, pattern type 4 also accounts for about 37% of the basic pop-ulation, i.e. 14 modularly organized ontologies. Pattern type 4 combines Pattern1, Pattern 2, and Pattern 3. On the one hand, it is obvious that not all modu-larly organized ontologies have a rather straightforward structure, which couldbe easily characterized. This is especially true when assuming (semi-)automaticor manual modularizing approaches, which do not use clear and precisely de-fined criteria. And even when these criteria are clearly and precisely defined, themodularly organized ontologies could also have such a structure depending onthe overall purpose of modularization. On the other hand, it is interesting tosee that even more complex structures can (almost completely) be characterizedby more simple and straightforward structural forms. Moreover, pattern type 2and pattern type 3 equally account for about 13%, i.e. 5 modularly organizedontologies. This is particularly interesting because pattern type 2 is reasonable.This is due to the fact that it appears obvious that there exists an ontology thatis of significance for several other ontologies. On the contrary, pattern type 3 ismuch less reasonable than pattern type 2. It is really hard to understand whyontology modules respectively import each other.

25

In this context, it can be observed that domain ontologies combine a clear struc-ture and organization. This means that modularization of domain ontologies tendto rely on pattern type 1 or pattern type 2. In contrast, it appears that top-levelontologies (which represent relevant knowledge to a particular domain such asmedical domain) have a less straightforward structure and organization partic-ularly when compared to domain ontologies (which represent upper (generic)ontologies, covered the knowledge of many domain types such as Biomedicalontology, Dolce). An example for this is dublincore.owl, terms.owl and dcmi-type.owl, which can be characterized by pattern type 3 (Figure 3).

5 Conclusion and Future Work

This work aims at characterizing modularly organized ontologies to contributeto a better understanding of ontology modularization. We introduced the no-tion of modular ontologies, reported on approaches for ontology modularization,and reviewed existing efforts to characterize modular ontologies. To characterizemodular ontologies, we followed an approach comprising six consecutive steps.This approach mainly includes the extraction and selection of modular ontolo-gies, the selection of a set of metrics from software engineering to analyse modu-lar ontologies, and the evaluation of the analysis results. The evaluation resultsin a set of four patterns, which allow for characterizing the modular organizationof ontologies. These patterns show amongst other things that modularly orga-nized domain ontologies have a clear structure whereas top-level ontologies tendto have a rather confusing modular organisation.In the future work, we aim at using firstly further Semantic Web gateways suchas Falcons or Swoogle to identify and extract additional ontologies to gain alarger basic population. Second, extending the set of metrics and applying themto the ontologies should provide further insights to modular ontologies. Third, itwould be interesting to create a comparison framework to conduct experimentswith different modularization approaches, comparing them to each other or tomanually modularized ontologies.

Acknowledgment

The authors would like to thank the organizers of Summer School on ontologyengineering and the Semantic Web 2001 (SSSW’2011).

References

1. Gangemi A. and Presutti V. Ontology Design Patterns, pages 221–243. Springer,Berlin, 2009.

2. Jie Bao, George Voutsadakis, Giora Slutzki, and Vasant Honavar. Package-baseddescription logics. In Modular Ontologies, pages 349–371. Springer, 2009.

26

3. Alexander Borgida and Luciano Serafini. Distributed description logics: Directeddomain correspondences in federated information sources. In On the Move toMeaningful Internet Systems, pages 36–53, London, UK, 2002. Springer-Verlag.

4. Mathieu d’Aquin, Peter Haase, Sebastian Rudolph, Jerome Euzenat, Antoine Zim-mermann, Martin Dzbor, Marta Iglesias, Yves Jacques, Caterina Caracciolo, Car-los Buil Aranda, and Jose Manuel Gomez. D1.1.3: Neon formalisms for modular-ization: Syntax, semantics, algebra. deliverable 1.1.3, NeOn Integrated Project,2008.

5. Mathieu d’Aquin, Anne Schlicht, Heiner Stuckenschmidt, and Marta Sabou. Mod-ular ontologies. chapter Criteria and Evaluation for Ontology Modularization Tech-niques, pages 67–89. Springer-Verlag, Berlin, Heidelberg, 2009.

6. Frederico Luiz Goncalves de Freitas, Zacharias Candeias Jr., and Heiner Stucken-schmidt. Towards checking laws’ consistency through ontology design: The case ofbrazilian vehicles’ laws. JTAER, 6:112–126, 2011.

7. Faezeh Ensan and Weichang Du. A Modular Approach to Scalable Ontology De-velopment, page 79. Springer Science+Business Media, 2010.

8. Bernardo Cuenca Grau, Ian Horrocks, Yevgeny Kazakov, and Ulrike Sattler. Alogical framework for modularity of ontologies. In JCAI-2007, 2007.

9. Bernardo Cuenca Grau, Ian Horrocks, Yevgeny Kazakov, and Ulrike Sattler. Ex-tracting modules from ontologies: A logic-based approach. In Modular Ontologies,pages 159–186. 2009.

10. Bernardo Cuenca Grau, Bijan Parsia, and Evren Sirin. Working with multipleontologies on the semantic web. In International Semantic Web Conference, pages620–634. Springer, 2004.

11. Bernardo Cuenca Grau, Bijan Parsia, Evren Sirin, and Aditya Kalyanpur. Mod-ularity and web ontologies. In 20th International Conference on Principles ofKnowledge Representation and Reasonin, pages 198–209. AAAI Press, 2006.

12. Mustafa Jarrar. Towards Methodological Principles for Ontology Engineering. PhDthesis, Vrije Universiteit Brussel, 2005.

13. Bill MacCartney, Sheila McIlraith, Eyal Amir, and Tomas E. Uribe. Practicalpartition-based theorem proving for large knowledge bases. In Proceedings of the18th international joint conference on Artificial intelligence, pages 89–96, San Fran-cisco, USA, 2003. Morgan Kaufmann Publishers Inc.

14. Natalya F. Noy and Mark A. Musen. Specifying ontology views by traversal.In International Semantic Web Conference, volume 3298/2004, pages 713–725.Springer Berlin, 2004.

15. Ana Armas Romero, Bernardo Cuenca Grau, and Ian Horrocks. Modular combi-nation of reasoners for ontology classification. In Description Logics, 2012.

16. Julian Seidenberg and Alan Rector. Web ontology segmentation: analysis, classifi-cation and use. In Proceedings of the 15th international conference on World WideWeb, pages 13–22, New York, USA, 2006. ACM.

17. Luciano Serafini and Andrei Tamilin. Drago: Distributed reasoning architecturefor the semantic web. In ESWC, pages 361–376. Springer, 2005.

18. Heiner Stuckenschmidt and Michel Klein. Structure-based partitioning of largeconcept hierarchies. In International Semantic Web Conference, pages 289–303,2004.

19. Kevin J. Sullivan, William G. Griswold, Yuanfang Cai, and Ben Hallen. Thestructure and value of modularity in software design. In Proceedings of the 8thEuropean software engineering conference, pages 99–108, New York, USA, 2001.ACM.

27

Combining three Ways of Conveying Knowledge:Modularization of Domain, Terminological, and

Linguistic Knowledge in Ontologies

Thierry Declerck1 and Dagmar Gromann2

1 DFKI GmbH, Language Technology Department,Stuhlsatzenhausweg 3, D-66123 Saarbruecken, Germany

[email protected] Vienna University of Economics and Business,

Nordbergstrasse 15, 1090 Vienna, [email protected]

Abstract. Recently, an overall trend towards increasing complexity ofontologies could be observed, not only in terms of domain modeling,where the complexity should correspond to the information to be mod-eled, but also as regards the addition of further information, which couldbe modeled as external resources to the domain model and linked to itsrelevant elements. This concerns the addition of terminological and lin-guistic information to the description of classes and properties of ontolo-gies. To respond to this development, we propose a functional approachto the modularization of ontologies, based on terminological, linguistic,and conceptual functions each module fulfills. Only the conceptual ele-ments and their structural properties should remain in the domain model,whereas the formalized terminology and linguistics are described in inde-pendent modules referencing the domain models. We provide examplesof such complexity in Knowledge Representation systems, discuss relatedwork, and present our approach to modularization in detail.

Keywords: ontology, terminology, linguistics, lexicon, LabelNet, SKOS,TBX, TMF, lemon

1 Introduction

Nowadays, ontologies in general not only contain domain knowledge but furtherinformation central to various tasks of ontology-based systems. For instance,terminological and linguistic details are substantially different in nature from theformer and usually encoded in labels adjoined to IDs of classes and properties.

There is a growing realization among many researchers that it might notbe the best practice to encapsulate such information within the description ofclasses and properties of domain ontologies. Proposals have already been madefor the separation of terminology and lexicon from domain ontologies and forstrategies on the linking of this information to the elements of the domain model

28

in a more principled way [1–6]. Our approach to modularization can be consid-ered functional, as it is based on the functions the terminological and linguisticelements used in the context of domain models fulfill. Several tasks such assupporting Information Systems (IS), semantic annotation, lexicographic appli-cations, translation, localization among many others benefit from encapsulatedand reusable functions as presented herein.

The need to cull content of labels in ontologies has increased with more pos-sibilities to linguistically process labels, adding linguistic annotations to theirtextual content and thereby more complexity to the ontology. As a result, re-usability and sharing of the information accumulated is considerably impededsince navigation through the entire ontology is required in order to find linguis-tically annotated terms that are relevant to ontology-driven applications.

Therefore, following a series of similar proposals [1–3], extending and speci-fying some points made, we suggest a strict modularization of domain ontologiesin a class hierarchy, a terminology, and a linguistic component, all representedin RDF/OWL and related to each other by means of the Simple Knowledge Or-ganization Scheme of the W3C (SKOS) and similar linking mechanisms. Thus, alexical entry can be used by several terminologies, terms of which are employedin different specific ontologies.

The proposed model largely facilitates the detection of interrelations amongontologies, rendering the formation of new ontologies on the basis of existingindependently built ones faster and less complicated, because the model stripsontologies to their core and most essential elements. It equally aims at morecompact terminologies and lexicons used in relation with domain modeling, sincevariants of these can be more easily detected and collapsed onto harmonized sets.Thus, our three-module system represents a mechanism for increasing flexibilityin reusing ontologies as well as domain-specific lexicons and terminologies.

2 Steadily Growing Complexity of Ontologies

A class defined in the RadLex ontology3 serves to exemplify the growing com-plexity in ontologies. As can be seen in the example below, the class RID 13218contains all information about its superordinate class and the related properties.Furthermore, information on natural language expressions associated with theclass (synonym, NonEnglish Name, Preferred Name, ORIG Preferred Name, Def-inition) as well as other knowledge sources, i.e., FMAID 67112, were accumulatedto form one single ontology class. The knowledge source refers to the Founda-tional Model of Anatomy (FMA)4. Upon looking at the entry in the FMA ontol-ogy, it can quickly be inferred that elements have just been duplicated, such asthe definition, synonym, the (German) Non-English part and the label (preferredname).

3 Version 3, http://bioportal.bioontology.org/ontologies/2027?p=terms4 The URL for the indicated ID is http://bioportal.bioontology.org/ontologies/44507/?p=terms&conceptid=fma\%3AImmaterial_anatomical_entity

29

<class><name>RID13218</name><type>anatomy_metaclass</type><own_slot_value>

<slot_reference>FMAID</slot_reference><value value_type="string">67112</value>

</own_slot_value><own_slot_value>

<slot_reference>Synonym</slot_reference><value value_type="string">immaterial physical anatomicalentity</value>


<slot_reference>Non-English_name</slot_reference><value value_type="string">immaterielles korperlichesanatomisches Wesen</value>


<slot_reference>Preferred_name</slot_reference><value value_type="string">immaterial anatomical entity</value>


<slot_reference>ORIG_Preferred_Name</slot_reference><value value_type="string">immaterial anatomical entity</value>


<slot_reference>Definition</slot_reference><value value_type="string">Physical anatomical entity which is athree-dimensional space, surface, line or point associated with amaterial anatomical entity. Examples: body space, surface of heart,costal margin, apex of right lung, anterior compartment ofright arm.</value>


<slot_reference>Is_A</slot_reference>4 <value value_type="class">RID13441</value>


<slot_reference>Has_Subtype</slot_reference><value value_type="class">RID13221</value><value value_type="class">RID13250</value><value value_type="class">RID13291</value><value value_type="class">RID13307</value><value value_type="class">RID15845</value><value value_type="class">RID13217</value>


<slot_reference>:ROLE</slot_reference><value value_type="string">Concrete</value>

</own_slot_value><superclass>RID13441</superclass>

</class>

[Example of growing complexity in ontologies by means of a RadLex class.]

It seems that the RadLex ontology in this particular case reuses many el-ements of FMA, as the focus of RadLex is rather on phenomena that can beobserved in correlation with specific organs and not the organs themselves.While this integration of terminological and linguistic knowledge in the field ofanatomy is obviously a good move, re-using established terminology, it appearsthat it could be more beneficial to provide this pool of information independentlyfrom the ontologies modeling the domain. Clear links between the original ontol-ogy and terms used as well as linguistic data substantially improve the level of

30

re-usability and readability of semi-structured or definitional natural languageexpressions across a large number of ontologies (or taxonomies).

3 Related Models

Several approaches and models emphasize the importance of separating concep-tual, terminological, and lexical information. Some concentrate on the termino-logical aspect [6, 9], while others focus on the lexical aspect [10, 4]. Buitelaaret al. [10] propose a model called LexInfo and suggest adding lexical, morpho-syntactic, and chunking information to the labels of ontology classes. The authorsdesign an OWL representation scheme for this set of linguistic information andits linking to ontology classes. LexInfo supports in this among other aspects theontology-based semantic annotation of text.

The Terminae [5] model suggests having two distinct, but interlinked highlevels of classes within ontologies: one for the hierarchy of concepts (and asso-ciated relations), and one for (a list of) terms that point to the concepts theydenote. Thus, the concept level world gets cleaner and, for example, the verycumbersome manner of encoding synonyms and other related terms as it is donein RadLex (see RadLex example above) can be avoided, since synonyms are en-coded on the terminological level of the ontology. One major advantage of thisapproach is that a subset of a terminology can more easily be identified and re-used in other (domain) ontologies. Reymont et al. [9] provide an example of theapplication of Terminae in the automotive domain. We note that in Terminaethe lemma and part-of-speech information is encoded within the term classes.

A third approach, suggesting the merging of LexInfo and Terminae is CTL[2]. CTL applies the full model of LexInfo to each word in a term. Thereby itcompletely takes lexical information out of the descriptions of both domain andterm classes. This leads to three layers of description within the ontology, wherea meta-class has three main subclasses describing domain-class, terminology, andlinguistic hierarchies. The linguistic layer is based on and extends LexInfo. How-ever, CTL neither proposed a formalization nor an implementation, but insteadgenerally described such an approach. Both Terminae and CTL accumulate thedifferent modules (meta-classes) in one ontology, which supports an internal viewon the interaction between them, rendering linking of terms to other ontologiesmore difficult.

Some approaches emphasize the added benefit of a combination of all threemodules for specific tasks (e.g. [7]). Bodenreider [7] makes use of existing termi-nologies, ontologies, and lexicons for text mining in biomedicine. The emphasishere is on already existing not perfectly compatible resources and the specifictask of text mining.

All approaches above agree that natural language processing and subsequentlinguistic annotation of the terms used in labels are necessary. In order to ensureinteroperability and re-usability, we use standardized models. The Terminolog-ical Markup Framework (TMF), defined in ISO 16642, ensures the re-usabilityof terminological data across applications and the TermBase eXchange (TBX)

31

format of ISO 30042 represents a best practice for the practical exchange of ter-minology. In line with ISO 704, we take a concept oriented approach towardsterminology, defining terminology as concepts and their designations in a specificdomain. Consequently, a term is a verbal designation denoting a general conceptin a specific domain. The lemon model [4] we discuss below proposes a way toobtain the results of natural language processing and annotation in a modularRDF representation.

4 Modularization of Ontology Labels

We propose LabelNet, a model that modularizes each lexical, linguistic, and ter-minological function related to ontology labels, establishing a net of interlinkedterms with highly detailed information at each level. Term entries in a separateOWL-DL encoded TBX- and TMF-compliant terminology relate semantically tocorresponding ontology classes or other conceptual elements and represent theterminological information in detail. Each token5 of every term entry links toa lexical entry, i.e., to a lemma6, syntactic information, and possible additionalresources such as further ontologies. Fig. 1 exemplifies the structure of Label-Net and shows how each of its modules can be interlinked using SKOS. Theexample data has been taken from an ontology based on the Belgian NationalBank (BNB) taxonomy. Time concepts are linked to the W3C time ontology,e.g., “more than one year” is an interval.

The lexical entries are represented by using partially the lemon model [4],which is described in the next section. The semantics of the list of tokens con-tained in a term is established by referring to the ontology elements on the basisof the term ID in the terminological entries.

By separating the several layers into modules we achieve a more complete andhighly detailed perspective of ontology labels. The separation of lexical entriesand terms into lexicons and terminologies provides a higher degree of re-usability.In addition, it facilitates a number of computations over these labels, such asthe usage of a certain lemma in terms pointing to concepts/role IDs.

4.1 The lemon Model

lemon provides a model that can encode lexical information, using among othersRDF, URIs and linking mechanisms, so that language data can be exchangedfor example in the Linguistic Linked Open Data cloud7. The model aims ata strict separation of ‘world knowledge’ (describing domain objects that are

5 Tokens can be defined as all meaningful elements in a text that result from theprocess of tokenization, i.e., breaking up text into words, phrases, symbols or othermeaningful elements. The ordered collection box in Fig. 1 contains lists of tokens asthey appear in the terms used in the exemplified labels.

6 A lemma represents the canonical form of a set of words called lexemes. For example,accrue is the lemma of accrued, accruing, accrues, etc.

7 http://linguistics.okfn.org/resources/llod/

32

Fig. 1. Simplified example of LabelNet

referenced by lexical objects) from ‘word knowledge’ (describing lexical objects).It is itself modular, having a core component that can be supplemented with a setof modules to be used, extended, or ignored as required as illustrated in Fig. 2.For example, a morpho-syntactic module can be attached to the core, specifyingspecific values for words used in the term, such as gender (feminine, masculine,neuter), number (singular, plural, dual) and case (nominative, accusative, etc).As this model in essence enables the creation of a lexicon for a given ontology, itis called an ontology lexicon model. lemon as such does not provide an explicitterminological level and refers directly from the lexical entry (in lemon a lexicalentry represents the whole content of a label) to an ontology element. In contrast,LabelNet stresses the need and the practicability of a terminological level, were-use only the non-referential part of the lemon model.

4.2 Lexicon Module

While lemon offers a highly interesting perspective, we think that there are stillsome shortcomings, or possible improvements. A first case is the fact that lemonsupports tokenization of terms included in labels, but not the establishment ofthe relation between a token represented as a standalone lexical informationand the terms in which it can occur. Consequently, we propose an extensionthat allows for a single lemma to include the information that it is part of aterm, in the position specified by the tokenization process. Thereby, the word“Verbindlichkeiten” (German for amounts payable or liabilities), for example,

33

Fig. 2. Simplified representation of the lemon model [4]. The link between lexical andontology information is established by the reference link

will be linked to a (possibly) substantial number of terms used in various domains(see Fig. 1). In doing so, we can generate a new kind of WordNet, taking intoaccount the inclusion of relevant words in a category of terms. Adopting theidea of lemon, we model only lexical and linguistic information in this separatemodule, linking to semantic values on the basis of the term ID, which itself linksto an ontology element.

As a matter of fact, lemon entries allow only one semantic reference. Thelemon model represents the content of labels of one ontology at a time. Butfrequently one and the same term is used in different (even related) ontolo-gies/taxonomies. In this case, two or more lemon entries would be required,leading to redundant lexical/linguistic information only differing in the entrypoint to elements of different ontologies. One entry pointing to many ontologiesrepresents a more efficient approach. This would also ease generalization overthe semantics of such terms.

In case different terms are used in concepts of different ontologies, but askos:exactMatch can be established between these concepts, lemon does notprovide the means to express the lexical semantic relationship between theseterms. As a result, SKOS has to be used as a linking means between thoseconcepts, thereby indirectly establishing the lexical semantic relationship, suchas synonymy, between different terms.

Apart from linking different entries or elements of individual modules, certainconstraints need to be reflected. For example, in German and English only the

34

plural of ”Verbindlichkeit/liability” might be used within the context of financialreporting. One possibility in lemon would be to select only terms in which theword ”Verbindlichkeit” appears in its plural form ”Verbindlichkeiten”. Anotherpossibility, which has our preference, would be to associate a feature structurewith the lemma we have extracted from the tokens of the ontology labels, inwhich additional linguistic information can be encoded. Keeping thus the basiclexicon small, i.e., containing mainly lemmas, and using well-defined featurestructures as labels for the edges going from one lemma to a more complex termcontaining the lemma. We suggest having the constraints expressed in SKOS,linking between a lemma and a term (see Fig. 1):

lemma:Verdbindlichkeit -> [plural, feminine, nominative case] -> t1(T3)

The above line expresses that only the plural and nominative form of “Verbind-lichkeit”, which is feminine, can be used in combination with a term (at leastthe term “T3”’) related to a business reporting ontology.

4.3 Terminology in OWL-DL

Terminologies as such consist of terms denominating concepts, their definitionsand concept relations. In case of SKOS, these elements are utilized towards build-ing controlled vocabularies, whereas the TermBase eXchange (TBX) format ofISO TC 37 can be described as discourse-oriented terminology [8]. In controlledvocabularies, terms have to be classified as preferred, synonyms being mapped topreferred terms for retrieval purposes. In case of the discourse-related resources,many synonyms are permitted and the attribute “preferred” can be assignedfor a prescriptive usage. Wright et al. [8] state that terminologies always relateto special language, “designating multiple preferred terms subject to multiplepragmatic constraints”. Thus, the former differs from the latter in that it repre-sents varying conceptual information and semantics with a focus on informationretrieval, whereas discourse-oriented terminological resources are more adequatefor the purpose at hand.

In our model the terminology is supposed to be reusable for other taskssuch as translation, ontology population, ontology building, ontology evolutionto name but a few. Instead of using status attributes such as preferred, alterna-tive, and hidden, TBX allows for the use of subset information such as project,application, customer to clarify the difference between synonyms [8].

Terminologies provide greater multiplicity than only rdfs:labels. Termsand natural language information acquired for and within the process of ontologybuilding are often lost in the final representation due to a required univocity ofeach label. Constructing a net of ontology labels and their synonyms acquiredin the building of ontologies and extraction of information results in a domain-specific, formalized, and reusable resource for ontologies.

Another reason for transferring natural language information from the ontol-ogy to terminologies can be found in its ability to represent conceptual relations

35

different from ontological relations and thus, enhance the representation of in-formation with linguistic details. For example, a financial reporting ontologyclassifies liquid assets as sibling of key balance sheet figures, the latter of whichbeing the parent to assets. In contrast, hypernymic relations in the terminologysee assets as top node, whereas liquid assets is one of its children.

TBX is an XML-encoded markup language for the interchange of terminolog-ical information. Due to reasons of cardinality and variation its transformationto RDF, i.e., SKOS, turns out to be difficult as described in detail in [8]. Insteadof mapping TBX to RDF a member of the OWL family of languages is more ad-equate to the task. The cardinality of OWL-Lite, however, is restricted to 0 and1, which in case of many term entries might constitute a problem to be solvedwith OWL-DL and its ability to allow arbitrary values for cardinality. All coreelements of the terminology are children of the top node owl:datcat to signifythat all subclasses are data categories and interlinked by means of propertiessuch as unionOf and owl:equivalentClass. A detailed description of render-ing TBX in OWL-DL would go beyond the scope of this paper, a representationof terminology in OWL-DL is to be found in [6].

4.4 Step by Step to Modularized Ontologies

Our architectural decisions and selections have been described above, but thespecification of the process of obtaining each resource and achieving modular-ization has yet to be detailed. The main input to building the initial ontologyis financial information, such as annual reports of companies, reporting stan-dards (e.g. IFRS, GAAP, XBRL, etc.), stock exchange websites. We extractdetails from the named sources and build an initial ontology. Furthermore, theextracted information represents the input for the terminology, where all syn-onyms are depicted. On the basis of the ontology and the terminology, the lexiconis established. So at the core of the following steps lies the formalization of theextracted knowledge in a domain ontology representing our input.

1. Extract labels/terms and linguistic analysis of terms (tokenization, lemma-tization, morphological analysis, tagging, parsing, etc).

2. Extract all lemmas, create or map to an existing lemma in a (multilingual)lexicon to collect all lemmas that are used in all possible labels of all possibleontologies.

3. Encode lemmas in lemon. Add a data structure on top of each lemma, whichlists all the tokens in all labels in which the lemma is reproduced. Thislinking also reflects the morpho-syntactic features of the token according toits analysis.

4. Record all morpho-syntactic and lexico-syntactic information and patternsin the corresponding addition to the linguistic module.

5. All identical labels are stored as a unique element in a terminology container.Specify term entries as to their conceptual relations and establish properdefinitions or adapt definitions existing in the ontology.

36

6. Each lemon represented term is associated with a data structure, i.e., ter-minology, that points to a variety of ontology elements in which those termshave been introduced.

7. Eliminate all the labels and other linguistic information from the ontology,flattening class entries to domain specific details.

As a result, we have two interlinked ontologies of lemmas and terms as usedin ontologies/taxonomies. Thereby, we obtain a subset of language data, whichis used in domain ontologies. This can be used in order to analyze textual docu-ments and to annotate them semantically, populate ontologies, or support trans-lations with semantics to name but a few. On the other hand, we have a meansfor testing ontology mapping or merging.

5 Linking all Modules

The main linking device between ontologies is SKOS, such as the linking betweenthe financial reporting ontology and the time ontology in the example providedin Fig. 1. Especially with multilingual ontologies the individual concepts andtheir matching by means of SKOS is important. Oftentimes, the pivotal role ofEnglish as a source language leads to translations of labels instead of properlocalizations. In case of financial reporting standards it is indispensable to takelocal legal and political regulations affecting the standard into consideration,as the Belgian reporting standard in French might differ substantially from thereporting standard used in France, especially in the use and interpretation ofapplied French terms.

By conceptualizing the knowledge in each language individually, the ontologyis actually created in each language and not simply translated. Thereby, we are inthe position of linking for example the English concept pfs_AmountsPayableMoreOneYear to the corresponding Italian concept itcc-ci_DebitiEsigibiliOltreEsercizioSuccessivo by employing skos:exactMatch, which implicitly linksthe term “Debiti Esigibili Oltre l’Esercizio Successivo” to the English term. Forexisting monolingual ontologies this proposal might serve as a method for merg-ing several monolingual ontologies by establishing links.

The domain ontology represents the starting point for the linking, containingthe initial SKOS links to the terminology, as the terminology might be treatedas ontology represented in OWL-DL. From the terminology references to thelexicon holding all individual lemmas can be established. At the same time theterminology represents the interface to lexico- and morpho-syntactic patterns aswell as syntactical information as such and all tokens, the result from the processof tokenization.

One part of the linking process is the representation of lexico- and morpho-syntactic patterns and information to support the evolution and extension ofexisting domain ontologies. Thereby, the construction of new labels is largelyfacilitated on the basis of the structure of existing labels.

Syntactic information is represented by combining tokens and dependencyinformation of individual terms. Basically, syntactic categories are determined

37

on the basis of part of speech tagging and phrasal categories are used for syntacticlabels. For example N-NP = (length=1, token[1]=N, head=token[1]) representsthe term “Verbindlichkeiten”, which has the syntactic category “Noun” and thephrasal category “Noun Phrase” with a length of one and token1. For the purposeof standardization, these categories are mainly taken from the ISOcat database8.

Especially for information extraction in combination with ontology evolu-tion the representation of lexico-syntactic patterns is essential, such as lexico-syntactic ontology design patterns9 and the famous Hearst patterns. One ex-ample for their use is the recognition of relations among entities during infor-mation extraction. The following sentence has been taken from the Interna-tional Financial Reporting Standard (IFRS): “The statement of financial posi-tion (sometimes called the balance sheet) includes an entity’s assets, liabilitiesand equity as of the end of the reporting period”10. The lexico-syntactic equiv-alence <NP class> call in passive <NP class> relation between “statement offinancial position” and “balance sheet” enables us to realize that both termspoint to the same ontology concept as synonyms, however, including a descrip-tion of their difference in the definition of the terminology. The Hearst pattern[NP0] [VBG include] [NP1] [NP2]... indicates that “assets, liabilities andequity” can be modeled as subClassOf “statement of financial position”.

6 Conclusion and Future Directions

Modular and encapsulated domain, linguistic, and lexical functions for knowl-edge modeling enable the support of several IS-related as well as Natural Lan-guage Processing (NLP)-driven tasks. Each modularized resource, i.e., ontology,terminology, or lexical information, can either be used as part of the interlinkedmodel we presented or as individual resource for other purposes. One aspect forfurther improvement certainly is the linking device between the modules, whichcould be optimized towards an enhanced interoperability with other systems andamong the resources themselves.

Acknowledgements. The DFKI part of this work has been supported by theMonnet project (Multilingual ONtologies for NETworked knowledge), co-fundedby the European Commission with Grant No. 248458, and by the TrendMinerproject, co-funded by the European Commission with Grant No. 287863.

References

1. Aggarwal, N., Wunner, T., Arcan, M., Buitelaar, P., O’Riain, S.: A Similarity Mea-sure based on Semantic, Terminological and Linguistic Information. In: Shvaiko, P.,Euzenat, J., Heath, T., Quix, C., Mao, M., Cruz, I.F. (eds.) Proceedings of the 6thInternational Workshop on Ontology Matching. Bonn, Germany (2011)

8 http://www.isocat.org/9 http://ontologydesignpatterns.org/wiki/Submissions:LexicoSyntacticODPs

10 http://www.ifrs.org/Home.htm

38

2. Declerck, T., Lendvai P.: Towards a standardized linguistic annotation of the tex-tual content of labels in knowledge representation systems. In: Proceedings of theSeventh International Conference on Language Resources and Evaluation (LREC’10), pp.3836–3839, ELRA, Valetta, Malta (2010)

3. Roche, C., Calberg-Challot, M., Damas, L., Rouard, P.: Ontoterminology: A newparadigm for terminology. In: Dietz, J.L.G. (ed.) International Conference on Knowl-edge Engineering and Ontology Development. pp. 321–326, Funchal - Madeira, Por-tugal (2009)

4. McCrae, J., Spohr, D., Cimiano, P.: Linking Lexical Resources and Ontologies onthe Semantic Web with Lemon. In: The Semantic Web: Research and Applications.Volume 6643 LNCS, pp. 245-259. Springer, Berlin, Germany (2011)

5. Aussenac-Gilles, N., Szulman, S., Despres, S.: The Terminae Method and Platformfor Ontology Engineering from Texts. In: Proceedings of the 2008 conference onOntology Learning and Population: Bridging the Gap between Text and Knowledge.IOS Press, pp. 199–223, (2008)

6. Reymonet A., Thomas, J., Aussenac-Gilles, N.: Modelling ontological and termi-nological resources in OWL-DL. In:Buitelaar, P., Choi, K.S., Gangemi, A., Huang,C.R (eds) OntoLex 2007, ISWC Workshop. Busan, South-Korea (2007)

7. Bodenreider, O.: Lexical, terminological and ontological resources for biologicaltext mining. In: Ananiadou, S., McNaught, J. (eds) Text mining for biology andbiomedicine, p. 43-66, Artech House, London, England (2006)

8. Wright, S. E., Summers, D.: Crosswalking from Terminology to Terminology: Lever-aging Semantic Information across Communities of Practice. In: Witt, A., Sasaki, F.,Teich, E., Calzolari, N., Wittenburg, P. (eds) Uses and usage of language resource-related standards, LREC, Marrakech, Morocco (2008)

9. Reymonet, A., Thomas, J., Aussenac-Gilles, N.: Ontology based informationretrieval: an application to automotive diagnosis. In: Nyberg, M., Frisk, E.mKrisander, M., Aslund, J. (eds) International Workshop on Principles of Diagno-sis, pp.9-14, Stockholm, Sweden (2009)

10. Buitelaar, P., Cimiano, P. Haase, P., Sintek, M.: Towards linguistically groundedontologies. In: Aroyo, L., Traverso, P., Ciravegna, F., Cimiano, P., Heath, T.,Hyvonen, E., Mizoguchi, R., Oren, E., Sabou, M., Bontas Simpler, E.P. (eds) ESWC2009. pp. 111–125, Springer Berlin/Heidelberg, Heraklion, Crete, Greece (2009)

39

Syntactic vs. Semantic Locality:How Good Is a Cheap Approximation?

Chiara Del Vescovo1, Pavel Klinov2, Bijan Parsia1,Uli Sattler1, Thomas Schneider3, and Dmitry Tsarkov1

1 University of Manchester, UK{delvescc,bparsia,sattler,tsarkov}@cs.man.ac.uk

2 University of Ulm, [email protected]

3 Universität Bremen, [email protected]

Abstract Extracting a subset of a given OWL ontology that capturesall the ontology’s knowledge about a specified set of terms is a well-understood task. This task can be based, for instance, on locality-basedmodules (LBMs). These come in two flavours, syntactic and semantic,and a syntactic LBM is known to contain the corresponding semanticLBM. For syntactic LBMs, polynomial extraction algorithms are known,implemented in the OWL API, and being used. In contrast, extractingsemantic LBMs involves reasoning, which is intractable for OWL 2 DL,and these algorithms had not been implemented yet for expressive onto-logy languages.We present the first implementation of semantic LBMs and report onexperiments that compare them with syntactic LBMs extracted fromreal-life ontologies. Our study reveals whether semantic LBMs are worththe additional extraction effort, compared with syntactic LBMs.

1 Introduction

Extracting a subset of a given OWL ontology that captures all the ontology’sknowledge about a specified set of concept and role names is an interesting taskfor various applications, and it is by now well-understood [2,10,11]. In general,we consider a setting where, for a given signature, we want to determine a (small)subset of a given ontology such that any axiom over the signature entailed bythe ontology is also entailed by the subset. For expressive logics, this task canbe implemented by making use of the notion of locality, and results in what isknown as locality-based modules (LBMs) [2]. Locality comes in many differentflavours, in particular there are notions of syntactic and semantic locality. Asyntactic LBM is known to contain the corresponding semantic LBM, but mightalso contain extra axioms which are, because they are not in the semantic LBM,superfluous for entailments over the given signature. Algorithms for the extrac-tion of syntactic LBMs are known that run in time that is polynomial in the sizeof the ontology (thus much cheaper than reasoning), implemented in the OWL

40

API, and being used. In contrast, despite the fact that algorithms for extractingsemantic LBMs are known, until now and to the best of our knowledge, they hadnot yet been implemented. Moreover, these involve entailment checking, and arethus intractable for expressive profiles of OWL 2.

We present the first implementation of semantic LBMs and report on exper-iments that compare them with syntactic LBMs extracted from real-life onto-logies. The contributions of this paper are as follows: we show with statisticalsignificance that, for almost all members of a large corpus of existing ontologies,there is no difference between any syntactic LBM and its corresponding semanticLBM. In the few cases where differences occur, these differences are modest andnot worth the increased computation time needed to compute semantic LBMs.In addition, we isolate two types of axioms that lead to differences, where oneis a simple tautology that can, in principle, be detected by a straightforwardaddition to the syntactic locality checker. Furthermore, our results show thatthe extraction of semantic LBMs, which is in principle hard, seems feasible inpractice. The lesson we learn from these results is that “Cheap is Great”!

2 Preliminaries

We assume the reader to be familiar with OWL and the underlying descriptionlogic SROIQ [1,8], and will define the central notions around locality-basedmodularity [2].

Let NC be a set of concept names, and NR a set of role names. A signatureΣ is a set of terms, i.e., a set Σ ⊆ NC ∪ NR of concept and role names. We canthink of a signature as specifying a topic of interest. Axioms that only use termsfrom Σ can be thought of as “on-topic”, and all other axioms as “off-topic”. Forinstance, if Σ = {Animal,Duck,Grass, eats}, then Duck v ∃eats.Grass is on-topic,while Duck v Bird is off-topic.

Any concept, role, or axiom that uses only terms fromΣ is called aΣ-concept,Σ-role, or Σ-axiom. Given any such object X, we call the set of terms in X thesignature of X and denote it with X.

Given an interpretation I, we denote its restriction to the terms in a signatureΣ with I|Σ . Two interpretations I and J are said to coincide on a signature Σ,in symbols I|Σ = J |Σ , if ∆I = ∆J and XI = XJ for all X ∈ Σ.

There are a number of variants of the notion of conservative extensions, whichcapture the desired preservation of knowledge to different degrees. We focus onthe deductive variant.

Definition 1. LetM⊆ O be SROIQ-ontologies and Σ a signature.

(1) O is a deductive Σ-conservative extension (Σ-dCE ) ofM if, for all SROIQ-axioms α with α ⊆ Σ, it holds thatM |= α if and only if O |= α.

(2) M is a dCE-based module for Σ of O if O is a Σ-dCE ofM.

Unfortunately, deciding in general if a set of axioms is a module in this senseis hard or even impossible for expressive DLs [6,12], and finding a minimal one

41

is even more so. However, “good sized” modules that are efficiently computablehave been introduced [2]. They are based on the locality of single axioms, whichmeans that, given Σ, the axiom can always be satisfied independently of theinterpretation of the Σ-terms, but in a restricted way: by interpreting all non-Σterms either as the empty set (∅-locality) or as the full domain4 (∆-locality).

Definition 2. A SROIQ-axiom α is called ∅-local (∆-local) w.r.t. signature Σif, for each interpretation I, there exists an interpretation J such that I|Σ =J |Σ , J |= α, and for each X ∈ α \ Σ, XJ = ∅ (for each C ∈ α \ Σ, CJ = ∆and for each R ∈ α \Σ, RJ = ∆×∆).

It has been shown in [2] thatM⊆ O and all axioms in O \M being ∅-local(or all axioms being ∆-local) w.r.t. Σ ∪ M is sufficient for O to be a Σ-dCE ofM. The converse does not hold: e.g., the axiom A ≡ B is neither ∅- nor ∆-localw.r.t. {A}, but the ontology {A ≡ B} is an {A}-dCE of the empty ontology.

Furthermore, locality can be tested using available DL-reasoners [2], whichmakes this problem considerably easier than testing conservativity. However,reasoning in expressive DLs is still complex, e.g. N2ExpTime-complete forSROIQ [9]. In order to achieve tractable module extraction, a syntactic ap-proximation of locality has been introduced in [2]. The following definition cap-tures only the case of SHQ-TBoxes and can straightforwardly be extended toSROIQ ontologies.

Definition 3. An axiom α is called syntactically ⊥-local (>-local) w.r.t. signa-ture Σ if it is of the form C⊥ v C, C v C>, C⊥ ≡ C⊥, C> ≡ C>, R⊥ v R(R v R>), or Trans(R⊥) (Trans(R>)), where C is an arbitrary concept, R is anarbitrary role name, R⊥ /∈ Σ (R> /∈ Σ), and C⊥ and C> are from Bot(Σ) andTop(Σ) as defined in Part (a) (resp. (b)) of the table below.

(a) ⊥-Locality Let A⊥, R⊥ /∈ Σ, C⊥ ∈ Bot(Σ), C>(i) ∈ Top(Σ), n ∈ N \ {0}

Bot(Σ) ::= A⊥ | ⊥ | ¬C> | C u C⊥ | C⊥ u C | ∃R.C⊥ | >n R.C⊥ | ∃R⊥.C | >n R⊥.C

Top(Σ) ::= > | ¬C⊥ | C>1 u C>

2 | >0R.C

(b) >-Locality Let A>, R> /∈ Σ, C⊥ ∈ Bot(Σ), C>(i) ∈ Top(Σ), n ∈ N \ {0}

Bot(Σ) ::= ⊥ | ¬C> | C u C⊥ | C⊥ u C | ∃R.C⊥ | >n R.C⊥

Top(Σ) ::= A> | > | ¬C⊥ | C>1 u C>

2 | ∃R>.C> | >n R>.C> | >0R.C

It has been shown in [2] that ⊥-locality (>-locality) of an axiom α w.r.t.Σ implies ∅-locality (∆-locality) of α w.r.t. Σ. Therefore, all axioms in O \Mbeing ⊥-local (or all axioms being >-local) w.r.t. Σ ∪ M is sufficient for O tobe a Σ-dCE ofM. The converse does not hold; examples can be found in [2].

For each of the four locality notions, modules of O are obtained by startingwith an empty set of axioms and subsequently adding axioms from O that are Σ-non-local. In order for this procedure to be correct, the signature against which4 Or, in the case of roles, the set of all pairs of domain elements.

42

locality is checked has to be extended with the terms in the axioms that areadded in each step, so that the resulting moduleM consists of all the non-localaxioms with respect to Σ ∪ M. Definition 4 (1) introduces locality-based mod-ules, which are always dCE-based modules [2], although not necessarily minimalones. Modules based on syntactic (semantic) locality can be made smaller byiteratively nesting >- and ⊥-extraction (∆- and ∅-extraction), and the resultis still a dCE-based module [2,13]. These so-called >⊥∗-modules (∆∅∗-modules)are introduced in Definition 4 (3).

Definition 4. Let x ∈ {∅, ∆,⊥,>}, yz ∈ {>⊥, ∆∅}, O an ontology and Σ asignature.

(1) An ontology M is the x-module of O w.r.t. Σ if it is the output of Al-gorithm 1. We writeM = x-mod(Σ,O).

(2) An ontologyM is the yz-module of O w.r.t. Σ, writtenM = yz-mod(Σ,O),ifM = y-mod(Σ, z-mod(Σ,O)).

(3) Let (Mi)i>0 be a sequence of ontologies such that M0 = O and Mi+1 =yz-mod(Σ,Mi) for every i > 0. For the smallest n > 0 withMn =Mn+1,we callMn the yz∗-module of O w.r.t. Σ, writtenM = yz∗-mod(Σ,O).

Algorithm 1 Extract a locality-based moduleInput: Ont. O, sig. Σ, x ∈ {∅,∆,⊥,>} Output: x-moduleM of O w.r.t. Σ

M ← ∅; O′ ← Orepeat

changed ← falsefor all α ∈ O′ do

if α not x-local w.r.t. Σ ∪ M thenM←M∪ {α}; O′ ← O′ \ {α}; changed ← true

until changed = falsereturnM

As for (1), it has been shown in [2] that the outputM of Algorithm 1 doesnot depend on the order in which the axioms α are selected.5 Furthermore,the integer n in (3) exists because the sequence (Mi)i>0 is decreasing (moreprecisely, we have M0 ⊃ · · · ⊃ Mn = Mn+1 = . . . ). Due to monotonicityproperties of locality-based modules, the dual notions of ⊥>∗- and ∅∆∗-modulesare uninteresting because they coincide with those of >⊥∗- and ∆∅∗-modules.

Roughly speaking, a ∆- or >-module for Σ gives a view from above becauseit contains all subclasses of class names in Σ, while a ∅- or ⊥-module for Σ givesa view from below since it contains all superconcepts of concept names in Σ.

Modulo the locality check, Algorithm 1 runs in time cubic in |O| + |Σ| [2].Modules based on ⊥/>-locality are therefore a feasible approximation for mod-ules based on ∅/∆-locality. In both cases, modules are extracted axiom by axiom

5 Our algorithm is a special case of the one in [2, Figure 4].

43

but, as said above, the ∅/∆-locality check is more complex. A module extractoris implemented in the OWL API6 and SSWAP7. To summarize:

1. Given an ontology O, the semantic moduleMsemΣ for a signature Σ is con-

tained in the corresponding syntactic moduleMsynΣ for the same seed signa-

ture.8 This means that in principle more unnecessary axioms for preservingentailments over Σ can end up in syntactic modules rather than in semanticmodules.

2. The extraction of a syntactic module can be done in polynomial time w.r.t.the size of the ontology O. In contrast, the extraction of a semantic moduleis as hard as reasoning.

3 Experimental design

The main aim of this paper is to investigate how well syntactic locality approx-imates semantic locality. In particular, we want to see how (un)likely it is thatsyntactic locality-based modules are larger than semantic locality-based onesand how large these differences are. We also want to understand empirically howmuch more costly semantic locality is in terms of performance.

Selection of the Corpus. For our experiments, we have built a corpus containing:(1) from the TONES repository,9 those ontologies that have already been studiedin a previous work on modularity [4]: Koala, Mereology, University, People, mini-Tambis, OWL-S, Tambis, Galen; (2) all ontologies from the NCBO BioPortalontology repository.10

We then filter out all those the ontologies for which at least one of the fol-lowing problems occurs: the ontology is impossible to download; the .owl fileis corrupted when downloaded; the file is not parseable; the ontology is incon-sistent. Furthermore, due to time constraints, we exclude from this preliminaryinvestigation all ontologies whose size exceeds 10, 000 axioms.

This selection results in a corpus of 156 ontologies, which greatly differ insize and expressivity [7], as summarized in Table 3. For a full list of the corpus,please refer to the technical report: http://arxiv.org/abs/1207.1641

Repository Range of expressivity Range #axs. Range sig. sizeBioPortal ALCN -SHIN (D)/SOIN (D) 38–4,735 21–3,161TONES AL-SROIF(D)/SHOIQ(D) 13–9,629 14–9,221

Table 1. Ontology corpus

6 http://owlapi.sourceforge.net7 http://sswap.info8 Recall that ⊥-syntactic modules approximate ∅-semantic modules, while >-syntacticmodules approximate ∆-semantic modules.

9 http://owl.cs.manchester.ac.uk/repository/10 http://bioportal.bioontology.org

44

Comparing Syntactic and Semantic Locality. In order to compare syntactic andsemantic locality, we want to understand:

1. whether, for a given seed signature Σ, the semantic Σ-module is likely to besmaller than the syntactic Σ-module, and if so by how much,11

2. how feasible the extraction of semantic modules is.

Here, we focus on the two corresponding notions of ∅-semantic locality and⊥-syntactic locality. In particular, ⊥-syntactic locality has been throughly in-vestigated in previous work [3], and it has proven to have many interestingproperties. A completion of the investigation described in this paper for all fun-damental notions of modules is planned in our future work.

Due to the recursive nature of the locality-based module extraction algorithm,we want to investigate locality both on a

– per-axiom basis: given an axiom α and a signature Σ, is it likely that α issemantically ∅-local w.r.t. Σ but not syntactically ⊥- local w.r.t. Σ?

– per-module basis: given a signature Σ, is it likely that ⊥-mod(Σ,O) 6=∅-mod(Σ,O)? If yes, is it likely that the difference is large?

Hence we need to pick, for each ontology in our corpus, a suitable set of sig-natures, and this poses a significant problem. First, we do not yet have enoughinsight into what typical seed signatures are for module extraction. One couldassume that large ones are rarely relevant for module extraction—why botherwith extracting a large module—but this still leaves a large, i.e., exponentialspace of possible seed signatures. If m = #O, there are 2m possible seed signa-tures for which axioms can be tested for locality and for which modules can beextracted. Hence a full investigation is infeasible.

One could assume that the comparison between semantic and syntactic mod-ules could be easier since many signatures can lead to the same module. In otherwords, the statistically significant number of modules w.r.t. the total numberof modules is not larger than that of seed signatures needed w.r.t. the totalnumber of seed signatures. In previous work [4,5], however, modules have beenstudied with respect to how numerous they are in real-world ontologies. Theexperiments carried out suggest that the number of modules in ontologies is, ingeneral, exponential w.r.t. the size of the ontology. Moreover, the extraction ofenough different modules can be hard, because by looking just at seed signaturesthere is no chance to avoid the extraction of the same module many times. Inparticular, for a module M there can be exponentially many seed signaturesw.r.t. #M that generateM [3].

As a consequence, we compare the two kinds of locality of axioms—bothon a per-axiom basis and a per-module basis—w.r.t. random signatures. Toavoid any bias, we select a random signature as follows: we set each namedentity E in the ontology to have probability p = 1/2 of being included in thesignature. Thus each seed signature has the same probability to be chosen. Forontologies whose signature exceeds 9 entities, in order to get results where the11 Recall that the semantic Σ-module is always a subset of the syntactic Σ-module.

45

true proportion of differences between the two notions of locality lies in theconfidence interval (±5%) with confidence level 95%, we have to select only 400random signatures [14]. That is, we need to test only 400 random signatures tohave a confidence of 95% (±5%) that the differences/equalities we observe reflectthe real ones.

Non-random seed signatures. Amodule, in general, does not necessarily show anyinternal coherence: intuitively, if we had an ontology describing some knowledgefrom both the domains of Geology and of Philosophy, we could still extract themodule for the signature Σ = {Epistemology, Mineral}. This module is likelyto be the union of the two disjoint modules for Σ1 = {Epistemology} andΣ2 = {Mineral}. This combinatorial behaviour can lead to exponentially manymodules in the size of the signature of the ontology and indeed, as mentionedabove, the number of modules in ontologies seems to be exponential [4,5].

In contrast to general modules, genuine modules can be called coherent: theyare defined as those modules that cannot be decomposed into the union of twodifferent modules. Notably, there are only linearly many genuine modules in thesize of the ontology O, and the set of genuine modules is a base for all generalmodules: any module is either genuine or the union of genuine modules. Thelinear bound on the number of genuine modules is due to the fact that, for eachgenuine x-moduleM, there is an axiom α such thatM = x-mod(α,O).

Thus genuine modules can be said to be interesting modules that we canfully investigate. Hence in addition to the above mentioned investigation of ⊥-and ∅-modules for random signatures, we also look at all axiom signatures.

In summary, we test:

(T1) for random seed signatures Σ,(a) for each axiom α in our corpus, is α semantically ∅-local w.r.t. Σ but

not syntactically ⊥- local w.r.t. Σ?(b) is ⊥-mod(Σ,O) 6= ∅-mod(Σ,O)? If yes, we determine the difference and

its size.(T2) for each axiom signature from our corpus, is ⊥-mod(α,O) 6= ∅-mod(α,O)?

If yes, we determine the difference and its size.

4 Experimental comparison

No differences. The main result of the experiment is that, for 151 of the 156ontologies we tested, no difference between ⊥- and ∅-locality can be observed.These 151 ontologies exclude the two NCBO BioPortal ontologies EFO (Ex-perimental Factor Ontology) and SWO (Software Ontology), as well as Koala,miniTambis, and Tambis. More specifically, for every generated seed signature,the corresponding ⊥- and ∅-module agree, and every axiom is either ⊥- and∅-local, or neither. This statement applies to all randomly generated seed sig-natures as well as for all axiom signatures – which are seed signatures for allgenuine modules. We can therefore draw the following conclusions for the 151ontologies with respect to (T1) and (T2) above.

46

(T1) Given an arbitrary seed signature Σ, there is no difference (a) between⊥- and ∅-locality of any given axiom w.r.t. Σ and (b) between the ⊥- and∅-modules for Σ, both times at a significance level of 0.05.

(T2) Given any axiom signature Σ, there is no difference between the ⊥- and∅-modules for Σ.

In the case of the 151 ontologies, the extraction of a ∅-module (with tautologytests performed by FaCT++) often took considerably longer than the extractionof the corresponding ⊥-module. For example, for MoleculeRole, the largest ofthe 151 ontologies, times to extract a ⊥-module (test all axioms for ⊥-locality,respectively) ranged between 27 and 169ms (21 and 77ms, respectively), whilethe extraction of a ∅-module (test of all axioms for ∅-locality, resp.) took upto 6× as long, on average 2.7× (2.0×, resp.). It is also worth noting that theontologies Galen and People, which are renowned for having particularly large⊥-modules [2,5], are among those without differences between ⊥- and ∅-locality.

Differences. For the five ontologies where differences between ⊥- and ∅-modules(or -locality) occur, we isolated two types of culprits – axioms which are not⊥-local w.r.t. some signature Σ, but which are ∅-local w.r.t. Σ. Type-1 culpritsare simple tautologies that have accidentally entered the “inferred view” – i.e.,closure under certain entailments – of two ontologies. They do not occur in theoriginal “asserted” versions and can, in principle, be detected by a slightly refinedsyntactic locality check. Type-2 culprits are definitions of concept names via aconjunction that satisfies certain conditions explained below. There are not manytype-1 and type-2 axioms in the affected ontologies, and the observed differencesare comparably small. Table 2 gives an overview of the differences observed.

Type-1 culprits are axioms InverseObjectProperties(P, InverseOf(P)),where P is a role. This translates into the tautology P ≡ (P−)− in DL nota-tion. Such an axiom is therefore ∅-local w.r.t. any signature. However, it behavesdifferently for ⊥-locality: if the signature Σ contains P, then both sides of theequation are neither in Bot(Σ) nor in Top(Σ), hence the axiom is considerednon-local; otherwise, both sides are ⊥-equivalent, hence the axiom is local.

Type-1 axioms occur in the “inferred view” of the ontologies EFO and SWO.Table 2 shows the relatively modest differences caused by these axioms. In allcases, there are no other axioms in the differences. This means that no differencesoccur for the non-inferred original versions of EFO and SWO.

Type-2 culprits are complex definitions A ≡ C of a concept name A whereC is a disjunction that contains both a universal and an existential (or min-imum cardinality) restriction on the same role. This affects the ontologies Koala,miniTambis, and Tambis. The effect is best illustrated for Koala, which containsexactly one such axiom, namely M ≡ S u ∀c.F u ∀g.{m} u =3 c.>, where wehave abbreviated the concept names MaleStudentWith3Daughters, Student,Female, the roles hasChildren, hasGender, and the nominal male. Now if thesignature against which the axiom is tested for locality contains {S, c, g} but

47

Ontology #axs #differences difference time culpritsizes ratio type and#axs rel. avg. frequency

SWO 3446 T1 a 400 6–22 0–1% 3.31 1 (30×)T1 b 400 23–29 1–2% 5.11T2 3446 3–1 1–5% 5.86

EFO 6008 T1 a 400 8–24 0–1% 1.42 1 (32×)T1 b 400 13–30 0–1% 1.38T2 128 1–4 9–17% —

Koala 42 T1 a 0 0 0% — 2 (1×)T1 b 2 1 3% —T2 0 0 0% —

miniTambis 170 T1 a 68 1–2 1–3% — 2 (3×)T1 b 93 1–4 1–3% —T2 26 1–7 6–75% —

Tambis 592 T1 a 58 1–3 0–1% 3.31 2 (11×)T1 b 229 2–11 0–2% 5.01T2 191 4–41 2–26% —

Table 2. Overview table of differences observed. The columns show: the ontology name;the overall number of axioms; the name of the test (see list on Page 7); the number ofcases with differences; the number of axioms in the differences (absolute and relativeto the ⊥-case); the average time ratio ∅ : ⊥ (“—” indicates that no reliable statement ispossible: the time for ⊥ is only a few, often 0, milliseconds); the type of culprit presentand the number of axioms of this type.

neither M nor F, then this axiom is not ⊥-local because none of the conjuncts onthe right-hand side is in Bot(Σ). On the other hand, this axiom is a tautologywhen M and F are replaced by ⊥: the conjunction ∀c.⊥u=3 c.> cannot have anyinstances, regardless of how c is interpreted.

For Koala, this effect only causes two singleton differences between sets oflocal axioms for the randomly generated seed signatures, as shown in Table 2.For axiom signatures, there is no difference. Interestingly, this effect does notpropagate to modules: for all signatures, ⊥- and ∅-modules are the same. Thereason might be that (a) g is used in many axioms and is thus very likely tocontribute to the extended signature during module extraction, and (b) then theaxiom defining F is no longer local, which “pulls” F into the extended signature,preventing the observed effect.

In miniTambis and Tambis, this effect is much stronger and affects a largeproportion of modules, as shown in Table 2. The differences in these cases donot only consist of culprit axioms, but also of axioms that become non-localafter the signature has been extended by the terms in the culprit axioms. Still,the size of the differences is mostly modest while, for Tambis, the ∅-locality test(∅-module extraction) takes on average over three times (five times) as long asthe ⊥-locality test (⊥-module extraction).

48

5 Conclusion and Outlook

Summary. We obtain two main observations from the experiments carried out.

– In practice, there is no or little difference between semantic and syntacticlocality. That is, the computationally cheaper syntactic locality is a goodapproximation of semantic locality.

– Though in principle hard to compute, semantic modules can be extractedrather fast in practice.

These results suggest that it is questionable to conclude that semantic localityshould be preferred to syntactic locality. In terms of computation time, there isoften a benefit in using syntactic locality: the average speed-up compared to theextraction of a semantic-locality based module is by a factor of up to 6. Forsome particular module pairs, it is higher by an order of magnitude. The gainin module size is zero or so small that it is hard to justify the extra time spent.In particular, there is no gain in size for the ontologies Galen and People, whichare “renowned” for having disproportionately large modules [2,5].

Our results are interesting not only because they provide an evaluation ofhow good the cheap syntactic locality approximates semantic locality, but alsobecause they enabled us to fix bugs in the implementation of syntactic modular-ity. For example, earlier data from the experiment have shown that reflexivityaxioms had been treated incorrectly by the syntactic locality checker.

Future Work. It is evident that this work is preliminary. It investigates onlythe differences between the related notions of ⊥- and ∅-locality. We plan to ex-tend the same study to other notions of locality, in particular, nested modules(>⊥∗- vs. ∆∅∗-modules) – these notions are the most economical in terms ofmodule size. Moreover, we want to extend the investigation to the remaininglarger ontologies in the BioPortal repository and further large ontologies, e.g.,some versions of the NCI Thesaurus12. Preliminary results with a version thatis not among the regular releases show differences due to type-2 culprits, but wehave not included them here because the differences disappear after removingaxioms that were introduced due a problem with object and annotation proper-ties when the ontology file is parsed by the OWL API. This behaviour is yet tobe investigated and explained.

Another interesting extension is to modify the seed signature sampling. Cur-rently, the random variable “size of the seed signature generated” follows thebinomial distribution with expected value m/2 and variance m/4. Hence, mostsignatures in the sample have size aroundm/2; small and large signatures are un-derrepresented. For example, for one ontology with 915 terms, all signature sizeslay between 422 and 509. One might argue that, for big ontologies, the typicalmodule extraction scenario does not require large seed signatures – but it doessometimes require relatively small seed signatures, for example, when a moduleis extracted to efficiently answer a given entailment query of typically small size.12 Downloadable from http://evs.nci.nih.gov/ftp1/NCI_Thesaurus

49

On the other hand, large modules resulting from larger seed signatures may bemore likely to differ. We therefore plan an alternative seed signature samplingvia bins for average signature sizes: repeat the current sampling procedure scaledto several subintervals of the range of possible signature sizes.

Our current results answer the question whether there is a significant differ-ence between the two locality notions with respect to a given signature. It is alsointeresting to ask the same question relative to a given module. To answer it, thesampling of modules instead of seed signatures requires further investigation.

Acknowledgment. We thank Rafael Gonçalves and the anonymous reviewers forhelpful comments.

References

1. Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P.F. (eds.):The Description Logic Handbook: Theory, Implementation, and Applications.Cambridge University Press (2003)

2. Cuenca Grau, B., Horrocks, I., Kazakov, Y., Sattler, U.: Modular reuse of ontolo-gies: Theory and practice. J. of Artif. Intell. Research 31, 273–318 (2008)

3. Del Vescovo, C., Gessler, D., Klinov, P., Parsia, B., Sattler, U., Schneider, T.,Winget, A.: Decomposition and Modular Structure of BioPortal Ontologies. In:Proc. ISWC-11 (2011)

4. Del Vescovo, C., Parsia, B., Sattler, U., Schneider, T.: The modular structure ofan ontology: an empirical study. In: Proc. of WoMO-10. Frontiers in AI and Appl.,vol. 211, pp. 11–24. IOS Press (2010)

5. Del Vescovo, C., Parsia, B., Sattler, U., Schneider, T.: The modular structure ofan ontology: atomic decomposition and module count. In: Proc. of WoMO-11.Frontiers in AI and Appl., vol. 230, pp. 25–39. IOS Press (2011)

6. Ghilardi, S., Lutz, C., Wolter, F.: Did I damage my ontology? A case for conser-vative extensions in description logics. In: Proc. of KR-06. pp. 187–197 (2006)

7. Horridge, M., Parsia, B., Sattler, U.: The state of bio-medical ontologies. In: Proc.of 2011 ISMB Bio-Ontologies SIG (2011)

8. Horrocks, I., Kutz, O., Sattler, U.: The even more irresistible SROIQ. In: Proc.of KR-06. pp. 57–67 (2006)

9. Kazakov, Y.: RIQ and SROIQ are harder than SHOIQ. In: Proc. of KR-08. pp.274–284 (2008)

10. Konev, B., Lutz, C., Walther, D., Wolter, F.: Semantic modularity and moduleextraction in description logics. In: Proc. of ECAI-08. Frontiers in AI and Appl.,vol. 178, pp. 55–59. IOS Press (2008)

11. Kontchakov, R., Wolter, F., Zakharyaschev, M.: Logic-based ontology compar-ison and module extraction, with an application to DL-Lite. Artificial Intelligence174(15), 1093–1141 (2010)

12. Lutz, C., Walther, D., Wolter, F.: Conservative extensions in expressive descriptionlogics. In: Proc. of IJCAI-07. pp. 453–458 (2007)

13. Sattler, U., Schneider, T., Zakharyaschev, M.: Which kind of module should Iextract? In: Proc. of DL 2009. ceur-ws.org, vol. 477 (2009)

14. Smithson, M.: Confidence Intervals. Quantitative Applications in the Social Sci-ences, Sage Publications (2003)

50

LoLa: A Modular Ontology of Logics,Languages, and Translations

Christoph Lange1, Till Mossakowski2,3, and Oliver Kutz2

1 School of Computer Science, University of Birmingham2 Research Centre on Spatial Cognition, University of Bremen

3 DFKI GmbH Bremen

Abstract. The Distributed Ontology Language (DOL), currently beingstandardised within the OntoIOp (Ontology Integration and Interoper-ability) activity of ISO/TC 37/SC 3, aims at providing a unified frame-work for (1) ontologies formalised in heterogeneous logics, (2) modularontologies, (3) links between ontologies, and (4) annotation of ontologies.This paper focuses on the LoLa ontology, which formally describes DOL’svocabulary for logics, ontology languages (and their serialisations), aswell as logic translations. Interestingly, to adequately formalise the logi-cal relationships between these notions, LoLa itself needs to be axioma-tised heterogeneously—a task for which we choose DOL. Namely, we usethe logic RDF for ABox assertions, OWL for basic axiomatisations ofvarious modules concerning logics, languages, and translations, FOL forcapturing certain closure rules that are not expressible in OWL4, and cir-cumscription for minimising the extension of concepts describing defaulttranslations.

1 The Distributed Ontology Language (DOL) – OverviewAn ontology in the Distributed Ontology Language (DOL) consists of modulesformalised in basic ontology languages, such as OWL (based on description logic)or Common Logic (based on first-order logic with some second-order features).These modules are serialised in the existing syntaxes of these languages in orderto facilitate reuse of existing ontologies. DOL adds a meta-level on top, whichallows for expressing heterogeneous ontologies and links between ontologies.5Such links include (heterogeneous) imports and alignments, conservative exten-sions (important for the study of ontology modules), and theory interpretations(important for reusing proofs). Thus, DOL gives ontology interoperability a for-mal grounding and makes heterogeneous ontologies and services based on themamenable to automated verification.

DOL is standardised within the OntoIOp (Ontology Integration and Interop-erability) activity of ISO/TC 37/SC 36. The international working group com-prises around 50 experts (around 15 active contributors so far), representing4 For the sake of tool availability it is still helpful not to map everything to FOL.5 The languages that we call “basic” ontology languages here are usually limited toone logic and do not provide meta-theoretical constructs.

6 TC = technical committee, SC = subcommittee

51

a large number of communities in ontological research and application, suchas different (1) ontology languages and logics (e.g. Common Logic and OWL),(2) conceptual and theoretical foundations (e.g. model theory, proof theory),(3) technical foundations (e.g. ontology engineering methodologies and linkedopen data), and (4) application areas (e.g. manufacturing, bio-medicine, etc.).For details and earlier publications, see the project page at http://ontoiop.org.

The OntoIOp/DOL standard is currently in its final working draft stage andwill be submitted as a committee draft (the first formal standardisation stage)in September 2012.7 The final international standard ISO 17347 is scheduled for2015. The standard specifies syntax, semantics, and conformance criteria:

Syntax: abstract syntax of distributed ontologies and their parts; three concretesyntaxes: a text-oriented one for humans, XML and RDF for exchange amongtools and services, where RDF particularly addresses exchange on the Web.Here, we will use the DOL text syntax in listings but also explain the RDFvocabulary; for further details on the DOL syntaxes, see [6].

Semantics: (1) a direct set-theoretical semantics for the core of the language,extended by an institutional and category-theoretic semantics for advancedfeatures such as ontology combinations (technically co-limits), where basicontologies keep their original semantics; (2) a translational semantics, em-ploying the semantics of the expressive Common Logic ontology language forall basic ontologies, taking advantage of the fact that for all basic ontologylanguages known so far translations to Common Logic have been specifiedor are known to exist8; (3) finally, there is the option of providing a collapsedsemantics, where the semantics of the meta-theoretical language level pro-vided by DOL (logically heterogeneous ontologies and links between them)is not just specified in semiformal mathematical textbook style, but oncemore formalised in Common Logic, thus in principle allowing for machineverification of meta properties. For details about the semantics, see [9].

Conformance criteria provide for DOL’s extensibility to other basic ontologylanguages than those considered so far, including future ones. (1) A basic on-tology language conforms with DOL if its underlying logic has a set-theoreticor, for the advanced features, an institutional semantics. Similar criteria ap-ply to translations between languages. (2) A concrete syntax (serialisation)of a basic ontology language conforms if it supports IRIs (Unicode-awareWeb-scalable identifiers) for symbols and satisfies further well-formednesscriteria. (3) A document conforms if it is well-formed w.r.t. one of the DOLconcrete syntaxes, which particularly requires explicitly mentioning all logicsand translations employed. (4) An application essentially conforms if it is ca-pable of processing conforming documents, and providing logical informationthat is implied by the formal semantics.

7 The standard draft itself is not publicly available, but ISO/TC 37 has passed aresolution to make the final standard document open, as has been done with therelated Common Logic standard [3].

8 Even for higher-order logics this works, in principle, by using combinators.

52

2 A Graph of Logic Translations

CL

HOL

Prop

SROIQ(OWL 2 DL)

FOL=

FOLms=

OBOOWL

EL++(OWL 2 EL)

DL-LiteR(OWL 2 QL)

DL-RL(OWL 2 RL)

DDLOWL

ECoOWL

ECoFOL F-logic

bRDF

RDF

RDFS

OWL-Full

Rel-S

subinstitution

theoroidal subinstitution

simultaneously exact and model-expansive comorphisms

model-expansive comorphisms

grey: no fixed expressivity

green: decidable ontology languages

yellow: semi-decidable

orange: some second-order constructs

red: full second-order logic

OBO 1.4

CASL

UML-CD

CL-

Fig. 1. The logic translation graph for the DOL-conforming languages

Fig. 1 is a revised and extended version of the graph of logics and translationsintroduced in [8]. New nodes include UML class diagrams, OWL-Full (i.e. OWLwith an RDF semantics instead of description logic semantics), and CommonLogic without second-order features (CL−). We have defined the translationsbetween all of these logics in earlier publications [9, 8]. The definitions of theDOL-conformance of some central standard ontology languages and translationsamong them will be given as annexes to the standard, whereas the majority willbe maintained in an open registry (cf. Sec. 3). Sec. 4 provides a more fine-grainedview on translations (and projections).

3 A Registry for Ontology Languages and Mappings

The OntoIOp standard is not limited to a fixed set of ontology languages. Itwill be possible to use any (future) ontology language, logic, serialisation, ormapping (translation or projection) with DOL, once its conformance with thecriteria specified in the standard has been established. This led to the idea ofsetting up a registry to which the community can contribute descriptions of anysuch languages, logics, serialisations, or mappings. In the current, early phaseof the standardisation process, we are maintaining this registry manually. Withthe release of the final international standard, everyone will be able to makecontributions, which an editorial board will review and approve or reject. Fig. 2

53

Common Logic

SROIQDL-LiteR

CLIF

XCL

Manchester Syntax

OWL 2 XML

RDF / XML

Turtle

OWL 2 DL

RDF

RDFS

Common Logic

RDFS

RDF

OWL 2 QL

OWL 2 RL

OWL 2 EL

DL-RL

EL++

Serializations Ontology Languages Logics

supports serialization sublanguage of

induced translation exact logical expressivity

translatable to

sublogic of

Fig. 2. Subset of the OntoIOp registry, shown as an RDF graph

shows a subset of the OntoIOp registry: a subgraph of Fig. 1 in the “logic”column, as well as related ontology languages and serialisations. Note that therelation between ontology languages and logics generally is not bijective: e.g.first-order logic is supported by various languages such as Common Logic, TPTPand CASL.

Any entry of the registry shall be identified by an IRI, so that DOL ontologiescan refer to it. At these IRIs, when treated as URLs, there shall be a machine-readable description of the resource according to the linked data principles (cf.[5]), so that, e.g., any agent given a basic ontology can find out the languagesthis ontology can be translated into (cf. Sec. 6 for an example), or that forany language translation, its definition as well as implementations are available.The most widely supported format for linked data is RDF; we have realised theRDF vocabulary for the OntoIOp registry as a subset of the vocabulary used forserialising DOL ontologies as RDF.9

Starting with a plain RDFS vocabulary, we soon realised that we could de-liver added value to tools supporting DOL by encoding additional informationabout the semantics of, e.g., translations into the vocabulary using some OWLconstructs, and eventually arrived at a richer formalisation that goes beyond

9 RDF only allows for describing, not for fully formally defining logics and translations.To that end, we are planning to alternatively offer full formalisations in the richerOMDoc language from the same IRIs.

54

OWL: the LoLa ontology. To realise the benefit of a machine-comprehensiblerepresentation of this semantics in a rich ontology language, consider DOL’sunderstanding of an ontology language translation: Unless a direct translationon the language level has been specified (e.g. from Common Logic to CASL),one can translate an ontology from a language La to La′ if the expressivity ofthese languages is exactly captured by two logics Lo and Lo′, and Lo can (possi-bly transitively) be translated to Lo′. In a plain RDF graph, this would requiremultiple lookups.

4 Architecture of the LoLa Ontology

LoLa, the ontology of logics and languages, is implemented as a heterogeneousontology in DOL, consisting of the following modules:

– An OWL core provides classes and properties for the basic concepts, includ-ing a basic axiomatisation.

– We use additional FOL axioms for closure rules not expressible in OWL,such as non-expressible role compositions.

– We use circumscription [7, 1] for minimising the extension of default trans-lations.

The OntoIOp registry, is implemented as an RDF dataset, acting as theABox of the LoLa ontology. The OntoIOp registry is available through a collec-tion of linked data IRIs in the paths http://purl.net/dol/{logics,languages,

serializations,translations}, e.g. http://purl.net/dol/logics/SROIQ for thelogic SROIQ. We made it originally available in RDF/XML, the most widelysupported linked data format, but other syntaxes can be provided as well. Itcan be browsed with frontends like uriburner; try, e.g., http://linkeddata.

uriburner.com/about/html/http/purl.net/dol/logics/SROIQ.

The OWL core of the LoLa ontology comprises classes for ontology languages,logics, mappings (translations or projections) between ontology languages andbetween logics, as well as serialisations. The LoLa properties relate all of theformer classes to each other, as shown in Fig. 2, e.g. an ontology language tothe serialisations that it supports, or to the logic that exactly formalises itsexpressivity, or an ontology language mapping to the logic mapping it has beenderived from.

Fig. 3 shows the top-level classes of LoLa’s OWL module, axiomatising logics,languages, and mappings to the extent possible in OWL. Concerning meta-levelclasses (that is, classes for describing the graph of languages and logics), Fig. 2already has illustrated the interplay of ontology languages, logics and serialisa-tions.

Object-level classes (that is, classes providing the vocabulary for expressingdistributed ontologies) comprise ontologies, their constituents (namely entities,such as classes and object properties, and sentences, such as class subsumptions),as well as links between ontologies.

55

Fig. 3. Top-level classes in the OWL ontology

Mappings are modelled by a hierarchy of properties corresponding to the dif-ferent types of edges in Fig. 1. For example, object properties such as translatableTomodel the existence of a translation between two languages. mappableToLanguagemodels the fact that a language can be mapped to another one.

However, this only allows for covering the default translations between logics.E.g., we can express that the default translation from SROIQ to F-logic is amodel-expansive comorphism. Besides further alternative translations that thecommunity may contribute, there is, however, also another translation, which canbe obtained from our graph, by composing the SROIQ →FOL= and FOL=→F-logic translations, resulting in a subinstitution. For expressing such alternatives,LoLa additionally reifies mappings into classes, whose hierarchy corresponds tothat of the mapping properties.

Fig. 4 shows the inferred class hierarchy below the class Mapping, as computedwithin protégé. Notice that our ontology produces several cases of multipleinheritance. Mappings are split along the following dichotomies:

– logic mapping versus ontology language mapping, cf. Fig. 2.– translation versus projection: a translation embeds or encodes an ontology

into another one, while a projection is a forgetful operation (e.g. the projec-tion from first-order logic to propositional logic forgets predicates with aritygreater than zero). Technically, the distinction is that between institutioncomorphisms and morphisms [4].

– plain mapping versus simple theoroidal mapping [4]: while a plain mappingneeds to map signatures to signatures, a simple theoroidal mapping mapssignatures to theories. The latter therefore allows for using “infrastructureaxioms”: e.g. when mapping OWL to Common Logic, it is convenient to relyon a first-order axiomatisation of a transitivity predicate for roles etc.

Moreover, we have a class DefaultMapping for mappings that are assumed auto-matically as default when no mapping is given in a certain context.

Other classes concern the accuracy of the mapping, see [8] for details. Theseclasses have mainly been introduced for the classification of logic mappings; how-ever, via the correspondence between logics (mappings) and ontology languages(mappings), they apply to ontology languages as well. Sublogics are the mostaccurate mappings: they are just syntactic subsets. Embeddings come close tosublogics, like injective functions come close to subsets. If the model translationis surjective (“model expansion”) or even bijective, the mapping is faithful inthe sense that logical consequence is preserved and reflected, that is, inference

56

systems and engines for the target logic can be reused for the source logic (alongthe mapping). (Weak) exactness is a technical property that guarantees thisfaithfulness even in the presences of ontology structuring operations [2].

The full OWL ontology is available at http://purl.net/dol/1.0/rdf#; it serves,as said above, simultaneously as an RDF vocabulary for the linked datasetthat constitutes the OntoIOp registry, and for serialising DOL ontologies inRDF—therefore the “rdf” name.

Fig. 4. The part of the OWL ontology concerning mappings

5 Putting It Together in DOL

DOL allows us to put together the pieces collected so far. First, we specify thatthe RDF registry conforms with the OWL ontology. This is achieved by projectingthe registry from RDF to OWL10, and stating an interpretation of theories ofthis into the OWL ontology.

We use circumscription [7, 1] for minimising the extent of the class Default-Translation and thus implementing a closed world assumption. This featurehas been integrated into DOL in a logic independent way: in OWL, it has theeffect that classes and object properties are minimised, while in first-order logic,extensions of predicates are.

10 Basically, this projection turns the RDF graph into an OWL ABox. Impredicativityis escaped from by splitting names that are used in several roles into several distinctnames.

57

Furthermore, we use first-order logic to formulate logical axioms that exceedthe expressiveness of OWL. We here use the Common Logic Interchange For-mat (CLIF) [3]. One such axiom states that supported logics propagate alonglanguage translatability; see the ontology LoLaRules below.

%prefix( : <http://purl.net/dol/>

dol: <http://purl.net/dol/1.0/rdf#>

log: <http://purl.net/dol/logics/>

ser: <http://purl.net/dol/serializations/>

trans: <http://purl.net/dol/translations/> )%

distributed-ontology LoLa

%% projecting the RDF ABox to OWL

ontology ABox = registry hide along RDF2OWL end

%% TBox

ontology TBox = dol: end

%% the RDF registry conforms with the OWL ontology

interpretation conformant : ABox to TBox end

%% integrating RDF ABox with OWL TBox while minimising default mappings

logic log:OWL syntax ser:OWL/Manchester

ontology MinimizedABox =

ABox and TBox

minimize DefaultMapping %% circumscription-like minimisation

end

%% first-order rules for infering new facts in the registry

logic log:CommonLogic syntax ser:CommonLogic/CLIF

ontology LoLaRules =

(forall (subLa superLa lo)

(if (and (dol:translatableTo subLa superLa)

(dol:mappableToLanguage subLa superLa)

(dol:supportsLogic subLa lo))

(dol:supportsLogic superLa lo)))

...

end

%% combining OWL ontology with first-order rules

logic log:CommonLogic syntax ser:CommonLogic/CLIF

ontology LoLa =

dol: translate with OWL2CommonLogic

and

LoLaRules

end

58

6 Using LoLa to Query the OntoIOp Registry

DOL-conforming applications can explore and query the OntoIOp registry tofind out useful information about the logics and languages of concrete givenontologies, or about logics and languages in general.

The following query in the SPARQL RDF query language, e.g., returns alllanguages a given ontology is translatable to:

PREFIX dol: <http://purl.net/dol/1.0/rdf#>

SELECT DISTINCT ?target-language WHERE {

# first determine, by querying the ontology itself, its language

<http://ontohub.org/ontologies/my-ontology>

dol:language ?theLanguageOfTheGivenOntology .

# find out everything the language is translatable to

?theLanguageOfTheGivenOntology

dol:translatableTo ?targetLanguage ;

# just to be sure: We are only interested in mappings to languages.

dol:mappableToLanguage ?targetLanguage .

# (The use of two properties is owed to the orthogonal design of LoLa.)

}

This query assumes that both the information about the ontology and aboutthe OntoIOp registry are available in RDF and ready to be queried as SPARQL.At the moment this cannot be taken for granted; however, we are working onOntohub, an ontology repository engine, which we will, at the same time, alsouse to host the OntoIOp registry instead of the current static file deployment[6].

Aiming at wide tool support, the linked data graph that we deploy has allinferences of the LoLa ontology applied; this means in particular that, from atranslation between two logics, it is inferred that the corresponding ontologylanguages are translatable into each other, and that the transitive closure ofthe translation graph has been computed. Therefore, the query shown aboveoperates on a plain RDF graph, and the query engine does not have to havefurther inferencing support built in.

The following query focuses exclusively on the OntoIOp registry. It answersa frequent question in knowledge engineering: Which logic is the right one forformalising my conceptual model? For the sake of this example, we focus onknowledge representability and thus assume that a logic is suitable if it hastranslations from and to many other logics. This ignores questions of availabilityof reasoners for the respective logics, of tools performing the translations, and oftheir performance. Such information is not yet available in the OntoIOp registryitself, but could be compiled in a separate linked dataset that the registry wouldlink to.

PREFIX dol: <http://purl.net/dol/1.0/rdf#>

SELECT ?logic, COUNT(?targetLogic) AS ?t, COUNT(?sourceLogic) AS ?s WHERE {

?logic a dol:Logic ;

dol:translatableTo ?targetLogic ;

dol:translatableFrom ?sourceLogic .

} ORDER BY ?t, ?s

59

7 Conclusion and Future Work

We have presented LoLa, an ontology of logics, languages, and mappings be-tween them. This ontology formalises the semantics not only of these aspects ofthe Distributed Ontology Language DOL, but also of the vocabulary employedin the OntoIOp registry for extending the DOL framework with further logics,languages and mappings. LoLa is a heterogeneous ontology consisting of a coreOWL module, which declares the vocabulary and provides a basic formalisation,a Common Logic module providing additional first-order rules; furthermore weemploy DOL’s logic-independent circumscription facility to minimise the exten-sion of default translations. Along with our plans to publish not only machine-comprehensible descriptions of logics and mappings, but full formalisations, wewill also expand LoLa to formalise further features of the DOL language, such asthe vocabulary that describes the accuracy of a mapping (cf. Sec. 4).

Acknowledgements

The development of DOL is supported by the German Research Foundation(DFG), Project I1-[OntoSpace] of the SFB/TR 8 “Spatial Cognition”; the firstauthor is additionally supported by EPSRC grant EP/J007498/1. The authorswould like to thank the OntoIOp working group within ISO/TC 37/SC 3 for theirinput, particularly Michael Grüninger for his advice regarding the hierarchy oftypes of ontology translations.

References1. Bonatti, Piero A., Carsten Lutz, and Frank Wolter, ‘The complexity of

circumscription in dls’, J. Artif. Intell. Res. (JAIR), 35 (2009), 717–773.2. Borzyszkowski, Tomasz, ‘Logical systems for structured specifications’, Theoret-

ical Computer Science, 286 (2002), 197–245.3. ‘Common Logic (CL): a framework for a family of logic-based languages’, Tech.

Rep. 24707, ISO/IEC, 2007.4. Goguen, Joseph, and Grigore Roşu, ‘Institution morphisms’, Formal aspects of

computing, 13 (2002), 274–307.5. Heath, Tom, and Christian Bizer, Linked Data: Evolving the Web into a Global

Data Space, 1 edn., Synthesis Lectures on the Semantic Web: Theory and Technol-ogy, Morgan & Claypool, San Rafael, CA, 2011.

6. Lange, Christoph, Till Mossakowski, Oliver Kutz, Christian Galinski,Michael Grüninger, and Daniel Couto Vale, ‘The Distributed Ontology Lan-guage (DOL): Use cases, syntax, and extensibility’, in Terminology and KnowledgeEngineering Conference (TKE), 2012.

7. McCarthy, John, ‘Circumscription - a form of non-monotonic reasoning’, Artif.Intell., 13 (1980), 1–2, 27–39.

8. Mossakowski, Till, and Oliver Kutz, ‘The Onto-Logical Translation Graph’, inOliver Kutz, and Thomas Schneider, (eds.), Modular Ontologies, IOS, 2011.

9. Mossakowski, Till, Oliver Kutz, and Christoph Lange, ‘Three semantics forthe core of the distributed ontology language’, in FOIS, 2012.

60

Proposal of New Approach for Ontology Modularization

Amir Souissi1, Walid Chainbi2 and Khaled Ghedira3

1 Ecole Nationale des Sciences de l’Informatique /SOIE - Manouba - Tunisia [email protected]

2 Sousse National School of Engineers /SOIE, Sousse 4054 - Tunisia [email protected]

3 Institut supérieur de gestion de Tunis /SOIE - Tunisia [email protected]

Ontologies have established themselves as a powerful tool to enable knowledge sharing, and a growing number of applications have benefited from the use of ontolo-gies as a means to achieve semantic interoperability among heterogeneous, distributed systems [1]. With the evolution of cooperative and distributed systems, and the emer-gence of the semantic Web, ontologies have become an indispensable resource. The number of ontologies available on the Web has also increased due to the appearance of several tools that assist users in creating their ontologies. This has posed problems of understanding and reuse of those resources already difficult to design. A solution was then proposed by the knowledge engineers namely modularization. Ontology modularization is crucial to support knowledge reuse on the ever increasing semantic Web [2]. However, modularization methods that serve the reuse goal are often in-tended for humans to assist them in building new ontologies, rather than for applica-tions that need only a relevant part of an existing ontology. Moreover, modules ob-tained are always subject to verification and maintenance by humans to validate the semantic consistency of their contents. Unlike previous studies, we investigate in this paper how a modularization based on semantic comparison, may provide a module directly reusable by the application that requests it. Our contribution is twofold. On the one hand, it allows an application to extract and use a module that covers a sub-domain from an ontology that covers a wider knowledge area, regardless of its struc-ture and the formalism with which it is expressed. On the other hand, the user is re-lieved from manually estimating the meaning of the components of the ontology, after the modularization process.

The modularization approach we propose is part of the decomposition approaches of monolithic ontologies [3,4]. It is an extraction method since it aims to extract a relevant ontology module. The method should allow the user to express its needs by entering the concepts which interest him. The result is a fragment composed of concepts and relations that are relevant to the module i.e., which are in strong semantic relationship with the concepts submitted by the user. We define a strong semantic relationship between two concepts, as one of the six logic functions as follows:

61

─ Identity relationship: it is a semantic relation between two concepts that have the same syntax, the same attributes and operations. Example: Identity (Person, Per-son).

─ Synonymy relationship: it is a semantic relation between two concepts that express the same meaning. Example: Synonymy (Person, Individual).

─ Classification Is-a relationship: two concepts where one is expressing a particular case of the other. Example: Is-a (Student, Person).

─ Homonymy relationship: the same concept can have two different meanings. Ex-ample: Homonymy (Bug, Bug). The first one means an insect. The second one means a fault in a computer system.

─ Equivalence relationship: a semantic relationship between two concepts that play the same role. Example: Equivalence (Teacher, Professor).

─ Antonymy relationship: is used between two concepts totally disjoint. Example: Antonymy (Registered, Visitor).

For example, in an ontology that describes the human anatomy, the user is only in-terested in the anatomy of the foot. The method should extract a coherent module, semantically rich on the foot, from the ontology of departure.

Our approach is based on two basic steps:

─ 1 step: Identifying concepts that are in strong semantic relationship with external concepts.

st

─ 2 step: composition of the module based on the concepts identified in Step 1. All concepts that appear in the definition of the concepts identified are considered part of the module.

nd

The goal is to allow a program to extract automatically a single part of an ontology without human intervention and without restrictions on the ontology structure. This will help programs to satisfy their requirements by reusing directly ontology portions.

References

1. Chainbi, W.: An Ontology Based Multi-Agent System Conceptual Model, In Special Issue of the International Journal of Computer Applications in Technology, Inderscience Pub-lishers, Vol. 31, Nos. 1/2, pp. 35-44, (2008)

2. Cuenca Grau,B., Horrocks, I., Kazakov, Y., Sattler, U.: Just the Right Amount: Extracting Modules from Ontologies. In International World Wide Web conference, (2007)

3. Seidenberg, J.: Web Ontology Segmentation: Extraction, Transformation, Evaluation. In Heiner Stuckenschmidt, Christine Parent, Stefano Spaccapietra (Eds.), Modular Ontologies concepts, Theories and Techniques for Knowledge Modularization, Springer-Verlag Ber-lin, Heidelberg, LNCS 5445, pages 211-243, (2009)

4. Stuckenschmidt, H., Klein, M.: Strutured-based Partitioning of large concept hierarchies. In Proceedings of the 3rd International Semantic Web Conference, (2004)

62

Distributed Reasoning with EDDLHQ+ SHIQ.

George A. Vouros1 and George M.Santipantakis2

1 Digital Systems, University of Piraeus, Greece [email protected] ICSD, University of the Aegean, Greece [email protected]

To deal with autonomous agents’ knowledge and subjective beliefs in open, het-erogeneous and inherently distributed settings, we need special formalisms thatcombine knowledge from multiple and potentially heterogeneous interconnectedcontexts. Each context contains a chunk of knowledge defining a logical theory,called ontology unit). While standard logics may be used, subjectiveness andheterogeneity issues have been tackled by knowledge representation formalismscalled contextual logics or modular ontology languages (e.g. [1] [2]).Nevertheless,in distributed and open settings we may expect that different ontology unitsshould be combined in many different, subtle ways without making any assump-tion about the disjointness of the domains covered by different units. To addressthis issue we need to increase the expressivity of the language used for definingcorrespondences.Towards this goal, we have been motivated to propose the rep-resentation framework EDDL

HQ+ SHIQ (or simply E − SHIQ).

The EDDLHQ+ SHIQ framework. Given a finite index set of units’ identifiers

I, each unit Mi consists of a TBox Ti, RBox Ri, and ABox Ai in the SHIQfragment of Description Logics[3].

Given an i ∈ I, let NCi , NRi and NOi be the sets of concept, role andindividual names respectively. For some R ∈ NRi

, Inv(R) denotes the inverserole of R and (NR ∪ {Inv(R)|R ∈ NR}) is the set of SHIQ-roles. The set ofSHIQ-concepts is the smallest set constructed by the constructors in SHIQ.Cardinality restrictions can be applied on R, given that R is a simple role. Aninterpretation Ii = 〈∆Ii

i , ·Ii〉 consists of a domain ∆Iii 6= ∅ and the interpretation

function ·Ii which maps every C ∈ NCito CIi ⊆ ∆Ii

i , every R ∈ NRito

RIi ⊆ ∆Iii × ∆Ii

i and each a ∈ NOi to an element aIi ∈ ∆Iii . Elements and

axioms in unit Mi are denoted by i : c. Each Tbox Ti contains generalizedconcept inclusion axioms, RBox Ri contains role inclusion axioms, and ABoxAi contains assertions for individuals and their relations [3].

Towards combining knowledge in different units, the proposed frameworkallows the connection of units via: (a) concept-to-concept subjective correspon-

dences [1] specified by onto-bridge rules i : Cw→ j : G, or into-bridge rules

i : Cv→ j : G, where i 6= j ∈ I. (b) Individual subjective correspondences

i : ai=7→ j : bj , where ai ∈ NOi and bj ∈ NOj . The above mentioned subjective

correspondences concern the point of view of Mj . (c) Link-properties [2](or ij-properties, i, j ∈ I ), which can be related via ij-property inclusion axioms, betransitive and, if they are simple, be restricted by qualitative restrictions. Thesets of ij -properties’ names, i.e. the sets εij , i, j ∈ I, are not necessarily pair-

63

wise disjoint, but disjoint with respect to NCi, and NOi

. A set of ij -propertiesconnecting concepts of Mi with concepts of Mj , is defined as the set Eij = εij ,i 6= j ∈ I, and in case i = j, it is the set Eij = εij ∪ {Inv(E)|E ∈ εji}, where εijis the set of (local to Mi) role names. ij -properties are being used for specifyingconcepts (so called i−concepts) in the Mi unit.

Transitive axioms are of the form Trans(E; (i, j)), where E ∈ Eij ∩ Eii, Eis transitive in Mi and transitive ij-property. Transitivity axioms and the finiteset of inclusion axioms for ij -properties form the ij -property box Rij (if i = j,Rii = Ri). The combined property box RBox R is a family of ij -propertyboxes. A combined TBox is a family of TBoxes T= {Ti}i∈I . A distributed ABoxA = {Ai}i∈I , includes a collection of individual correspondences, and propertyassertions of the form (a ·Eij · b), where Eij ∈ Eij . A distributed knowledge baseΣ is composed as Σ = 〈T,R,B,A〉, where B = {Bij}i 6=j∈I is the collectionof bridge rules between ontology units. Each Rij , is interpreted by a valuation

function ·Iij that maps every ij -property to a subset of ∆Iii × ∆

Ij

j . Let Iij =

〈∆Iii , ∆

Ij

j , ·Iij 〉, i, j ∈ I. It must be noted that, for a specific i ∈ I and a propertyE in the i-th unit, this property may be shared between different ij -propertyboxes (i.e. for different j’s). In this case, the denotation of E is

⋃j∈I E

Iij . A

domain relation rij , i 6= j from ∆Iii to ∆

Ij

j is a subset of ∆Iii × ∆

Ij

j , s.t. for

each d ∈ ∆Iii , rij(d) ⊆ {d′|d′ ∈ ∆

Ij

j }, and in case d′ ∈ rij(d1) and d′ ∈rij(d2), then d1 = d2. For a subset D of ∆Ii

i , rij(D) denotes ∪d∈Drij(d). Adomain relation represents only equalities, i.e. each d1 ∈ rij(d) is equal to theother individuals in rij(d). The distributed knowledge base is interpreted by aDistributed Interpretation, I s.t. I = 〈{Ii}i∈I , {Iij}i,j∈I , {rij}i 6=j∈I〉.

We have specified a sound and complete distributed Tableau algorithm thathas been implemented by extending the Pellet reasoner 3. The instance retrievalalgorithm for the framework has been presented in [4].

Acknowledgement: This research project is being supported by the project ”IRAK-LITOS II” of the O.P.E.L.L. 2007 - 2013 of the NSRF (2007 - 2013), co-funded by theEuropean Union and National Resources of Greece.

References

1. Borgida, A., Serafini, L.: Distributed description logics: Assimilating informationfrom peer sources. Journal of Data Semantics 1 (2003) 153–184

2. Parsia, B., Cuenca Grau, B.: Generalized link properties for expressive epsilon-connections of description logics. In: AAAI. (2005) 657–662

3. Baader, F.e.a., ed.: The Description Logic Handbook: Theory, Implementation, andApplications, Cambridge University Press (2003)

4. Santipantakis, G., Vouros, G.: Distributed Instance Retrieval in EDDLHQ+SHIQ Rep-

resentation Framework. In: Artificial Intelligence: Theories and Applications. Vol-ume 7297 of LNCS. Springer Berlin / Heidelberg (2012) 141–148

3 The full paper describing the framework and the tableau algorithm can be found inhttp://ai-lab-webserver.aegean.gr/gsant/ESHIQ.Report

64

Workshop on Modular Ontologies (WoMO) 2012 - KR-MED€¦ · Workshop on Modular Ontologies (WoMO)...

Documents

Transcript of Workshop on Modular Ontologies (WoMO) 2012 - KR-MED€¦ · Workshop on Modular Ontologies (WoMO)...