
Interpolation Methods for Assertion Mining

in Hybrid Knowledge Bases

Stefan Schlobach

Department of Computer Science, King's College London, Strand, London WC2R 2LS, United Kingdom

Email: [email protected]

July 19, 2001

Second version. Comments very welcome.

Abstract

Assertion mining is introduced as a framework for knowledge acquisition from assertive data in description logic based hybrid knowledge representation systems. We propose a rigid theoretical formalism to deal with noisy data based on rough set theory and define generalised decision concepts which cover exactly the secure, non-noisy data.

We extend the notion of logical interpolation for description logics and show how so-called ABox interpolants can be used for assertion mining. For this purpose we present algorithms to calculate interpolants based on description logic tableau calculi. This allows for discernibility based knowledge acquisition even if it is not possible to find explicit logical representations for the knowledge about individuals (like a most specific concept).

We finally present several possible assertion mining scenarios and discuss a number of issues related to the practical realisation of the described methods.

Technical Report TR-01-05, Department of Computer Science, King's College London, July 2001.


Contents

1 Introduction

2 Knowledge Representation with Description Logics

3 Assertion Mining
  3.1 Learning Criteria: Generalised Decision Concepts
    3.1.1 Domain Properties and Discerning Properties
    3.1.2 Generalised Decision Concepts
  3.2 Rough Set Theory for Assertion Mining
    3.2.1 Rough and Noisy Data
    3.2.2 Discernibility and Approximations
  3.3 Inductive Bias: Discerning Polarity and Common Vocabulary

4 Interpolation Methods
  4.1 Concept Interpolation
  4.2 ABox Interpolation
  4.3 Partial ABox Interpolants

5 Tableau Methods for Interpolation
  5.1 Tableau Calculi
  5.2 Algorithms for Concept Interpolation
  5.3 Algorithms for ABox Interpolation
    5.3.1 Partial ABox Interpolation
    5.3.2 ABox Interpolation
    5.3.3 Most Specific Concepts and ABox Interpolation

6 Interpolation based Assertion Mining
  6.1 Assertion Mining of Discernibility Concepts
    6.1.1 Non-Incremental Assertion Mining
    6.1.2 Revision of Knowledge Bases with new Individuals
  6.2 Optimality
  6.3 Assertion Mining Scenarios

7 Conclusion

A Rough Set Theory for Data Analysis

B Arrhythmia

C Proofs


List of Figures

1 Operators of ALC
2 TBox Axioms and ABox Assertions
3 Tableau Rules for ALC Concept Interpolation
4 Closure Rule for ALC Concept Interpolation
5 Preprocessing Rules for ALC Partial and ABox Interpolation
6 ECG Intervals and Waves
7 A Typical Arrhythmia Identification Procedure

List of Algorithms

5.1 concept_LI(P, N): Concept Interpolation
5.2 obj_rel_LI(a, B): a-related Concept Lyndon Interpolation
5.3 partial_LI(B, C, a, pol): Partial Interpolation with Polarity
5.4 ABox_LI(B, a, b): ABox Lyndon Interpolation
5.5 propagate_LI(B, a, b): Propagation for Preprocessing Complete ABoxes
6.1 disc_aboxmine(A, D, Dom): Discernibility based Assertion Mining
6.2 decision_gdc(Di, A, D): GDCs for a Decision
6.3 individual_gdc(a, Di, A, D): GDCs for an Individual
A.1 RDM: Algorithms for Data Mining with Rough Sets

List of Examples

2.1 A Knowledge Base Σarr for Arrhythmia Diseases
5.1 ABox Interpolation in the Absence of a Most Specific Concept


1 Introduction

Over the last decade the amount of unstructured information has grown significantly. Availability of such data does not automatically produce any useful information, and knowledge acquisition (KA) and knowledge representation (KR) are just two examples of efforts to structure data for further applications. KA attempts to automatically discover representations of element classes of data, mostly using relatively simple propositional representation languages. KR investigates mechanisms to formally represent knowledge and to provide sound reasoning for these representations using more expressive logical languages. For real life applications the integration of both approaches becomes crucial, and much work has been done to extend traditional learning methods to expressive representations and vice versa. Inductive Logic Programming (ILP) exemplifies such an attempt to include automated learning in fragments of first order representation systems.

Hybrid knowledge representation systems (hKRS) based on description logics (DL) extend the idea of semantic networks, aiming for sound and complete algorithms to reason about conceptual knowledge and its relation to data. A number of modern systems (DLP and FaCT [Patel-Schneider and Horrocks, 1999], RACER [Haarslev and Möller, 2001] or Wellington [Endriss, 2000]) provide optimised implementations to calculate concept hierarchies or realisations of data w.r.t. such a hierarchy for very expressive modal languages.

In the recent past, learning and knowledge acquisition in description logics have attracted much research. [Kietz and Morik, 1994] and [Alvarez, 2000] present unsupervised learning in DL, whereas [Badea and Nienhuys-Cheng, 2000] and [Rouveirol and Ventos, 2000] investigate links to ILP. Least common subsumer (lcs) learning has been the most popular approach [Cohen and Hirsh, 1994, Baader and Küsters, 1998]. lcs-learners try to find minimal descriptions for common properties of a set of individuals in the assertional component of a hKRS. In the description logic ALC this is trivially just the disjunction of the most specific concepts (msc). Unfortunately, the msc does not exist in DLs with existential quantification, and many papers discuss approximations of the knowledge about individuals in knowledge bases for languages with different expressivity [Küsters and Molitor, 2000].

This paper focuses on a different aspect of the knowledge acquisition process. As in lcs-learning we consider a supervised learning scenario: a knowledge base contains a number of classified individuals. For each of these classes (which we call decisions) we try to find terminological definitions which can be used to classify new data. But instead of searching for the smallest common subsumer, we search for the biggest concepts which discern positive and negative examples. This discernibility approach, which is based on methods in rough set data analysis [Skowron, 1993], can be used independently of lcs-approaches as long as the domain of the decisions is formally specified. The big advantage in this case is that the existence of the most specific concept is not required.

We present a supervised learning method based on logical interpolation. Given


some classified data (in the form of assertions) and terminological background knowledge, we calculate conceptual axioms. Each of these axioms is meant to provide an intensional definition for a class of examples. This process is called assertion mining. To restrict the search space for possible hypotheses we define a set of learning criteria. Furthermore, assertion mining is syntactically biased towards discerning vocabulary, i.e. elements of the language of the formal description of positive examples that explicitly discern them from negative examples.

Consider the following scenario: a hybrid knowledge base Σarr contains information about patients suffering from cardiac arrhythmia. A patient record might include general assertions about gender, age or habits and family information, e.g.

patient1 : Male ⊓ ¬Smoker ⊓ ¬Old ⊓ ∃has_relative.Arrhythmia,

but also technical details about ECG measures such as

(patient1, pw) : has_pwave;  pw : ¬OK;  patient2 : ∀has_pwave.OK,

terminological de�nitions

Tachycardic ⊑ Arrhythmia ⊓ ¬LowHeartRate,

and additional knowledge describing the patients' conditions as diagnosed by some medical experts:

patient1 : Tachycardic;  patient2 : Healthy.

Assertion mining is the search for terminological axioms which formally define a medical condition and possible diagnostic criteria, e.g.

Sinus_Bradycardic ≐ LowHeartRate ⊓ ∀qrs_cycle.Small ⊓ Smoker.

Such new knowledge can be extracted (i.e. mined) from the assertional information about already diagnosed patients in the knowledge base. If the quality of these axioms can be established (either automatically using statistical methods or, more likely, by consulting medical experts) they might allow previously undiagnosed patients to be classified with respect to the studied diseases.

ECG data is well suited for a description logic based learning and representation approach, because the data about derivations and waves is relational and, due to the ECG measuring method, there are numerous hidden interrelations between waves related to different derivations.¹ Description logic based hKRSs seem to be good candidates to represent such structures.² Appendix B contains a brief overview of ECG-based arrhythmia diagnosis to motivate our interest.

¹ Each derivation records the same events from different perspective angles.

² To keep things simple we do not consider the influence of terminologies on the mining process. We also introduce assertion mining for ALC, which is too weak to represent knowledge about cardiac arrhythmia properly because it cannot deal with numerical values. It is however relatively simple to extend the proposed methods to more complex languages like ALCR, which is an extension of ALC with attributes and simple predicates (=, ≤, ≥) over real numbers.


The advantage of our integrated approach is that the representation of both input and output is the same, and that standard reasoning mechanisms of hybrid knowledge representation systems can be used for preprocessing, assessment and finally classification of new patients. The integration of background knowledge is conceptually easy (although complexity increases significantly) and of particular importance for more complex mining domains.

This paper is organised in the following way: the main concepts of hybrid knowledge representation are introduced in Section 2. Assertion mining, our main learning theory, is presented in Section 3. Sections 3.1 and 3.3 present learning criteria and an inductive bias based on discernibility. In Section 3.2 we introduce rough set theory as the theoretical foundation of our treatment of noisy data. Algorithms for logical interpolation are introduced in Section 4, and in Section 6 we sketch the knowledge acquisition procedure based on assertion mining in Algorithm 6.1. We also discuss some further practical issues and open questions. For a better understanding we give a brief introduction to traditional rough set data analysis in Appendix A. Appendix C contains the proofs of the technical theorems and lemmas.

2 Knowledge Representation with Description Logics

For the reader who is unfamiliar with description logics we briefly introduce the basic concepts for the particular logic ALC [Schmidt-Schauss and Smolka, 1991], which is in some sense canonical in the DL family. Let NC be a set of concept names and NR be a set of role names. Furthermore let C and D be ALC concepts and R a role. ALC is inductively defined as the smallest language including NC and being closed under the constructors conjunction (C ⊓ D), disjunction (C ⊔ D), negation (¬C), qualified universal (∀R.C) and existential quantification (∃R.C). We will use the usual abbreviations ⊥ for A ⊓ ¬A and ⊤ for A ⊔ ¬A for an arbitrary concept A. Concept names will be called atoms; atoms or negated atoms, literals; and literals, ⊤, ⊥ and quantified formulas, modal literals.

A Tarski style set theoretical semantics is defined as an interpretation I = (U, ·^I), where U is a universe and ·^I an interpretation function mapping concept names to subsets of U and role names to binary relations over U, which is extended to the different language constructs as defined in Fig. 1.

Syntax    Semantics
C ⊓ D     C^I ∩ D^I
C ⊔ D     C^I ∪ D^I
¬C        U \ C^I
∃R.C      {d ∈ U | ∃e ∈ U : (d, e) ∈ R^I and e ∈ C^I}
∀R.C      {d ∈ U | ∀e ∈ U : (d, e) ∈ R^I ⇒ e ∈ C^I}

Figure 1: Operators of ALC
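The semantics of Fig. 1 can be evaluated directly over a finite interpretation. The following Python sketch is our own illustration (the tuple encoding of concepts and all names are assumptions, not part of the report); it computes the extension C^I of an ALC concept:

```python
# Concepts are encoded as nested tuples -- ("and", C, D), ("or", C, D),
# ("not", C), ("exists", R, C), ("forall", R, C) -- or as a concept-name
# string. An interpretation is a finite universe, a map from concept names
# to subsets of it, and a map from role names to sets of pairs.

def ext(concept, universe, conc, role):
    """Compute the extension of an ALC concept, following Fig. 1."""
    if isinstance(concept, str):                  # concept name (atom)
        return conc.get(concept, set())
    op = concept[0]
    if op == "and":
        return ext(concept[1], universe, conc, role) & ext(concept[2], universe, conc, role)
    if op == "or":
        return ext(concept[1], universe, conc, role) | ext(concept[2], universe, conc, role)
    if op == "not":                               # full negation: complement in U
        return universe - ext(concept[1], universe, conc, role)
    rel = role.get(concept[1], set())
    body = ext(concept[2], universe, conc, role)
    if op == "exists":                            # some R-successor in C
        return {d for d in universe if any((d, e) in rel for e in body)}
    if op == "forall":                            # all R-successors in C
        return {d for d in universe
                if all(e in body for e in universe if (d, e) in rel)}
    raise ValueError(f"unknown constructor {op!r}")
```

Note that this only evaluates a concept in one given interpretation; deciding consistency or subsumption requires reasoning over all models, which is what tableau calculi are for.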


Let NI be the set of individual names, and let a and b be in NI. A TBox T is a finite set of terminological axioms of the form C ⊑ D or C ≐ D. An ABox A is a finite set of concept (a : C) and role ((a, b) : R) assertions. We say that an individual (or element) a occurs in the ABox A (abbreviated a ∈ A) whenever it occurs in any of the assertions of A. The interpretation ·^I is extended to map individual names to elements of U. Due to this strict separation of conceptual knowledge and knowledge about individuals such a knowledge base Σ = (T, A) is called hybrid.

Syntax        Semantics
C ⊑ D         I ⊨ C ⊑ D   iff   C^I ⊆ D^I
C ≐ D         I ⊨ C ≐ D   iff   C^I = D^I
(a, b) : R    I ⊨ (a, b) : R   iff   (a^I, b^I) ∈ R^I
a : C         I ⊨ a : C   iff   a^I ∈ C^I

Figure 2: TBox Axioms and ABox Assertions

A model for an axiom or an assertion φ is an interpretation I such that I ⊨ φ as defined in Fig. 2. An interpretation I for a knowledge base Σ = (T, A) is a model for Σ if I is a model for all the axioms in T and A.

On the basis of this semantics, logical conclusions can be drawn from the represented knowledge. These conclusions include consistency or inconsistency of the knowledge, and relations between elements of the ABox and concepts (instance checking) or between concepts (subsumption). The reasoning services that will be used in this paper are formally defined as:

1. Concept consistency checking, written as Σ ⊨ C ≠ ⊥, the problem of checking whether there exists a model I of Σ such that C^I ≠ ∅. If the TBox is empty, we sometimes write C ≠ ⊥.

2. ABox consistency, written as Σ ⊨ A ≠ ⊥, the problem of checking whether there is a model for A with respect to Σ. If the TBox is empty, we sometimes abbreviate A ≠ ⊥.

3. Subsumption, written as Σ ⊨ C ⊑ D, where C^I ⊆ D^I for all models of Σ. We abbreviate Σ ⊨ C = D if both Σ ⊨ C ⊑ D and Σ ⊨ D ⊑ C.

4. Concept subsumption, written as (∅, ∅) ⊨ C ⊑ D, is a special case of general subsumption. In this case we will also write just C ⊑ D.

5. Instance checking (d ∈Σ C), where d^I ∈ C^I for all models of Σ. If the TBox is empty we will also write d ∈A C.

6. Retrieval (retrieve(C)): for a concept C, find all the individuals a in A such that a ∈Σ C.

It is a standard result that in a hKRS based on description logics with full negation, the reasoning processes concept consistency, subsumption, concept


Example 2.1 A Knowledge Base Σarr for Arrhythmia Diseases

If we use the knowledge base Σarr = (Tarr, Aarr), there are some simple but non-trivial examples of the reasoning processes:

Aarr = { patient1 : Male ⊓ ¬Smoker ⊓ ∀has_relative.Tachycardic,
         (patient1, patient1) : has_relative,  patient1 : ∀has_pwave.OK,
         (patient2, pw) : has_pwave,  pw : ¬OK,  patient2 : Arrhythmia }

Tarr = { Tachycardic ⊑ Arrhythmia ⊓ ¬LowHeartRate,
         Hypertrophic ≐ Arrhythmia ⊓ ∃has_pwave.¬OK }

1. Concept consistency: Σarr ⊨ Tachycardic ⊓ LowHeartRate = ⊥.

2. ABox consistency: Σarr ⊨ A ≠ ⊥ but Σarr ⊨ A ∪ {patient1 : ¬Arrhythmia} = ⊥.

3. Subsumption: Σarr ⊨ ∀has_relative.Tachycardic ⊑ ∀has_relative.¬LowHeartRate.

4. Concept subsumption: ∀has_pwave.OK ⊓ ∃has_pwave.⊤ ⊑ ∃has_pwave.OK.

5. Instance checking: patient1 ∈Σarr Arrhythmia.

6. Retrieval: retrieve(Arrhythmia) = {patient1, patient2},

where patient1, patient2 ∈ NI; has_relative, has_pwave ∈ NR; Male, Smoker, OK, Arrhythmia, Tachycardic, LowHeartRate, Hypertrophic ∈ NC.

subsumption and instance checking can be reduced to ABox consistency using the following equivalences:

Σ ⊨ C ≠ ⊥   iff   Σ ⊨ {a : C} ≠ ⊥   for any a ∈ NI

Σ ⊨ C ⊑ D   iff   Σ ⊨ {a : C ⊓ ¬D} = ⊥   for any a ∈ NI

C ⊑ D   iff   (∅, ∅) ⊨ {a : C ⊓ ¬D} = ⊥   for any a ∈ NI

d ∈Σ C   iff   Σ ⊨ A ∪ {d : ¬C} = ⊥
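These equivalences become directly implementable once an ABox-consistency procedure is available. The sketch below is our own illustration: `consistent(tbox, abox)` stands in for a tableau-based decision procedure (not given here) and is passed in as a parameter; the ABox encoding and all names are assumptions.

```python
# An ABox is modelled as a set of (individual, concept) assertions, with
# concepts in the illustrative tuple syntax ("and", C, D) / ("not", C).
# "a" plays the role of a fresh individual name from NI.

def concept_consistent(consistent, tbox, C):
    # Sigma |= C != bottom   iff   Sigma |= {a : C} != bottom
    return consistent(tbox, frozenset({("a", C)}))

def subsumes(consistent, tbox, C, D):
    # Sigma |= C <= D   iff   {a : C and-not D} is inconsistent w.r.t. Sigma
    return not consistent(tbox, frozenset({("a", ("and", C, ("not", D)))}))

def instance_of(consistent, tbox, abox, d, C):
    # d in_Sigma C   iff   A together with {d : not C} is inconsistent
    return not consistent(tbox, frozenset(abox) | {(d, ("not", C))})
```

The point is that one sound and complete ABox-consistency procedure suffices to obtain all the other reasoning services.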

Furthermore Nebel [Nebel, 1990] has shown that the ABox does not play a role when checking for concept subsumption and consistency:

(T, A) ⊨ C ⊑ D   iff   (T, ∅) ⊨ C ⊑ D

(T, A) ⊨ C ≠ ⊥   iff   (T, ∅) ⊨ C ≠ ⊥

We will make constant use of these properties without mentioning them.

Assertive knowledge about individuals of a domain is usually incomplete, which is reflected in the so-called Open World Assumption (OWA) underlying most hybrid knowledge representation systems. Under such an assumption we cannot assume that a missing assertion a : C ∉ A implies a ∈Σ ¬C.

Concepts can be ordered according to the subsumption hierarchy, which defines a partial order. A concept C is minimal in a set S of concepts if and only if Σ ⊨ C ⊑ D for all D ∈ S; it is optimal if there exists no concept


D such that Σ ⊨ C ⊑ D. The most specific concept msc(a) for an ABox instance a is minimal in the set of concepts which instantiate a. For ALC-ABoxes msc(a) does not necessarily exist for all a ∈ A. This is easy to see considering patient1 in Σarr as defined in Example 2.1, which is an instance of all concepts with an arbitrary number of existential quantifiers over the has_relative relation: patient1 ∈Σ ∃has_relative.∃has_relative. … .Arrhythmia.

A TBox is called unfoldable if and only if all the axioms are non-cyclic, the concepts on the left hand side of the axioms are atomic, and any concept occurs only once on the left hand side of an axiom. It is possible to transform (or "unfold") any knowledge base with an unfoldable TBox into an equivalent knowledge base with an empty TBox. To simplify the presentation of the knowledge acquisition process we will in the following only consider knowledge bases with empty TBoxes. As the TBox axioms in Σarr are unfoldable, we will continue the presentation of the arrhythmia example as if T were empty.
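Unfolding itself is a simple exhaustive substitution. A hedged sketch, reusing the illustrative tuple encoding of concepts (our own, not the report's); acyclicity of the TBox is what guarantees that the recursion terminates:

```python
# defs maps defined atomic concept names to their definitions, i.e. an
# unfoldable TBox of axioms  A = C  with unique, acyclic left-hand sides.
# Unfolding substitutes definitions until no defined name remains.

def unfold(concept, defs):
    if isinstance(concept, str):                  # atom: substitute if defined
        return unfold(defs[concept], defs) if concept in defs else concept
    op = concept[0]
    if op == "not":
        return ("not", unfold(concept[1], defs))
    if op in ("and", "or"):
        return (op, unfold(concept[1], defs), unfold(concept[2], defs))
    if op in ("exists", "forall"):                # role names are not unfolded
        return (op, concept[1], unfold(concept[2], defs))
    raise ValueError(op)
```

Primitive axioms A ⊑ C can be handled beforehand by rewriting them as A ≐ A′ ⊓ C for a fresh name A′.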

3 Assertion Mining

Assertion mining (ABox mining) was introduced in [Schlobach, 2000]. The setting is the following: we assume that the ABox of a knowledge base consists of a significant amount of possibly noisy information about individuals, a description of the knowledge about the concrete elements in the world. The mining task is to extract terminological knowledge from this data.

For this purpose we consider a supervised learning approach on data which is classified into decision classes. Here classification corresponds to instance checking with respect to a set D = {D1, …, Dn} of concepts (which we call decisions). We sometimes refer to D as the DBox. Each individual o in an ABox A which is an instance, o ∈Σ Di, of at least one (but possibly more) decisions is called classifiable. Let class(A) denote the set of all classifiable individuals in A. In Σarr the ABox individuals patient1 and patient2 are classifiable w.r.t. a DBox Darr = {Tachycardic, Hypertrophic}; pw is not.

ABox mining is now the search for a formal definition for each of the decisions in D, which might eventually be added to the TBox after being evaluated and assessed. In the process of the generalisation of a decision Di, we will call the instances of Di in A the positive examples and all instances of the remaining decisions negative examples. Decisions are usually, but not necessarily, atomic. Instead of mining Σarr for Tachycardic and Hypertrophic, one could refine preliminary definitions Arrhythmia ⊓ ¬LowHeartRate and Arrhythmia ⊓ ∃has_pwave.¬OK to the same effect using the very same methods. It might be worth pointing out that this definition does not exclude that an individual could be both a positive and a negative example.

Definition 3.1 Assertion mining is defined as the following task:


• Input: An ABox A classified by a set of decisions D. A possibly empty TBox T containing background knowledge. T and A must be consistent.

• Output: TBox axioms Di ≐ LDi for the decisions Di ∈ D, where LDi is a new concept, which is learned with respect to some given learning criteria and a predefined inductive bias.

An important problem in learning and knowledge discovery from data concerns the quality of the data. As soon as data becomes large enough to allow for meaningful mining it usually also becomes noisy, i.e. there is wrong and incomplete information. Some of the noise can be detected using standard description logic reasoning such as ABox and concept consistency checking, but additional methods are needed. Using rough set theory, we exclude from the mining process all individuals for which a logically similar individual is an instance of a different decision. For a detailed discussion of the nature of noise in ABoxes see Section 3.2.

In the remainder of this section we will define conditions, first on learned concepts with respect to the mining ABox, and secondly on the data which should be generalised. It will turn out that these conditions coincide.

3.1 Learning Criteria: Generalised Decision Concepts

Whether a learned concept LD can be used as a formal definition of a decision D depends on the knowledge which is represented in the ABox A, the TBox T and on some additional learning criteria. We will discuss several restrictions on the set of possible learned concepts with respect to the individuals in the ABox.

3.1.1 Domain Properties and Discerning Properties.

When the user of a knowledge acquisition tool chooses the decisions for the DBox, she will do so with a particular sub-domain of the elements in the knowledge base in mind. In the arrhythmia example it is not promising to mine for the concepts OK or Smoker; obviously Σarr is a knowledge base about medical conditions, and arrhythmic disorders in particular. By choosing Tachycardic and Hypertrophic as decisions, the mining domain is factually reduced to the domain of Arrhythmia.

There are now two major requirements on the newly learned TBox axioms D ≐ LD for a decision D. First, the concept LD must be a discerning description of D in the sense that it contains the particular features of D (and D only) which allow elements of D to be discerned from the elements of the other decisions. Secondly, LD must capture the main properties of the domain described by the set of decisions. These two requirements will be treated as different problems, and we will only discuss the first one in this paper in sufficient detail. More formally:

• Discerning Properties: defining concepts GDi for each decision Di which allow individuals inside Di and outside Di to be discerned using instance checking.


• Domain Properties: defining the common domain of the classified elements, i.e. a concept Dom which describes the properties which are common to all the elements in D.

In the arrhythmia domain there is only one important domain property, the fact that both decisions are arrhythmias. It is however a non-trivial process to define and find domain concepts, and the development of automated methods is ongoing research. We will briefly discuss the issue in Section 6.3, where we present different scenarios for assertion mining. In most cases, however (as is the case for Σarr), there are few general mutual properties for the decisions, and these can be specified by an expert "by hand". For this reason we will in the following focus on the first problem. When defining algorithms for assertion mining in Section 6 we will assume that there is a procedure get_domain_concept(D, Σ) which returns a domain concept for the DBox D with respect to the knowledge base Σ and which is given to Algorithm 6.1 as an argument.

3.1.2 Generalised Decision Concepts.

A minimal requirement for a learned concept LD to be useful as a terminological axiom defining the decision D is that the classification of the classified elements in the ABox is preserved, i.e. all instances of LD in A must be instances of D, a condition we call A-coveredness.

Definition 3.2 (Coveredness) A concept LD is called A-covered by a decision D with respect to a knowledge base Σ = (T, A) if for all classifiable ABox individuals a in class(A):

a ∈Σ LD ⇒ a ∈Σ D.

This is a standard condition which corresponds to the correctness of a rule in learning approaches for relatively simple input data structures like decision systems, as described in Appendix A.
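Over the finitely many classifiable individuals, Definition 3.2 is directly checkable given an instance-checking oracle. A sketch under our own illustrative conventions (`holds(a, C)` stands in for a ∈Σ C as decided by a DL reasoner; it is a parameter, not an implemented service):

```python
def is_covered(LD, D, classifiable, holds):
    """Definition 3.2: LD is A-covered by D iff every classifiable
    instance of LD is also an instance of D."""
    return all(holds(a, D) for a in classifiable if holds(a, LD))
```

Restricting the quantification to `classifiable` (i.e. class(A)) is exactly the restriction the definition makes; unclassified individuals are deliberately left out.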

But this condition already provokes an interesting question: what happens to all the individuals which are not classified, i.e. for which we do not know whether they are instances of the decision we are about to learn or of one of the other decisions? Should unclassified individuals be considered as counter-examples?

It is computationally impossible to work on all elements of large ABoxes. We therefore have to restrict the coveredness condition to classifiable individuals. The computational necessity to restrict the learning conditions to classifiable individuals is the major argument for separating domain from discerning conceptual knowledge, because specialised algorithms should be defined to tackle these two very different problems. It also does not seem too problematic to allow the odd unclassified individual to instantiate a learned concept, because it would be either a strange coincidence or point to an interesting mining result if a previously unclassified individual became classified after learning of both domain and discernibility concepts.


Another problem is related to the Open World Assumption and the revision of learned concepts for growing knowledge bases. The Open World Assumption implies that we cannot conclude anything from the fact that information is missing. For this reason we will have to adapt our learned concepts as soon as the knowledge base is updated with new information. There are two interesting cases:

1. Assertions are added to the ABox A for a constant set of elements,

2. Assertions containing new classified individuals are added to A.

Let us come back to the knowledge base Σarr: assume that there is a concept LD = Male ⊔ Smoker covered by the decision Tachycardic for the knowledge base Σarr. Obviously patient1 ∈Σ LD and patient2 ∉Σ LD. Nevertheless LD is no reliable definition for Tachycardic, because patient2 might well be male as well. If Σarr is extended by an assertion patient2 : Male, LD will no longer be covered by Tachycardic.

A robust way to deal with this problem is to require LD to be exclusive, i.e. that o ∈Σ ¬LD for all individuals o ∈ A which are not instances of D.

Definition 3.3 (Exclusiveness) A concept LD is exclusive with respect to a decision D if for all classifiable ABox individuals a in class(A):

a ∉Σ D ⇒ a ∈Σ ¬LD.

Exclusiveness implies coveredness, since a ∈Σ ¬LD implies that a ∉Σ LD. Exclusiveness grants cumulativity of the learning process, i.e. a TBox axiom LD ≐ D learned from an ABox A is also correct for a consistent ABox A′ ⊇ A. Things are slightly more complicated if assertions containing new elements are added to Σ. We will discuss this issue briefly in Section 6.1.2. A simple logical transformation yields the following alternative definition of exclusiveness.

Proposition 3.4 (Exclusiveness) A concept LDi which is A-covered by a decision Di is exclusive with respect to a set D = {D1, …, Dn} of decisions if for all classified ABox individuals a in A:

a ∉Σ Di ⇒ ∀Dk ∈ D, k ≠ i : (a ∈Σ Dk ⇒ a ∈Σ ¬LDi).

A learned concept LD should also be supported by the fact that there is at least one example in the ABox which is an instance of LD, and we define a witness for LD as an individual a ∈Σ D in class(A) such that a ∈Σ LD. We can now finally define generalised decision concepts.

Definition 3.5 (Generalised Decision Concepts) A generalised decision concept (GDC) for a decision Di with respect to a knowledge base Σ = (T, A) and a set of decisions D is a concept which is exclusive w.r.t. Di and for which a


witness exists in A. Given the definition of exclusiveness, the set of all GDCs is defined as:

GΣ(Di) = {X ∈ DL | ∃o ∈ A : o ∈Σ X and
          ∀a ∈ A : a ∉Σ Di ⇒ ∀Dk ∈ D, k ≠ i (a ∈Σ Dk ⇒ a ∈Σ ¬X)}.

We usually omit the index Σ from GΣ(Di) if no confusion is likely to arise.

A number of concepts exist for Σarr (e.g. Male or ∃has_relative.Arrhythmia) which are covered by Tachycardic. There are also several GDCs, among them ∀has_pwave.OK and ∀has_pwave.OK ⊔ ¬Arrhythmia.
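Definitions 3.3 and 3.5 admit the same kind of direct check over class(A) as coveredness. The sketch below is again our own illustration (`holds` stands in for instance checking, negation is written ("not", C)); note that the report's actual route to finding GDCs is via interpolation, not enumeration:

```python
def is_exclusive(X, Di, classifiable, holds):
    """Definition 3.3: every classifiable non-instance of Di is an
    instance of the negation of X."""
    return all(holds(a, ("not", X))
               for a in classifiable if not holds(a, Di))

def is_gdc(X, Di, classifiable, holds):
    """Definition 3.5: X is a GDC for Di iff it is exclusive w.r.t. Di
    and some positive example witnesses it."""
    has_witness = any(holds(a, Di) and holds(a, X) for a in classifiable)
    return has_witness and is_exclusive(X, Di, classifiable, holds)
```

Because of the Open World Assumption, `not holds(a, Di)` and `holds(a, ("not", Di))` are not interchangeable for a real reasoner; the code above follows the definitions literally.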

A further condition springs to mind: shouldn't a learned concept L_D also be complete in the sense that it covers all the positive examples in D? But here we touch the issue of uncertain and noisy data, because in fact we would only like to learn from safe instances of D. These are the elements for which we can be relatively sure that the logical information contains important knowledge making the positive examples distinctly different from all the negative examples, rather than being wrongly classified or insufficiently specified. We will leave the discussion for the moment and define total GDCs using a predicate safe(a), which we will define more formally later on.

Definition 3.6 (Total GDCs) A GDC L_D is called total with respect to a knowledge base Σ = (T, A) and a decision D if for all classifiable and safe ABox individuals a in A:

a ∈_Σ D ⇒ a ∈_Σ L_D.

Our solution for defining such a safety predicate is based on an adaptation of rough set theory to hybrid knowledge representation systems, which we present in the following section.

3.2 Rough Set Theory for Assertion Mining

Rough set theory has played a successful role in the theoretical investigation of vague knowledge and noisy data, and we will adapt the methodology to hybrid knowledge bases, thus providing conditions on the data which we want to use for the mining process. We will show that rough set theory offers an alternative characterisation of the set of generalised decision concepts. For a brief introduction to traditional rough set data analysis see Appendix A or an introductory paper like [Pawlak, 1998, Skowron, 1993].

3.2.1 Rough and Noisy Data

Rough set theory deals with noisy data by comparing the classification of similar individuals. We will briefly discuss possible forms of noise in ABox data. The idea behind the analysis of these cases is based on a black sheep assumption: we assume that the overall quality and size of the ABox is such that noisy data is the exception.


Comparison of properties of individuals (which are mostly correct) thus becomes our prime tool against noise.

1. Correct Classification: a ∈_Σ D.

(a) Incomplete Specification: everything learned from a should also be correct for a if more information about a becomes available. (Exclusiveness)

(b) Wrong Specification:

i. b ∈_Σ D for all correctly specified similar instances b means that the mistake in the specification is not crucial. Nothing has to be done.

ii. b ∉_Σ D for some b similar to a implies that a (as well as b) is an insecure example for D. Unfortunately, we have no logical means to find out whether a or b is erroneous.

iii. There are no correctly specified similar individuals. There is nothing we can do using comparison alone; statistical methods have to be used, because it should be noticeable that the learned concept is supported by only a few elements.

2. Incorrect Classification:

(a) Overspecification: a ∈_Σ D, but should not be.

i. b ∈_Σ D for all correctly classified similar instances b means that the mistake in the classification is not crucial. This case is impossible to detect and seems acceptable, because the wrong classification of a does not contradict the correct classification of all individuals b similar to a.

ii. b ∉_Σ D for some b similar to a again implies that a (as well as b) is an insecure example for D. Unfortunately, we have no logical means to find out whether a or b is erroneous.

iii. There are no correctly specified similar individuals. There is nothing we can do using comparison alone; statistical methods have to be used, because it should be noticeable that the learned concept is supported by only a few elements.

(b) Incomplete classification: a ∉_Σ D, but should be. Since a is classifiable, there is a decision D_k such that a ∈_Σ D_k.

i. There are hidden hierarchies D ⊑ D_k or D_k ⊑ D: these cases can only be investigated by comparison of all instances of D with those of D_k, an issue which has to be addressed in further research. The maximal goal in this setting is to detect the possibility of such hierarchies in the process of assertion mining.

ii. No hierarchies, but D and D_k are not disjoint. There is no logic-based way to find out that a should be an instance of D ⊓ D_k by comparison with other instances of D or D_k. If there is a b ∈_Σ D which is similar to a, then a and b are insecure examples.


iii. Disjoint decisions D and D_k. There should not be a similar element b ∈_Σ D. In this case a cannot be considered as a witness for D.

Rough set theory offers a mechanism to identify data as good examples (lower approximation), bad examples (outside region) or insecure examples (boundary region). Item 1a requires the learned results to preserve correctness with growing data. This is reflected in our choice of a very strong discernibility relation.

There is an interesting final point to be made about this analysis: since we allow for multiple classification, it might happen that the same object is an instance of two or more decisions. On the other hand, consider the case where we have different but similar objects which belong to different decisions. Given the above arguments, the identification of cases where two individuals are in different decisions is the primary protection mechanism against noisy data. In the second case rough set theory for assertion mining therefore classifies the examples into the boundary region, whereas the example in the first case will be considered secure.

3.2.2 Discernibility and Approximations

There are some significant differences to traditional rough set theory, which is usually applied in data analysis of simple decision systems, i.e. complete sets of unique attribute-value pairs. Some of them have been made explicit in the analysis of noisy data. The Open World Assumption means that we have to deal with incomplete information and have to accept it as part of the knowledge representation philosophy. This means that we cannot adopt a traditional discernibility approach based on logical equivalence. Hierarchical decisions and data representation allow for individuals to be instances of multiple decisions. According to Definition 3.1, for assertion mining we decided to mine for decisions and not for equivalence classes of objects w.r.t. decisions. An alternative approach will be analysed in one of the scenarios in Section 6.3.

Definition 3.7 (Strong Discernibility/Weak Indiscernibility) Two individuals a and b are weakly indiscernible if for all concepts C ∈ DL:

a S_Σ b iff (a ∈_Σ C ⇒ b ∉_Σ ¬C)

The dual notion is strong discernibility for a and b in a knowledge base Σ = (T, A), defined as:

a S̃_Σ b iff there exists a concept C such that a ∈_Σ C and b ∈_Σ ¬C.
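The two relations can be read operationally. The following is a minimal Python sketch, not the paper's implementation: instance checking is replaced by a toy oracle over a finite candidate set of concepts, and all names (`holds`, `kb`, the individuals) are illustrative assumptions; a real system would query a DL reasoner instead.

```python
# Sketch of Definition 3.7 over a finite candidate set of concepts.
# `kb` maps each individual to the set of (possibly negated) concepts it
# is known to instantiate; `holds` stands in for DL instance checking.

def holds(kb, individual, concept):
    # toy oracle: membership in an explicitly stored set of concepts
    return concept in kb[individual]

def weakly_indiscernible(kb, a, b, concepts):
    """a S_Sigma b: no candidate C with a in C and b in not-C."""
    return not any(holds(kb, a, c) and holds(kb, b, ("not", c))
                   for c in concepts)

def strongly_discernible(kb, a, b, concepts):
    """a S~_Sigma b: some candidate concept separates a from b."""
    return not weakly_indiscernible(kb, a, b, concepts)

# toy data: ann is known not to be Tall, bob is known to be Tall
kb = {"ann": {"Male", ("not", "Tall")},
      "bob": {"Male", "Tall"}}
candidates = ["Male", "Tall"]
```

Note that the relation as defined is directional: with the toy data, no candidate concept separates ann from bob (there is no stored negative information about bob), while Tall does separate bob from ann.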

Based on discernibility we can now define approximations of decisions. The lower approximation contains all elements of the ABox A for which all effectively indiscernible individuals (using instance checking on arbitrary ALC concepts) are also in the decision. This implies a high degree of certainty in the data, and it will be the elements of the lower approximation of a decision which are used for generalisation.


If there is a logically "similar", i.e. indiscernible, individual to an individual o which is not in D, then o is not in the lower approximation. The upper approximation for a decision D collects all elements for which there are indiscernible individuals which are in D. We will not generalise upper approximations but calculate their cardinality to assess the accuracy of a learned concept, which will be defined as the ratio of the cardinalities of the lower and upper approximation.

Definition 3.8 (Rough Approximation of Decisions) The lower approximation of a decision D ∈ D with respect to a knowledge base Σ = (T, A) using the tolerance relation S_Σ is defined as follows:

S̲_Σ(D) = {a ∈ A | ∀b ∈ class(A): a S_Σ b ⇒ b ∈_Σ D};

the upper approximation as:

S̄_Σ(D) = {a ∈ A | ∃b ∈ class(A): a S_Σ b & b ∈_Σ D}.

With strong discernibility the lower approximation is equivalent to:

S̲_Σ(D) = {a ∈ A | ∀b ∈ class(A): b ∉_Σ D ⇒ (∃C: a ∈_Σ C & b ∈_Σ ¬C)}.

The boundary region is defined as bound(D) = S̄_Σ(D) \ S̲_Σ(D). If the boundary region is non-empty the decision will be called rough, otherwise crisp.
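The three sets of Definition 3.8 translate directly into set comprehensions. The sketch below assumes the tolerance relation S_Σ and decision membership are given as predicates (`indisc`, `in_D`); these names and the toy data are ours, and in practice both would be backed by a DL reasoner.

```python
# Lower/upper approximation and boundary region of Definition 3.8.
# `indisc(a, b)` plays the role of a S_Sigma b, `in_D(a)` of a in D.

def lower_approximation(individuals, in_D, indisc):
    # all S_Sigma-related individuals must be in the decision
    return {a for a in individuals
            if all(in_D(b) for b in individuals if indisc(a, b))}

def upper_approximation(individuals, in_D, indisc):
    # at least one S_Sigma-related individual is in the decision
    return {a for a in individuals
            if any(in_D(b) for b in individuals if indisc(a, b))}

def boundary(individuals, in_D, indisc):
    return (upper_approximation(individuals, in_D, indisc)
            - lower_approximation(individuals, in_D, indisc))

# toy data: decision D = {a, b}; b and c are indiscernible
individuals = {"a", "b", "c"}
in_D = lambda x: x in {"a", "b"}
indisc = lambda x, y: x == y or {x, y} == {"b", "c"}
```

With the toy data, a ends up in the lower approximation, while b and c fall into the boundary region, so D is rough.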

Obviously a ∈ S̲_Σ(D) ⇒ a ∈_Σ D follows by reflexivity of S_Σ for all a ∈ class(A). We can now show that there is a GDC in G(D) for each individual in the lower approximation and vice versa, restricting the set of GDCs to the non-vague, i.e. secure, knowledge in A. The proofs can be found in Appendix C.

Proposition 3.9 For all classifiable individuals a ∈ A, the existence of a GDC X in G(D) for a decision D such that a ∈_Σ X implies that a ∈ S̲_Σ(D).

Proposition 3.10 For all elements a in the lower approximation S̲_Σ(D) for a decision D there is a GDC L_D in G(D) such that a ∈_Σ L_D.

This shows that our generalised decision concepts instantiate exactly the elements in the lower approximation. These are important results, because according to rough set theory the elements in the lower approximation of a decision D are the most secure data available for generalisation. We therefore define a safety predicate safe(a) using the lower approximation: safe(a) iff a ∈ S̲_Σ(D). This definition and Propositions 3.9 and 3.10 provide an alternative interpretation for approximations in assertion mining, namely as sets of concepts instead of sets of individuals.

Theorem 3.11 If the safety predicate is defined by safe(a) iff a ∈ S̲_Σ(D), then any concept

L_D := ⊔_a G_a,

where for each witness a ∈ class(A) a concept G_a ∈ G(D) with a ∈_Σ G_a is chosen, is a total generalised decision concept for the decision D and

a ∈_Σ L_D iff a ∈ S̲_Σ(D).

The significance of this theorem is that it combines a rigid formal definition of security in data with a set of concepts which can effectively be constructed and which instantiate exactly the safe examples. Furthermore, we get some global statistical accuracy and support measures for GDCs for free.

Definition 3.12 (Accuracy and Support) Global accuracy and support for a total generalised decision concept G constructed for a decision D as in Theorem 3.11 are defined as:

acc(G) = |S̲_Σ(D)| / |S̄_Σ(D)|        supp(G) = |S̲_Σ(D)| / |class(A)|
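Both measures are plain cardinality ratios over the sets of Definition 3.8; a minimal sketch (function names are ours, not from the paper):

```python
# acc(G) = |lower| / |upper|, supp(G) = |lower| / |class(A)|
# per Definition 3.12; inputs are finite sets of individuals.

def accuracy(lower, upper):
    return len(lower) / len(upper)

def support(lower, classified):
    return len(lower) / len(classified)
```

A crisp decision (empty boundary, so lower and upper coincide) has accuracy 1; a decision whose lower approximation covers one of four elements of its upper approximation has accuracy 0.25.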

Global accuracy and support can help to assess the quality of the data which was used for the mining. They do not provide statistical information about the quality of each GDC learned for a witness, and with this about the credibility of a single witness. Additional statistical methods have to be included in the assertion mining process to increase the chance of identifying wrongly classified or specified individuals.

A further interesting application of rough set theory for assertion mining is based on the close relation between approximations and modal logic operators [Liau, 2000]. It is easy to see that lower and upper approximations for the tolerance relation S_Σ correspond to KTB box and diamond operators. This allows for consistency checks of the mining results using traditional modal logic model checking. We are currently investigating the prospects of such an approach.

Theorem 3.11 identifies all possible total generalised decision concepts, but there are still infinitely many possible total GDCs. In learning terms, we have defined the hypothesis space [Mitchell, 1997]. To learn from the examples we need to perform an inductive leap, i.e. we have to deliberately choose a finite set of interesting concepts which are efficiently calculable. This choice is called the inductive bias.

3.3 Inductive Bias: Discerning Polarity and Common Vocabulary

There are different aspects related to the choice of the inductive bias underlying our approach, and we will discuss them briefly. In some of the cases we also discuss alternatives.

• Individual abstraction: We use the fact that DLs have language features (quantification) to reason about individuals in an abstract way, thus generalising from particular examples to more general facts (e.g. the existence of a role successor).


• Common language with discerning polarity: We take the designer of the knowledge base and the witnesses they provided seriously and relate the learned concepts to the vocabulary of each example. The GDCs should consist of the atoms which are used in both the examples and the counterexamples, and which are used in the opposite way (i.e. with discerning polarity) to the way they were used in the examples. This is not discussed in the literature, because it automatically applies to most learning methods or is enforced using algorithms for missing attribute values.

• Optimality: In addition to the syntactical restrictions there is a semantical restriction based on optimality w.r.t. subsumption.³ Least common subsumer learning [Frazier and Pitt, 1996, Cohen and Hirsh, 1994, Baader and Küsters, 1998] focuses on common properties of the examples, whereas GDCs formalise the discerning properties. This requires differences in the choice of semantic bias.

The fact that assertion mining is based on discernibility influences the choice of polarity of vocabulary as a major learning bias.

In the remaining sections we will introduce interpolation for hybrid knowledge representation systems and show how to use algorithms for interpolation to calculate total GDCs. We provide correctness and completeness results for the tableau-based Algorithms 5.1, 5.3 and 5.4, which calculate interpolants. Finally, we show that these interpolants are generalised decision concepts and present Algorithm 6.1 to calculate a total generalised decision concept based on the discussion in this section.

4 Interpolation Methods

Having identified secure learning targets and an inductive bias to restrict the hypothesis space, it remains to provide effective algorithms to find some of these total GDCs. Algorithm 6.1 sketches the mining process from input (a knowledge base and a set of decisions) to the final output of a set of new terminological axioms. It is based on ABox interpolation, which is a generalisation of concept interpolation. We will now introduce different types of interpolation for description logics.

4.1 Concept Interpolation

Craig proved interpolation for first order logic in 1957. Since then many papers on interpolation have been published, for example [Lyndon, 1959] on first-order Lyndon interpolants or [Kracht, 1999, Marx, 1999] on modal logic. Traditionally, interpolation is defined for consequence relations. Let us consider the definition for the local consequence relation ⊢_ML in classical modal logic.

φ ⊢_ML ψ iff for all models M and worlds w: M, w ⊨ φ ⇒ M, w ⊨ ψ.

³In fact, we mine for example-related optimal GDCs, which are not actually optimal. See [Schlobach, 2001] for details.


The corresponding notion in description logics with empty knowledge bases is:

C ⊢_DL D iff for all interpretations I and individuals u ∈ U: u ∈ C^I ⇒ u ∈ D^I.

But the semantics of description logics as defined in Section 2 implies that C ⊢_DL D if and only if C ⊑ D. In the following we therefore consider concept subsumption as a consequence relation.

Definition 4.1 (Interpolation) Let L(C) be the language related to a concept C. Concept L-interpolation holds for two concepts P and N with P ⊑ N if there exists a concept I such that P ⊑ I and I ⊑ N and L(I) ⊆ L(P) ∩ L(N).

This definition depends on the particular definition of the language L. According to our learning bias, the language related to an ALC concept C is defined to include information about the polarity of atoms.

Lyndon Interpolation. Interpolation which includes polarity information was introduced in [Lyndon, 1959] for first order logic. By polarity one usually understands information about atoms: whether they occur negated or not in a formula, where multiple negations have to be considered. We have to extend the traditional rule (an even number of negations = positive polarity) with information about the quantifiers.

Informally, the language occ(C) is a set of concept names occurring in C, labelled with the polarity and the quantifier depth. A concept name A has positive (negative) polarity, as usual, if it is embedded in an even (odd) number of negations. The quantifier depth describes the sequence of role names over which the concept C is quantified.

Definition 4.2 (Language related to Concepts) For roles R_i ∈ N_R let S be the set of sequences R_1 … R_n, called quantifier depths. Let ε denote the quantifier depth of length 0. If S is the set of all elements e_1^{s_1}, …, e_m^{s_m} labelled with the respective quantifier depths s_j, then SR will denote the set S where each element is now labelled with s_j R.

The set Occ = N_C × {+, −} × S is a relation between concept names, polarities and quantifier depths. We define two mappings occ and ~occ from concepts to 2^Occ recursively:

occ(C)^s = {(A, +)^s}, ~occ(C)^s = {(A, −)^s} if C = A and A is an atom
occ(C)^s = ~occ(D)^s, ~occ(C)^s = occ(D)^s if C = ¬D
occ(C)^s = occ(C_1)^s ∪ occ(C_2)^s if C = C_1 ⊓ C_2 or C = C_1 ⊔ C_2
~occ(C)^s = ~occ(C_1)^s ∪ ~occ(C_2)^s if C = C_1 ⊓ C_2 or C = C_1 ⊔ C_2
occ(C)^s = occ(D)^{sR} if C = ∃R.D or C = ∀R.D
~occ(C)^s = ~occ(D)^{sR} if C = ∃R.D or C = ∀R.D

We will abbreviate occ(C)^ε with occ(C).

Take the concept C = ∀R.((¬∃R.∀S.C) ⊓ D). The concept-related language for C is defined as occ(C) = {(C, −)^{RRS}, (D, +)^R}.
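Definition 4.2 can be read as a simple recursion over the concept structure. The sketch below encodes ALC concepts as nested tuples (an encoding chosen here purely for illustration) and computes occ(C); the dual mapping ~occ(C) is obtained by starting with flipped polarity.

```python
# occ(C): concept names with polarity and quantifier depth (Def. 4.2).
# Concepts: ("atom", A), ("not", C), ("and"/"or", C1, C2),
# ("exists"/"forall", R, C) -- a hypothetical encoding, not the paper's.

def occ(c, depth=(), pol="+"):
    op = c[0]
    if op == "atom":
        return {(c[1], pol, depth)}
    if op == "not":                      # negation flips polarity
        return occ(c[1], depth, "-" if pol == "+" else "+")
    if op in ("and", "or"):              # union over conjuncts/disjuncts
        return occ(c[1], depth, pol) | occ(c[2], depth, pol)
    if op in ("exists", "forall"):       # quantifiers extend the depth
        return occ(c[2], depth + (c[1],), pol)
    raise ValueError(f"unknown constructor {op!r}")

# C = forall R.((not exists R. forall S. C) and D), the example above
example = ("forall", "R",
           ("and",
            ("not", ("exists", "R", ("forall", "S", ("atom", "C")))),
            ("atom", "D")))
```

Here `occ(example)` yields the two labelled entries of the worked example: C with negative polarity at depth RRS and D with positive polarity at depth R.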


Definition 4.3 A concept I is a Concept Lyndon Interpolant for two concepts P and N with P ⊑ N iff P ⊑ I and I ⊑ N and occ(I) ⊆ occ(P) ∩ occ(N).

It is a well-known result that interpolation holds for the modal logic K and therefore also for concept subsumption in the description logic ALC. We will prove this fact by constructing concept interpolants using Algorithm 5.1.

4.2 ABox Interpolation

Traditionally, interpolation is defined for semantic consequence or implication, which corresponds to subsumption in the case of description logics. In this section we extend our approach to cover a more indirect sort of implication, where concepts are represented implicitly as the knowledge about an individual in the ABox.

Definition 4.4 (Knowledge about Individuals) Positive knowledge (or P-knowledge) about an individual a with respect to a knowledge base Σ is defined as the set of all concepts of which a is an instance. The dual notion is negative (or N-knowledge), which contains all the concepts a is definitely not an instance of:

P_Σ(a) = {C ∈ DL | a ∈_Σ C}

N_Σ(a) = {C ∈ DL | a ∈_Σ ¬C}.

P- and N-knowledge are dual notions in the sense that ¬C ∈ N_Σ(a) for all concepts C ∈ DL with C ∈ P_Σ(a), and vice versa. Both are infinite sets. P-knowledge is characterised by the smallest, i.e. most specific, concepts, N-knowledge by the biggest, i.e. most general, concepts. It is clear that these sets can usually not be represented explicitly, but we will use the notions to clarify the relations of individuals when defining ABox interpolation.

In order to be able to define an interpolation property on ABox individuals, we have to define how to relate the implicit knowledge about individuals using P-knowledge and N-knowledge as defined above. These mixed relations form the basis of assertion mining.

Definition 4.5 (Relating Individuals through P- and N-Knowledge) We define two relations ⊢ and ⊢_Σ based on the P-knowledge and N-knowledge about ABox individuals:

a ⊢ C iff C ∈ P_Σ(a)

a ⊢_Σ b iff ∃P ∈ P_Σ(a) and ∃N ∈ N_Σ(b) such that P ⊑ N

C ⊢_Σ a iff C ∈ N_Σ(a)


a ⊢_Σ b signifies that the positive knowledge about a is more specific than the negative knowledge about b. Similarly, C ⊢_Σ b means that C is more specific than the negative information about b.

If the most specific concepts for two individuals exist, the intuition underlying the ⊢ relations can be represented more directly. The following proposition shows this link and should help to better understand ABox interpolation.

Proposition 4.6 If the most specific concepts msc(a) and msc(b) exist for two individuals a and b with respect to a knowledge base Σ, the following simplifications hold:

a ⊢ C iff msc(a) ⊑ C

a ⊢_Σ b iff msc(a) ⊑ ¬msc(b)

C ⊢_Σ b iff C ⊑ ¬msc(b).

Both relations a ⊢_Σ b and C ⊢_Σ b link the content of information about ABox individuals using inconsistency; for example, a ⊢_Σ b if and only if a and b are instances of two mutually inconsistent concepts. We will explore this fact later on.

Proposition 4.7 a ⊢_Σ b if and only if P_Σ(a) ∩ N_Σ(b) ≠ ∅.

It is now easy to see that whenever a ⊢_Σ b there always exists a concept C such that a ⊢ C and C ⊢_Σ b, because P_Σ(a) ∩ N_Σ(b) ≠ ∅ implies that there must exist a concept in both N_Σ(b) and P_Σ(a). Using this fact we can now define ABox interpolation.

Definition 4.8 (ABox Interpolation) An ABox interpolant for an individual a and an individual b with a ⊢_Σ b in a consistent ABox A is a concept C such that:

a ⊢ C & C ⊢_Σ b

and where L(C) ⊆ L(a) ∩ L(∼b), where ∼b somehow denotes the "negation" of b.

It remains to explain what we mean by the language related to an individual in an ABox. It is not trivial to define such a language appropriately. Consider the following example ABox:

(john, jane): child_of
john: ∀child_of.Teacher
jane: Logician

Our choice of language is based on logical inference, i.e. we propose that L(john) should be {(Teacher, +)^{child_of}, (Logician, +)^{child_of}} and L(jane) should be {(Teacher, +), (Logician, +)}, because it is the logical link which is missing and not the language elements. But this choice is obviously arbitrary, and a simpler solution could be L(john) = {(Teacher, +)^{child_of}, (⊤, +)^{child_of}} and L(jane) = {(Logician, +)}.

The definition of the instance-related language is based on the definition of the language related to ALC concepts in Definition 4.2.

Definition 4.9 (Conceptual Language related to Instances) Let again N_C be the set of concept names, N_R the set of role names and N_I the set of individual names.

The language related to an instance a w.r.t. an ABox A is the smallest set occ_A(a) such that occ(C) ⊆ occ_A(a) for all conceptual axioms (a : C) ∈ A, occ_A(b)R ⊆ occ_A(a) for all (a, b) : R ∈ A, and occ(C) ⊆ occ_A(a) for all concepts C with occ(C)^R ⊆ occ_A(b) if (b, a) : R ∈ A. Similarly for ~occ_A(a).
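The clauses for concept assertions and for lifting a role successor's language over the connecting role can be computed as a least fixpoint. The following is a partial sketch only: it omits the clause propagating universally quantified information down incoming role edges, and the tuple encoding of concepts as well as all names are our own illustrative assumptions.

```python
# Partial sketch of Definition 4.9: occ_A(a) from concept assertions
# and outgoing role assertions (the inverse-role clause is omitted).

def occ(c, depth=(), pol="+"):
    # compact occ(C) of Definition 4.2 over a nested-tuple encoding
    op = c[0]
    if op == "atom":
        return {(c[1], pol, depth)}
    if op == "not":
        return occ(c[1], depth, "-" if pol == "+" else "+")
    if op in ("and", "or"):
        return occ(c[1], depth, pol) | occ(c[2], depth, pol)
    return occ(c[2], depth + (c[1],), pol)   # exists / forall

def instance_languages(concept_axioms, role_axioms):
    """concept_axioms: [(a, C)]; role_axioms: [(a, b, R)] for (a,b):R."""
    langs = {}
    for ind, c in concept_axioms:
        langs.setdefault(ind, set()).update(occ(c))
    for a, b, _ in role_axioms:
        langs.setdefault(a, set())
        langs.setdefault(b, set())
    changed = True
    while changed:                           # iterate to the least fixpoint
        changed = False
        for a, b, r in role_axioms:
            lifted = {(n, p, (r,) + s) for (n, p, s) in langs[b]}
            if not lifted <= langs[a]:
                langs[a] |= lifted
                changed = True
    return langs

abox_concepts = [("john", ("forall", "child_of", ("atom", "Teacher"))),
                 ("jane", ("atom", "Logician"))]
abox_roles = [("john", "jane", "child_of")]
```

On the john/jane ABox this yields occ_A(john) = {(Teacher, +)^{child_of}, (Logician, +)^{child_of}}; jane's (Teacher, +) entry would additionally require the omitted inverse clause.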

Note that the definition of occ_A(a) depends on a particular ABox A. The following definition is a special case of the previous definition, where L(C) = occ(C).

Definition 4.10 (ABox Lyndon Interpolants) A concept I is an ABox Lyndon interpolant for a and b iff a ∈_Σ I and b ∈_Σ ¬I and occ(I) ⊆ occ_A(a) ∩ ~occ_A(b).

In the remainder of this paper, ABox and concept Lyndon interpolants will be calculated using tableau methods. We might sometimes omit the "Lyndon", but will always consider interpolation with respect to occ and occ_A (~occ and ~occ_A, respectively).

4.3 Partial ABox Interpolants

Algorithms for ABox interpolants relate implicit knowledge about two individuals in an ABox, whereas concept interpolation is defined for two explicitly defined concepts.

Partial ABox interpolants play an intermediate role between the two notions by relating knowledge about an individual with a concept. We will see that partial interpolation for an ABox A can be reduced to concept interpolation by construction of a preprocessing-complete ABox A′ for A. Furthermore, ABox interpolation can be reduced to partial interpolation if some properties of individuals are propagated into the ABox. Let us formally define partial ABox interpolation.

Definition 4.11 (Partial ABox Interpolant) A concept I is a positive partial interpolant for an individual o, an ABox A and a concept C with o ∈_Σ C if and only if:

o ∈_A I and I ⊑ C and occ(I) ⊆ occ_A(o) ∩ occ(C)

It is called a negative partial interpolant if and only if:

C ⊑ I and o ∈_A ¬I and occ(I) ⊆ ~occ_A(o) ∩ occ(C)


5 Tableau Methods for Interpolation

The algorithms to calculate Lyndon interpolants using logical tableaux presented here follow the lines of [Fitting, 1996] and [Kracht, 1999]. Lyndon interpolants for ALC concept subsumption can be constructed from a fully expanded closed tableau by collecting contradicting literals on each branch, using construction rules corresponding to traditional tableau rules. For ABox interpolants the more complex interaction between role and concept assertions has to be taken into account. The solution is to preprocess the ABox (according to [Hollunder, 1996]) and propagate the result of all possible inference steps into a concept. This concept can then be used in an intermediate interpolation step (partial interpolation) to represent the complete knowledge about one of the individuals. Further application of the same preprocessing steps then allows a further reduction of the interpolation problem to concept interpolation.

5.1 Tableau Calculi

As the algorithms described in this paper are based on tableau proof calculi, we have to define the general framework. A tableau is a set of branches, where each branch is a set of formulas or of labelled formulas. A formula is a term of the form (a : C) or (a, b) : R, where a and b are individual variables, C an ALC concept and R a role name. A labelled formula is a term of the form φ^y, where φ is a formula and y ∈ {n, p} labels the formula with p if the origin of the formula is a positive example and with n if it is a negative one. A formula can occur both positively and negatively labelled on the same branch. We say that a branch is labelled if only labelled formulas occur on it. The notions of open branch and closed and open tableau are defined as usual and do not depend on the labels. We will identify a branch B with the set of (labelled) formulas φ^y on the branch and write φ^y ∈ B. Since branches of a tableau and an ABox are both just sets of formulas, we happily switch between the notions. Whenever we are dealing with branching in a calculus we talk of branches, but we also have a notion of instance checking of an element a and a concept C with respect to a branch B, i.e. we will write a ∈_B C.

The rules extend existing branches by new (labelled) formulas, or branch, if a set of preconditions is satisfied. Every rule can only be applied once to the same formula. For soundness and completeness results the order of the application of rules is irrelevant. A branch is saturated if no more rules can be applied; a tableau is saturated if all its branches are. To further simplify the presentation of our rules, we assume that all formulas are in negation normal form, i.e. negation is always pushed down to the atomic level [Horrocks, 1997].

Definition 5.1 (NNF) A concept is in negation normal form (NNF) if all negations ¬ appear in front of atomic concepts only. Every ALC concept can be transformed into NNF using the following equivalences as rewrite rules:

¬(C ⊓ D) ≡ ¬C ⊔ ¬D
¬(C ⊔ D) ≡ ¬C ⊓ ¬D
¬¬C ≡ C
¬∀R.C ≡ ∃R.¬C
¬∃R.C ≡ ∀R.¬C
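The rewrite rules translate directly into a recursive function. The sketch below works over a nested-tuple encoding of ALC concepts (the encoding is an illustrative assumption, not the paper's representation):

```python
# Push negation down to atoms using the equivalences of Definition 5.1.
# Concepts: ("atom", A), ("not", C), ("and"/"or", C1, C2),
# ("exists"/"forall", R, C) -- a hypothetical encoding.

def nnf(c):
    op = c[0]
    if op == "atom":
        return c
    if op == "not":
        d = c[1]
        if d[0] == "atom":
            return c                          # neg A is already in NNF
        if d[0] == "not":
            return nnf(d[1])                  # double negation
        if d[0] == "and":                     # not(C and D) == notC or notD
            return ("or", nnf(("not", d[1])), nnf(("not", d[2])))
        if d[0] == "or":                      # not(C or D) == notC and notD
            return ("and", nnf(("not", d[1])), nnf(("not", d[2])))
        if d[0] == "forall":                  # not forall R.C == exists R.notC
            return ("exists", d[1], nnf(("not", d[2])))
        if d[0] == "exists":                  # not exists R.C == forall R.notC
            return ("forall", d[1], nnf(("not", d[2])))
    if op in ("and", "or"):
        return (op, nnf(c[1]), nnf(c[2]))
    if op in ("exists", "forall"):
        return (op, c[1], nnf(c[2]))
    raise ValueError(f"unknown constructor {op!r}")
```

For instance, ¬∀R.(A ⊓ ¬B) is rewritten to ∃R.(¬A ⊔ B).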

The algorithm to calculate ABox interpolants is based on concept interpolation. Therefore we present a tableau calculus for concept interpolation first.

5.2 Algorithms for Concept Interpolation

To prove the subsumption P ⊑ N for two concepts P and N and to extract a Lyndon interpolant, we construct a tableau proof from a branch B containing only the two formulas (a : P)^p and (a : ¬N)^n by applying the rules in Fig. 3 as long as possible.⁴ If P ⋢ N there are no interpolants; in this case there will be an open branch in the fully expanded tableau. To calculate Lyndon interpolants we saturate the tableau.

(⊓): if (a : C₁ ⊓ C₂)^x ∈ B, but not both (a : C₁)^x ∈ B and (a : C₂)^x ∈ B,
then B′ := B ∪ {(a : C₁)^x, (a : C₂)^x}.

(⊔): if (a : C₁ ⊔ C₂)^x ∈ B, but neither (a : C₁)^x ∈ B nor (a : C₂)^x ∈ B,
then B′ := B ∪ {(a : C₁)^x} and B″ := B ∪ {(a : C₂)^x}.

(∃): if (a : ∃R.C)^x ∈ B and all other rules have been applied on all formulas over a, and if {(a : ∀R.D₁)^{x₁}, …, (a : ∀R.Dₙ)^{xₙ}} ⊆ B is the set of universal formulas for a w.r.t. R in B,
then B′ := B ∪ {(b : C)^x, (b : D₁)^{x₁}, …, (b : Dₙ)^{xₙ}}, where b is a new name not occurring in B.

Notation: a, b ∈ N_I; C, C₁, C₂, D₁, …, Dₙ ∈ ALC; x, x₁, …, xₙ ∈ {p, n}

Figure 3: Tableau Rules for ALC Concept Interpolation

If no more rules can be applied, each branch is checked for closure using the closure rule defined in Fig. 4. Application of the rules in Figs. 3 and 4 without the construction of concept interpolants provides a decision procedure for ALC concept satisfiability. We will call this restricted tableau system LI-TAB.

LI-TAB is a labelled variant of a standard tableau (or constraint) algorithm for deciding the modal logic K. Interpolation for the traditional completion systems for ALC ABoxes is trickier because of the way role assertions have to be handled. LI-TAB is not complete with respect to ABox satisfiability. For details about

⁴We use the concept (and label) names P and N (p and n) because in an assertion mining application, P would traditionally be a concept describing the knowledge about the positive examples and N about the negative ones. Nevertheless the interpolation results hold for the general case. [Kracht, 1999] and [Areces, 2000] instead write a and c, denoting antecedent and consequent.


soundness, completeness and termination of LI-TAB with respect to consistency of a set of concepts, we refer to Theorem 3.8.3 of [Kracht, 1999].

Theorem 5.2 ([Kracht, 1999]) The tableau system LI-TAB based on the rules in Figs. 3 and 4 is a decision procedure for concept consistency.

Algorithm 5.1 describes the application of the rules and the construction of concept interpolants in detail.

Algorithm 5.1 concept_LI(P, N): Concept Interpolation

Input: Two concepts P and N.
Output: Concept interpolant for P and N.
Uses: Algorithm 5.2.

return obj_rel_LI(a, {(a : P)^p, (a : ¬N)^n});

Notation: a ∈ N_I; P, N ∈ ALC

For Algorithm 5.2 the technical notion of individual related interpolants for a set of labelled formulas is introduced. Let {(o : C_1)^p, …, (o : C_{m_p})^p, (o : D_1)^n, …, (o : D_{m_n})^n} be the set of all labelled formulas for an individual o in B. The o-interpolant for a branch B is an interpolant for the two concepts C_1 ⊓ … ⊓ C_{m_p} and ¬D_1 ⊔ … ⊔ ¬D_{m_n}.

Whenever Algorithm 5.1 is applied, it returns a concept concept_LI(P, N) which is a concept interpolant for P and N. This follows immediately from the correctness of obj_rel_LI(a, B) (see Lemma 5.4), because obj_rel_LI(a, {(a : P)^p, (a : ¬N)^n}) is an a-interpolant for the branch {(a : P)^p, (a : ¬N)^n} and therefore a Lyndon interpolant for P and N.

Theorem 5.3 (Soundness of Algorithm 5.1) The concept concept_LI(P, N) as calculated by Algorithm 5.1 is a Lyndon interpolant for P and N.

Closure: for all individuals a and atomic concepts A_1, …, A_n where both (a : A_i)^{x_i} ∈ B and (a : ¬A_i)^{y_i} ∈ B and 1 ≤ i ≤ n

Interpolant:

obj_rel_LI(a, B) = ⊤, if there are x_i = n and y_i = n; or
obj_rel_LI(a, B) = ⊥, if there are x_i = p and y_i = p; or
obj_rel_LI(a, B) = ⊔_{x_i = p, y_i = n, 1 ≤ i ≤ n} A_i ⊔ ⊔_{x_i = n, y_i = p, 1 ≤ i ≤ n} ¬A_i; or
obj_rel_LI(a, B) = undefined, if there is no contradiction.

Notation: a ∈ N_I; A_1, …, A_n ∈ N_C; x_i, y_i ∈ {p, n}

Figure 4: Closure Rule for ALC Concept Interpolation

Algorithm 5.2 uses some auxiliary functions which are more or less self-explanatory: get_applicable_rule_Fig3(B, a) returns one of the applicable rules


Algorithm 5.2 obj_rel_LI(a, B): a-related Concept Lyndon Interpolation

Input: A branch B (without role assertions) and an individual a.
Output: a-related Lyndon interpolant for the branch B.
Uses: Fig. 3 and 4.

rule := get_applicable_rule_Fig3(B, a);
apply(rule, B);
if rule = (⊓)
  {B'} := get_new_branches;
  return obj_rel_LI(a, B');
if rule = (⊔)
  label := get_label;
  {B', B''} := get_new_branches;
  if label = n return obj_rel_LI(a, B') ⊓ obj_rel_LI(a, B'');
  else if label = p return obj_rel_LI(a, B') ⊔ obj_rel_LI(a, B'');
if rule = (∃)
  label := get_label;
  {B'} := get_new_branches;
  b := get_new_variable;
  if obj_rel_LI(a, B') exists
    if obj_rel_LI(b, B') exists
      if label = p return obj_rel_LI(a, B') ⊔ ∃R.obj_rel_LI(b, B');
      else if label = n return obj_rel_LI(a, B') ⊔ ∀R.obj_rel_LI(b, B');
    else return obj_rel_LI(a, B');
  else if obj_rel_LI(b, B') exists
    if label = p return ∃R.obj_rel_LI(b, B');
    else if label = n return ∀R.obj_rel_LI(b, B');
  else return undefined;
if no more rule can be applied
  return obj_rel_LI(a, B) according to the closure rule in Fig. 4;

Notation: a, b ∈ N_I; R ∈ N_R; label ∈ {p, n}; branches B, B', B''

in Fig. 3, where each rule can only be applied once to the same formulas. get_label, get_new_variable and get_new_branches return a label, a new variable and the new branch(es) for the rule just applied. For the following algorithms we appeal to the reader to interpret such auxiliary functions in the intended way.
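To make the construction concrete, the following is a minimal Python sketch of the propositional core of Algorithms 5.1 and 5.2 together with the closure rule of Fig. 4. The tuple encoding of concepts is a hypothetical choice, inputs are assumed to be in negation normal form, and the (∃) rule with its role handling is omitted entirely:

```python
# Hypothetical tuple encoding of NNF concepts: ('top',), ('bot',),
# ('atom', 'A'), ('not', ('atom', 'A')), ('and', C1, C2), ('or', C1, C2).
TOP, BOT = ('top',), ('bot',)

def neg(c):
    """Negation normal form of the complement of c."""
    if c == TOP: return BOT
    if c == BOT: return TOP
    if c[0] == 'atom': return ('not', c)
    if c[0] == 'not': return c[1]
    if c[0] == 'and': return ('or', neg(c[1]), neg(c[2]))
    return ('and', neg(c[1]), neg(c[2]))  # 'or'

def closure_interpolant(branch):
    """Closure rule of Fig. 4 on a branch of labelled literals."""
    pos = {'p': set(), 'n': set()}  # atoms A with (a : A)^x on the branch
    com = {'p': set(), 'n': set()}  # atoms A with (a : not A)^x on the branch
    for c, lab in branch:
        if c[0] == 'atom':
            pos[lab].add(c[1])
        elif c[0] == 'not':
            com[lab].add(c[1][1])
    if pos['n'] & com['n']: return TOP  # contradiction on n-labelled literals
    if pos['p'] & com['p']: return BOT  # contradiction on p-labelled literals
    disjuncts = [('atom', a) for a in sorted(pos['p'] & com['n'])] + \
                [('not', ('atom', a)) for a in sorted(pos['n'] & com['p'])]
    if not disjuncts: return None       # open branch: interpolant undefined
    out = disjuncts[0]
    for d in disjuncts[1:]:
        out = ('or', out, d)
    return out

def interpolate(branch):
    """Propositional core of Algorithm 5.2; branch = list of (concept, label)."""
    for i, (c, lab) in enumerate(branch):
        rest = branch[:i] + branch[i + 1:]
        if c[0] == 'and':  # conjunction rule: keep both conjuncts, same label
            return interpolate(rest + [(c[1], lab), (c[2], lab)])
        if c[0] == 'or':   # disjunction rule: split, then combine by label
            i1 = interpolate(rest + [(c[1], lab)])
            i2 = interpolate(rest + [(c[2], lab)])
            if i1 is None or i2 is None: return None
            return ('and', i1, i2) if lab == 'n' else ('or', i1, i2)
    return closure_interpolant(branch)

def concept_li(p, n):
    """Algorithm 5.1: an interpolant I with P subsumed by I subsumed by N."""
    return interpolate([(p, 'p'), (neg(n), 'n')])
```

For example, concept_li on P = A ⊓ B and N = A ⊔ C yields the shared atom A, while concept_li(A, B) is undefined (None) because the branch stays open.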

Algorithm 5.2 is correct in the sense that it returns an individual related Lyndon interpolant if there is one.

Lemma 5.4 (Soundness of Algorithm 5.2) The concept obj_rel_LI(a, B) as calculated by Algorithm 5.2 is an a-related concept Lyndon interpolant for the labelled branch B.

The proof of this lemma is by induction over the applicable rules. We show that whenever a rule was applied and recursive application of obj_rel_LI(a, B) returns


an a-related concept interpolant, the construction rules calculate a-interpolants for the original branch.

It remains to show that our algorithms are complete, i.e. that we indeed manage to calculate an interpolant whenever there is one. The full proofs for Theorems 5.3 and 5.5 and for Lemma 5.4 are given in Appendix C.

Theorem 5.5 (Completeness of Algorithm 5.1) Whenever there is a Lyndon concept interpolant for two concepts P and N, Algorithm 5.1 returns an ALC concept concept_LI(P, N) which is a concept interpolant for P and N.

5.3 Algorithms for ABox Interpolation

To calculate ABox interpolants we adapt the reduction introduced by Hollunder in [Hollunder, 1996]. In order to reduce ABox interpolation to concept interpolation and partial ABox interpolation (which was defined in Definition 4.11), we have to preprocess the ABox using the rules in Fig. 5 until no more rule can be applied. In this case the ABox is called preprocessing complete.

(⊓): if (a : C_1 ⊓ C_2) ∈ B, but not both (a : C_1) ∈ B and (a : C_2) ∈ B,
     then B' := B ∪ {(a : C_1), (a : C_2)}.

(⊔): if (a : C_1 ⊔ C_2) ∈ B, but neither (a : C_1) ∈ B nor (a : C_2) ∈ B,
     then B' := B ∪ {(a : C_1)} and B'' := B ∪ {(a : C_2)}.

(∀): if (a : ∀R.C) ∈ B and ((a, b) : R) ∈ B, but not (b : C) ∈ B,
     then B' := B ∪ {(b : C)}.

Notation: a, b ∈ N_I; C, C_1, C_2 ∈ ALC; R ∈ N_R; branches B, B', B''

Figure 5: Preprocessing Rules for ALC Partial and ABox Interpolation

Hollunder shows that for any consistent ABox A there is a consistent preprocessing complete ABox A' derivable from A. We identify A' with the set of all branches B built from A and write B ∈ A'. An ABox A is consistent if there is an open branch B in the preprocessing complete ABox A'.
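The exhaustive rule application of Fig. 5 can be sketched in a few lines of Python. The assertion encoding (('c', a, C) for a concept assertion a : C, ('r', a, b, R) for a role assertion (a, b) : R, concepts as nested tuples) is a hypothetical choice, and branches are plain frozensets:

```python
def apply_one(b):
    """Return the branch(es) produced by one rule application, or None
    if the branch is preprocessing complete."""
    for x in b:
        if x[0] != 'c':
            continue
        _, a, c = x
        if c[0] == 'and':  # conjunction rule: add both conjuncts
            add = {('c', a, c[1]), ('c', a, c[2])}
            if not add <= b:
                return [b | add]
        elif c[0] == 'or':  # disjunction rule: split the branch
            if ('c', a, c[1]) not in b and ('c', a, c[2]) not in b:
                return [b | {('c', a, c[1])}, b | {('c', a, c[2])}]
        elif c[0] == 'all':  # value restriction propagates along role assertions
            for y in b:
                if y[0] == 'r' and y[1] == a and y[3] == c[1] \
                        and ('c', y[2], c[2]) not in b:
                    return [b | {('c', y[2], c[2])}]
    return None

def preprocess(abox):
    """Exhaustively apply the rules of Fig. 5; return all
    preprocessing complete branches derived from the ABox."""
    complete, todo = [], [frozenset(abox)]
    while todo:
        b = todo.pop()
        new = apply_one(b)
        if new is None:
            complete.append(b)
        else:
            todo.extend(new)
    return complete
```

On the input {a : C ⊓ D} this yields one branch containing a : C and a : D; a disjunction splits into two branches, and a : ∀R.C together with (a, b) : R adds b : C.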

Definition 5.6 The set C_B^a of concepts related to an individual a ∈ B in a branch B is now defined as follows: C ∈ C_B^a iff

- (a : C) ∈ B, where C is a literal,
- (a : ∃R.C) ∈ B and there is no R-successor of a in B,
- (a : ∀R.C) ∈ B.

For an individual a ∈ B and a branch B we define a concept B_a = ⊓_{C ∈ C_B^a} C.
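Definition 5.6 can be read off directly as code. The sketch below uses a hypothetical tuple encoding for assertions (('c', a, C) for a : C, ('r', a, b, R) for (a, b) : R) and concepts (('atom', A), ('not', C), ('some', R, C), ('all', R, C)):

```python
def related_concepts(branch, a):
    """The set C_B^a: literals, existentials without an R-successor,
    and value restrictions asserted for the individual a."""
    succ_roles = {x[3] for x in branch if x[0] == 'r' and x[1] == a}
    out = []
    for x in branch:
        if x[0] != 'c' or x[1] != a:
            continue
        c = x[2]
        if c[0] in ('atom', 'not'):                      # literals
            out.append(c)
        elif c[0] == 'some' and c[1] not in succ_roles:  # no R-successor in B
            out.append(c)
        elif c[0] == 'all':                              # value restrictions
            out.append(c)
    return out

def b_concept(branch, a):
    """B_a: the conjunction over C_B^a (top if the set is empty)."""
    cs = related_concepts(branch, a)
    if not cs:
        return ('top',)
    out = cs[0]
    for c in cs[1:]:
        out = ('and', out, c)
    return out
```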


The concept B_a contains all the conceptual information about an individual a in B. If all implicit properties of an ABox have been made explicit by exhaustive application of the rules in Fig. 5 (in particular of the (∀) rule on the role assertions), satisfiability of a branch B can be reduced to concept satisfiability of all the concepts B_a for a ∈ B. If an ABox is consistent there must be a satisfiable branch in its related preprocessing complete ABox. We will make use of the following simple proposition.

Proposition 5.7 If a branch B is consistent, then d ∈_B B_d for any individual d ∈ B.

5.3.1 Partial ABox Interpolation

Algorithm 5.3 partial_LI(B, C, a, pol): Partial Interpolation with Polarity

Input: A consistent branch B, a concept C, an individual a and a polarity pol.
Output: Partial a-interpolant L:
  if pol = pos: L ⊑ C and a ∈_B L and occ(L) ⊆ occ_B(a) ∩ occ(C);
  if pol = neg: C ⊑ L and a ∈_B ¬L and occ(L) ⊆ ~occ_B(a) ∩ occ(C).
Uses: Preprocessing rules in Fig. 5 and Algorithm 5.1.

rule := get_applicable_rule_Fig5(B);
apply(rule, B);
if rule = (⊓)
  {B'} := get_new_branches;
  return partial_LI(B', C, a, pol);
if rule = (⊔)
  {B', B''} := get_new_branches;
  if pol = pos return partial_LI(B', C, a, pol) ⊔ partial_LI(B'', C, a, pol);
  else return partial_LI(B', C, a, pol) ⊓ partial_LI(B'', C, a, pol);
if rule = (∀)
  {B'} := get_new_branches;
  return partial_LI(B', C, a, pol);
if rule = undefined
  if pol = pos return concept_LI(B_a, C);
  else return concept_LI(C, ¬B_a);

Notation: a ∈ N_I; C, B_a, L ∈ ALC; R ∈ N_R; pol ∈ {pos, neg}; branches B, B', B''

Positive and negative partial ABox interpolants are dual notions, and one can be defined through the other using the following simple equality.

Proposition 5.8 If L is a positive partial interpolant for a concept C and an individual o with respect to an ABox A, then ¬L is a negative partial interpolant for ¬C and o.


The reason why we introduce both notions here is that they can be used independently in the context of assertion mining, although we will not discuss this application of partial interpolants in this technical report.

Algorithm 5.3 makes use of the fact that the implicit properties related to an individual a in a knowledge base Σ can be made explicit using the preprocessing procedures of Hollunder. The rules in Fig. 5 are applied as long as possible. Partial interpolants are calculated recursively for each branch. If the ABox is preprocessing complete, i.e. if no more rule can be applied, the information about the individual a can be collected in the concept B_a as defined in Definition 5.6. A partial interpolant for a concept C and a can now be calculated using Algorithm 5.1 for concept interpolation on C and B_a (or C and ¬B_a).

The proof that partial_LI(B, C, d, pol) as calculated by Algorithm 5.3 is a partial ABox interpolant for d, C and B with polarity pol is by induction over the preprocessing rules and is given in full in Appendix C.

Theorem 5.9 (Soundness of Algorithm 5.3) If partial_LI(B, C, d, pol) is a concept calculated by Algorithm 5.3, it is a d-partial interpolant with polarity pol.

Since partial ABox interpolation can be reduced to concept interpolation using the construction by Hollunder, completeness of Algorithm 5.3 follows immediately from completeness of Algorithm obj_rel_LI and the soundness and completeness result in [Hollunder, 1996].

Theorem 5.10 (Completeness of Algorithm 5.3) Whenever there is a positive (negative) partial ABox interpolant for an individual d and a concept C, the concept partial_LI(B, C, d, pos) (partial_LI(B, C, d, neg)) as returned by Algorithm 5.3 is defined.

5.3.2 ABox Interpolation

We can now calculate the ABox Lyndon interpolant for each open branch of a preprocessing complete ABox.

Algorithm 5.4 for ABox interpolation for two individuals a and b in an ABox A uses the same preprocessing rules in Fig. 5 as Algorithm 5.3 for partial ABox interpolation to make the implicit knowledge about both individuals explicit. If a branch B is not satisfiable, the ABox interpolant is ⊤ if B is contradicting on b, and ⊥ otherwise. The crucial idea is now to propagate the properties of the role fillers for a into the universally quantified concepts for b ∈ B and to search for possible contradictions. This can be done using Algorithm 5.3 for partial ABox interpolation. The ABox interpolant propagate_LI(B, a, b) for a and b on each branch B is now the disjunction over all concepts which are responsible for closure of the tableau, i.e. the concept interpolants for the logical representations B_a and B_b of a and b and the partial interpolants for the propagated information. It is shown in Appendix C that Algorithm 5.4 is sound and complete.


Algorithm 5.4 ABox_LI(B, a, b): ABox Lyndon Interpolation

Input: A branch B and two individuals a and b.
Output: ABox Lyndon interpolant for a and b.
Uses: Algorithm 5.5.

rule := get_applicable_rule_Fig5(B);
apply(rule, B);
if rule = (⊓)
  {B'} := get_new_branches;
  return ABox_LI(B', a, b);
if rule = (⊔)
  individual := get_individual;
  {B', B''} := get_new_branches;
  if individual = b return ABox_LI(B', a, b) ⊓ ABox_LI(B'', a, b);
  else return ABox_LI(B', a, b) ⊔ ABox_LI(B'', a, b);
if rule = (∀)
  {B'} := get_new_branches;
  return ABox_LI(B', a, b);
if rule = undefined
  if inconsistent(B_b) return ⊤;
  if there is a c ∈ N_I, c ≠ b, such that inconsistent(B_c) return ⊥;
  else return propagate_LI(B, a, b);

Notation: individual, a, b, c ∈ N_I; B_b, B_c ∈ ALC; branches B, B', B''

Theorem 5.11 (Soundness of Algorithm 5.4) If ABox_LI(B, a, b) is a concept calculated by Algorithm 5.4, it is an ABox interpolant for a and b with respect to the branch B.

If the application of the preprocessing rules can be shown to be correct, Theorem 5.11 is an immediate consequence of the correctness of Algorithm 5.5 as stated in Lemma 5.12 and proven in Appendix C. We omit the proof for the correctness of the preprocessing rules, because they are almost identical to the rules of Algorithm 5.3, and so is the proof.

Lemma 5.12 (Soundness of Algorithm 5.5) Let B be a consistent branch of a preprocessing complete ABox. The concept propagate_LI(B, a, b) as defined in Algorithm 5.5 is an ABox interpolant for the ABox instances a ∈ B and b ∈ B.

Theorem 5.13 (Completeness of Algorithm 5.4) If there is an ABox Lyndon interpolant for two individuals a and b with respect to an ABox A, the concept ABox_LI(A, a, b) as returned by Algorithm 5.4 is defined.


Algorithm 5.5 propagate_LI(B, a, b): Propagation for Preprocessing Complete ABoxes

Input: A preprocessing complete consistent branch B and two individuals a and b.
Output: ABox Lyndon interpolant for a and b with respect to B.
Uses: Algorithms 5.3 and 5.1.

LI := undefined;
for all assertions ((b, d) : R) ∈ B
  for the set of all formulas {a : ∀R.C_1, …, a : ∀R.C_m} ⊆ B
      which universally quantify over role R for a
    if partial_LI(B, C_1 ⊓ … ⊓ C_m, d, neg) exists
      LI := LI ⊔ ∀R.partial_LI(B, C_1 ⊓ … ⊓ C_m, d, neg);
for all assertions ((a, d) : R) ∈ B
  for the set of all formulas {b : ∀R.C_1, …, b : ∀R.C_m} ⊆ B
      which universally quantify over role R for b
    if partial_LI(B, ¬C_1 ⊔ … ⊔ ¬C_m, d, pos) exists
      LI := LI ⊔ ∃R.partial_LI(B, ¬C_1 ⊔ … ⊔ ¬C_m, d, pos);
if concept_LI(B_a, ¬B_b) exists
  LI := LI ⊔ concept_LI(B_a, ¬B_b);
return LI;

Notation: a, b, d ∈ N_I; LI, C_1, …, C_m, B_a, B_b ∈ ALC; R ∈ N_R

From Theorems 5.11 and 5.13 (both shown to be correct in Appendix C) follows the correctness of the presented algorithms with respect to ABox interpolation. Before we sketch how to use the proposed methods for assertion mining and the acquisition of knowledge, we briefly discuss the issue of the most specific concept.

5.3.3 Most Specific Concepts and ABox Interpolation

If the most specific concept exists for two elements a and b, the set of ABox interpolants for a and b is equivalent to the set of interpolants for msc(a) and ¬msc(b). This can easily be established from the definitions of ABox interpolation and the relations ⊢ and ⊢_Σ.

More interesting is the case where the msc does not exist for some individuals a and b. Our algorithms avoid the construction of a representation for the ABox individuals, and therefore of the most specific concepts, as is shown in Example 5.1.

Even if there are ABox cycles involving positive and negative examples, it is not possible to construct infinite chains of contradictions for two elements in ALC. Note however, that this approach cannot be extended to description logics with number restrictions, because these would allow for dual contradicting infinite chains.

Consider the following ALCN knowledge base Σ = (∅, A) where A = {(a, a) :


Example 5.1 ABox Interpolation in the Absence of a Most Specific Concept

Consider the ABox A = {(a, a) : R, a : C, b : ∀R.∀R.¬C ⊔ ∀R.¬C}. The ABox interpolant ABox_LI(A, a, b) according to Algorithm 5.4 is now calculated:

(1) (a, a) : R
(2) a : C
(3) b : ∀R.∀R.¬C ⊔ ∀R.¬C

ABox_LI(A, a, b) = ∃R.C ⊓ ∃R.∃R.C

Branch B' := B ∪ {(4) b : ∀R.∀R.¬C}:
  (6) a : ∀R.¬C
  (7) a : ¬C
  ABox_LI(B', a, b) = ∃R.∃R.C

Branch B'' := B ∪ {(5) b : ∀R.¬C}:
  (8) a : ¬C
  ABox_LI(B'', a, b) = ∃R.C

where (4) and (5) have been created by application of the (⊔) rule to (3), and (6), (7) and (8) by application of the (∀) rule to (4) and (5).

R, a : C, (b, b) : R, b : |R| = 1, b : ¬C}. There are no most specific concepts for a and b, because:

a ∈_Σ C ⊓ ∃R.C ⊓ ∃R.∃R.C ⊓ …
b ∈_Σ ¬C ⊓ ∀R.¬C ⊓ ∀R.∀R.¬C ⊓ …

Unfortunately, Algorithm 5.5 would not be complete because of the implicit universal quantification for b, which is not easily detected.

6 Interpolation based Assertion Mining

Section 5 provided the algorithmic toolkit for the discerning part of the assertion mining process as it was defined in Section 3. Based on these methods we will now present Algorithm 6.1 for discerning assertion mining. In Section 6.3 we will discuss several scenarios of how to use this algorithm in real applications.

6.1 Assertion Mining of Discernibility Concepts

We present an algorithmic framework to calculate generalised decision concepts based on interpolation and in particular on Algorithm 5.4. The algorithm hopefully makes more explicit how interpolation can be used to construct GDCs and which additional issues are to be addressed. We begin with the presentation of the off-line case, i.e. when a classified knowledge base is given and a set of TBox axioms has


to be learned for the entire ABox at once. Section 6.1.2 explains what happens in the incremental case, i.e. when the learned concepts have to be updated with new information.

6.1.1 Non-Incremental Assertion Mining

The idea for non-incremental assertion mining is simple: input to the system is an ABox A, a DBox D and an empty or unfoldable TBox. Preprocessing of the ABox might consist of unfolding, restriction to fewer elements, retrieval of the instances of each decision, etc. For the concepts in D, a concept Dom describing the common domain is defined. For each decision D in D, Algorithm 6.2 is called, which returns a total generalised decision concept (if defined). Each total GDC restricts the common domain in a discerning way; Dom is partitioned (with respect to the elements of A; the GDCs need not be logically disjoint). The evaluation of the GDCs could comprise statistical analysis, decomposition into minimal subconcepts or consistency checks using description logic reasoning for the new concepts, but could also involve human expertise. Algorithm 6.1 finally returns a TBox T with new axioms which define the decisions in D.

Algorithm 6.1 disc_aboxmine(A, D, Dom): Discernibility based Assertion Mining

Input: An ABox A, a set of decisions D and a domain concept Dom.
Output: A TBox T = ⋃_{D ∈ D} {D ≐ G_D}, where all G_D are total GDCs.
Uses: Algorithm 6.2.

A* := preprocess(A);
T := ∅;
for all D ∈ D
  GDC := decision_gdc(D, A*, D);
  evaluate(GDC);
  T := T ∪ {D ≐ GDC ⊓ Dom};
return T;

Notation: GDC, Dom ∈ ALC; D ∈ D

In practice, assertion mining is a combination of logical methods based on interpolation and further non-logical methods. In Algorithm 6.1 some of these non-logical methods are hinted at in the definitions of methods like preprocess(A) and evaluate(GDC). It is beyond the scope of this paper to discuss this issue any further, but to show the potential of the rough set theory based approach w.r.t. vagueness, Algorithm 6.2 also provides the relative support and the accuracy for the GDCs.

The TBox disc_aboxmine(A, D, Dom) resulting from Algorithm 6.1 is correct with respect to our learning criteria as long as Algorithm 6.2 returns a total generalised decision concept for a decision D_i w.r.t. the knowledge base Σ = (∅, A). But this is easy to show. If |upper| = 0, individual_gdc(a, D_i, A, D) is defined for


Algorithm 6.2 decision_gdc(D_i, A, D): GDCs for a Decision

Input: A decision D_i, an ABox A and a DBox D.
Output: A generalised decision concept GDC for D_i with support and accuracy for each sub-decision.
Uses: Algorithm 6.3.

support := ∅; notsupport := ∅;
GDC := undefined;
for all a ∈ class(A)
  low := true;
  if a ∈ D_i
    LI_a := individual_gdc(a, D_i, A, D) with upper = {b | b ∉_Σ D_i & b S_Σ a};
    if |upper| = 0
      support := support ∪ {a};
      GDC := GDC ⊔ LI_a;
    else
      notsupport := notsupport ∪ upper ∪ {a};
return GDC with support: |support| / (|support| + |notsupport|)
  and accuracy: |support| / |class(A)|;

Notation: a, b ∈ N_I; GDC, LI_a ∈ ALC; D_i ∈ D; support, notsupport, upper ∈ 2^{N_I}

all b, and it follows easily that both a ∈_Σ individual_gdc(a, D_i, A, D) and b ∈_Σ ¬individual_gdc(a, D_i, A, D), because ABox_LI(A, a, b) is an ABox interpolant for a and b and so is their conjunction. If |upper| ≠ 0, there is a b ∉_Σ D such that a S_Σ b. Therefore a ∉ S_Σ(D) and no GDC is returned. Since for all elements a ∈ S_Σ(D) a GDC is returned, the resulting GDC is total. This implies the following theorem.

Theorem 6.1 (Correctness of Algorithm 6.1) disc_aboxmine(A, D, Dom) as calculated by Algorithm 6.1 is a set of TBox axioms for each decision D_i based on total generalised decision concepts with common language and polarity.
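The counting in the correctness argument can be mirrored in a few lines of Python. decision_gdc below is a sketch of the loops of Algorithms 6.2 and 6.3, where abox_li is an assumed oracle standing in for ABox_LI of Algorithm 5.4 and concepts are hypothetical tuples:

```python
def decision_gdc(decided, others, abox_li):
    """Sketch of Algorithms 6.2/6.3 for one decision D_i.
    decided: classified instances of D_i; others: instances of the other
    decisions; abox_li(a, b): assumed interpolation oracle returning a
    tuple-encoded concept or None. Returns (GDC, support, accuracy)."""
    gdc, support, notsupport = None, set(), set()
    for a in decided:
        li_a, upper = None, set()
        for b in others:                 # inlined loop of Algorithm 6.3
            i = abox_li(a, b)
            if i is None:
                upper.add(b)             # b is not discernible from a
            else:
                li_a = i if li_a is None else ('and', li_a, i)
        if not upper:                    # a lies in the lower approximation
            support.add(a)
            gdc = li_a if gdc is None else ('or', gdc, li_a)
        else:
            notsupport |= upper | {a}
    total = len(support) + len(notsupport)
    classified = len(decided) + len(others)
    sup = len(support) / total if total else 0.0
    acc = len(support) / classified if classified else 0.0
    return gdc, sup, acc
```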

6.1.2 Revision of Knowledge Bases with new Individuals.

Based on the algorithms presented in the last sections, it is now relatively easy to show that the proposed method is well suited for cumulative mining if new conceptual assertions are added to the ABox. To deal with new information in a constant domain, exclusiveness was introduced, and we will show that this condition also ensures that assertions containing new classified elements can be added to the knowledge base without having to redo the mining process from scratch. There are two slightly different cases, and we use Algorithms 6.3 and 5.3 to offer appropriate solutions for each case. Let us assume that the concept L_D is a total generalised decision concept for D in Σ. Let Σ' denote the knowledge base Σ extended by some new assertions.

In the first case a new example a is introduced, where a ∈_Σ' D. The algorithm


Algorithm 6.3 individual_gdc(a, D_i, A, D): GDCs for an Individual

Input: An individual a, a decision D_i, an ABox A and a DBox D.
Output: A generalised decision concept LI_a for D_i and a, and the elements of the upper approximation (if |upper| ≠ 0 then a ∉ S_Σ(D_i)).
Uses: Algorithm 5.4.

LI_a := undefined;
upper := ∅;
for all b ∈ class(A)
  if not b ∈ D_i
    for all D_k ∈ D, D_k ≠ D_i
      if b ∈ D_k
        if ABox_LI(A, a, b) is defined
          LI_a := LI_a ⊓ ABox_LI(A, a, b);
        else upper := upper ∪ {b};
return LI_a with upper;

Notation: a, b ∈ N_I; LI_a ∈ ALC; D_i, D_k ∈ D; upper ∈ 2^{N_I}

corresponds exactly to the non-incremental case, and the GDC L'_D is defined as L'_D := L_D ⊔ gdc(a, D, A, D).

The second case is slightly more complicated. Assume Σ' now contains a new counterexample b such that b ∉_Σ' D. The solution is to calculate the concept L'_D := partial_LI(A, L_D, b, neg). Since a ∈_Σ L_D for all a ∈ S_Σ(D), we get the required result by Theorem 3.11, since by definition of partial ABox interpolation now also a ∈_Σ partial_LI(A, L_D, b, neg) and b ∈_Σ ¬partial_LI(A, L_D, b, neg).
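The two revision cases reduce to one-liners. In the sketch below, partial_li is an assumed stand-in for Algorithm 5.3 and the tuple encoding of concepts is hypothetical:

```python
def revise_with_example(l_d, li_a):
    """Case 1: a new example a with a in D.
    Disjoin the individual GDC of a with the current L_D."""
    return ('or', l_d, li_a)

def revise_with_counterexample(l_d, abox, b, partial_li):
    """Case 2: a new counterexample b not in D.
    Restrict L_D by a negative partial interpolant; partial_li is an
    assumed function implementing Algorithm 5.3."""
    return partial_li(abox, l_d, b, 'neg')
```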

It has to be mentioned that we cheated, because we did not mention that the result of ABox interpolation can obviously change with new assertions, and the algorithms for revision as described above are not complete. This is because new assertions could imply implicit properties which could only be made explicit by running the interpolation algorithms all over again. It is beyond the scope of this paper to discuss the case of the introduction of new TBox axioms or the revision of the knowledge base by elimination of axioms or assertions.

6.2 Optimality

The emphasis in this paper was laid on the presentation of the methods for learning based on the syntax-based bias criteria, and the semantic issue of optimality has not been sufficiently addressed. As mentioned before, the main difference between lcs-learning and interpolation learning is that we are not looking for commonalities but for differences. We are interested in large discerning concepts instead of small concepts enclosing common properties.

The algorithms described in Section 5 are not fine-tuned to calculate maximal GDCs (with respect to subsumption) and, as we will see, the full level of optimality depends on the order of the application of the rules in the calculus.


The generalised decision concepts which are created are relatively big, because we disjunctively collect all contradicting positive literals on a closed branch (compare the closure rule in Fig. 4 as applied in Algorithm 5.1). But in two cases we also opted against optimality:

- If a branch is contradicting on positive literals, i.e. if (a : C)^p ∈ B and (a : ¬C)^p ∈ B for some a and C, this does not really correspond to (part of) a positive example. We choose ⊥ for LI_a(B) instead of collecting C for all (a : C)^p ∈ B and (a : ¬C)^n ∈ B, which would also produce an (even larger) interpolant.

- We only collect contradicting literals with contradicting vocabulary on the same branches, but there might be even bigger interpolants if atoms occur with the same polarity and the same quantifier depth on different branches. This could be avoided if the tableau rules were always applied on negatively labelled formulas first.

We conjecture that full optimality over-specifies the examples and that the type of sub-optimality which is achieved by the algorithms in Section 5 is sufficient and appropriate for real applications. Furthermore, the construction of the interpolants follows more or less the representation of the examples and preserves some of the syntactical structure of the examples, which was one of the main reasons for the choice of the vocabulary bias in the first place.

6.3 Assertion Mining Scenarios

With the more theoretical aspects dominating the discussions so far, we might forget about the very practical nature of what we are doing and the issues arising. The best way to get a better grip on the ideas is to think about how they could be used in an assertion mining scenario and what further problems might occur.

Before presenting a number of scenarios for the use of interpolation based methods for assertion mining, let us remind ourselves of the role that discerning assertion mining plays in the knowledge acquisition process.

If only a subset of the elements of the domain is classified, it is very likely that the decisions have a common domain which is disjoint from the domain of the non-classified data. In Σ_arr the classified ABox instances are the patients, and in a more realistic knowledge base it would even be the subset of patients with arrhythmia.

In this simple example we can easily define the domain knowledge by hand. But even in this case we might lose the hierarchy information with respect to non-classified data. Assume that we had chosen Arrhythmia as the concept describing the domain knowledge for the patients in Σ_arr. If Arrhythmia is not sufficiently defined to classify all patients as being ill, i.e. if the subsumption relation Arrhythmia ⊑ ¬Healthy cannot be established, critical information might be lost. Consider the case where

Tachycardic ⊔ ∃has_pwave.¬OK ⊑ ¬Healthy


was an axiom of Σ_arr. In this case, the concept

∀has_pwave.OK ⊓ Arrhythmia

would be a generalised decision concept for Tachycardic restricted by the domain concept Arrhythmia, but the crucial information that patient1 ∈_{Σ_arr} ¬Healthy is lost. This example shows how critical the construction of the domain concept becomes, and further research in knowledge acquisition has to focus on this particular aspect.

In this technical report we have concentrated on the discerning part of assertion mining. To make the issue about domain and discerning concepts explicit, we propose several assertion mining scenarios.

Assertion Mining Scenarios

Input: A consistent ABox A classified by a set of decisions D.
Output: TBox axioms D_i ≐ L_{D_i} for the decisions D_i ∈ D.

- Assertion mining with least common subsumer: The first idea which springs to mind is based on least common subsumer learning. If the lcs or approximations for all the instances of decisions in the TBox (denoted by lcs(D)) are constructed as domain concept, assertion mining then just returns the axioms disc_aboxmine(A, D, lcs(D)).

  We would conjecture that these new axioms are too specific to perform well on previously unknown ABox instances, because the partitioning of the least common subsumers by conjunction with generalised decision concepts will result in concepts which are in most cases identical to the logical descriptions of the original examples. A further problem is the necessity for the tedious and imprecise construction of approximations for the most specific concepts.

- Assertion mining for completely classified ABoxes: The simplest scenario is of course that the ABox is completely classified. In this case, assertion mining is just the search for discerning properties of the different decisions, and the system should return disc_aboxmine(A, D, ⊤).

- Completing ABoxes: There is a very simple solution for assertion mining for incompletely classified ABoxes which is based on the previous case and on completion of the classification. Let D = {D_1, …, D_n} be the DBox for a sub-domain of the ABox. In a preprocessing step we label each non-classified instance of the ABox with the new atomic decision Unclassified.

  The common domain concept for D is now any total generalised decision concept for the decision D := D_1 ⊔ … ⊔ D_n with respect to A and the new DBox D' = {D, Unclassified} as returned by the procedure decision_gdc(D, A, D') in Algorithm 6.2. The procedures for assertion mining should return the set


  of TBox axioms disc_aboxmine(A, D, decision_gdc(D, A, D')) as returned by Algorithm 6.1.

  The crucial problem with this approach is the size of the ABox. If the ABox is very large, completion might become infeasible, because the calculation of the common domain concept using the algorithms in Section 5 now means a fully expanded tableau for each pair of classified individuals. For this approach to be feasible, computational optimisations have to be implemented, which corresponds to the calculation of non-optimal generalised decision concepts (compare Section 6.2).

  Even if this approach were feasible, it might well be that the differences between classified and non-classified individuals are so obvious to the knowledge engineer that there are no discerning properties to be found (see e.g. the concepts Small and Arrhythmia in Σ_arr).

- Mining decision hierarchies: Assume that the decisions in a DBox D for a given knowledge base Σ are ordered using subsumption as a linear order. We assume that it is possible to calculate such a hierarchy using a procedure concept_hierarchy(D, Σ). Furthermore we assume that there is a domain concept for the top element of the domain. The learning task is now to refine this hierarchy by calculating generalised decision concepts for each group of successors of a concept. There are two interesting alternatives:

– Starting from the top element Top of the hierarchy, we first calculate the direct successors Succ₁, ..., Succₙ. D = {Succ₁, ..., Succₙ} is now a set of decisions with a common domain concept Top. We can now calculate generalised decision concepts and TBox axioms with disc_aboxmine(A, D, Top). This process can be repeated recursively for each successor.

– Mining equivalence classes of decision instances: instead of defining approximations for decisions, the set of classified individuals in an ABox could be factorised w.r.t. an instance relation for decisions, i.e. a S_D b iff ∀D ∈ D : a ∈_Σ D ⇔ b ∈_Σ D. Note that this case is more restrictive because it produces hidden hierarchies, and terminologies are needed to represent the relations between the equivalence classes.

Please note that this case is not related to the fact that there might be hidden hierarchies in the data of decisions. In this case the procedure decision_gdc will return with the value undefined, because all elements are in the upper and none in the lower approximations of the related decisions. A detailed discussion of this issue is beyond the scope of this technical report and is very much open research.
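The first alternative, recursive top-down refinement, can be sketched as follows. This is only an illustrative outline: disc_aboxmine stands in for Algorithm 6.1 and is stubbed here, and the hierarchy and concept names are invented.

```python
# Hypothetical sketch of the recursive hierarchy-refinement scenario.
# `hierarchy` maps each decision concept to its direct successors.

def disc_aboxmine(abox, decisions, domain):
    # Stub for Algorithm 6.1: would return TBox axioms with generalised
    # decision concepts discerning the given decisions within `domain`.
    return ["GDC(%s) axiom for %s" % (d, d) for d in decisions]

def refine_hierarchy(abox, hierarchy, node="Top"):
    """Recursively mine GDCs for each group of direct successors."""
    axioms = []
    successors = hierarchy.get(node, [])
    if successors:
        # The successors form a DBox with common domain concept `node`.
        axioms.extend(disc_aboxmine(abox, successors, node))
        for succ in successors:
            axioms.extend(refine_hierarchy(abox, hierarchy, succ))
    return axioms

hierarchy = {"Top": ["Arrhythmia", "Healthy"],
             "Arrhythmia": ["Tachycardic", "Bradycardic"]}
axioms = refine_hierarchy(abox=None, hierarchy=hierarchy)
```

The recursion visits each group of siblings exactly once, mirroring the repeated calls to disc_aboxmine described above.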

The above list is just a simple collection of ideas, and there will be more applications of interpolation-based assertion mining. Before we conclude, we will briefly discuss some other issues related to the practical aspects which arose from these scenarios.

7 Conclusion

We presented assertion mining as an important component for knowledge acquisition in hybrid knowledge representation systems based on description logics. For a given ABox we identified generalised decision concepts as promising generalisations for classes of ABox instances. By restricting the hypothesis space to concepts with discerning polarity the discerning character of GDCs is ensured. In logical terms these concepts are Lyndon interpolants for positive and negative examples.

The novelty of our approach is to identify assertion mining as a logical interpolation problem. Following the rough set approach, discernibility of individuals becomes our crucial inductive bias. If this notion is extended to description logics, traditional covering problems correspond to the search for Lyndon interpolants. This new idea allows well-known machine learning methods to be extended to very expressive logical languages.

It has to be mentioned that the criteria we define are very strong and tailored to provide secure and cumulative mining. The price we have to pay is the failure to recognise underlying hidden decision hierarchies; in fact any related individuals in different decisions are excluded due to exclusiveness. The second important issue is that assertion mining based on discernibility focuses on the differences, not on the common properties of the elements of different decisions, and the domain knowledge common to all decisions is not automatically recognised. In Algorithm 6.1 these problems are hidden in the procedure evaluate and by defining the domain concept "by hand". Any real knowledge acquisition tool for hybrid knowledge bases must address all three of these problems, and combinations seem to be crucial. For this reason we presented some assertion mining scenarios.

At King's College London we have recently implemented Wellington's Kat, which extends the hKRS Wellington with assertion mining for ALC(D). It currently provides interpolation-based assertion mining methods for discernibility learning in JAVA and is planned to be extended to a full-scale knowledge acquisition and reasoning system in the near future.

A Rough Set Theory for Data Analysis

Data mining can be defined as the application of algorithms to recognise patterns in data. It can be seen as part of the knowledge discovery process, defined as the "nontrivial extraction of implicit, previously unknown, and potentially useful information from data" [Mollestad, 1997].

Rough set theory was created by Pawlak in the early eighties [Pawlak, 1982] and has since become very popular. One of its applications is data mining. We will present the basic ideas in a very compact way following Bazan [Bazan, 1998]. There is a good introduction to rough set theory by Pawlak [Pawlak, 1998].

We break down the description of rough set data mining techniques into two parts: the description of rough set approximation and of rough analysis. Rough analysis is the classification of individuals into equivalence classes, the calculation of reducts (or dependencies and keys) and the deduction of decision rules.

Rough Set Theory.

Pawlak Approximation Space (PAS). Let R be an equivalence relation on a universe U. The R-lower and R-upper approximation of a subset X ⊆ U are:

  R̲X = {x ∈ U | [x]_R ⊆ X}  and  R̄X = {x ∈ U | [x]_R ∩ X ≠ ∅}.

An approximation for X through R is the pair app_R(X) = (R̲X, R̄X). The R-boundary (R-outside) region of X is defined as bound_R(X) = R̄X − R̲X (out_R(X) = U − R̄X). A set is said to be rough (resp. crisp) if the boundary region is non-empty (resp. empty).

Relational Approximation Space (RAS). Pawlak defined rough sets with respect to equivalence relations. We will see in the following that there is a natural generalisation, where the relation is not restricted. In our case we will mostly work with tolerance relations.

Knowledge Representation for Rough Set Data Analysis

One major difference between the various machine learning approaches is the way knowledge is represented. We define three languages, which are all interconnected through the use of vocabulary, linked through simple translations.

Input Representation: Decision Systems. A decision system D is a tuple (U, C ∪ {d}), where U is a finite (non-empty) set of individuals, called the universe, C a set of condition attributes and d ∉ C an attribute called the decision attribute. For all attributes a ∈ C ∪ {d}, such that a : U → V_a, V_a is called the value set of the attribute a. The cardinality of the image a(U) = {k | a(u) = k for some u ∈ U} is called the rank of a and is denoted by r(a).
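As an illustration, a decision system of this kind can be represented directly as a table. The following sketch uses invented attributes and values:

```python
# A decision system (U, C ∪ {d}) as a plain table; the attributes and
# their values are invented for this sketch.

U = ["o1", "o2", "o3", "o4"]
C = ["rate", "rhythm"]                       # condition attributes
table = {                                    # a : U -> V_a for a in C ∪ {d}
    "o1": {"rate": "slow", "rhythm": "regular",   "d": "bradycardic"},
    "o2": {"rate": "slow", "rhythm": "regular",   "d": "bradycardic"},
    "o3": {"rate": "fast", "rhythm": "regular",   "d": "tachycardic"},
    "o4": {"rate": "slow", "rhythm": "irregular", "d": "other"},
}

def value_set(a):
    """V_a, the value set of attribute a."""
    return {table[u][a] for u in U}

def rank(a):
    """r(a), the cardinality of the image a(U)."""
    return len(value_set(a))
```

For example, rank("rate") is 2 here, since the image of "rate" is {"slow", "fast"}.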

Output Representation: Decision Rules. The output representation language L_D for rough analysis is based on a decision system D = (U, C ∪ {d}) and is defined in the following way: for the set of condition attributes C = {a₁, ..., aₙ}, where each vᵢ ∈ V_{aᵢ} is a value for attribute aᵢ, L_D contains (aᵢ = vᵢ). We will refer to these formulas as atoms. Furthermore L_D is closed under the usual boolean operators disjunction ∨, conjunction ∧ and negation ¬.

We define an interpretation I_D for formulas of L_D with respect to a decision system D such that (a = v)^{I_D} = {o ∈ U | a(o) = v} and extend it in the standard inductive way:

• (φ ∧ ψ)^{I_D} = (φ)^{I_D} ∩ (ψ)^{I_D}

• (φ ∨ ψ)^{I_D} = (φ)^{I_D} ∪ (ψ)^{I_D}

• (¬φ)^{I_D} = U \ (φ)^{I_D}.

We will omit the brackets whenever no confusion is likely to occur. We will use the usual abbreviation φ → ψ ≡ ¬φ ∨ ψ. For a decision attribute d and a decision value v_d ∈ V_d, decision rules are of the form

  φ ⟹ (d = v_d),

where φ ∈ L_D. The interpretation is trivially extended for the decision d such that (d = v_d)^{I_D} = {o ∈ U | d(o) = v_d}. An individual o is matched by a decision rule iff o ∈ φ^{I_D}. The rule is correct with respect to a decision system D iff φ^{I_D} ⊆ (d = v_d)^{I_D}. A decision rule is basic if φ is a conjunction of atoms. If the number of conjuncts is minimal, i.e. there is no conjunctive formula ψ such that ψ ⟹ (d = v_d) is correct and where the conjuncts of ψ are a proper subset of the conjuncts of φ, the rule is called an optimal basic rule. That is to say, if you take away one conjunct from an optimal basic decision rule it is no longer correct.

The set of atomic sub-formulas sub(φ) of a negation-free formula φ is defined inductively as sub(φ) = {(a = v)} if φ = (a = v) and sub(φ ∨ ψ) = sub(φ ∧ ψ) = sub(φ) ∪ sub(ψ). A formula φ is optimal iff for all ψ ∈ L_D : ψ → φ.

Intermediate Representation: Discernibility functions. To allow for intermediate calculations we define a language L_f for discernibility functions. For each attribute a ∈ C ∪ {d} there is a boolean variable ā in L_f. Furthermore L_f is closed under conjunction ∧, disjunction ∨ and negation ¬.

The intermediate representation is closely linked to the output representation. In fact it has an equivalent logical structure, the only difference being the logical atoms. Atoms in L_f are boolean variables representing attributes; atoms in L_D are attribute-value pairs. We define a translation function (·)^tf(u) which takes a formula of L_f and translates it into a related formula in L_D depending on the individual u ∈ U.

Definition A.1 Let (·)^tf(u) be the following translation from L_f to L_D for an individual u ∈ U:

  (ā)^tf(u) = (a = v) if a(u) = v,
  (φ ∧ ψ)^tf(u) = (φ)^tf(u) ∧ (ψ)^tf(u),
  (φ ∨ ψ)^tf(u) = (φ)^tf(u) ∨ (ψ)^tf(u),
  (¬φ)^tf(u) = ¬(φ)^tf(u).

We might again omit the brackets. This translation will be used in Definition A.2 to link rough approximations with a constructive method to calculate optimal decision rules.
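A small sketch of this translation, representing formulas of L_f as nested tuples (the encoding is our own, not from the report):

```python
# Formulas of L_f encoded as nested tuples:
#   ("var", a) for the boolean variable of attribute a,
#   ("and", f, g), ("or", f, g), ("not", f).

def translate(formula, u, table):
    """(·)^tf(u): turn a discernibility formula into an L_D formula for u."""
    op = formula[0]
    if op == "var":
        a = formula[1]
        return ("atom", a, table[u][a])      # becomes the atom (a = a(u))
    if op in ("and", "or"):
        return (op, translate(formula[1], u, table),
                    translate(formula[2], u, table))
    return ("not", translate(formula[1], u, table))

table = {"o1": {"a": True, "b": True}}
f = ("or", ("var", "a"), ("var", "b"))       # the L_f formula ā ∨ b̄
result = translate(f, "o1", table)
# result is ('or', ('atom', 'a', True), ('atom', 'b', True))
```

The attribute variables are replaced by the attribute-value atoms that hold for the chosen individual, exactly as in Definition A.1.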

Rough Set Data Analysis

A well-known application of rough set theory is in data analysis, where the approximation space is defined using discernibility relations derived from decision systems. For these approximations decision rules are created, which can be used for the classification of previously unknown data.

Rough Approximations through Discernibility

A discernibility relation is defined using equivalence of attribute values for the attributes in the set of condition attributes or the decision d of a decision system.

Discernibility. With an attribute a we associate the equivalence relation R_a (indiscernibility) on U so that u R_a v if and only if a(u) = a(v). The collection U/R_a = {[u]_{R_a} | u ∈ U} is called a classification. With a set of attributes A we associate an equivalence relation R_A on U so that u R_A v if and only if for all a ∈ A, a(u) = a(v). Again, the collection U/R_A = {[u]_{R_A} | u ∈ U} is a classification. For any equivalence class E ∈ U/R_C, a(E) = v abbreviates that a(e) = v for an attribute a ∈ C and every element e ∈ E. Similarly, d(E) = v abbreviates that d(e) = v for the decision d and every element e of an equivalence class E ∈ U/R_d.

Approximations. The approximations app_{R_C}(Δ) = (R̲_C(Δ), R̄_C(Δ)) for a given decision system D and for each set Δ ∈ U/R_d are defined as expected:

  R̲_C(Δ) = {a ∈ U | ∀b ∈ U : a R_C b ⟹ b ∈ Δ}
  R̄_C(Δ) = {a ∈ U | ∃b ∈ U : a R_C b and b ∈ Δ}

and can easily be calculated in polynomial time [Guan and Bell, 1998].
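A minimal sketch of these approximations in Python (toy data, invented attribute names). The lower approximation keeps exactly the individuals whose R_C-class lies entirely inside the decision class:

```python
from collections import defaultdict

table = {"o1": {"a": 1, "b": 0, "d": "x"},
         "o2": {"a": 1, "b": 0, "d": "y"},   # indiscernible from o1 on C
         "o3": {"a": 0, "b": 1, "d": "x"}}
C = ["a", "b"]

def classes(attrs):
    """U/R_A: partition of the universe by equality on attrs."""
    part = defaultdict(set)
    for u, row in table.items():
        part[tuple(row[a] for a in attrs)].add(u)
    return list(part.values())

def approximations(delta):
    """(R_C-lower, R_C-upper) approximation of the set delta."""
    lower, upper = set(), set()
    for E in classes(C):
        if E <= delta:       # whole class inside delta: secure individuals
            lower |= E
        if E & delta:        # class touches delta: possible members
            upper |= E
    return lower, upper

delta = {u for u, row in table.items() if row["d"] == "x"}
lower, upper = approximations(delta)
# o3 is securely in the class; o1 and o2 form the noisy boundary
```

Here o1 and o2 agree on all condition attributes but disagree on the decision, so o1 drops out of the lower approximation: exactly the noisy data the approximations are meant to isolate.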

Rough Analysis.

The overall aim of rough analysis is to define a set S(Δ) of L_D formulas for each equivalence class Δ ∈ U/R_d which satisfies the following condition for all elements u ∈ U: there is a formula φ ∈ S(Δ) such that u ∈ φ^{I_D} iff u ∈ R̲_C(Δ), i.e. S(Δ) is the set of formulas which match all and only non-noisy data. These formulas can be used to create decision rules. Furthermore a learning bias is required to restrict the hypothesis space.

Reducts and Decision Rules. The approximations are the starting point for the calculation of generalisations of the decisions. Let D = (U, C ∪ {d}) be a decision system. Reducts are minimal sets of attributes containing just enough information to preserve the classification of the individuals in D. Formally, a reduct of D for an equivalence class E₀ ∈ U/R_C with an element e₀ is a smallest set of attributes C′ ⊆ C such that

  {v | ∃e (e R_C e₀ & d(e) = v)} = {v | ∃e (e R_{C′} e₀ & d(e) = v)}.

One can use reducts to compute optimal basic decision rules [Skowron, 1993]. If Red = {a₁, ..., a_m} is a reduct and if the rule

  (a₁ = v₁) ∧ (a₂ = v₂) ∧ ... ∧ (a_m = v_m) ⟹ (d = v_d)

is correct, it is also an optimal basic decision rule.

Discernibility Matrix Method. The Discernibility Matrix Method was developed by Skowron to calculate reducts. For each pair of individuals e_i and e_j the matrix contains the set of attributes which can discern e_i from e_j. From this matrix a boolean discernibility function f(D, k) can be calculated for each equivalence class E_k. Formally, the discernibility matrix modulo d of D is M(D) = (c_ij) with c_ij = {a ∈ C | a(e_i) ≠ a(e_j)} if d(e_i) ≠ d(e_j), for e_i ∈ E_i and e_j ∈ E_j and all E_i, E_j ∈ U/R_C, and c_ij = ∅ otherwise. The k-relative discernibility function for an equivalence class E_k ∈ U/R_C is defined as

  f(D, k) = ⋀_{i ≠ k, c_ik ≠ ∅} ( ⋁_{ā ∈ c̄_ik} ā ),

where c̄_ik = {ā | a ∈ c_ik} and ā is the boolean variable in L_f corresponding to a. Skowron goes on to show that from every prime implicant of f(D, k) a reduct for E_k, and therefore optimal basic decision rules, can be constructed. The construction of the boolean discernibility function however already proves the correctness of any rule constructed from it. In an intermediate step we want to make this result more explicit. This separation allows for a clearer distinction between the logically sound analysis according to discernibility criteria and the inductively biased reduction to obtain syntactic optimality.
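The construction of f(D, k) can be sketched as follows, representing the function in CNF as a list of clauses (sets of attribute variables). The two-individual system is invented for illustration:

```python
# Discernibility matrix entries modulo d, and the k-relative
# discernibility function as a CNF (list of clauses).

table = {"o1": {"a": 1, "b": 1, "d": 1},
         "o2": {"a": 0, "b": 1, "d": 2}}
C = ["a", "b"]

def discerning_attributes(u, v):
    """c_uv: attributes discerning u and v, or None if decisions agree."""
    if table[u]["d"] == table[v]["d"]:
        return None
    return {a for a in C if table[u][a] != table[v][a]}

def discernibility_function(k):
    """f(D, k): one clause per individual with a different decision."""
    clauses = []
    for other in table:
        c = discerning_attributes(k, other)
        if c:                      # skip None and empty entries
            clauses.append(c)
    return clauses

# Only attribute a discerns o1 from o2, so f(D, 1) is the single clause {a}.
```

Each clause says "at least one of these attributes must be kept to discern this pair", so any satisfying choice of attributes preserves the classification.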

Discernibility for Rule Induction. For each equivalence class Δ ∈ U/R_d with respect to the decision attribute d we now construct boolean discernibility functions (which are formulas in L_D) and collect them in the set D(Δ).

Definition A.2 (Discernibility functions for decisions) For each equivalence class Δ ∈ U/R_d the set

  D(Δ) = {φ ∈ L_D | ∃E_k ∈ U/R_C : E_k ⊆ R̲_C(Δ) such that φ = f(D, k)^tf(e) for an e ∈ E_k}

contains the k-relative discernibility functions for the equivalence classes which are subsets of the equivalence class Δ.

To exclude noisy data, only elements in the lower approximation are considered. The following lemma, which follows easily by construction of D(Δ), shows that only individuals in R̲_C(Δ) are matched by any rule constructed using a formula in D(Δ). This implies that these rules already make statements about safe and non-noisy data only.

Lemma A.3 Let D = (U, C ∪ {d}) be a decision system. For any individual u ∈ U and equivalence class Δ ∈ U/R_d:

  u ∈ D(Δ) iff u ∈ R̲_C(Δ). (1)

This lemma constitutes the basis of rough set data analysis, because it relates sets of concepts with the reliable and secure part of the data. In traditional rough set data analysis it is computationally easy to calculate approximations [Guan and Bell, 1998] and the construction of boolean functions and reducts can thus be reduced to the equivalence classes E ⊆ R̲_C(Δ). This is not the case in assertion mining, where the approximations are difficult to find. Lemma A.3 shows however that already the construction of f(D, k) prevents unsafe and rough data from influencing the construction of decision rules. The remaining step is to perform an inductive leap, assuming that if Equation (1) holds for the available non-vague and secure data, it will also hold for new and unknown data (inductive learning hypothesis, [Mitchell, 1997] p. 23).

Note that this result does not mean that D(Δ) is the set of all formulas with this property, i.e. there might be a formula φ ∉ D(Δ) such that u ∈ R̲_C(Δ) and u ∈ φ^{I_D}. We will use the rest of this section to show that D(Δ) is a correct, but necessarily incomplete, set, and we will give two interpretations of the chosen bias.

Correctness of Rules. Let us show that D(Δ) is correct in the sense that if a formula is in D(Δ) it can be used to form a correct decision rule.

Proposition A.4 Given a decision system D, let E_k ∈ U/R_C be an equivalence class with respect to R_C. If the value of the decision attribute is unique, i.e. d(e) = v for all e ∈ E_k, we know that for all u ∈ U:

  u ∈ (f(D, k)^tf(e_k))^{I_D} implies (2)
  u ∉ (d = v_j)^{I_D} for all v_j ≠ v, which is equivalent to (3)
  u ∈ (d = v)^{I_D} (4)

where f(D, k) is the k-relative discernibility function and e_k ∈ E_k.

Proof. (3) follows from (2) by construction of f(D, k): u ∈ (f(D, k)^tf(e_k))^{I_D} iff for all equivalence classes E_i with i ≠ k and c_ik ≠ ∅ and elements e_i ∈ E_i there is an attribute a ∈ C such that d(e_k) ≠ d(e_i) implies that a(e_i) ≠ a(e_k), i.e. if the decision attribute is different for two individuals of E_i and E_k, there must be an attribute a discerning the two individuals. Furthermore u ∈ (a = v)^{I_D}, which implies that a(u) = v.

Now assume that condition (3) does not hold, i.e. that there is a decision value v_j ≠ v in V_d such that u ∈ (d = v_j)^{I_D}. Then u ∉ E_k (because the individuals in E_k have unique decision values), so assume u ∈ E_i′ for an equivalence class E_i′. But this is a contradiction, because we have both a(u) ≠ a(e_k) (because d(u) ≠ d(e_k)) and a(u) = v = a(e_k).

(4) is equivalent to (3) because there is exactly one value for the decision attribute for each individual. □

This proposition shows that for any decision system inconsistency with counterexamples implies correctness of the rules constructed from the k-relative discernibility function.

From this proposition, Lemma A.5 follows immediately, because φ ∈ D(Δ) implies that there exists an E_k ⊆ R̲_C(Δ) such that φ = f(D, k)^tf(e) for an e ∈ E_k.

Lemma A.5 For any equivalence class Δ ∈ U/R_d where d(Δ) = v:

  φ ∈ D(Δ) implies that the rule φ ⟹ (d = v) is correct.

Unfortunately the other direction does not hold. Take the following very simple decision system D with just two individuals o₁ and o₂, where a(o₁) = b(o₁) = b(o₂) = ⊤ and a(o₂) = ⊥, and where d(o₁) = 1 and d(o₂) = 2.

       a   b   d
  o₁   ⊤   ⊤   1
  o₂   ⊥   ⊤   2

Now f(D, 1)^tf(o₁) = (a = ⊤). But for φ = (a = ⊤) ∨ (b = ⊥) we also have o₁ ∈ φ^{I_D} and o₁ ∈ (d = 1)^{I_D} and o₂ ∉ φ^{I_D}. This implies that the rule φ ⟹ (d = 1) is correct in D, but φ ∉ D([o₁]_{R_d}).
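The counterexample can be checked mechanically. The sketch below evaluates the disjunctive rule body against both individuals of the two-element system:

```python
# Checking the counterexample: the rule (a = ⊤) ∨ (b = ⊥) ⟹ (d = 1)
# is correct in D although it is not built from a discernibility function.

table = {"o1": {"a": True,  "b": True, "d": 1},
         "o2": {"a": False, "b": True, "d": 2}}

def matches(u, phi):
    """Evaluate a disjunction of (attribute, value) atoms on individual u."""
    return any(table[u][a] == v for a, v in phi)

phi = [("a", True), ("b", False)]          # (a = ⊤) ∨ (b = ⊥)
correct = all(table[u]["d"] == 1 for u in table if matches(u, phi))
# correct is True: only o1 matches phi, and d(o1) = 1
```

Only o₁ matches φ, so the rule is vacuously safe on o₂; the extra disjunct (b = ⊥) is supported by no positive example, which is exactly the problem discussed next.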

The reason for this problem is that any attribute-value pair which is inconsistent with the negative examples can be added disjunctively to φ, even though there might be no positive example to support it.

Optimal Basic Decision Rules. The solution in several rough set data analysis systems follows Skowron and restricts the syntactic structure of decision rules.

It has been shown that the set of all prime implicants of f(D, k) determines the set of all k-relative reducts of D [Skowron, 1993]. Optimal decision rules are usually created for all equivalence classes E ⊆ R̲_C(Δ) for any Δ ∈ U/R_d. Let us define these results more formally.

Definition A.6 The set of prime implicants for D(Δ) is

  PI(Δ) = ⋃_{φ ∈ D(Δ)} PI(φ),

where PI(φ) denotes the set of prime implicants for the formula φ. We say that an individual a ∈ D(Δ) (or a ∈ PI(Δ)) if and only if there is a formula φ ∈ D(Δ) (φ ∈ PI(Δ) respectively) such that a ∈ φ^{I_D}.

Skowron's classical result in [Skowron, 1993] states that the prime implicants of the discernibility functions correspond to the set of reducts, which leads to the following theorem linking optimal decision rules with the set PI(Δ).

Theorem A.7 For any equivalence class Δ ∈ U/R_d where d(Δ) = v:

  φ ∈ PI(Δ) iff the rule φ ⟹ (d = v) is optimal.

Algorithms. Algorithm RDM just gives an overview of the order of application of the above definitions for a practical realisation. RDM calculates a set of optimal decision rules for a given decision system using rough set data mining techniques.

Algorithm A.1 RDM: Algorithm for Data Mining with Rough Sets.

Input: Decision system D = (U, C ∪ {d}).
Output: A set of optimal decision rules for d.

  U/R_d := find_decision_equivalence_classes(D);
  U/R_C := find_condition_equivalence_classes(D);
  rules := ∅;
  for all Δ ∈ U/R_d
      R̲_C(Δ) := get_lower_approximation(Δ, U/R_C);
      R̄_C(Δ) := get_upper_approximation(Δ, U/R_C);
      for all equivalence classes E_k ⊆ R̲_C(Δ)
          M_d(D) := get_decision_matrix_modulo_d(D);
          f(D, k) := get_discernibility_function(M_d(D));
          primeimplicants := prime_implicants(f(D, k));
          reducts := get_reducts(primeimplicants);
          rules := rules ∪ get_optimal_decision_rules(reducts);
  return rules;
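Under the simplifying assumption that the prime implicants of the CNF discernibility function are exactly its minimal hitting sets of attribute clauses, the RDM loop can be sketched by brute force for a tiny system (all names and data invented):

```python
from itertools import combinations

table = {"o1": {"a": 1, "b": 1, "d": 1},
         "o2": {"a": 0, "b": 1, "d": 2},
         "o3": {"a": 0, "b": 0, "d": 2}}
C = ["a", "b"]

def clauses_for(k):
    """CNF of f(D, k): one clause per individual with a different decision."""
    return [{a for a in C if table[u][a] != table[k][a]}
            for u in table if table[u]["d"] != table[k]["d"]]

def reducts(k):
    """Minimal attribute sets hitting every clause of f(D, k)."""
    cl = clauses_for(k)
    hits = []
    for n in range(1, len(C) + 1):
        for s in combinations(C, n):
            if all(set(s) & c for c in cl):
                hits.append(set(s))
    return [h for h in hits if not any(g < h for g in hits)]

def rules(k):
    """Optimal basic decision rules for the decision class of k."""
    return [{a: table[k][a] for a in red} for red in reducts(k)]

# For o1 the only reduct is {a}, giving the rule (a = 1) ⟹ (d = 1).
```

The brute-force enumeration replaces the prime-implicant computation of the real algorithm; it is exponential and only meant to make the data flow of RDM concrete.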

B Arrhythmia

To justify our approach we want to introduce an example from the medical domain which is taken from the UCI Machine Learning Repository [Blake and Merz, 1998].

It is called the Arrhythmia Database as described in [Güvenir et al., 1997]. The aim is to distinguish between the presence and absence of cardiac arrhythmia and to classify it automatically into one of 16 groups. There are 452 examples and 279 attributes, of which 206 are numeric. The examples are classified by experts.

An electrocardiogram (ECG) is generated by electrical activity that accompanies muscle contraction; it is recorded from the body surface with various electrode configurations. Characteristic waves seen on a typical ECG from a healthy person include different types of waves, e.g., P waves representing atrial depolarisation, the QRS complex representing ventricular contraction and the T wave representing ventricular repolarisation.

An electrocardiographic lead is a recording electrode or a pair of recording electrodes at a specified location. In clinical practice twelve leads are usually used in diagnostic ECG. They are placed on wrists and ankles. A "typical" ECG tracing is illustrated in Figure 6 [Yanowitz, 1997].

Figure 6: ECG Intervals and Waves

The data consists of data about patients (age, sex, ...) and of recordings from an ECG. For each of the 12 leads the average width and amplitude of the Q-, R-, S-, P- and T-wave are given, as well as the number of intrinsic deflections and boolean values for the existence of ragged or diphasic derivations for the R-, P- and T-wave. In Fig. 7 we present excerpts from a typical Arrhythmia Identification Procedure as it can be found on medical information web-sites like [Rathe, 1997].

The given procedure has been devised using the experience of human medical experts only. Given the amount of data available for each patient it is highly desirable to investigate more complicated interrelations between different attributes and their values.

Rate     less than 60 beats/min → Bradycardic
         greater than 100 beats/min → Tachycardic

Rhythm   regular and P with every QRS / slow → supraventricular bradycardias
         regular and P with every QRS / fast → supraventricular tachycardias

P-Wave   not present and narrow QRS / irregular → atrial fibrillation
         not present and wide QRS / fast → ventricular tachycardias
         ...

Figure 7: A typical Arrhythmia Identification Procedure

C Proofs

Proof of Proposition 3.9. a ∈_Σ X implies that a ∈_Σ D and that for all x ∈ class(A):

  x ∉_A D ⟹ x ∈_A ¬X. (5)

Assume now that a ∉ S_Σ(D), i.e. there must be a classifiable individual b′ ∉_A D which is indiscernible from a, i.e. for all concepts C:

  a ∈_A C ⟹ b′ ∉_A ¬C. (6)

But b′ ∉_A D implies b′ ∈_A ¬X (Equation 5), and b′ ∉_A ¬X follows from a ∈_Σ X because of Equation 6, which is a contradiction. Therefore a ∈ S_Σ(D). □

Proof of Proposition 3.10. For an instance a ∈ S_Σ(D) we construct a concept X ∈ G(D) such that a ∈_Σ X. Let disc(a, b) = {C ∈ DL | a ∈_Σ C & b ∈_Σ ¬C} be the set of all concepts discerning a and b.

Now assume that X ∉ G(D) for all X, i.e. that there exists a b′ ∈ A such that b′ ∉_Σ D and b′ ∉_Σ ¬X (∗). But for all b ∈ A we know that b ∉_Σ D implies that there is a concept C′ ∈ DL such that b ∈_Σ ¬C′. It follows that disc(a, b) ≠ ∅ because C′ ∈ disc(a, b). If we now choose

  X = ⨅ { C | b ∈ class(A), disc(a, b) ≠ ∅, C some element of disc(a, b) },

where we have chosen C′ ∈ disc(a, b′), we derive a ∈_Σ X, because a ∈_Σ C for all C ∈ disc(a, b).

But then we know that Σ ⊨ X ⊑ C′, which implies Σ ⊨ ¬C′ ⊑ ¬X by contraposition, and from b′ ∈_Σ ¬C′ follows b′ ∈_Σ ¬X immediately, which contradicts (∗). □

Proof of Theorem 5.4. The proof that obj_rel_LI(a, {(a : P)^p, (a : ¬N)^n}) is an interpolant for P and N is by induction. We show that if obj_rel_LI(a, B′) (obj_rel_LI(a, B″)) is an o-interpolant for B′ (for B″) and B′ (and B″) result from the application of a tableau rule, then obj_rel_LI(o′, B) is an o′-interpolant for B.

• It is easy to verify for the leaves of each closed branch that obj_rel_LI(a, B) is indeed an o-interpolant for B. There are three simple cases.

1. If a branch B of a fully expanded tableau closes for an individual o with a contradiction between two negatively labelled atoms, ⊤ is an o-related interpolant for B.

2. If it closes on two positive atoms, ⊥ is an o-interpolant for B.

3. If positively labelled formulas close a branch with non-positively labelled literals only, each literal and the disjunction of these literals are indeed Lyndon interpolants.

• The only interesting propositional case is the ⊔-rule. Let obj_rel_LI(a, B′) and obj_rel_LI(a, B″) be o-interpolants for B′ and B″.

– Assume that the ⊔-rule was applied on a formula (a : C ⊔ D)^n ∈ B. Let {(a : C₁)^p, ..., (a : C_mp)^p, (a : D₁)^n, ..., (a : D_mn)^n, (a : C ⊔ D)^n} be the set of all o-related formulas in B. It then follows that obj_rel_LI(a, B′) ⊓ obj_rel_LI(a, B″) is an o-interpolant because

  C₁ ⊓ ... ⊓ C_mp ⊑ obj_rel_LI(a, B′) ⊑ ¬D₁ ⊔ ... ⊔ ¬D_mn ⊔ ¬C and
  C₁ ⊓ ... ⊓ C_mp ⊑ obj_rel_LI(a, B″) ⊑ ¬D₁ ⊔ ... ⊔ ¬D_mn ⊔ ¬D

imply that

  C₁ ⊓ ... ⊓ C_mp ⊑ obj_rel_LI(a, B′) ⊓ obj_rel_LI(a, B″) ⊑ ¬D₁ ⊔ ... ⊔ ¬D_mn ⊔ ¬(C ⊔ D).

It is also easy to verify that

  occ(obj_rel_LI(a, B′) ⊓ obj_rel_LI(a, B″)) ⊆ occ(C₁ ⊓ ... ⊓ C_mp) ∩ occ(¬D₁ ⊔ ... ⊔ ¬D_mn ⊔ ¬C ⊔ ¬D),

because both

  occ(obj_rel_LI(a, B′)) ⊆ occ(C₁ ⊓ ... ⊓ C_mp) ∩ occ(¬D₁ ⊔ ... ⊔ ¬D_mn ⊔ ¬C) and
  occ(obj_rel_LI(a, B″)) ⊆ occ(C₁ ⊓ ... ⊓ C_mp) ∩ occ(¬D₁ ⊔ ... ⊔ ¬D_mn ⊔ ¬D).

– Assume that the ⊔-rule was applied on a formula (a : C ⊔ D)^p ∈ B. Let {(a : C ⊔ D)^p, (a : C₁)^p, ..., (a : C_mp)^p, (a : D₁)^n, ..., (a : D_mn)^n} be the set of all o-related formulas in B. It then follows that obj_rel_LI(a, B′) ⊔ obj_rel_LI(a, B″) is an o-interpolant because

  C ⊓ C₁ ⊓ ... ⊓ C_mp ⊑ obj_rel_LI(a, B′) ⊑ ¬D₁ ⊔ ... ⊔ ¬D_mn and
  D ⊓ C₁ ⊓ ... ⊓ C_mp ⊑ obj_rel_LI(a, B″) ⊑ ¬D₁ ⊔ ... ⊔ ¬D_mn

imply that

  (C ⊔ D) ⊓ C₁ ⊓ ... ⊓ C_mp ⊑ obj_rel_LI(a, B′) ⊔ obj_rel_LI(a, B″) ⊑ ¬D₁ ⊔ ... ⊔ ¬D_mn.

The rest of the argument is similar to the one before.

• It remains to show that the (∃)-rule preserves correctness. Let {D₁^p, ..., D_i^p} (respectively {D_{i+1}^n, ..., D_m^n}) be the set of concepts which occur positively (negatively) labelled in the scope of R in the set {(a : ∀R.D₁)^{x₁}, ..., (a : ∀R.D_m)^{x_m}}.

– Assume B′ was created by application of the ∃-rule on a positively labelled formula (a : ∃R.C)^p in B. We have to show that ∃R.obj_rel_LI(b, B′) is an a-interpolant for B if obj_rel_LI(b, B′) is a b-interpolant for B′. We know that

  C ⊓ D₁^p ⊓ ... ⊓ D_i^p ⊑ obj_rel_LI(b, B′) ⊑ ¬D_{i+1}^n ⊔ ... ⊔ ¬D_m^n,

which implies

  PRest ⊓ ∃R.C ⊓ ∀R.D₁^p ⊓ ... ⊓ ∀R.D_i^p ⊑ ∃R.obj_rel_LI(b, B′) ⊑ ∃R.¬D_{i+1}^n ⊔ ... ⊔ ∃R.¬D_m^n ⊔ ¬NRest,

where PRest (NRest) is the conjunction of all remaining assertions over a. If obj_rel_LI(a, B′) exists, it is an a-interpolant for B′ and therefore

  PRest ⊓ ∃R.C ⊓ ∀R.D₁^p ⊓ ... ⊓ ∀R.D_i^p ⊑ obj_rel_LI(a, B′) ⊔ ∃R.obj_rel_LI(b, B′) ⊑ ∃R.¬D_{i+1}^n ⊔ ... ⊔ ∃R.¬D_m^n ⊔ ¬NRest.

In this case we can also show that

  occ(obj_rel_LI(a, B′) ⊔ ∃R.obj_rel_LI(b, B′)) ⊆ occ(PRest ⊓ ∃R.C ⊓ ∀R.D₁^p ⊓ ... ⊓ ∀R.D_i^p) ∩ occ(∀R.D_{i+1}^n ⊔ ... ⊔ ∀R.D_m^n ⊔ NRest),

because both

  occ(obj_rel_LI(b, B′))^R ⊆ occ(C ⊓ D₁^p ⊓ ... ⊓ D_i^p)^R ∩ occ(¬D_{i+1}^n ⊔ ... ⊔ ¬D_m^n)^R and
  occ(obj_rel_LI(a, B′)) ⊆ occ(PRest ⊓ ∃R.C ⊓ ∀R.D₁^p ⊓ ... ⊓ ∀R.D_i^p) ∩ occ(∀R.D_{i+1}^n ⊔ ... ⊔ ∀R.D_m^n ⊔ NRest).

– Assume B′ was created by application of the ∃-rule on a negatively labelled formula (a : ∃R.C)^n in B. We have to show that ∀R.obj_rel_LI(b, B′) is an a-interpolant for B if obj_rel_LI(b, B′) is a b-interpolant for B′. We know that

  D₁^p ⊓ ... ⊓ D_i^p ⊑ obj_rel_LI(b, B′) ⊑ ¬D_{i+1}^n ⊔ ... ⊔ ¬D_m^n ⊔ ¬C,

which implies

  PRest ⊓ ∀R.D₁^p ⊓ ... ⊓ ∀R.D_i^p ⊑ ∀R.obj_rel_LI(b, B′) ⊑ ∃R.¬D_{i+1}^n ⊔ ... ⊔ ∃R.¬D_m^n ⊔ ∀R.¬C ⊔ ¬NRest,

where PRest (NRest) is the conjunction of all remaining assertions over a. If obj_rel_LI(a, B′) exists, it is an a-interpolant for B′ and therefore

  PRest ⊓ ∀R.D₁^p ⊓ ... ⊓ ∀R.D_i^p ⊑ obj_rel_LI(a, B′) ⊔ ∀R.obj_rel_LI(b, B′) ⊑ ∃R.¬D_{i+1}^n ⊔ ... ⊔ ∃R.¬D_m^n ⊔ ∀R.¬C ⊔ ¬NRest.

The language argument is identical to the positive case. □

Proof of Theorem 5.5. We prove this theorem by showing that a Lyndon concept interpolant for P and N exists whenever obj_rel_LI(a, {(a : P)^p, (a : ¬N)^n}) exists. We thus just have to consider the case where obj_rel_LI(a, {(a : P)^p, (a : ¬N)^n}) is undefined and show that in this case P and N do not have interpolants. But this is trivial, because obj_rel_LI(a, {(a : P)^p, (a : ¬N)^n}) is only undefined if there is an open branch for the tableau starting with (a : P)^p, (a : ¬N)^n, which implies that P ⋢ N. □

Proof of Theorem 5.9. The theorem consists of two separate parts which we will prove separately. We first prove the case L := partial_LI(B, C, d, pos) with positive polarity, i.e. we show that L is a positive partial ABox interpolant for a branch B, a concept C and an individual d. We have to show that

1. d ∈_B L,

2. L ⊑ C,

3. occ(L) ⊆ occ_A(d) ∩ occ(C).

The proof is by induction over the four cases:

� A (u) rule has been applied on a formula (c : C1 u C2) 2 B:

L = partial LI(B;C; d; pol)

Properties (1) and (2) follow immediately from the hypothesis:

d 2B0 partial LI(B;C; d; pol) & partial LI(B0; C; d; pol) v C

Page 52: 2 In · sp eci ed. The big adv an tage in this case that the existence of most sp eci c concept is not required. W e presen t a sup ervised learning metho d based on logical in terp

52 Interpolation Methods for Assertion Mining in Hybrid Knowledge Bases

because soundness and completeness of the rules with respect to satis�abilityimply that d 2B0 D i� d 2B D for all concepts D 2 DL.

  Property (3) follows because occ_B(d) = occ_{B′}(d).

- A (⊔) rule has been applied on a formula (c : C1 ⊔ C2) ∈ B:

  L = partial_LI(B′, C, d, pol) ⊔ partial_LI(B″, C, d, pol)

  Property (1) follows because d ∈_{B′} partial_LI(B′, C, d, pol) and d ∈_{B″} partial_LI(B″, C, d, pol) imply that

  d ∈_B partial_LI(B′, C, d, pol) ⊔ partial_LI(B″, C, d, pol).

  Property (2) follows immediately from the second pair of hypotheses:

  partial_LI(B′, C, d, pol) ⊑ C (*) & partial_LI(B″, C, d, pol) ⊑ C, (**)

  because (*) and (**) imply:

  partial_LI(B′, C, d, pol) ⊔ partial_LI(B″, C, d, pol) ⊑ C.

  Property (3) follows trivially from the construction of the branch B′, because occ_B(d) = occ_{B′}(d) = occ_{B″}(d).

- A (∀) rule has been applied on a formula (c : ∀R.D) ∈ B and (c, b) : R ∈ B:

  L = partial_LI(B′, C, d, pol)

  Properties (1) and (2) follow immediately from the hypothesis:

  d ∈_{B′} partial_LI(B′, C, d, pol) & partial_LI(B′, C, d, pol) ⊑ C.

  Property (3) is only interesting if b = d. Then occ_{B∪{d:D}}(d) = occ_B(d) ∪ occ(D), but since both (c, d) : R ∈ B and c : ∀R.D ∈ B, also occ(D) ⊆ occ_B(d) by definition of occ_A in Definition 4.9.

- No more rule can be applied: L = concept_LI(B_d, C). We know that B_d ⊑ L ⊑ C because L is a concept interpolant for B_d and C. This implies Property (1), i.e. d ∈_B L, because d ∈_B B_d by Proposition 5.7. Property (2) follows directly. Property (3) follows because occ(L) ⊆ occ(B_d) ∩ occ(C) and occ(B_d) ⊆ occ_B(d), because B is preprocessing complete.

We now show that L := partial_LI(B, C, d, neg) is a negative partial ABox interpolant for a branch B, a concept C and an individual d. We have to show that


1. d ∈_B ¬L

2. C ⊑ L

3. occ(L) ⊆ ~occ_A(d) ∩ occ(C)

The proof is done by induction over the four cases:

- A (⊓) rule has been applied on a formula (c : C1 ⊓ C2) ∈ B:

  L = partial_LI(B′, C, d, pol)

  Properties (1) and (2) follow immediately from the hypothesis:

  d ∈_{B′} ¬partial_LI(B′, C, d, pol) & C ⊑ partial_LI(B′, C, d, pol)

  because soundness and completeness of the rules with respect to satisfiability imply that d ∈_{B′} ¬D iff d ∈_B ¬D for all concepts D ∈ DL.

  Property (3) follows because ~occ_{B′}(d) = ~occ_B(d).

- A (⊔) rule has been applied on a formula (c : C1 ⊔ C2) ∈ B:

  L = partial_LI(B′, C, d, pol) ⊓ partial_LI(B″, C, d, pol)

  Property (1) follows because d ∈_{B′} ¬partial_LI(B′, C, d, pol) and d ∈_{B″} ¬partial_LI(B″, C, d, pol) imply that

  d ∈_B ¬partial_LI(B′, C, d, pol) ⊔ ¬partial_LI(B″, C, d, pol).

  But this is equivalent to:

  d ∈_B ¬(partial_LI(B′, C, d, pol) ⊓ partial_LI(B″, C, d, pol)).

  Property (2) follows immediately from the second pair of hypotheses:

  C ⊑ partial_LI(B′, C, d, pol) & C ⊑ partial_LI(B″, C, d, pol),

  which imply C ⊑ partial_LI(B′, C, d, pol) ⊓ partial_LI(B″, C, d, pol).

  Property (3) follows again because ~occ_{B″}(d) = ~occ_{B′}(d) = ~occ_B(d).

- A (∀) rule has been applied on formulas (c : ∀R.D) ∈ B and (c, d) : R ∈ B:

  L = partial_LI(B′, C, d, pol)

  Properties (1) and (2) follow immediately from the hypothesis:

  d ∈_{B′} ¬partial_LI(B′, C, d, pol) & C ⊑ partial_LI(B′, C, d, pol).

Property (3) follows from the same argument as in the positive case.


- No more rule can be applied: L = concept_LI(B_d, C). We know that C ⊑ L ⊑ ¬B_d because L is a concept interpolant for C and ¬B_d. This implies Property (1), i.e. d ∈_B ¬L, because d ∈_B B_d by Proposition 5.7 and B_d ⊑ ¬L. Property (2) follows directly. Property (3) follows because occ(L) ⊆ occ(¬B_d) ∩ occ(C) and occ(¬B_d) ⊆ ~occ_B(d), because B is preprocessing complete.
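The induction above follows the shape of the tableau derivation: the (⊓) and (∀) rules leave the interpolant unchanged, the (⊔) rule joins the interpolants of the two sub-branches (with ⊔ in the positive and ⊓ in the negative case), and the base case falls back to the concept interpolant concept_LI(B_d, C). A hypothetical Python sketch of this recursion (the trace encoding and the names are my own, not the report's):

```python
def partial_li(trace, concept_li, pol):
    """Assemble a partial interpolant from a tableau trace.

    trace is one of:
      ('and', sub)          -- a (⊓) rule was applied, sub is the trace of B'
      ('or', left, right)   -- a (⊔) rule was applied, traces of B' and B''
      ('forall', sub)       -- a (∀) rule was applied, trace of B'
      ('leaf', b_d, c)      -- no rule applicable: concept interpolant case
    pol is 'pos' or 'neg'; concept_li(b_d, c) computes concept_LI(B_d, C).
    """
    kind = trace[0]
    if kind == 'and':                 # (⊓): interpolant of B' carries over
        return partial_li(trace[1], concept_li, pol)
    if kind == 'or':                  # (⊔): combine the two sub-branches
        left = partial_li(trace[1], concept_li, pol)
        right = partial_li(trace[2], concept_li, pol)
        op = 'or' if pol == 'pos' else 'and'   # ⊔ for pos, ⊓ for neg
        return (op, left, right)
    if kind == 'forall':              # (∀): interpolant of B' carries over
        return partial_li(trace[1], concept_li, pol)
    if kind == 'leaf':                # base case: concept_LI(B_d, C)
        return concept_li(trace[1], trace[2])
    raise ValueError(kind)
```

Running it on a toy trace with a stub concept interpolant shows the polarity-dependent connective at the (⊔) node, exactly mirroring the two halves of the proof.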

Proof of Lemma 5.12  Let LI(a, b) abbreviate the concept propagate_LI(B, a, b). To show that LI(a, b) is an ABox Lyndon interpolant for B, we have to show the following three properties:

1. a ∈_B LI(a, b)

2. b ∈_B ¬LI(a, b)

3. occ(LI(a, b)) ⊆ occ_A(a) ∩ occ_A(b)

The proof is an induction over the three possible cases:

- LI(a, b) = ∀R.partial_LI(B, C1 ⊓ … ⊓ Cm, d, neg) has been constructed because {a : ∀R.C1, …, a : ∀R.Cm} ⊆ B and ((b, d) : R) ∈ B.

  Now L := partial_LI(B, C1 ⊓ … ⊓ Cm, d, neg) is by Theorem 5.9 a negative partial Lyndon interpolant for the branch B, the concept C1 ⊓ … ⊓ Cm and the individual d. By Definition 4.11 this is equivalent to: C1 ⊓ … ⊓ Cm ⊑ L (*) and d ∈_B ¬L (**). This proves the three properties because:

  1. a ∈_B ∀R.(C1 ⊓ … ⊓ Cm) and ∀R.(C1 ⊓ … ⊓ Cm) ⊑ ∀R.L (which follows from (*)) imply that a ∈_B ∀R.L.

  2. B ∪ {b : ∀R.L} is inconsistent because d ∈_B ¬L (**) and (b, d) : R ∈ B.

  3. We know that occ(L) ⊆ ~occ_B(d) ∩ occ(C1 ⊓ … ⊓ Cm). Furthermore ~occ_B(d)^R ⊆ ~occ_B(b) because (b, d) : R ∈ B, and occ(C1 ⊓ … ⊓ Cm)^R ⊆ occ_B(a) because {a : ∀R.C1, …, a : ∀R.Cm} ⊆ B. This implies that

     occ(C1 ⊓ … ⊓ Cm)^R ∩ ~occ_B(d)^R ⊆ occ_B(a) ∩ ~occ_B(b).

     The induction hypothesis ensures that occ(LI) ⊆ occ_B(a) ∩ ~occ_B(b), so that occ(LI(a, b)) = occ(L)^R ∪ occ(LI) implies occ(LI(a, b)) ⊆ occ_A(a) ∩ ~occ_A(b).

- LI(a, b) = ∃R.partial_LI(B, ¬C1 ⊔ … ⊔ ¬Cm, d, pos) has been constructed because {b : ∀R.C1, …, b : ∀R.Cm} ⊆ B and ((a, d) : R) ∈ B.

  Again L := partial_LI(B, ¬C1 ⊔ … ⊔ ¬Cm, d, pos) is by Theorem 5.9 a positive partial Lyndon interpolant for the branch B, the concept ¬C1 ⊔ … ⊔ ¬Cm and the individual d. By Definition 4.11 this is equivalent to: L ⊑ ¬(C1 ⊓ … ⊓ Cm) (*) and d ∈_B L (**). This proves the three properties because:


  1. B ∪ {a : ∀R.¬L} is inconsistent because d ∈_B L (**) and (a, d) : R ∈ B.

  2. b ∈_B ∀R.(C1 ⊓ … ⊓ Cm) and ∀R.(C1 ⊓ … ⊓ Cm) ⊑ ∀R.¬L (which follows from (*)) imply that b ∈_B ¬∃R.L.

  3. We know that occ(L) ⊆ occ_B(d) ∩ occ(¬C1 ⊔ … ⊔ ¬Cm). Furthermore occ_B(d)^R ⊆ occ_B(a) because (a, d) : R ∈ B, and occ(C1 ⊓ … ⊓ Cm)^R ⊆ occ_B(b) because {b : ∀R.C1, …, b : ∀R.Cm} ⊆ B. This implies that

     occ(C1 ⊓ … ⊓ Cm)^R ∩ occ_B(d)^R ⊆ occ_B(a) ∩ occ_B(b).

     The induction hypothesis ensures that occ(LI) ⊆ occ_B(a) ∩ occ_B(b), so that occ(LI(a, b)) = occ(L)^R ∪ occ(LI) implies occ(LI(a, b)) ⊆ occ_A(a) ∩ occ_A(b).

- LI(a, b) = concept_LI(B_a, ¬B_b), if concept_LI(B_a, ¬B_b) exists. By construction: a : B_a and b : B_b. Properties 1 and 2 follow immediately from the fact that LI(a, b) is a concept interpolant for B_a and ¬B_b by Theorem 5.3, which implies B_a ⊑ LI(a, b) ⊑ ¬B_b.
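The three cases of the lemma correspond to a three-way case split in propagate_LI: first try the ∀-side interpolants propagated from a's universal restrictions, then the symmetric ∃-side interpolants from b's restrictions, and finally fall back to the concept interpolant of B_a and ¬B_b. A hypothetical Python reconstruction of this case split (the branch representation, parameter names and stub interfaces are assumptions, not the report's data structures):

```python
def propagate_li(branch, a, b, concept_li, partial_li):
    """Case split of Lemma 5.12 (schematic).

    branch: {'forall': individual -> role -> [concepts],
             'succ':   individual -> role -> [successor individuals]}
    partial_li(branch, concept, d, pol) returns a partial interpolant or None;
    concept_li(branch, a, b) computes concept_LI(B_a, ¬B_b).
    """
    # Case 1: a carries ∀R.C_i assertions and b has an R-successor d:
    # LI(a, b) = ∀R.partial_LI(B, C_1 ⊓ … ⊓ C_m, d, neg)
    for role, cs in branch['forall'].get(a, {}).items():
        for d in branch['succ'].get(b, {}).get(role, []):
            li = partial_li(branch, ('and', cs), d, 'neg')
            if li is not None:
                return ('forall', role, li)
    # Case 2: symmetric, b carries the restrictions and a the successor:
    # LI(a, b) = ∃R.partial_LI(B, ¬C_1 ⊔ … ⊔ ¬C_m, d, pos)
    for role, cs in branch['forall'].get(b, {}).items():
        for d in branch['succ'].get(a, {}).get(role, []):
            li = partial_li(branch, ('or', [('not', c) for c in cs]), d, 'pos')
            if li is not None:
                return ('exists', role, li)
    # Case 3: fall back to the concept interpolant of B_a and ¬B_b
    return concept_li(branch, a, b)
```

With stub interpolant functions, a branch in which a carries {a : ∀R.C1, a : ∀R.C2} and b has an R-successor d lands in Case 1 and returns a ∀R-prefixed interpolant, matching the first bullet of the proof.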

Proof of Theorem 5.13  We have to show that if there is an ABox interpolant L for an ABox A, then ABox_LI(A, a, b) is defined. There are only three cases in which ABox_LI(A, a, b) could be undefined, i.e. for a branch B of the preprocessing complete ABox A′:

1. For all (a, d) : R ∈ B: partial_LI(B, ¬C1 ⊔ … ⊔ ¬Cm, d, pos) does not exist.

2. For all (b, d) : R ∈ B: partial_LI(B, C1 ⊓ … ⊓ Cm, d, neg) does not exist.

3. concept_LI(B_a, ¬B_b) does not exist.

We will show that in none of these cases an ABox interpolant for a and b exists.

Proposition C.1  If there is an ABox interpolant for an ABox A and two individuals a and b, it is also an ABox interpolant for a and b in the preprocessing complete ABox A′, i.e. there must be a branch B ∈ A′ and a concept L which is an ABox interpolant for a and b.

Assume there exists an ABox interpolant L for B. We show that at least one of the three types of interpolants is defined. The proof is by induction over the structure of L.

- L = ∀R.C, i.e. we know that both b ∈_B ∃R.¬C and a ∈_B ∀R.C.


  - B_b ⊓ ∀R.C = ⊥, i.e. L ⊑ ¬B_b. But we know that B ∪ {a : ∃R.¬C} = ⊥, which implies again that B_a ⊑ ¬B_b.

    Since interpolation holds for concept subsumption, there must be an interpolant L′ for B_a and ¬B_b such that B_a ⊑ L′ ⊑ ¬B_b and occ(L′) ⊆ occ(B_a) ∩ occ(¬B_b).

    Remember that it is enough to show that Algorithm 5.4 returns an arbitrary ABox interpolant whenever one exists. But this is established because L′ = concept_LI(B_a, ¬B_b) is defined and returned by Algorithm 5.1, and L′ is indeed an ABox interpolant for a and b with respect to B. But this contradicts the assumption.

  - There is a (b, d) : R ∈ B and d ∈_B ¬C. a ∈_B ∀R.C iff there are some {a : ∀R.C1, …, a : ∀R.Ci} ⊆ B such that ¬C ⊓ C1 ⊓ … ⊓ Ci is inconsistent (no more role assertions can be applied because only a : ∃R.¬C would be new in a preprocessing complete branch). This implies ¬C ⊑ ¬(C1 ⊓ … ⊓ Ci) and therefore ¬C ⊑ ¬(C1 ⊓ … ⊓ Cm) and d ∈_B ¬(C1 ⊓ … ⊓ Cm).

    But now we know by Theorem 5.10 that there must be a negative partial ABox interpolant L′ for d and ¬(C1 ⊓ … ⊓ Cm) with respect to B, i.e. L′ = partial_LI(B, C1 ⊓ … ⊓ Cm, d, neg) exists. But again ∀R.L′ is an ABox interpolant for a and b, which contradicts the assumption.

- L = ∃R.C, i.e. we know that both a ∈_B ∃R.C and b ∈_B ∀R.¬C.

  - B_a ⊓ ∀R.¬C = ⊥, i.e. B_a ⊑ L. But we know that B ∪ {b : ∃R.C} = ⊥ iff B_b ⊓ ∃R.C is inconsistent. Therefore L ⊑ ¬B_b and finally B_a ⊑ ¬B_b. The same argument holds as before.

  - There is an (a, d) : R ∈ B and d ∈_B C. b ∈_B ∀R.¬C iff there are some {b : ∀R.C1, …, b : ∀R.Ci} ⊆ B such that C ⊓ C1 ⊓ … ⊓ Ci is inconsistent (no more role assertions can be applied because only b : ∃R.C would be new in a preprocessing complete branch). This implies C ⊑ ¬C1 ⊔ … ⊔ ¬Ci and therefore C ⊑ ¬C1 ⊔ … ⊔ ¬Cm.

    d ∈_B C and C ⊑ ¬C1 ⊔ … ⊔ ¬Cm imply that there must be a positive partial ABox interpolant L′ for d and ¬C1 ⊔ … ⊔ ¬Cm with respect to B, i.e. partial_LI(B, ¬C1 ⊔ … ⊔ ¬Cm, d, pos) exists, and the concept ∃R.L′ is an ABox interpolant, which contradicts the assumption.

References

[Alvarez, 2000] J. Alvarez. TBox acquisition and information theory. In DL'2000 [2000].


[Areces, 2000] C. Areces. Logic Engineering. The Case of Description and Hybrid Logics. PhD thesis, Institute for Logic, Language and Computation, Amsterdam, Holland, 2000.

[Baader and Küsters, 1998] F. Baader and R. Küsters. Least common subsumer computation w.r.t. cyclic ALN-terminologies. In Proceedings of the 1998 International Workshop on Description Logics (DL'98), 1998.

[Badea and Nienhuys-Cheng, 2000] L. Badea and S.-H. Nienhuys-Cheng. Refining concepts in description logics. In DL'2000 [2000].

[Bazan, 1998] J. Bazan. A comparison of dynamic and non-dynamic rough set methods for extracting laws from decision tables. In Polkowski and Skowron [1998], pages 322–365.

[Blake and Merz, 1998] C.L. Blake and C.J. Merz. UCI repository of machine learning databases, 1998.

[Cohen and Hirsh, 1994] W. Cohen and H. Hirsh. Learning the CLASSIC description logic: Theoretical and experimental results. In KR-94, pages 121–133, Bonn, Germany, 1994.

[DL'2000, 2000] International Workshop on Description Logics (DL'2000), Aachen, Germany, 2000.

[Endriss, 2000] U. Endriss. Reasoning in description logic with Wellington 1.0. In Proceedings of the Automated Reasoning Workshop 2000, London, UK, 2000.

[Fitting, 1996] M. Fitting. First-Order Logic and Automated Theorem Proving. Springer, 2nd edition, 1996.

[Frazier and Pitt, 1996] M. Frazier and P. Pitt. CLASSIC learning. Machine Learning, 25:151–193, 1996.

[Guan and Bell, 1998] J. W. Guan and D. A. Bell. Rough computational methods for information systems. Artificial Intelligence, 105:77–103, 1998.

[Güvenir et al., 1997] H.A. Güvenir, B. Acar, G. Demiröz, and A. Cekin. A supervised machine learning algorithm for arrhythmia analysis. In Computers in Cardiology, volume 24, pages 433–436, 1997.

[Haarslev and Möller, 2001] V. Haarslev and R. Möller. RACER system description. In International Joint Conference on Automated Reasoning, IJCAR'2001, Siena, Italy, 2001.

[Hollunder, 1996] B. Hollunder. Consistency checking reduced to satisfiability of concepts in terminological systems. Annals of Mathematics and Artificial Intelligence, 18:95–131, 1996.


[Horrocks, 1997] I. Horrocks. Optimising Tableaux Decision Procedures for Description Logics. PhD thesis, Faculty of Science and Engineering, University of Manchester, 1997.

[Kietz and Morik, 1994] J.U. Kietz and K. Morik. A polynomial approach to the constructive induction of structural knowledge. Machine Learning, 14:193–217, 1994.

[Kracht, 1999] M. Kracht. Tools and Techniques in Modal Logic. North Holland, 1999.

[Küsters and Molitor, 2000] R. Küsters and R. Molitor. Computing most specific concepts in description logics with existential restrictions. LTCS-Report 00-05, LuFG Theoretical Computer Science, RWTH Aachen, Germany, 2000.

[Liau, 2000] C. J. Liau. An overview of rough set semantics for modal and quantifier logics. International Journal of Uncertainty, Fuzziness and Knowledge-based Systems, 8(1):93–118, 2000.

[Lyndon, 1959] R.C. Lyndon. An interpolation theorem in the predicate calculus. Pacific Journal of Mathematics, 9:155–164, 1959.

[Marx, 1999] M. Marx. Interpolation in modal logic. In Algebraic Methodology and Software Technology (Amazonia, 1999), pages 154–163. Springer, Berlin, 1999.

[Mitchell, 1997] T. Mitchell. Machine Learning. McGraw Hill, New York, 1997.

[Mollestad, 1997] T. Mollestad. A Rough Set Approach to Data Mining. PhD thesis, Norwegian University of Science and Technology, 1997.

[Nebel, 1990] B. Nebel. Reasoning and Revision in Hybrid Representation Systems, volume 422 of Lecture Notes in Artificial Intelligence. Springer-Verlag, New York, NY, USA, 1990. Revision of the author's thesis (Saarland, 1989).

[Patel-Schneider and Horrocks, 1999] P. F. Patel-Schneider and I. Horrocks. DLP and FaCT. In Tableaux'99, pages 19–23, 1999.

[Pawlak, 1982] Z. Pawlak. Rough sets. International Journal of Computer and Information Sciences, 11(5):341–356, 1982.

[Pawlak, 1998] Z. Pawlak. Rough set elements. In Polkowski and Skowron [1998], pages 322–365.

[Polkowski and Skowron, 1998] L. Polkowski and A. Skowron, editors. Rough Sets in Knowledge Discovery, volume 1. Physica-Verlag, 1998.

[Rathe, 1997] R. Rathe. Primary care baseline. Available for educational purposes from http://www.med.ufl.edu/medinfo/baseline/arrhythm.html, 1997.


[Rouveirol and Ventos, 2000] C. Rouveirol and V. Ventos. Towards learning in CARIN-ALN. In J. Cussens and A. Frisch, editors, ILP'00, volume 1866 of LNAI, pages 191–208. Springer, 2000.

[Schlobach, 2000] S. Schlobach. Assertional mining in description logics. In DL'2000 [2000].

[Schlobach, 2001] S. Schlobach. Interpolation methods for assertion mining in hybrid knowledge bases. Technical report, King's College London, 2001.

[Schmidt-Schauss and Smolka, 1991] M. Schmidt-Schauss and G. Smolka. Attributive concept descriptions with complements. Artificial Intelligence, 48:1–26, 1991.

[Skowron, 1993] A. Skowron. Extracting laws from decision tables. In W. Ziarko, editor, Proceedings of the Second International Workshop on Rough Sets and Knowledge Discovery (RSKD'93), pages 101–104, 1993.

[Yanowitz, 1997] F.G. Yanowitz. ECG intervals and waves, 1997.