Guidelines for Constructing Reusable Domain Ontologies

download Guidelines for Constructing Reusable Domain Ontologies

of 4

Transcript of Guidelines for Constructing Reusable Domain Ontologies

  • 8/4/2019 Guidelines for Constructing Reusable Domain Ontologies

    1/4

    Guidelines for Constructing Reusable Domain Ontologies

    Muthukkaruppan AnnamalaiDepartment of Computer Science & Software

    EngineeringThe University of MelbourneVictoria 3010, Australia

    [email protected]

    Leon SterlingDepartment of Computer Science & Software

    EngineeringThe University of MelbourneVictoria 3010, Australia

    [email protected]

    ABSTRACTThe growing interest in ontologies is concomitant with theincreasing use of agent systems in user environment. On-tologies have established themselves as schemas for encodingknowledge about a particular domain, which can be inter-preted by both humans and agents to accomplish a task incooperation. However, construction of the domain ontolo-gies is a bottleneck, and planning towards reuse of domainontologies is essential. Current methodologies concernedwith ontology development have not dealt with explicit reuseof domain ontologies. This paper presents guidelines forsystematic construction of reusable domain ontologies. Apurpose-driven approach has b een adopted. The guidelineshave been used for constructing ontologies in the Experi-mental High-Energy Physics domain.

    1. INTRODUCTIONThe World Wide Web has become the de facto mediumfor distributed user community to share digital information,posing a challenge for users to effectively utilise the accessedinformation. The next generation agentised web promises

    to dispense with some of the human effort through machineprocessable metadata linked to ontologies. The working def-inition of ontology is a specification of shared conceptuali-sation [4]. An ontology presents a shared understanding ofhow the world is organised in a particular aspect of the do-main and specifies the meaning of terms that make up thevocabulary in the domain of discourse a necessity for in-formation access and interoperability.

    Constructing ontologies from scratch to support domain ap-plications requires a great deal of effort and time. Alterna-tively, reusable domain ontologies provide opportunities fordevelopers to exploit and reuse existing domain knowledgeto build their applications with much ease and reliability.

    A common belief is that reusable ontologies ought to beconceived and developed independent from application andcontext of its use. The consequence of adhering strictly tothis notion is that the developed reusable domain ontologiesare: a) usually over-generalised and omit useful knowledge;b) often are also sparse constructs because it is not easyto determine which part of the concrete domain knowledgecan be reused, particularly when the capturing of the do-main knowledge is attempted in a top-down fashion; and c)necessitates modification and considerable extension workbefore it can be utilised. A better alternative is to be ableto develop reusable ontologies without over-compromisingtheir usability in the domain.

    The approach we propose is to first ask which kind of do-main knowledge should be encoded; bring together the rele-vant pieces of knowledge; only then identify which of thosepieces of knowledge can be reused and isolate them. Thispaper outlines a strategy to develop reusable domain on-tologies in this manner. We will discuss this issue in thecontext of a case study to develop reusable ExperimentalHigh-Energy Physics (EHEP) ontologies for the Belle col-

    laboration (http://belle.kek.jp/belle/). The distributed sci-entific community collaboratively sets-up experiment, accu-mulate event data, generate simulation data, construct soft-ware tools to analyse the data, carry out data analysis, pub-lish their findings, and progressively build on each otherswork. We aim to show that suitable ontologies be developedand reused to facilitate this scientific community to produceand share information effectively on the agentised web [1,2]. The reusable domain ontologies will serve as a basis forcommunication, integration and sharing of information per-taining to experimental analysis within the collaboration.

    The paper is organised as follows. In Section 2, we sketchout our basic strategy for developing reusable domain on-

    tologies. This strategy is further elaborated and illustratedin Sections 3 and 4. The development guidelines are alsogiven in here. Finally, in the concluding section we sum-marise the contribution of this paper.

    2. STRATEGISING THE DEVELOPMENT

    OF REUSABLE DOMAIN ONTOLOGIESWe have devised a strategy for developing reusable domainontologies by putting together the key points in traditionaland modern ontology development methodologies and prin-ciples, such that of [3, 4, 5, 8, 9, 10]. The notable featuresin these methodologies will be pointed out as we advancethrough this paper.

    There are two types of ontologies in our environment, namely,domain and purposive ontologies. A domain ontology cap-tures an area of the domain generically, and serves as areusable knowledge entity that can be shared across this do-main. The domain ontologies are loosely coupled to one an-other, reflecting the association between the different facetsof the domain captured by the respective ontologies. Onthe other hand, a purposive ontology explicitly defines theterms for supporting specific purpose or use. The purposiveontology encodes specialised domain knowledge by compos-ing various reusable domain ontologies and then affectingthe necessary application-specific extensions.

  • 8/4/2019 Guidelines for Constructing Reusable Domain Ontologies

    2/4

    Analysis

    Variable

    Publishing

    AnalysisStatistical

    DistributionModel

    Dimension

    Unit

    Parametric

    Constraint

    Particle

    Detector

    Figure 1: Interconnected Ontologies

    The ontological commitment required to support a particu-lar need is embodied in a set of reusable domain ontologiesas illustrated in Figure 1. The filled box denotes the purpo-sive ontology modelled for publishing EHEP analysis. Eachempty b ox represents generic domain ontology. An arrowpoints at the ontology that holds the definition of the re-ferred terms. A generic ontology is linked to another viaconcept relations. This links depicts the dependencies be-tween the ontologies in this conceptualisation of the domain.This kind of loose coupling allows scalable modifications of

    the domain ontologies.

    The value of reusable ontologies is long recognised by on-tology researchers. The research on the technology for sup-porting knowledge sharing and reuse originated in StanfordsKnowledge Systems Laboratory (KSL) [7]. It spurred thedevelopment of reusable, general ontologies (such as On-tology of Time, Money, Measure, etceteras) for the KSLsOntolingua server. The design principles espoused in [4]were commonly used to develop these ontologies that aremeant to b e shared across different knowledge domains.1

    While the past attempt was concerned with the develop-ment of domain-independent single ontologies, we focus onconstructing inter-depending ontologies that are small andeasier to reuse in a particular domain.

    There appears to be no clear methodology for building do-main ontologies that can be reused with minimal extensions.Moreover, a disagreeable prospect arises from the conven-tional idea that reusable domain ontologies ought to be con-ceived and developed independent from their application,due to the following reasons: a) Unlike general knowledge,which ground out in primitives with assumptions that are or-dinarily understood, domain knowledge deals with domain-specific jargon; b) It is not feasible to think of knowledgeneeds of all foreseeable domain applications; c) The intendedmeaning of some terms can be different according to contextof application; and d) It is also impractical for knowledgeengineers to describe all knowledge they know about the do-

    main. Hence, it is difficult to see how it would be possible forknowledge engineers to rely on their intuition alone to buildontologies that adequately capture the reusable knowledgein their domain.

    Given this situation, we maintain that it is only sensibleif we allow our current needs to dictate the creation of us-able domain ontologies that are also reusable.2 Modelling

    1Clarity, Coherence, Extendibility, Minimal encoding biasand Minimal ontological commitment are prescribed as suit-able design principles.2As a matter of fact, the modern ontology development

    according to the purpose and use helps to determine whatfeatures of the domain knowledge should be encoded andprovide a focus for knowledge acquisition. Consequently, weadvocate development of purposive ontologies, and simulta-neously pursue the creation of reusable domain ontologies.

    Our basic strategy is spelt out as follows: a) Adopt a bottom-up view of the domain to conceptualise the knowledge re-

    quired to support a specific need, and build the conceptualmodel; b) Identify potentially reusable chunk in the con-ceptual model and generalise it; and c) Formalise the gen-eralised domain model into reusable domain ontology. Inwhat follows, we will elaborate this strategy and illustrateit using simple examples.

    3. PURPOSE-DRIVEN CREATION OF RE-

    USABLE DOMAIN ONTOLOGIESWe begin by modelling the purposive ontology, which definesthe vocabulary for the purpose of describing the experimen-tal analysis in collaboration documents, such as researchnotes and research papers. The ontology can be appliedfor document annotation, query-answering and informationretrieval.

    3.1 Modelling the Purposive OntologyWe set out to accomplish this task by creating the conceptmodel, which will serve as the foundation for the purpo-sive ontology. Individuals in this conceptualisation are typ-ically defined as concepts, and are constrained by proper-ties, relations and axioms. The taxonomic and cross rela-tionships among the concepts are explicitly specified. Weused Protege-2000 (http://www-protege.stanford.edu) , a fr-ame-based modelling tool to construct the concept model.Figure 2 describes a partial hierarchy of top-level conceptsin the model that was developed to conceive this purposive

    ontology.

    The model is elaborated from scientific collaboration doc-uments, mainly books, research papers, existing standardHEP terminology and consulting the EHEP physicists. Ourinitial discussion with the EHEP physicists and related liter-ature review enabled us to recognise some of the distinct con-cepts required for describing a typical EHEP experimentalanalysis. They are Signal event, Background event, Kine-matic variables, Topological variables, Particle, etceteras.We called them the hook-concepts, as they serve as hooks (orlinks) for structuring additional concepts into the conceptmodel. Initially, our competency questions [5], that are thequestions we want the ontology to answer, revolved around

    these hook-concepts.

    3

    Subsequently, new concepts affiliatedwith the existing concepts emerge, which are organised inthe hierarchical model by bottoming-up and middling-outprocesses [10]. The internal structures of these concepts aredefined to limit their possible interpretation and relations in

    methodologies [8, 9] follow the tradition of knowledge en-gineering and back the development of application-orientedontologies.3In essence, the competency questions identify the kind ofdomain knowledge that should be encoded. Examples ofcompetency questions are: What are the kinematic cuts per-formed in B event analysis? What are the suppressedbackground events?

  • 8/4/2019 Guidelines for Constructing Reusable Domain Ontologies

    3/4

    Figure 2: Externalising Reusable Domain Models

    the ontology. This cycle is continued until the model is sat-isfactorily developed, that is when the set of compiled com-petency questions and their related answers can be clearlyrepresented using the terms defined in the ontology. Each

    cycle evolves the model closer to the desired form.

    3.2 Abstraction of Reusable ChunksThe concepts in the model are distinctly organised accord-ing to their role in elucidating particular aspects of the do-main. As a rule, homogeneous concepts tend to cluster un-der a common parent concept and are potentially reusablein another situation (in whole or part). The reuse poten-tial of these clustered concepts is reflected by their coherentnature.4 The task of the knowledge engineer is to exam-ine each cluster of concepts to identify the chunk that hasreuse potential. The reusable chunk is isolated from themodel and generalised into an independent reusable domainmodel. For example, the definition about Particle and itssub-classes in the concept model (shown in Figure 2) can beexternalised from this model and componentised as reusableknowledge entity. Using this simple technique, we generatedall the domain ontologies supporting the purposive ontologyfor describing experimental analysis as depicted in Figure 1.

    Ontology Development Guideline I. We summarise theoperational guidelines for developing purposive and domainontologies (that does not reuse existing ontologies) as fol-lows: a) Specify purpose and uses of ontology; b) Iden-tify hook-concepts; c) Formulate competency questions; d)Identify new terms required to precisely formulate the com-petency questions and their generated answers; e) Definethe new terms (concepts, properties, relations and axioms).

    Structure the concept into the concept model; f ) Evaluateconcept model against the set of competency questions andmake the necessary changes; g) If still not satisfied with thelevel of details in the model, return to step [c]; h) Analyseconcept clusters in the model that particularises specific ar-eas of the domain. If a group of related concepts has reusepotential, generalise and shape them as a separate reusabledomain model (sub-model); i) Link back the sub-models tothe main model. Make the necessary application-specific ex-

    4The coherency among a set of concepts can be appropri-ately assessed by performing an ontological analysis [6] in-volving the concepts and relations in the model.

    tensions to the incorporated generalised domain knowledge;and j) Formalise the main model and the sub-models intopurposive ontology and reusable domain ontologies, respec-tively. The ontological terms and definitions are expressedformally in a web-ontology specification language, such asDAML+OIL (http://www.daml.org).

    Future domain applications can exploit these reusable do-

    main ontologies and may even churn out new reusable do-main ontologies. This matter is discussed in the next sec-tion.

    4. THE REUSE OF DOMAIN ONTOLOGIESUsing an existing domain ontology for supporting anotherapplication or to serve as a basis for building another on-tology is cost-effective, provided the reusable ontology doesnot require much customisation effort. We describe two suchscenarios that illustrate the reuse of existing domain ontolo-gies and the creation of new versions of existing ontologies.

    Purpose 1: Supporting Analysis Specification. Weenvision applications that allow physicists to partially au-

    tomate experimental analysis such as skimming, tracking,vertexing and particle reconstruction. As a result, the vo-cabulary to describe the rudiments of these low-level anal-yses will need to feature in the purposive ontology built tosupport these applications.

    We begin by sketching the concept model of this purposiveontology. The preliminary study shows that the purposiveontology will provide the vocabulary to describe the EHEPanalysis, as in the previous case (refer to Section 3), but ingreater detail. In particular, there will b e additional anal-ysis variables to be considered, and requires a much finerrepresentation.

    All the domain ontologies created earlier are candidates forreuse. However, the Analysis Variable ontology will be re-vised to cater for this new requirement. Low-level analysesalso involve tracks and clusters associated with event parti-cles. Like Particle ontology, the knowledge about tracks andclusters can be held as separate entities. As we do not antic-ipate changes to the other existing ontologies, the purposiveontology being developed can make use of those reusabledomain ontologies directly.

    This developmental activity has generated two new domainontologies, namely Track and Cluster; and a new version ofAnalysis Variable ontology, extended from its earlier version(see Figure 3). The distinction between the two versions ofthe Analysis Variable ontology can be made on the basis ofadditional definitions used to characterise the conceptualisa-tion in the later version. Since no alteration to the existingdefinitions in this ontology was made, the new version is in-deed backward compatible. Our rationale for constructing anew version of this ontology is to ensure that the prospectiveusers who have adopted the original version can continue torely upon their ontology, and not be overwhelmed with ab-straction unrelated to their need.

    Purpose 2: Supporting Detector Description. An-other planned application aims to provide information aboutthe Belle detectors. The purposive ontology developed for

  • 8/4/2019 Guidelines for Constructing Reusable Domain Ontologies

    4/4

    derived

    AnalysisVariables

    (Ver. 2)

    Describing

    Detector

    DimensionUnit

    Parametric

    Constraint

    Particle

    Detector

    (Ver. 2)

    Track

    Cluster

    AnalysisVariables

    Detector

    derived

    Figure 3: Reusing Existing Domain Ontologies

    this application will be used to build a knowledge base aboutdetectors particle identification capabilities.5

    The existing Detector ontology merely describes the detec-tor and regions where the presence of tracks and clustersare sensed and is referred to by definitions in Analysis Vari-able ontology. The new requirement necessitates definitionsin Detector to refer to definitions in Analysis Variable in-stead (see Figure 3). The new version of Detector ontology

    will have to provide the vocabulary to describe the Belledetectors in greater details, including inverse relationshipsbetween definitions in Detector and Analysis Variable.

    Ontology Development Guideline II. We now recapitu-late the operational guidelines for developing purposive anddomain ontologies by reusing existing domain ontologies: a)Specify purpose and uses of ontology; b) Sketch the modelof the purposive ontology; c) Identify existing reusable do-main ontologies that can be used to support the modellingprocess; d) Construct the unsupported portion of the modelbased on Guideline I steps [b] - [g]; e) Identify reusable re-gion in the model. If found, convert the related concepts intoan independent reusable domain model. It corresponds to

    Guideline I step [h]; f ) Select reusable ontologies (identifiedearlier) and make necessary changes to accommodate appli-cation needs. Sometimes, the modified ontologies may beredeveloped as new versions of the existing ontologies; andg) Rebuild the model of the purposive ontology by linkingthe sub-models, and formalise the ontologies. It is similarto Guideline I steps [i] and [j].

    5. CONCLUDING REMARKSWe have presented guidelines for creating reusable domainontologies for a scientific user-community. In our environ-ment, the different domain ontologies organise and structurethe knowledge of diverse parts of the domain. The domainontologies are constructed as small reusable knowledge com-

    ponents that can be easily shared across applications. Theseontologies are loosely coupled to one another reflecting theirassociation in the real world.

    The conceptualisation is guided by a common set of compe-tency questions generated when application-dependent pur-posive models are conceived. This provides the reason forbelieving that the same kind of reusable chunks will emerge

    5An example piece of knowledge may look like this: Aero-gel Cerenkov Counter discriminates Kaon over Pion (eventvariable) with 93% efficiency for momentum greater than700MeV/C.

    from the different purposive models, even if the order inwhich the applications are considered is varied. The con-cepts in the domain ontologies are captured in some gener-ality to make reuse possible. We are consistent with makingspecific knowledge more generic.

    Sometimes it would be necessary to allow co-existence ofdifferent versions of a domain ontology to accommodate dif-

    ferent needs in the domain. For example, in Section 4 weexemplified the creation of new versions of existing ontolo-gies. These versions are seen as distinct domain ontologieswith dissimilar reuse p otential. Herein lays the larger issueof the management of the domain ontologies. A mechanismto control the different versions of domain ontologies is es-sentially required.

    Acknowledgments: We are grateful to our associates inthe Physics department, particularly Glenn Moloney andLyle Winton for providing insights into EHEP experimentalanalysis.

    6. REFERENCES

    [1] M. Annamalai, L. Sterling, and G. Moloney. Acollaborative framework for distributed scientificgroups. In Proceedings of AAMAS02 Workshop onOntologies in Agent Systems, 2002.

    [2] L. Cruz, M. Annamalai, and L. Sterling. Analysinghigh-energy physics experiments. In Proceedings ofAAMAS02 Workshop on AgentCities, 2002.

    [3] M. Fernandez, A. Gomez-Perez, and N. Juristo.Methontology: From ontological art towardsontological engineering. In Proceedings of AAAI97Spring Symposium on Ontological Engineering, 1997.

    [4] T. Gruber. A translation approach to portableontologies. Knowledge Acquisition, 5(2):199220, 1993.

    [5] M. Gruninger and M. S. Fox. Methodology for thedesign and evaluation of ontologies. In Proceedings ofIJCAI95 Workshop on Basic Ontological Issues inKnowledge Sharing, 1995.

    [6] N. Guarino and C. Welty. Ontological analysis oftaxonomic relationships. In Proceedings of ER00Conference on Conceptual Modelling, 2000.

    [7] R. Neches, R. Fikes, T. Finin, T. Gruber, T. Patil,R. Senator, and W. R. Swartout. Enabling technologyfor knowledge sharing. AI Magazine, 12(3):1636,1991.

    [8] A. T. Schreiber, J. M. Akkermans, A. A. Anjewierden,

    R. Dehoog, N. R. Rhadbold, W. V. D. Velde, andB. J. Wielinga. Knowledge Engineering andManagement - The CommonKADS Methodology.University of Amsterdam, 1998.

    [9] S. Staab, R. Studer, H. P. Schnurr, and Y. Sure.Knowledge process and ontologies. IEEE IntelligentSystems, pages 2634, Jan/Feb 2001.

    [10] M. Uschold. Building ontologies: Towards a unifiedmethodology. In Proceedings of the British ComputerSociety Specialist Group Conference on ExpertSystems, 1996.