NOW L T he Sem antic W eb: DGE T he R oles of X M L and R ...

12
XML and RDF are the current standards for establishing semantic interoperability on the Web, but XML addresses only document structure. RDF better facilitates interoperation because it provides a data model that can be extended to address sophisticated ontology representation techniques. T he World Wide Web is possible because a set of widely established standards guarantees interoperability at various levels. Until now, the Web has been designed for direct human processing, but the next-generation Web, which Tim Berners-Lee and others call the “Seman- tic Web,” aims at machine-processible information. 1 The Semantic Web will enable intelligent services—such as information brokers, search agents, and information filters—which offer greater functionality and interoper- ability than current stand-alone services. The Semantic Web will only be possible once further levels of interop- erability have been established. Standards must be defined not only for the syntactic form of documents, but also for their semantic content. Notable among recent W3C standardization efforts are XML/XML schema and RDF/RDF schema, which facilitate semantic interoperability. In this article, we explain the role of ontologies in the architecture of the Semantic Web. We then briefly summarize key elements of XML and RDF, showing why using XML as a tool for semantic interoperability will be ineffective in the long run. We argue that a further representation and inference layer is needed on top of the Web’s current layers, and to estab- lish such a layer, we propose a general method for encoding ontology rep- resentation languages into RDF/RDF schema. We illustrate the extension method by applying it to Ontology Interchange Language (OIL), an ontol- ogy representation and inference language. 2 ONTOLOGIES: DOMAIN CONCEPTUALIZATION Ontologies can play a crucial role in enabling Web-based knowledge pro- cessing, sharing, and reuse between applications. Generally defined as shared formal conceptualizations of particular domains, ontologies pro- vide a common understanding of topics that can be communicated between people and application systems. Ontologies are used in e-commerce to enable machine-based commu- nication between buyers and sellers; vertical integration of markets (such 63 IEEE INTERNET COMPUTING 1089-7801/00/$10.00 ©2000 IEEE http://computer.org/internet/ SEPTEMBER • OCTOBER 2000 The Semantic Web: The Roles of XML and RDF STEFAN DECKER AND SERGEY MELNIK Stanford University FRANK VAN HARMELEN, DIETER FENSEL, AND MICHEL KLEIN Vrije Universiteit Amsterdam JEEN BROEKSTRA Aidministrator Nederland B.V. MICHAEL ERDMANN University of Karlsruhe IAN HORROCKS University of Manchester KNOWLEDGE NETWORKING

Transcript of NOW L T he Sem antic W eb: DGE T he R oles of X M L and R ...

XML and RDF are the current

standards for establishing

semantic interoperability on the

Web, but XML addresses only

document structure. RDF better

facilitates interoperation

because it provides a data

model that can be extended to

address sophisticated ontology

representation techniques.

T he World Wide Web is possible because a set of widely establishedstandards guarantees interoperability at various levels. Until now,the Web has been designed for direct human processing, but the

next-generation Web, which Tim Berners-Lee and others call the “Seman-tic Web,” aims at machine-processible information.1 The Semantic Webwill enable intelligent services—such as information brokers, search agents,and information filters—which offer greater functionality and interoper-ability than current stand-alone services.

The Semantic Web will only be possible once further levels of interop-erability have been established. Standards must be defined not only for thesyntactic form of documents, but also for their semantic content. Notableamong recent W3C standardization efforts are XML/XML schema andRDF/RDF schema, which facilitate semantic interoperability.

In this article, we explain the role of ontologies in the architecture ofthe Semantic Web. We then briefly summarize key elements of XML andRDF, showing why using XML as a tool for semantic interoperability willbe ineffective in the long run. We argue that a further representation andinference layer is needed on top of the Web’s current layers, and to estab-lish such a layer, we propose a general method for encoding ontology rep-resentation languages into RDF/RDF schema. We illustrate the extensionmethod by applying it to Ontology Interchange Language (OIL), an ontol-ogy representation and inference language.2

ONTOLOGIES: DOMAIN CONCEPTUALIZATIONOntologies can play a crucial role in enabling Web-based knowledge pro-cessing, sharing, and reuse between applications. Generally defined asshared formal conceptualizations of particular domains, ontologies pro-vide a common understanding of topics that can be communicatedbetween people and application systems.

Ontologies are used in e-commerce to enable machine-based commu-nication between buyers and sellers; vertical integration of markets (such

63IEEE INTERNET COMPUTING 1089-7801/ 00/$10.00 ©2000 IEEE h t tp ://computer.org/ in te rne t/ SEPTEMBER • OCTOBER 2000

The Semantic Web:The Roles of XML and RDFSTEFAN DECKER AND SERGEY MELNIK

Stanford UniversityFRANK VAN HARMELEN, DIETER FENSEL, AND MICHEL KLEIN

Vrije Universiteit AmsterdamJEEN BROEKSTRA

Aidministrator Nederland B.V.MICHAEL ERDMANN

University of KarlsruheIAN HORROCKS

University of Manchester

KN

OW

LEDG

E NETW

ORK

ING

K N O W L E D G E N E T W O R K I N G

64 SEPTEMBER • OCTOBER 2000 h t tp ://computer.org/ in te rne t/ IEEE INTERNET COMPUTING

as VerticalNet [http://www.verticalnet.com]); anddescription reuse between different marketplaces.Search engines also use ontologies to find pageswith words that are syntactically different butsemantically similar.

An ontology typically contains a hierarchy ofconcepts within a domain and describes each con-cept’s crucial properties through an attribute-valuemechanism. Further relations between conceptsmight be described through additional logical sen-tences. Finally, constants (such as “January”) areassigned to one or more concepts (such as“Month”) in order to assign them their proper type.

The OIL ontology consists of slot definitions(slot-def) and class definitions (class-def). Figure 1is an example ontology (though it omits slot defi-nitions due to space constraints). A slot-def describesa binary relation between two entities. A class-defassociates a class name with a class description andconsists of the following components (any of whichcan be omitted):

! definition type can be either “defined” or “prim-itive.” For defined types, a class is completelyspecified in the class definition. For primitivetypes, the conditions in the class definition arenecessary, but insufficient for determining classmembership.

! slot constraint restricts the possible values a slotcan have when applied to an instance of the class.

! subclass-of relates the defined class to a list ofone or more class expressions—class names, slotconstraints, or an arbitrarily complex Booleancombination of these. Hence, the class beingdefined is a subclass of those defined by eachclass expression.

The main components of a slot constraint are:

! name—a string that delineates the slot beingconstrained.

! value-type—a list of one or more class expres-sions that are range constraints for the slot with

class-def animal % animals are a classclass-def plant % plants are a class subclass-of NOT animal % that is disjoint from animalsclass-def tree subclass-of plant % trees are a type of plantsclass-def branch slot-constraint is-part-of % branches are parts of trees has-value treeclass-def leaf slot-constraint is-part-of % leaves are parts of branches has-value branchclass-def defined carnivore % carnivores are animals subclass-of animal slot-constraint eats % that eat only other animals value-type animalclass-def defined herbivore % herbivores are animals, but not carnivores subclass-of animal, slot-constraint eats % that eat only plants or parts of plants value-type plant

OR (slot-constraint is-part-of has-value plant)class-def giraffe % giraffes are herbivores subclass-of herbivore slot-constraint eats % and they eat leaves value-type leafclass-def lion subclass-of animal % lions are also animals slot-constraint eats % but they eat herbivores value-type herbivoreclass-def tasty-plant % tasty plants are plants that are eaten by subclass-of plant % both herbivores and carnivores slot-constraint eaten-by has-value herbivore, carnivore

NOT carnivore

Figure 1. Example ontology defining African wildlife. The hierarchy of concepts is formulated using theOntology Interchange Language.

T H E S E M A N T I C W E B

65IEEE INTERNET COMPUTING h t tp ://computer.org/ in te rne t/ SEPTEMBER • OCTOBER 2000

respect to the class defined. All instances arerelated by the slot to an instance of the classdefined by it and must be instances of all theclass expressions in the list. For example, if aclass has a slot eats with the slot constraintvalue-type animals, then instances of the classonly eat animals.

! has-value—a list of one or more class expressionsin which all instances of the class defined by theslot must be related via the slot to at least oneinstance of each class expression in the list. Forexample, if a class has a slot eats with the slot con-straint has-value zebra, wildebeest, then eachinstance of the class eats at least one instance ofclass zebra and one of class wildebeest. (It mightalso eat instances of other classes, such as gazelle.)

XML and RDF each have their merits as a foun-dation for the Semantic Web, but RDF providesmore suitable mechanisms for applying ontologyrepresentation languages like OIL to the task ofinteroperability. To argue this point, we will firstbriefly describe XML and RDF and then discusstheir respective merits for knowledge representa-tion on the Web.

XML GRAMMARSXML is already widely known in the Internet com-munity and is the basis for a rapidly growing numberof software development activities.3 It is designed formarkup in documents of arbitrary structure, asopposed to HTML, which was designed for hyper-text documents with fixed structures.

A well-formed XML document creates a balancedtree of nested sets of open and close tags, each ofwhich can include several attribute-value pairs. Thereis no fixed tag vocabulary or set of allowable combi-nations, so these can be defined for each application.In XML 1.0, this is done using a document type def-inition to enforce constraints on which tags to useand how they should be nested within a document.A DTD defines a grammar to specify allowable com-binations and nestings of tag names, attribute names,and so on. Developments are well under way atW3C to replace DTDs with XML-schema defini-tions.4,5 Although XML schema offer several advan-tages over DTDs, their role is essentially the same:to define a grammar for XML documents.

Figure 2 shows an example serialization of partof the ontology from Figure 1. The basic XML datamodel is a labeled tree, where each tag correspondsto a labeled node in the model, and each nestedsubtag is a child in the tree. Of course, this example

shows just one possible XML-based syntax for theontology. XML is foremost a means for defininggrammars, and because different grammars can beused to describe the same content, XML allowsmultiple serializations. The information in the finalclass definition in Figure 2, for example, could beexpressed in an entirely different form as:

<class-def><name>branch</name><slot-constraint>

<name>is-part-of</name><has-value>tree</has-value>

</slot-constraint></class-def>

XML is used to serve a range of purposes:

! Serialization syntax for other markup languages.For example, the Synchronized MultimediaIntegration Language (SMIL)6 is syntacticallyjust a particular XML DTD; it defines thestructure of a SMIL document. The DTD isuseful because it facilitates a common under-standing of the meaning of the DTD elementsand the structure of the DTD.

! Semantic markup of Web pages. An XML serial-ization (such as the example above) can be used

<class-def> <class name="plant"/> <subclass-of> <NOT><class name="animal"/></NOT> </subclass-of></class-def><class-def><class name="tree"/> <subclass-of> <class name="plant"/> </subclass-of></class-def><class-def> <class name="branch"/> <slot-constraint> <slot name="is-part-of"/> <has-value> <class name="tree"/> </has-value> </slot-constraint></class def>

Figure 2. Partial XML serialization of the exampleontology in Figure 1. This XML document containsone possible serialization. It introduces a tag foreach part of the grammar.

K N O W L E D G E N E T W O R K I N G

66 SEPTEMBER • OCTOBER 2000 h t tp ://computer.org/ in te rne t/ IEEE INTERNET COMPUTING

in a Web page with an XSL style sheet to ren-der the different elements appropriately.7

! Uniform data-exchange format. An XML serial-ization can also be transferred as a data objectbetween two applications.

It is important to note that a DTD specifies onlysyntactic conventions; any intended semantics areoutside the realm of the XML specification.

RDF: RESOURCE METADATAThe Resource Description Framework is a recentW3C recommendation designed to standardize thedefinition and use of metadata—descriptions ofWeb-based resources.8 However, RDF is equallywell suited to representing data.

RDF FoundationsThe basic building block in RDF is an object-attribute-value triple, commonly written as A(O,V).That is, an object O has an attribute A with valueV. Another way to think of this relationship is as alabeled edge A between two nodes, O and V:[O]–A→[V]. This notation is useful because RDFallows objects and values to be interchanged. Thus,any object can play the role of a value, whichamounts to chaining two labeled edges in a graph-ic representation. The graph in Figure 3, for exam-ple, expresses the following three relationships inA(O,V) format:

hasName(‘http://www.w3.org/employee/id1321’,”Jim Lerners”)

authorOf(‘http://www.w3.org/employee/id1321’,’http://www.books.org/ISBN0012515866’)

hasPrice(‘http://www.books.org/ISBN0012515866’,“$62”).

RDF also allows a form of reification in whichany RDF statement can be the object or value of atriple, which means graphs can be nested as well aschained. On the Web this allows us, for example,to express doubt or support of statements createdby other people. Finally, it is possible to indicatethat a given object is of a certain type, such as stat-ing that “ISBN0012515866” is of the rdf:type book,by creating a type arc referring to the book defini-tion in an RDF schema:

<rdf:Description about=“www.books.org/ISBN0012515866”>

<rdf:type rdf:resource=“http://description.org/schema/#book”>

</rdf:Description>

It is important to note that RDF is designed toprovide a basic object-attribute-value data modelfor metadata. Other than this intentional seman-tics—described only informally in the standard—RDF makes no data-modeling commitments. Inparticular, no reserved terms are defined for furtherdata modeling. As with XML, the RDF data modelprovides no mechanisms for declaring propertynames that are to be used.

RDF SchemaJust as XML schema provides a vocabulary-definitionfacility, RDF schema lets developers define a partic-ular vocabulary for RDF data (such as authorOf) andspecify the kinds of object to which these attributescan be applied.9 In other words, the RDF schemamechanism provides a basic type system for RDFmodels. This type system uses some predefined terms,such as Class, subPropertyOf, and subClassOf, for appli-cation-specific schema. RDF schema expressions arealso valid RDF expressions (just as XML schemaexpressions are valid XML).

RDF objects can be defined as instances of one

Jim Lerners

$62http://www.w3.org/employee/id132

s:hasName

s:authorOf www.books.org/ISBN0012515866

s:hasPrice

Figure 3. RDF graph. RDF is defined as syntax independent; the RDF data model defines objects andrelationships between them.

T H E S E M A N T I C W E B

67IEEE INTERNET COMPUTING h t tp ://computer.org/ in te rne t/ SEPTEMBER • OCTOBER 2000

or more classes using the type property. The sub-ClassOf property allows the developer to specify thehierarchical organization of such classes, and sub-PropertyOf does the same for properties. Constraintson properties can also be specified using domainand range constructs, which can be used to extendboth the vocabulary and the intended interpreta-tion of RDF expressions. This is the mechanism weused to translate an ontology representation lan-guage to RDF.

KNOWLEDGE REPRESENTATIONThe Web is the first widely exploited many-to-many data-interchange medium, and it poses newrequirements for any exchange format:

! Universal expressive power. Because it is not pos-sible to anticipate all potential uses, a Web-based exchange format must be able to expressany form of data.

! Syntactic interoperability. Applications must beable to read the data and get a representationthat can be exploited. Software componentslike parsers or query APIs, for instance, shouldbe as reusable as possible among different appli-

cations. Syntactic interoperability is high whenthe parsers and APIs needed to manipulate dataare readily available.

! Semantic interoperability. One of the mostimportant requirements for an exchange formatis that data be understandable. Whereas syn-tactic interoperability is about parsing data,semantic interoperability is about definingmappings between terms within the data,which requires content analysis.

Using XMLXML fulfills the universal expressive power require-ment because anything for which a grammar can bedefined can be encoded in XML. It also fulfills thesyntactic interoperability requirement because anXML parser can parse any XML data, and is usuallya reusable component. When it comes to semanticinteroperability, however, XML has disadvantages.

XML’s major limitation is that it just describesgrammars. There is no way to recognize a semanticunit from a particular domain because XML aims atdocument structure and imposes no common inter-pretation of the data contained in the document.Although this limitation is at the heart of the “schema

XML-based communicationusing DTD A

Translation

<xsd:schema xmlns:xsd="http://..."> <xsd:annotation> </xsd:...</xsd:schema>

Conceptualdomain model

DTD or XML schema

Deployment

Sender Recipient

Parse tree

Application 2Application 1 XML parser

Figure 4. DTD development and point-to-point communication with XML. Using XML to establish com-munication between applications requires that the domain model be encoded in XML. Both parties mustagree on a translation schema before any communication can take place.

K N O W L E D G E N E T W O R K I N G

68 SEPTEMBER • OCTOBER 2000 h t tp ://computer.org/ in te rne t/ IEEE INTERNET COMPUTING

wars” currently raging at forums such as Biztalk.comand RosettaNet.org, it is not yet widely recognized.

Fixed vs. flexible communication. Figure 4 (previouspage) shows two applications trying to communicatewith each other. Both agree on the use and intendedmeaning of the document structure given by DTDA, but a model of the domain of interest must bebuilt to clarify the kind of data being sent before thedata can be exchanged. (This model is usuallydescribed in terms of objects and relations, as it is inUnified Modeling Language10 or entity-relationshipmodeling.11) A DTD or an XML schema is then con-

structed from the domainmodel—usually in an ad-hoc way.

Figure 5a shows a simple bina-ry relationship, principal (owner),between two concepts, purchaseorder and company. Figure 5bshows several possibilities forencoding the relationship inXML. A DTD just describes agrammar, and there are multipleways to encode any given domainmodel into a DTD, so no directconnection remains betweenthem. It is impossible to deter-mine from the DTDs the con-cepts and relation between them,and significantly more encodingoptions exist when there are, forexample, multiple ordered rela-tionships. It is thus difficult toreengineer the domain modelfrom the DTDs. (Note, however,that the relationship depicted inFigure 5a represents a valid RDFmodel.)

The advantage of using XMLin this case is limited to thereusability of the parsing softwarecomponents. This is certainlyuseful, but this scenario dealswith a one-on-one communica-tion between parties with anadvance agreement. It neglectsthe reality of the Web, whichrequires communicating withmultiple partners that changefrequently.

Not the silver bullet. XML is use-ful for data interchange between

applications that both know what the data is, butnot for situations where new communication part-ners are frequently added. On the Web, new infor-mation sources continually become available andnew business partners join existing relationships.It is thus important to reduce the costs of addingcommunication partners as much as possible. Asthe steps in Figure 6 show, however, using XMLand DTDs (or schema) for this operation requiresmuch more effort than necessary.

One domain model cannot be mapped easily toanother because they are both encoded in DTDs. Adirect mapping based on the different DTDs is not

Encoding DTD Example XML Instance Data

<!ELEMENT PurchaseOrder (principal)><!ATTLIST PurchaseOrder id ID #REQUIRED><!ELEMENT principal (Company)><!ATTLIST Company id ID #IMPLIED>

<PurchaseOrder id="X"> <principal> <Company id="Y"/> </principal><PurchaseOrder>

<!ELEMENT principal (PurchaseOrder,Company)><!ELEMENT PurchaseOrder (#CDATA)><!ELEMENT Company (#CDATA)>

<principal> <PurchaseOrder>X</PurchaseOrder> <Company>Y</Company></principal>

<!ELEMENT PurchaseOrder (id, principal)><!ELEMENT id (#CDATA)><!ELEMENT principal (Company)><!ELEMENT Company (id)>

<PurchaseOrder> <id>X</id> <principal> <Company> <id>Y</id> </Company> </principal>

</PurchaseOrder>

<!ELEMENT rel EMPTY><!ATTLIST rel src CDATA #REQUIRED type CDATA #REQUIRED dest CDATA #REQUIRED>

<rel src="X" type="principal" dest="Y"/>

<!ELEMENT PurchaseOrderInfo (Company)><!ATTLIST PurchaseOrderInfo orderID ID #REQUIRED><!ELEMENT Company (#CDATA)>

<PurchaseOrderInfo orderID="X"> <Company>Y</Company></PurchaseOrderInfo>

Principal

(a)

(b)

Purchaseorder Company

Figure 5. Binary relationship (a) and XML encoding possibilities (b). The same informa-tion can be represented in many ways in XML. Communication between parties can bedifficult because they must first establish a common understanding of the information.

T H E S E M A N T I C W E B

69IEEE INTERNET COMPUTING h t tp ://computer.org/ in te rne t/ SEPTEMBER • OCTOBER 2000

possible as the task is not to map grammars to eachother, but to map objects and relations betweendomains of interest. Therefore, we must reengineerthe original domain models and define the mappingsbetween the concepts and relationships. (Techniquesdeveloped in knowledge engineering and databaseresearch are often helpful for this task.12,13) After-ward, the additional step of defining the mappingbetween DTDs must be performed.

To exchange XML documents, the domain map-pings must be translated using mapping proceduressuch as XSL Transformations (XSLT) for grammars.This is again a high-effort task and depends on theencoding used to construct the initial DTDs. Addi-tional effort is required in translating the reengi-neered domain model into an XML DTD and gen-erating mapping procedures for XML documentsbased on established domain mappings. Using amore suitable formalism than pure XML for datatransfer can save much of this additional effort.

Using RDFRDF’s nested object-attribute-value structure sat-isfies our universal expressive power requirementfor an exchange format, although this is not easy tosee. Application-independent RDF parsers are alsoavailable, so RDF fulfills our syntactic interoper-ability requirement as well.

When it comes to semantic interoperability,RDF has significant advantages over XML: Theobject-attribute structure provides natural seman-tic units because all objects are independent enti-ties. A domain model—defining objects and rela-tionships—can be represented naturally in RDF, sotranslation steps are not necessary as they are withXML. To find mappings between two RDF descrip-tions, techniques from research in knowledge rep-resentation are directly applicable. Of course, thisdoes not solve the general interoperability problemof finding semantics-preserving mappings betweenobjects, but using RDF for data interchange raisesthe level of potential reuse of software componentsmuch beyond parser reuse, which is all XML offers.Furthermore, the RDF model (and software usingthe RDF model) can still be used even if the currentXML syntax changes or disappears because RDFdescribes a layer independent of XML.

Ideally, we would like a universal shared knowl-edge-representation language to support theSemantic Web, but for a variety of pragmatic andtechnological reasons, this is unachievable in prac-tice. Instead, we will have to live with a multitudeof metadata representations. RDF contains asmuch knowledge-representation technology as canbe shared between widely varying metadata lan-guages. Furthermore, the RDF schema language is

<xsd:schema xmlns:xsd="http://..."> <xsd:annotation>A-Schema </xsd:...</xsd:schema>

<xsd:schema xmlns:xsd="http://..."> <xsd:annotation>B-Schema </xsd:...</xsd:schema>

<xsl:stylesheet version="1.0” xmlns:xsl="http://....Transform" <xsl:template match="/"> .... </xsl:template></xsl:stylesheet>

<xsl:stylesheet version="1.0” xmlns:xsl="http://....Transform" <xsl:template match="/"> .... </xsl:template></xsl:stylesheet>

Matching

Step 1:Reengineerconceptual model

Step 2:Matchdomain model rules to XML documenttranslation rules

DTD A DTD B

Step 3:Translate XSLT document fromDTD A to DTD B(and B to A)

Figure 6. Alignment of conceptual models. Several additional steps are necessary in XML to add new communication part-ners to an existing communication, including the reengineering of the conceptual model used to construct the DTDs.

K N O W L E D G E N E T W O R K I N G

70 SEPTEMBER • OCTOBER 2000 h t tp ://computer.org/ in te rne t/ IEEE INTERNET COMPUTING

powerful enough to define richer languages on topof RDF’s limited primitives.

ENRICHING RDFBefore showing how RDF can be enriched todefine sophisticated data models, we recall Brach-man’s distinction of the three layers in a knowledgerepresentation system14:

! the implementation level consists of data struc-tures for a particular implementation;

! the logic level defines, in an abstract way, theinferences that are performed by the system; and

! the epistemological level defines adequate repre-sentation primitives for expressing knowledgein a convenient way—usually those used by aknowledge engineer.

The epistemological level is usually defined by agrammar that defines the language of interest, but therepresentation primitives can also be regarded as anontology—as objects of a particular domain. Usingthis view, domain-modeling techniques are again suit-able for defining a knowledge representation lan-guage. Thus, the epistemological level is itself anontology that defines the terms of the representation

language. Defining an ontology in RDF means defin-ing an RDF schema, which specifies all the conceptsand relationships of the particular language. (Oneexample of a language definition as an ontology is theRDF schema, which is defined in terms of itself.9)

Defining a More ExpressiveOntology LanguageFigure 7 shows how the RDF schema mechanism canbe used to define elements of OIL. The shadedellipses are elements that must be added to the exist-ing schema definition to obtain a schema for OIL.

We illustrate this principle with an examplefrom the ontology in Figure 1. The definition ofherbivore uses the subClassOf modeling primitive,which is a relation that identifies the two argu-ments as objects: the class (herbivore) and a sophis-ticated class expression (animal AND NOT carni-vore). The class expression can justifiably beviewed as an object because the expression animalAND NOT carnivore indeed defines a new(unnamed) class.

Defining the language primitives as an ontologyresults in the RDF graph depicted in Figure 7, whichdefines several properties and classes. The classoil:ClassExpression is a placeholder class that groups

Modeling Semantics

There are two main approaches to modeling semantics incomputer science: declarative and procedural semantics.

With declarative semantics, an expression E is givenmeaning by mapping it to another well-understood for-malism, or by stating the conclusions or properties that fol-low from E. The expression can be understood without ref-erence to any specific computational procedure, which iswhy this approach is dubbed “declarative.”

Using procedural semantics, expression E is given mean-ing by referring to the behavior that some real or virtualprocedure (or program, or machine) will exhibit on E. Oftenthe only way to obtain the expression’s meaning using pro-cedural semantics is to simply execute the procedure andobserve the outcome.

This difference between declarative and proceduralsemantics loosely coincides with the difference between theXML and RDF approaches to Web-page semantics. As we’veargued, an XML expression has no inherent semantics, and itsmeaning is only determined by the actions that one or moreprograms undertake on it (whether tag-nesting is interpretedas part-of or as a subtype-of, for instance). An RDF expres-sion, on the other hand, has a specific declarative semantics

(such as the intended meaning of subClassOf), and this isspecified independently of any RDF processor; that is, all RDFprocessors must conform to the intended semantics.

Together with the W3C, we stand in a long tradition incomputer science and AI, which argues that the declarativeapproach to semantics leads to more sharable and exten-sible information and knowledge sources. The argumentsabout XML versus RDF do not change substantially whenXML schema is used instead of XML DTDs for specifyingXML document structure.

Readers might be tempted to compare XML schema’s“type-extension” mechanism with the subclassOf mecha-nism in RDF schema, but the similarity between them is onlysuperficial. In fact, the type-extension mechanism cannotbe used to model ontological subtypes at all: in XMLschema, if type T′ is derived from type T, then elements ofthe derived type T′ are not necessarily members of the orig-inal type T. In the subClassOf relationship in RDF schema,on the other hand, a member of a subclass is also a mem-ber of the original superclass. As a result, subClassOf canbe used to model ontological subtyping, whereas XMLschema’s type extension cannot.

T H E S E M A N T I C W E B

71IEEE INTERNET COMPUTING h t tp ://computer.org/ in te rne t/ SEPTEMBER • OCTOBER 2000

various types of class expressions for definitional pur-poses. oil:AND and oil:NOT are two particular typesof class expressions. The property oil:hasOperand isan auxiliary property needed to connect an opera-tor-type class expression with another class expres-sion. The existing description of the primitiverdfs:subClassOf is extended so that the range nowincludes oil:ClassExpression (instead of just rdfs:Class).Also, the existing description of rdfs:Class is extend-ed: It is also a subclass of oil:ClassExpression (becauseit is considered a particular type of class expression).

Using the graph in Figure 7, the expression:

class-def defined herbivore subclass-of animal,NOT carnivore

can be described in RDF as depicted in Figure 8.Because every ontology (RDF schema) uses its

own namespace (we chose the prefix oil), termsfrom different ontologies can be mixed in one RDFdocument without confusion.3 RDF defines a clearobject structure, so it is possible to make assertionswith one language about an object defined in termsof another language. For example, we could mixthe ontology with behavior statements about dif-ferent animal classes using a finite-state automatonlanguage.15 This is not possible in XML because atag’s meaning (object, attribute, value, and so on)is not defined, and nothing can be assumed aboutthe object structure.

Using OILThe domain ontology defines a vocabulary—prop-erties and classes—that can be used to write instance

information in RDF. We propose a mechanism forextending the RDF data model with modeling prim-itives from any ontology language. Our approach forusing the primitives from ontology language L todescribe a particular domain has three steps:

! Step 1. Describe language L’s modeling primi-tives using RDF schema (effectively writing themeta-ontology of L in RDF schema).

! Step 2. Describe a specific ontology in L usingthe resulting RDF schema document.

! Step 3. Use the RDF schema documents todescribe instances of the specific L ontologymodeled in step 2.

Table 1 (next page) lists the expression types foreach step, along with specific examples of whateach produces and the RDF vocabulary used forencoding. This three-step approach suggests anadditional requirement for an ontology languageL: Because RDF schema is already an ontology def-inition language, L must be compatible with it.Thus, existing RDF schema processors can makemaximum use of ontologies defined in L. The sametools are applicable for all possible Ls, which leadsto flexibility in designing customized languages.

MERGING AN ONTOLOGYLANGUAGE WITH RDF SCHEMADefining an ontology language as an extension ofRDF schema means every RDF-schema ontologyis valid in the new language (for example, an OILprocessor will also understand RDF schema).However, by defining the new language as closely

rdf:Property

rdf:type

rdf:type

rdf:typerdf:type

rdf:typeoil:ClassExpressionoil:hasOperand

rdf:subClassOf

rdf:subClassOfrdf:subClassOf

rdfs:Class

oil:NOToil:AND

rdfs:domain

rdfs:domain

rdfs:range

rdfs:range

Figure 7. RDF graph. The relationships between RDF schema primitives and OIL modeling primitives aredescribed using RDF schema.

K N O W L E D G E N E T W O R K I N G

72 SEPTEMBER • OCTOBER 2000 h t tp ://computer.org/ in te rne t/ IEEE INTERNET COMPUTING

as possible to RDF schema, we also maximize reuseof existing RDF-schema-based applications andtools. Because the ontology language usually con-tains vocabulary the RDF schema processor doesnot know, however, 100-percent compatibility isnot possible.

In OIL, a class can be a subclass of a Booleanclass expression, but the original RDF subclassstatement only allows primitive classes as values.We thus had to extend the OIL subclass statementdefinition. We also introduced oil:SlotConstraint toallow restrictions on slots in class definitions—another aspect of OIL that is unavailable in RDFschema. To maintain maximum compatibility withexisting applications, however, it is recommendedto use the RDF schema vocabulary wherever pos-

sible (for a case study on how to do this with OIL,see Broekstra et al.16).

RDF follows an object-attribute-value model,so we made a basic design decision that everyOIL expression would be an object. To allowsubexpressions, we introduced auxiliary attributesthat do not correspond to any original OILvocabulary (oil:hasOperand, oil:hasClass, andoil:hasProperty).

The major integration points between RDF/RDFschema and OIL are defined by the abstract OIL classClassExpression. Furthermore, OIL slots are realizedas instances of rdf:Property or as subproperties of theoriginal rdf:Property. The subslot relationship is alsoexpressed by original RDF means—namely, therdfs:subPropertyOf relationship. rdf:Property is enriched

rdf:Class

rdf:type rdf:type

rdf:type

herbivorerdf:subClassOf

oil:hasOperand

oil:hasOperand

oil:hasOperandoil:AND carnivore

oil:NOT

animal

Figure 8. RDF graph for the RDF-OIL definition of herbivore. The figure shows the subclass definition ofherbivore (Figure 1) expressed as an RDF graph. Blank ovals indicate anonymous resources, which arenecessary as intermediate nodes.

Table 1. Steps in the OIL approach to using an ontology language to extend RDF.

Step Expression Type Example Encoding

1 Modeling primitives oil:AND, oil:NOT, … RDF: Meta-ontology in RDF schemaof ontology language L

2 Specific ontology Class-def giraffe RDF: Ontology (using meta-ontologyexpressed in L Subclass of herbivore and RDF schema)

Slot-constraint eatsValue-type leaf

3 Instances of the specific animal12-eats-leaf34… RDF (RDF schema, meta-ontology,ontology and ontology)

T H E S E M A N T I C W E B

73IEEE INTERNET COMPUTING h t tp ://computer.org/ in te rne t/ SEPTEMBER • OCTOBER 2000

by several properties that specify inverse and transi-tive roles and cardinality constraints, which were notoriginally possible in RDF/RDF schema (for details,see Horrocks et al.2).

Class definitions are inherited from the originalrdfs:Resource from the RDF schema specification.Classes can be related with arbitrary class expres-sions via subclass or equivalence relationships. Bydoing this, existing classes from RDFS vocabulariescan be accessed and refined in OIL descriptions.Table 2 contains some of the OIL mappings.

We have implemented an inference engine forOIL based on the description logic inference engine,FaCT (http://www.cs.man.ac.uk/~horrocks/FaCT/).We have also performed several case studies, includ-ing modeling an ontology for the CIA World Fact-book (see http://www.ontoknowledge.org/oil/), totest the overall framework’s flexibility. Furthermore,the DARPA Agent Markup Language (http://www.daml.org) uses the principles of OIL as well.

CHALLENGESThe Web community currently regards XML as themost important step toward semantic integration,but we argue that this is not true in the long run.Semantic interoperability will be a sine qua non forthe Semantic Web, but it must be achieved byexploiting the current RDF proposals, rather thanXML labeling. The RDF data model is sound, andapproaches from artificial intelligence and knowl-edge engineering for establishing semantic inter-operability are directly applicable to extending it.

Our experience with OIL shows that this propos-al is feasible, and a similar strategy should apply toany knowledge-modeling language. The challenge isnow for the Web and AI communities to expand thisgeneric method for creating Web-enabled, special-purpose knowledge representation languages. !

REFERENCES1. T. Berners-Lee, Weaving the Web, Harper, San Francisco,

1999.

2. I. Horrocks et al., “The Ontology Interchange Language

OIL,” tech. report, Free Univ. of Amsterdam, 2000; avail-

able online at http://www.ontoknowledge.org/oil/.

3. T. Bray, J. Paoli, and C.M. Sperberg-McQueen, “Extensible

Markup Language (XML) 1.0,” W3C Recommendation,

Feb. 1998; available online at http://www.w3.org/TR/

REC-xml.

4. H.S. Thompson et al., “XML Schema Part 1: Structures,”

W3C, work-in-progress, current as of Apr. 2000; available

online at http://www.w3.org/TR/2000/WD-xmlschema-

1-20000407/.

5. P.V. Biron and A. Malhotra, “XML Schema Part 2:

Datatypes,” work-in-progress, current as of Apr. 2000;

available online at http://www.w3.org/TR/2000/

WD-xmlschema-2-20000407/.

6. P. Hoschka, “Synchronized Multimedia Integration Lan-

guage (SMIL) 1.0 Spec.,” W3C Recommendation, June

1998; available online at http://www.w3.org/TR/REC-smil/.

7. J. Clark, “XSL Transformations (XSLT),” W3C Recom-

mendation, Nov. 1999; available online at http://www.w3.

org/TR/xslt/.

8. O. Lassila and Ralph Swick, “Resource Description Frame-

work (RDF) Model and Syntax Specification,” W3C Rec-

ommendation, Feb. 1999; available online at http://

www.w3.org/TR/REC-rdf-syntax/.

9. D. Brickley and R. Guha, “Resource Description Frame-

work (RDF) Schema Specification,” W3C Candidate Rec-

ommendation, Mar. 2000; available online at

http://www.w3.org/TR/2000/CR-rdf-schema-20000327/.

10. M. Page-Jones and L.L. Constantine, Fundamentals of

Object-Oriented Design in UML, Addison-Wesley-Long-

man, Reading, Mass., 1999.

11. R. Barker, Entity Relationship Modeling, Addison-Wesley,

Boston, Mass., 1990.

12. J. Jannink et al., “An Algebra for Semantic Interoperation of

Semistructured Data,” Proc. IEEE Knowledge and Data Eng.

Exchange Workshop, IEEE Computer Soc. Press, Los Alami-

tos, Calif., 1999.

13. D.L. McGuinness et al., “The Chimaera Ontology Environ-

ment,” Proc. 17th Nat’l Conf. Artificial Intelligence (AAAI

2000), AAAI Press, Menlo Park, Calif., 2000.

14. R.J. Brachman, “On the Epistemological Status of Seman-

tic Networks,” in Associative Networks: Representations and

Use of Knowledge by Computers, N.V. Findler, ed., Acade-

mic Press, 1979, pp. 3-50.

15. S. Melnik, H. Garcia-Molina, and A. Paepcke, “A Media-

tion Infrastructure for Digital Library Services,” Proc. ACM

Digital Libraries Conf., ACM Press, New York, 2000.

Table 2. OIL vocabulary mapping to RDF schema.

OIL original vocabulary RDF vocabulary

Class-def rdfs:Class

Subclass-of rdfs:subClassOf

Slot constraint oil:hasSlotConstraintoil:SlotConstraint

AND oil:AND

NOT oil:NOT

Has-value oil:hasValue

K N O W L E D G E N E T W O R K I N G

74 SEPTEMBER • OCTOBER 2000 h t tp ://computer.org/ in te rne t/ IEEE INTERNET COMPUTING

16. J. Broekstra et al., “OIL: A Case Study in Extending RDF-

Schema,” tech. report, Vrije Universiteit, Amsterdam, 2000;

available online at http://www.ontoknowledge.org/oil/.

Stefan Decker is a postdoctoral fellow in the computer science

department at Stanford University and the cofounder of

Ontoprise, a startup company focusing on ontology-based

knowledge management. He consults frequently on RDF,

XML, and interoperability issues.

Sergey Melnik is a visiting researcher in the computer science

department at Stanford University, where he works on data-

base and interoperability topics with special attention to

RDF and XML. His research interests include knowledge

representation and database systems for the Web, informa-

tion integration, and digital libraries.

Frank Van Harmelen is a senior lecturer with the artificial intel-

ligence research group at the Vrije Universiteit, Amsterdam.

His current interests include specification languages for

KBS, using these languages for validation and verification

of KBS, developing gradual notions of correctness for KBS,

and verifying weakly structured data.

Dieter Fensel is an associate professor at Vrije Universiteit, Ams-

terdam. His research interests include problem-solving

methods of knowledge-based systems, and the use of

ontologies to mediate access to heterogeneous knowledge

sources and apply them in knowledge management and

electronic commerce.

Michel Klein is a PhD student in the Knowledge Engineering

and Reasoning Group of the Vrije Universiteit in Amster-

dam. His research interests include RDF and RDF schema,

ontology modeling, maintenance, and integration, as well

as representation, visualization, and querying of semi-

structured data.

Jeen Broekstra is a PhD student in the knowledge engineer-

ing and reasoning group of the Vrije Universiteit in Ams-

terdam, and a knowledge engineer working for Aidmin-

istrator Nederland B.V. His research interests include

ontology modeling, maintenance, and integration, as well

as representation, visualization, and querying of semi-

structured data.

Michael Erdmann is a PhD student at the Institute for Applied

Computer Science and Formal Description Methods

(AIFB) at the University of Karlsruhe. His research inter-

ests include semantic knowledge modeling with ontologies,

and intelligent Web applications based on KR techniques

and open Web standards.

Ian Horrocks is a lecturer in computer science at the Universi-

ty of Manchester, UK. His research interests include knowl-

edge representation, automated reasoning (particularly deci-

sion procedures for description and modal logics),

ontological engineering, and optimizing, testing, and eval-

uating reasoning systems.

Readers can contact Decker at [email protected].

How to Write for IC . . .IEEE Internet Computing is a bimonthly magazine focused on Internet-based applications and supporting tech-nologies. We seek articles on the use and development of Internet applications, services, and technologies that letpractitioners leverage them in engineering and applying the Internet toolset. We aim to support individual engi-neers, as well as groups, in collaborative and coordinated work.

All articles are subject to peer review and should be submitted in HTML or a common format (such as PostScript)easily read by reviewers. Submissions should be relevant to the typical professional subscriber of IC and should illus-trate the applicability or effect of a specific Internet-based technology. Fielded, tested applications with hard resultsare preferred. Prototypes must at least include test results. Submissions should be no longer than 6,000 words.

For detailed instructions, see our Author Guidelines athttp://computer.org/internet/edguide.htm.