
Future Generation Computer Systems 26 (2010) 408–420


The schema theory for semantic link network

Hai Zhuge a,∗, Yunchuan Sun a,b,c

a China Knowledge Grid Research Group, Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 100190, Beijing, China
b Graduate School of Chinese Academy of Sciences, Beijing, China
c School of Economics and Business Administration, Beijing Normal University, 100875, Beijing, China

Article info

Article history:
Received 30 January 2009
Received in revised form 18 May 2009
Accepted 29 August 2009
Available online 6 September 2009

Keywords: Data models; Normal form; Schema; Semantic link network; Semantic web

Abstract

The Semantic Link Network (SLN) is a loosely coupled semantic data model for managing Web resources. Its nodes can be any type of resource. Its edges can be any semantic relation. Potential semantic links can be derived according to reasoning rules on semantic relations. This paper proposes the schema theory for the SLN, including the concepts, rule-constraint normal forms, and relevant algorithms. The theory provides the basis for normalized management of semantic link networks. A case study demonstrates the proposed theory.

© 2009 Published by Elsevier B.V.

1. Introduction

The schema of a relational database defines the structure of the database. It defines a set of relations with attributes and the dependencies among attributes. The normalization theory of the relational schema aims to ensure high consistency, low redundancy and better efficiency [1,2]. A relational data model is limited in representing rich semantic relationships between various resources and in supporting reasoning on semantic relations.

The Semantic Web aims at making Web resources machine-understandable by enriching semantics in resources [3]. XML (eXtensible Markup Language) describes the structure of Web resources for cross-platform information sharing (www.w3.org/XML). The XML schema defines a set of syntaxes and rules to express the shared vocabularies (www.w3.org/XML/Schema). It provides a means for defining the structure, content and semantics of XML documents. Based on XML, many markup languages have been proposed. RDF (Resource Description Framework, www.w3.org/TR/2004/REC-rdf-mt-20040210) focuses on describing the universal resources on the Web by an object-attribute-value triple. RDF Schema (RDFS, www.w3.org/TR/2004/REC-rdf-schema-20040210) defines a set of syntaxes to store the metadata of resources with XML syntax and provides basic RDF vocabularies for structuring RDF resources. RDFS is still weak in expressing rich semantic relationships and supporting relational reasoning. OWL (Web Ontology Language) is designed to describe the semantics of the resources themselves with ontologies and semantic relationships between resources with roles (www.w3.org/2004/OWL). It can represent the meaning of terms in vocabularies explicitly and the relationships between those terms. Its logical foundation is description logics, for which ontology consistency is decidable. The Rule Markup Language (RuleML) expresses rules in XML for deduction, rewriting, and further inferential-transformational tasks (www.ruleml.org). The Semantic Web Rule Language (SWRL) is based on the combination of OWL with RuleML [4].

∗ Corresponding author. Fax: +86 1062562703.
E-mail addresses: [email protected] (H. Zhuge), [email protected] (Y. Sun).
URL: http://www.knowledgegrid.net/~h.zhuge (H. Zhuge).

0167-739X/$ – see front matter © 2009 Published by Elsevier B.V.
doi:10.1016/j.future.2009.08.012

The Semantic Link Network (SLN) is a loosely coupled semantic data model for managing Web resources with the following main features of the Web:

(1) Easy to build and easy to use; and,
(2) Any semantic node can semantically link to any other semantic node.

A semantic link network instance is a directed graph, denoted as $S(\mathit{ResourceSet}, \mathit{LinkSet})$, where $S$ is the name of the semantic link network, $\mathit{ResourceSet}$ is a set of resources, and $\mathit{LinkSet}$ is a set of semantic links in the form of $R \xrightarrow{\alpha} R'$, where $R, R' \in \mathit{ResourceSet}$, and $\alpha$ is a semantic factor representing a semantic relation between $R$ and $R'$. A set of reasoning rules on semantic links enables a semantic link network to derive potential semantic links. The basic concept and model of the SLN have been introduced in [5–11]. More references are available at www.knowledgegrid.net/~h.zhuge/SLN.htm.

The motivation of this paper is to construct a schema theory and a rule-constraint normalized theory for SLN construction and resource management.

An SLN schema specifies resource types, semantic link types, and reasoning rules. Resource instances and semantic link instances are regulated by the resource types and semantic link types. The reasoning on instances is based on the reasoning rules defined by the schema. The SLN schema provides a blueprint for building SLN instances and a way to normalize them. The global SLN schema reflects a consensus on the basic semantics of the domain. Users can define SLN instances by instantiating the global schema, or define a sub-schema according to the global schema first and then instantiate the sub-schema.

Fig. 1. The role of SLN schema. (Diagram labels: Global SLN Schema, Sub-Schema, Abstract SLN, SLN Instances.)

Fig. 1 shows the role of schema in developing semantic link networks. There are two ways to form the SLN schema:

(1) defined by domain experts; and,
(2) induced from existing abstract semantic link networks or from SLN instances [8].

2. Schema for the semantic link network

The schema of the SLN is a triple $S(\mathit{ResourceTypes}, \mathit{LinkTypes}, \mathit{Rules})$. $\mathit{ResourceTypes}$ is a set of resource types denoted as $\{rt_1, rt_2, \ldots, rt_k\}$, where $rt_i$ is defined by its field. $\mathit{LinkTypes}$ is a set of semantic link types, where each takes the form $rt_i \xrightarrow{\alpha} rt_j$, where $rt_i, rt_j \in \mathit{ResourceTypes}$, and $\alpha$ is a semantic relation defined by its field. $\mathit{Rules}$ is a set of reasoning rules on link types. A formal definition of semantic link network is given in [5].

The possible semantic relationship types between two resources are determined by the types of the start resource and the end resource. For example, the possible types of a semantic link between a researcher and a paper are authorOf, editorOf, and readerOf, but not fatherOf. So a semantic link instance with semantic factor $\alpha$ from a resource $R$ of type $rt_i$ to another resource $R'$ of type $rt_j$ can be described as $R \xrightarrow{\alpha} R'$. For two resource types $rt_i$ and $rt_j$, we use $[rt_i, rt_j]$ to denote the set of all semantic link types with the start resource type $rt_i$ and the end resource type $rt_j$.

For a pair of resource types, relationships between semantic link types can be classified into the following three categories.

(1) Implication. A semantic link type $\alpha$ implies a semantic link type $\beta$ between the same pair of resources, denoted as $\alpha \Rightarrow \beta$, as shown in Fig. 2(d).

(2) Compatible. Two semantic link types $\alpha$ and $\beta$ do not affect each other.

(3) Incompatible. Two semantic link types $\alpha$ and $\beta$ cannot co-occur between the same pair of resources.

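For a semantic link $R \xrightarrow{\omega} R'$ between two resources $R$ and $R'$, the reversion is a semantic link $R' \xrightarrow{\omega^{-1}} R$, which means that if there is a semantic relationship $\omega$ from $R$ to $R'$, then there is a semantic relationship $\omega^{-1}$ from $R'$ to $R$ [6].

A reasoning rule takes the following form, as shown in Fig. 2(a): $R \xrightarrow{\alpha} R',\ R' \xrightarrow{\beta} R'' \Rightarrow R \xrightarrow{\gamma} R''$, abbreviated as $\alpha \cdot \beta \Rightarrow \gamma$. Fig. 2(b) and (c) show the following two forms of reasoning rule:

$R \xrightarrow{\alpha} R',\ R \xrightarrow{\beta} R'' \Rightarrow R' \xrightarrow{\gamma} R''$ is equivalent to $R' \xrightarrow{\alpha^{-1}} R,\ R \xrightarrow{\beta} R'' \Rightarrow R' \xrightarrow{\gamma} R''$, i.e., $\alpha^{-1} \cdot \beta \Rightarrow \gamma$.

$R' \xrightarrow{\alpha} R,\ R'' \xrightarrow{\beta} R \Rightarrow R' \xrightarrow{\gamma} R''$ is equivalent to $R' \xrightarrow{\alpha} R,\ R \xrightarrow{\beta^{-1}} R'' \Rightarrow R' \xrightarrow{\gamma} R''$, i.e., $\alpha \cdot \beta^{-1} \Rightarrow \gamma$.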

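Lemma 1. The following two kinds of rules hold:
(1) $\alpha \cdot \beta \Rightarrow \gamma$ is equivalent to $\beta^{-1} \cdot \alpha^{-1} \Rightarrow \gamma^{-1}$; and,
(2) $\alpha^{-1} \cdot \beta \Rightarrow \gamma$ is equivalent to $\beta^{-1} \cdot \alpha \Rightarrow \gamma^{-1}$.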
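The rewriting in Lemma 1 is mechanical, so it can be illustrated with a short sketch. The representation below is an assumption made for illustration only: link types are strings, the reverse of a type is marked with a hypothetical `^-1` suffix, and a rule is a pair of premises plus a conclusion.

```python
# Hypothetical naming convention: the reverse of link type "a" is "a^-1".
SUFFIX = "^-1"


def inverse(link_type: str) -> str:
    """Return the reverse link type; a double reversal cancels out."""
    if link_type.endswith(SUFFIX):
        return link_type[: -len(SUFFIX)]
    return link_type + SUFFIX


def reverse_rule(rule):
    """Map ((a, b), c) to ((b^-1, a^-1), c^-1), the rewriting of Lemma 1."""
    (alpha, beta), gamma = rule
    return (inverse(beta), inverse(alpha)), inverse(gamma)


rule_1 = (("alpha", "beta"), "gamma")        # alpha . beta => gamma
rule_2 = (("alpha^-1", "beta"), "gamma")     # alpha^-1 . beta => gamma

print(reverse_rule(rule_1))  # case (1) of Lemma 1
print(reverse_rule(rule_2))  # case (2) of Lemma 1
```

Applying `reverse_rule` twice returns the original rule, which matches the intuition that reversing every link twice changes nothing.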

Fig. 2(d) shows the following reasoning rule: $R \xrightarrow{\alpha} R' \Rightarrow R \xrightarrow{\beta} R'$, written $\alpha \Rightarrow \beta$ for short, which means that the semantic relationship $\alpha$ is stronger than the semantic relationship $\beta$ between two resources.

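Proposition 1. For a rule $\alpha \Rightarrow \beta$, if $\alpha \in [rt_i, rt_j]$, then $\beta \in [rt_i, rt_j]$.

Proposition 2. For a rule $\alpha \cdot \beta \Rightarrow \gamma$, if $\alpha \in [rt_i, rt_j]$ and $\beta \in [rt_j, rt_k]$, then $\gamma \in [rt_i, rt_k]$.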
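As a minimal sketch of how the schema triple and Propositions 1 and 2 might be checked in practice, the following Python models a schema and tests whether a rule is well typed. The class names, the rule encoding, the typing of `publishedIn`, and the link type `writesFor` are assumptions introduced for illustration; only `authorOf`, `editorOf`, and `readerOf` between a researcher and a paper come from the text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LinkType:
    name: str    # semantic factor, e.g. "authorOf"
    start: str   # start resource type
    end: str     # end resource type


@dataclass
class Schema:
    resource_types: set
    link_types: dict                            # name -> LinkType
    rules: list = field(default_factory=list)   # (premise names, conclusion name)


def rule_is_well_typed(schema, rule):
    """Check the typing constraint of Proposition 1 (one premise) or 2 (two)."""
    premises, conclusion = rule
    c = schema.link_types[conclusion]
    p = [schema.link_types[name] for name in premises]
    if len(p) == 1:
        # alpha => beta: both must connect the same pair of resource types.
        return (p[0].start, p[0].end) == (c.start, c.end)
    a, b = p
    # alpha . beta => gamma: alpha ends where beta starts, and gamma spans both.
    return a.end == b.start and (c.start, c.end) == (a.start, b.end)


schema = Schema(
    resource_types={"researcher", "paper", "journal"},
    link_types={
        "editorOf": LinkType("editorOf", "researcher", "paper"),
        "readerOf": LinkType("readerOf", "researcher", "paper"),
        "authorOf": LinkType("authorOf", "researcher", "paper"),
        "publishedIn": LinkType("publishedIn", "paper", "journal"),    # assumed typing
        "writesFor": LinkType("writesFor", "researcher", "journal"),   # hypothetical
    },
    rules=[(("editorOf",), "readerOf"), (("authorOf", "publishedIn"), "writesFor")],
)

print(all(rule_is_well_typed(schema, r) for r in schema.rules))
```

A rule whose premises do not chain, such as `publishedIn · authorOf ⇒ writesFor`, is rejected by the same check.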

3. SLN operations

We first define the notion of sub-schema and the reference relation between schemas. For the SLN schemas $S(R, L, rs)$ and $S'(R', L', rs')$, if $R' \subseteq R$ and $L' \subseteq L$, $S'$ is called a sub-schema of $S$, denoted as $S' \subseteq S$. If two schemas $S(R, L, rs)$ and $S'(R', L', rs')$ share a semantic link type, for instance, $rt_i \xrightarrow{\alpha} rt_j$ in $L$ is identical to $rt'_i \xrightarrow{\alpha'} rt'_j$ in $L'$, a reference relation $Ref(\alpha, \alpha')$ can be built from $\alpha$ to $\alpha'$. This means that, for two instances built under the two schemas respectively, all semantic links of type $\alpha'$ in the instance under $S'$ can be used in the instance under $S$ as semantic links of type $\alpha$.

The following are operations on semantic link networks.

Fig. 2. Basic forms of reasoning rule (panels a, b, c, d).

(1) Union. The union of two semantic link networks $S(RS, LS)$ and $S'(RS', LS')$ under the same schema is a new semantic link network $S(RS \cup RS', LS \cup LS')$. The union operation is not guaranteed to generate a connected semantic link network. It enables a user or application to operate on different semantic link networks as a whole.

(2) Intersection. The intersection of two semantic link networks $S(RS, LS)$ and $S'(RS', LS')$ under the same schema is a new semantic link network $S(RS \cap RS', LS \cap LS')$. The intersection operation is not guaranteed to generate a connected semantic link network. It enables a user or application to operate on the common part of different semantic link networks.

(3) Selection. A selection of a semantic link network $S$, $\sigma_P(S)$, is a new semantic link network satisfying condition $P$, a logical expression for selecting a set of resources or a set of semantic links from $S$. The selection operation enables a user or application to operate on the part of a semantic link network that is of interest.

(4) Projection. The projection of a semantic link network $S$ under schema $S(R, L, rs)$ on another schema $S'(R', L', rs')$, denoted as $S' = \Pi_{S'}(S)$, is a new semantic link network derived from $S$ by removing all resources whose types are not in $R'$ and removing all semantic links whose types are not in $L'$. Reasoning on $S'$ is executed according to the rule set $rs'$. The projection operation enables a user or application to view a semantic link network through different schemas, enables a semantic link network to suit different schemas for different applications, and enables reasoning on a semantic link network to be localized on a relevant schema. It is especially useful for large semantic link networks.

(5) Reasoning. SLN reasoning derives new semantic links from the semantic link network. An atomic reasoning in a semantic link network is an execution of a rule in $rs$ to get a new semantic link. Reasoning is a series of atomic reasoning steps connected to one another over the semantic link network.

(6) Join. The join of two semantic link networks $S(RS, LS)$ and $S'(RS', LS')$ under schemas $S(R, L, rs)$ and $S'(R', L', rs')$ respectively is a new semantic link network $S \times_P S' = S_J(RS_J, LS_J)$ under a new schema $S_J(R_J, L_J, rs_J)$, where $P$ is a set of references $\{Ref(\alpha_1, \alpha'_1), \ldots, Ref(\alpha_m, \alpha'_m), Ref(\beta'_1, \beta_1), \ldots, Ref(\beta'_n, \beta_n)\}$, $L_J = (L \cup L') - \{\alpha'_1, \ldots, \alpha'_m, \beta'_1, \ldots, \beta'_n\}$, $R_J$ is the set of resources involved in $L_J$, $rs_J$ is the set $rs \cup rs'$ with $\alpha'_i$ (or $\beta'_j$) replaced by $\alpha_i$ (or $\beta_j$) ($1 \le i \le m$, $1 \le j \le n$), $LS_J$ is the set $LS \cup LS'$ with $\alpha'_i$ (or $\beta'_j$) replaced by $\alpha_i$ (or $\beta_j$), and $RS_J = RS \cup RS'$. It is easy to verify that $S = \Pi_{S}(S_J)$ and $S' = \Pi_{S'}(S_J)$. The union operation is a special case of the join operation. The join operation enables a user or application to operate on relevant semantic link networks as a whole.

(7) Decomposition. A decomposition of $S(R, L, rs)$ is a set of sub-schemas $S_1(R_1, L_1, rs_1), \ldots, S_m(R_m, L_m, rs_m)$ such that there does not exist any pair $(S_i, S_j)$ with $i \ne j$ where $S_i$ is a sub-schema of $S_j$, and $R = R_1 \cup R_2 \cup \cdots \cup R_m$ and $L = L_1 \cup L_2 \cup \cdots \cup L_m$. At the instance level, a semantic link network $S$ under the schema $S$ has the corresponding decomposition $\{S_1, S_2, \ldots, S_m\}$, where each instance $S_i$ is under the schema $S_i$. For all $i$, we have $S_i = \Pi_{S_i}(S)$. The decomposition operation enables a user or application to operate on a small schema to raise efficiency when facing a large schema.

(8) Query. A query on semantic link networks yields a new semantic link network through a combination of the operations union, intersection, reasoning, selection, projection, or join.
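Several of the operations above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: an instance is a pair `(resources, links)`, where `resources` maps a resource name to its type and `links` is a set of `(start, link_type, end)` triples; rules are `(premises, conclusion)` pairs. The link type `writesFor` and the resource names are hypothetical, and dropping links with a removed endpoint during projection is an added assumption.

```python
def union(s1, s2):
    """Union of two instances: (RS ∪ RS', LS ∪ LS')."""
    return ({**s1[0], **s2[0]}, s1[1] | s2[1])


def intersection(s1, s2):
    """Intersection of two instances: (RS ∩ RS', LS ∩ LS')."""
    common = {r: t for r, t in s1[0].items() if s2[0].get(r) == t}
    return (common, s1[1] & s2[1])


def projection(instance, resource_types, link_types):
    """Keep only resources and links whose types belong to the target schema."""
    resources, links = instance
    kept = {r: t for r, t in resources.items() if t in resource_types}
    # Assumption: a link is also dropped when one of its endpoints was removed.
    kept_links = {(a, t, b) for (a, t, b) in links
                  if t in link_types and a in kept and b in kept}
    return (kept, kept_links)


def reason(links, rules):
    """Fire rules (alpha => gamma, alpha . beta => gamma) until nothing new appears."""
    derived = set(links)
    changed = True
    while changed:
        new = set()
        for premises, gamma in rules:
            if len(premises) == 1:
                (alpha,) = premises
                new |= {(a, gamma, b) for (a, t, b) in derived if t == alpha}
            else:
                alpha, beta = premises
                new |= {(a, gamma, c)
                        for (a, t1, b) in derived if t1 == alpha
                        for (b2, t2, c) in derived if t2 == beta and b2 == b}
        changed = not new <= derived
        derived |= new
    return derived


s1 = ({"alice": "researcher", "p1": "paper"}, {("alice", "authorOf", "p1")})
s2 = ({"p1": "paper", "j1": "journal"}, {("p1", "publishedIn", "j1")})
rules = [(("authorOf", "publishedIn"), "writesFor")]

merged = union(s1, s2)
print(reason(merged[1], rules))
```

In this sketch the link `alice writesFor j1` can only be derived after the union, since each premise lives in a different instance; this is the sense in which union lets an application operate on several networks as a whole.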

Two semantic link networks $S$ and $S'$ are called equivalent if and only if the results from $S$ and $S'$ are identical for any query. Clearly, the equivalence among semantic link networks is symmetric, reflexive, and transitive [6].

For a given schema, there are many kinds of decomposition, but some decompositions may not be good. For example, for a schema $S(R, L, rs)$, where $R = \{rt_1, rt_2, rt_3\}$, $L$ includes $[rt_1, rt_2] = \{\alpha_1, \alpha_2\}$, $[rt_2, rt_3] = \{\beta_1, \beta_2\}$ and $[rt_1, rt_3] = \{\gamma\}$, and $rs = \{\alpha_1 \cdot \beta_1 \Rightarrow \gamma\}$, the two sub-schemas $S_1(R, \{\alpha_1, \beta_2, \gamma\}, \phi)$ and $S_2(\{rt_1, rt_2\}, \{\alpha_2, \beta_1\}, \phi)$ construct a decomposition. The decomposition is not good since it loses the reasoning rule. A decomposition of a semantic link network $S$ is called loss-less if the join of the decomposition is equivalent to $S$. A decomposition of a schema is called loss-less if, for any instance under the schema, the corresponding instance decomposition is loss-less.

Theorem 1. A decomposition $\{S_1, S_2, \ldots, S_m\}$ of the schema $S$ is loss-less if

(1) $rs = rs_1 \cup rs_2 \cup \ldots \cup rs_m$; and,
(2) for any rule $r: \alpha \cdot \beta \Rightarrow \gamma$ in $rs_i$, for all possible $j$, if $\gamma \in L_j$ and $r \notin rs_j$, there is a reference relation $Ref(\gamma \in S_j, \gamma \in S_i)$.

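Proof. We need to verify that a semantic link network $S(RS, LS)$ under schema $S$ is equivalent to the join of the corresponding decomposition $S_1(RS_1, LS_1), S_2(RS_2, LS_2), \ldots, S_m(RS_m, LS_m)$ under schemas $S_1, S_2, \ldots, S_m$ respectively.

Precondition 1 means that for each rule $r: \alpha \cdot \beta \Rightarrow \gamma$ (or $r: \alpha \Rightarrow \beta$) in $rs$, there is at least one $i$ such that $r$ is in $rs_i$. According to the instance decomposition $S_i = \Pi_{S_i}(S)$, all semantic links of types $\alpha$, $\beta$ and $\gamma$ in $S$ are included in $S_i$. For reasoning in $S$, assume that the rules in the sequence (1) $r_1: \alpha_1 \cdot \beta_1 \Rightarrow \gamma_1$ (or $\alpha_1 \Rightarrow \gamma_1$), (2) $r_2: \alpha_2 \cdot \beta_2 \Rightarrow \gamma_2$ (or $\alpha_2 \Rightarrow \gamma_2$), $\ldots$, (k) $r_k: \alpha_k \cdot \beta_k \Rightarrow \gamma_k$ (or $\alpha_k \Rightarrow \gamma_k$), $\ldots$, (n) $r_n: \alpha_n \cdot \beta_n \Rightarrow \gamma_n$ (or $\alpha_n \Rightarrow \gamma_n$) are fired one after another to find the $\gamma_n$-type semantic link between two resources. For convenience, we denote the semantic links involved in the above reasoning process as $l_{\alpha_i}$, $l_{\beta_i}$, and $l_{\gamma_i}$ respectively for all $1 \le i \le n$. Then the reasoning process is executed as a sequence of atomic reasoning steps: (1) $l_{\alpha_1} \cdot l_{\beta_1} \Rightarrow l_{\gamma_1}$ (or $l_{\alpha_1} \Rightarrow l_{\gamma_1}$), (2) $l_{\alpha_2} \cdot l_{\beta_2} \Rightarrow l_{\gamma_2}$ (or $l_{\alpha_2} \Rightarrow l_{\gamma_2}$), $\ldots$, (k) $l_{\alpha_k} \cdot l_{\beta_k} \Rightarrow l_{\gamma_k}$ (or $l_{\alpha_k} \Rightarrow l_{\gamma_k}$), $\ldots$, (n) $l_{\alpha_n} \cdot l_{\beta_n} \Rightarrow l_{\gamma_n}$ (or $l_{\alpha_n} \Rightarrow l_{\gamma_n}$) in the semantic link network $S$. For each semantic link $l_{\alpha_k}$ involved in the reasoning process as a prerequisite, either $l_{\alpha_k} \in LS$, or $l_{\alpha_k}$ is some $l_{\gamma_t}$ derived by a previous step $l_{\alpha_t} \cdot l_{\beta_t} \Rightarrow l_{\gamma_t}$ (or $l_{\alpha_t} \Rightarrow l_{\gamma_t}$) with $t < k$ in the above sequence.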


Fig. 3. Semantic link types relative net for RC-NF1 schemas of a research semantic link network.

For the same reasoning, we verify that the same process can be executed on the join of the corresponding decomposition $S_1, S_2, \ldots, S_m$ under schemas $S_1, S_2, \ldots, S_m$ respectively. For each rule $r_k$ in the above rule sequence, there exists at least one $i$ such that $r_k \in rs_i$ and $\alpha_k, \beta_k, \gamma_k \in L_i$. Thus, the instance $S_i$ under the schema $S_i$ consists of all semantic links with types $\alpha_k$, $\beta_k$, and $\gamma_k$ in $S$.

(1) If $l_{\alpha_1}, l_{\beta_1} \in LS_i$, then the first atomic reasoning in the above sequence can be executed and the semantic link $l_{\gamma_1}$ with type $\gamma_1$ can be deduced in $S_i$.

(2) Assume that the first $k - 1$ atomic reasoning steps can be executed and the semantic links $l_{\gamma_1}, l_{\gamma_2}, \ldots, l_{\gamma_{k-1}}$ can be derived in some $S_i$. If $l_{\alpha_k}, l_{\beta_k} \in LS_j$ for some $j$, then the $k$th atomic reasoning can be fired and the semantic link $l_{\gamma_k}$ with type $\gamma_k$ can be deduced in $S_j$. Otherwise, if $l_{\alpha_k} \notin LS_i$, then $l_{\alpha_k}$ is some semantic link $l_{\gamma_{k'}}$ derived by the step $l_{\alpha_{k'}} \cdot l_{\beta_{k'}} \Rightarrow l_{\gamma_{k'}}$ (or $l_{\alpha_{k'}} \Rightarrow l_{\gamma_{k'}}$) according to $r_{k'}$, where $1 \le k' < k$.
    (a) If $r_{k'} \in rs_i$, $l_{\alpha_k}$ (i.e., $l_{\gamma_{k'}}$) can be deduced in $S_i$, and then the semantic link $l_{\gamma_k}$ with type $\gamma_k$ can be derived.
    (b) If $r_{k'} \notin rs_i$, there exists at least one $j$ such that $r_{k'} \in rs_j$, and we can get $l_{\gamma_{k'}}$ in $S_j$ according to the assumption. By precondition 2, there is a reference relation $Ref(\gamma_{k'} \in S_i, \gamma_{k'} \in S_j)$. Then $l_{\gamma_{k'}}$ can be duplicated into $S_i$. The rule $r_k$ can be fired and the semantic link $l_{\gamma_k}$ can be deduced in $S_i$.

By mathematical induction, the above shows that the semantic link $l_{\gamma_n}$ can be derived from the reasoning on the join of the decomposition $\{S_1, S_2, \ldots, S_m\}$. □
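The two preconditions of Theorem 1 lend themselves to a mechanical check. The sketch below is an assumption-laden illustration: each sub-schema is reduced to a pair `(link_types, rules)`, rules are hashable `(premises, conclusion)` tuples, and a reference relation from γ in sub-schema j to γ in sub-schema i is encoded as the hypothetical triple `(gamma, j, i)`. Since the theorem gives a sufficient condition, a `False` result only means the theorem does not apply.

```python
def satisfies_theorem1(rules, parts, refs=frozenset()):
    """Return True when both preconditions of Theorem 1 hold."""
    # Precondition (1): the sub-schema rule sets together cover exactly rs.
    covered = set().union(*(set(rs_i) for _, rs_i in parts))
    if covered != set(rules):
        return False
    # Precondition (2): wherever the conclusion of a rule appears in a
    # sub-schema that lacks the rule, a reference back to the rule's
    # sub-schema must exist.
    for i, (_, rs_i) in enumerate(parts):
        for rule in rs_i:
            gamma = rule[1]
            for j, (links_j, rs_j) in enumerate(parts):
                if gamma in links_j and rule not in rs_j and (gamma, j, i) not in refs:
                    return False
    return True


rules = [(("a1", "b1"), "g")]   # a1 . b1 => g, as in the example before Theorem 1

# The decomposition from the text that loses the rule.
lossy = [({"a1", "b2", "g"}, []), ({"a2", "b1"}, [])]
# A decomposition that keeps the rule together with its link types.
kept = [({"a1", "b1", "g"}, rules), ({"a2", "b2"}, [])]
# The conclusion g also appears in the second sub-schema, so a reference is needed.
shared = [({"a1", "b1", "g"}, rules), ({"a2", "g"}, [])]

print(satisfies_theorem1(rules, lossy))
print(satisfies_theorem1(rules, kept))
print(satisfies_theorem1(rules, shared, refs={("g", 1, 0)}))
```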

The following deduction can be derived from the theoremimmediately.

4. Rule-constraint normal forms for the SLN schema

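A reasoning rule represents the semantic relevance among semantic link types. For example, the rule editorOf ⇒ readerOf shows that an editor of a paper is certainly a reader of the paper. Similarly, $\alpha \cdot \beta \Rightarrow \gamma$ shows that $\gamma$ is semantically relevant to $\alpha$ and $\beta$.

Definition 1. Let $S(R, L, rs)$ be an SLN schema and $\alpha_i, \alpha_j, \alpha_k \in L$. $\alpha_i$ is called direct semantic relative to $\alpha_j$ (denoted as $\alpha_i \trianglelefteq \alpha_j$) if there is a rule in $rs$ with one of the following forms: $\alpha_j \Rightarrow \alpha_i$, $\alpha_j \cdot \alpha_k \Rightarrow \alpha_i$, or $\alpha_k \cdot \alpha_j \Rightarrow \alpha_i$.

If there is a sequence of direct semantic relatives $\alpha_1 \trianglelefteq \alpha_2 \trianglelefteq \alpha_3 \trianglelefteq \cdots \trianglelefteq \alpha_m$ ($m$ is an integer), then $\alpha_1$ is called semantic relative to $\alpha_m$, denoted as $\alpha_1 \vartriangleleft \alpha_m$. We can construct a semantic relative net, as shown in Fig. 3, for an SLN schema by drawing an arrow from $\alpha_i$ to $\alpha_j$ for each semantic relative $\alpha_j \trianglelefteq \alpha_i$ retrieved from the rule set.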
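To make Definition 1 and the semantic relative net concrete, here is a small sketch that extracts the direct relation from a rule set and tests the transitive relation by following arrows in the net. The rule encoding `(premises, conclusion)` and the link type `follows` are assumptions for illustration; `editorOf`, `readerOf`, and `publishedIn` are taken from the text.

```python
def direct_relatives(rules):
    """Pairs (conclusion, premise): the conclusion is direct semantic relative to the premise."""
    return {(conclusion, p) for premises, conclusion in rules for p in premises}


def relative_net(rules):
    """Adjacency map of the net, with an arrow from each premise to its conclusion."""
    net = {}
    for conclusion, premise in direct_relatives(rules):
        net.setdefault(premise, set()).add(conclusion)
        net.setdefault(conclusion, set())
    return net


def is_relative(rules, a, b):
    """True if a is semantic relative to b, i.e. a is reachable from b in the net."""
    net = relative_net(rules)
    stack, seen = [b], set()
    while stack:
        node = stack.pop()
        for nxt in net.get(node, ()):
            if nxt == a:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


rules = [
    (("editorOf",), "readerOf"),               # editorOf => readerOf (from the text)
    (("readerOf", "publishedIn"), "follows"),  # hypothetical rule
]

print(sorted(direct_relatives(rules)))
print(is_relative(rules, "follows", "editorOf"))
```

Here `follows` is semantic relative to `editorOf` through the chain via `readerOf`, while the converse does not hold because the net has no arrow back.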

Definition 2. Let $S(R, L, rs)$ be an SLN schema, and $M \subseteq L$ a set of semantic link types. The closure of $M$, denoted as $C(M)$, under semantic relative is a set of semantic link types derived from the following steps.

(1) Let $C(M) = M$.
(2) Check each semantic relative $\alpha_i \vartriangleleft \alpha_j$: if $\alpha_i \in C(M)$, let $C(M) = C(M) \cup \{\alpha_j\}$; and, if $\alpha_j \in C(M)$, let $C(M) = C(M) \cup \{\alpha_i\}$.
(3) Repeat from step 2 until $C(M)$ no longer changes.
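The closure of Definition 2 can be computed by a simple fixed-point loop. The sketch below is one possible reading, with an assumption worth naming: instead of enumerating the transitive relation, it repeatedly merges the link types of any rule that touches the current set, which reaches the same fixed point because step 2 is applied in both directions. The rule encoding and the type names `writesFor` and `cites` are hypothetical.

```python
def closure(rules, seed_types):
    """Closure of a set of link types under the (symmetric use of the) semantic relative relation."""
    result = set(seed_types)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            group = set(premises) | {conclusion}
            # If the rule touches the closure, all of its link types join it.
            if group & result and not group <= result:
                result |= group
                changed = True
    return result


rules = [
    (("editorOf",), "readerOf"),
    (("authorOf", "publishedIn"), "writesFor"),   # hypothetical rule
]

print(sorted(closure(rules, {"editorOf"})))
print(sorted(closure(rules, {"writesFor"})))
print(sorted(closure(rules, {"cites"})))   # an isolated type is its own closure
```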

It is easy to verify that the closures of the single semantic link types construct a classification of the semantic link types of the SLN schema, and we can decompose an SLN schema based on such a classification. In fact, each disconnected part of the semantic link relative net forms a closure.

For an isolated semantic link type $\alpha$, the closure includes only itself, i.e., $C(\alpha) = \{\alpha\}$. For a schema of semantic link network $S(R, L, rs)$, there may be a cycle of semantic relatives $\alpha_1 \trianglelefteq \alpha_2 \trianglelefteq \alpha_3 \trianglelefteq \cdots \trianglelefteq \alpha_m \trianglelefteq \alpha_1$, with $\alpha_1, \alpha_2, \ldots, \alpha_m \in L$, called a semantic relative cycle. It is easy to verify the following lemma.

Lemma 2. Let $\alpha_1 \trianglelefteq \alpha_2 \trianglelefteq \alpha_3 \trianglelefteq \cdots \trianglelefteq \alpha_m \trianglelefteq \alpha_1$ be a semantic relative cycle, where $\alpha_i \in L$ ($1 \le i \le m$). We have:

1. $C(\alpha_i)$ is identical to $C(\alpha_j)$, for all $1 \le i, j \le m$.
2. $\alpha_i \in C(\alpha_j)$, for any $i, j$, $1 \le i, j \le m$.
3. For any $\alpha \in L$, if $\alpha \trianglelefteq \alpha_i$ then $\alpha \trianglelefteq \alpha_j$, $1 \le i, j \le m$.

All semantic link types on one semantic relative cycle form an equivalence class according to Lemma 2. A cycle $\alpha_1 \trianglelefteq \alpha_2 \trianglelefteq \alpha_3 \trianglelefteq \cdots \trianglelefteq \alpha_m \trianglelefteq \alpha_1$ in the semantic link type relative net can be regarded as a unit, denoted as $U(\alpha_i)$, where $\alpha_i$ is any semantic link type in the cycle. Therefore, a semantic relative cycle is shrunk into a node (all arrows entering or leaving the cycle are attached to that node). We can find all cycles in the net by using the classic algorithms that find the cycles in a directed graph. In the following discussion, we do not mention cycles, for they are regarded as single semantic link types.

The following definition normalizes the SLN schema.

Definition 3. An SLN schema $S(R, L, rs)$ is in Rule-Constraint Normal Form 1 (RC-NF1) if, for any semantic link type $\alpha \in L$, $C(\alpha) = L$ holds.

Lemma 3. For an SLN schema $S(R, L, rs)$ in RC-NF1 and $\alpha_1, \alpha_2 \in L$, $C(\alpha_1) = C(\alpha_2)$.

Any SLN schema $S(R, L, rs)$ can be decomposed into several sub-schemas satisfying RC-NF1 by the following algorithm.

Algorithm 1. Let $S(R, L, rs)$ be an SLN schema.

(1) Compute the classification of the semantic link types in $S$ according to the definition of the closure of semantic relative and the rule set $rs$, denoted as $\{L_1, L_2, \ldots, L_k\}$.

(2) For each $L_i$, $1 \le i \le k$, we can get a resource type set $R_i$, each element of which is attached to the semantic link types in $L_i$, and a rule set $rs_i$ in which each rule involves only semantic link types from $L_i$. Clearly, $R_i \subseteq R$, $L_i \subseteq L$, and $rs_i \subseteq rs$. Thus, $S_i(R_i, L_i, rs_i)$ is a sub-schema of $S(R, L, rs)$.

(3) $S(R, L, rs)$ is decomposed into $k$ sub-schemas $S_i(R_i, L_i, rs_i)$, where $1 \le i \le k$.
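Algorithm 1 can be sketched as a partition of the link types into closure classes, each carrying its own resource types and rules. The data layout is an assumption made for illustration: `link_types` maps a link type name to its `(start, end)` resource types, rules are `(premises, conclusion)` pairs that mention only declared link types, and `writesFor` with its typing is hypothetical.

```python
def closure(rules, seed_types):
    """Closure of Definition 2, computed by merging the link types of touching rules."""
    result = set(seed_types)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            group = set(premises) | {conclusion}
            if group & result and not group <= result:
                result |= group
                changed = True
    return result


def decompose_rc_nf1(link_types, rules):
    """Return a list of sub-schemas (R_i, L_i, rs_i), one per closure class."""
    remaining = set(link_types)
    parts = []
    while remaining:
        # Step (1): take the class of any remaining link type (min() keeps it deterministic).
        block = closure(rules, {min(remaining)})
        # Step (2): resource types attached to the class, and rules confined to it.
        res_types = {t for name in block for t in link_types[name]}
        block_rules = [r for r in rules if set(r[0]) | {r[1]} <= block]
        parts.append((res_types, block, block_rules))
        remaining -= block
    return parts


link_types = {
    "editorOf": ("researcher", "paper"),
    "readerOf": ("researcher", "paper"),
    "authorOf": ("researcher", "paper"),
    "publishedIn": ("paper", "journal"),       # assumed typing
    "writesFor": ("researcher", "journal"),    # hypothetical
}
rules = [
    (("editorOf",), "readerOf"),
    (("authorOf", "publishedIn"), "writesFor"),
]

for res_types, block, block_rules in decompose_rc_nf1(link_types, rules):
    print(sorted(block), block_rules)
```

In this example every rule ends up in exactly one sub-schema, which is the situation in which no reference relation between sub-schemas is required.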

Theorem 2. The decomposition by Algorithm 1 is loss-less.

Proof. From the algorithm and the definition of closure, for each rule $r: \alpha \cdot \beta \Rightarrow \gamma$ in $rs$, $\alpha$, $\beta$, $\gamma$ are semantic relatives. There is some $i$ such that $\alpha, \beta, \gamma \in L_i$, since $\{L_1, L_2, \ldots, L_k\}$ is a classification of $L$. So, $rs = rs_1 \cup rs_2 \cup \cdots \cup rs_k$. For any rule $r: \alpha \cdot \beta \Rightarrow \gamma$, there is only one $i$ such that $r \in rs_i$ and $\alpha, \beta, \gamma \in L_i$, because the decomposition forms a classification. So, we do not need any reference relation among schemas. According to Theorem 1, the decomposition is loss-less. □

The classification of the semantic link types of a schema can be easily found from the semantic relative net. Different unconnected parts determine different sub-schemas. For an RC-NF1 SLN schema, the semantic relative net is a connected graph. That means reasoning on such a semantic link network is closed. However, this does not mean that any two semantic link types are semantically related.

Let $S(R, L, rs)$ be an SLN schema. $\alpha \in L$ is called a top semantic link type if there is no semantic link type $\alpha'$ ($\ne \alpha$) such that $\alpha \vartriangleleft \alpha'$. $\alpha$ is called a bottom semantic link type if there is no semantic link type $\alpha'$ ($\ne \alpha$) such that $\alpha' \vartriangleleft \alpha$. Two top semantic link types or two bottom ones are not semantic relative. $\alpha$ is called an isolated semantic link type if there is no semantic link type $\alpha'$ ($\ne \alpha$) such that $\alpha \vartriangleleft \alpha'$ or $\alpha' \vartriangleleft \alpha$. An isolated semantic link type is both top and bottom. The following algorithm finds all bottom semantic link types for an SLN schema.

Algorithm 2. For an SLN schema $S(R, L, rs)$, find the set $B$ of all bottom semantic link types in $S$.

(1) Let $B = \{\alpha \mid \alpha_i \cdot \alpha_j \Rightarrow \alpha \in rs\}$, where $\alpha_i, \alpha_j \in L$;
(2) Loop for each $\alpha \in B$: check each rule $r$ in $rs$; if $\alpha$ occurs in the precondition of $r$ and does not occur in the result of $r$, remove $\alpha$ from $B$; otherwise check the next rule in $rs$.
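Algorithm 2 translates almost line by line into code. The following is a sketch under stated assumptions: rules are `(premises, conclusion)` pairs, and the initial set also collects the conclusions of single-premise rules, which step (1) does not list explicitly. The link types `writesFor` and `follows` are hypothetical.

```python
def bottom_link_types(rules):
    """Conclusions of rules that never serve as a premise for a different link type."""
    # Step (1): start from every link type that some rule concludes.
    # Assumption: conclusions of single-premise rules are included as well.
    bottoms = {conclusion for _, conclusion in rules}
    # Step (2): drop a candidate once it appears as a premise of a rule
    # whose result is a different link type.
    for alpha in set(bottoms):
        for premises, conclusion in rules:
            if alpha in premises and alpha != conclusion:
                bottoms.discard(alpha)
                break
    return bottoms


rules = [
    (("editorOf",), "readerOf"),
    (("authorOf", "publishedIn"), "writesFor"),   # hypothetical rule
    (("readerOf", "publishedIn"), "follows"),     # hypothetical rule
]

print(sorted(bottom_link_types(rules)))  # ['follows', 'writesFor']
```

`readerOf` is concluded by the first rule but is removed because the third rule uses it as a premise.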

The bottom semantic link types cannot affect other link types in reasoning. We can compute all semantic link types that affect a certain semantic link type.

Definition 4. Let $S(R, L, rs)$ be an SLN schema, and $\alpha \in L$. The up-closure of $\alpha$ with respect to the rule set $rs$, denoted as $C_{up}(\alpha)$, is a set of semantic link types derived from the following steps.

(1) Let $C_{up}(\alpha) = \{\alpha\}$;
(2) Check $L$: for each semantic relative $\alpha_i \vartriangleleft \alpha_j$, if $\alpha_i \in C_{up}(\alpha)$, let $C_{up}(\alpha) = C_{up}(\alpha) \cup \{\alpha_j\}$;
(3) Repeat from step 2 until $C_{up}(\alpha)$ does not change.
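The up-closure of Definition 4 collects every link type that can affect a given one, which amounts to walking the rules backwards from conclusion to premises. A minimal sketch, assuming the same hypothetical `(premises, conclusion)` rule encoding and the hypothetical link types `writesFor` and `follows`:

```python
def up_closure(rules, alpha):
    """All link types that alpha depends on through the rules, including alpha itself."""
    result = {alpha}
    frontier = [alpha]
    while frontier:
        current = frontier.pop()
        for premises, conclusion in rules:
            if conclusion == current:
                # Every premise of a rule concluding `current` affects it.
                for premise in premises:
                    if premise not in result:
                        result.add(premise)
                        frontier.append(premise)
    return result


rules = [
    (("editorOf",), "readerOf"),
    (("authorOf", "publishedIn"), "writesFor"),   # hypothetical rule
    (("readerOf", "publishedIn"), "follows"),     # hypothetical rule
]

print(sorted(up_closure(rules, "follows")))
print(sorted(up_closure(rules, "editorOf")))
```

The walk follows direct relatives only, but repeating it until the frontier is empty yields the same set as iterating over the transitive relation in step 2.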

Lemma 4. Let $\alpha_1 \trianglelefteq \alpha_2 \trianglelefteq \alpha_3 \trianglelefteq \cdots \trianglelefteq \alpha_m \trianglelefteq \alpha_1$ be a semantic relative cycle, then

(1) $C_{up}(\alpha_i)$ is identical to $C_{up}(\alpha_j)$ for all $1 \le i, j \le m$.
(2) $\alpha_j \in C_{up}(\alpha_i)$, for any $i$ and $j$, $1 \le i, j \le m$.

Definition 5. An SLN schema $S(R, L, rs)$ is in Rule-Constraint Normal Form 2 (RC-NF2) if it satisfies (1) $S(R, L, rs)$ is RC-NF1; and, (2) $\beta \vartriangleleft \alpha$ for bottom semantic link type $\beta$ in $L$ and any other link type $\alpha$.

Obviously, for an RC-NF2 schema, the up-closure of the bottom semantic link type $\beta$ according to the rule set $rs$ is just $L$, i.e., $L = C_{up}(\beta)$.

Lemma 5. An RC-NF2 SLN schema $S(R, L, rs)$ has a unique bottom semantic link type.

Proof. Assume that there are at least two different bottom semantic link types $\alpha_{b_1} \ne \alpha_{b_2}$. According to Definition 4, $C_{up}(\alpha_{b_1}) = C_{up}(\alpha_{b_2})$. So, we have $\alpha_{b_1} \in C_{up}(\alpha_{b_2})$, $\alpha_{b_2} \trianglelefteq \alpha_{b_1}$. Similarly, $\alpha_{b_1} \trianglelefteq \alpha_{b_2}$. It leads to a contradiction, so the lemma holds. □

Lemma 6. For an SLN schema $S(R, L, rs)$, let $S_1(R_1, L_1, rs_1)$ and $S_2(R_2, L_2, rs_2)$ be two RC-NF2 sub-schemas. If $\alpha \in S_1 \cap S_2$, $C^{(1)}_{up}(\alpha)$ in $S_1$ is identical to $C^{(2)}_{up}(\alpha)$ in $S_2$.

Proof. For a semantic link type $\alpha_0 \in C^{(1)}_{up}(\alpha)$, $\alpha \trianglelefteq \alpha_0$. Let $\alpha_{b_1}$ be the bottom semantic link type for $S_1$ and $\alpha_{b_2}$ for $S_2$ according to Lemma 5. Obviously, $\alpha_{b_1} \trianglelefteq \alpha$ and $\alpha_{b_2} \trianglelefteq \alpha$. Thus $\alpha_{b_2} \trianglelefteq \alpha_0$, which means that $\alpha_0 \in C^{(2)}_{up}(\alpha)$. □

Lemma 7. $S_1(R_1, L_1, rs_1)$ and $S_2(R_2, L_2, rs_2)$ are two RC-NF2 sub-schemas of $S(R, L, rs)$ with bottom semantic link types $\alpha_{b_1}$ and $\alpha_{b_2}$ respectively. Let $E = L_1 \cap L_2$. If $E \ne \phi$, then there does not exist any semantic relative $\alpha \trianglelefteq \beta$ satisfying $\alpha \in E$ and $\beta \notin E$; and, for each semantic link type $\alpha \in E$, we have $\alpha_{b_1} \vartriangleleft \alpha$ and $\alpha_{b_2} \vartriangleleft \alpha$.

Proof. Consider $\alpha \in E$ and a semantic relative $\alpha \trianglelefteq \beta$. Since $E = L_1 \cap L_2$, $\alpha \in L_1$. Since $L_1$ is the closure of a bottom link type according to the semantic relatives, $\beta \in L_1$. Similarly, we can get $\beta \in L_2$. So $\beta \in E$. The second part of the lemma is trivial from the definition. □

Reasoning is closed in an RC-NF2 SLN schema. Different semantic link networks based on different RC-NF2 sub-schemas of the same original schema are reasoning closed and independent. Moreover, the reasoning service is more efficient and easier to execute in a sub-schema than in the original one.

For an application, the whole schema may include several sub-schemas in RC-NF2. We can decompose an RC-NF1 SLN schema into several RC-NF2 sub-schemas according to the following algorithm.

Algorithm 3. Let $S(R, L, rs)$ be an RC-NF1 SLN schema.

(1) Find all bottom semantic link types of $S$, denoted as $\alpha_{b_1}, \alpha_{b_2}, \ldots, \alpha_{b_m}$.

(2) For each $\alpha_{b_i}$ ($1 \le i \le m$), compute its up-closure according to the reasoning rule set $rs$, denoted as $C_{up}(\alpha_{b_i})$, and let $L_{b_i} = C_{up}(\alpha_{b_i})$. Then, we can get a resource type set $R_{b_i}$, in which each element is involved in at least one link type in $L_{b_i}$, and a rule set $rs_{b_i}$ in which each rule involves only link types from $L_{b_i}$. Clearly, $R_{b_i} \subseteq R$, $L_{b_i} \subseteq L$, and $rs_{b_i} \subseteq rs$. Thus, $S_{b_i}(R_{b_i}, L_{b_i}, rs_{b_i})$ is a sub-schema of $S(R, L, rs)$.

(3) $S(R, L, rs)$ can be decomposed into $m$ sub-schemas $S_{b_i}(R_{b_i}, L_{b_i}, rs_{b_i})$, where $1 \le i \le m$.

Fig. 4. Semantic link types relative nets for RC-NF2 schemas of sub-schema $S_{1.1}$, $S_{1.2}$, and $S_{1.3}$.
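Algorithm 3 combines the bottom link types with their up-closures, which the following sketch puts together. As before, the encoding is an assumption for illustration: `link_types` maps a name to its `(start, end)` resource types, rules are `(premises, conclusion)` pairs, and `writesFor` and `follows` are hypothetical link types.

```python
def bottom_link_types(rules):
    """Conclusions that never act as a premise for a different link type."""
    bottoms = {conclusion for _, conclusion in rules}
    for alpha in set(bottoms):
        if any(alpha in premises and alpha != conclusion
               for premises, conclusion in rules):
            bottoms.discard(alpha)
    return bottoms


def up_closure(rules, alpha):
    """All link types that alpha depends on through the rules, including alpha."""
    result, frontier = {alpha}, [alpha]
    while frontier:
        current = frontier.pop()
        for premises, conclusion in rules:
            if conclusion == current:
                for premise in premises:
                    if premise not in result:
                        result.add(premise)
                        frontier.append(premise)
    return result


def decompose_rc_nf2(link_types, rules):
    """Map each bottom link type to its sub-schema (R_b, L_b, rs_b)."""
    parts = {}
    for bottom in sorted(bottom_link_types(rules)):
        block = up_closure(rules, bottom)
        res_types = {t for name in block for t in link_types[name]}
        block_rules = [r for r in rules if set(r[0]) | {r[1]} <= block]
        parts[bottom] = (res_types, block, block_rules)
    return parts


link_types = {
    "editorOf": ("researcher", "paper"),
    "readerOf": ("researcher", "paper"),
    "authorOf": ("researcher", "paper"),
    "publishedIn": ("paper", "journal"),       # assumed typing
    "writesFor": ("researcher", "journal"),    # hypothetical
    "follows": ("researcher", "journal"),      # hypothetical
}
rules = [
    (("editorOf",), "readerOf"),
    (("authorOf", "publishedIn"), "writesFor"),
    (("readerOf", "publishedIn"), "follows"),
]

for bottom, (_, block, block_rules) in decompose_rc_nf2(link_types, rules).items():
    print(bottom, sorted(block), len(block_rules))
```

In this example the two sub-schemas both contain `publishedIn`, so the link type sets overlap while each rule still belongs to exactly one sub-schema.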

Theorem 3. The decomposition by Algorithm 3 is loss-less.

Proof. For a rule r: α · β ⇒ γ in rs, suppose γ ∈ Lbi for some i. Since γ ◁ β and γ ◁ α, we have α, β ∈ Lbi and r ∈ rsbi. Then, rs = rsb1 ∪ rsb2 ∪ · · · ∪ rsbm. Definition 5 and Algorithm 3 show that the sub-schema Sbi(Rbi, Lbi, rsbi) is based on the bottom link type αbi. For any rule r: α · β ⇒ γ in rsbi, if γ ∈ Lbj for some j, then α, β ∈ Lbj and r ∈ rsbj. There is no reference relation needed among sub-schemas. According to Theorem 1, the decomposition from Algorithm 3 is loss-less. □

Definition 6. Let S(R, L, rs) be an SLN schema, and α ∈ L be a semantic link type. The down closure of α with respect to the rule set rs, denoted as Cdown(α), is a set of semantic link types derived from the following steps.

(1) Let Cdown(α) = {α};

(2) Check L: for each semantic relative αj ◁ αi, if αi ∈ Cdown(α), let Cdown(α) = Cdown(α) ∪ {αj};

(3) Repeat from step 2 until Cdown(α) does not change.
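The three steps translate directly into a fixed-point loop. The sketch below is illustrative only; it assumes semantic relatives are given as hypothetical pairs (lower, upper), each meaning that the lower type is derived using the upper type. The example relatives are the ones induced by rules 1, 5, and 13 of Table 2, ignoring every other rule.

```python
def down_closure(alpha, relatives):
    """Down closure of alpha, following the three steps of Definition 6."""
    closure = {alpha}                      # step (1)
    changed = True
    while changed:                         # step (3): repeat until stable
        changed = False
        for lower, upper in relatives:     # step (2): scan every relative
            if upper in closure and lower not in closure:
                closure.add(lower)
                changed = True
    return closure


# Relatives induced by rules 1, 5 and 13 only (a deliberately small fragment).
fragment = {
    ("readerOf", "editorOf"),       # rule 1: editorOf => readerOf
    ("readerOf", "publishedIn"),    # rule 5: readerOf . publishedIn => readerOf
    ("editorOf", "publishedIn"),    # rule 13: editorOf . publishedIn => editorOf
}
affected_by_published_in = down_closure("publishedIn", fragment)
affected_by_reader_of = down_closure("readerOf", fragment)
```

Within this fragment, publishedIn affects editorOf and readerOf, while readerOf affects only itself.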

Intuitively, the down closure of a link type α is the set of all link types which are affected by α. We can easily get the following properties.

Lemma 8. For a top link type αt in the SLN schema S(R, L, rs), the down closure Cdown(αt) constructs a tree with root αt in the semantic link type relative net.

Lemma 9. For two top link types α1 and α2 in the SLN schema S(R, L, rs), if Cdown(α1) ∩ Cdown(α2) ≠ φ, then α1 and α2 are in the same RC-NF2 sub-schema from Algorithm 3.

In some cases, two different RC-NF2 schemas may share a common segment. Such a common segment leads to redundancy. In Fig. 4, for example, S1.1 and S1.2 share a common segment that includes two semantic link types: publishedIn and editorOf. We can deal with such redundancy by decomposing the schemas into advanced forms and building reference relations between schemas.

Definition 7. For an SLN schema S(R, L, rs), let S1(R1, L1, rs1), S2(R2, L2, rs2), . . ., and Sk(Rk, Lk, rsk) be an RC-NF2 level decomposition. We call such a decomposition redundancy-free if Si and Sj do not share any semantic relative for all 1 ≤ i, j ≤ k with i ≠ j.
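A check for Definition 7 can be sketched as a pairwise comparison of the semantic relatives induced by each sub-schema's rule set. The encoding is an assumption made for illustration: rules are hypothetical (body, head) pairs, and reflexive relatives such as readerOf with itself are ignored. The first rule set below is only a fragment of rs1.1 (rules 6, 13, and 16 from Table 2), chosen to keep the example short.

```python
from itertools import combinations


def relatives_of(rules):
    """Non-reflexive semantic relatives (lower, upper) induced by a rule set."""
    return {(head, b) for body, head in rules for b in body if b != head}


def shared_relatives(rules_i, rules_j):
    """Semantic relatives that two sub-schemas have in common."""
    return relatives_of(rules_i) & relatives_of(rules_j)


def is_redundancy_free(rule_sets):
    """True when no two sub-schemas share a semantic relative."""
    return all(not shared_relatives(a, b) for a, b in combinations(rule_sets, 2))


rs_11_fragment = [
    (("authorOf", "publishedIn"), "authorOf"),   # rule 6
    (("editorOf", "publishedIn"), "editorOf"),   # rule 13
    (("authorOf", "belongTo"), "engageIn"),      # rule 16
]
rs_12 = [
    (("editorOf",), "readerOf"),                 # rule 1
    (("readerOf", "publishedIn"), "readerOf"),   # rule 5
    (("editorOf", "publishedIn"), "editorOf"),   # rule 13
]
rs_13 = [(("chairOf",), "attendantOf")]          # rule 3

common = shared_relatives(rs_11_fragment, rs_12)
```

Here the only shared relative is the one between editorOf and publishedIn, which is the redundancy between S1.1 and S1.2 discussed for Fig. 4.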

Algorithm 4. For an SLN schema S(R, L, rs), let S1(R1, L1, rs1) and S2(R2, L2, rs2) be two sub-schemas of the RC-NF2 level decomposition from Algorithm 3. If S1 and S2 share a set of semantic relatives SR, let I = L1 ∩ L2.

(1) A new schema SI(RI, I, rsI) can be constructed according to I.

(2) Decompose the schema into several sub-schemas according to Algorithm 2: SI1, SI2, . . . , and SIm.

(3) Denote βIk as the bottom link type in SIk. Build the reference relations Ref(βIk ∈ S1, βIk ∈ SIk) and Ref(βIk ∈ S2, βIk ∈ SIk) for each k (1 ≤ k ≤ m).

We can use Algorithm 4 repeatedly to deal with redundancy in the sub-schemas produced by Algorithm 3, and the sub-schemas from Algorithm 4 are all in RC-NF2. However, in many cases, such a decomposition may produce some small patch sub-schemas. In fact, we can deal with the redundancy by building reference relations directly between the schemas rather than creating new sub-schemas. The following algorithm is an amendment of Algorithm 3. It provides an approach to decompose an RC-NF1 schema into several redundancy-free RC-NF2 sub-schemas.

Algorithm 5. Let S(R, L, rs) be a RC-NF1 SLN schema.

(1) Let reflist = φ.

(2) Find all bottom semantic link types of S, denoted as αb1, αb2, . . . , and αbm.


(3) For each αbi (1 ≤ i ≤ m), compute its up-closure according to the reasoning rule set rs, denoted as Cup(αbi).

(4) For each pair (αbi, αbj), i ≠ j, let Iij = Cup(αbi) ∩ Cup(αbj), and find all bottom link types of Iij, denoted as β1, β2, . . ., and βk. Let Lbi = Cup(αbi) and Lbj = (Cup(αbj) − Iij) ∪ {β1, β2, . . . , βk}. Let reflist = reflist ∪ {(β1, Lbj, Lbi), (β2, Lbj, Lbi), . . . , (βk, Lbj, Lbi)}.

(5) We get the sub-schema Sbi(Rbi, Lbi, rsbi) according to Lbi, and build reference relations Ref(βt ∈ Sbj, βt ∈ Sbi) for each (βt, Lbj, Lbi) ∈ reflist. Clearly, Rbi ⊆ R, Lbi ⊆ L, and rsbi ⊆ rs. Sbi(Rbi, Lbi, rsbi) is a sub-schema of S(R, L, rs).

Thus, S(R, L, rs) is decomposed into m sub-schemas Sbi(Rbi, Lbi, rsbi), where 1 ≤ i ≤ m.

Deduction 1. The decomposition by Algorithm 5 is loss-less.

5. Schema maintenance and reasoning

5.1. Schema maintenance

Updating an SLN schema concerns resource types, semantic link types, and reasoning rules. The essence of the semantic rule-constraint normal form is to classify the semantic link type set into different parts according to the rule set. So only updating the rule set can lead to different sub-schemas. The resource type set and semantic link type set of the sub-schemas will also change. Therefore, we can study the updating issues according to the variation of the rule set.

The following algorithm modifies the decomposition after adding rules to schemas.

Algorithm 6 (Schema Revision after Extension). Let {S1(R1, L1, rs1), S2(R2, L2, rs2), . . . , Sn(Rn, Ln, rsn)} be the RC-NF2 schemas decomposed from SLN schema S(R, L, rs). A new semantic link type set {α′1, α′2, . . . , α′l} and a new rule set {r′1, r′2, . . . , r′k} are appended to the schema S. The extension of S is denoted as SE(RE, LE, rsE).

(1) Take all new semantic link types as isolated semantic link types first, and then construct new sub-schemas S′1(R′1, L′1, φ), S′2(R′2, L′2, φ), . . . , S′l(R′l, L′l, φ). Let SE = {S1, S2, . . . , Sn, S′1, S′2, . . . , S′l}.

(2) For a rule αi1 · αi2 ⇒ αj in rsE − rs, we get semantic relatives αj ⊴ αi1 and αj ⊴ αi2. For a rule αi ⇒ αj in rsE − rs, we get a semantic relative αj ⊴ αi.

(3) For each semantic relative αj ⊴ αi retrieved in step 2, the list of sub-schemas in which αi is involved is Si1, Si2, . . . , Sis, and the list of sub-schemas in which αj is involved is Sj1, Sj2, . . . , Sjt.

    (1) Compute the up-closure Cup(αi) for αi in Si1, and let Ljp = Ljp ∪ Cup(αi) for all 1 ≤ p ≤ t;

    (2) Union all resource types involved in Cup(αi) into Rjp, and union all rules involved in Cup(αi) into rsjp; and,

    (3) Check if Sjp ⊆ Sjq or not for each pair (p, q), 1 ≤ p ≤ s, 1 ≤ q ≤ t. If yes, remove Sjp from the list of SE.

The following algorithm is for schema revision after deleting some rules.

Algorithm 7. Let {S1(R1, L1, rs1), S2(R2, L2, rs2), . . . , Sn(Rn, Ln, rsn)} be the RC-NF2 schemas decomposed from SLN schema S(R, L, rs). Denote the deleted resource type set as RD, the deleted semantic link type set as LD, and the deleted rule set as rsD.

(1) For each semantic sub-schema Si, let Ri = Ri − RD, Li = Li − LD, and rsi = rsi − rsD. Apply Algorithm 1 or 3 to Si to get a decomposition of Si in the corresponding rule-constraint normal form. Assume that the decomposition is {Si1(Ri1, Li1, rsi1), Si2(Ri2, Li2, rsi2), . . . , Sik(Rik, Lik, rsik)}.

(2) Unite all sub-schemas retrieved in step 1, and denote the new set of sub-schemas as {S′1(R′1, L′1, rs′1), S′2(R′2, L′2, rs′2), . . . , S′m(R′m, L′m, rs′m)}.

(3) Check if S′i ⊆ S′j or not for each pair (i, j), 1 ≤ i, j ≤ m, i ≠ j. If yes, remove S′i.

5.2. Reasoning algorithms

Several kinds of reasoning can be carried out on a semantic link network. The basic reasoning is to obtain the potential semantic relationships between resources. The following algorithm is for deriving potential semantic links.

Algorithm 8. Let R, R′, and R′′ be resources of a semantic link network instance S of schema S(R, L, rs), the sets of semantic links from R to R′ and from R′ to R′′ be A = {α1, α2, . . . , αs} and B = {β1, β2, . . . , βt} respectively, the set of known semantic links from R to R′′ be C0, and the types of R, R′, and R′′ be rt1, rt2, and rt3 respectively.

(1) Let C = φ.

(2) For each pair (αi, βj) in A × B, check rs to determine if there is a rule of the form αi · βj ⇒ γk. If yes, let C = C ∪ {γk}; else skip to the next pair.

(3) Let C = C ∩ [rt1, rt3].

(4) C = C ∪ C0 is the solution.
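The four steps can be sketched as a short Python function. This is an illustrative sketch under assumptions: rules are hypothetical pairs ((α, β), γ), and the parameter `allowed` stands for the set of link types permitted between the resource types of R and R′′. The example uses rules 5, 6, and 13 of Table 2 with a professor, a paper, and a journal.

```python
def derive_links(A, B, rules, allowed, known=()):
    """Derive link types from R to R'' given links R -> R' (A) and R' -> R'' (B)."""
    derived = set()                              # step (1)
    for alpha in A:                              # step (2): scan A x B
        for beta in B:
            for body, gamma in rules:
                if body == (alpha, beta):
                    derived.add(gamma)
    derived &= set(allowed)                      # step (3): keep permitted types
    return derived | set(known)                  # step (4): add known links C0


rules = [
    (("readerOf", "publishedIn"), "readerOf"),   # rule 5
    (("authorOf", "publishedIn"), "authorOf"),   # rule 6
    (("editorOf", "publishedIn"), "editorOf"),   # rule 13
]
# Link types allowed between Professor and Journal according to Table 1.
professor_journal = {"readerOf", "authorOf", "editorOf"}

links_to_journal = derive_links(
    A={"authorOf", "editorOf"},   # professor -> paper
    B={"publishedIn"},            # paper -> journal
    rules=rules,
    allowed=professor_journal,
    known={"readerOf"},           # already known professor -> journal link
)
```

The nested loops visit s × t pairs and scan a fixed rule set, which matches the constant-cost argument given for a single composition step.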

The size of A × B is a constant s × t for A ⊆ [rt1, rt2] and B ⊆ [rt2, rt3]. Meanwhile, the size of the rule set rs is a constant. So, the complexity of the above algorithm is O(1). For a connected path R0 −α1→ R1 −α2→ · · · −αn→ Rn with length n, the semantic relations between R0 and Rn are computed by recursively using Algorithm 8, and the complexity is O(n).

Moreover, we can compute the semantic relations between any two resources. The immediate idea is to find all connected paths between them, compute the semantic relations of each connected path, and then combine all semantic relations between them. For a semantic link network with m resources and n semantic links, under the assumption of equal distribution, there are $d = n/(m(m-1))$ links between two resources on average. Therefore, from R0 to Rn, the number of paths with length k is $P_{m-2}^{k-1} d^{k}$. Thus, the complexity of computing the semantic relations between any two resources is

$O\left(\sum_{k=1}^{m-1} k P_{m-2}^{k-1} d^{k}\right) = O\left(\sum_{k=1}^{m-1} \frac{k (m-2)!\, n^{k}}{(k-1)!\, m^{k} (m-1)^{k}}\right)$.

We can see that the complexity depends on the two variables m and n, which could be large. However, it can be reduced by reducing the scale of the semantic link network. Decomposing a semantic link network into RC-NF2 forms can reduce the two variables m and n prominently. And the reasoning in RC-NF2 semantic link networks is closed and independent.

Moreover, we can find some better qualified methods. Indeed, the up-closure can help develop an efficient algorithm to determine semantic links between two resources. The following algorithm is for determining the existence of the semantic relation R −α→ R′ in a semantic link network.

Algorithm 9. Let rs be the reasoning rule set of semantic link network S and the types of R and R′ be rt and rt′ respectively.

(1) Determine if α ∈ [rt, rt′]. If no, return false; otherwise, continue to the next step.


(2) Compute the up-closure of α with respect to rs, denoted as Cup(α).

(3) Compute the projection of S with respect to Cup(α), denoted as S0, which may consist of several unconnected segments or some isolated resources, where each segment is interconnected.

(4) If R and R′ are in two different segments, return false.

(5) If R and R′ are in the same segment, determine whether R −α→ R′ holds in that segment by using Algorithm 8, and return the result.
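The pruning idea of Algorithm 9 can be sketched as follows. Several parts are assumptions made for illustration: links are hypothetical triples (source, link type, target), rules are (body, head) pairs, `allowed` stands for the link types permitted between the types of R and R′, and step (5) is replaced by a naive forward-chaining saturation of the segment, which is a stand-in for repeated use of Algorithm 8 rather than the authors' procedure.

```python
from collections import defaultdict


def up_closure(alpha, rules):
    """Link types that can contribute to deriving alpha (assumed definition)."""
    closure, stack = {alpha}, [alpha]
    while stack:
        cur = stack.pop()
        for body, head in rules:
            if head == cur:
                for upper in body:
                    if upper not in closure:
                        closure.add(upper)
                        stack.append(upper)
    return closure


def segment_of(start, links):
    """Resources connected to start, ignoring link direction."""
    adj = defaultdict(set)
    for src, _, dst in links:
        adj[src].add(dst)
        adj[dst].add(src)
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def saturate(links, rules):
    """Apply the rules until no new link appears (naive forward chaining)."""
    links = set(links)
    while True:
        new = set()
        for body, head in rules:
            if len(body) == 1:
                new |= {(s, head, d) for s, t, d in links if t == body[0]}
            else:
                first, second = body
                for s, t, mid in links:
                    if t == first:
                        new |= {(s, head, d) for m2, t2, d in links
                                if m2 == mid and t2 == second}
        if new <= links:
            return links
        links |= new


def exists_link(r, alpha, r2, links, rules, allowed):
    if alpha not in allowed:                               # step (1)
        return False
    cup = up_closure(alpha, rules)                         # step (2)
    projected = {l for l in links if l[1] in cup}          # step (3)
    nodes = segment_of(r, projected)
    if r2 not in nodes:                                    # step (4)
        return False
    local_links = {l for l in projected if l[0] in nodes}  # step (5), stand-in
    local_rules = [rule for rule in rules if rule[1] in cup]
    return (r, alpha, r2) in saturate(local_links, local_rules)


links = {
    ("prof", "authorOf", "paper"),
    ("paper", "publishedIn", "journal"),
    ("journal", "belongTo", "field"),
    ("s1", "friend", "s2"),
}
rules = [
    (("authorOf", "publishedIn"), "authorOf"),   # rule 6
    (("publishedIn", "belongTo"), "belongTo"),   # rule 12
    (("authorOf", "belongTo"), "engageIn"),      # rule 16
]
prof_engages = exists_link("prof", "engageIn", "field", links, rules, {"engageIn"})
```

The friend link is dropped by the projection because friend is not in the up-closure of engageIn, so unrelated parts of the network never enter the reasoning step.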

This algorithm is efficient because the up-closure for a semantic link type may be much smaller than the original link type set, and the projection may be sparse in terms of semantic links. The following algorithm computes the semantic link set Γ = {α | R −α→ R′}.

Algorithm 10. Let S be a semantic link network, and the types of R and R′ be rt and rt′ respectively; then any potential semantic link type between R and R′ is in [rt, rt′].

(1) Let Γ = φ.

(2) For each semantic link type α ∈ [rt, rt′], execute Algorithm 8 to determine if R −α→ R′ or not. If yes, let Γ = Γ ∪ {α}; else skip to the next semantic link type.

(3) Return Γ.

For a given resource R, the following algorithm computes the resource set A = {Ri | R −α→ Ri}, that is, all resources which have an α relation with R in a semantic link network.

Algorithm 11. Let S be a semantic link network, and the type of R be rt.

(1) Compute the up-closure of α with respect to rs, denoted as Cup(α);

(2) Compute the projection of S with respect to Cup(α), denoted as S0;

(3) Denote T = {rti | α ∈ [rt, rti]} and A = {R′ | R′ ∈ S0, and the type of R′ is in T};

(4) For each resource R′ ∈ A, apply Algorithm 8 to determine if R −α→ R′ or not. If no, let A = A − {R′}; else skip to the next resource in A;

(5) Return A.

The algorithm for computing the set of resource pairs {(R, R′) | R −α→ R′} for a given semantic link type α is similar to Algorithm 11. However, the complexity would be much higher because both the start and the end resources need to be computed.

6. Case study

6.1. Schema example of science network

We first construct a schema example of a science network consisting of three components, and then discuss the decomposition of the schema:

(1) a set of resource types {University, Department, Professor, Student, Course, Book, Project, Journal, Paper, Conference, Proceeding, Field, Publisher};

(2) semantic link types between these resource types as shown in Table 1; and,

(3) reasoning rules for these semantic link types as shown in Table 2.

6.2. Decompose the schema into RC-NF1

We construct the semantic link type relative net, which consistsof six parts as shown in Fig. 3. Therefore, we get the following sixRC-NF1 sub-schemas according to Algorithm 1.

Sub-schema S1:
R1 = {Professor, Student, Course, Book, Project, Journal, Paper, Conference, Proceeding, Field},
L1 = {acknowledgeIn, textbook, sameField, publishedIn, subFieldOf, belongTo, authorOf, editorOf, leaderOf, readerOf, takePartIn, chairOf, engageIn, attendantOf, productOf}, and
rs1 = {rule 1–3, rule 5–6, rule 8, rule 12–24, rule 32}.

Sub-schema S2:
R2 = {Professor, Student, Department, University, Course},
L2 = {classmateOf, partOf, teach, studyIn, setup, supervisorOf, colleague, headOf, facultyOf}, and
rs2 = {rule 4, rule 7, rule 9–11, rule 25–27, rule 31, rule 33}.

Sub-schema S3:
R3 = {Journal, Proceeding, Book, Publisher},
L3 = {publishedBy, samePublisher}, and
rs3 = {rule 28, rule 29}.

Sub-schema S4:
R4 = {Course},
L4 = {priorTo}, and
rs4 = {rule 30}.

Sub-schema S5:
R5 = {Professor, Student},
L5 = {friend}, and
rs5 = φ.

Sub-schema S6:
R6 = {Paper, Book, Proceeding, Journal},
L6 = {referenceOf}, and
rs6 = φ.

6.3. Decompose the schema into RC-NF2

Sub-schemas S4, S5 and S6 satisfy RC-NF2 because each has only one semantic link type. Sub-schema S3 is also in RC-NF2, for it has only one bottom link type in the set of semantic link types from Fig. 3. However, sub-schemas S1 and S2 are not in RC-NF2, so they need to be decomposed into RC-NF2 according to Algorithm 3.

We first find the following three bottom semantic link types in sub-schema S1: engageIn, readerOf and attendantOf, and then compute their up-closures, shown in Fig. 4.
L1.1 = Cup(engageIn) = {textbook, sameField, acknowledgeIn, productOf, publishedIn, subFieldOf, belongTo, authorOf, editorOf, leaderOf, takePartIn, chairOf, engageIn},
L1.2 = Cup(readerOf) = {readerOf, editorOf, publishedIn}, and
L1.3 = Cup(attendantOf) = {chairOf, attendantOf}.

The corresponding sets of involved resource types and involved rules for L1.1, L1.2 and L1.3 are listed as follows.
R1.1 = {Professor, Student, Course, Book, Project, Journal, Paper, Conference, Proceeding, Field},
R1.2 = {Professor, Student, Book, Journal, Paper, Proceeding}, and
R1.3 = {Professor, Student, Conference}.
rs1.1 = {rule 2, rule 6, rule 8, rule 12–24, rule 32},
rs1.2 = {rule 1, rule 5, rule 13}, and
rs1.3 = {rule 3}.

Therefore, we get three RC-NF2 sub-schemas S1.1(R1.1, L1.1, rs1.1), S1.2(R1.2, L1.2, rs1.2), and S1.3(R1.3, L1.3, rs1.3) from sub-schema S1.

For sub-schema S2, we get two bottom semantic link types, facultyOf and setup, and then compute their up-closures, shown in Fig. 5.
L2.1 = Cup(facultyOf) = {classmateOf, partOf, teach, studyIn, supervisorOf, colleague, headOf, facultyOf}, and
L2.2 = Cup(setup) = {setup, partOf}.

The corresponding sets of involved resource types and involved rules for L2.1 and L2.2 are listed as follows.
R2.1 = {Professor, Student, Department, University, Course}, and
R2.2 = {Department, University, Course}.
rs2.1 = {rule 4, rule 7, rule 9–11, rule 26–27, rule 31, rule 33}, and
rs2.2 = {rule 25}.

Then, we get two RC-NF2 sub-schemas S2.1(R2.1, L2.1, rs2.1) and S2.2(R2.2, L2.2, rs2.2) from sub-schema S2.

Table 1
Semantic link types in scientific research SLN schema.

Resource type pairs | Semantic link type sets | Explanation
[Professor, Professor] | {colleague, friend} | A professor may be a colleague or a friend of another professor.
[Professor, Student] | {supervisorOf} | A professor may be a supervisor of a student.
[Professor, Course] | {teach} | A professor may teach a course.
[Professor, Field] | {engageIn} | A professor may engage in a research field.
[Professor, Department] | {facultyOf, headOf} | A professor may be a faculty or the head of a department.
[Professor, Project] | {leaderOf, takePartIn} | A professor may be a leader of or take part in a project.
[Professor, Paper] | {authorOf, readerOf, editorOf} | A professor may be an author, a reader, or an editor of a paper.
[Professor, University] | {facultyOf} | A professor may be a faculty of a university.
[Professor, Journal] | {readerOf, authorOf, editorOf} | A professor may be an author, a reader, or an editor of a journal.
[Professor, Conference] | {chairOf, attendantOf} | A professor may be a chair or an attendant of a conference.
[Professor, Proceeding] | {authorOf, readerOf, editorOf} | A professor may be an author, a reader, or an editor of a proceeding.
[Professor, Book] | {readerOf, authorOf, editorOf} | A professor may be an author, a reader, or an editor of a book.
[Student, Student] | {classmateOf} | Two students may be classmates.
[Student, Paper] | {readerOf, authorOf} | A student may be an author or a reader of a paper.
[Student, Journal] | {readerOf, authorOf} | A student may be an author or a reader of a journal.
[Student, Conference] | {attendantOf} | A student may be an attendant of a conference.
[Student, Proceeding] | {authorOf, readerOf} | A student may be an author or a reader of a proceeding.
[Student, Project] | {takePartIn} | A student may take part in a project.
[Student, Field] | {engageIn} | A student may engage in a research field.
[Student, University] | {studyIn} | A student may study in a university.
[Student, Department] | {studyIn} | A student may study in a department.
[Student, Course] | {study} | A student may study a course.
[Student, Book] | {readerOf, authorOf} | A student may be an author or a reader of a book.
[Course, Course] | {priorTo, sameField} | A course may be prior to another one, and two courses may be in the same field.
[Course, Field] | {belongTo} | A course may involve in a research field.
[Book, Book] | {referenceOf, sameField, samePublisher} | A book may reference another one, and two books may be in the same field, or have the same publisher or the same author.
[Book, Course] | {textbook, sameField} | A book may be a textbook, and a book and a course may be in the same field.
[Book, Paper] | {referenceOf, sameField} | A book may reference a paper, and a book and a paper may be in the same field or have the same author or have the same publisher.
[Book, Journal] | {referenceOf, sameField, samePublisher} | A book may be a reference of a journal, and a book and a journal may be in the same field or have the same author or have the same publisher.
[Book, Proceeding] | {referenceOf, sameField, samePublisher} | A book may be a reference of a proceeding, and a book and a proceeding may be in the same field or have the same author or have the same publisher.
[Book, Publisher] | {publishedBy} | A book may be published by a publisher.
[Paper, Paper] | {referenceOf, sameField} | A paper may reference another one, and two papers may be in the same field or have the same author.
[Paper, Journal] | {publishedIn, sameField} | A paper may be published in a journal, and a paper and a journal may be in the same field.
[Paper, Proceeding] | {publishedIn, sameField} | A paper may be published in a proceeding, and a paper and a proceeding may be in the same field.
[Proceeding, Field] | {belongTo} | A proceeding may belong to a research field.
[Proceeding, Conference] | {productOf} | A proceeding may be a product of a certain conference.
[Proceeding, Proceeding] | {sameField, samePublisher} | Two proceedings may be in the same field, or have the same publisher or the same author.
[Proceeding, Publisher] | {publishedBy} | A proceeding may be published by a publisher.
[Department, Course] | {setup} | A course may be set up by a department.
[University, Course] | {setup} | A course may be set up by a university.
[Department, University] | {partOf} | A department may be a part of a university.
[Project, Book] | {acknowledgeIn} | A project may be acknowledged in a book.
[Project, Paper] | {acknowledgeIn} | A project may be acknowledged in a paper.
[Project, Field] | {belongTo} | A project may belong to a research field.
[Conference, Conference] | {sameField} | Two conferences may be in the same field or have the same chair.
[Conference, Field] | {belongTo} | A conference may belong to a research field.
[Journal, Journal] | {sameField, samePublisher} | Two journals may be in the same field or have the same publisher.
[Journal, Proceeding] | {sameField, samePublisher} | A journal and a proceeding may be in the same field or have the same publisher.
[Journal, Field] | {belongTo} | A journal may belong to a research field.
[Journal, Publisher] | {publishedBy} | A journal may be published by a publisher.
[Field, Field] | {subFieldOf} | A research field may be a sub-field of another one.

To sum up, we get nine RC-NF2 sub-schemas from the original schema: S1.1, S1.2, S1.3, S2.1, S2.2, S3, S4, S5, and S6. According to Theorems 2 and 3, the decomposition is loss-less. Clearly, the number of semantic links in a semantic link network influences the reasoning complexity. Dividing a schema into several RC-NF2 sub-schemas can significantly reduce the scale, so that query and reasoning can be executed within a small scale. The soundness and the integrity of query and reasoning on sub-schemas and up-closures are guaranteed by the previous sections.
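The decomposition of sub-schema S1 can be reproduced with a short script. The sketch below encodes the rules of rs1 from Table 2 as hypothetical (body, head) pairs keyed by rule number, and assumes that a bottom link type is one never used to derive a different type and that the up-closure follows semantic relatives upward. Under these assumptions it recovers the three bottom link types, the three up-closures, and the three rule sets listed above.

```python
RS1 = {
    1: (("editorOf",), "readerOf"),
    2: (("leaderOf",), "takePartIn"),
    3: (("chairOf",), "attendantOf"),
    5: (("readerOf", "publishedIn"), "readerOf"),
    6: (("authorOf", "publishedIn"), "authorOf"),
    8: (("editorOf", "belongTo"), "engageIn"),
    12: (("publishedIn", "belongTo"), "belongTo"),
    13: (("editorOf", "publishedIn"), "editorOf"),
    14: (("leaderOf", "belongTo"), "engageIn"),
    15: (("takePartIn", "belongTo"), "engageIn"),
    16: (("authorOf", "belongTo"), "engageIn"),
    17: (("chairOf", "belongTo"), "engageIn"),
    18: (("editorOf", "belongTo"), "engageIn"),
    19: (("acknowledgeIn", "belongTo"), "belongTo"),
    20: (("belongTo", "subFieldOf"), "belongTo"),
    21: (("engageIn", "subFieldOf"), "engageIn"),
    22: (("sameField", "belongTo"), "belongTo"),
    23: (("textbook", "belongTo"), "belongTo"),
    24: (("productOf", "belongTo"), "belongTo"),
    32: (("subFieldOf", "subFieldOf"), "subFieldOf"),
}

# Semantic relatives: each head depends on every link type in its rule bodies.
deps = {}
for body, head in RS1.values():
    deps.setdefault(head, set()).update(body)


def up_closure(alpha):
    """Link types that can contribute to deriving alpha."""
    closure, stack = {alpha}, [alpha]
    while stack:
        for upper in deps.get(stack.pop(), ()):
            if upper not in closure:
                closure.add(upper)
                stack.append(upper)
    return closure


link_types = set(deps) | {b for body in deps.values() for b in body}
used = {b for head, body in deps.items() for b in body if b != head}
bottoms = sorted(link_types - used)

closures = {b: up_closure(b) for b in bottoms}
rule_sets = {
    b: sorted(n for n, (body, head) in RS1.items()
              if head in closures[b] and set(body) <= closures[b])
    for b in bottoms
}
overlap = closures["engageIn"] & closures["readerOf"]
```

The overlap between the closures of engageIn and readerOf is the common segment {editorOf, publishedIn} shared by S1.1 and S1.2.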

6.4. Redundancy-free decomposition

In most cases, the decomposition to RC-NF2 is enough, because an RC-NF2 schema is the smallest unit in the sense of being reasoning closed. However, there is some redundancy in the above decomposition. In Fig. 4, S1.1 and S1.2 share the same semantic relative: editorOf ⊴ publishedIn. According to Algorithm 4, we get a new decomposition by replacing the sub-schemas S1.1 and S1.2 with the following three schemas S1.1′, S1.2′ and S1.4′; the semantic relative net is shown in Fig. 6.

Schema S1.1′(R1.1′, L1.1′, rs1.1′):
L1.1′ = L1.1,
R1.1′ = R1.1, and
rs1.1′ = {rule 2, rule 5–6, rule 8, rule 12, rule 14–24, rule 32};

Schema S1.2′:
L1.2′ = L1.2,
R1.2′ = R1.2, and
rs1.2′ = {rule 1, rule 5};

Schema S1.4′:
L1.4′ = {publishedIn, editorOf},
R1.4′ = {Professor, Student, Book, Journal, Paper, Proceeding}, and
rs1.4′ = {rule 13}.

We can also avoid the redundancy problem by modifying the schema S1.2 and building a reference to S1.1 directly rather than creating a new schema. As shown in Fig. 7, the schema S1.1 need not change, while the schema S1.2 should be changed into S1.2′ as follows.

Schema S1.2′:
R1.2′ = {Professor, Student, Book, Journal, Paper, Proceeding},
L1.2′ = {readerOf, editorOf, publishedIn} with a reference relation {ref(editorOf ∈ S1.2′, editorOf ∈ S1.1)}, and
rs1.2′ = {rule 1, rule 5}.

Therefore, we have two choices for redundancy-free decomposition of the original schema, and both of them are loss-less.

Fig. 5. Semantic link types relative nets for RC-NF2 schemas of sub-schema S2.1 and S2.2.

Fig. 6. Semantic link type relative nets for S1.1′, S1.2′ and S1.4′ with redundancy-free RC-NF2.

Fig. 7. Semantic link type relative nets for S1.1′ and S1.2′ with redundancy-free RC-NF2.

Table 2
Rule set in scientific research SLN schema.

No. | Rules | Explanation of the rule
rule 1 | editorOf ⇒ readerOf | An editor of a paper, a journal, a proceeding, or a book is also a reader.
rule 2 | leaderOf ⇒ takePartIn | A professor is a leader of a project means that he also takes part in the project.
rule 3 | chairOf ⇒ attendantOf | A chair of a conference is also an attendant.
rule 4 | headOf ⇒ facultyOf | A head of a department or institute is also a faculty.
rule 5 | readerOf · publishedIn ⇒ readerOf | A reader of a paper published in a journal or a proceeding is also a reader of the journal or the proceeding.
rule 6 | authorOf · publishedIn ⇒ authorOf | An author of a paper published in a journal or a proceeding is also an author of the journal or the proceeding.
rule 7 | facultyOf · partOf ⇒ facultyOf | A faculty of a department which is a part of a university is also a faculty of the university.
rule 8 | editorOf · belongTo ⇒ engageIn | An editor of a paper, a journal, a proceeding, or a book engages in the field of the involved field.
rule 9 | colleague · facultyOf ⇒ facultyOf | A colleague of a faculty of a department, an institute, or a university is also a faculty.
rule 10 | supervisorOf · studyIn ⇒ facultyOf | If a student studies in a department or a university, then his supervisor is a faculty of the department or the university.
rule 11 | facultyOf · partOf ⇒ facultyOf | If a professor is a faculty of a department, then he is also a faculty of the university.
rule 12 | publishedIn · belongTo ⇒ belongTo | If a paper is published in a journal or a proceeding of a field, then the paper is in the same field.
rule 13 | editorOf · publishedIn ⇒ editorOf | An editor of a paper is also an editor of the journal or the proceeding which includes the paper.
rule 14 | leaderOf · belongTo ⇒ engageIn | If a professor is a leader of a project in a field, then the professor engages in the same field.
rule 15 | takePartIn · belongTo ⇒ engageIn | If a professor or a student takes part in a project of a field, then he engages in the same field.
rule 16 | authorOf · belongTo ⇒ engageIn | An author engages in the field of his paper or his book.
rule 17 | chairOf · belongTo ⇒ engageIn | If a professor is a chair of a conference of a field, then the professor engages in the same field.
rule 18 | editorOf · belongTo ⇒ engageIn | An editor of a paper, a journal, a proceeding, or a book of a field engages in the same field.
rule 19 | acknowledgeIn · belongTo ⇒ belongTo | If a project is acknowledged in a paper, a proceeding, or a book of a field, then the project is in the same field.
rule 20 | belongTo · subFieldOf ⇒ belongTo | If a paper, a journal, a proceeding, or a book is involved in a field, then it is involved in a super field.
rule 21 | engageIn · subFieldOf ⇒ engageIn | If a professor or a student engages in a field, then he engages in a super field.
rule 22 | sameField · belongTo ⇒ belongTo | If A and B share the same field, and B is involved in the field F, then A is involved in F. A and B are two resources with the type of paper, course, journal, book, or proceeding.
rule 23 | textbook · belongTo ⇒ belongTo | If a book B is a textbook of a course C, and C is involved in the field F, then B is involved in F.
rule 24 | productOf · belongTo ⇒ belongTo | If a proceeding P is a product of a conference C, and C is involved in the field F, then P is in the field F.
rule 25 | (partOf)⁻¹ · setup ⇒ setup | A course is set up by a department which is a part of a university means the university sets up the course.
rule 26 | classmateOf · studyIn ⇒ studyIn | A student and his classmate both study in the same university or the same department.
rule 27 | studyIn · partOf ⇒ studyIn | A student studies in a department affiliated to a university means he studies in the university.
rule 28 | samePublisher · publishedBy ⇒ publishedBy | If A and B share the same publisher, and B is published by the publisher P, then A is published by P. A and B are resources with type of journal, proceeding, or book.
rule 29 | samePublisher · samePublisher ⇒ samePublisher | If A and B share the same publisher, and B and C share the same publisher, then A and C share the same publisher. A, B and C are resources with type of journal, proceeding, or book.
rule 30 | priorTo · priorTo ⇒ priorTo | If a course C1 is prior to C2, and C2 is prior to C3, then C1 is prior to C3.
rule 31 | colleague · colleague ⇒ colleague | If A is a colleague of B, and B is a colleague of C, then A is a colleague of C. A, B, and C are professors.
rule 32 | subFieldOf · subFieldOf ⇒ subFieldOf | If field F1 is a sub-field of F2, and F2 is a sub-field of F3, then F1 is a sub-field of F3.
rule 33 | teach · (study)⁻¹ ⇒ supervisorOf | If a professor teaches a course and a student studies the course, then the professor is a supervisor of the student.

6.5. Schema maintenance

The following two examples show how to maintain the SLN schema while appending rules to or removing rules from the rule set, according to Algorithms 6 and 7 respectively.

Assume that two semantic link types are inserted into LinkTypes: vistingScholarOf ∈ [Professor, Department] and vistingScholarOf ∈ [Professor, University], and two new reasoning rules are added to rs.
rule 34. vistingScholarOf · partOf ⇒ vistingScholarOf
rule 35. publishedIn ⇒ sameField.

According to Algorithm 6, we can modify the sub-schemas as follows.

(1) By vistingScholarOf · partOf ⇒ vistingScholarOf, we get that vistingScholarOf ⊴ partOf. A new sub-schema S2.3 should be appended, where S2.3 involves only one semantic relative vistingScholarOf ⊴ partOf. Its resource types, semantic link types, and reasoning rule set are as follows.

R2.3 = {Professor, Department, University},
L2.3 = {vistingScholarOf, partOf},
rs2.3 = {rule 34}.


Fig. 8. The semantic link type relative nets after deleting semantic link type engageIn.

(2) By the rule publishedIn ⇒ sameField, we get sameField ⊴ publishedIn. Since both semantic link types occur in the sub-schema S1.1, we only need to modify it: rule 35 should be inserted into its rule set.

Assume that the semantic link type engageIn is removed from LinkTypes. According to Algorithm 7, we find the sub-schema S1.1, which involves engageIn. Then, we need to delete the semantic link type engageIn from the sub-schema, and the rules related to engageIn, i.e., rule 8, rule 14–18 and rule 21, from the rule set. Thus, we get a new semantic relative net as shown in Fig. 8.

According to the new rule set, the schema S1.1 can be decomposed into four sub-schemas S1.a, S1.b, S1.c, and S1.d. The corresponding resource type sets, semantic link type sets, and reasoning rule sets can be easily obtained. Finally, we remove the sub-schema S1.d because S1.d ⊆ S1.3. The other sub-schemas, such as S1.2, S1.3, S2.1, and S2.2, need not be changed.

7. Discussion

As a form of knowledge representation, a traditional semantic network is a directed graph of concepts and semantic relations between concepts [12]. It does not support rule reasoning. The SLN is different from a traditional semantic network in the following aspects.

(1) SLN is a semantic data model for managing various resources.

(2) Nodes in a semantic link network can be anything, such as web pages, documents, software, images, concepts and even a semantic link network.

(3) Semantic links in a semantic link network represent any semantic relations, even implicit relations.

(4) SLN supports rule reasoning.

The schema of a relational database defines the structure and metadata of relational tables [1,2]. It does not support complex semantics and reasoning. The traditional network database model provides a natural way to specify and manage data, but it is relatively rigid in maintaining data [13,14]. Compared with the previous database models, the SLN model not only provides a natural way for users to create their models but also offers the reasoning ability.

Compared with the XML schema and the RDF schema, which mainly provide syntax for XML and RDF respectively, the SLN schema has the built-in relational reasoning ability.

8. Conclusions

The SLN schema regulates the semantics of semantic link networks so as to manage semantic link network instances. The proposed rule-constraint normal forms can help manage and maintain the SLN schema efficiently. Two algorithms for SLN schema extension and reduction are introduced. Reasoning algorithms for deriving more semantic relations have been proposed. The proposed SLN schema and relevant theory are important parts of the SLN model.

Acknowledgments

We thank all team members of the China Knowledge Grid Group for their help and cooperation (www.knowledgegrid.net). This work was supported by National Basic Research Program of China (2003CB317001), International Cooperation Project of Ministry of Science and Technology of China (2006DFA11970), National Science Foundation of China (60773057), and National High Technology Research and Development Program of China (2007AA12Z220).

References

[1] E.F. Codd, A relational model of data for large shared data banks, Communications of the ACM 13 (6) (1970) 377–387.

[2] J.D. Ullman, Principles of Database and Knowledge-Base Systems, Vol. I, Computer Science Press, 1988.

[3] T. Berners-Lee, J. Hendler, O. Lassila, The semantic web, Scientific American 284 (5) (2001) 34–43.

[4] I. Horrocks, P.F. Patel-Schneider, SWRL: A Semantic Web Rule Language Combining OWL and RuleML. Available at: http://www.w3.org/Submission/SWRL/.

[5] H. Zhuge, Communities and emerging semantics in semantic link network: Discovery and learning, IEEE Transactions on Knowledge and Data Engineering 21 (6) (2009) 785–799.

[6] H. Zhuge, Y. Sun, et al., Algebra model and experiment for semantic link network, International Journal of High Performance Computing and Networking 3 (4) (2005) 227–238.

[7] H. Zhuge, The Knowledge Grid, World Scientific, 2004.

[8] H. Zhuge, Autonomous semantic link networking model for the Knowledge Grid, Concurrency and Computation: Practice and Experience 7 (19) (2007) 1065–1085.

[9] H. Zhuge, et al., Modeling language and tools for the semantic link network, Concurrency and Computation: Practice and Experience 20 (7) (2008) 885–902.

[10] H. Zhuge, Y. Li, Semantic profile-based document logistics for cooperative research, Future Generation Computer Systems 20 (1) (2004) 47–60.

[11] H. Zhuge, Y. Sun, J. Zhang, The schema theory for semantic link network, in: Proceedings of the Fourth International Conference on Semantics, Knowledge and Grid, Beijing, China, 2008, pp. 189–196.

[12] M.R. Quillian, Semantic memory, in: M. Minsky (Ed.), Semantic Information Processing, MIT Press, Cambridge, Mass, 1968.

[13] C.W. Bachman, Data structure diagrams, ACM SIGMIS Database 1 (2) (1969) 4–10.

[14] C.W. Bachman, The programmer as navigator, Communications of the ACM 16 (11) (1973) 653–658.

Hai Zhuge is a professor at the Chinese Academy of Sciences’ Institute of Computing Technology. He is also the chief scientist and the former director of the academy’s Key Lab of Intelligent Information Processing. He is the chief scientist of the China National Semantic and Knowledge Grid Research Project and the founder of the China Knowledge Grid Research Group. His main research interests include Knowledge Grid, Resource Space Model (RSM), Semantic Link Network (SLN) and Knowledge Flow. He has presented over ten keynotes at international conferences. He initiated the International Conference on Semantics, Knowledge and Grid (SKG, www.knowledgegrid.net) and was the chair and program co-chair of several international conferences such as CoopIS2009. He is an associate editor of Future Generation Computer Systems, the associate editor-in-chief of Journal of Computer Science and Technology, and on the editorial board of IEEE Intelligent Systems. He also serves as a reviewer for several national foundations such as NSF of Australia, NSF of China, SFI of Ireland, and NSF of USA. He is the author of The Knowledge Grid and The Web Resource Space Model. He was the top scholar in systems and software engineering during 2000 to 2004, according to a Journal of Systems and Software assessment report. His publications have appeared in CACM, Computer, IEEE TKDE, IEEE TPDS and ACM TOIT. He received the 2007 Innovation Award of the China Computer Federation for his fundamental theory of the Knowledge Grid. He is a senior member of IEEE and CCF.

Yunchuan Sun is a Ph.D. student at the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China. He also works at the School of Economics and Business Administration, Beijing Normal University, Beijing, China. His current research interest is the theory and application of semantic link network.