Rules validation - Copy

A Performance Evaluation and Optimization of the Rule Materialization Process in OWL Databases Presented by Hicham Berrada Supervised by Dr. Harroud


Page 1: Rules validation - Copy

A Performance Evaluation and Optimization of the Rule Materialization Process in OWL Databases

Presented by Hicham Berrada

Supervised by Dr. Harroud

Page 2: Rules validation - Copy

Outline

• Introduction
• Background study
• A new approach for the materialization process
• Performance evaluation and optimization
• Conclusion
• Future work
• Problems encountered

Page 3: Rules validation - Copy

Introduction

The materialization process physically stores inferred (discovered) data to ensure rapid query answering.

The purpose of this project is to enhance this process in OWL databases.

This family of knowledge representation languages is used in the Semantic Web.

Page 4: Rules validation - Copy

Today’s Web Limitations

Weaknesses

• High recall, low precision.
• Results are highly sensitive to vocabulary.
• Results are single Web pages.
• Human involvement is necessary to interpret and combine results.
• Results of Web searches are not readily accessible by other software tools.
• The meaning of Web content is not machine-accessible.

Page 5: Rules validation - Copy

Semantic Web

Represent the Web content in a form that is more easily machine-processable.

• A way to specify data and data relationships
• Reason over explicitly declared or defined knowledge
• Infer new, implicit knowledge

Materialization process

Page 6: Rules validation - Copy

Semantic Web Applications: Search engines

Page 7: Rules validation - Copy

The Semantic Web Stack

Page 8: Rules validation - Copy

The Semantic Web Stack

Page 9: Rules validation - Copy

Related Work

Urbani: presents materialization and backward-chaining as different modes of performing inference.

◦ Materialization:
+ Very efficient responses at query time
- Expensive up-front closure computation, which needs to be redone every time the knowledge base changes.

◦ Backward-chaining:
+ No expensive and change-sensitive pre-computation => suitable for more frequently changing knowledge bases
- Has to perform more computation at query time.

Presents a hybrid algorithm to perform efficient backward-chaining reasoning on very large datasets expressed in the OWL Horst fragment.
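
The trade-off between the two modes can be illustrated with a toy example. The sketch below is not Urbani's hybrid algorithm; it assumes a single hypothetical rule (a symmetric `knows` property) and an in-memory set of triples, and contrasts precomputing the closure with deriving the answer at query time.

```python
# Toy contrast between materialization and backward-chaining.
# Assumption: one illustrative rule, "knows" is symmetric.
explicit = {("alice", "knows", "bob"), ("bob", "knows", "carol")}

def materialize(triples):
    """Forward-chaining: store every inferred triple up front."""
    closure = set(triples)
    closure |= {(o, p, s) for (s, p, o) in triples if p == "knows"}
    return closure

def ask_materialized(closure, triple):
    """Query time is a plain lookup in the stored closure."""
    return triple in closure

def ask_backward(triples, triple):
    """Backward-chaining: nothing is stored, the rule is applied per query."""
    s, p, o = triple
    if triple in triples:
        return True
    return p == "knows" and (o, p, s) in triples

closure = materialize(explicit)
query = ("bob", "knows", "alice")
print(ask_materialized(closure, query), ask_backward(explicit, query))
```

Both calls give the same answer; the difference is where the work happens. If `explicit` changes, `materialize` has to be rerun, while `ask_backward` needs no maintenance but repeats the rule check on every query.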

Page 10: Rules validation - Copy

Related Work

Jiménez: describes a Prolog library for OWL RL.

◦ This library has been implemented under the SWI-Prolog interpreter and is based on the RDF library provided by the SWI-Prolog environment, in such a way that OWL triples are computed and stored in secondary memory.

Page 11: Rules validation - Copy

Improving Rule Materialization

I proposed adding a new parameter to the materialization process: the strategy for applying rules. So, S’ = closure(S, R, Strategy).

Weight functions determine the order in which rules are applied.

The order in which rules are applied matters and can have a great impact on machine resources, much as query planning affects the processing cost of SQL queries.
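
A minimal sketch of S’ = closure(S, R, Strategy), assuming that a rule is a Python function from a set of triples to the triples it derives and that a strategy is a function returning the rule names in firing order. The names `closure`, `symmetric_rule`, `inverse_rule` and `by_weight` are illustrative and are not taken from the project.

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Rule = Callable[[Set[Triple]], Set[Triple]]

def closure(store: Set[Triple], rules: Dict[str, Rule],
            strategy: Callable[[Dict[str, Rule]], List[str]]) -> Set[Triple]:
    """Apply the rules in strategy order until no new triple is derived."""
    result = set(store)
    order = strategy(rules)
    changed = True
    while changed:
        changed = False
        for name in order:
            new = rules[name](result) - result
            if new:
                result |= new
                changed = True
    return result

# Two toy rules (assumptions for illustration only).
def symmetric_rule(triples):
    return {(o, p, s) for (s, p, o) in triples if p == "marriedTo"}

def inverse_rule(triples):
    return {(o, "childOf", s) for (s, p, o) in triples if p == "parentOf"}

def by_weight(weights):
    """Build a strategy in which the highest weight fires first."""
    return lambda rules: sorted(rules, key=lambda name: -weights[name])

S = {("ann", "marriedTo", "bob"), ("ann", "parentOf", "eve")}
R = {"sym": symmetric_rule, "inv": inverse_rule}
S_prime = closure(S, R, by_weight({"sym": 2, "inv": 1}))
```

In this sketch the strategy changes only the order of rule applications, and therefore the cost; the resulting closure is the same for any order.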

Page 12: Rules validation - Copy

Selected OWL Profile

I have chosen a subset of the OWL RL profile.

Page 13: Rules validation - Copy

OWL Rules

Rule dependency

Each rule has a premise and a consequent.

The result (consequent) of a given rule R1 may in turn be used as a premise of another rule R2.

This implies a dependency between R1 and R2.

Page 14: Rules validation - Copy

Rule dependency graph

The nodes of the dependency graph are rules.

The dependency R1 → R2 means that the results of firing the rule R1 can make the rule R2 fireable again.
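
One way to derive such a graph, and the corresponding dependency matrix, is sketched below. It assumes that each rule is summarized by the set of predicates its premise reads and the set its consequent writes; the three rules and their predicate sets are hypothetical and do not correspond to the rule set used in the project.

```python
# Each rule: (predicates read by the premise, predicates written by the consequent).
rules = {
    "R1": ({"subClassOf"}, {"subClassOf"}),
    "R2": ({"subClassOf", "type"}, {"type"}),
    "R3": ({"type"}, {"sameAs"}),
}

def dependency_graph(rules):
    """Edge Ri -> Rj when a consequent of Ri can feed a premise of Rj."""
    return {
        ri: sorted(rj for rj, (premise, _) in rules.items() if produced & premise)
        for ri, (_, produced) in rules.items()
    }

def dependency_matrix(graph):
    """Adjacency matrix with rows and columns in sorted rule order."""
    names = sorted(graph)
    return [[1 if cj in graph[ri] else 0 for cj in names] for ri in names]

rdg = dependency_graph(rules)
matrix = dependency_matrix(rdg)
print(rdg)     # {'R1': ['R1', 'R2'], 'R2': ['R2', 'R3'], 'R3': []}
print(matrix)  # [[1, 1, 0], [0, 1, 1], [0, 0, 0]]
```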

Page 15: Rules validation - Copy

Rule Dependency Matrix

Page 16: Rules validation - Copy

OWL rules implementation

OWL rules were implemented as SPARQL CONSTRUCT queries, as in the following rule for symmetric properties (the rdf: and owl: prefixes are assumed to be declared):

CONSTRUCT { ?u ?p ?v . }
WHERE {
  ?p rdf:type owl:SymmetricProperty .
  ?v ?p ?u .
}
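
The effect of this CONSTRUCT query can be mimicked in plain Python. The sketch below assumes that triples are `(subject, predicate, object)` tuples and writes prefixed names as plain strings; it does not use the Sesame API, and the `ex:` data is made up for illustration.

```python
RDF_TYPE = "rdf:type"
OWL_SYMMETRIC = "owl:SymmetricProperty"

def symmetric_property_rule(triples):
    """Return {?u ?p ?v} for every ?v ?p ?u where ?p is declared symmetric."""
    symmetric = {s for (s, p, o) in triples
                 if p == RDF_TYPE and o == OWL_SYMMETRIC}
    return {(o, p, s) for (s, p, o) in triples if p in symmetric}

store = {
    ("ex:marriedTo", RDF_TYPE, OWL_SYMMETRIC),
    ("ex:ann", "ex:marriedTo", "ex:bob"),
    ("ex:ann", "ex:likes", "ex:tea"),
}
inferred = symmetric_property_rule(store)
print(inferred)  # {('ex:bob', 'ex:marriedTo', 'ex:ann')}
```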

Page 17: Rules validation - Copy

Sesame

A framework for storage, querying and inferencing of RDF and RDF Schema

A Java library for handling RDF

Page 18: Rules validation - Copy

How do strategies work?

Associate with each rule (node) in the rule dependency graph (RDG) a weight that represents the rule’s priority during the materialization process.

The highest weight means the highest priority.
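
Turning weights into a firing order is then a sort. In the sketch below the weights are hypothetical, and breaking ties by rule number is an assumption, since the slides do not say how ties are resolved.

```python
def firing_order(weights):
    """Rules sorted by decreasing weight; ties broken by rule number (assumed)."""
    return sorted(weights, key=lambda rule: (-weights[rule], rule))

weights = {1: 0, 2: 3, 3: 5, 4: 3}  # hypothetical weights per rule number
print(firing_order(weights))  # [3, 2, 4, 1]
```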

Page 19: Rules validation - Copy

Strategies: In-degree weight (IW)

Assigns to each node the number of edges having that node as their terminal node, and fires the rules with more IN edges first.
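
A small sketch of the in-degree weight on a hypothetical four-rule graph, stored as an adjacency list in which every rule appears as a key. The tie-breaking by rule number is an assumption.

```python
# Hypothetical rule dependency graph: rule -> rules it can make fireable.
rdg = {1: [2, 3], 2: [3], 3: [1], 4: [3]}

def in_degree_weights(graph):
    """Weight of a rule = number of edges ending at it."""
    weights = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            weights[target] += 1
    return weights

iw = in_degree_weights(rdg)
iw_order = sorted(iw, key=lambda rule: (-iw[rule], rule))
print(iw)        # {1: 1, 2: 1, 3: 3, 4: 0}
print(iw_order)  # [3, 1, 2, 4]
```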

Page 20: Rules validation - Copy

Strategies: Out-degree weight (OW)

Assigns to each node the number of edges having that node as their initial node, and fires the rules with more OUT edges first.
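
The out-degree weight can be sketched the same way, again on a hypothetical graph stored as an adjacency list and with ties broken by rule number as an assumption.

```python
# Hypothetical rule dependency graph: rule -> rules it can make fireable.
rdg = {1: [2, 3], 2: [3], 3: [1], 4: [3]}

def out_degree_weights(graph):
    """Weight of a rule = number of edges starting at it."""
    return {node: len(targets) for node, targets in graph.items()}

ow = out_degree_weights(rdg)
ow_order = sorted(ow, key=lambda rule: (-ow[rule], rule))
print(ow)        # {1: 2, 2: 1, 3: 1, 4: 1}
print(ow_order)  # [1, 2, 3, 4]
```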

Page 21: Rules validation - Copy

Strategies: Reachable sub graph weight (SW)

Associates with each node the number of edges in the sub graph that is reachable from that node.
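
A sketch of this weight on a hypothetical five-rule graph. It assumes that the reachable sub graph includes the starting node itself, so every edge whose source is reachable is counted.

```python
# Hypothetical rule dependency graph: rule -> rules it can make fireable.
rdg = {1: [2], 2: [3], 3: [2], 4: [1, 3], 5: []}

def reachable(graph, start):
    """Nodes reachable from start (start included), by depth-first search."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def subgraph_weight(graph, node):
    """Number of edges whose source lies in the sub graph reachable from node."""
    return sum(len(graph[n]) for n in reachable(graph, node))

sw = {node: subgraph_weight(rdg, node) for node in rdg}
print(sw)  # {1: 3, 2: 2, 3: 2, 4: 5, 5: 0}
```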

Page 22: Rules validation - Copy

Strategies: Reachable rule weight (RW)

Associates with each node the number of nodes reachable from that node.
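
A sketch of this weight on a hypothetical graph. It assumes that a node counts as reachable only if it can be reached by following at least one edge, so a rule counts itself only when it lies on a cycle; the slides do not settle this detail.

```python
# Hypothetical rule dependency graph: rule -> rules it can make fireable.
rdg = {1: [2], 2: [3], 3: [2], 4: [1, 3], 5: []}

def reachable_rule_weight(graph, node):
    """Number of rules reachable from node by following one or more edges."""
    seen, stack = set(), list(graph[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return len(seen)

rw = {node: reachable_rule_weight(rdg, node) for node in rdg}
print(rw)  # {1: 2, 2: 2, 3: 2, 4: 3, 5: 0}
```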

Page 23: Rules validation - Copy

Strategies: Reachable sub graph weight with attenuation (SWA)

The further rule1 is from rule2 in the RDG, the less likely rule1 is to make rule2 fireable.

This weight function reflects the attenuation of dependency with distance.
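
The slides do not give the attenuation formula, so the sketch below is only one plausible reading: each edge of the reachable sub graph contributes `alpha ** d`, where `d` is the breadth-first distance of the edge's source from the starting rule. Both the formula and the value `alpha = 0.5` are assumptions.

```python
from collections import deque

# Hypothetical rule dependency graph: rule -> rules it can make fireable.
rdg = {1: [2], 2: [3], 3: [2], 4: [1, 3], 5: []}

def attenuated_weight(graph, node, alpha=0.5):
    """Sum alpha**d over reachable edges, d = BFS distance of the edge's source."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for nxt in graph[current]:
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return sum(len(graph[n]) * alpha ** d for n, d in dist.items())

swa = {node: attenuated_weight(rdg, node) for node in rdg}
print(swa)  # {1: 1.75, 2: 1.5, 3: 1.5, 4: 3.25, 5: 0.0}
```

With `alpha=1` there is no attenuation and the weight reduces to the plain count of edges in the reachable sub graph.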

Page 24: Rules validation - Copy

Strategies: Cyclomatic complexity weight (CCW)

Counts the number of cycles + 1 in a given sub graph.
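
A sketch of this weight, assuming that "number of cycles + 1" refers to the usual cyclomatic number E - N + 2 of a connected graph (E edges, N nodes) and that the sub graph in question is the one reachable from each rule. The graph is hypothetical.

```python
# Hypothetical rule dependency graph: rule -> rules it can make fireable.
rdg = {1: [2], 2: [3], 3: [2], 4: [1, 3], 5: []}

def reachable(graph, start):
    """Nodes reachable from start (start included), by depth-first search."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def cyclomatic_weight(graph, node):
    """E - N + 2 for the sub graph reachable from node (one component assumed)."""
    nodes = reachable(graph, node)
    edges = sum(len(graph[n]) for n in nodes)
    return edges - len(nodes) + 2

ccw = {node: cyclomatic_weight(rdg, node) for node in rdg}
print(ccw)  # {1: 2, 2: 2, 3: 2, 4: 3, 5: 1}
```

Rule 5 has no outgoing edges and therefore no cycle, which gives the minimum weight of 1.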

Page 25: Rules validation - Copy

Experiment 1: Experimenting with strategies

Page 26: Rules validation - Copy

Finding the rule execution order for each strategy

The rule orders computed by the different strategies are given below.

Strategy  Rule execution order
IW        14 17 16 13 20 7 15 6 10 18 12 11 8 3 4 9 5 2 1 19 22 21
OW        3 4 8 9 12 20 18 19 15 21 22 13 14 11 10 17 16 2 1 7 6 5
SW        1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
RW        1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
SWA       3 4 8 9 12 20 19 21 22 15 18 11 10 13 14 17 16 2 1 7 6 5
CCW       1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

Strategies SW, RW and CCW yield the same order of applying rules, so I represent them by SW in my experiments.

Page 27: Rules validation - Copy

Reasoning time per rule for each strategy

Execution time for the four strategies on a store with 100000 triples.

[Figure: execution time for OWL rules in ms, per rule, for strategies IW, OW, SW and SWA; axis from 0 to 160000]

Page 28: Rules validation - Copy

Reasoning time per store for all strategies

[Figure: reasoning time per store in ms with different strategies (IW, OW, SW, SWA) on 1K, 10K and 100K stores; axis from 0 to 1800000]

Page 29: Rules validation - Copy

Second Experiment: Using a much larger store

Page 30: Rules validation - Copy

[Figure: performance comparison for all strategies (IW, OW, SW, SWA) in different stores (1K, 10K, 100K, 1000k); axis from 0 to 60000000]

Page 31: Rules validation - Copy

Performance comparison on a 1000k store

We notice that SWA is now the best option for larger stores.

[Figure: comparison of IW, OW, SW and SWA on the 1000k store; axis from 41000000 to 49000000]

Page 32: Rules validation - Copy

Execution time improvement

Execution time improvement on a 1000k store (compared to IW)

Strategy  Improvement
SW        1.04160828616711
OW        1.0856861531755
SWA       1.10890198330886

Page 33: Rules validation - Copy

Third Experiment: Optimizing the strategies

Page 34: Rules validation - Copy

Dynamic Approach algorithm

Page 35: Rules validation - Copy

• All the dynamic strategies seem to do better than the normal ones.

• SW is no longer the best strategy to use; in this case SWA dynamic is best, followed by OW dynamic.

[Figure: comparison of classical and dynamic strategies on a 1K store; bars for IW, IW Dynamic, OW, OW Dynamic, SW, SW Dynamic, SWA and SWA Dynamic; axis from 0 to 18000]

Page 36: Rules validation - Copy

Most dynamic strategies do better than the normal ones except SW dynamic.

OW dynamic and SWA dynamic seem to be the best strategies to use for this size of database.

[Figure: comparison of classical and dynamic strategies on a 10K store; bars for IW, IW Dynamic, OW, OW Dynamic, SW, SW Dynamic, SWA and SWA Dynamic; axis from 0 to 180000]

Page 37: Rules validation - Copy

The two dynamic strategies OW and SWA do slightly worse than SW.

=> I cannot generalize and say that dynamic strategies will always do better than classical ones.

[Figure: comparison of classical and dynamic strategies on 100K triple stores; bars for IW, IW Dynamic, OW, OW Dynamic, SW, SW Dynamic, SWA and SWA Dynamic; axis from 0 to 2000000]

Page 38: Rules validation - Copy

All the dynamic strategies perform better than the normal ones.

OW dynamic and SWA dynamic do a lot better than any other strategy.

[Figure: comparison of classical and dynamic strategies on a 1000K store; bars for IW, IW Dynamic, OW, OW Dynamic, SW, SW Dynamic, SWA and SWA Dynamic; axis from 0 to 60000000]

Page 39: Rules validation - Copy

Execution time improvement

Execution time improvement on a 1000k store (compared to IW)

Strategy     Improvement
SW           1.04160828616711
OW           1.0856861531755
SWA          1.10890198330886
IW Dynamic   1.36724430233728
SW Dynamic   2.37168570850941
OW Dynamic   51.7775893881754
SWA Dynamic  54.3297100385185
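
The slides do not state how the improvement values are computed. Assuming each value is the ratio of the IW execution time to the strategy's execution time (a speedup factor), the sketch below converts it into the percentage of execution time saved relative to IW. The function name is illustrative.

```python
# Improvement values reported for the 1000k store, read as speedup factors
# relative to IW (an assumption; the slides do not give the formula).
improvement_vs_iw = {
    "SW": 1.04160828616711,
    "OW": 1.0856861531755,
    "SWA": 1.10890198330886,
    "IW Dynamic": 1.36724430233728,
    "SW Dynamic": 2.37168570850941,
    "OW Dynamic": 51.7775893881754,
    "SWA Dynamic": 54.3297100385185,
}

def time_saved_percent(speedup):
    """Share of the IW execution time saved, given a speedup factor."""
    return (1 - 1 / speedup) * 100

for strategy, speedup in improvement_vs_iw.items():
    print(f"{strategy:12s} {speedup:8.2f}x  {time_saved_percent(speedup):6.2f}% less time")
```

Under this reading, a factor of about 54.3 for SWA dynamic corresponds to roughly 98% less execution time than IW.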

Page 40: Rules validation - Copy

[Figure: comparison of classical and dynamic strategies on different stores (1K, 10K, 100K, 1000k); series for IW, IW Dynamic, OW, OW Dynamic, SW, SW Dynamic, SWA and SWA Dynamic; axis from 0 to 60000000]

=> Use SW for stores smaller than 1000k, and SWA dynamic for stores of 1000k and more.

Page 41: Rules validation - Copy

Conclusion

• Application of the materialization process in OWL databases for a subset of the OWL RL profile
• Support of rule subsets (scalability)
• Demonstration of the impact of rule order on the materialization process
• Improvement of the materialization process by a factor of about 54.3 (SWA dynamic compared to IW on the 1000k store)

Page 42: Rules validation - Copy

Future work

Rule dependencies in the database used could have a significant impact on the performance of the materialization process.

◦ Add a new metric to estimate the complexity between rules in the dependency graph.
◦ Study the impact of the new metric on the materialization process.

Study the performance of my approach on other OWL profiles.

Page 43: Rules validation - Copy

Problems encountered

• Time (weeks of execution)
• Computing power (server)
• Application compatibility with Linux
• Looking for a larger OWL database
• No Sesame parser for N-Quads (new database)
• Errors in the new database (several crashes)

Page 44: Rules validation - Copy

References

M. El Koutbi, A. Salah, I. Khriss (2012). Strategies for Applying Rules in OWL Entailment Regimes.

G. Antoniou, F. van Harmelen (2003). A Semantic Web Primer. Massachusetts Institute of Technology.

D. Fensel, et al. (n.d.). Semantic Web Application Areas. Retrieved September 12th from ebscohost.

D. Fensel, et al. (2002). On-To-Knowledge: Semantic Web Enabled Knowledge Management. Retrieved September 19th from ebscohost.

J. Heflin (n.d.). An Introduction to the OWL Web Ontology Language. Retrieved October 12th from ebscohost.

F. Baader, et al. (2002). The Description Logic Handbook: Theory, Implementation and Applications. Cambridge: Cambridge University Press.

P. Patel-Schneider, I. Horrocks, F. van Harmelen (2002). Reviewing the Design of DAML+OIL: An Ontology Language for the Semantic Web. In Proceedings of the 18th National Conference on Artificial Intelligence (AAAI02). <http://www.cs.vu.nl/frankh/abstracts/AAAI02.html>.

O. Lassila, Ralph R. Swick. Resource Description Framework (RDF) Model and Syntax Specification. Retrieved on October 30th from the W3C Recommendation of the 22nd February 1999.

C. Bizer, T. Heath (2009). Linked Data - The Story so far. International Journal on Semantic Web and Information Systems, 5(3):1–22.

[Sparql, 2012] http://www.w3.org/TR/sparql11-query/

G. Ianni, T. Krennwallner, A. Martello, A. Polleres (2009). A Rule System for Querying Persistent RDFS Data. ESWC 2009: 857-862.

J. Urbani, S. Kotoulas, J. Maassen, F. van Harmelen, H. Bal (2010). OWL reasoning with WebPIE: calculating the closure of 100 billion triples. In Proceedings of the ESWC '10.

Almendros-Jiménez (2011). A Prolog Library for OWL RL. In Proceedings of the Logic in Databases, LID’2011, EDBT/VLDB. ACM.

B. Bishop, S. Bojanov (2011). Implementing OWL 2 RL and OWL 2 QL rule-sets for OWLIM. In M. Dumontier, M. Courtot, Proc. of the OWL: Experiences and Directions Workshop (OWLED 2011), Volume 796 of CEUR WS Proceedings.

M. Krötzsch, A. Mehdi, S. Rudolph (2010). Orel: Database-Driven Reasoning for OWL 2 Profiles. Description Logics, Ontario, Canada.

Hogan, et al. (2011). Scalable OWL 2 Reasoning for Linked Data. Reasoning Web: 250-325. Galway, Ireland.