
Efficiently Querying Relational Databases using OWL and SWRL

Martin O’Connor
Stanford Medical Informatics, Stanford University


Talk Outline

• Rules and the Semantic Web: OWL + SWRL

• SWRLTab: a Protégé-OWL development environment for SWRL

• Knowledge-driven Querying

• Relation-to-OWL mapping


Semantic Web Stack


Rule-based Systems are common in many domains

• Engineering: Diagnosis rules

• Commerce: Business rules

• Law: Legal reasoning

• Medicine: Eligibility, Compliance

• Internet: Access authentication


Rule Markup (RuleML) Initiative

• Effort to standardize inference rules.

• RuleML is a markup language for publishing and sharing rule bases on the World Wide Web.

• Focus is on rule interoperation between industry standards.

• RuleML builds a hierarchy of rule sublanguages upon XML, RDF, and OWL, e.g., SWRL


What is SWRL?

• SWRL is an acronym for Semantic Web Rule Language.

• SWRL is intended to be the rule language of the Semantic Web.

• SWRL includes a high-level abstract syntax for Horn-like rules.

• All rules are expressed in terms of OWL concepts (classes, properties, individuals).


Example SWRL Rule: Has uncle

hasParent(?x, ?y) ^ hasBrother(?y, ?z) → hasUncle(?x, ?z)
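
Read operationally, this rule is a join on the shared variable ?y. The Java sketch below only illustrates that reading; it is not how SWRLTab or any rule engine evaluates rules. It assumes a hypothetical in-memory representation in which each property is a list of {subject, object} string pairs, and the individual names (Ann, Bob, Carl) are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: evaluates
//   hasParent(?x, ?y) ^ hasBrother(?y, ?z) -> hasUncle(?x, ?z)
// as a nested-loop join over {subject, object} pairs.
final class UncleRuleSketch {

    // Each String[] holds {subject, object} for one property assertion.
    static List<String[]> inferHasUncle(List<String[]> hasParent, List<String[]> hasBrother) {
        List<String[]> hasUncle = new ArrayList<>();
        for (String[] parent : hasParent) {          // binds ?x and ?y
            for (String[] brother : hasBrother) {    // binds ?y and ?z
                if (parent[1].equals(brother[0])) {  // both atoms must agree on ?y
                    hasUncle.add(new String[] {parent[0], brother[1]});
                }
            }
        }
        return hasUncle;
    }

    public static void main(String[] args) {
        List<String[]> hasParent = Arrays.asList(new String[][] {{"Ann", "Bob"}});
        List<String[]> hasBrother = Arrays.asList(new String[][] {{"Bob", "Carl"}});
        for (String[] pair : inferHasUncle(hasParent, hasBrother)) {
            System.out.println("hasUncle(" + pair[0] + ", " + pair[1] + ")");
        }
        // prints: hasUncle(Ann, Carl)
    }
}
```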


Example SWRL Rule with Named Individuals: Has brother

Person(Fred) ^ hasSibling(Fred, ?s) ^ Man(?s) → hasBrother(Fred, ?s)


Example SWRL Rule with Literals and Built-ins: is adult?

Person(?p) ^ hasAge(?p, ?age) ^ swrlb:greaterThan(?age, 17) → Adult(?p)


SWRL Characteristics

• W3C Submission in 2004: http://www.w3.org/Submission/SWRL/

• Based on OWL-DL

• Has a formal semantics

• Rules saved as part of ontology

• Increasing tool support: Bossam, R2ML, Hoolet, Pellet, KAON2, RacerPro, SWRLTab

• Can work with reasoners


SWRLTab

• A Protégé-OWL development environment for working with SWRL rules

• Supports editing and execution of rules

• Extension mechanisms to work with third-party rule engines

• Mechanisms for users to define built-in method libraries

• Supports querying of ontologies


SWRLTab: http://protege.cim3.net/cgi-bin/wiki.pl?SWRLTab


What is the SWRL Editor?

• The SWRL Editor is an extension to Protégé-OWL that permits the interactive editing of SWRL rules.

• The editor can be used to create SWRL rules, edit existing SWRL rules, and read and write SWRL rules.

• It is accessible as a tab within Protégé-OWL.


SWRL Java API

• The SWRL API provides a mechanism to create and manipulate SWRL rules in an OWL knowledge base.

• This API is used by the SWRL Editor. However, it is accessible to all OWL Plugin developers.

• Third-party software can use this API to work directly with SWRL rules and integrate rules into their applications.

• Fully documented in SWRLTab Wiki.


Executing SWRL Rules

• SWRL is a language specification

• Well-defined semantics

• Developers must implement engine

• Or map to existing rule engines

• Hence, a bridge…


[Diagram: SWRL Rule Engine Bridge. Labels: OWL KB + SWRL, SWRL Rule Engine Bridge, Rule Engine, GUI, Data, Knowledge]


SWRL Rule Engine Bridge

• Given an OWL knowledge base, it will extract SWRL rules and relevant OWL knowledge.

• Also provides an API to assert inferred knowledge.

• Knowledge (and rules) are described in a way that is not specific to the Protégé-OWL API.

• These can then be mapped to a rule-engine-specific rule and knowledge format.

• This mapping is the developer’s responsibility.


We used the SWRL Bridge to Integrate the Jess Rule Engine with Protégé-OWL

• Jess is a Java-based rule engine.

• A Jess system consists of a rule base, a fact base, and an execution engine.

• Available free to academic users and for a small fee to non-academic users.

• Has been used in Protégé-based tools, e.g., JessTab.


Outstanding Issues

• SWRL Bridge does not know about all OWL constraints:
  – Contradictions with rules possible!
  – Consistency must be assured by the user incrementally running a reasoner.
  – Hard problem to solve in general.

• Integrated reasoner and rule engine would be ideal.

• Possible solution with KAON2.


SWRL Built-in Bridge

• SWRL provides mechanisms to add user-defined predicates, e.g.,
  – hasDOB(?x, ?y) ^ temporal:before(?y, ‘1997’)…
  – hasDOB(?x, ?y) ^ temporal:equals(?y, ‘2000’)…

• These built-ins could be implemented by each rule engine.

• However, the SWRL Bridge provides a dynamic loading mechanism for Java-defined built-ins.

• Can be used by any rule engine implementation.


Defining a Built-in in Protégé-OWL

• Describe library of built-ins in OWL using definition of swrl:Builtin provided by SWRL ontology.

• Provide Java implementation of built-ins and wrap in JAR file.

• Load built-in definition ontology in Protégé-OWL. Put JAR in plugins directory.

• Built-in bridge will make run-time links.


Example: defining stringEqualIgnoreCase from Core SWRL Built-ins Library

• Core SWRL built-ins defined by:
  – http://www.w3.org/2003/11/swrlb

• Provides commonly needed built-ins, e.g., add, subtract, string manipulation, etc.

• Normally aliased as ‘swrlb’.

• Contains definition for stringEqualIgnoreCase


Example Implementation Class for Core SWRL Built-in Methods

package edu.stanford.smi.protegex.owl.swrl.bridge.builtins.swrlb;

import java.util.List;

import edu.stanford.smi.protegex.owl.swrl.bridge.builtins.*;
import edu.stanford.smi.protegex.owl.swrl.bridge.exceptions.*;

public class SWRLBuiltInMethodsImpl implements SWRLBuiltInMethods
{
  public boolean stringEqualIgnoreCase(List arguments) throws BuiltInException { ... }
  ....
} // SWRLBuiltInMethodsImpl


Example Implementation for Built-in swrlb:stringEqualIgnoreCase

private static String SWRLB_SEIC = "stringEqualIgnoreCase";

public boolean stringEqualIgnoreCase(List arguments) throws BuiltInException
{
  SWRLBuiltInUtil.checkNumberOfArgumentsEqualTo(SWRLB_SEIC, 2, arguments.size());

  String argument1 = SWRLBuiltInUtil.getArgumentAsAString(SWRLB_SEIC, 1, arguments);
  String argument2 = SWRLBuiltInUtil.getArgumentAsAString(SWRLB_SEIC, 2, arguments);

  return argument1.equalsIgnoreCase(argument2);
} // stringEqualIgnoreCase


Invocation from Rule Engine

• Use of swrlb:stringEqualIgnoreCase in rule should cause automatic invocation.

• SWRL rule engine bridge has an invocation method.

• Takes built-in name and arguments and performs method resolution, loading, and invocation.

• Efficiency is a consideration: some methods should probably be implemented natively by the rule engine, e.g., add, subtract, etc.
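
The resolution, loading, and invocation steps listed above can be sketched with standard Java reflection. This is only an illustration under stated assumptions: `DemoBuiltIns` and `BuiltInInvokerSketch.invoke` are hypothetical names, the arguments are simplified to strings, and the real bridge's classes, loading strategy, and exception types are not shown in these slides.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a built-in library class.
class DemoBuiltIns {
    public boolean stringEqualIgnoreCase(List<String> arguments) {
        if (arguments.size() != 2) {
            throw new IllegalArgumentException("expected 2 arguments, got " + arguments.size());
        }
        return arguments.get(0).equalsIgnoreCase(arguments.get(1));
    }
}

final class BuiltInInvokerSketch {

    // Loads the library class by name, finds the method named after the
    // built-in, and calls it with the argument list.
    static boolean invoke(String className, String builtInName, List<String> arguments) {
        try {
            Class<?> library = Class.forName(className);                  // loading
            Object instance = library.getDeclaredConstructor().newInstance();
            Method method = library.getMethod(builtInName, List.class);  // resolution
            return (Boolean) method.invoke(instance, arguments);         // invocation
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot invoke built-in " + builtInName, e);
        }
    }

    public static void main(String[] args) {
        boolean result = invoke(DemoBuiltIns.class.getName(), "stringEqualIgnoreCase",
                Arrays.asList("SWRL", "swrl"));
        System.out.println(result); // prints: true
    }
}
```

Reflection adds a lookup and a dynamic call on every use, which is one reason frequently used built-ins may be better handled natively by the rule engine.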


Using SWRL to Express Protocol Constraints

On days that both immunotherapy and omalizumab are administered, omalizumab must be injected at least 60 minutes after immunotherapy.

Patient(?p) ^
hasExtendedEvent(?p, ?eevent1) ^ hasExtendedEvent(?p, ?eevent2) ^
temporal:hasValue(?eevent1, ?event1) ^ temporal:hasValidTime(?eevent1, ?event1VT) ^ temporal:hasTime(?event1VT, ?event1Time) ^
temporal:hasValue(?eevent2, ?event2) ^ temporal:hasValidTime(?eevent2, ?event2VT) ^ temporal:hasTime(?event2VT, ?event2Time) ^
hasVisit(?event1, ?v1) ^ hasVisit(?event2, ?v2) ^
hasActivity(?event1, ?a1) ^ hasName(?a1, "Omalizumab") ^
hasActivity(?event2, ?a2) ^ hasName(?a2, "Immunotherapy") ^
temporal:before(?event2Time, ?event1Time) ^
temporal:durationMinutesLessThan(60, ?event2Time, ?event1Time)
-> NonConformingPatient(?p)
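
The timing check made by the last two built-in atoms of this rule can be illustrated with java.time. The sketch assumes that temporal:durationMinutesLessThan(60, t1, t2) holds when fewer than 60 minutes separate t1 and t2; `isNonConforming` is a hypothetical helper and not part of the temporal built-in library.

```java
import java.time.Duration;
import java.time.LocalDateTime;

final class ProtocolTimingSketch {

    // Mirrors temporal:before(?event2Time, ?event1Time) ^
    // temporal:durationMinutesLessThan(60, ?event2Time, ?event1Time),
    // where event 2 is immunotherapy and event 1 is omalizumab.
    static boolean isNonConforming(LocalDateTime immunotherapy, LocalDateTime omalizumab) {
        boolean immunotherapyFirst = immunotherapy.isBefore(omalizumab);
        long gapMinutes = Duration.between(immunotherapy, omalizumab).toMinutes();
        return immunotherapyFirst && gapMinutes < 60;
    }

    public static void main(String[] args) {
        LocalDateTime immunotherapy = LocalDateTime.of(2007, 1, 15, 9, 0);
        System.out.println(isNonConforming(immunotherapy, immunotherapy.plusMinutes(30))); // prints: true
        System.out.println(isNonConforming(immunotherapy, immunotherapy.plusMinutes(90))); // prints: false
    }
}
```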


SWRL and Querying

• SWRL is a rule language, not a query language

• However, a rule antecedent can be viewed as a pattern matching specification, i.e., a query

• With built-ins, language-compliant query extensions are possible.


A SWRL ‘Query’

Return all adults in ontology:

Person(?p) ^ hasAge(?p, ?age) ^ swrlb:greaterThan(?age, 17) -> swrlq:select(?p) ^ swrlq:orderBy(?age)
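
The filtering, selection, and ordering expressed by this query can be mimicked over plain Java collections. This is a sketch of the query's intent only, not of SWRLQueryTab or its API: it assumes a hypothetical map from person name to age, and the names and ages are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AdultQuerySketch {

    // people maps a person's name to the value of hasAge for that person.
    // Returns the names of everyone older than 17, ordered by age.
    static List<String> selectAdultsOrderedByAge(Map<String, Integer> people) {
        List<Map.Entry<String, Integer>> matches = new ArrayList<>();
        for (Map.Entry<String, Integer> person : people.entrySet()) {
            if (person.getValue() > 17) {            // swrlb:greaterThan(?age, 17)
                matches.add(person);
            }
        }
        matches.sort(Map.Entry.comparingByValue());  // swrlq:orderBy(?age)
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Integer> person : matches) {
            selected.add(person.getKey());           // swrlq:select(?p)
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Integer> people = new LinkedHashMap<>();
        people.put("Fred", 42);
        people.put("Ann", 12);
        people.put("Joe", 18);
        System.out.println(selectAdultsOrderedByAge(people)); // prints: [Joe, Fred]
    }
}
```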


SWRLQueryTab


SWRLQueryTab: Displaying Results


SWRLQueryTab

• Query functionality added with built-ins

• Interactive query execution with tabular results display

• Low-level JDBC-like API for use in embedded applications

• Can use any existing rule engine back end

• Will be available as part of Protégé-OWL SWRLTab in a few weeks


Use of SWRL as Query Language is Attractive

• Cleaner semantics than SPARQL

• OWL-based, not RDF-based

• Very extensible via built-ins, e.g., temporal queries using temporal built-ins

• Can work with reasoners


Querying: Semantic Issues

• Syntactic SWRL conformance is easy

• However, SWRL is based on OWL-DL so assumes open world semantics

• Querying closes the world, e.g., how many adults in ontology?

• Should not make inferences based on query results – nonmonotonicity!


Dealing with Relational Data

• Almost all data are relational

• Relational queries are at the database level not at the knowledge level

• We would like results of queries and analyses to be added to our store of knowledge

• We need to bridge the gap

• Triple stores are a longer-term solution


Querying and Databases



Model Mismatch

• Relational n-ary tuples vs. RDF-triples

• Relational databases can store a lot of knowledge; typically they don’t

• Some mappings can be inferred

• The more normalized the database, the easier it is to infer mappings

• Manual user-driven mapping is usually required
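
The tuple-versus-triple mismatch can be made concrete: one n-ary row typically becomes several triples that share a subject. The sketch below assumes a deliberately naive mapping (key column value becomes the subject, every other column name becomes a property); the class, column names, and row values are hypothetical, and real mappings usually need the user-supplied detail the slide mentions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TupleToTripleSketch {

    // Turns one relational row into "subject property value" triples.
    static List<String> toTriples(String className, String keyColumn, Map<String, String> row) {
        String subject = className + "_" + row.get(keyColumn);
        List<String> triples = new ArrayList<>();
        triples.add(subject + " rdf:type " + className);
        for (Map.Entry<String, String> column : row.entrySet()) {
            if (!column.getKey().equals(keyColumn)) {   // the key only names the subject
                triples.add(subject + " " + column.getKey() + " " + column.getValue());
            }
        }
        return triples;
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("id", "7");
        row.put("hasName", "Fred");
        row.put("hasAge", "42");
        for (String triple : toTriples("Person", "id", row)) {
            System.out.println(triple);
        }
        // prints:
        // Person_7 rdf:type Person
        // Person_7 hasName Fred
        // Person_7 hasAge 42
    }
}
```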


Solution Requirements

• A schema ontology to describe schema of arbitrary relational database

• A mapping ontology to describe mapping of data from tuples to triples

• Mapping software to dynamically map

• A query language

• A query engine


[Diagram. Labels: Mapper, OWL KB, Bridge, User Interface, Engine, Data, Knowledge]


Dynamic Relation-to-OWL Mapping

• Bridge generates optimized relational queries to retrieve data

• Current SPARQL-based systems
  – D2RQ
  – D2OMapper

• Approach used successfully in BioSTORM project for surveillance data


Optimization

• Current ontology tools not scalable

• Databases are scalable:
  – offload as much work to RDBMS as possible

• Query engines must optimize

• Built-ins are a difficulty and an opportunity


Built-in Optimization

Person(?p) ^ hasAge(?p, ?age) ^ swrlb:greaterThan(?age, 17) -> swrlq:select(?p) ^ swrlq:orderBy(?age)


Two Approaches

• Built-in optimization by annotating built-in definitions and exploiting the annotations in the query engine
  – Numerical built-in optimizations
  – Temporal built-in optimizations

• In-query optimization to avoid redundant data requests
  – Jess with Java Fact Storage Provider Framework
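
One plausible reading of the first approach is that a built-in annotated with an equivalent SQL operator can be pushed into the generated SQL, so the database filters rows before they reach the rule engine. The sketch below is an assumption-laden illustration, not the implementation described in this talk: the annotation table, the `toSql` helper, and the table and column names are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class BuiltInPushdownSketch {

    // Hypothetical annotations: built-in name -> equivalent SQL comparison operator.
    private static final Map<String, String> SQL_OPERATOR = new HashMap<>();
    static {
        SQL_OPERATOR.put("swrlb:greaterThan", ">");
        SQL_OPERATOR.put("swrlb:lessThan", "<");
        SQL_OPERATOR.put("swrlb:equal", "=");
    }

    // Builds a parameterized query. If the built-in has no SQL annotation, no
    // WHERE clause is generated and the engine must apply the built-in itself.
    static String toSql(String table, String selectColumn, String filterColumn, String builtIn) {
        String operator = SQL_OPERATOR.get(builtIn);
        String sql = "SELECT " + selectColumn + " FROM " + table;
        if (operator != null) {
            sql += " WHERE " + filterColumn + " " + operator + " ?";
        }
        return sql + " ORDER BY " + filterColumn;
    }

    public static void main(String[] args) {
        System.out.println(toSql("person", "name", "age", "swrlb:greaterThan"));
        // prints: SELECT name FROM person WHERE age > ? ORDER BY age
    }
}
```

The literal (17 in the adult query) would be bound as a statement parameter by the caller rather than concatenated into the SQL text.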


Rule/Query Distinction

• Significant optimizations possible for queries

• Optimizations for entire rule bases not as dramatic; however, still possible, e.g.,
  – Analyzing temporal ‘slices’
  – Analyzing spatial regions

• Dealing with reasoners

• Database updates?


Lessons learned so far

• SWRL provides a useful though not magical increase in expressivity

• Suited well to some tasks, not to others

• Can work well as a query language

• Built-ins provide a very nice way to increase expressivity

• Triple-stores are a longer term solution, but dealing with relational data now is crucial