Principles of the Semantic Web DB or KRDB? Alon Halevy.


Principles of the Semantic Web: DB or KRDB?

Alon Halevy


Agenda

A spectrum of representation and query formalisms:
– From relational databases to description logics and beyond.

Questions:
– What can we represent?
– What can we query?
– How much will it cost?
– Can we actually use this stuff?


Why Do We Care?

Need to represent data and knowledge on the semantic web.
Need to query.
Need to map between different representations of data/knowledge.
People are confused, and very religious, about these issues.


Perspectives from the Structure Chasm

Authoring    | Writing text | Creating a schema
Querying     | Keywords     | Someone else's schema
Data sharing | Easy         | Committees & standards


Specifics

The relational model, a few query languages
– Representing real data
Horn rules, the DB view and the KR view.
XML: shattering the myth of no semantics.
Description Logics: a logic of descriptions.
One shameless plug for my past work.


Recalling First-Order Logic

A KB is a set of well-formed formulas:
∀x (Person(x) ⇒ Mortal(x))
∀x (Student(x) ⇒ Smart(x))
Person(a) ∨ Dog(a)
∃x (Student(x) ∧ (x = Aristotle))

Interpretation: a mapping from terms to the universe of discourse.
Model: any interpretation that satisfies the formulas.
Key idea: infer implicit facts from the explicit ones.


Example Inferences

The KB:
∀x (Person(x) ⇒ Mortal(x))
∀x (Student(x) ⇒ Smart(x))
Person(a) ∨ Dog(a)
∃x (Student(x) ∧ (x = Oren))

– Mortal(a)? Don't know.
– Smart(Oren)? Yes.
– Mortal(a) ∨ ¬Mortal(a)? Yes.
– ∃x (Person(x) ∧ ¬Mortal(x))? No.
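The four answers above can be reproduced by brute force. The following is a minimal sketch, not a general first-order prover: it assumes the KB reads "every person is mortal, every student is smart, a is a person or a dog, Oren is a student", and it assumes the universe contains only the two named individuals (a domain-closure assumption the slide does not make). The names `DOMAIN`, `is_model`, and `ask` are illustrative.

```python
from itertools import product

DOMAIN = ["a", "oren"]  # assumption: only the two named individuals exist
PREDICATES = ["Person", "Mortal", "Student", "Smart", "Dog"]


def interpretations():
    """Yield every assignment of truth values to the ground atoms."""
    atoms = [(p, d) for p in PREDICATES for d in DOMAIN]
    for values in product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, values))


def is_model(i):
    """True if the interpretation satisfies all four KB formulas."""
    return (
        all(not i["Person", x] or i["Mortal", x] for x in DOMAIN)
        and all(not i["Student", x] or i["Smart", x] for x in DOMAIN)
        and (i["Person", "a"] or i["Dog", "a"])
        and i["Student", "oren"]  # some student is equal to Oren
    )


MODELS = [i for i in interpretations() if is_model(i)]


def ask(query):
    """'yes' if the query holds in every model, 'no' if in none, else 'unknown'."""
    results = {query(i) for i in MODELS}
    if results == {True}:
        return "yes"
    if results == {False}:
        return "no"
    return "unknown"


mortal_a = ask(lambda i: i["Mortal", "a"])
smart_oren = ask(lambda i: i["Smart", "oren"])
tautology = ask(lambda i: i["Mortal", "a"] or not i["Mortal", "a"])
immortal_person = ask(
    lambda i: any(i["Person", x] and not i["Mortal", x] for x in DOMAIN)
)
```

Here "unknown" corresponds to the slide's "don't know": the fact is true in some models of the KB and false in others.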


The Relational Data Model

Table name: Product

PName       | Price   | Category    | Manufacturer     (attribute names)
------------+---------+-------------+-------------
Gizmo       | $19.99  | Gadgets     | GizmoWorks       (tuples or rows)
Powergizmo  | $29.99  | Gadgets     | GizmoWorks
SingleTouch | $149.99 | Photography | Canon
MultiTouch  | $203.99 | Household   | Hitachi


No’s

No negation
No disjunction
Ambiguous support for incomplete information

The database represents a single model.

Hence, inference is just model checking.


Integrity Constraints

Very specific forms of logical formulae. Enforced (maybe) by the database system:
– Functional dependencies: A → B
– Foreign key constraints: every tuple in the Purchase table must refer to a product in the Product table.
– Multi-valued dependencies.
– Tuple-generating dependencies, etc.
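The first two constraint types can be checked directly against the data. The sketch below is only an illustration of what a database system might enforce: the Product rows come from the earlier slide, while the Purchase rows, the attribute names `buyer` and `product`, and the helper names `satisfies_fd` and `satisfies_fk` are hypothetical.

```python
def satisfies_fd(rows, lhs, rhs):
    """Check the functional dependency lhs -> rhs over a list of dict rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(key, val) != val:
            return False
    return True


def satisfies_fk(child_rows, child_attr, parent_rows, parent_attr):
    """Check that every child value appears as a parent key."""
    parent_keys = {r[parent_attr] for r in parent_rows}
    return all(r[child_attr] in parent_keys for r in child_rows)


product_rows = [
    {"pname": "Gizmo", "price": 19.99, "category": "Gadgets", "manufacturer": "GizmoWorks"},
    {"pname": "Powergizmo", "price": 29.99, "category": "Gadgets", "manufacturer": "GizmoWorks"},
    {"pname": "SingleTouch", "price": 149.99, "category": "Photography", "manufacturer": "Canon"},
    {"pname": "MultiTouch", "price": 203.99, "category": "Household", "manufacturer": "Hitachi"},
]

# Hypothetical Purchase table; the second row refers to a missing product.
purchase_rows = [
    {"buyer": "joe", "product": "Gizmo"},
    {"buyer": "ann", "product": "SuperWidget"},
]

name_determines_price = satisfies_fd(product_rows, ["pname"], ["price"])
maker_determines_price = satisfies_fd(product_rows, ["manufacturer"], ["price"])
purchases_reference_products = satisfies_fk(purchase_rows, "product", product_rows, "pname")
```

Note that a check like this only tests one database instance; a constraint is a statement about every legal instance.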


Inference: Type 1 (Querying)

Product (pname, price, category, manufacturer)
Company (cname, stockPrice, country)

Find all countries that manufacture some product in the ‘Gadgets’ category.

SELECT country
FROM Product, Company
WHERE manufacturer = cname AND category = 'Gadgets'

Q(c) :- Product(w, y, 'Gadgets', x), Company(x, p, c)
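To see the query run, here is a small sketch using Python's built-in `sqlite3` module. The Product rows are taken from the earlier slide; the Company rows (stock prices and countries) are made up for illustration. `DISTINCT` is added so the SQL result matches the set semantics of the Datalog rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Product (pname TEXT, price REAL, category TEXT, manufacturer TEXT);
    CREATE TABLE Company (cname TEXT, stockPrice REAL, country TEXT);
    """
)
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?, ?)",
    [
        ("Gizmo", 19.99, "Gadgets", "GizmoWorks"),
        ("Powergizmo", 29.99, "Gadgets", "GizmoWorks"),
        ("SingleTouch", 149.99, "Photography", "Canon"),
        ("MultiTouch", 203.99, "Household", "Hitachi"),
    ],
)
# Hypothetical Company data, not from the slides.
conn.executemany(
    "INSERT INTO Company VALUES (?, ?, ?)",
    [("GizmoWorks", 25.0, "USA"), ("Canon", 65.0, "Japan"), ("Hitachi", 15.0, "Japan")],
)

rows = conn.execute(
    """
    SELECT DISTINCT country
    FROM Product, Company
    WHERE manufacturer = cname AND category = 'Gadgets'
    """
).fetchall()
countries = sorted(r[0] for r in rows)
conn.close()
```

With this sample data only GizmoWorks makes gadgets, so the answer is the single country assigned to it.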


Query Language Features

Fundamental question: what queries can I express with my language?
Start with selection, projection, and join.
+ union and negation (= relational completeness)

To deal with real data:
– Grouping and aggregation
– Dealing with duplicates
– Outer joins


Inference Type 2: Query Containment

• Question: is the result of Q1 always a subset of the result of Q2?

Q1(A,B) :- cites(A,B), cites(B,A), sameTopic(A,B)
Q2(C,D) :- cites(C,C1), cites(D,D1)

• Inference on a very specific type of formula.
• Only finite models are considered.
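For conjunctive queries like Q1 and Q2, containment can be decided by searching for a containment mapping: a substitution that sends the head of Q2 to the head of Q1 and every body atom of Q2 to some body atom of Q1. The sketch below assumes all arguments are variables (no constants, comparisons, or negation); the tuple encoding of queries and the name `contained_in` are illustrative choices.

```python
def contained_in(q1, q2):
    """True if q1 is contained in q2, for conjunctive queries over variables only.

    Each query is (head_vars, body) where body is a list of (predicate, args).
    """
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return False

    def extend(mapping, src, dst):
        """Extend mapping so each src variable maps to its dst term, or None."""
        new = dict(mapping)
        for s, d in zip(src, dst):
            if new.setdefault(s, d) != d:
                return None
        return new

    def search(mapping, remaining):
        """Backtracking search for images of the remaining atoms of q2."""
        if not remaining:
            return True
        (pred, args), rest = remaining[0], remaining[1:]
        for pred1, args1 in body1:
            if pred1 == pred and len(args1) == len(args):
                new = extend(mapping, args, args1)
                if new is not None and search(new, rest):
                    return True
        return False

    start = extend({}, head2, head1)
    return start is not None and search(start, list(body2))


Q1 = (("A", "B"), [("cites", ("A", "B")), ("cites", ("B", "A")), ("sameTopic", ("A", "B"))])
Q2 = (("C", "D"), [("cites", ("C", "C1")), ("cites", ("D", "D1"))])

q1_in_q2 = contained_in(Q1, Q2)
q2_in_q1 = contained_in(Q2, Q1)
```

For the slide's example the mapping C→A, C1→B, D→B, D1→A witnesses that Q1 is contained in Q2, while no mapping exists in the other direction. The search is exponential in the worst case, which is consistent with the problem being NP-complete.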


Complexity Results Galore

For select-project-equi-join: NP-complete.
Add comparisons (e.g., <): Π^p_2-complete.
Add negation:
– Level 0: still Π^p_2-complete.
– Level 2: undecidable.
– Level 1: Sagiv and Ullman know but don't want to tell.
Allow at most 2 occurrences of every predicate name: polynomial.
A lot of papers.


Last Week Recap

The relational model: ground facts + UNA (unique names assumption) + CWA (closed world assumption).
Integrity constraints: expressing more knowledge.
Inference Type 1: querying (= model checking).
– Note: polynomial time is not good enough.
Inference Type 2: query containment
– Type 2.5: answering queries using views.
– Both are an inference problem of a particular type of formula in first-order logic over finite models.


Beyond Containment: Answering Queries Using Views

Given a query Q and a set of view definitions V1,…,Vn:
Is it possible to answer Q using only the V's?

V1(A,B) :- cites(A,B), cites(B,A)
V2(C,D) :- sameTopic(C,D), cites(C,C1), cites(D,D1)

Query:
q(x,y) :- sameTopic(x,y), cites(x,y), cites(y,x)

Query rewriting:
q'(X,Y) :- V1(X,Y), V2(X,Y)
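One way to sanity-check the rewriting is to materialize the views over a small database and compare the answers of q (over the base relations) and q' (over the views). The sketch below uses a naive conjunctive-query evaluator; the paper identifiers and the facts in `base` are made up. Agreement on one database is only a spot check, not a proof of equivalence, which would require a containment argument in both directions.

```python
def evaluate(head, body, db):
    """Evaluate a conjunctive query; db maps predicate name -> set of tuples."""
    bindings = [{}]
    for pred, args in body:
        next_bindings = []
        for b in bindings:
            for fact in db.get(pred, set()):
                new = dict(b)
                # Keep the binding only if it is consistent with this fact.
                if all(new.setdefault(v, c) == c for v, c in zip(args, fact)):
                    next_bindings.append(new)
        bindings = next_bindings
    return {tuple(b[v] for v in head) for b in bindings}


# Hypothetical base data.
base = {
    "cites": {("p1", "p2"), ("p2", "p1"), ("p2", "p3"), ("p3", "p4")},
    "sameTopic": {("p1", "p2"), ("p2", "p3")},
}

# Materialize the two views from the slide.
views = {
    "V1": evaluate(("A", "B"), [("cites", ("A", "B")), ("cites", ("B", "A"))], base),
    "V2": evaluate(
        ("C", "D"),
        [("sameTopic", ("C", "D")), ("cites", ("C", "C1")), ("cites", ("D", "D1"))],
        base,
    ),
}

# q over the base relations, q' over the views only.
q_answers = evaluate(
    ("x", "y"),
    [("sameTopic", ("x", "y")), ("cites", ("x", "y")), ("cites", ("y", "x"))],
    base,
)
q_prime_answers = evaluate(("X", "Y"), [("V1", ("X", "Y")), ("V2", ("X", "Y"))], views)
```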


Didn’t We Say 590 Semantic Web?

Assume a virtual schema of the WWW, e.g.,
– Course(number, university, title, prof, quarter)

Every data source on the web contains the answer to a view over the virtual schema:

UW database:
SELECT number, title, prof
FROM Course
WHERE university = 'UW' AND quarter = '2/02'

Stanford database:
SELECT number, title, prof, quarter
FROM Course
WHERE university = 'Stanford'

User query: find all professors who teach "database systems".


Horn Rules / Datalog

Easy for KR people. Hard for database people.
Add recursion to the query language:
– Path(x,y) :- edge(x,y)
– Path(x,y) :- Path(x,z), Path(z,y)
DB people consider least fixed point semantics.
{edge(a,b), Path(a,b), Path(b,c), Path(a,c)}:
– Is a model for KR folks, not DB folks: it satisfies both rules, but Path(b,c) and Path(a,c) are not in the least fixed point.
Recursion is not expressible in first-order logic.
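The difference between the two readings shows up in a few lines of code. The sketch below computes the least fixed point of the two Path rules by naive bottom-up iteration (not one of the optimized evaluation strategies), and separately checks whether an arbitrary set of Path facts satisfies the rules. The function names are illustrative.

```python
def least_fixpoint(edges):
    """Naive bottom-up evaluation of the two Path rules."""
    path = set(edges)  # Path(x,y) :- edge(x,y)
    while True:
        # Path(x,y) :- Path(x,z), Path(z,y)
        derived = {(x, w) for (x, y) in path for (z, w) in path if y == z}
        if derived <= path:
            return path
        path |= derived


def satisfies_rules(edges, path):
    """True if the given Path facts form a model of both rules."""
    base = set(edges) <= path
    closed = all((x, w) in path for (x, y) in path for (z, w) in path if y == z)
    return base and closed


edges = {("a", "b")}
kr_model = {("a", "b"), ("b", "c"), ("a", "c")}  # the Path facts from the slide

db_answer = least_fixpoint(edges)
kr_is_model = satisfies_rules(edges, kr_model)
chain_closure = least_fixpoint({("a", "b"), ("b", "c"), ("c", "d")})
```

The slide's set of facts passes `satisfies_rules`, yet it strictly contains the least fixed point, which holds only Path(a,b).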


More on Datalog

Many clever algorithms for evaluating datalog queries:
– They have fancy names: e.g., magic sets
Query containment: undecidable, unless you constrain the queries (ask Surajit).
Some ideas made it into SQL and relational systems:
– Magic sets (useful even without recursion)
– Linear recursion in SQL-3


XML

<db>
  <book>
    <title>Complete Guide to DB2</title>
    <author>Chamberlin</author>
  </book>
  <book>
    <title>Transaction Processing</title>
    <author>Bernstein</author>
    <author>Newcomer</author>
  </book>
  <publisher>
    <name>Morgan Kaufmann</name>
    <state>CA</state>
  </publisher>
</db>


XML: Issues

Data model: edge-labeled graph (/tree):
– The tags can be viewed as binary relations
Features:
– The schema is embedded in the data.
– Can have a predefined schema (XML Schema)
– Nesting can be arbitrary.
– Can be irregular (e.g., different formats for elements)
– Order of elements may be important.


Querying XML

XPath, XQuery, XSLT.
XQuery is based on XPath, XML-QL, SQL.
Query language features:
– Path expressions
– The Return clause: creates the output XML document.
– Standard query language bells and whistles.
– Not in XQuery: tag variables, which bind variables to schema elements.
Query containment: ask Dan and Gerome.
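Path expressions can be tried out on the book document from the earlier slide. The sketch below uses Python's standard `xml.etree.ElementTree`, which supports only a limited subset of XPath and is not an XQuery engine; it is meant to illustrate path expressions and irregular data, nothing more.

```python
import xml.etree.ElementTree as ET

DOC = """
<db>
  <book>
    <title>Complete Guide to DB2</title>
    <author>Chamberlin</author>
  </book>
  <book>
    <title>Transaction Processing</title>
    <author>Bernstein</author>
    <author>Newcomer</author>
  </book>
  <publisher>
    <name>Morgan Kaufmann</name>
    <state>CA</state>
  </publisher>
</db>
"""

root = ET.fromstring(DOC)

# Path expression: all titles of books directly under the root.
titles = [t.text for t in root.findall("./book/title")]

# Path expression with a predicate: books having an author child "Bernstein".
bernstein_titles = [
    b.findtext("title") for b in root.findall("./book[author='Bernstein']")
]

# Irregular data: the number of authors varies from book to book.
authors_per_book = {b.findtext("title"): len(b.findall("author")) for b in root.findall("book")}
```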


Knowledge Representation

McCarthy suggested some form of first-order logic.
Semantic networks – popular on the east coast.
Evolved into:
– Frame-based systems
– Description logics (a.k.a. terminological logics).


Description Logics

A subset of first-order logic with a German syntax. No variables.
Allows only:
– Unary relations (Concepts): Person, Happy
– Binary relations (Roles, attributes): childOf
A DL Knowledge base:
– Abox of ground facts: Person(sue), Happy(bob)
– Tbox of definitions.


Concept Descriptions

Built using a set of constructs:
C, D → A        | Primitives Konzept
       ⊤        | Top Konzept
       ⊥        | Bottom Konzept
       C ⊓ D    | Durchschnitt
       C ⊔ D    | Vereinigung
       ¬C       | Komplement
       ∀R.C     | Rollenquantifikation/Werterestriktion
       ∃R.C     | Rollenquant./Existentielle Restriktion


Concept Descriptions

Built using a set of constructs:
C, D → A        | Primitive concept
       ⊤        | Top concept
       ⊥        | Bottom concept
       C ⊓ D    | Intersection
       C ⊔ D    | Union
       ¬C       | Complement
       ∀R.C     | Universal restriction
       ∃R.C     | Existential restriction
       (> n R)  | Number restriction


TBox Assertions

Concept introduction:
– Person ⊑ Mammal
Concept definition:
– Parent = Person ⊓ (> 0 child)
– HappyParent = Parent ⊓ (∀child.Smart)
Inclusion assertions:
– Parent ⊓ ( child.(= name Karina)) ⊑ HappyParent
– Person ⊓ (> 5 child) ⊑ HappyParent
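The two concept definitions can be made concrete by computing their extensions in one finite interpretation. The sketch below reads Parent as "a Person with more than 0 child fillers" and HappyParent as "a Parent all of whose children are Smart"; the universal reading of the second definition is an assumption, and the individuals and facts are made up. This is model checking over a single interpretation, whereas DL reasoning such as subsumption is a statement about all interpretations.

```python
# Hypothetical finite interpretation.
DOMAIN = {"bob", "sue", "ann", "tom"}
CONCEPTS = {
    "Person": {"bob", "sue", "ann", "tom"},
    "Smart": {"ann"},
}
ROLES = {"child": {("bob", "ann"), ("sue", "ann"), ("sue", "tom")}}


def fillers(role, x):
    """All y such that role(x, y) holds."""
    return {y for (s, y) in ROLES[role] if s == x}


def more_than(n, role):
    """Extension of the number restriction (> n role)."""
    return {x for x in DOMAIN if len(fillers(role, x)) > n}


def for_all(role, concept_ext):
    """Extension of the universal restriction over the given concept."""
    return {x for x in DOMAIN if fillers(role, x) <= concept_ext}


def exists(role, concept_ext):
    """Extension of the existential restriction over the given concept."""
    return {x for x in DOMAIN if fillers(role, x) & concept_ext}


# Parent = Person AND (> 0 child)
parent = CONCEPTS["Person"] & more_than(0, "child")
# HappyParent = Parent AND (all child fillers are Smart)
happy_parent = parent & for_all("child", CONCEPTS["Smart"])
# For contrast: parents with at least one smart child.
has_smart_child = parent & exists("child", CONCEPTS["Smart"])
```

In this interpretation sue has one smart child and one who is not, so she falls under the existential variant but not under HappyParent.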


Reasoning in DLs.

Note: you can assert view facts
– HappyParent(bob)
– You don't know who the children are, but you know they're smart.
Classification: C(a)?
Consistency: is C necessarily an empty concept?
Subsumption: C1 ⊑ C2?
Theoretically, everything boils down to subsumption.
Complexity depends on the set of constructors allowed.


DLs vs. Horn rules

Horn rules can handle any variable pattern
– DL's can handle only specific patterns.
DL's can do subsumption with negation, number restrictions, and various other features.
They can be combined but decidability is subtle (see CARIN, [Levy and Rousset, 1996]).