Web Semantics: KB vs. DB Zachary G. Ives University of Pennsylvania CIS 650 – Database & Information Systems April 13, 2005

Web Semantics: KB vs. DB

Zachary G. Ives
University of Pennsylvania

CIS 650 – Database & Information Systems

April 13, 2005


Administrivia

Next readings and summaries:
  Bernstein on Model Management
  Dong and Halevy on Personal Info Management

2-paragraph summary of the problems they focus on and their key contributions


Today’s Trivia Question


Last Time…

The Semantic Web vision and goals

Core ideas:
  RDF as “semantic” format
    Also RDFS schema format
  Ontologies as the standard way of defining concepts
  Description logics are the way most ontologies are defined (OWL language)


Description Logics (Borgida survey)

A class of languages based on FOL, like Datalog, Prolog

Key questions: subsumption of classes, recognition of members of classes

Prolog allows us to reason about instances:
  ParentOf(liz, andy).
  Male(andy).
  Child(_x) :- ParentOf(_z, _x).
  Son(_y) :- Male(_y), ParentOf(_w, _y).

DLs allow us to make further inferences (e.g., that andy is a Child), i.e., they realize:
  Child(x) ⇔ (∃z) ParentOf(z, x)
  Son(y) ⇔ (∃w) Male(y) ∧ ParentOf(w, y)
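As a rough illustration of the instance-level reasoning on this slide, the sketch below applies the Child and Son rules to the two facts by hand. The set-based encoding and the function names `children` and `sons` are assumptions made for this example; it is not how a Prolog or DL engine works internally.

```python
# Facts from the slide, encoded as plain Python sets (an assumed encoding).
parent_of = {("liz", "andy")}   # ParentOf(liz, andy)
male = {"andy"}                 # Male(andy)

def children(parent_of):
    """Child(x) holds when some z satisfies ParentOf(z, x)."""
    return {x for (_z, x) in parent_of}

def sons(parent_of, male):
    """Son(y) holds when Male(y) and some w satisfies ParentOf(w, y)."""
    return {y for y in children(parent_of) if y in male}

print(children(parent_of))    # {'andy'}
print(sons(parent_of, male))  # {'andy'}
```

Note that liz is not derived as a Child, because no ParentOf fact names her as the second argument.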


Syntax and Semantics

Build variable-free composite terms from atoms using term constructors (e.g., at-most, all)

The same description in three notations:
  COURSE and at-most(10, takers) and all(takers, GRADS)
  (:and COURSE (:at-most 10 takers) (:all takers GRADS))
  COURSE ∩ (≤ 10 takers) ∩ (∀ takers : GRADS)

Can be expressed in FOPC:
  COURSE(a) ∧ ¬(∃ x1 … x11) [takers(a,x1) ∧ … ∧ takers(a,x11) ∧ (xi ≠ xj for all i < j)] ∧ (∀ y) [takers(a,y) ⇒ GRADS(y)]
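To make the semantics of the description COURSE and at-most(10, takers) and all(takers, GRADS) concrete, here is a minimal membership check over a small hand-made instance. The function name, the relation encoding, and the sample data are all assumptions for illustration.

```python
def satisfies_course_description(a, course, takers, grads, limit=10):
    """True if COURSE(a), a has at most `limit` takers, and every taker of a is in GRADS."""
    fillers = {y for (x, y) in takers if x == a}   # all takers of individual a
    return a in course and len(fillers) <= limit and fillers <= grads

# Hypothetical instance data.
course = {"cs650", "cs101"}
grads = {"ann", "bob"}
takers = {("cs650", "ann"), ("cs650", "bob"), ("cs101", "ann"), ("cs101", "carl")}

print(satisfies_course_description("cs650", course, takers, grads))  # True
print(satisfies_course_description("cs101", course, takers, grads))  # False: carl is not in GRADS
```

The `limit` parameter plays the role of the number in at-most; lowering it to 1 makes cs650 fail as well, since it has two takers.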


Questions for DLs

Is a description D consistent and coherent?
  Not if its extension D^I is empty for every possible relational structure I

Are D and D’ mutually disjoint?
  Yes if D^I ∩ D’^I = ∅ for every I

Are D and D’ equivalent?
  Yes if D^I = D’^I for every I

Does D subsume some other description D’?
  Yes if D^I ⊇ D’^I for every relational structure I

Inconsistency: and(C,D) ≡ NOTHING
Equivalence: D subsumes D’, and D’ subsumes D
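The definitions above quantify over every relational structure. The sketch below only illustrates them by brute force over all interpretations of two atomic concepts on a two-element domain; it is an assumed toy setup, not a DL reasoning procedure, and agreement on a bounded domain does not in general establish the property for all structures. The constructors `atom`, `and_`, and `not_` are hypothetical helpers.

```python
from itertools import chain, combinations, product

DOMAIN = (0, 1)      # tiny assumed domain; real DL reasoners do not enumerate models
NAMES = ("A", "B")   # atomic concepts used in this example

def subsets(items):
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def interpretations(names):
    """Yield every assignment of a subset of DOMAIN to each atomic concept."""
    for exts in product(subsets(DOMAIN), repeat=len(names)):
        yield dict(zip(names, exts))

# A description is a function from an interpretation I to its extension D^I.
def atom(name):
    return lambda I: I[name]

def and_(d1, d2):
    return lambda I: d1(I) & d2(I)

def not_(d):
    return lambda I: frozenset(DOMAIN) - d(I)

def subsumes(d, d_prime):
    return all(d(I) >= d_prime(I) for I in interpretations(NAMES))

def disjoint(d, d_prime):
    return all(not (d(I) & d_prime(I)) for I in interpretations(NAMES))

def equivalent(d, d_prime):
    return subsumes(d, d_prime) and subsumes(d_prime, d)

def coherent(d):
    return any(d(I) for I in interpretations(NAMES))

A, B = atom("A"), atom("B")
print(subsumes(A, and_(A, B)))             # True
print(disjoint(A, not_(A)))                # True
print(equivalent(and_(A, B), and_(B, A)))  # True
print(coherent(and_(A, not_(A))))          # False
```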


DL Example

class STUDENT is-a PERSON with
  studNumber: int, key;
  level: {1,2,3,4}

and(PERSON,
    all(studNumber, INTEGER), at-least(1,studNumber), at-most(1,studNumber),
    all(level, one-of(1,2,3,4)), at-least(1,level), at-most(1,level),
    at-most(1, compose(studNumber, inverse(studNumber))))

ENROLLMENT := and(
    all(st,STUDENT), at-least(1,st), at-most(1,st),
    all(crs,COURSE), at-least(1,crs), at-most(1,crs),
    all(when,DATE), at-least(1,when), at-most(1,when))

STUDENT := and(all(inverse(st), ENROLLMENT), at-least(1, inverse(st)), at-most(6, inverse(st)))

COURSE := and(all(inverse(crs), ENROLLMENT), at-least(1, inverse(crs)), at-most(300, inverse(crs)))

INSERT-IN(Cs431, COURSE).
FILL-WITH(Cs431, taughtBy, Einstein).
FILL-WITH(Cs431, takers, Anna).


More on DLs

We can have both primitive classes (equivalent to extensional relations) and virtual ones
  But we can make assertions over virtual classes that directly impact the primitive ones
  Contrast with updates to views in databases

Many different levels of expressiveness in different DLs

Comparison with Datalog:
  Both are subsets of FOL, with some limitations
  DLs allow bidirectional inference; Datalog is unidirectional
  DLs are equivalent to at most FOL with ≤ 3 variables; Datalog has an unbounded number of existential variables


Coming Back to the SW

Lots of work on OWL, the Web Ontology Language

Based on different levels of DLs:
  OWL Lite – classification hierarchy, simple constraints (cardinalities 0 or 1)
  OWL DL – maximum expressiveness while retaining computational completeness (always decidable and terminating)
  OWL Full – no computational guarantees, allows classes as instances of other classes

Goal: each community builds an ontology
  But how to relate ontologies?
  “equivalentClass”, “equivalentProperty”, “sameAs”
  Is this enough???


The Data Management Argument

The Semantic Web is all about integration and translation

But there’s no notion of translation in the SW, except for equivalences

“Semantic normalization”???

Does DB research have something to add? If so, what needs to change?


Database Approaches to Semantic Integration

Data warehouse
  Design a single schema
  Do physical DB design
  Map data into warehouse schema
  Periodically update warehouse

Virtual data integration (EII)
  Design mediated schema
  Map sources to mediated schema
  Queries are rewritten and answered on demand from sources

[Figure: a query is posed to the data integration system over the mediated schema; demand-driven wrappers return data in a common format from the XML and relational sources]


A Single Centralized Schema is a Bottleneck!

Challenging to form a single schema for all domain data
  People don’t agree on how concepts should be represented
  Data warehouse: physical design is a strong consideration
  Mediated schema very different from original users’ schemas

Mappings may be challenging to create, and do not leverage work of previous source mappings
  Each source gets mapped to mediated schema separately

Difficult to evolve this single schema as needs change
  May “break” existing queries
  Must build consensus for any schema changes


Peer Data Management: Decentralized Mediation for Ad Hoc Extensibility

Data integration: 1 mediated schema, m mappings to sources

Peer data management system (PDMS):
  n mediated “peer schemas,” as few as (n - 1) mappings between them, evaluated transitively
  m mappings to sources

[Figure: “DB Projects” example with peers UPenn, UW, Stanford, IIT Mumbai]
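The claim that as few as (n - 1) mappings can suffice when they are evaluated transitively can be sketched as a reachability check over the graph of peer mappings. The peer names come from the slide, but the chain topology and the assumption that every mapping is usable in both directions are made up for this example.

```python
from collections import deque

def reachable_peers(start, mappings):
    """Return the peers reachable from `start` by following mappings transitively."""
    neighbors = {}
    for a, b in mappings:
        # Assumption: each mapping can be traversed in both directions.
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        peer = queue.popleft()
        for nxt in neighbors.get(peer, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

peers = ["UPenn", "UW", "Stanford", "IIT Mumbai"]
# n = 4 peers connected by n - 1 = 3 mappings (a hypothetical chain).
chain_mappings = [("UPenn", "UW"), ("UW", "Stanford"), ("Stanford", "IIT Mumbai")]

print(reachable_peers("UPenn", chain_mappings) == set(peers))  # True
```

Dropping any one of the three mappings disconnects the chain, which is why n - 1 is the minimum for n peers.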


Peer-to-Peer at both Logical and Architectural Levels

A “logical” peer-to-peer model: every participant can contribute
  Extensional data
  Mappings between schemas
  Computation (query answering) and caching

Can we do a database (say, XML) version of the SW?


RDF vs. XML

RDF explicitly names relationships:
  (book, title, “ABC”)
  (book, writtenBy, author)
  (author, name, “John Smith”)

XML does not always:
  1. <book>
       <title>ABC</title>
       <writtenBy>
         <author><name>John Smith</name></author>
       </writtenBy>
     </book>

  2. <book>
       <title>ABC</title>
       <author>John Smith</author>
     </book>

[Figure: RDF graph with nodes book and author; edges title (from book), writtenBy (book to author), and name (from author)]
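The contrast can be tried out in a few lines. The sketch below stores the three RDF triples as tuples and parses XML form 2 with Python's standard `xml.etree.ElementTree`; the helper `objects` and the use of the strings "book" and "author" as node identifiers are assumptions for this example.

```python
import xml.etree.ElementTree as ET

# RDF-style triples from the slide; "book" and "author" stand in for node identifiers.
triples = [
    ("book", "title", "ABC"),
    ("book", "writtenBy", "author"),
    ("author", "name", "John Smith"),
]

def objects(subject, predicate):
    """Follow one explicitly named relationship in the triple set."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]

# In RDF the book-author relationship has a name that a query can ask for.
authors = objects("book", "writtenBy")
names = [n for a in authors for n in objects(a, "name")]
print(names)  # ['John Smith']

# XML form 2: the relationship is only implied by nesting <author> under <book>.
xml_form_2 = "<book><title>ABC</title><author>John Smith</author></book>"
book = ET.fromstring(xml_form_2)
print([a.text for a in book.findall("author")])  # ['John Smith']
print(book.find("writtenBy"))                    # None: no element names the relationship
```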


RDF vs. XML 2

RDF is subject-neutral (a graph)

XML centers around a subject (a tree):
  1. <book>
       <title>ABC</title>
       <author>John Smith</author>
     </book>

  2. <author>
       <name>John Smith</name>
       <book>ABC</book>
     </author>

This may result in duplication of contained objects


An XML Version of the Semantic Web

Data model: XML + Schema
  Vast volumes of data already in XML (or exported as XML)
  CAVEAT: not all relationships are labeled in XML (“XML has no semantics.”)

Concepts: views ≈ classes; schemas ≈ ontologies
  Views define membership via queries; can reason about containment
  CAVEAT: less expressive than OWL classes

Schema mappings: target schema as query over source
  Sophisticated reasoning about mappings is possible by extending existing data integration techniques
  Can use mappings in “forward” and “reverse” directions
  Allows for “chaining” of mappings to answer queries


Let’s Start with the Relational Model and then Extend

GAV: mediated relations as views over sources
  Easy to rewrite queries: unfold them using view definitions
  Mediated schema T1, …
    MST1(X’) :- S1(X), …
    MST2(Y’) :- S2(Y), …

LAV: sources as views over mediated relations
  More challenging to rewrite queries: answering queries using views (e.g., MiniCon [Pottinger & Levy 00])
  More flexible in representing source properties
  Mediated schema T1, …
    S1(X’) ⊆ MST1(X), …
    S2(Y’) ⊆ MST1(Y), …

[Figure: in both cases the mediated schema sits above the sources S1(X) and S2(Y)]
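GAV rewriting by unfolding can be sketched in a few lines: each atom over a mediated relation is replaced by the body of its view definition, with the head variables bound to the atom's arguments. The relation names (`Book`, `S1`, `S2`), the tuple encoding of atoms, and the helper functions are hypothetical; terms are treated as opaque strings, and view bodies are assumed to contain only variables.

```python
# A GAV mapping defines a mediated relation as a view over source relations.
# Assumed view:  Book(t, a) :- S1(t, i), S2(i, a)
GAV_VIEWS = {
    "Book": (("t", "a"), [("S1", ("t", "i")), ("S2", ("i", "a"))]),
}

def unfold_atom(atom, suffix):
    """Replace a mediated-schema atom by the body of its view definition."""
    pred, args = atom
    if pred not in GAV_VIEWS:
        return [atom]                       # already a source relation
    head_vars, body = GAV_VIEWS[pred]
    subst = dict(zip(head_vars, args))      # bind head variables to the atom's arguments

    def rename(term):
        # Variables that occur only in the view body get a fresh, suffixed name.
        return subst.get(term, f"{term}_{suffix}")

    return [(p, tuple(rename(t) for t in terms)) for p, terms in body]

def unfold_query(body):
    """Unfold every atom of a conjunctive query body."""
    unfolded = []
    for i, atom in enumerate(body, start=1):
        unfolded.extend(unfold_atom(atom, i))
    return unfolded

# Q(x) :- Book(x, smith)   unfolds to   Q(x) :- S1(x, i_1), S2(i_1, smith)
query_body = [("Book", ("x", "smith"))]
print(unfold_query(query_body))
# [('S1', ('x', 'i_1')), ('S2', ('i_1', 'smith'))]
```

LAV mappings cannot be handled this way, because the sources rather than the mediated relations appear as view heads; that is where answering-queries-using-views algorithms such as MiniCon come in.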


Answering Queries in a PDMS: Transitively Evaluating Mappings

Start with schema being queried
  Look up mappings to neighbors; expand
  Continue iteratively until queries are only over sources

Mappings in a PDMS may be a combination of LAV, GAV techniques:
  General form: p1a(X, Y), p1b(Y, Z), … = p2a(Y, X), p2b(X, Z), …
  (see paper for examination of what is actually tractable)
  Requires unfolding and answering queries using views (AQUV)

We use a rule-goal “tree” to expand the mappings
  Extend some of the ideas of MiniCon to avoid unnecessary expansions
  Challenges to avoid redundancy (see paper for optimizations)


Example of Query Answering

Mappings between peers’ schemas:
  r0: SameProject(a1,a2,p) :- ProjMember(a1,p), ProjMember(a2,p)
  r1: CoAuthor(a1,a2) ⊆ Author(a1,w), Author(a2,w)

Mappings to data sources:
  r2: S1(a,p,s) ⊆ ProjMember(a,p), Sched(a,s,end)
  r3: CoAuthor(f1,f2) = S2(f1,f2)

Query: Q(a1, a2) :- SameProject(a1,a2,p), Author(a1,w), Author(a2,w)

[Figure: peer relations SameProject(a1,a2,p), ProjMember(a1,p), Sched(f,s,e), CoAuthor(a1,a2), and Author(a,w) above sources S1 and S2, connected by mappings r0, r1, r2, r3]


Example Rule-Goal Tree Expansion

q: Q(a1, a2) :- SameProject(a1,a2,p), Author(a1,w), Author(a2,w)

q


Example Rule-Goal Tree Expansion

q: Q(a1, a2) :- SameProject(a1,a2,p), Author(a1,w), Author(a2,w)

Rule-goal tree:
  q
    SameProject(a1,a2,p)
    Author(a1,w)
    Author(a2,w)


Example Rule-Goal Tree Expansion

q: Q(a1, a2) :- SameProject(a1,a2,p), Author(a1,w), Author(a2,w)

Rule-goal tree:
  q
    SameProject(a1,a2,p)
    Author(a1,w)
    Author(a2,w)

Mappings between peers’ schemas:
  r0: SameProject(a1,a2,p) :- ProjMember(a1,p), ProjMember(a2,p)
  r1: CoAuthor(a1,a2) ⊆ Author(a1,w), Author(a2,w)


Example Rule-Goal Tree Expansion

q: Q(a1, a2) :- SameProject(a1,a2,p), Author(a1,w), Author(a2,w)

Rule-goal tree:
  q
    SameProject(a1,a2,p)
      r0
    Author(a1,w)
      r1
    Author(a2,w)
      r1

Mappings between peers’ schemas:
  r0: SameProject(a1,a2,p) :- ProjMember(a1,p), ProjMember(a2,p)
  r1: CoAuthor(a1,a2) ⊆ Author(a1,w), Author(a2,w)


Example Rule-Goal Tree Expansion

q: Q(a1, a2) :- SameProject(a1,a2,p), Author(a1,w), Author(a2,w)

Rule-goal tree:
  q
    SameProject(a1,a2,p)
      r0
        ProjMember(a1,p)
        ProjMember(a2,p)
    Author(a1,w)
      r1
        CoAuthor(a1,a2)
    Author(a2,w)
      r1
        CoAuthor(a2,a1)

Mappings to data sources:
  r2: S1(a,p,s) ⊆ ProjMember(a,p), Sched(a,s,end)
  r3: CoAuthor(f1,f2) = S2(f1,f2)


Example Rule-Goal Tree Expansion

q: Q(a1, a2) :- SameProject(a1,a2,p), Author(a1,w), Author(a2,w)

Rule-goal tree:
  q
    SameProject(a1,a2,p)
      r0
        ProjMember(a1,p)
          r2
        ProjMember(a2,p)
          r2
    Author(a1,w)
      r1
        CoAuthor(a1,a2)
          r3
    Author(a2,w)
      r1
        CoAuthor(a2,a1)
          r3

Mappings to data sources:
  r2: S1(a,p,s) ⊆ ProjMember(a,p), Sched(a,s,end)
  r3: CoAuthor(f1,f2) = S2(f1,f2)


Example Rule-Goal Tree Expansion

q: Q(a1, a2) :- SameProject(a1,a2,p), Author(a1,w), Author(a2,w)

Rule-goal tree:
  q
    SameProject(a1,a2,p)
      r0
        ProjMember(a1,p)
          r2
            S1(a1,p,_)
        ProjMember(a2,p)
          r2
            S1(a2,p,_)
    Author(a1,w)
      r1
        CoAuthor(a1,a2)
          r3
            S2(a1,a2)
    Author(a2,w)
      r1
        CoAuthor(a2,a1)
          r3
            S2(a2,a1)


Example Rule-Goal Tree Expansion

q: Q(a1, a2) :- SameProject(a1,a2,p), Author(a1,w), Author(a2,w)

Rule-goal tree:
  q
    SameProject(a1,a2,p)
      r0
        ProjMember(a1,p)
          r2
            S1(a1,p,_)
        ProjMember(a2,p)
          r2
            S1(a2,p,_)
    Author(a1,w)
      r1
        CoAuthor(a1,a2)
          r3
            S2(a1,a2)
    Author(a2,w)
      r1
        CoAuthor(a2,a1)
          r3
            S2(a2,a1)

Resulting reformulation over the sources:
  Q’(a1,a2) :- S1(a1,p,_), S1(a2,p,_), S2(a1,a2)
             ∪ S1(a1,p,_), S1(a2,p,_), S2(a2,a1)
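The reformulated query Q’ can be read as a union of two conjunctive queries over the sources S1 and S2. The sketch below evaluates that union over a small, invented instance; the tuples and the function name `q_prime` are assumptions for illustration and say nothing about how Piazza executes queries.

```python
# Hypothetical source instances; the tuples are made up for illustration.
S1 = {("ann", "piazza", "mon"), ("bob", "piazza", "tue"), ("cat", "orchestra", "wed")}
S2 = {("bob", "ann"), ("ann", "cat")}

def q_prime(s1, s2):
    """Union of the two conjunctive rewritings of Q over the sources S1 and S2."""
    # S1(a1,p,_), S1(a2,p,_): pairs of people that share a project p.
    same_project = {(a1, a2) for (a1, p1, _) in s1 for (a2, p2, _) in s1 if p1 == p2}
    first = {(a1, a2) for (a1, a2) in same_project if (a1, a2) in s2}   # ..., S2(a1,a2)
    second = {(a1, a2) for (a1, a2) in same_project if (a2, a1) in s2}  # ..., S2(a2,a1)
    return first | second

print(sorted(q_prime(S1, S2)))  # [('ann', 'bob'), ('bob', 'ann')]
```

The pair (ann, cat) appears in S2 but is not returned, because ann and cat do not share a project in S1.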


Stepping up to XML (WWW03)

Goals:
  Build on XQuery and XML (extended with RDF-style identity, following lead of [Patel-Schneider & Simeon 02])
  Remain computationally inexpensive
  Capture the common mapping types

Directional mapping language based on templates:
  <output>
    {: $var IN document("doc")/path WHERE condition :}
    <tag>$var</tag>
  </output>

  Translates between parts of data instances
  Restricted subset of XQuery that’s decidable to reason about
  Supports special annotations and object fusion

Can map XML-XML, XML-RDF, RDF-XML (at data level)


Mapping Example between XML Schemas

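Target: pubs
  book*
    title
    author*
      name

Source: authors
  author*
    full-name
    publication*
      title
      pub-type

[Figure: graph with nodes publication and author; edges title and pub-type (from publication), writtenBy (publication to author), and name (from author)]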


Example Piazza Mapping

<pubs>
  <book piazza:id={$t}>
    {: $a IN document("…")/authors/author,
       $an IN $a/full-name,
       $t IN $a/publication/title,
       $typ IN $a/publication/pub-type
       WHERE $typ = "book"
       PROPERTY $t >= 'A' AND $t < 'B' :}
    <title>{$t}</title>
    <author><name>{$an}</name></author>
  </book>
</pubs>
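To get a feel for what this mapping produces, the sketch below performs a comparable transformation with Python's standard `xml.etree.ElementTree` on a made-up source document. It is an approximation under stated assumptions, not the Piazza engine: title and pub-type are read from the same publication, the `piazza:id={$t}` annotation is mimicked by fusing books with the same title, and the PROPERTY annotation is not modeled.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document following the authors/author/publication schema.
SOURCE = """
<authors>
  <author>
    <full-name>John Smith</full-name>
    <publication><title>ABC</title><pub-type>book</pub-type></publication>
    <publication><title>A Survey</title><pub-type>article</pub-type></publication>
  </author>
  <author>
    <full-name>Jane Doe</full-name>
    <publication><title>ABC</title><pub-type>book</pub-type></publication>
  </author>
</authors>
"""

def map_to_pubs(source_xml):
    """Build <pubs> with one <book> per distinct title, fusing authors by title."""
    pubs = ET.Element("pubs")
    books = {}  # title -> <book> element, standing in for piazza:id={$t}
    for author in ET.fromstring(source_xml).findall("author"):
        name = author.findtext("full-name")
        for pub in author.findall("publication"):
            # Assumption: title and pub-type come from the same publication.
            if pub.findtext("pub-type") != "book":
                continue
            title = pub.findtext("title")
            book = books.get(title)
            if book is None:
                book = ET.SubElement(pubs, "book")
                ET.SubElement(book, "title").text = title
                books[title] = book
            ET.SubElement(ET.SubElement(book, "author"), "name").text = name
    return pubs

result = map_to_pubs(SOURCE)
print(ET.tostring(result, encoding="unicode"))
```

With this input, the two source authors of "ABC" end up under a single book element, which is the effect object fusion is meant to achieve; the article is filtered out by the pub-type condition.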


Challenges

Query reformulation for XML is significantly harder
  Hierarchy, 1:n schema constraints, ability to map from values to tags, …
  Can only do roughly the XML equivalent of conjunctive queries

See the WWW03 paper (plus later work by Yu and Popa, Deutsch et al., many others) for details


What about Values?

Thus far, we’ve focused on schema mappings

Almost as important in the real world: mappings of values to values
  Proteins to binding sites
  SSNs to customer IDs
  etc.

The Hyperion system (KAM 03) focuses on computing transitive relationships between mappings
  In many cases, we only have partial transitive mappings
  Key idea: divide all of the mappings into partitions, each of which can compute transitive closures separately
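Composing value mappings transitively can be illustrated with two small mapping tables. This is a generic composition sketch, not Hyperion's algorithm; the table names, identifiers, and the `compose` helper are invented for the example. It also shows why transitive mappings are often partial: a value with no entry in the second table simply drops out.

```python
def compose(m1, m2):
    """Compose two value-mapping tables: x maps to z when x -> y in m1 and y -> z in m2."""
    return {
        x: {z for y in ys for z in m2.get(y, ())}
        for x, ys in m1.items()
        if any(y in m2 for y in ys)   # keep only values that can be carried through
    }

# Hypothetical mapping tables; all identifiers are invented.
ssn_to_customer = {"123-45-6789": {"C17"}, "987-65-4321": {"C42"}}
customer_to_account = {"C17": {"A1", "A2"}}   # partial: C42 has no entry

ssn_to_account = compose(ssn_to_customer, customer_to_account)
print(ssn_to_account)
```

Only the first SSN survives the composition, mapped to both accounts of customer C17.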


Assessment: The Semantic Web

The KB world focuses on expressively capturing concepts

The DB world focuses on integrating and restructuring data (but views are less expressive in certain ways)

Does either of these seem likely to change the world?

What barriers need to be removed?