09/12/2003 Peer-to-Peer Information Systems – WS 03/04 1 Piazza: Data Management Infrastructure...


Page 1:

Piazza: Data Management Infrastructure for Semantic Web Applications

Alon Y. Halevy, Zachary G. Ives, Peter Mork, Igor Tatarinov

Speaker: Sergey Chernov

Tutor: Jens Graupmann

Page 2:

Outline

1. INTRODUCTION. SEMANTIC WEB.
2. PIAZZA: SYSTEM OVERVIEW
3. IMPLEMENTATION DETAILS
   3.1 MAPPING LANGUAGE
   3.2 QUERY ANSWERING ALGORITHM
4. CONCLUSIONS.

Page 3:

Introduction

► Goal: Data Integration and Knowledge Management

► Problem: Web data lacks machine-understandable semantics

► Solution: Semantic Web?

Page 4:

The Semantic Web*

► Web sites include structural annotations
    You can pose meaningful queries on them.
    Ontologies provide the semantic glue.
    Internal implementation of web sites left open.

► Agents perform tasks:
    Query one or more web sites
    Perform updates (e.g., set schedules)
    Coordinate actions
    Trust each other (or not).

► I.e., agents operating on a gigantic heterogeneous distributed database.

(*View by A. Halevy)

Page 5:

General requirements

► Robust infrastructure for querying
    Peer data management systems.

► Facilitate mapping between different structures.
    Need tools for:
      Locating relevant structures
      Easily joining the semantic web.

► Get data into structured form
    Should we worry about the legacy web?

Page 6:

Using views for specifying mappings

► Local-As-View (LAV). Data sources can be described as views over the mediated schema.

► Global-As-View (GAV). Mediated schema can be described as a set of views over the data sources.

[Figure: two diagrams, each showing a Mediated Schema connected to Site A, Site B and Site C]
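
The difference between the two styles can be sketched in Python. This is a minimal illustration, not Piazza code: the relation `Book(title, year)`, the site contents and the function names are all hypothetical, and the LAV descriptions are reduced to simple selection predicates. Under GAV the mediated relation is computed as a query over the sources; under LAV each source is described in terms of the mediated relation, and the mediator uses those descriptions to decide which sources are relevant to a query.

```python
# Hypothetical source data: each site stores (title, year) rows.
site_a_books = [("Database Systems", 2001), ("XML Basics", 2002)]
site_b_books = [("Data Integration", 1999)]
sources = {"site_a": site_a_books, "site_b": site_b_books}


# GAV: the mediated relation Book(title, year) is defined as a view
# (here a plain union) over the data sources.
def mediated_book_gav():
    return sorted(set(site_a_books) | set(site_b_books))


# LAV: each source is described as a view over the mediated schema.
# A description is a predicate on mediated Book tuples (an assumption made
# for brevity): Site A holds books from 2000 on, Site B holds older books.
lav_descriptions = {
    "site_a": lambda title, year: year >= 2000,
    "site_b": lambda title, year: year < 2000,
}


def relevant_sources_lav(year):
    """Return the sources whose LAV description admits books of `year`."""
    return sorted(name for name, view in lav_descriptions.items()
                  if view("any title", year))


def books_of_year_lav(year):
    """Answer 'books published in `year`' by asking only relevant sources."""
    rows = []
    for name in relevant_sources_lav(year):
        rows.extend(row for row in sources[name] if row[1] == year)
    return sorted(rows)


print(mediated_book_gav())
print(relevant_sources_lav(2002), books_of_year_lav(1999))
```

In the GAV function, adding a new site means editing the view definition; in the LAV variant it only means adding one more description, which is the usual argument for LAV when sources come and go.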

Page 7:

Mapping

► Mapping AB specifies the representation of structured data from the schema of node A in the schema of node B

[Figure: Site A, Site B, Site C and the Mediated Schema connected by the mappings “AB”, “BA”, “BC”, “CB”, “A-MS”, “MS-A”, “C-MS” and “MS-C”]

Page 8:

Piazza: Peer Data-Management System

► Goal: Large scale autonomous sharing of structured data

► Peer data management system (PDMS)
    Autonomous peers export data in their own schemas
    Pair-wise mappings between peers
    Generalization of a data integration system
    NOT a P2P file sharing system

Page 9:

Relationship of PDMS to…

► P2P overlay networks (the “Structured World”)

► Data integration systems (no central logical mediated schema)

► Federated databases (scale, ad-hoc nature)

► Distributed databases (no central administration)

Page 10:

Representing Data

► A spectrum of possibilities:
    Relational tables, some integrity constraints
    XML: can encode relational, hierarchical
      ► XQuery – emerging standard query language (SQL for XML)
    RDF: “XML on drugs”.
      ► Sees only the logic; ignores other aspects.
    DAML+OIL
      ► Full-blown knowledge representation language.

► They all have semantics; just different expressive powers.

► We keep the data simple. Mappings between data at different peers are more complex.

Page 11:

Peer Data Management

► Mappings are query expressions
    DbResearcher(x)   Researcher(x), Area(x,DB)
    DbResearcher(x), Office(x,DBLab) = DbLabMember(x)

[Figure: network of the peers DB Projects, MIT, UW, UCB and Stanford, each exporting its own relational schema. The peer schemas shown in the figure are listed below.]

Area(areaID, name, descr)
Project(projID, name, sponsor)
ProjArea(projID, areaID)
Pubs(pubID, projName, title, venue, year)
Author(pubID, author)
Member(projName, member)

Project(projID, name, descr)
Student(studID, name, status)
Faculty(facID, name, rank, office)
Advisor(facID, studID)
ProjMember(projID, memberID)
Paper(papID, title, forum, year)
Author(authorID, paperID)

Area(areaID, name, descr)
Project(projID, areaID, name)
Pub(pubID, title, venue, year)
PubAuthor(pubID, authorID)
PubProj(pubID, projID)
Member(memID, projID, name, pos)
Alumn(name, year, thesis)

Members(memID, name)
Projects(projID, name, startDate)
ProjFaculty(projID, facID)
ProjStudents(projID, studID)
…

Direction(dirID, name)
Project(pID, dirID, name)
…

Page 12:

Piazza mapping language (1)

► XML/XML Example

Target:

pubs
  book*
    title
    author*
      name
    publisher*
      name

Source:

authors
  author*
    full-name
    publication*
      title
      pub-type

<pubs>
  <book>
    {: $a IN document("source.xml")/authors/author,
       $t IN $a/publication/title,
       $typ IN $a/publication/pub-type
       WHERE $typ = "book" :}
    <title> { $t } </title>
    <author>
      <name> {: $a/full-name :} </name>
    </author>
  </book>
</pubs>
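
To see what this mapping produces, here is a minimal Python sketch that applies the same transformation with the standard `xml.etree.ElementTree` module. It is not the Piazza engine: the sample source document and the function name `apply_mapping` are hypothetical, and the sketch assumes that a title and a pub-type are paired within the same publication element, which appears to be the intended reading of the bindings.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document following the Source schema on the slide.
SOURCE_XML = """
<authors>
  <author>
    <full-name>Ann Author</full-name>
    <publication><title>Peer Data</title><pub-type>book</pub-type></publication>
    <publication><title>A Short Note</title><pub-type>article</pub-type></publication>
  </author>
  <author>
    <full-name>Bob Writer</full-name>
    <publication><title>Mappings</title><pub-type>book</pub-type></publication>
  </author>
</authors>
"""


def apply_mapping(source_xml):
    """Build the target <pubs> tree from a source <authors> document."""
    authors = ET.fromstring(source_xml)
    pubs = ET.Element("pubs")
    for a in authors.findall("author"):                  # binding $a
        for pub in a.findall("publication"):
            # Assumption: title and pub-type come from the same publication.
            if pub.findtext("pub-type") != "book":       # WHERE $typ = "book"
                continue
            book = ET.SubElement(pubs, "book")
            ET.SubElement(book, "title").text = pub.findtext("title")  # $t
            author = ET.SubElement(book, "author")
            ET.SubElement(author, "name").text = a.findtext("full-name")
    return pubs


target = apply_mapping(SOURCE_XML)
print(ET.tostring(target, encoding="unicode"))
```

Each satisfying binding yields its own `<book>` element, so the article by the first author is dropped and two books are emitted.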

Page 13:

Piazza mapping language (2)

► piazza:id attribute

Target:

pubs
  book*
    title
    author*
      name
    publisher*
      name

Source:

authors
  author*
    full-name
    publication*
      title
      pub-type

<pubs>
  <book piazza:id={$t}>
    {: $a IN document("source.xml")/authors/author,
       $t IN $a/publication/title,
       $typ IN $a/publication/pub-type
       WHERE $typ = "book" :}
    <title piazza:id={$t}> { $t } </title>
    <author piazza:id={$t}>
      <name> {: $a/full-name :} </name>
    </author>
  </book>
</pubs>

Page 14:

Piazza mapping language (3)

► Partial mapping

Target:

pubs
  book*
    title
    author*
      name
    publisher*
      name

Source:

authors
  author*
    full-name
    publication*
      title
      pub-type

<pubs>
  <book piazza:id={$t}>
    {: $a IN document("source.xml")/authors/author,
       $t IN $a/publication/title,
       $typ IN $a/publication/pub-type
       WHERE $typ = "book"
       PROPERTY $t >= 'A' AND $t < 'B' :}
    [: <publisher>
         <name>
           {: PROPERTY $this IN {"PrintersInc", "PubsInc"} :}
         </name>
       </publisher> :]
  </book>
</pubs>

Page 15:

Query Answering Algorithm

► Problem
    Evaluate query Q at P1 given a network of mappings

► Reformulate the query over all relevant peers
    Chaining of mappings using a combination of query composition and query rewriting

► Q_P1(x) :- DbResearcher(x)

Query Composition

► M: DbResearcher(x)   Researcher(x), Area(x,DB)
    Q_P2(x)   Researcher(x), Area(x,DB)

Query Rewriting

► M: DbResearcher(x), Office(x,DBLab) = DbLabMember(x)
    Q_P3(x)   DbLabMember(x)
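
The two reformulation steps can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the Piazza algorithm: atoms are plain tuples, every atom uses the single variable x so unification and variable renaming are left out, and the first mapping is read as "Researcher(x), Area(x,DB) implies DbResearcher(x)", since the operator between its two sides is not legible in the slide text. The function names `compose` and `rewrite` are hypothetical.

```python
# An atom is a tuple (predicate, arg1, ...); a query is a list of atoms.
Q_P1 = [("DbResearcher", "x")]

# Mapping used for composition. Assumed direction: the right-hand side
# implies DbResearcher(x), so it can be substituted for that atom.
COMPOSE = {"DbResearcher": [("Researcher", "x"), ("Area", "x", "DB")]}

# Mapping used for rewriting:
# DbResearcher(x), Office(x, DBLab) = DbLabMember(x)
VIEW_HEAD = ("DbLabMember", "x")
VIEW_BODY = [("DbResearcher", "x"), ("Office", "x", "DBLab")]


def compose(query, definitions):
    """Query composition: unfold every atom that has a definition."""
    out = []
    for atom in query:
        out.extend(definitions.get(atom[0], [atom]))
    return out


def rewrite(query, view_head, view_body):
    """Query rewriting: if every query atom occurs in the view body, the
    view head is a contained (sound, possibly incomplete) rewriting."""
    if all(atom in view_body for atom in query):
        return [view_head]
    return None


Q_P2 = compose(Q_P1, COMPOSE)
Q_P3 = rewrite(Q_P1, VIEW_HEAD, VIEW_BODY)
print(Q_P2)
print(Q_P3)
```

Composition replaces an atom by its definition, while rewriting goes the other way and answers the query from a view whose body covers it; the second result only returns the database researchers who also sit in the DB lab.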

Page 16:

Query Reformulation (1)

Mapping:

<S2>
  <people> {: $people = /S1/people :}
    <faculty> {: $name = $people/faculty/name/text() :}
      { $name }
    </faculty>
    <student> {: $student = $people/student/text() :}
      <name> { $student } </name>
      <advisor> {: $faculty = $people/faculty,
                   $name = $faculty/name/text(),
                   $advisee = $faculty/advisee/text()
                   where $advisee = $student :}
        { $name }
      </advisor>
    </student>
  </people>
</S2>

Query:

<result>
  { for $faculty in /S1/people/faculty,
        $name in $faculty/name/text(),
        $advisee in $faculty/advisee/text()
    where $name = "Ullman"
    return <student> {$advisee} </student> }
</result>

Page 17:

Query Reformulation (2)

Query:

<result>
  { for $faculty in /S1/people/faculty,
        $name in $faculty/name/text(),
        $advisee in $faculty/advisee/text()
    where $name = "Ullman"
    return <student> {$advisee} </student> }
</result>

[Figure: query tree pattern. S1 / people / faculty with children name and advisee, condition $name = “Ullman”, returning <result> with <student> {$advisee}]

[Figure: mapping tree pattern. <S2> / <people> built from S1 / people: <faculty> {$name} from faculty / name; <student> from student, with <name> {$student} and <advisor> {$name} taken from faculty (name, advisee) where $advisee = $student]

Page 18:

Query Reformulation (3)

[Figure: query tree pattern. S1 / people / faculty with children name and advisee, condition $name = “Ullman”, returning <result> with <student> {$advisee}]

[Figure: mapping tree pattern. <S2> / <people> built from S1 / people: <faculty> {$name} from faculty / name; <student> from student, with <name> {$student} and <advisor> {$name} taken from faculty (name, advisee) where $advisee = $student]

Reformulated query:

<result>
  { for $student in /S2/people/student,
        $advisor in $student/advisor/text(),
        $name in $student/name/text()
    where $advisor = "Ullman"
    return <student> { $name } </student> }
</result>
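
A quick way to check that the query over S1 and its reformulation over S2 agree is to run both on a small instance. The Python sketch below does this with the standard `xml.etree.ElementTree` module. The sample data (every name except "Ullman") and the function names are hypothetical, and S2 is simply materialised from S1 by following the mapping. The two answers coincide here because every advisee is also listed as a student in S1; without that, the reformulated query could return fewer results.

```python
import xml.etree.ElementTree as ET

# Hypothetical S1 instance; names other than "Ullman" are made up.
S1 = ET.fromstring("""
<S1><people>
  <faculty><name>Ullman</name><advisee>Ann</advisee><advisee>Bob</advisee></faculty>
  <faculty><name>Widom</name><advisee>Cem</advisee></faculty>
  <student>Ann</student><student>Bob</student><student>Cem</student>
</people></S1>
""")


def map_s1_to_s2(s1):
    """Materialise an S2 document from S1 following the S1-to-S2 mapping."""
    s2 = ET.Element("S2")
    for people in s1.findall("people"):
        out = ET.SubElement(s2, "people")
        for fac in people.findall("faculty"):
            ET.SubElement(out, "faculty").text = fac.findtext("name")
        for stud in people.findall("student"):
            node = ET.SubElement(out, "student")
            ET.SubElement(node, "name").text = stud.text
            for fac in people.findall("faculty"):
                # where $advisee = $student
                if any(a.text == stud.text for a in fac.findall("advisee")):
                    ET.SubElement(node, "advisor").text = fac.findtext("name")
    return s2


def query_over_s1(s1):
    """Original query: advisees of the faculty member named Ullman."""
    return sorted(a.text
                  for f in s1.findall("people/faculty")
                  if f.findtext("name") == "Ullman"
                  for a in f.findall("advisee"))


def query_over_s2(s2):
    """Reformulated query: names of students whose advisor is Ullman."""
    return sorted(s.findtext("name")
                  for s in s2.findall("people/student")
                  if any(adv.text == "Ullman" for adv in s.findall("advisor")))


S2 = map_s1_to_s2(S1)
print(query_over_s1(S1), query_over_s2(S2))
```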

Page 19:

Reformulation times

► Table 1: The test queries and their respective running times.

Query  Description                                        Reformulation time  # of reformulations
Q1     XML-related projects.                              0.5 sec             12
Q2     Co-authors who reviewed each other's work.         0.9 sec             25
Q3     PC members with a paper at the same conference.    0.2 sec             3
Q4     PC chairs of recent conferences + their projects.  0.5 sec             24
Q5     Conflicts-of-interest of PC members.               0.7 sec             36

Page 20:

Current and the Future

► Current status
    Demo scenario using XML
    Looking at real domains (Bio dbs, NASA dbs)

► Future Work
    More efficient reformulation algorithm
    Semantic network analysis – eliminate redundant mappings and inconsistent mappings
    Query caching to speed up query evaluation

Page 21:

Conclusions

► Mapping language for mapping between sets of XML source nodes with different document structures

► Architecture that uses the transitive closure of mappings to answer queries

► Algorithm for query answering over this transitive closure of mappings, which is able to follow mappings in both forward and reverse directions
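
The idea of following mappings transitively, in both directions, can be sketched as a graph traversal in Python. This is only an illustration of which peers become reachable, not of query reformulation itself. The peer names are borrowed from the example network, but the mapping edges and the function name `reachable_peers` are hypothetical.

```python
from collections import deque

# Hypothetical mapping graph: a pair (A, B) means a mapping from A to B exists.
MAPPINGS = [("UW", "DB Projects"), ("MIT", "DB Projects"),
            ("Stanford", "UW"), ("UCB", "Stanford")]


def reachable_peers(start, mappings):
    """Peers reachable from `start` when each mapping may be followed
    in the forward or in the reverse direction."""
    neighbours = {}
    for src, dst in mappings:
        neighbours.setdefault(src, set()).add(dst)   # forward direction
        neighbours.setdefault(dst, set()).add(src)   # reverse direction
    seen, queue = {start}, deque([start])
    while queue:                                     # breadth-first search
        peer = queue.popleft()
        for nxt in sorted(neighbours.get(peer, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


print(sorted(reachable_peers("MIT", MAPPINGS)))
```

With forward-only traversal MIT would reach just DB Projects; allowing the reverse direction is what brings UW, Stanford and UCB into scope for a query posed at MIT.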

Page 22:

Thank You!

Page 23:

Further literature

1. Alon Y. Halevy, Zachary G. Ives, Dan Suciu, Igor Tatarinov: Schema Mediation for Large-Scale Semantic Data Sharing

2. Igor Tatarinov, Zachary Ives, Jayant Madhavan, Alon Halevy, Dan Suciu, Nilesh Dalvi, Xin (Luna) Dong, Yana Kadiyska, Gerome Miklau, Peter Mork: The Piazza Peer Data Management Project

3. Alon Y. Halevy, Zachary G. Ives, Dan Suciu, Igor Tatarinov: Schema Mediation in Peer Data Management Systems

4. Alon Halevy, Oren Etzioni, AnHai Doan, Zachary Ives, Jayant Madhavan, Luke McDowell, Igor Tatarinov: Crossing the Structure Chasm

5. Madhan Arumugam, Amit Sheth, and I. Budak Arpinar: Towards Peer-to-Peer Semantic Web: A Distributed Environment for Sharing Semantic Knowledge on the Web

6. Hendler J., Berners-Lee T., Miller E.: Integrating Applications on the Semantic Web