Schema Mediation and Query Processing in Peer Data Management Systems
Presenter: Jie Zhao
Supervisor: Rachel Pottinger
Sept. 29, 2006
Preliminaries
- Datalog
  - Q(x) :- Airport(x, Vancouver), where Q(x) is the head and Airport(x, Vancouver) is the body
  - Airport:

    | Code | City      |
    |------|-----------|
    | SEA  | Seattle   |
    | YVR  | Vancouver |

- Mapping for heterogeneous schemas
  - Correspondences between two schemas
  - A medium for exchanging data, transferring queries, etc.
- PDMS (Peer Data Management System)
  - Each peer has a database
  - Peers can leave or join the network voluntarily
  - Mappings between some peers are provided
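
The Datalog query on this slide can be read as a selection over the Airport relation followed by a projection onto the code column. A minimal Python sketch of that reading follows; the names `AIRPORT` and `q` are chosen here for illustration and do not come from the talk.

```python
# Airport relation from the slide, as a set of (code, city) tuples.
AIRPORT = {("SEA", "Seattle"), ("YVR", "Vancouver")}


def q(airport):
    """Q(x) :- Airport(x, Vancouver): codes of airports whose city is Vancouver."""
    return {code for (code, city) in airport if city == "Vancouver"}
```

Calling `q(AIRPORT)` binds `x` to every code whose second column matches the constant in the body.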
A general query answering case in PDMS
[Figure: three peers, UBC, UW, and UT, each with a local database and a local schema. Mapping UBC_UW relates the UBC and UW schemas; Mapping UW_UT relates the UW and UT schemas.]
A general query answering case in PDMS
[Figure: a user poses query Q over the UBC schema. Query reformulation through Mapping UBC_UW and Mapping UW_UT gives query Q' over UW and query Q'' over UT. Figure labels: Q_UBC, Q_UW, Q_UT, Reformulated Results.]
Previous methods can only access the local schema
- Local Schema UW (over Local Database UW), assume relation: conf-paper(title, venue, year, pages)
- Local Schema UBC (over Local Database UBC), assume relation: conf-paper(title, venue, year, URL)
- The two schemas are related by Mapping UW_UBC
- Query that a UW user can ask:
    q(x) :- conf-paper(t, v, y, x).
- The user can never ask for information about URL.
What we'd like to improve…
- Want to access more information, e.g. URL
- Get rid of the restrictive query format, e.g. local schema only
- Improve the comprehensibility of the PDMS
- Reconsider the difficulties and complexity raised by mapping composition
- Make good use of indirect mapping information
We have a method for mediated schema creation in PDMS that solves all of these
Challenges
- How to create the mediated schema without a centralized authority?
- How to arrive at the same mediated schema wherever mediation starts?
- How can an automatically created mediated schema be comprehensible to users?
- How can human intervention be minimized?
- Where to store the mediated schema, and how to update it?
Related Work
- Bernstein et al.: a vision for incorporating database research into the P2P scenario
- Piazza project: provides a complete prototype for query answering in PDMS
- Fagin et al.: use second-order (SO) logic as the mapping language
- HePToX: XQuery reformulation
- Hyperion: uses both data-level and schema-level mappings to specify the correspondences between acquainted peers
- PeerDB: uses keywords as the basis for relation matching
Outline
- Semantics in Conjunctive Mappings
- Peer Schema Mediation
- Updating the mediated schema
- A study of Mapping composition
- Experimental Study
Introducing concepts into conjunctive mappings
- A conjunctive mapping has the following form:
    conf-paper(title,venue,yr) :- UW.conf-paper(title,venue,yr,pages)
    conf-paper(title,venue,yr) :- UBC.conf-paper(title,venue,yr,URL)
- IDB name: "conf-paper"
- Component: each Datalog query above is a component
- Subgoal: each relation in the body, e.g. "UW.conf-paper(title,venue,yr,pages)"
Introducing concepts into conjunctive mappings (Cont.)
- Intuitively, a concept describes the common object across different schemas
- Informally, two mappings CM1 and CM2 have the same concept if:
  - CM1 and CM2 have the same IDB names
  - Q1 and Q2, the queries constructed from the overlapping subgoals of CM1 and CM2, are equivalent
  - The subgoals are compatible
Introducing concepts into conjunctive mappings (Cont.)
- Mappings that express the same concept:
  - Mapping 1, from UW to UBC:
      Paper(title,venue) :- UW.paper(title,venue,yr,pages)
      Paper(title,venue) :- UBC.paper(title,venue,author,URL)
  - Mapping 2, from UBC to UT:
      Paper(title,author) :- UBC.paper(title,venue,author,URL)
      Paper(title,author) :- UT.paper(title,author,area)
- Mappings that do not express the same concept:
  - Mapping 1, from A to B:
      Manager(x, y) :- A.Mgr(x, y)
      Manager(x, y) :- B.Mgr1(x, y)
  - Mapping 2, from B to C:
      Manager(x) :- B.Mgr1(x, x)
      Manager(x) :- C.SelfMgr(x)
- Mapping compatibility check before merge
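
The two example pairs above can be told apart with a very rough compatibility test. The sketch below is not the check used in the talk: it assumes one body subgoal per component, and it replaces the query-equivalence condition with a simpler stand-in, namely that a relation shared by both mappings must show the same pattern of repeated variables in each. The names `Component`, `equality_pattern`, and `may_share_concept` are hypothetical.

```python
from typing import NamedTuple


class Component(NamedTuple):
    idb: str        # head predicate name, e.g. "Paper"
    relation: str   # body relation, e.g. "UBC.paper"
    args: tuple     # body argument names, in order


def equality_pattern(args):
    """Map each argument to the index of its first occurrence: (x, x) -> (0, 0)."""
    first = {}
    return tuple(first.setdefault(a, i) for i, a in enumerate(args))


def may_share_concept(cm1, cm2):
    """Simplified stand-in test over two mappings given as lists of Component."""
    # Condition 1 from the slide: the IDB names must agree.
    if {c.idb for c in cm1} != {c.idb for c in cm2}:
        return False
    # Find the overlapping subgoals (relations used by both mappings).
    by_relation = {c.relation: c for c in cm1}
    shared = [(by_relation[c.relation], c) for c in cm2 if c.relation in by_relation]
    if not shared:
        return False
    # Assumed compatibility rule: shared relations repeat variables identically.
    return all(equality_pattern(a.args) == equality_pattern(b.args) for a, b in shared)
```

Under this stand-in rule, the Paper mappings pass because UBC.paper is used with four distinct variables in both, while the Manager mappings fail because B.Mgr1(x, y) and B.Mgr1(x, x) constrain the relation differently.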
Outline
- Semantics in Conjunctive Mappings
- Peer Schema Mediation
- Updating the mediated schema
- A study of Mapping composition
- Experimental Study
Pottinger's Schema Mediation Algorithm for DIS
- Base of our approach
[Figure: Local Schema UW and Local Schema UBC, each over its local database, are related by Mapping UW_UBC. Mediated Schema M is related to them by Mapping M_UW and Mapping M_UBC.]
Peer Schema Mediation – How the system works
- Peers in the figure and what each holds:
  - Peer X (start): X, MapX_B, MapX_C
  - Peer B: B, MapX_B, MapB_C, MapB_D
  - Peer C: MapX_C, MapB_C
- t1: X creates Mt1 based on X, MapX_B, MapX_C
- t2: X sends Mt1 to B and to C
- t3: B checks and updates its local relation information in Mt1 based on B; C does the same based on C. Each then confirms or updates Mt1 to X
- t4: X gets the responses from B and C, and computes Mt4 containing X, B, C and MapX_B, MapX_C
- t5: X broadcasts Mt4 and the corresponding MappingTable
- Mapping with other peers (label shown twice in the figure)
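
The t1 to t5 message flow can be mimicked in a few lines. The sketch below is an in-memory toy under strong assumptions: a mediated schema is just a dictionary from peer name to that peer's relation names, messages are ordinary function-local values, and the merge step is a plain dictionary update. The function name `mediate` and the data layout are hypothetical, not taken from MePSys.

```python
def mediate(initiator, peers, mappings):
    """peers: name -> set of local relation names.
    mappings: set of frozenset({a, b}) pairs that have a mapping between them."""
    neighbours = sorted(
        p for m in mappings if initiator in m for p in m if p != initiator
    )
    # t1: the initiator drafts a mediated schema from its own information.
    draft = {initiator: set(peers[initiator])}
    # t2/t3: each acquaintance receives the draft and fills in its local relations.
    responses = {n: {**draft, n: set(peers[n])} for n in neighbours}
    # t4: the initiator folds every response into one mediated schema.
    merged = dict(draft)
    for response in responses.values():
        merged.update(response)
    # t5: broadcast, modelled here as every participant holding the same result.
    return {name: merged for name in [initiator, *neighbours]}
```

Only peers with a direct mapping to the initiator take part in this round, which mirrors the slide, where D is reachable from B but is not contacted by X.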
Schema Mediation Strategy
- As explained in the previous slide
- Merging two schemas is based on MappingTables
MappingTable creation
- Purpose:
  - Relate a relation in M for a concept to the subgoals from the mappings
  - Transform unstructured mapping information into a structured form
  - Make it easy to reconstruct the original mappings from the MappingTables
  - Indirect mapping information can easily be represented in a MappingTable; this is hard to do using mappings alone
- Example: [table shown on the original slide]
Merge Two MappingTables
- The MappingTable merging process follows these general principles:
  - Related attributes should be positioned in the same column
  - Unrelated attributes are in different columns
  - Overlapping local relations in the two MappingTables are how we determine the indirect mapping information
Merge Two MappingTables (Cont.)
M3: result of merging M1 and M2
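
The merging principles can be sketched with a toy representation. Here a MappingTable is assumed to be a list of columns, and each column a set of (relation, attribute) cells; this layout and the name `merge_tables` are assumptions made for illustration, since the slide's own table figure is not available. Columns that share a cell are fused, which is how an overlapping local relation carries indirect mapping information across the two tables.

```python
def merge_tables(t1, t2):
    """Merge two tables given as lists of columns (sets of (relation, attribute))."""
    merged = []
    for col in [*t1, *t2]:
        col = set(col)
        # Related attributes end up in one column: absorb every existing
        # column that shares at least one cell with the incoming one.
        for other in [c for c in merged if c & col]:
            col |= other
            merged.remove(other)
        merged.append(col)
    return merged
```

With the Paper mappings from the earlier slide, UBC.paper appears in both tables, so the title column of the merged table relates UW.paper to UT.paper even though no direct mapping between UW and UT was given.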
Compute GLAV Mappings for Each Local Peer
Query Reformulation
- Reformulate queries in both directions:
  - Q over E → Q' over M
  - Q' over M → Q over E
Information that each peer maintains in the system set-up phase
- Each peer E stores:
  - E's local database schema
  - A list of mappings between E and its acquaintances
  - A current version of the mediated schema M
  - The MappingTable set corresponding to M
  - GLAV mappings from M to E
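
The per-peer state listed above maps naturally onto a small record type. The sketch below is only one possible layout; the class name `PeerState`, its field names, and the choice of plain dictionaries and lists are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PeerState:
    # E's local database schema: relation name -> attribute names.
    local_schema: dict
    # Mappings between E and its acquaintances: acquaintance name -> mapping text.
    acquaintance_mappings: dict = field(default_factory=dict)
    # Current version of the mediated schema M.
    mediated_schema: dict = field(default_factory=dict)
    # MappingTable set corresponding to M.
    mapping_tables: list = field(default_factory=list)
    # GLAV mappings from M to E.
    glav_mappings: list = field(default_factory=list)

    def acquaintances(self):
        """Names of the peers this peer has a direct mapping with."""
        return sorted(self.acquaintance_mappings)
```

Using `default_factory` gives every peer its own empty containers, so updating one peer's MappingTables never leaks into another's.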
Outline
- Semantics in Conjunctive Mappings
- Peer Schema Mediation
- Updating the mediated schema
- A study of Mapping composition
- Experimental Study
Adding a Peer to the Network
- Some peers build applications over M after the system setup phase
- When a new peer joins, M changes; how should the already-built applications be handled?
- Keep transformation information so that old applications remain usable
- Figure panels: (a) Right after the system setup phase; (b) Sometime later, D joins…
Dropping a Peer from the Network
- Strategy One: a peer leaving the network triggers a schema mediation process from the very beginning
  - BAD: too much system work is spent on schema mediation alone
- Strategy Two: redo the schema mediation once every assigned period
  - Two ways to know X is leaving:
    1. X notifies another node before departure
    2. Another peer pings or communicates with X
  - BAD: the previously created mediated schema will be useless
- Strategy Three:
  - X leaves without notifying others
  - X's acquaintance Y will recognize that X has left
  - Y computes the new mediated schema
  - BAD:
    - Y needs to be able to recognize which relation in the MappingTable comes from X
    - Peers can easily lose connection with others
Dropping a Peer from the Network (Cont.)
- Strategy Four: X wants to leave:
  - X calculates a new mediated schema
  - X assigns its acquaintance another acquaintance from its acquaintance list
  - "Removal" operator: given M and the peer X that is to be removed, compute the remaining part
  - The removed part can be relations, or attributes in relations
- Good because:
  - All previously constructed applications remain available
  - All peers are still connected
  - No redundant work results: mediation does not start from the beginning
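
Strategy Four can be pictured with two small helpers. Both are sketches under assumptions that are not in the talk: a MappingTable is modelled as a list of columns, each a set of (relation, attribute) cells with relations named "Peer.relation", and the departing peer links its acquaintances to one another in sorted order. The names `remove_peer` and `reassign_acquaintances` are hypothetical.

```python
def remove_peer(table, peer):
    """Removal operator sketch: drop the peer's cells, then drop empty columns."""
    prefix = peer + "."
    remaining = [
        {cell for cell in col if not cell[0].startswith(prefix)} for col in table
    ]
    return [col for col in remaining if col]


def reassign_acquaintances(acq, peer):
    """acq: name -> set of acquaintance names. Relink the leaver's acquaintances."""
    friends = sorted(acq.pop(peer))
    for friend in friends:
        acq[friend].discard(peer)
    # Assumed policy: chain the former acquaintances so they stay connected.
    for a, b in zip(friends, friends[1:]):
        acq[a].add(b)
        acq[b].add(a)
    return acq
```

Columns that still hold cells from other peers survive the removal, which is why applications built on those parts of the mediated schema can keep working.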
Information that each peer maintains in the system steady state
- Each peer stores the following information:
  - Local schema
  - Mappings to its acquaintances
  - Current mediated schema, MappingTables, and mappings to its own schema
  - Previous versions of the mediated schema on which the local peer has built applications, and mappings from them to the new mediated schema
Outline
- Semantics in Conjunctive Mappings
- Peer Schema Mediation
- Updating the mediated schema
- A study of Mapping composition
- Experimental Study
A study of Mapping Composition
- MePSys only considers input mappings that are:
  - Mappings with the same concept
  - Free of such complicating factors as self-join and self-restrictive components
- Our approach recasts the problem of mapping composition as another problem: using the mediated schema to relate different schemas
Some facts
- [Madhavan and Halevy] The number of composed mappings does not depend on the number of input mappings
- [Madhavan and Halevy] The composition of finite mappings may result in an infinite set of composed mappings
- [Fagin et al.] The composition of two mappings in first-order logic might not be expressible in first-order logic
Analysis for the Study
- We compared Piazza, the SO logic algorithm, and MePSys
- Whether the Piazza method is expressive depends entirely on whether existential attributes in the second schema are mapped to the third schema
- The second-order logic mapping composition algorithm can handle cases with composed non-identical self-join components
  - However, the results are hard to understand
- MePSys does not handle patterns with self-restrictive components
  - Mappings in such patterns do not support concepts
- MePSys has yet to realize the mediation of schemas when mappings contain composed non-identical self-join components
- Aside from these two special groups of patterns, using the mediated schema to relate different sources is decidable.
Outline
- Semantics in Conjunctive Mappings
- Peer Schema Mediation
- Updating the mediated schema
- A study of Mapping composition
- Experimental Study
System Settings
- FreePastry
  - A P2P network layer using an efficient routing strategy
  - Each node maintains a routing table
  - Each node keeps track of its immediate neighbors
  - Provides the functionality of notifying applications of message arrival, node failures, etc.
- Emulab
  - Network emulation testbed
  - Access to different machines to emulate nodes in a real network
  - 900M memory with a 2992.787 MHz processor
- Input schemas and mappings
  - Input schema follows the TPC-H standard
  - Avg num of acquaintances per peer
  - Avg num of relations per peer schema
  - Avg num of attributes in a relation
Experiment 1: Schema Mediation in MePSys
Experiment 2: Query Reformulation
For queries with similar size (less than 1k), time can be decidable
Experiment 2: Query Reformulation (Cont.)
In the maximum case, reformulating a query 10 times takes only 2% of the total time
Experiment 3: Updating the Mediated Schema
- Computing a new mediated schema always takes less than 2% of the total time
- Updating takes almost no time
Our contributions
- MePSys, in which a mediated schema is created dynamically and any information in the network can be queried without additional global services
- An efficient algorithm, PSM, to create a mediated schema in PDMS and further create mappings to local sources
- The idea of automatically detecting specific concepts in mappings
- A study of how mapping composition impacts query reformulation with existing approaches
- A solution to the problem of updating the mediated schema
- Experiments on the efficiency and scalability of MePSys
Future Work
- Explore the semantic issues that arise when a broader range of mappings is considered, e.g., mappings with self-join, mappings with different IDB names, etc.
- Consider further optimization issues in the future system
- Design a better approach to updating the mediated schema for local schema evolution
Acknowledgement
Thank you!
Questions?