CSE 636 Data Integration
Limited Source Capabilities
Slides by Hector Garcia-Molina
Heterogeneous Databases
[Figure: a distributed database system over heterogeneous data sources: DBMS1, DBMS2, a legacy system, and a web site, each with its own data.]
Limited Capabilities
Example: Amazon.com
Search form attributes: author, title, subject, format, price
Annotations on the form (figure):
• must specify at least one of these
• this attribute not returned
• cannot query on this attribute
• menu of choices
Example: BarnesAndNoble.com
Search form attributes: author, title, subject, format, price
Annotations on the form (figure):
• must specify at least one of these
• can query if one of other attributes specified
• menu of choices
Why Limited Capabilities?
• Search forms
• Security
• Indexes
• Legacy
Capability vs. Content
• Capability description
– Can only search for subject = “art,” “history,” “science”
• Content description
– Source only contains subject = “art,” “history,” “science”
Outline
• Describing source capabilities
• Extending source capabilities
• How mediators cope with limited capabilities
• Mediator capabilities
• Other topics
[Figure: a mediator on top of two wrappers, each wrapping one source.]
Describing Query Capabilities
R(X, Y, ... Z)
Adornments:
• f: may or may not specify
• u: cannot be specified
• b: must be specified
• c[S]: specified from list S
• o[S]: optional, chosen from S
With output restriction:
• f’
• u’
• b’
• c’[S]
• o’[S]
Example
• Relation R(X, Y, Z)
• Description templates: bu’f, uf’c[z1, z2]
• Answerable queries: R(x1, Y, Z), R(X, Y, z1)
• Unanswerable queries: R(X, y1, Z), R(X, Y, z3)
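The example above can be checked mechanically. The following is a minimal Python sketch, not the notation of any particular system: the names `attribute_ok`, `matches`, `answerable`, and `TEMPLATES` are hypothetical, a query is a tuple with `None` for an unspecified attribute, and the primed (output-restriction) marks are assumed not to affect which inputs a template accepts, so they are dropped.

```python
def attribute_ok(adornment, value):
    """Check one attribute value against one adornment (kind, allowed_values)."""
    kind, allowed = adornment
    bound = value is not None
    if kind == "f":            # may or may not be specified
        return True
    if kind == "u":            # cannot be specified
        return not bound
    if kind == "b":            # must be specified
        return bound
    if kind == "c":            # must be specified, from the list S
        return bound and value in allowed
    if kind == "o":            # optional, but if given must come from S
        return (not bound) or value in allowed
    raise ValueError(f"unknown adornment: {kind}")


def matches(template, query):
    """A query matches a template if every attribute satisfies its adornment."""
    return len(template) == len(query) and all(
        attribute_ok(a, v) for a, v in zip(template, query)
    )


def answerable(templates, query):
    """A query is answerable if at least one template accepts it."""
    return any(matches(t, query) for t in templates)


# Templates for R(X, Y, Z) from the example, with primes dropped (assumption).
TEMPLATES = [
    [("b", None), ("u", None), ("f", None)],                      # bu'f
    [("u", None), ("f", None), ("c", frozenset({"z1", "z2"}))],  # uf'c[z1, z2]
]

print(answerable(TEMPLATES, ("x1", None, None)))   # R(x1, Y, Z)
print(answerable(TEMPLATES, (None, None, "z1")))   # R(X, Y, z1)
print(answerable(TEMPLATES, (None, "y1", None)))   # R(X, y1, Z)
print(answerable(TEMPLATES, (None, None, "z3")))   # R(X, Y, z3)
```

Under these assumptions the first two queries are accepted and the last two are rejected, in line with the slide.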
Other Description Mechanisms
• Tsimmis
– Query templates
• Information Manifold
– capability records (# bound attrs, conditions ok, ...)
• Disco
• Garlic
– black box
• Context-free grammars
Extending Source Capabilities
Source: R(author, price, ...)
Template: b, u, ...
Query: author = “Freud” AND price > 10
Source Query: author = “Freud”
Wrapper Filter: price > 10
[Figure: a wrapper on top of the amazon source.]
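A minimal Python sketch of this wrapper behavior follows. Everything in it is illustrative: `BOOKS` is made-up data, `source_query` is a hypothetical stand-in for a source that only accepts a bound author, and `wrapper_query` is a hypothetical wrapper that pushes the supported condition to the source and applies the price condition itself.

```python
# Made-up catalog standing in for the source's data (assumption).
BOOKS = [
    {"author": "Freud", "title": "Book A", "price": 8},
    {"author": "Freud", "title": "Book B", "price": 15},
    {"author": "Jung", "title": "Book C", "price": 20},
]


def source_query(author):
    """Hypothetical source: author must be bound, price cannot be queried."""
    if author is None:
        raise ValueError("source requires author to be specified")
    return [book for book in BOOKS if book["author"] == author]


def wrapper_query(author, min_price):
    """Send the supported part to the source, then filter on price locally."""
    candidates = source_query(author)                              # Source Query
    return [book for book in candidates if book["price"] > min_price]  # Wrapper Filter


result = wrapper_query("Freud", 10)
print([book["title"] for book in result])
```

The cost of this approach is that the source returns every matching author row, including rows the wrapper then discards.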
Another Example
Query: (author = “Freud” OR author = “Jung”) AND price < 10
R(author, price, …)
• No disjunctive conditions
• Price can only be specified with author
Q1: author = “Freud” AND price < 10
Q2: author = “Jung” AND price < 10
Union Operation
[Figure: a wrapper on top of the Barnes&Noble source.]
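The splitting step can be sketched in Python as follows. This is an illustration under stated assumptions, not the wrapper of any real system: `BOOKS` is made-up data, `source_query` is a hypothetical source that takes exactly one author and an optional price bound, and `wrapper_query` is a hypothetical wrapper that issues one conjunctive query per author and unions the answers.

```python
# Made-up catalog standing in for the source's data (assumption).
BOOKS = [
    {"author": "Freud", "title": "Book A", "price": 8},
    {"author": "Freud", "title": "Book B", "price": 15},
    {"author": "Jung", "title": "Book C", "price": 9},
    {"author": "Adler", "title": "Book D", "price": 5},
]


def source_query(author, max_price=None):
    """Hypothetical source: one author per query; price only together with author."""
    if author is None:
        raise ValueError("source requires author to be specified")
    rows = [book for book in BOOKS if book["author"] == author]
    if max_price is not None:
        rows = [book for book in rows if book["price"] < max_price]
    return rows


def wrapper_query(authors, max_price):
    """Split the disjunction into one source query per author, then union."""
    seen = set()
    answer = []
    for author in authors:                      # Q1, Q2, ...
        for book in source_query(author, max_price):
            key = (book["author"], book["title"])
            if key not in seen:                 # union: drop duplicates
                seen.add(key)
                answer.append(book)
    return answer


result = wrapper_query(["Freud", "Jung"], 10)
print([book["title"] for book in result])
```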
Extending Source Capabilities
• General scheme:
– try many query rewritings
– check if query fragments supported by source
– check if wrapper can combine answer fragments
– do all this very efficiently!!
– H. Garcia-Molina, W. Labio, R. Yerneni: Capability-Sensitive Query Processing on Internet Sources, ICDE 1999
• Tsimmis, Info Manifold: no disjunctive queries
• DISCO: no query splitting
• Garlic: only CNF queries
Mediator Processing
R(X, Y, Z) f, f, b
T(Z, W, U) f, u, b
M(X, Y, Z, W, U) = Join(R, T)
Query: M(5, Y, Z, W, 3)
[Figure: a mediator on top of two wrappers, each wrapping one source.]
Plan 1
(1) R(5, Y, Z)
(2) T(Z, W, 3)
(3) Join answers
Plan 2
(1) P = T(Z, W, 3)
(2) for each (z, w, u) in P: R(5, Y, z)
(3) Join answers
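The difference between the two plans can be sketched in Python. This is a hedged illustration only: the data in `R_DATA` and `T_DATA` is made up, and `feasible`, `ask`, and `plan2` are hypothetical helpers. Reading the templates as given (R: f, f, b and T: f, u, b), the sketch treats a source query as unsupported when a `b` attribute is left unbound or a `u` attribute is specified; under that reading, the R query of Plan 1 leaves Z unbound, while Plan 2 binds Z with values obtained from T.

```python
R_TEMPLATE = ("f", "f", "b")   # R(X, Y, Z): Z must be specified
T_TEMPLATE = ("f", "u", "b")   # T(Z, W, U): W cannot be specified, U must be

# Made-up source contents (assumption).
R_DATA = [(5, "a", 1), (5, "b", 2), (6, "c", 1)]
T_DATA = [(1, "w1", 3), (2, "w2", 4), (7, "w3", 3)]


def feasible(template, query):
    """A query (None = unspecified) is feasible if it respects b and u."""
    for kind, value in zip(template, query):
        bound = value is not None
        if kind == "b" and not bound:
            return False
        if kind == "u" and bound:
            return False
    return True


def ask(data, template, query):
    """Hypothetical source call: reject unsupported queries, else select rows."""
    if not feasible(template, query):
        raise ValueError("query not supported by source")
    return [row for row in data
            if all(q is None or q == v for q, v in zip(query, row))]


def plan2(x, u):
    """Ask T first, then send one R query per Z value found (a bind join)."""
    p = ask(T_DATA, T_TEMPLATE, (None, None, u))           # (1) P = T(Z, W, u)
    answers = []
    for z, w, _ in p:                                       # (2) for each tuple in P
        for _, y, _ in ask(R_DATA, R_TEMPLATE, (x, None, z)):
            answers.append((x, y, z, w, u))                 # (3) join answers
    return answers


print(feasible(R_TEMPLATE, (5, None, None)))   # R(5, Y, Z) from Plan 1
print(feasible(T_TEMPLATE, (None, None, 3)))   # T(Z, W, 3)
print(plan2(5, 3))                             # M(5, Y, Z, W, 3) via Plan 2
```

The trade-off in this sketch is that Plan 2 issues one R query per tuple of P, so its cost grows with the size of the T answer.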
Mediator Plan Generation
• Need feasible and efficient plan
• Search space is huge
• Tsimmis, Info Manifold, Garlic:
– exponential algorithms
• Polynomial algorithms:
– often find optimal or near-optimal plan
– bounded performance
– R. Yerneni, C. Li, J. D. Ullman, H. Garcia-Molina: Optimizing Large Join Queries in Mediation Systems, ICDT 1999
Conclusion
• Not all sources are created equal!
• Need to
– describe what sources can do
– efficiently process queries with limited sources
– describe what mediators can do
– exploit content information
– deal with unavailable sources