On-the-fly Data Integration
09.05.2008
Mapping Data to Queries
Martin Hentschel
Systems Group, ETH Zurich
09.05.2008 Martin Hentschel/Systems Group, ETH Zurich/[email protected]
“…, but the real advantage of XML is precisely that it allows you to go from Point A to destinations unknown.”
-- Larry O’Brien, Microsoft
Goals
Integrate data from various data feeds
Light-weight
  Mapping rules
Easy to use
  Based on common language (XQuery)
Fast
  Implements research ideas (YFilter)
Targets
Health care
  Electronic health records (Health Level 7)
Finance
  Exchange of financial data (XBRL)
Web services
  News feeds
  Weather
Every domain that uses several data sources
Example
Find the most powerful car
<db>
  <car>
    <name>Ford</name>
    <hp>130</hp>
  </car>
</db>
<daten>
  <auto>
    <name>VW Golf</name>
    <ps>150</ps>
  </auto>
</daten>
daten is-a db;
auto is-a car;
ps is-a hp;
Apply standard XQuery
let $max := max(//hp)
for $car in //car
where $car/hp = $max
return $car
Result
<auto>
  <name>VW Golf</name>
  <ps>150</ps>
</auto>
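The example can be mimicked in a few lines of Python. This is only a sketch and not the MDQ implementation: it assumes that an is-a rule simply lets an element answer to one additional name, and the helper names (`RULES`, `answers_to`, `descendants`, `most_powerful`) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Mapping rules from the example: left-hand name is-a right-hand name.
RULES = {"daten": "db", "auto": "car", "ps": "hp"}

# The two data feeds from the example.
FEEDS = [
    "<db><car><name>Ford</name><hp>130</hp></car></db>",
    "<daten><auto><name>VW Golf</name><ps>150</ps></auto></daten>",
]


def answers_to(elem, name):
    """True if elem is called `name` or is mapped to `name` by a rule."""
    return elem.tag == name or RULES.get(elem.tag) == name


def descendants(roots, name):
    """Rough stand-in for //name evaluated over all feeds (assumption)."""
    return [e for root in roots for e in root.iter() if answers_to(e, name)]


def most_powerful(roots):
    """Mirrors: let $max := max(//hp) for $car in //car
    where $car/hp = $max return $car"""
    top = max(float(e.text) for e in descendants(roots, "hp"))
    return [
        car
        for car in descendants(roots, "car")
        if any(answers_to(c, "hp") and float(c.text) == top for c in car)
    ]


roots = [ET.fromstring(feed) for feed in FEEDS]
for car in most_powerful(roots):
    # The element comes back under its original name; it is not renamed to <car>.
    print(ET.tostring(car, encoding="unicode"))
```

The query itself never mentions `auto` or `ps`; only the rule table does, which is the point of the approach.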
Usage Scenarios
Continuous query processing
[Figure: a DSMS takes queries, rules and streaming input events and produces streaming output events]
Usage Scenarios
Publish/subscribe systems
[Figure: publishers send data to an enhanced broker that holds the rules; subscribers send subscriptions to the broker and receive data]
Usage Scenarios
Data integration
[Figure: Source 1, Source 2, ..., Source x send data to a data handler that holds the rules; the data handler passes homogeneous data to the company's data store]
The Is-A Rule
Map XML elements
car is-a vehicle;
Expresses a substitutability relationship
  Like in object-oriented design
  Use the car wherever vehicles are expected
It follows that //vehicle also returns car elements
  Returned as car
  Not transformed into vehicle
  Consistent with the OO approach
The Is-A Rule
Map path expressions
  XPath path expressions
  Left-hand side may include predicates
german/car is-a auto;
auto is-a german/car;
car[@ps < 100] is-a slow/vehicle;
The Is-A Rule
Specify contexts
  Element names could be used differently in different contexts
  Scope the applicability of rules
  Further refinement
car in cars[@country='Germany'] is-a auto;
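A small Python sketch of how such a context restricts a rule. It is an illustration only: the sample document is invented, `rule_applies` and `select_auto` are hypothetical helpers, and `in` is assumed to mean "direct child of", which the slide does not spell out.

```python
import xml.etree.ElementTree as ET

# Invented sample data: the same element name `car` in two contexts.
DOC = ET.fromstring(
    "<root>"
    "<cars country='Germany'><car><name>VW Golf</name></car></cars>"
    "<cars country='USA'><car><name>Ford</name></car></cars>"
    "</root>"
)


def rule_applies(parent, child):
    """car in cars[@country='Germany'] is-a auto;  (`in` read as parent/child)"""
    return (
        child.tag == "car"
        and parent.tag == "cars"
        and parent.get("country") == "Germany"
    )


def select_auto(root):
    """Rough stand-in for //auto: real auto elements plus cars in a matching context."""
    hits = [root] if root.tag == "auto" else []
    for parent in root.iter():
        for child in parent:
            if child.tag == "auto" or rule_applies(parent, child):
                hits.append(child)
    return hits


names = [e.find("name").text for e in select_auto(DOC)]
print(names)
```

Only the car inside the German `cars` element is treated as an `auto`; the other `car` is left alone.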
The Is-A Rule
Element construction
  Map elements
  Transform data, e.g. for integration of very diverse data
auto as $a is-a
  <car>
    <kw>{$a/ps * 0.74}</kw>
  </car>;
<car>
  <name>Ford</name>
  <kw>100</kw>
</car>
<auto>
  <name>VW Golf</name>
  <ps>150</ps>
</auto>
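Worked through for the VW Golf: 150 ps * 0.74 = 111 kW. The Python sketch below imitates the constructor rule; it is not how MDQ evaluates it. The function name `auto_as_car` and the rounding to a whole number are assumptions, and the sketch builds only the `kw` child because that is all the constructor in the rule lists.

```python
import xml.etree.ElementTree as ET

PS_TO_KW = 0.74  # conversion factor taken from the rule


def auto_as_car(auto):
    """auto as $a is-a <car><kw>{$a/ps * 0.74}</kw></car>;"""
    car = ET.Element("car")
    kw = ET.SubElement(car, "kw")
    # Rounding is an assumption made here to keep the value readable.
    kw.text = str(round(float(auto.findtext("ps")) * PS_TO_KW))
    return car


auto = ET.fromstring("<auto><name>VW Golf</name><ps>150</ps></auto>")
car = auto_as_car(auto)
print(ET.tostring(car, encoding="unicode"))
```

With the constructed value in place, the Golf (111 kW) can be compared directly with the Ford (100 kW).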
Implementation
Several possibilities
MDQ approach
  - Native approach, novel MDQ data model
  - Allows lazy execution
Query rewrite
  - E.g. //(car | auto | vehicle | ...)
  - Does not scale
Data translation
  - Translate input data
  - Big overhead
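To see why query rewrite grows with the rule set, here is a minimal Python sketch that expands one path step into a union. It assumes is-a rules chain transitively, which the slides do not state; the `truck` rule and the helper names `substitutes` and `rewrite_step` are invented for illustration.

```python
# (left-hand side, right-hand side) pairs; `truck is-a vehicle` is made up.
RULES = [("auto", "car"), ("car", "vehicle"), ("truck", "vehicle")]


def substitutes(name, rules):
    """All names that may stand in for `name`, following rules transitively."""
    found, todo = [name], [name]
    while todo:
        target = todo.pop()
        for lhs, rhs in rules:
            if rhs == target and lhs not in found:
                found.append(lhs)
                todo.append(lhs)
    return found


def rewrite_step(name, rules):
    """Rewrite the step //name into a union over every substitutable name."""
    names = substitutes(name, rules)
    if len(names) == 1:
        return "//" + names[0]
    return "//(" + " | ".join(names) + ")"


print(rewrite_step("vehicle", RULES))  # //(vehicle | car | truck | auto)
print(rewrite_step("hp", RULES))       # //hp
```

Every step of every query has to be expanded this way, so each added rule can lengthen many unions at once.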
MDQ Data Model
Classical XML tree model
<daten>
  <auto>
    <name>Golf</name>
    <ps>150</ps>
  </auto>
</daten>
[Figure: tree with element nodes daten, auto, name and ps, and text nodes "Golf" and "150"]
MDQ Data Model
MDQ data model
  Move names from nodes to edges
[Figure: the same document as a tree whose edges, rather than nodes, carry the names daten, auto, name and ps]
MDQ Data Model
Application of mapping rules
daten is-a db;
auto is-a car;
ps is-a hp;
[Figure: the edge-labelled tree after applying the rules, with db, car and hp shown alongside daten, auto and ps]
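A rough Python sketch of an edge-labelled tree of this kind. The classes `Node` and `Edge` and the functions `child`, `apply_rules` and `step` are illustrative names, not MDQ's API, and the rules are applied eagerly here for brevity, whereas the talk describes lazy application.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    """Nodes are anonymous; only text nodes carry a value."""
    text: Optional[str] = None
    edges: List["Edge"] = field(default_factory=list)


@dataclass
class Edge:
    """An edge carries the element name, plus any names added by rules."""
    names: Set[str]
    target: Node


def child(name: str, target: Node) -> Edge:
    return Edge({name}, target)


def apply_rules(node: Node, rules: Dict[str, str]) -> None:
    """Add the right-hand name of each matching rule as a further edge label."""
    for edge in node.edges:
        for name in list(edge.names):
            if name in rules:
                edge.names.add(rules[name])
        apply_rules(edge.target, rules)


def step(node: Node, name: str) -> List[Node]:
    """Child step: follow every edge that carries `name`."""
    return [edge.target for edge in node.edges if name in edge.names]


golf = Node(edges=[child("name", Node("Golf")), child("ps", Node("150"))])
doc = Node(edges=[child("daten", Node(edges=[child("auto", golf)]))])
apply_rules(doc, {"daten": "db", "auto": "car", "ps": "hp"})

car = step(step(doc, "db")[0], "car")[0]
print(step(car, "hp")[0].text, step(car, "ps")[0].text)
```

Because a mapped name is just one more label on an existing edge, the path db/car/hp and the path daten/auto/ps reach the same node and no data is copied.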
Lazy Evaluation, YFilter
Built from left-hand side of rules
Non-deterministic finite state machine
R1: daten is-a db;
R2: auto is-a car;
R3: ps is-a hp;
[Figure: state machine with transitions labelled *, daten, auto and ps, and accepting states R1, R2 and R3]
Main idea:
  Evaluate XQuery program
  Iterate through data model
  Report to YFilter
  Apply rules only when reaching an accepting state
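The idea of a state machine built from rule left-hand sides can be sketched in Python as follows. This is a heavily simplified automaton and not YFilter itself: `build_nfa` and `matching_rules` are hypothetical names, left-hand sides are plain name paths without predicates, and the label R4 for `german/car` is invented.

```python
# Left-hand sides as lists of steps. R1..R3 are the rules above;
# R4 (german/car) reuses a left-hand side from the path-expression slide.
RULES = {
    "R1": ["daten"],
    "R2": ["auto"],
    "R3": ["ps"],
    "R4": ["german", "car"],
}


def build_nfa(rules):
    """Build transitions with shared prefixes; state 0 is the start state."""
    transitions = {}  # (state, element name) -> next state
    accepting = {}    # state -> list of rule ids
    next_state = 1
    for rule_id, steps in rules.items():
        state = 0
        for name in steps:
            key = (state, name)
            if key not in transitions:
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        accepting.setdefault(state, []).append(rule_id)
    return transitions, accepting


def matching_rules(path, transitions, accepting):
    """Rules whose left-hand side matches the end of a root-to-element path."""
    active = {0}
    for name in path:
        following = {0}  # the start state loops on any name (the * transition)
        for state in active:
            nxt = transitions.get((state, name))
            if nxt is not None:
                following.add(nxt)
        active = following
    return sorted(r for s in active for r in accepting.get(s, []))


transitions, accepting = build_nfa(RULES)
print(matching_rules(["daten", "auto", "ps"], transitions, accepting))
print(matching_rules(["db", "german", "car"], transitions, accepting))
print(matching_rules(["db", "car"], transitions, accepting))
```

A rule is looked at only when the element just visited lands in an accepting state, so elements the query never touches cost nothing.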
Experiment: Throughput
Complex query (multiple scans, joins)
Query rewrite (QR): too many unions
Data translation (DT): overhead of translation
Experiment: Throughput
Simple query
QR: fewer unions
DT: still overhead of translation
Experiment: Throughput
One input message, bundle of queries evaluated at once
QR: even more unions
DT: less overhead, transforms the input message only once
Again: Advantages
Performance
  Novel data model, lazy execution
Light-weight
  Mapping rules are small units
Extensibility
  Add more rules as new sources are adopted
Flexibility
  Complex mappings through element constructors
The End
Visit our website, LIVE DEMO! http://fifthelement.inf.ethz.ch:8080/rules
Write us, please! [email protected]