RAINDROP: XML Stream Processing Engine
Murali Mani, WPI
@UPenn, DB seminar, June 08, 2006
Partially Supported by NSF grant IIS 0414567
June 08, 2006 DSRG, WPI 2
Acknowledgements
NSF for the financial support
Joint work with several others:
Prof. Elke A. Rundensteiner
Graduate students – Hong Su, Ming Li, Mingzhu Wei, Shoushen Wang, Jinhui Jian
Undergraduate students – Drew Ditto, Bogomil Tselkov
…
Applications
Need for efficient stream data processing
Monitor patient data in real time
Sensor networks – fire detection; battlefield deployment; traffic congestion
Others – news delivery, monitor network traffic, …
XML Stream Processing
<open_auctions>
  <auction>
    <privacy>No</privacy>
    <description>
      Calendar of <emph>French Impressionism</emph> by <emph>Monet</emph>
    </description>
    <initial> $20 </initial>
  </auction> …
Token-by-token access manner
Tokens on the timeline, in arrival order: <open_auctions> <auction> <privacy> No …
Pattern retrieval + Filtering + Restructuring
for $a in open_auctions/auction[privacy]
let $e := $a/description/emph
where $e = "French Impressionism"
return <InterestedAuction> $a, $e </InterestedAuction>
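The token-by-token access described on this slide can be illustrated with a small sketch. It uses Python's standard `xml.parsers.expat` module, which is not part of Raindrop; the function name `tokenize` and the closed-off sample document are assumptions made for illustration (the real stream on the slide continues past the first auction).

```python
import xml.parsers.expat

def tokenize(xml_text):
    """Return start-tag, text, and end-tag tokens in arrival order."""
    tokens = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver each run of text as a single token

    def on_start(name, attrs):
        tokens.append(("start", name))

    def on_end(name):
        tokens.append(("end", name))

    def on_text(data):
        if data.strip():  # skip whitespace-only text
            tokens.append(("text", data.strip()))

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_text
    parser.Parse(xml_text, True)
    return tokens

# Assumption: the sample is closed with </open_auctions> so it parses on its own.
doc = (
    "<open_auctions><auction><privacy>No</privacy>"
    "<description>Calendar of <emph>French Impressionism</emph>"
    " by <emph>Monet</emph></description>"
    "<initial>$20</initial></auction></open_auctions>"
)
tokens = tokenize(doc)
for token in tokens[:4]:
    print(token)
```

The first four tokens correspond to the timeline on the slide: the three start tags followed by the text "No".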
Option 1: Automata-Based Pattern Retrieval
for $a in open_auctions/auction[privacy]
let $e := $a/description/emph
where $e = "French Impressionism"
return <Auction> $a, $e </Auction>
[Figure: automaton over states 0 to 5 with transitions labeled auctions, auction, privacy, description, emph]
When patterns are retrieved depends on the data
Additional data structures for:
• Buffering
• Filtering
• Restructuring
• …
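A minimal sketch of automaton-based pattern retrieval over a token stream follows. The state numbering (0 to 5) is taken from the figure, but which transition leads to which state is an assumption, as are the names `TRANSITIONS`, `DEAD`, and `run_automaton`. The sketch uses the tag `open_auctions` from the query rather than the label `auctions` shown in the figure. A stack restores the previous state on each end tag, so when a pattern is found depends only on the arriving data.

```python
# Assumed transition table: (current state, tag) -> next state.
TRANSITIONS = {
    (0, "open_auctions"): 1,
    (1, "auction"): 2,
    (2, "privacy"): 3,
    (2, "description"): 4,
    (4, "emph"): 5,
}
DEAD = -1  # state for tags that match no pattern

def run_automaton(tokens):
    """Return (state, text) pairs for text seen in state 3 (privacy) or 5 (emph)."""
    state, stack, matches = 0, [], []
    for kind, value in tokens:
        if kind == "start":
            stack.append(state)  # remember where to return on the end tag
            state = TRANSITIONS.get((state, value), DEAD)
        elif kind == "end":
            state = stack.pop()
        elif kind == "text" and state in (3, 5):
            matches.append((state, value))
    return matches

tokens = [
    ("start", "open_auctions"), ("start", "auction"),
    ("start", "privacy"), ("text", "No"), ("end", "privacy"),
    ("start", "description"), ("text", "Calendar of"),
    ("start", "emph"), ("text", "French Impressionism"), ("end", "emph"),
    ("text", "by"),
    ("start", "emph"), ("text", "Monet"), ("end", "emph"),
    ("end", "description"),
    ("start", "initial"), ("text", "$20"), ("end", "initial"),
    ("end", "auction"), ("end", "open_auctions"),
]
matches = run_automaton(tokens)
print(matches)
```

The sketch only retrieves patterns. Buffering, filtering on "French Impressionism", and restructuring would need the additional data structures listed on the slide.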
Option 2: “DOM” Based Pattern Retrieval
for $a in open_auctions/auction[privacy]
let $e := $a/description/emph
where $e = "French Impressionism"
return <Auction> $a, $e </Auction>
Logic Plan (bottom-up):
Navigate $a, /privacy -> $p
Navigate $a, /description/emph -> $e
Select $e = "French Impressionism"
Tagger
Rewrite by “pushing down selection”
Rewritten Logic Plan (bottom-up):
Navigate $a, /description/emph -> $e
Select $e = "French Impressionism"
Navigate $a, /privacy -> $p
Tagger
Choose low-level implementation alternatives
Physical Plan (bottom-up):
Navigate-Index $a, /description/emph -> $e
Select $e = "French Impressionism"
Navigate-Scan $a, /privacy -> $p
Tagger
When patterns are retrieved depends on other patterns
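The effect of pushing the selection down can be sketched on an in-memory tree. This is an illustration with Python's standard `xml.etree.ElementTree`, not the Raindrop operators; the function names and the three-auction sample document are assumptions. Both plans return the same result, but the second one navigates to `privacy` only for auctions that already passed the selection.

```python
import xml.etree.ElementTree as ET

# Assumed sample data: only the first auction has both a privacy element
# and an emph equal to "French Impressionism".
DOC = """<open_auctions>
  <auction><privacy>No</privacy>
    <description>Calendar of <emph>French Impressionism</emph> by <emph>Monet</emph></description>
  </auction>
  <auction>
    <description>Poster of <emph>French Impressionism</emph></description>
  </auction>
  <auction><privacy>Yes</privacy>
    <description>Print by <emph>Hokusai</emph></description>
  </auction>
</open_auctions>"""

def plan_original(root):
    """Navigate privacy, navigate emph, then select."""
    out = []
    for a in root.findall("auction"):
        p = a.find("privacy")
        e = [x.text for x in a.findall("description/emph")]
        if p is not None and "French Impressionism" in e:
            out.append(e)
    return out

def plan_pushdown(root):
    """Navigate emph, select, and only then navigate privacy."""
    out = []
    for a in root.findall("auction"):
        e = [x.text for x in a.findall("description/emph")]
        if "French Impressionism" not in e:
            continue  # selection fails, so privacy is never navigated
        if a.find("privacy") is not None:
            out.append(e)
    return out

root = ET.fromstring(DOC)
result_original = plan_original(root)
result_pushdown = plan_pushdown(root)
print(result_pushdown)
```

The membership test mirrors the existential meaning of the XQuery comparison `$e = "French Impressionism"`, which holds when any item of `$e` equals the constant.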
Which paradigm is better?
[Figure: execution time (ms, axis from 0 to 30000) versus pattern selectivity (0% to 100%) for Minimal Pushdown and Maximal Pushdown [Tukwila]; data size = 48M]
Minimal pushdown plans win over maximal pushdown when selectivity < 50%
Problem
How to provide the framework to choose between these paradigms?
Model both paradigms uniformly as algebraic operators.
Use a cost model to choose optimal plan given data statistics.
Automaton as TokenNav
for $a in open_auctions/auction[privacy]
let $e := $a/description/emph
where $e = "French Impressionism"
return <Auction> $a, $e </Auction>
[Figure: automaton over states 0 to 5 with transitions labeled auctions, auction, privacy, description, emph]
Plan operators:
StructuralJoin $a
Extract $a
Select non-empty($b)
Select $e = "French …"
Extract $b
Extract $e
TokenNav $s, /auctions/auction -> $a
TokenNav $a, /privacy -> $b
TokenNav $a, /desc/emph -> $e
DOM Navigation as NodeNav
for $a in open_auctions/auction[privacy]
let $e := $a/description/emph
where $e = "French Impressionism"
return <Auction> $a, $e </Auction>
[Figure: automaton over states 0 to 2 with transitions labeled auctions, auction]
Plan operators:
Select non-empty($b)
Select $e = "French …"
NodeNav $a, /privacy -> $b
NodeNav $a, /desc/emph -> $e
Extract $a
TokenNav $s, /auctions/auction -> $a
Exploring the Search Space
A pattern can be retrieved inside the automaton or outside the automaton.
However, there are dependencies:
for $a in …/a, $b in $a/…, $c in $b/…
NodeNav for $b => NodeNav for $c
TokenNav for $b => TokenNav/NodeNav for $c
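The dependency rule above prunes the search space, which can be shown by enumerating the allowed choices for a chain of nested variables. This is a sketch; the function name `valid_assignments` is hypothetical, and treating the variables as a simple chain follows the example on the slide.

```python
from itertools import product

def valid_assignments(chain):
    """Enumerate TokenNav/NodeNav choices for a chain of nested variables."""
    plans = []
    for choice in product(("TokenNav", "NodeNav"), repeat=len(chain)):
        # Rule from the slide: NodeNav for a variable forces NodeNav
        # for the variable nested directly inside it.
        allowed = all(
            not (parent == "NodeNav" and child == "TokenNav")
            for parent, child in zip(choice, choice[1:])
        )
        if allowed:
            plans.append(dict(zip(chain, choice)))
    return plans

plans = valid_assignments(["$b", "$c"])
for plan in plans:
    print(plan)
```

For `$b` and `$c` this leaves three of the four combinations. In general a chain of n variables has n + 1 valid assignments, because once one variable uses NodeNav, every variable nested below it must as well.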
Run-time Optimization
Statistics unknown before data arrives
Statistics could change over time
We need techniques for efficient statistics monitoring, search space exploration and plan migration (safe points for migration)
Run-time Optimization
Create an initial plan
Run initial plan and collect statistics at same time
Generate new plan using statistics collected
Pause receiving stream
Migrate to new plan
Resume receiving stream
[Figure: components: Stream, Query plan executor, statistics, Initial Query plan, Query Optimizer, New Query plan, Plan Migrator]
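The control loop in the steps above can be sketched as follows. This is a toy simulation, not the Raindrop optimizer: the decision rule reuses the 50% crossover from the earlier measurement slide as an assumption, batch boundaries stand in for safe points, and the names `choose_plan` and `run` are hypothetical. The sketch records which plan is active for each batch; it does not execute real query plans.

```python
def choose_plan(selectivity):
    # Assumption: use the 50% crossover from the measurement slide as the rule.
    return "minimal_pushdown" if selectivity < 0.5 else "maximal_pushdown"

def run(batches, initial_plan="maximal_pushdown"):
    """Return the plan that was active while each batch was processed."""
    plan = initial_plan
    used = []
    for batch in batches:
        if not batch:
            continue
        used.append(plan)
        # Collect statistics while the current plan runs.
        matched = sum(1 for value in batch if value == "French Impressionism")
        selectivity = matched / len(batch)
        new_plan = choose_plan(selectivity)
        if new_plan != plan:
            # Pause the stream, migrate, resume: the batch boundary is
            # the assumed safe point, so no batch is split across plans.
            plan = new_plan
    return used

batches = [
    ["French Impressionism", "Monet", "Monet", "Monet"],
    ["French Impressionism"] * 3 + ["Monet"],
    ["Monet"] * 4,
]
used = run(batches)
print(used)
```

A real system would smooth the statistics over time and weigh the migration cost before switching; this sketch reacts to every batch only to keep the loop visible.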
Key Ideas
Minimum memory requirements
Discard data early
Output data early
In-Time Structural Join
for $a in open_auctions/auction[privacy]
let $e := $a/description/emph
where $e = "French Impressionism"
return <Auction> $a, $e </Auction>
[Figure: automaton over states 0 to 5 with transitions labeled auctions, auction, privacy, description, emph]
Plan operators:
StructuralJoin $a
Extract $a
Select non-empty($b)
Select $e = "French …"
Extract $b
Extract $e
TokenNav $s, /auctions/auction -> $a
TokenNav $a, /privacy -> $b
TokenNav $a, /desc/emph -> $e
Better than In-Time Structural Join
for $r in /root
return <root> <a>$r/a</a> <b>$r/b</b> </root>
[Figure: automaton over states 0 to 3 with transitions labeled root, a, b]
Plan operators:
StructuralJoin $r
Extract $a
Extract $b
TokenNav $s, /root -> $r
TokenNav $r, /a -> $a
TokenNav $r, /b -> $b
“a” tokens need not be stored
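Why the “a” tokens need not be stored can be seen in a small sketch of the restructuring step. The query's output lists all `a` elements before all `b` elements, so each `a` can be written out as soon as it arrives, while each `b` has to wait until the root element ends. The function name `restructure` and the flat (tag, text) input format are assumptions for illustration.

```python
def restructure(children):
    """children: (tag, text) pairs of one root element, in arrival order.

    Returns the output tokens and the peak number of buffered elements.
    """
    out = ["<root>", "<a>"]
    buffered_b = []
    peak = 0
    for tag, text in children:
        if tag == "a":
            out.append(f"<a>{text}</a>")  # emitted immediately, never stored
        elif tag == "b":
            buffered_b.append(f"<b>{text}</b>")  # must wait for the a part to end
            peak = max(peak, len(buffered_b))
    out.append("</a>")
    out.append("<b>")
    out.extend(buffered_b)
    out.extend(["</b>", "</root>"])
    return out, peak

out, peak = restructure([("b", "1"), ("a", "2"), ("b", "3"), ("a", "4")])
print("".join(out), peak)
```

Only the two `b` elements are ever held in memory, regardless of how many `a` elements the root contains.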
Evaluating Predicates
for $r in /root
where $r/a = "value"
return <root> <b>$r/b</b> </root>
[Figure: automaton over states 0 to 3 with transitions labeled root, a, b]
Plan operators:
StructuralJoin $r
Select $a = "value"
Extract $a
Extract $b
TokenNav $s, /root -> $r
TokenNav $r, /a -> $a
TokenNav $r, /b -> $b
Once $a = "value" is satisfied, “b” tokens need not be stored
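The predicate case can be sketched the same way. Until some `a` equals "value", arriving `b` elements are buffered because the root might still be rejected; once the predicate holds, the buffer is flushed and later `b` elements go straight to the output. The function name `filter_root` and the input format are assumptions for illustration.

```python
def filter_root(children, wanted="value"):
    """children: (tag, text) pairs of one root element, in arrival order.

    Returns the output tokens and the peak number of buffered b elements.
    """
    out, pending_b = [], []
    satisfied = False
    peak = 0
    for tag, text in children:
        if tag == "a" and text == wanted and not satisfied:
            satisfied = True
            out.extend(["<root>", "<b>"])
            out.extend(pending_b)  # flush everything buffered so far
            pending_b = []
        elif tag == "b":
            item = f"<b>{text}</b>"
            if satisfied:
                out.append(item)  # predicate already holds: no storage needed
            else:
                pending_b.append(item)
                peak = max(peak, len(pending_b))
    if satisfied:
        out.extend(["</b>", "</root>"])
    return out, peak

children = [("b", "1"), ("a", "other"), ("a", "value"), ("b", "2"), ("b", "3")]
out, peak = filter_root(children)
print("".join(out), peak)
```

If the root ends without the predicate being satisfied, the buffered elements are dropped and nothing is output.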
Using Schema Knowledge
for $r in /root
return <root> <a>$r/a</a> <b>$r/b</b> </root>
[Figure: automaton over states 0 to 3 with transitions labeled root, a, b]
Schema: root -> (a*, b*)
Plan operators:
StructuralJoin $r
Extract $a
Extract $b
TokenNav $s, /root -> $r
TokenNav $r, /a -> $a
TokenNav $r, /b -> $b
“a”, “b” tokens need not be stored
Using Schema Knowledge for Predicates
for $r in /root
where $r/a = "value"
return <root> <b>$r/b</b> </root>
[Figure: automaton over states 0 to 3 with transitions labeled root, a, b]
Schema: root -> (b*, a*, c)
Plan operators:
StructuralJoin $r
Select $a = "value"
Extract $a
Extract $b
TokenNav $s, /root -> $r
TokenNav $r, /a -> $a
TokenNav $r, /b -> $b
Once “c” is seen and $a = "value" is not yet satisfied, “b” tokens can be discarded
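A minimal sketch of how the schema root -> (b*, a*, c) lets the buffer be released early: since no `a` can follow `c`, seeing `c` with the predicate still unsatisfied means the buffered `b` elements will never be output. The function name `filter_with_schema`, the input format, and the `released_at` bookkeeping are assumptions for illustration.

```python
def filter_with_schema(children, wanted="value"):
    """children: (tag, text) pairs of one root element, ordered as b*, a*, c.

    Returns the output tokens and the tag at which the b buffer was released.
    """
    out, pending_b = [], []
    satisfied = False
    released_at = None
    for tag, text in children:
        if tag == "b":
            pending_b.append(f"<b>{text}</b>")
        elif tag == "a" and text == wanted and not satisfied:
            satisfied = True
            out.extend(["<root>", "<b>"])
            out.extend(pending_b)  # predicate holds: flush the buffer
            out.extend(["</b>", "</root>"])
            pending_b = []
            released_at = "a"
        elif tag == "c" and not satisfied:
            # Schema knowledge: no a can follow c, so the predicate can
            # no longer become true and the buffer is discarded now.
            pending_b = []
            released_at = "c"
    return out, released_at

rejected = filter_with_schema([("b", "1"), ("b", "2"), ("a", "x"), ("c", "payload")])
accepted = filter_with_schema([("b", "1"), ("a", "value"), ("c", "")])
print(rejected, accepted)
```

Without the schema, the same decision could only be made at the end tag of the root element, after all of `c` had been read.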
Conclusions
Raindrop integrates automaton and “DOM” navigation into one algebraic framework.
Cost-based optimization possible.
Execution minimizes memory requirements.
Ongoing Work
Load shedding in XML stream processing.
Utilizing dynamic schema changes for optimization.
Fragment of XQuery supported
FLWR expressions (no conditionals/user defined functions)
Path expressions use only forward axes (child, descendant, descendant-or-self, attribute)
Predicates supported are of the form: pathExpr relOp constant
Issues with correlated queries
for $r in /root
return <root> for $a in $r/a return <a>$r/b</a> </root>