RAINDROP: XML Stream Processing Engine Murali Mani, WPI @UPenn, DB seminar June 08, 2006 Partially Supported by NSF grant IIS 0414567

RAINDROP: XML Stream Processing Engine

Murali Mani, WPI

@UPenn, DB seminar, June 08, 2006

Partially Supported by NSF grant IIS 0414567

June 08, 2006 DSRG, WPI 2

Acknowledgements

• NSF for the financial support
• Joint work with several others:
  • Prof. Elke A. Rundensteiner
  • Graduate students – Hong Su, Ming Li, Mingzhu Wei, Shoushen Wang, Jinhui Jian
  • Undergraduate students – Drew Ditto, Bogomil Tselkov

Applications

• Need for efficient stream data processing
• Monitor patient data in real time
• Sensor networks – fire detection; battlefield deployment; traffic congestion
• Others – news delivery, network traffic monitoring, …

XML Stream Processing

The input is accessed token by token along a timeline:

<open_auctions>
  <auction>
    <privacy>No</privacy>
    <description>
      Calendar of <emph>French Impressionism</emph> by <emph>Monet </emph>
    </description>
    <initial> $20 </initial>
  </auction>
  …

Tokens in arrival order: <open_auctions> <auction> <privacy> No …

The query combines pattern retrieval, filtering and restructuring:

for $a in open_auctions/auction[privacy]
let $e := $a/description/emph
where $e = “French Impressionism”
return <InterestedAuction> $a, $e </InterestedAuction>

Option 1: Automata-Based Pattern Retrieval

for $a in open_auctions/auction[privacy]
let $e := $a/description/emph
where $e = “French Impressionism”
return <Auction> $a, $e </Auction>

[Figure: automaton with states 0–5 and transitions labeled auctions, auction, privacy, description, emph]

When patterns are retrieved depends on the data.

Additional data structures are needed for:
• Buffering
• Filtering
• Restructuring
• …
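To make token-driven retrieval concrete, here is a minimal sketch. It is not the Raindrop automaton: the token format, the `PATTERNS` table and the `retrieve` function are assumptions made for illustration, and a stack of open tags stands in for the automaton states. It shows that the moment each pattern is found is dictated by where the matching token happens to sit in the stream.

```python
# Hypothetical token format: ("start", tag), ("text", value), ("end", tag).
TOKENS = [
    ("start", "open_auctions"),
    ("start", "auction"),
    ("start", "privacy"), ("text", "No"), ("end", "privacy"),
    ("start", "description"),
    ("text", "Calendar of "),
    ("start", "emph"), ("text", "French Impressionism"), ("end", "emph"),
    ("text", "by"),
    ("start", "emph"), ("text", "Monet"), ("end", "emph"),
    ("end", "description"),
    ("start", "initial"), ("text", "$20"), ("end", "initial"),
    ("end", "auction"),
]

# Paths of interest for the example query; the labels are illustrative only.
PATTERNS = {
    ("open_auctions", "auction"): "$a",
    ("open_auctions", "auction", "privacy"): "[privacy]",
    ("open_auctions", "auction", "description", "emph"): "$e",
}


def retrieve(tokens):
    """Return (token position, pattern label) for every pattern match."""
    path = []   # stack of currently open tags; plays the role of the state
    hits = []
    for position, (kind, value) in enumerate(tokens):
        if kind == "start":
            path.append(value)
            label = PATTERNS.get(tuple(path))
            if label is not None:
                hits.append((position, label))
        elif kind == "end":
            path.pop()
    return hits


print(retrieve(TOKENS))
```

The sketch only reports matches. Buffering the matched tokens, filtering on $e and building the result would need the additional data structures listed above.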

Option 2: “DOM” Based Pattern Retrieval

for $a in open_auctions/auction[privacy]
let $e := $a/description/emph
where $e = “French Impressionism”
return <Auction> $a, $e </Auction>

Logic Plan:
• Navigate $a, /description/emph -> $e
• Navigate $a, /privacy -> $p
• Select $e = “French Impressionism”
• Tagger

Rewritten Logic Plan (rewrite by “pushing down selection”):
• Navigate $a, /privacy -> $p
• Navigate $a, /description/emph -> $e
• Select $e = “French Impressionism”
• Tagger

Physical Plan (choose low-level implementation alternatives):
• Navigate-Index $a, /description/emph -> $e
• Select $e = “French Impressionism”
• Navigate-Scan $a, /privacy -> $p
• Tagger

When patterns are retrieved depends on other patterns.
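As a rough illustration of navigation over materialized nodes, the sketch below evaluates the example query with Python's standard `xml.etree.ElementTree`. It is not Raindrop code: the sample document (including the second auction) and the function name `interested_auctions` are assumptions. The selection on $e is checked before navigating to privacy, which is one way to read “pushing down selection”.

```python
import xml.etree.ElementTree as ET

# Sample input; the second auction is made up so that one auction is filtered out.
DOC = """<open_auctions>
  <auction>
    <privacy>No</privacy>
    <description>Calendar of <emph>French Impressionism</emph> by <emph>Monet</emph></description>
    <initial>$20</initial>
  </auction>
  <auction>
    <description><emph>Cubism</emph></description>
  </auction>
</open_auctions>"""


def interested_auctions(xml_text, wanted="French Impressionism"):
    """Return (auction element, emph texts) for each qualifying auction."""
    root = ET.fromstring(xml_text)          # the whole document is materialized
    results = []
    for auction in root.findall("auction"):
        # Navigate $a, /description/emph -> $e
        emphs = [e.text for e in auction.findall("description/emph")]
        # Select $e = wanted, applied before the second navigation
        if wanted not in emphs:
            continue
        # Navigate $a, /privacy -> $p; the [privacy] predicate needs it to exist
        if auction.find("privacy") is None:
            continue
        results.append((auction, emphs))
    return results


for auction, emphs in interested_auctions(DOC):
    print(auction.tag, emphs)
```

Here the second navigation only runs for auctions that already passed the selection, so when a pattern is retrieved depends on the outcome of another pattern.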

Which paradigm is better?

[Figure: execution time (ms) versus pattern selectivity (0% to 100%) for Minimal Pushdown and Maximal Pushdown [Tukwila]; data size = 48M]

Minimal pushdown plans win over maximal pushdown when selectivity < 50%.


Problem

How to provide the framework to choose between these paradigms?

Model both paradigms uniformly as algebraic operators.

Use a cost model to choose optimal plan given data statistics.

Automaton as TokenNav

for $a in open_auctions/auction[privacy]
let $e := $a/description/emph
where $e = “French Impressionism”
return <Auction> $a, $e </Auction>

[Figure: automaton with states 0–5 and transitions labeled auctions, auction, privacy, description, emph]

Plan operators:
• StructuralJoin $a
• Extract $a
• Extract $b
• Extract $e
• Select non-empty($b)
• Select $e = “French …”
• TokenNav $s, /auctions/auction -> $a
• TokenNav $a, /privacy -> $b
• TokenNav $a, /desc/emph -> $e

DOM Navigation as NodeNav

for $a in open_auctions/auction[privacy]
let $e := $a/description/emph
where $e = “French Impressionism”
return <Auction> $a, $e </Auction>

[Figure: automaton with states 0–2 and transitions labeled auctions, auction]

Plan operators:
• Extract $a
• TokenNav $s, /auctions/auction -> $a
• NodeNav $a, /privacy -> $b
• Select non-empty($b)
• NodeNav $a, /desc/emph -> $e
• Select $e = “French …”

Exploring the Search Space

• A pattern can be retrieved inside the automaton or outside the automaton.
• However, there are dependencies:

for $a in …/a, $b in $a/…, $c in $b/…

NodeNav for $b => NodeNav for $c
TokenNav for $b => TokenNav/NodeNav for $c
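The dependency rule above can be sketched as a small enumeration. This is only an illustration: the function name `valid_assignments` is hypothetical, the variables are assumed to form a single nesting chain, and the rule is applied uniformly to every parent and child pair.

```python
from itertools import product


def valid_assignments(variables):
    """Yield each TokenNav/NodeNav choice for a chain of nested patterns.

    variables is ordered from outermost to innermost, e.g. ["$a", "$b", "$c"].
    Once a pattern is retrieved by NodeNav, every pattern nested under it
    must also use NodeNav; after TokenNav, either choice is allowed.
    """
    for choice in product(("TokenNav", "NodeNav"), repeat=len(variables)):
        pairs = zip(choice, choice[1:])      # (parent, child) neighbours
        if all(not (parent == "NodeNav" and child == "TokenNav")
               for parent, child in pairs):
            yield dict(zip(variables, choice))


plans = list(valid_assignments(["$a", "$b", "$c"]))
for plan in plans:
    print(plan)
```

Under these assumptions, a chain of three patterns has four valid plans instead of eight, so the dependencies prune the search space that an optimizer would have to explore.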

Run-time Optimization

• Statistics are unknown before data arrives.
• Statistics could change over time.

We need techniques for efficient statistics monitoring, search space exploration and plan migration (safe points for migration).

Run-time Optimization

• Create an initial plan.
• Run the initial plan and collect statistics at the same time.
• Generate a new plan using the statistics collected.
• Pause receiving the stream.
• Migrate to the new plan.
• Resume receiving the stream.

[Figure: components Stream, Query plan executor, Query Optimizer and Plan Migrator, connected by arrows labeled statistics, initial query plan and new query plan]
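The loop above might look like the following sketch. Everything in it is an assumption made for illustration: the stream is a list of batches of booleans (whether an item matched the pattern), batch boundaries stand in for safe points, and `choose_plan` replaces a real cost model with the 50% selectivity crossover reported in the execution-time comparison.

```python
def choose_plan(selectivity):
    """Stand-in for the optimizer: minimal pushdown wins below 50% selectivity."""
    return "minimal_pushdown" if selectivity < 0.5 else "maximal_pushdown"


def run(batches, initial_plan="maximal_pushdown"):
    """Process batches and return the plan that was used for each batch."""
    plan = initial_plan
    used = []
    for batch in batches:
        # Run the current plan and collect statistics at the same time.
        used.append(plan)
        matched = sum(1 for hit in batch if hit)
        selectivity = matched / len(batch) if batch else 0.0
        # Generate a new plan from the statistics collected.
        new_plan = choose_plan(selectivity)
        if new_plan != plan:
            # The batch boundary is the assumed safe point: the stream is
            # paused here, the plan is swapped, and reading then resumes.
            plan = new_plan
    return used


batches = [
    [True] * 8 + [False] * 2,   # 80% selectivity
    [True] * 1 + [False] * 9,   # 10% selectivity
    [False] * 10,               # 0% selectivity
    [True] * 10,                # 100% selectivity
]
print(run(batches))
```

A real migration also has to carry over or drain any state held by the old plan, which this sketch sidesteps by switching only between batches.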


Executing a Raindrop Plan

Key Ideas

• Minimum memory requirements
• Discard data early
• Output data early

In-Time Structural Join

for $a in open_auctions/auction[privacy]
let $e := $a/description/emph
where $e = “French Impressionism”
return <Auction> $a, $e </Auction>

[Figure: automaton with states 0–5 and transitions labeled auctions, auction, privacy, description, emph]

Plan operators:
• StructuralJoin $a
• Extract $a
• Extract $b
• Extract $e
• Select non-empty($b)
• Select $e = “French …”
• TokenNav $s, /auctions/auction -> $a
• TokenNav $a, /privacy -> $b
• TokenNav $a, /desc/emph -> $e

Better than In-Time Structural Join

for $r in /root
return <root> <a>$r/a</a> <b>$r/b</b> </root>

[Figure: automaton with states 0–3 and transitions labeled root, a, b]

Plan operators:
• StructuralJoin $r
• Extract $a
• Extract $b
• TokenNav $s, /root -> $r
• TokenNav $r, /a -> $a
• TokenNav $r, /b -> $b

“a” tokens need not be stored.
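One way to see why only “b” has to be buffered is the sketch below. It assumes a simplified input, the children of one root as (tag, text) pairs in arrival order, and the name `restructure` is hypothetical. Because the result lists all “a” content before any “b” content, each “a” can be emitted as it arrives, while every “b” must wait until no further “a” can follow.

```python
def restructure(children):
    """Return (output tokens, peak number of buffered items) for one root."""
    out = ["<root>", "<a>"]
    buffered_b = []
    peak = 0
    for tag, text in children:
        if tag == "a":
            out.append(text)             # emitted at once, never stored
        elif tag == "b":
            buffered_b.append(text)      # held back: more "a" may still come
            peak = max(peak, len(buffered_b))
    # End of root: no more "a" can arrive, so the buffered "b" can be flushed.
    out.append("</a>")
    out.append("<b>")
    out.extend(buffered_b)
    out.extend(["</b>", "</root>"])
    return out, peak


print(restructure([("b", "b1"), ("a", "a1"), ("b", "b2"), ("a", "a2")]))
```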

Evaluating Predicates

for $r in /root
where $r/a = “value”
return <root> <b>$r/b</b> </root>

[Figure: automaton with states 0–3 and transitions labeled root, a, b]

Plan operators:
• StructuralJoin $r
• Extract $a
• Extract $b
• Select $a = “value”
• TokenNav $s, /root -> $r
• TokenNav $r, /a -> $a
• TokenNav $r, /b -> $b

Once $a = “value” is satisfied, “b” tokens need not be stored.
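A minimal sketch of this idea follows, under the same simplifying assumption that the children of one root arrive as (tag, text) pairs; `filter_root` is a hypothetical name. While the predicate is undecided, “b” items are buffered. As soon as some “a” equals the wanted value, the buffer is flushed and later “b” items pass straight through.

```python
def filter_root(children, wanted="value"):
    """Return (output tokens, peak number of buffered items) for one root."""
    out = []
    pending_b = []
    satisfied = False
    peak = 0
    for tag, text in children:
        if tag == "a" and text == wanted and not satisfied:
            satisfied = True
            # The predicate holds: open the result and flush what was buffered.
            out.extend(["<root>", "<b>"])
            out.extend(pending_b)
            pending_b = []
        elif tag == "b":
            if satisfied:
                out.append(text)          # no buffering needed any more
            else:
                pending_b.append(text)    # outcome of the predicate still unknown
                peak = max(peak, len(pending_b))
    if satisfied:
        out.extend(["</b>", "</root>"])
    # If the predicate never held, the buffered "b" items are simply dropped.
    return out, peak


print(filter_root([("b", "b1"), ("a", "x"), ("a", "value"), ("b", "b2")]))
```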

Using Schema Knowledge

for $r in /root
return <root> <a>$r/a</a> <b>$r/b</b> </root>

[Figure: automaton with states 0–3 and transitions labeled root, a, b]

Plan operators:
• StructuralJoin $a
• Extract $a
• Extract $b
• TokenNav $s, /root -> $r
• TokenNav $r, /a -> $a
• TokenNav $r, /b -> $b

“a”, “b” tokens need not be stored.

Schema: root -> (a*, b*)
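With the schema root -> (a*, b*), the arrival order already matches the output order, so nothing has to be held back. The sketch below illustrates this under the assumption that the children of one root arrive as (tag, text) pairs; the function name `restructure_with_schema` is hypothetical, and input that violates the schema is rejected.

```python
def restructure_with_schema(children):
    """Emit the result for one root without buffering any item."""
    out = ["<root>", "<a>"]
    seen_b = False
    for tag, text in children:
        if tag == "a":
            if seen_b:
                raise ValueError("input violates root -> (a*, b*)")
            out.append(text)
        elif tag == "b":
            if not seen_b:
                # The first "b" proves that no further "a" can arrive.
                out.extend(["</a>", "<b>"])
                seen_b = True
            out.append(text)
    if not seen_b:
        out.extend(["</a>", "<b>"])
    out.extend(["</b>", "</root>"])
    return out


print(restructure_with_schema([("a", "a1"), ("a", "a2"), ("b", "b1")]))
```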

Using Schema Knowledge for Predicates

for $r in /root
where $r/a = “value”
return <root> <b>$r/b</b> </root>

[Figure: automaton with states 0–3 and transitions labeled root, a, b]

Plan operators:
• StructuralJoin $r
• Extract $a
• Extract $b
• Select $a = “value”
• TokenNav $s, /root -> $r
• TokenNav $r, /a -> $a
• TokenNav $r, /b -> $b

Once “c” is seen and $a = “value” is not yet satisfied, “b” tokens can be discarded.

Schema: root -> (b*, a*, c)
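The early discard can be sketched as follows, assuming the children of one root arrive as (tag, text) pairs that obey root -> (b*, a*, c); `filter_with_schema` is a hypothetical name. Seeing “c” means no further “a” can arrive, so an unsatisfied predicate is final and the buffered “b” items can be released immediately rather than at the end of root.

```python
def filter_with_schema(children, wanted="value"):
    """Return (output tokens, whether the buffer was discarded on seeing "c")."""
    out = []
    pending_b = []
    satisfied = False
    discarded_at_c = False
    for tag, text in children:
        if tag == "b":
            pending_b.append(text)        # every "b" precedes every "a"
        elif tag == "a" and text == wanted and not satisfied:
            satisfied = True
            # No "b" can follow an "a", so the whole result is known here.
            out = ["<root>", "<b>"] + pending_b + ["</b>", "</root>"]
            pending_b = []
        elif tag == "c" and not satisfied:
            # No "a" can follow "c": the predicate failed for good.
            pending_b = []
            discarded_at_c = True
    return out, discarded_at_c


print(filter_with_schema([("b", "b1"), ("b", "b2"), ("a", "x"), ("c", "")]))
print(filter_with_schema([("b", "b1"), ("a", "value"), ("c", "")]))
```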

Conclusions

• Raindrop integrates automaton and “DOM” navigation into one algebraic framework.
• Cost-based optimization is possible.
• Execution minimizes memory requirements.

Ongoing Work

• Load shedding in XML stream processing.
• Utilizing dynamic schema changes for optimization.

Fragment of XQuery supported

• FLWR expressions (no conditionals or user-defined functions)
• Path expressions use only forward axes (child, descendant, descendant-or-self, attribute)
• Predicates supported are of the form: pathExpr relOp constant

Issues with correlated queries

for $r in /root
return <root> for $a in $r/a return <a>$r/b</a> </root>

Visit our XQuery engine over XML stream project (RAINDROP) website:

http://davis.wpi.edu/dsrg/raindrop/