Dynamic Plan Migration for Continuous Queries over Data Streams
Yali Zhu, Elke Rundensteiner and George Heineman
Database System Research Group, WPI, Massachusetts, USA
SIGMOD 2004
*Research partly supported by the RDC grant 2003-04 on "On-line Stream Monitoring Systems: Untethered Healthcare, Intrusion Detection, and Beyond."


SIGMOD 2004 2

Stream Query Optimization

Differences with Traditional Query Optimization?


Stream Query Optimization

• More dynamic fluctuations in statistics: compile-time optimization is not possible
• Global optimization is not practical for huge query networks: adaptive optimization is needed
• CPU processing and main memory must be taken into account: other cost models are needed


Motivation of ‘Query Migration’

Continuous queries over streams:
• Statistics unknown before start
• Statistics change during execution (stream rates, arrival pattern, distribution, etc.)

Need for dynamic adaptation: plan re-optimization, i.e. changing the shape of the query plan tree


Run-time Plan Re-Optimization

Step 1: Decide when to optimize (statistics monitoring)

Step 2: Generate a new query plan (query optimization)

Step 3: Replace the current plan by the new plan (plan migration)
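A minimal sketch of one round of the three steps. The trigger (re-optimize when any monitored statistic drifts from the value the current plan was built for by more than a relative `threshold`) and the callables `optimize` and `migrate` are hypothetical stand-ins for the statistics monitor, the query optimizer and the plan migrator. The slides do not prescribe a trigger policy.

```python
from typing import Callable, Dict, Tuple

Stats = Dict[str, float]

def reoptimize_round(plan: str, stats: Stats, baseline: Stats,
                     optimize: Callable[[Stats], str],
                     migrate: Callable[[str, str], str],
                     threshold: float = 0.5) -> Tuple[str, Stats]:
    """One round of run-time re-optimization (hypothetical drift trigger)."""
    # Step 1: statistics monitoring decides when to optimize
    drifted = any(abs(stats[k] - baseline[k]) > threshold * abs(baseline[k])
                  for k in baseline)
    if not drifted:
        return plan, baseline  # keep running the current plan
    # Step 2: query optimization builds a new plan for the current statistics
    new_plan = optimize(stats)
    # Step 3: plan migration replaces the current plan by the new plan
    return migrate(plan, new_plan), dict(stats)
```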


Naïve Plan Migration Strategy

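Migration steps:
(1) Pause execution of the old plan
(2) Drain out all tuples inside the old plan
(3) Replace the old plan by the new plan
(4) Resume execution of the new plan

[Figure: the old plan and the new plan, each with join operators AB and BC over inputs A, B, C]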

Problem: Works for stateless operators only


Stateful Operators in CQ

Why stateful?
• CQs need non-blocking operators
• An operator needs to output partial results
• A state data structure keeps the tuples received so far

[Figure: join AB with State A and State B; tuple ax from A probes State B (b1 to b5) and outputs ax b2 and ax b3]

Observation: the purge of tuples in states typically relies on the processing of new tuples.

Example: symmetric nested-loop (NL) join with window constraints
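The observation can be seen in a minimal symmetric nested-loop window join. This sketch assumes tuples arrive in global timestamp order and join on an equality key; `Tup` and `SymmetricWindowJoin` are illustrative names, not CAPE classes.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Tuple

@dataclass(frozen=True)
class Tup:
    stream: str  # "A" or "B"
    ts: int      # arrival timestamp in ms
    key: int     # join attribute (equality join assumed)

class SymmetricWindowJoin:
    """Illustrative symmetric NL join of A and B under a sliding time window."""

    def __init__(self, window: int):
        self.window = window
        self.states: Dict[str, Deque[Tup]] = {"A": deque(), "B": deque()}

    def process(self, t: Tup) -> List[Tuple[Tup, Tup]]:
        other = self.states["B" if t.stream == "A" else "A"]
        # Purge is driven by the new tuple's timestamp: expired tuples of the
        # opposite state are removed only when a new tuple arrives.
        while other and t.ts - other[0].ts > self.window:
            other.popleft()
        # Probe the opposite state (nested loop); emit pairs in (A, B) order.
        out = [(t, o) if t.stream == "A" else (o, t) for o in other if o.key == t.key]
        # Insert the new tuple into its own state.
        self.states[t.stream].append(t)
        return out
```

Once input stops, both states keep their contents indefinitely: nothing purges them until another tuple arrives.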


Naïve Migration Strategy Revisited

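Steps:
(1) Pause execution of the old plan
(2) Drain out all tuples inside the old plan
(3) Replace the old plan by the new plan
(4) Resume execution of the new plan

[Figure: plan AB, BC over A, B, C, annotated with (2) all tuples drained, (3) old replaced by new, (4) processing resumed]

Deadlock waiting problem: with execution paused, no new tuples arrive to purge the states of the old plan, so step (2) can wait forever for them to empty.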
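The deadlock can be reproduced with a single windowed state. In this hypothetical sketch, `naive_drain` processes the paused operator's input queue and then waits for its state to empty; the bounded wait loop stands in for waiting forever, since with input paused nothing triggers a purge.

```python
from collections import deque
from typing import Deque

def naive_drain(queue: Deque[int], state: Deque[int], window: int,
                max_wait_rounds: int = 1000) -> bool:
    """Hypothetical naive step (2) for one windowed operator.

    Returns True if the operator becomes empty, False if waiting gives up."""
    # Drain: process every queued tuple (timestamps in ms). Each arrival
    # purges expired entries, then is stored in the state.
    while queue:
        ts = queue.popleft()
        while state and ts - state[0] > window:
            state.popleft()
        state.append(ts)
    # Wait for the state to empty. Input is paused, so no new tuple arrives
    # and no purge is ever triggered; the bound replaces an infinite wait.
    for _ in range(max_wait_rounds):
        if not state:
            return True
    return False
```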


Concept of Migration Boxes

Two exchangeable migration boxes:
• One contains the old plan or sub-plan
• One contains the new plan or sub-plan
• The two plans are semantically equivalent
• Both have the same input queues and output queues

Migration is abstracted as replacing the old box by the new box.

[Figure: old box (AB, then BC, then CD; states SA, SB, SAB, SC, SABC, SD) and new box (BC, then CD, then AB; states SB, SC, SBC, SD, SA, SBCD), both reading QA, QB, QC, QD and writing QABCD]


Problem Definition Dynamic Plan Migration

• Input: two migration boxes, one containing the old plan and one containing the new plan, with the same input and output queues
• Result: the old box is replaced by the new box
• Valid migration: no missing tuples and no duplicates

[Figure: the old and new migration boxes, as on the previous slide]

Key points:
• The involved plans contain stateful operators
• We need to migrate, yet still retain useful states and discard useless states
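Validity can be checked by comparing a migrated run against a reference run on the same input (for example, the old or the new plan run alone). A minimal sketch, assuming results are hashable values; treating the output as a multiset is an assumption made here so that duplicates are counted.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple

def check_migration(reference: Iterable[Hashable],
                    migrated: Iterable[Hashable]) -> Tuple[Dict, Dict]:
    """Compare the output of a migrated run with a reference run."""
    ref, got = Counter(reference), Counter(migrated)
    missing = ref - got  # results the migrated run failed to produce
    extra = got - ref    # duplicates (or spurious results) in the migrated run
    return dict(missing), dict(extra)
```

A migration is valid when both returned dictionaries are empty.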


State of the Art

• "Efficient mid-query re-optimization of sub-optimal query execution plans" [Kabra, DeWitt 1998]: only migrates the unprocessed portion
• Query plan competing model [Ioannidis, Ng, et al. 1992] [Graefe, Cole 1994]: generate several candidate query plans before start, execute all of them, and choose one after a while


Outline

• Problem Motivation and Definition
• Dynamic Migration Strategies
  • Moving State Strategy
  • Parallel Track Strategy
• Experimental Results


Moving State Strategy

Basic idea: share common states between the two boxes

Key steps:
• Identify common states (state matching)
• Share common states (state moving)
• Recompute unmatched states (state recomputing)


Moving State Strategy

State matching: each state in the old box has a unique ID. During rewriting, a new ID is given to each newly generated state in the new box. When rewriting is done, states are matched based on their IDs.

State moving: between matched states. On the same machine, this creates new pointers to the matched states in the new box.

What's left? The unmatched states in the new box.

[Figure: old box (states SA, SB, SAB, SC, SABC, SD) and new box (states SA, SB, SC, SD, SBC, SBCD) over QA, QB, QC, QD with output QABCD; SBC and SBCD have no match in the old box]
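State matching and moving can be sketched with the state names from the slides used as IDs, a simplification assumed here. Matching is a dictionary lookup by ID; moving stores a reference to the same state object in the new box, so no tuples are copied. `match_and_move` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

def match_and_move(old_states: Dict[str, list],
                   new_state_ids: List[str]) -> Tuple[Dict[str, list], List[str]]:
    """Hypothetical helper: state matching by ID, then state moving by reference."""
    new_states: Dict[str, list] = {}
    unmatched: List[str] = []
    for sid in new_state_ids:
        if sid in old_states:
            # State moving: the new box points at the same state object,
            # so no tuples are copied.
            new_states[sid] = old_states[sid]
        else:
            # No counterpart in the old box: start empty, recompute later.
            new_states[sid] = []
            unmatched.append(sid)
    return new_states, unmatched
```

For the example plans, the unmatched IDs come out as SBC and SBCD.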


Unmatched States

State recomputing: recursively recompute the unmatched states SBC and SBCD from the bottom up.

Why is this always possible? The old and new boxes have the same input queues, so the states associated with the input queues always match.

Why is it necessary?

[Figure: new box with states SA, SB, SC, SD, SBC, SBCD over QA, QB, QC, QD, output QABCD]
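Bottom-up recomputation can be sketched as a nested-loop join of two child states. In this illustration a state entry is a pair (sub-tuple timestamps, sub-tuple ids), the global window W requires all sub-tuples to lie within W of each other, and `pred` stands for the join predicate; these are assumptions, not the paper's data structures. The unmatched states are then rebuilt as S_BC = recompute(S_B, S_C, W, pred_BC), followed by S_BCD = recompute(S_BC, S_D, W, pred_CD), where the predicate names are hypothetical.

```python
from itertools import product
from typing import Callable, List, Tuple

# Illustrative state entry: (sub-tuple timestamps in ms, sub-tuple ids)
Entry = Tuple[Tuple[int, ...], Tuple[str, ...]]

def recompute(left: List[Entry], right: List[Entry], window: int,
              pred: Callable[[Entry, Entry], bool]) -> List[Entry]:
    """Rebuild an intermediate state by joining two available child states."""
    out: List[Entry] = []
    for l, r in product(left, right):
        ts = l[0] + r[0]
        # Assumed global window constraint: all sub-tuples within W of each other.
        if max(ts) - min(ts) <= window and pred(l, r):
            out.append((ts, l[1] + r[1]))
    return out
```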


Terms on Tuples

New/old tuples:
• Old: tuples already in the old box when migration starts
• New: tuples that do not exist in the old box when migration starts

Sub-tuples: tuple ABCD is the result of joining tuples A, B, C and D, which are the sub-tuples of ABCD. Tuple ABCD has 2^4 = 16 possible combinations of old/new sub-tuples.

[Figure: old box with tuples A, B, C, D in states SA, SB, SC, SD, intermediate states SAB and SABC, output QABCD]


Why Recompute Unmatched States

To get the complete results of ABCD, we need all 16 old/new combinations

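[Figure: new box with states SA, SB, SC, SD, SBC, SBCD over QA, QB, QC, QD]

If SBC is not recomputed, the new box misses every result in which both B and C are old:

[Figure: example combinations of A, B, C, D (old vs. new tuples) with B and C both old]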
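The 2^4 = 16 combinations, and the 4 that are lost when S_BC starts empty, can be enumerated directly. The helper names are hypothetical; the filter encodes the slide's argument that the new box only forms BC pairs when a new B or a new C arrives.

```python
from itertools import product
from typing import Dict, List

SUB_TUPLES = ("A", "B", "C", "D")

def all_combinations() -> List[Dict[str, str]]:
    """All 2^4 = 16 old/new assignments to the sub-tuples of ABCD."""
    return [dict(zip(SUB_TUPLES, flags))
            for flags in product(("old", "new"), repeat=len(SUB_TUPLES))]

def missed_if_sbc_empty(combos: List[Dict[str, str]]) -> List[Dict[str, str]]:
    # With S_BC left empty, a BC pair is only formed when a new B or C arrives
    # and probes S_C or S_B, so pairs of an old B and an old C never appear.
    return [c for c in combos if c["B"] == "old" and c["C"] == "old"]
```

A and D may each be old or new in the missed results, giving 2 × 2 = 4 lost combinations.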


Cost Estimation of MS Migration

The cost of MS consists of:
• Cost of state matching: ID comparison (negligible)
• Cost of state moving: creating pointers (negligible)
• Cost of state recomputing: the majority of the cost

Affecting parameters:
• Operator selectivities
• Number of tuples in states, estimated as (input rate × window size)

See the paper for detailed cost models.

Cost model conclusion: the cost of MS has a polynomial relationship to the window size.


Cost Estimation of MS Migration

$$T_{MS} = T_{match} + T_{move} + T_{recompute}$$

$$T_{MS} \approx T_{recompute}(S_{BC}) + T_{recompute}(S_{BCD}) = \lambda_B \lambda_C W^2 (T_j + T_s \sigma_{BC}) + 2 \lambda_B \lambda_C \lambda_D W^3 (T_j \sigma_{BC} + T_s \sigma_{BC} \sigma_{BCD})$$

where:
• T_m: time spent for each string comparison
• T_c: time spent to create a new cursor
• T_j: time spent to join a pair of tuples
• T_s: time spent to insert one tuple into a state
• λ_A: average tuple input rate from QA
• λ_B: average tuple input rate from QB
• σ_AB: reduction factor of join operator AB
• W: global time window constraint

[Figure: new box with states SB, SC, SD, SBC, SBCD over QA, QB, QC, QD]
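The estimate transcribes directly into a function. Parameter names are illustrative, T_match and T_move are dropped as negligible (per the previous slide), and units must be consistent (for example rates in tuples/ms, and W and the T terms in ms). The W^2 and W^3 terms make the polynomial dependence on window size visible.

```python
def t_ms_estimate(lam_b: float, lam_c: float, lam_d: float, w: float,
                  t_j: float, t_s: float, sigma_bc: float, sigma_bcd: float) -> float:
    """Estimated MS migration cost, dominated by recomputing S_BC and S_BCD."""
    # T_recompute(S_BC): every pair from S_B (lam_b * w tuples) and S_C
    # (lam_c * w tuples) is joined; the sigma_bc fraction is inserted into S_BC.
    recompute_sbc = lam_b * lam_c * w**2 * (t_j + t_s * sigma_bc)
    # T_recompute(S_BCD): the cubic term as given on the slide
    recompute_sbcd = 2 * lam_b * lam_c * lam_d * w**3 * (
        t_j * sigma_bc + t_s * sigma_bc * sigma_bcd)
    return recompute_sbc + recompute_sbcd
```

With all rates, T_j and selectivities set to 1 and T_s = 0, doubling W from 1 to 2 raises the estimate from 3 to 20: the cubic term dominates.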


MS Migration Pros and Cons

Pros: fast when the number of tuples in states is small (low input rates, low selectivity or small window)

Cons: output silence during the entire migration stage

Can the query produce output even during migration? This is the motivation for the Parallel Track strategy.


Parallel Track Strategy

Basic idea:
• Execute both the old and new plans in parallel
• Gradually "push" old tuples out of the old box by purging

Key steps:
• Connect the new box
• Execute both boxes in parallel
• Remove the old box once it has "expired": it contains only new tuples, with no old tuples or sub-tuples


Parallel Track Strategy

Key steps:
• Connect the boxes
• Execute them in parallel until the old box has "expired" (no old tuple or sub-tuple)
• Disconnect the old box
• Start executing the new box only

[Figure: old box (states SA, SB, SAB, SC, SABC, SD) and new box (states SA, SB, SC, SBC, SD, SBCD) running in parallel on QA, QB, QC, QD, each writing QABCD]
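A minimal sketch of the expiry test, assuming each state entry records the time at which each of its sub-tuples entered the box, so that "old" can be decided by comparing against the migration start time `t_start`. `Entry`, `is_old` and `old_box_expired` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class Entry:
    sub_ts: Tuple[int, ...]  # assumed: time each sub-tuple entered the box (ms)

def is_old(e: Entry, t_start: int) -> bool:
    # Old if any sub-tuple was already in the box when migration started.
    return min(e.sub_ts) < t_start

def old_box_expired(states: Dict[str, List[Entry]], t_start: int) -> bool:
    """True once no state of the old box holds an old tuple or sub-tuple."""
    return not any(is_old(e, t_start) for entries in states.values() for e in entries)
```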


Potential Duplicates

Tuple ABCD has 2^4 = 16 possible old/new sub-tuple combinations. The same combination must not be generated by both boxes; otherwise we have duplicates.
• In the new box, all states start empty, so it only generates ABCD as (new, new, new, new).
• The old box may generate all 16 cases, duplicating the case (new, new, new, new).


Duplicate Elimination

[Figure: old box with operators AB, BC, CD, states SA, SB, SAB, SC, SABC, SD, output QABCD]

At the root operator in the old box: if both to-be-joined tuples have all-new sub-tuples, don't join them.

At the other operators in the old box: proceed as normal.
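The rule can be checked against all 16 combinations. This sketch assumes the old box could otherwise produce every combination at its root (CD, joining ABC with D), that old sub-tuples carry timestamps before `t_start` and new ones after it; the function names are hypothetical. Each combination is then emitted by exactly one box.

```python
from itertools import product
from typing import Dict, Sequence, Tuple

def all_new(sub_ts: Sequence[int], t_start: int) -> bool:
    """True if every sub-tuple arrived at or after the migration start."""
    return all(ts >= t_start for ts in sub_ts)

def old_root_should_join(left_ts: Sequence[int], right_ts: Sequence[int],
                         t_start: int) -> bool:
    """Root operator of the old box: skip the join when both inputs are all-new.
    Other operators in the old box join as usual."""
    return not (all_new(left_ts, t_start) and all_new(right_ts, t_start))

def emission_counts(t_start: int = 100, old_ts: int = 50,
                    new_ts: int = 150) -> Dict[Tuple[bool, ...], int]:
    """How many boxes emit each old/new combination of (A, B, C, D)."""
    counts = {}
    for flags in product((False, True), repeat=4):  # True = new sub-tuple
        ts = [new_ts if is_new else old_ts for is_new in flags]
        # Old box root (CD) joins an ABC tuple (left) with a D tuple (right)
        by_old_box = old_root_should_join(ts[:3], ts[3:], t_start)
        # New box starts with empty states: it only emits all-new results
        by_new_box = all(flags)
        counts[flags] = int(by_old_box) + int(by_new_box)
    return counts
```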


Estimation of PT Migration Duration

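Estimation formula, where h is the height of the query tree:

$$T_{PT} = \begin{cases} W & \text{if } h = 0 \\ 2W & \text{if } h > 0 \end{cases}$$

[Figure: timeline from T_M-start to T_M-end covering a 1st W and a 2nd W, with old and new tuples; example old boxes with h = 2 (operators AB, BC, CD over QA, QB, QC, QD) and h = 0 (single operator AB over QA, QB with states SA, SB)]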


PT Migration Duration
• Given enough system computing resources, new tuples are processed right away: PT migration duration ≈ 2W
• Without enough system resources, new tuples accumulate in queues: PT migration duration > 2W
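The duration estimate as a function: a trivial sketch under the slides' assumption of enough computing resources. When new tuples back up in queues, the returned value is only a lower bound. The function name is hypothetical.

```python
def estimated_pt_duration(height: int, window_ms: float) -> float:
    """T_PT = W if the query tree height h is 0, else 2W."""
    if height < 0:
        raise ValueError("tree height must be non-negative")
    return window_ms if height == 0 else 2 * window_ms
```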


Cost Estimation of PT Migration

Cost of PT = cost of processing 2W worth of tuples in the old box + cost of processing 2W worth of tuples in the new box

Parameters: input rates, window size and selectivity, similar to the MS strategy


Cost Estimation of PT Migration

Costs of processing 2W worth of new tuples in both boxes:
• For the old box: T_AB = cost of purge + cost of insert + cost of join
• For the new box, differentiate the first and second W (its states start empty): T_BC = cost for the first W + cost for the second W

[Equations: closed forms of T_AB and T_BC in terms of the input rates, W, T_j and T_s; not legible in the extracted slide]


PT Migration Pros and Cons

Pros: keeps producing results even during migration (MS produces no results during migration)

Cons: migration duration is at least 2W (for h > 0); MS may be faster, depending on the number of tuples in states


Outline

• Problem Definition and Motivation
• Dynamic Migration Strategies
  • Moving State Strategy
  • Parallel Track Strategy
• Experimental Results


Experimental Setup

Embedded in the CAPE system:
• CAPE = Continuous Adaptive Processing Engine, a streaming query engine developed at DSRG, WPI (VLDB'04 demo)
• Layers of adaptation: punctuation exploitation, adaptive scheduling, query migration, dynamic distribution

[Figure: CAPE runtime engine with Operator Configurator, QoS Inspector, Operator Scheduler, Plan Migrator, Execution Engine, Storage Manager, Stream Receiver, Distribution Manager, Query Plan Generator and Stream/Query Registration GUI; a Stream Provider feeds the streams; queries in, results out]



Experimental Setup (II)

Experiments on migration duration:
• Vary window size
• Vary input rates

Experiments on migration effects:
• Changes of output rates

Arrival streams: generated by the stream generator in CAPE, with a Poisson arrival pattern (exponentially distributed inter-arrival times)

Machine: Windows 2000, Pentium III 500 MHz CPU, 384 MB RAM
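A Poisson arrival stream can be generated from exponential inter-arrival times. This is a sketch using Python's standard `random.Random.expovariate`, not CAPE's stream generator; `poisson_arrivals` is a hypothetical name, and reading the IA, IB, IC and ID (ms) data-set parameters as mean inter-arrival times is an assumption.

```python
import random
from typing import List

def poisson_arrivals(mean_interarrival_ms: float, duration_ms: float,
                     seed: int = 0) -> List[float]:
    """Arrival timestamps (ms) of a Poisson stream over [0, duration_ms)."""
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    while True:
        # Exponential inter-arrival time with the given mean (rate = 1 / mean)
        t += rng.expovariate(1.0 / mean_interarrival_ms)
        if t >= duration_ms:
            return arrivals
        arrivals.append(t)
```

For example, `poisson_arrivals(100, 60_000)` models an input averaging one tuple every 100 ms (10 tuples/sec) for one minute.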


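Experimental Setup (II)

Experiments on migration duration:
• Vary window size
• Vary input rates

Experiments on migration effects:
• Changes of intermediate results
• Changes of output rates

Data sets:
• Enough system resources (low config)
• Not enough system resources (high config)

Machine: Windows 2000, Pentium III 500 MHz CPU, 384 MB RAM

Data sets set1 to set3 are used for migration duration, set4 (L) and set5 (H) for migration effects. I_X is the inter-arrival time on input X; σ_XY is the reduction factor of join operator XY.

| Parameter | set1 | set2 | set3 | set4 (L) | set5 (H) |
|---|---|---|---|---|---|
| W (ms) | vary | 1000 | vary | 1000 | 2000 |
| I_A (ms) | 100 | 50 | 100 | 100 | 50 |
| I_B (ms) | 100 | vary | 12 | 100 | 50 |
| I_C (ms) | 100 | 50 | 12 | 100 | 50 |
| I_D (ms) | 100 | 50 | 12 | 100 | 50 |
| σ_AB | 0.1 | 0.1 | 0.1 | 0.1 | 0.2 |
| σ_BC | 0.05 | 0.05 | 0.1 | 0.02 | 0.05 |
| σ_CD | 0.02 | 0.02 | 0.1 | 0.02 | 0.05 |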


Migration Duration vs. Window Size

[Figure, three panels: (1) measured vs. estimated T_PT, migration duration (ms) against global window size W (ms); (2) measured T_MS with a polynomial trend line, migration duration (ms) against global window size W (ms); (3) T_MS and T_PT, migration duration against window size (ms)]


Migration Duration vs. Input Rates

[Figure: migration duration (ms) of T_MS and T_PT against the arrival rate from input B (tuples/sec)]

T_MS stays almost constant; T_PT increases with λB.


Migration Effects

[Figure, two panels: number of intermediate tuples, and output rate (tuples/sec), against time (ms) for MS, PT, New and Old]

Migration starts at 10000 ms. Four lines:
• New: run the new (better) query plan alone
• Old: run the old (worse) query plan alone
• MS: start with the old plan, migrate to the new plan by the MS strategy
• PT: start with the old plan, migrate to the new plan by the PT strategy


Experimental Results – High Config

[Figure, two panels: number of intermediate tuples, and output rate (tuples/sec), against time (ms) for MS, PT, New and Old]

Migration starts at 10000 ms. New: the new (better) plan alone; Old: the old (worse) plan alone; MS and PT: start with the old plan and migrate to the new plan by the MS or PT strategy.


Conclusions

• Identified the problem of migration for stateful operators
• First solutions for continuous query migration: the moving state strategy and the parallel track strategy
• Embedded both strategies into a stream system
• Cost model and experimental evaluation: the cost model is confirmed by experiments, and the performance trade-off of the two strategies is identified


Future Work

• General migration framework: all stateful operator types
• Cost analysis: effects on optimization choices


CAPE website:

http://davis.wpi.edu/~dsrg/CAPE/