Dynamic Plan Migration for Continuous Queries over Data Streams
Yali Zhu, Elke Rundensteiner and George Heineman
Database System Research Group, WPI, Massachusetts, USA
SIGMOD 2004
*Research partly supported by the RDC grant 2003-04 on “On-line Stream Monitoring Systems: Untethered Healthcare, Intrusion Detection, and Beyond.”
Stream Query Optimization
More dynamic fluctuations in statistics, so compile-time optimization is not possible
Global optimization is not practical for huge query networks, so adaptive optimization is needed
CPU processing and main memory must be taken into account, which calls for other cost models
Motivation of ‘Query Migration’
Continuous queries over streams:
Statistics unknown before start
Statistics changing during execution (stream rates, arrival patterns, distributions, etc.)
Need for dynamic adaptation: plan re-optimization
Change the shape of the query plan tree
Run-time Plan Re-Optimization
Step 1: Decide when to optimize (statistics monitoring)
Step 2: Generate a new query plan (query optimization)
Step 3: Replace the current plan by the new plan (plan migration)
Naïve Plan Migration Strategy
Migration steps:
(1) Pause execution of the old plan
(2) Drain out all tuples inside the old plan
(3) Replace the old plan by the new plan
(4) Resume execution of the new plan
[Figure: old and new plans over streams A, B, C, built from join operators AB and BC]
Problem: works for stateless operators only
Stateful Operators in CQ
Why stateful?
CQ needs non-blocking operators
An operator needs to output partial results
A state data structure keeps the received tuples
[Figure: join operator AB with State A and State B; stream B delivers b1 to b5, tuple ax is held in State A, and the outputs include ax b2 and ax b3]
Observation: the purge of tuples in states typically relies on the processing of new tuples.
Example: symmetric nested-loop (NL) join with window constraints
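A minimal Python sketch of the symmetric NL window join, assuming tuples are (timestamp, value) pairs arriving in timestamp order, a window condition that keeps an opposite tuple only while its timestamp is greater than ts - W, and a caller-supplied value predicate. The class and method names are hypothetical. It illustrates the observation above: a tuple leaves a state only when some tuple arrives on the opposite input.

```python
from collections import deque


class SymmetricWindowJoin:
    """Hypothetical symmetric nested-loop join of streams A and B with window W.

    Tuples are (timestamp, value) pairs arriving in timestamp order.
    """

    def __init__(self, window, predicate):
        self.window = window
        self.predicate = predicate
        self.states = {"A": deque(), "B": deque()}

    def process(self, side, tup):
        """Purge, probe, insert; returns the partial results produced by `tup`."""
        other = "B" if side == "A" else "A"
        ts = tup[0]
        opposite = self.states[other]
        # Purge is triggered only by this arrival: expired tuples of the
        # opposite state stay there until some tuple arrives on `side`.
        while opposite and opposite[0][0] <= ts - self.window:
            opposite.popleft()
        results = []
        for other_tup in opposite:
            pair = (tup, other_tup) if side == "A" else (other_tup, tup)
            if self.predicate(*pair):
                results.append(pair)
        # Keep the new tuple so that future arrivals on the other side can join it.
        self.states[side].append(tup)
        return results


join = SymmetricWindowJoin(window=10, predicate=lambda a, b: a[1] == b[1])
join.process("A", (0, "x"))
first = join.process("B", (3, "x"))    # a at 0 is still inside the window
second = join.process("B", (12, "x"))  # a at 0 is purged before probing
print(first, second)
```

Both B tuples remain in State B after these calls; only a later arrival on input A could purge them.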
Naïve Migration Strategy Revisited
Steps:
(1) Pause execution of the old plan
(2) Drain out all tuples inside the old plan
(3) Replace the old plan by the new plan
(4) Resume execution of the new plan
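[Figure: plan over A, B, C with joins AB and BC, annotated with step (2) all tuples drained, step (3) old replaced by new, and step (4) processing resumed]
Problem: deadlock waiting. Step (2) may never finish: purging tuples from states relies on processing new tuples, but no new tuples are processed while the old plan is paused, so the states never drain.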
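To see why step (2) can stall, here is a hypothetical model of draining a paused windowed join (the function name and the list-of-timestamps state encoding are illustrative assumptions). Only tuples already queued inside the plan can trigger purges, and each of them is itself inserted into a state, so the "all states empty" condition is never reached.

```python
def drain_old_plan(states, queued, window):
    """Try to empty all states of a paused windowed join (naive step (2)).

    Hypothetical model: `states` maps "A"/"B" to the timestamps already stored,
    `queued` lists (side, timestamp) tuples still inside the plan. New input
    is paused, so only the queued tuples can trigger purges.
    Returns True only if every state ends up empty.
    """
    states = {side: list(ts_list) for side, ts_list in states.items()}
    for side, ts in queued:
        other = "B" if side == "A" else "A"
        # Purge the opposite state relative to the arriving tuple's timestamp.
        states[other] = [t for t in states[other] if t > ts - window]
        # The arriving tuple is stored, so at least one state stays non-empty.
        states[side].append(ts)
    # With input paused, nothing else can arrive to purge what is left.
    return all(not ts_list for ts_list in states.values())


print(drain_old_plan({"A": [1, 2], "B": [3]}, [("B", 4)], window=10))
print(drain_old_plan({"A": [1, 2], "B": [3]}, [("B", 40), ("A", 50)], window=5))
```

Even when the queued tuples purge every old entry, the last arrivals remain in the states, so the naive strategy waits forever.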
Concept of Migration Boxes
Two exchangeable migration boxes:
One contains the old plan or sub-plan
One contains the new plan or sub-plan
The two plans are semantically equivalent
Both boxes have the same input queues and output queues
Migration is abstracted as replacing the old box by the new box.
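[Figure: an old box and a new box for query ABCD, each reading input queues QA, QB, QC, QD and writing output queue QABCD; old box states SA, SB, SAB, SC, SABC, SD; new box states SA, SB, SC, SBC, SD, SBCD]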
Problem Definition: Dynamic Plan Migration
Input: two migration boxes, one containing the old plan and one containing the new plan, with the same input and output queues
Result: the old box is replaced by the new box
Valid migration: no missing tuples and no duplicates
Key points: the involved plans contain stateful operators, so we need to migrate while still retaining useful states and discarding useless states.
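The validity condition (no missing tuples, no duplicates) can be checked by comparing outputs as multisets. This is a hypothetical testing helper, not part of the described system; it assumes a reference run of a single plan without migration is available.

```python
from collections import Counter


def check_migration(reference, produced):
    """Compare a migrated run's output with a reference run (hypothetical helper).

    `reference` is the output of running one plan with no migration;
    `produced` is the output observed across the migration. Results are
    compared as multisets so that duplicates are detected.
    Returns (missing, extra); a valid migration has both empty.
    """
    ref, out = Counter(reference), Counter(produced)
    missing = ref - out  # results the migration lost
    extra = out - ref    # duplicate copies or spurious results
    return missing, extra


missing, extra = check_migration(["a1b1", "a2b1"], ["a1b1", "a1b1"])
print(missing, extra)
```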
State of the Art
“Efficient mid-query re-optimization of sub-optimal query execution plans” [Kabra, DeWitt 1998]: only migrates the unprocessed portion
Query plan competing model [Ioannidis, Ng, et al. 1992] [Graefe, Cole 1994]: generate several candidate query plans before start, execute all of them, and choose one after a while
Outline
Problem Motivation and Definition
Dynamic Migration Strategies
Moving State Strategy
Parallel Track Strategy
Experimental Results
Moving State Strategy
Basic idea: share common states between the two boxes
Key steps:
Identify common states (state matching)
Share common states (state moving)
Recompute unmatched states (state recomputing)
Moving State Strategy
State Matching: each state in the old box has a unique ID. During rewriting, a new ID is given to each newly generated state in the new box. When rewriting is done, states are matched based on their IDs.
State Moving: between matched states on the same machine, create new pointers in the new box to the matched states.
What's left? The unmatched states in the new box.
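[Figure: Old Box with joins AB (states SA, SB), BC (states SAB, SC) and CD (states SABC, SD); New Box with joins BC (states SB, SC), CD (states SBC, SD) and AB (states SA, SBCD); both boxes read input queues QA, QB, QC, QD and write output queue QABCD]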
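State matching and moving can be sketched as a dictionary lookup. This is a minimal illustration assuming state names double as state IDs and states are plain Python lists; the function name is hypothetical. "Moving" a matched state means the new box references the same object, so no tuples are copied; unmatched states start empty and must be recomputed.

```python
def match_and_move(old_states, new_state_ids):
    """ID-based state matching plus pointer-based state moving (sketch).

    `old_states` maps state IDs of the old box to state objects;
    `new_state_ids` lists the IDs of the states in the new box.
    Old states that are never matched (here SAB, SABC) are simply dropped.
    """
    new_states, unmatched = {}, []
    for state_id in new_state_ids:
        if state_id in old_states:
            new_states[state_id] = old_states[state_id]  # share, do not copy
        else:
            new_states[state_id] = []  # empty until recomputed
            unmatched.append(state_id)
    return new_states, unmatched


old = {"SA": [1], "SB": [2], "SC": [3], "SD": [4], "SAB": [5], "SABC": [6]}
new, todo = match_and_move(old, ["SA", "SB", "SC", "SD", "SBC", "SBCD"])
print(todo)
```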
Unmatched States
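State Recomputing: recursively recompute the unmatched states SBC and SBCD from the bottom up
Why is this always possible? The old and new boxes have the same input queues, so the states associated with the input queues always match
Why is it necessary?
[Figure: new box with joins AB, BC, CD, states SA, SB, SC, SD, SBC, SBCD, input queues QA, QB, QC, QD and output queue QABCD]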
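Bottom-up recomputation can be sketched as repeated joins of already-available states. This sketch assumes tuples are encoded as tuples of timestamps, that a combined tuple is valid when all its timestamps lie within the global window W (max minus min below W), and that no value predicate is applied; the function names are hypothetical.

```python
from itertools import product


def window_join(left, right, window):
    """Join two states under a global window: all component timestamps of
    the combined tuple must lie within `window` of each other."""
    out = []
    for l, r in product(left, right):
        combined = l + r
        if max(combined) - min(combined) < window:
            out.append(combined)
    return out


def recompute_bottom_up(base_states, order, window):
    """Rebuild unmatched intermediate states from matched input states.

    `order` is the new box's join order, e.g. ["SB", "SC", "SD"] yields
    SBC = SB join SC, then SBCD = SBC join SD. Input states always match,
    so the first join only ever needs states that already exist.
    """
    current = [(t,) for t in base_states[order[0]]]
    name, recomputed = order[0], {}
    for next_name in order[1:]:
        current = window_join(current, [(t,) for t in base_states[next_name]], window)
        name += next_name[1:]  # "SB" + "C" -> "SBC"
        recomputed[name] = current
    return recomputed


rec = recompute_bottom_up({"SB": [0, 5], "SC": [1, 9], "SD": [2]}, ["SB", "SC", "SD"], window=5)
print(rec)
```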
Terms on Tuples
New/old tuples:
Old: tuples already in the old box when migration starts
New: tuples that do not exist in the old box when migration starts
Sub-tuples: tuple ABCD is the join result of tuples A, B, C and D, which are the sub-tuples of ABCD
Tuple ABCD has 2^4 = 16 possible combinations of old/new sub-tuples
[Figure: old box joining A, B, C, D into ABCD via joins AB, BC, CD with states SA, SB, SAB, SC, SABC, SD, input queues QA, QB, QC, QD and output queue QABCD]
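The 16 combinations can be enumerated directly with `itertools.product`; grouping them by the number of new sub-tuples gives the binomial counts 1, 4, 6, 4, 1. This is a small illustration only.

```python
from collections import Counter
from itertools import product

STREAMS = "ABCD"
# Every ABCD result has one old/new flag per sub-tuple: 2**4 combinations.
combos = list(product(("old", "new"), repeat=len(STREAMS)))
# Group the combinations by how many sub-tuples are new.
by_new_count = Counter(combo.count("new") for combo in combos)
print(len(combos), sorted(by_new_count.items()))
```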
Why Recompute Unmatched States
To get the complete results of ABCD, we need all 16 old/new combinations
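[Figure: new box with joins AB, BC, CD, states SA, SB, SC, SD, SBC, SBCD, and input queues QA, QB, QC, QD]
If SBC is not recomputed, the new box misses every result in which both B and C are old: such a B-C pair is never joined in the new box, because neither tuple arrives there after migration starts.
[Figure: old/new marking of sub-tuples A, B, C, D for results in which both B and C are old]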
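Which of the 16 combinations are lost when a state is left empty can be computed by filtering. This hypothetical helper encodes the reasoning above: a pair is joined in the new box only when one of its sub-tuples arrives after migration starts, so results whose sub-tuples for that state are all old never reach it.

```python
from itertools import product

STREAMS = "ABCD"


def missed_without(state_streams):
    """Old/new combinations of ABCD the new box cannot produce if the state
    covering `state_streams` (e.g. "BC" for SBC) is not recomputed."""
    missed = []
    for combo in product(("old", "new"), repeat=len(STREAMS)):
        flags = dict(zip(STREAMS, combo))
        # Only results whose sub-tuples for this state are all old are lost.
        if all(flags[s] == "old" for s in state_streams):
            missed.append(combo)
    return missed


print(len(missed_without("BC")), len(missed_without("BCD")))
```

For SBC, A and D may each be old or new, so 4 of the 16 combinations are lost; for SBCD, 2 are lost.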
Cost Estimation of MS Migration
Cost of MS consists of:
Cost of state matching: ID comparison (negligible)
Cost of state moving: creating pointers (negligible)
Cost of state recomputing: the majority of the cost
Affecting parameters: operator selectivities and the number of tuples in states, estimated as (input rate × window size)
See the paper for detailed cost models.
Cost model conclusion: the cost of MS has a polynomial relationship to the window size.
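Cost Estimation of MS Migration
$$T_{MS} = T_{match} + T_{move} + T_{recompute}$$
$$T_{MS} \approx T_{recompute}(S_{BC}) + T_{recompute}(S_{BCD}) = \lambda_B \lambda_C W^2 \left(T_j + T_s \sigma_{BC}\right) + 2 \lambda_B \lambda_C \lambda_D W^3 \left(T_j \sigma_{BC} + T_s \sigma_{BC} \sigma_{BCD}\right)$$
$T_m$: time spent for each string comparison
$T_c$: time spent to create a new cursor
$T_j$: time spent to join a pair of tuples
$T_s$: time spent to insert one tuple into a state
$\lambda_A$: average tuple input rate from QA
$\lambda_B$: average tuple input rate from QB
$\sigma_{AB}$: reduction factor of join operator AB
$W$: global time window constraint
[Figure: new box with joins AB, BC, CD over input queues QA, QB, QC, QD; states SB, SC, SBC, SD, SBCD involved in recomputation]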
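The T_MS approximation can be evaluated directly to see the polynomial dependence on W. The function name and parameter values below are arbitrary illustrations in abstract time units, not measurements; matching and moving costs are treated as negligible, as stated above.

```python
def t_ms(lam_b, lam_c, lam_d, window, t_j, t_s, sigma_bc, sigma_bcd):
    """Evaluate T_MS ~ lB*lC*W^2*(Tj + Ts*sBC) + 2*lB*lC*lD*W^3*(Tj*sBC + Ts*sBC*sBCD)."""
    recompute_sbc = lam_b * lam_c * window**2 * (t_j + t_s * sigma_bc)
    recompute_sbcd = (2 * lam_b * lam_c * lam_d * window**3
                      * (t_j * sigma_bc + t_s * sigma_bc * sigma_bcd))
    return recompute_sbc + recompute_sbcd


params = dict(lam_b=10, lam_c=10, lam_d=10, t_j=1.0, t_s=1.0,
              sigma_bc=0.1, sigma_bcd=0.1)
small = t_ms(window=1.0, **params)
large = t_ms(window=2.0, **params)
print(round(small, 6), round(large, 6))  # 330.0 2200.0
```

Doubling W multiplies the SBC term by 4 and the SBCD term by 8, so the total grows from 330 to 2200.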
MS Migration Pros and Cons
Pros: fast when the number of tuples in states is small (low input rates, low selectivity, or small window)
Cons: output silence during the entire migration stage
Can the query produce output even during migration? This motivates the Parallel Track strategy.
Parallel Track Strategy
Basic idea: execute both the old and new plans in parallel, and gradually “push” old tuples out of the old box by purging
Key steps: connect the new box; execute both boxes in parallel; remove the old box once it has “expired”, i.e., it contains only new tuples, with no old tuples or sub-tuples
Parallel Track Strategy
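Key steps:
Connect the boxes
Execute both in parallel until the old box has “expired” (no old tuple or sub-tuple)
Disconnect the old box
Start executing the new box only
[Figure: old box with joins AB, BC, CD and states SA, SB, SAB, SC, SABC, SD, and new box with joins BC, CD, AB and states SB, SC, SBC, SD, SBCD, SA, both attached to input queues QA, QB, QC, QD and output queue QABCD]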
Potential Duplicates
Tuple ABCD has 2^4 = 16 possible old/new sub-tuple combinations
The same combination must not be generated by both boxes; otherwise we get duplicates
In the new box, all states start empty, so it only generates ABCD as (new, new, new, new)
The old box may generate all 16 cases, so it duplicates the (new, new, new, new) case
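The overlap between the two boxes can be checked with plain set operations, where each case is a tuple of old/new flags for (A, B, C, D). This is an illustration of the counting argument only.

```python
from itertools import product

# All 2**4 old/new cases for the sub-tuples (A, B, C, D) of a result ABCD.
all_cases = set(product(("old", "new"), repeat=4))
new_box_cases = {("new",) * 4}  # new box states start empty
old_box_cases = set(all_cases)  # old box may produce any of the 16 cases
overlap = new_box_cases & old_box_cases
print(overlap)  # {('new', 'new', 'new', 'new')}
print(len(old_box_cases - new_box_cases))  # 15
```

Suppressing exactly the all-new case in the old box leaves 15 cases there and 1 case in the new box, with no overlap.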
Duplicate Elimination
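[Figure: old box with joins AB, BC, CD, states SA, SB, SAB, SC, SABC, SD, input queues QA, QB, QC, QD and output queue QABCD]
At the root operator of the old box: if both to-be-joined tuples consist only of new sub-tuples, do not join them.
At any other operator of the old box: proceed as normal.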
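A minimal simulation of PT migration for the simplest case, a single join (tree height h = 0, so the root is the only operator). It assumes tuples are (stream, timestamp) with unique timestamps arriving in order, a pure window join (timestamps closer than W), and "new" meaning a timestamp at or after the migration start. Class and function names are hypothetical. The old box applies the root rule above, and the test compares the combined output with a run that never migrates.

```python
class WindowJoin:
    """Symmetric window join of streams A and B (hypothetical sketch).

    Tuples are (stream, timestamp); a pair joins if its timestamps differ by
    less than `window`. `skip_pair` lets the caller suppress chosen pairs.
    """

    def __init__(self, window, skip_pair=None):
        self.window = window
        self.skip_pair = skip_pair or (lambda a, b: False)
        self.states = {"A": [], "B": []}

    def process(self, tup):
        side, ts = tup
        other = "B" if side == "A" else "A"
        # Arrival-driven purge of the opposite state.
        self.states[other] = [t for t in self.states[other] if t[1] > ts - self.window]
        out = []
        for o in self.states[other]:
            a, b = (tup, o) if side == "A" else (o, tup)
            if not self.skip_pair(a, b):
                out.append((a[1], b[1]))
        self.states[side].append(tup)
        return out


def parallel_track(stream, window, t_start):
    """PT migration for a single join; returns (results, time old box was removed)."""
    def is_new(t):
        return t[1] >= t_start  # not yet in the old box when migration starts

    # Root rule in the old box: skip all-new pairs, the new box produces them.
    old_box = WindowJoin(window, skip_pair=lambda a, b: is_new(a) and is_new(b))
    new_box, results, t_end = None, [], None
    for tup in stream:
        if new_box is None and is_new(tup):
            new_box = WindowJoin(window)  # connect the new box, states empty
        if old_box is not None:
            results += old_box.process(tup)
            old_left = any(not is_new(t) for s in old_box.states.values() for t in s)
            if new_box is not None and not old_left:
                old_box, t_end = None, tup[1]  # expired: disconnect old box
        if new_box is not None:
            results += new_box.process(tup)
    return results, t_end


stream = [("A" if ts % 2 == 0 else "B", ts) for ts in range(40)]
ref_join, reference = WindowJoin(window=5), []
for tup in stream:
    reference += ref_join.process(tup)
migrated, t_end = parallel_track(stream, window=5, t_start=20)
print(sorted(migrated) == sorted(reference), t_end)  # True 24
```

In this run the old box expires 4 time units after migration starts, within one window W = 5, in line with the h = 0 case of the duration estimate.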
Estimation of PT Migration Duration
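Estimation formula:
$$T_{PT} = \begin{cases} W & \text{if } h = 0 \\ 2W & \text{if } h > 0 \end{cases}$$
where h is the height of the query tree.
[Figure: timeline from T_M-start to T_M-end spanning a 1st W and a 2nd W, showing old and new tuples in the old box for h = 2 (joins AB, BC, CD with states SA, SB, SAB, SC, SABC, SD) and for h = 0 (single join AB with states SA, SB), where migration takes W]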
PT Migration Duration
Given enough system computing resources, new tuples are processed right away, and the PT migration duration ≈ 2W
Without enough system resources, new tuples accumulate in queues, and the PT migration duration > 2W
Cost Estimation of PT Migration
Cost of PT = cost of processing 2W's worth of new tuples in the old box + cost of processing 2W's worth of new tuples in the new box
Parameters: input rates, window size, selectivity (similar to the MS strategy)
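Cost Estimation of PT Migration
Costs of processing 2W's worth of new tuples in both boxes
For the old box: $T_{AB}$ = cost of purge + cost of insert + cost of join
$$T_{AB} = 2W\left[\lambda_A \lambda_B W \left(2T_j + 2\sigma_{AB} T_s\right)\right]$$
For the new box, differentiate between the first and the second W:
$T_{BC}$ = cost for the first W + cost for the second W
$$T_{BC} = \left(T_j + \sigma_{BC} T_s\right)\left[\int_0^W \lambda_C \left(\lambda_B t\right) dt + \int_0^W \lambda_B \left(\lambda_C t\right) dt\right] + W\left[\lambda_B \lambda_C W \left(2T_j + 2\sigma_{BC} T_s\right)\right]$$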
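A numeric sketch of the per-join PT costs, under stated assumptions: in the old box the states are already full (λW tuples each) for the whole 2W; in the new box each state is assumed to grow as λt during the first W and to be full during the second W; purge cost is ignored; parameter values are arbitrary. The function names are hypothetical, and the first-W part is integrated numerically (midpoint rule) rather than in closed form.

```python
def old_box_join_cost(lam_x, lam_y, window, t_j, t_s, sigma):
    """One join of the old box over 2W, with full states throughout.

    Per unit time, arrivals on each input probe lam*W tuples of the other
    state (cost T_j per pair), and a fraction sigma of the probed pairs is
    inserted downstream (cost T_s each).
    """
    per_unit_time = lam_x * lam_y * window * (2 * t_j + 2 * sigma * t_s)
    return 2 * window * per_unit_time


def new_box_join_cost(lam_x, lam_y, window, t_j, t_s, sigma, steps=1000):
    """One join of the new box over 2W, with states that start empty."""
    dt = window / steps
    first_w = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt  # midpoint of each slice of the first W
        probes = lam_y * (lam_x * t) + lam_x * (lam_y * t)  # states hold lam*t
        first_w += probes * (t_j + sigma * t_s) * dt
    # During the second W the states are full, as in the old box.
    second_w = window * lam_x * lam_y * window * (2 * t_j + 2 * sigma * t_s)
    return first_w + second_w


old = old_box_join_cost(10, 10, 1.0, 1.0, 1.0, 0.1)
new = new_box_join_cost(10, 10, 1.0, 1.0, 1.0, 0.1)
print(round(old, 6), round(new, 6))  # 440.0 330.0
```

Because the state sizes grow linearly, the midpoint rule is exact here up to rounding, so the new box's first W costs the same as recomputing its state once in MS.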
PT Migration Pros and Cons
Pros: keeps producing results even during migration, whereas MS produces no results during migration
Cons: migration duration is at least 2W; MS may be faster, depending on the number of tuples in states
Experimental Setup
Embedded in the CAPE system
CAPE = Continuous Adaptive Processing Engine, a streaming query engine developed at DSRG, WPI (VLDB'04 demo)
Layers of adaptation: punctuation exploitation, adaptive scheduling, query migration, dynamic distribution
[Figure: CAPE Runtime Engine, with Operator Configurator, QoS Inspector, Operator Scheduler, Plan Migrator, Execution Engine, Storage Manager, Stream Receiver, Distribution Manager, Query Plan Generator, Stream/Query Registration GUI, and Stream Provider; queries in, results out]
Experimental Setup (II)
Experiments on migration duration: vary window size; vary input rates
Experiments on migration effects: changes of output rates
Arrival streams: generated by the stream generator in CAPE, with a Poisson arrival pattern (exponentially distributed inter-arrival times)
Machine: Windows 2000, Pentium III 500 MHz processor, 384 MB RAM
Experimental Setup (II)
Experiments on migration effects: changes of intermediate results and changes of output rates, both with enough system resources (low config) and without enough system resources (high config)
Data sets: set1 to set3 are used for migration duration, set4 (L) and set5 (H) for migration effects. W is the window size; I_A to I_D are the inter-arrival times of streams A to D; AB, BC, CD are the join selectivities.
Parameter | set1 | set2 | set3 | set4 (L) | set5 (H)
W (ms) | vary | 1000 | vary | 1000 | 2000
I_A (ms) | 100 | 50 | 100 | 100 | 50
I_B (ms) | 100 | vary | 12 | 100 | 50
I_C (ms) | 100 | 50 | 12 | 100 | 50
I_D (ms) | 100 | 50 | 12 | 100 | 50
AB | 0.1 | 0.1 | 0.1 | 0.1 | 0.2
BC | 0.05 | 0.05 | 0.1 | 0.02 | 0.05
CD | 0.02 | 0.02 | 0.1 | 0.02 | 0.05
Migration Duration vs. Window Size
[Figure: three panels of migration duration (ms) vs. global window size W (ms): measured and estimated T_PT; measured T_MS with a polynomial fit; T_MS and T_PT compared]
Migration Duration vs. Input Rates
[Figure: migration duration (ms) vs. arrival rate from input B (tuples/sec), for T_MS and T_PT]
T_MS stays almost constant, while T_PT increases with λB
Migration Effects
[Figure: two panels over time (ms): number of intermediate tuples, and output rate (tuples/sec), each for MS, PT, New and Old]
Migration starts at 10000 ms. Four lines:
New: run the new (better) query plan alone
Old: run the old (worse) query plan alone
MS: start with the old plan, then migrate to the new plan by the MS strategy
PT: start with the old plan, then migrate to the new plan by the PT strategy
Experimental Results – High Config
[Figure: two panels over time (ms): number of intermediate tuples, and output rate (tuples/sec), each for MS, PT, New and Old]
Migration starts at 10000 ms; the lines New, Old, MS and PT are as defined for the previous figure.
Conclusions
Identified the problem of migration for stateful operators
First solutions for continuous query migration: the moving state strategy and the parallel track strategy
Embedded both strategies into a stream system
Cost model and experimental evaluation: the cost model is confirmed by experiments, and the performance trade-off between the two strategies is identified
Future Work
General migration framework: all stateful operator types
Cost analysis: effects on optimization choices