DMSN 2011
Cagri Balkesen & Nesime Tatbul
Scalable Data Partitioning Techniques for Parallel Sliding Window Processing over Data Streams
Talk Outline
• Intro & Motivation
• Stream Partitioning Techniques
– Basic window partitioning
– Batch partitioning
– Pane-based partitioning
• Ring-based Query Evaluation
• Experimental Evaluation
• Conclusions & Future Work
Architectural Overview
• Classical Split-Merge pattern from Parallel DBs
• Adjustable parallelism level, d
• QoS on max latency & order
[Figure: input stream → split stage (split node) → query stage (query nodes) → merge stage (merge node) → output stream. QoS: latency < 5 seconds, disorder < 3 tuples]
Related Work: How to Partition?
• Content-sensitive
– FluX: Fault-tolerant, load balancing Exchange [1,2]
– Use group-by values from the query to partition
– Need explicit load-balancing due to skewed data
• Content-insensitive
– GDSM: Window-based parallelization (fixed-size tumbling wins) [3]
– Win-Distribute: Partition at window boundaries
– Win-Split: Partition each win into equi-length subwins
• The Problem:
– How to handle sliding windows?
– How to handle queries without group-by or a few groups?
[1] Flux: An Adaptive Partitioning Operator for Continuous Query Systems, ICDE ‘03
[2] Highly-Available, Fault-Tolerant, Parallel Dataflows, SIGMOD ‘04
[3] Customizable Parallel Execution of Scientific Stream Queries, VLDB ‘05
Stream Partitioning Techniques
• Independently processable chunking
– Window aware splitting of the stream
• Each window has an id & tuples are marked
– (first-winid, last-winid, is-win-closer)
• Tuples are replicated for each of their windows
[Figure: tuples t1 … t10 covered by overlapping windows W1 to W4; Split feeds Node1, Node2, Node3. w = 6 units, s = 2 units, Replication = 6/2 = 3]
Approach 1: Basic Sliding Window Partitioning
The Problem with Basic sliding window partitioning:
• Tuples belong to many windows depending on slide
• Excessive replication of tuples for each window
• Increase in output data volume of split
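The marking and replication described for Approach 1 can be sketched in a few lines. This is only an illustration under stated assumptions, not the talk's Borealis implementation: tuples are identified by a 0-based position `pos`, window ids start at 1, window i covers positions [(i-1)*s, (i-1)*s + w), and windows are handed to the d nodes round-robin. The names `MarkedTuple`, `mark`, and `route` are hypothetical.

```python
from typing import List, NamedTuple


class MarkedTuple(NamedTuple):
    pos: int             # 0-based stream position (assumed for this sketch)
    first_winid: int     # id of the first window containing the tuple
    last_winid: int      # id of the last window containing the tuple
    is_win_closer: bool  # True if the tuple is the last one of some window


def mark(pos: int, w: int, s: int) -> MarkedTuple:
    """Compute (first-winid, last-winid, is-win-closer) for one tuple."""
    last = pos // s + 1
    first = max(1, (pos - w) // s + 2)
    closer = pos >= w - 1 and (pos - w + 1) % s == 0
    return MarkedTuple(pos, first, last, closer)


def route(m: MarkedTuple, d: int) -> List[int]:
    """One copy of the tuple per window it belongs to.

    Round-robin assignment of windows to node indices 0..d-1 is an
    assumption of this sketch.
    """
    return [(win - 1) % d for win in range(m.first_winid, m.last_winid + 1)]
```

With w = 6 and s = 2, `mark(6, 6, 2)` places the tuple in windows 2 to 4, and `route(..., d=3)` emits three copies, which is the w/s = 3 replication that makes the split's output volume grow with the overlap.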
Approach 2: Batch-based Partitioning
• Batch several windows together to reduce replication
• “Batch-window”: wb = w+(B-1)*s ; sb = B*s
– All the tuples in a batch go to the same partition
– Only tuples overlapping btw. batches are replicated
• Replication reduced to wb/sb partitions instead of w/s
Definitions: w : window-size, s : slide-size, B : batch-size
[Figure: tuples t1 … t10 covered by windows w1 … w8, with batches B1 and B2 marked. w = 3, s = 1, B = 3 gives wb = 5, sb = 3; Replication: 3 → 5/3]
The Panes Technique
• Divide overlapping windows into disjoint panes
• Reduce cost by sub-aggregation and sharing
• Each window has w/gcd(w,s) panes of size gcd(w,s)
• Query is decomposed: pane-level (PLQ) & window-level (WLQ) queries
[Figure: overlapping windows w1 … w5 drawn over a sequence of disjoint panes p1 … p8]
[1] No Pane, No Gain: Efficient Evaluation of Sliding Window Aggregates over Data Streams, SIGMOD Record ‘05
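A small sketch of the PLQ/WLQ decomposition for a SUM aggregate, assuming count-based windows over an in-memory list (both are assumptions made for illustration; `pane_sum` and `direct_sum` are hypothetical names). Each pane is aggregated once and its partial result is shared by every window that contains it.

```python
from math import gcd
from typing import List


def pane_sum(values: List[int], w: int, s: int) -> List[int]:
    """Sliding-window SUM computed via panes of size gcd(w, s)."""
    g = gcd(w, s)
    panes_per_window = w // g
    panes_per_slide = s // g
    # PLQ: one partial sum per complete pane
    plq = [sum(values[i:i + g]) for i in range(0, len(values) - g + 1, g)]
    # WLQ: combine the pane results that make up each window
    return [sum(plq[j:j + panes_per_window])
            for j in range(0, len(plq) - panes_per_window + 1, panes_per_slide)]


def direct_sum(values: List[int], w: int, s: int) -> List[int]:
    """Reference: recompute every window from its raw tuples."""
    return [sum(values[i:i + w]) for i in range(0, len(values) - w + 1, s)]


stream = list(range(1, 15))           # tuples 1 … 14
via_panes = pane_sum(stream, 6, 4)    # w = 6, s = 4, pane size 2
baseline = direct_sum(stream, 6, 4)
```

Both calls yield the same window results; the pane version touches each tuple once, whereas the direct version re-reads the overlapping tuples for every window.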
Approach 3: Pane-based Partitioning
• Mark each tuple with pane-id + win-id
– Treat panes as tumbling window with wp = sp = gcd(w,s)
• Route tuples to a node based on pane-id
• Nodes compute PLQ with pane tuples
• Combine all PLQ results of a window to form WLQ
– Need for an organized topology of nodes
– We propose organization of nodes in a ring
[Figure: Split routes panes to Node1, Node2, Node3. w = 6 units, s = 2 units]
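The pane marking step can be sketched as follows, assuming 0-based tuple positions and 1-based pane and window ids (assumptions of this sketch; `PaneMark` and `mark_pane` are hypothetical names). Every tuple receives exactly one pane-id, so the split emits it once; the window range only records which windows will later consume that pane's PLQ result.

```python
from math import gcd
from typing import NamedTuple


class PaneMark(NamedTuple):
    pane_id: int      # 1-based id of the single pane holding the tuple
    first_winid: int  # first window that uses this pane
    last_winid: int   # last window that uses this pane


def mark_pane(pos: int, w: int, s: int) -> PaneMark:
    """Mark a tuple at 0-based position pos with pane-id + win-ids."""
    g = gcd(w, s)              # pane size: wp = sp = gcd(w, s)
    ww, sw = w // g, s // g    # window size and slide in panes
    pane = pos // g            # 0-based pane index
    last = pane // sw + 1
    first = max(1, (pane - ww) // sw + 2)
    return PaneMark(pane + 1, first, last)
```

For w = 6 and s = 2, the tuple at position 4 falls into pane 3, which is shared by windows 1 to 3, yet the tuple itself is forwarded only once.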
Ring-based Query Evaluation
• High amount of pipelined result sharing among nodes
• Organized communication topology
W = 6, S = 4 tuples; P = GCD(6,4) = 2 tuples
[Figure: Window1 = Pane1 (tuples 1-2), Pane2 (3-4), Pane3 (5-6); Window2 = Pane3 (5-6), Pane4 (7-8), Pane5 (9-10); Window3 = Pane5 (9-10), Pane6 (11-12), Pane7 (13-14)]
[Figure: Input Source → Split → ring of Node1 (P1, P2, P3, P8, P9, …), Node2 (P4, P5, P10, P11, …), Node3 (P6, P7, P12, P13, …) → Merge, which outputs W1, W2, W3, …; pane results R3, R5, R7, R9, R11, R13 are passed between nodes]
Assignment of Windows and Panes to Nodes
• All pane results only arrive from predecessors
• Only local pane results are sent to the successor
– Each node is assigned n consecutive windows
– Min n s.t.
Definitions: ww : win-size in # of panes, sw : slide-size in # of panes
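The assignment can be illustrated for the W = 6, S = 4 example (ww = 3, sw = 2 panes) on a ring of three nodes. The sketch assumes n = 1 window per node, round-robin assignment of windows around the ring, and that a pane is owned by the node of the first window containing it; these are assumptions chosen to reproduce the example, and the sketch does not attempt the general choice of n. `ring_plan` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def ring_plan(num_windows: int, ww: int, sw: int,
              d: int) -> Tuple[Dict[int, int], List[Tuple[int, int, int]]]:
    """Return (owner, forwards).

    owner maps pane id -> node that receives the pane's tuples.
    forwards lists (pane id, source node, destination node) for pane
    results that a window needs from another node.
    """
    owner: Dict[int, int] = {}
    forwards: List[Tuple[int, int, int]] = []
    for win in range(1, num_windows + 1):
        node = (win - 1) % d + 1             # assumption: n = 1, round-robin
        first_pane = (win - 1) * sw + 1
        for pane in range(first_pane, first_pane + ww):
            if pane not in owner:
                owner[pane] = node           # new pane: local to this node
            elif owner[pane] != node:
                forwards.append((pane, owner[pane], node))
    return owner, forwards


owner, forwards = ring_plan(num_windows=6, ww=3, sw=2, d=3)
panes_of = {n: sorted(p for p, o in owner.items() if o == n) for n in (1, 2, 3)}
```

In this configuration every forwarded result travels from a node to its immediate successor on the ring (1 → 2, 2 → 3, 3 → 1) and is always a pane that is local to the sender.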
Flexible Result Merging
[1] Exploiting k-Constraints to Reduce Memory Overhead in Continuous Queries over Data Streams. ACM TODS ‘04
Fully-ordered
FIFO
k-ordered: k-ordering constraint [1], certain disorder allowed
Defn: For any tuple s, every tuple s’ that arrives at least k+1 tuples after s satisfies s’.A ≥ s.A
* k = 0
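The k-ordering definition can be turned into a simple check over the ordering attribute A of an output sequence. This is a sketch for illustration only (the name `is_k_ordered` is hypothetical); it compares each tuple against the running maximum of all tuples that arrived at least k+1 positions earlier.

```python
from typing import Optional, Sequence


def is_k_ordered(values: Sequence[int], k: int) -> bool:
    """True if every value is >= all values at least k+1 positions before it."""
    prefix_max: Optional[int] = None
    for j in range(k + 1, len(values)):
        candidate = values[j - k - 1]   # newest tuple that is k+1 positions back
        prefix_max = candidate if prefix_max is None else max(prefix_max, candidate)
        if values[j] < prefix_max:
            return False
    return True
```

With k = 0 the check reduces to a fully ordered output; larger k tolerates a bounded amount of disorder, for example `[2, 1, 3]` fails for k = 0 but passes for k = 1.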
Experimental Evaluation
• Implementation of techniques in Borealis
• Workload adapted from Linear Road Benchmark
– Slightly modified segment statistics queries
– Basic aggregation functions with different window/slide ratios
Scalability of Split Operator
• Pane-partitioning: cost & tput constant regardless of overlap ratio
• Window & batch-partitioning: cost ↑ and tput ↓ as overlap ↑
• Excessive replication in window-partitioning is reduced by batching
[Figure: maximum input rate (tuples/second) vs. window-size/slide ratio (window overlap)]
Scalability of Partitioning Techniques
• Pane-based scales close to linear until split is saturated
– per tuple cost is constant
• Window & batch based: extremely high replication
– Split is not saturated, but scales very slowly
* w/s = overlap ratio = 100
Summary & Conclusions
• Pane-partitioning is the partitioning technique of choice
– Avoids tuple replication
– Incurs less overhead in split and aggregate
– Scales close to linear
1) Window-based 2) Batch-based 3) Pane-based
Ongoing & Future Work
• Generalization of the framework
• Support for adaptivity during runtime
• Extending complexity of query plans
• Extending performance analysis & experiments
Thank You!