Operator Placement for In-Network Stream Query Processing.
Posted 20-Dec-2015
Introduction: In-network query processing
Consider a video surveillance application.
[Figure: Environment]
Target: suspicious activity (dark, movement)
Need: a filter for calculating intensity (F1) and a filter for detecting sufficient motion (F2)
Introduction: Previous work
Push down all filters, since CPU cost << communication cost
What if the queries involve expensive predicates?
Objective: place each filter at the "best" node, based on its selectivity and cost, so as to minimize the overall cost
Introduction: Operator placement problem
Tradeoff
Lower computational costs: put filters on the nodes higher up
Lower transmission costs: put filters on the nodes lower down
Candidate solutions: with an m-level hierarchy and n filters, there are $m^n$ possible placements
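To make the size of this search space concrete, the sketch below enumerates every filter-to-node assignment with `itertools.product`. The helper name `placements` is hypothetical; it only generates candidates and does not score them.

```python
from itertools import product


def placements(filters, m):
    """Yield every assignment {filter: node} of the filters to nodes 1..m.

    Each filter independently picks one of m nodes, so there are m**n
    assignments (the order of filters within one node is decided separately).
    """
    for nodes in product(range(1, m + 1), repeat=len(filters)):
        yield dict(zip(filters, nodes))


if __name__ == "__main__":
    filters = ["F1", "F2", "F3", "F4"]  # n = 4 filters
    m = 4                               # a 4-node chain N1..N4
    count = sum(1 for _ in placements(filters, m))
    print(count, m ** len(filters))     # 256 256
```

The count grows exponentially in n, which is why the slides look for a polynomial-time algorithm instead of enumeration.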
In this paper…
Key idea: model network links as filters
Content: define the problem; give a greedy algorithm and show that it fails to find the optimal placement; present a polynomial-time optimal algorithm; extend to multiway stream joins; …
Preliminaries: Consider a linear chain of nodes N1, N2, …, Nm
Notation: S = the data stream acquired by node N1
F = { F1, F2, …, Fn }, the set of filters
Query: the tuples of S that satisfy every filter in F
Cost Model: Three quantities
Selectivity of filter F, s(F): the fraction of the tuples in stream S that are expected to satisfy F
Cost of filter F, c(F, i): the per-tuple cost of executing F on node Ni, where $c(F, i+1) = \gamma_i \, c(F, i)$ with $\gamma_i \le 1$ (if $\gamma_i > 1$)
Cost of network transmission, l_i: the per-tuple cost of transmitting from Ni to Ni+1
[Figure: r tuples enter filter F and s(F)·r tuples leave it]
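Cost Model: Notation
P(F) = i if filter F is executed on Ni
F_i = { F | P(F) = i }, the set of filters executed on Ni
F' = F'1, F'2, …, F'n' denotes an ordered sequence of filters, and c(F', i) = the per-tuple cost of executing F' at node Ni
r(F_i) = the filters of F_i in rank order, where the rank of a single filter is r(F) = c(F, i) / (1 − s(F))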
Ref. J. Hellerstein and M. Stonebraker. Predicate migration: Optimizing queries with expensive predicates. 1993.
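Rank ordering at a single node can be sketched as follows, assuming independent filters and the rank r(F) = c(F, i) / (1 − s(F)), which matches the values used in Example 3.3 (e.g. r(F1) = 200 / (1 − 1/2) = 400). The helper names are hypothetical; the check against every permutation shows, on this input, that rank order gives the cheapest sequence.

```python
from itertools import permutations


def rank(cost, sel):
    """r(F) = c(F, i) / (1 - s(F)); a filter that drops nothing (s = 1) ranks last."""
    return float("inf") if sel >= 1.0 else cost / (1.0 - sel)


def sequence_cost(seq, filters):
    """Expected per-tuple cost of running `seq` in order at one node.

    `filters` maps name -> (per-tuple cost at this node, selectivity); each filter
    only sees the tuples that passed the filters before it (independence assumed).
    """
    total, passing = 0.0, 1.0
    for f in seq:
        cost, sel = filters[f]
        total += passing * cost
        passing *= sel
    return total


def rank_order(filters):
    """Filter names sorted by increasing rank."""
    return sorted(filters, key=lambda f: rank(*filters[f]))


if __name__ == "__main__":
    # c(F, 1) and s(F) as read off Example 2.2 (our inference).
    at_n1 = {"F1": (200, 0.5), "F2": (400, 0.5), "F3": (1300, 0.5), "F4": (2500, 0.5)}
    order = rank_order(at_n1)
    print(order, [rank(*at_n1[f]) for f in order])
    # ['F1', 'F2', 'F3', 'F4'] [400.0, 800.0, 2600.0, 5000.0]
    best = min(permutations(at_n1), key=lambda p: sequence_cost(p, at_n1))
    print(sequence_cost(order, at_n1) == sequence_cost(best, at_n1))  # True
```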
Cost on a single node
Overall cost
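One way to write these two costs, assuming independent selectivities; this is our reconstruction, chosen so that it reproduces the Example 2.2 computation term by term. Here $F_i$ is the set of filters at $N_i$, $r(F_i)$ its rank-ordered sequence, and $l_m = 0$ because the last node sends nothing on:

```latex
% Reconstruction (assumption): chosen to reproduce Example 2.2 term by term.
\begin{align*}
c(F, i) &= \Big(\prod_{j=1}^{i-1} \gamma_j\Big)\, c(F, 1) \\
c(F', i) &= \sum_{j=1}^{n'} \Big(\prod_{k=1}^{j-1} s(F'_k)\Big)\, c(F'_j, i) \\
c(P) &= \sum_{i=1}^{m} \Big(\prod_{F \in F_1 \cup \cdots \cup F_{i-1}} s(F)\Big)
        \Big[\, c\big(r(F_i), i\big) + \Big(\prod_{F \in F_i} s(F)\Big)\, l_i \Big],
        \qquad l_m = 0
\end{align*}
```

For Example 2.2 the sets are $F_1 = \{F1, F2\}$, $F_2 = \emptyset$, $F_3 = \{F3\}$, $F_4 = \{F4\}$; the $i = 2$ term reduces to $s(F1)\,s(F2)\,l_2$, since an empty sequence costs nothing and an empty product is 1.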
Example 2.2
c(P) = c(F1, 1) +
s(F1) c(F2, 1) +
s(F1) s(F2) [ l1 + l2 + c(F3, 3) ] +
s(F1) s(F2) s(F3) [ l3 + c(F4, 4) ]
= 200 +
(½) 400 +
(½) (½) [ 700 + 500 + (1/5) (1/2) 1300 ] +
(½) (½) (½) [ 300 + (1/5) (1/2) (1/4) 2500 ]
= 200 + 200 + 332.5 + 45.3125 = 777.8125
Here s(F) = 1/2 for every filter F.
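The overall cost can be checked mechanically. The sketch hard-codes the example inputs as we infer them from the arithmetic above (c(F, 1) = 200, 400, 1300, 2500; s(F) = 1/2; γ = 1/5, 1/2, 1/4; l = 700, 500, 300); these values and the names `plan_cost` and `brute_force` are our assumptions, not the slides'. `plan_cost` reproduces the 777.8125 of Example 2.2, and `brute_force` scores all $m^n$ placements.

```python
from itertools import product
from math import prod

# Example inputs inferred from Examples 2.2 and 3.3 (assumption).
COST_N1 = {"F1": 200.0, "F2": 400.0, "F3": 1300.0, "F4": 2500.0}  # c(F, 1)
SEL = {"F1": 0.5, "F2": 0.5, "F3": 0.5, "F4": 0.5}                # s(F)
GAMMA = [1 / 5, 1 / 2, 1 / 4]   # gamma_i, so c(F, i+1) = gamma_i * c(F, i)
LINKS = [700.0, 500.0, 300.0]   # l_i: per-tuple cost of link N_i -> N_{i+1}


def node_cost(f, i):
    """c(F, i) = gamma_1 * ... * gamma_{i-1} * c(F, 1) for a 1-based node index i."""
    return COST_N1[f] * prod(GAMMA[: i - 1])


def plan_cost(placement):
    """c(P) for a placement {filter: node}; filters at a node run in rank order."""
    m = len(LINKS) + 1
    total, passing = 0.0, 1.0  # passing = fraction of tuples still alive
    for i in range(1, m + 1):
        here = sorted((f for f, n in placement.items() if n == i),
                      key=lambda f: node_cost(f, i) / (1 - SEL[f]))
        for f in here:
            total += passing * node_cost(f, i)
            passing *= SEL[f]
        if i < m:  # surviving tuples are shipped to the next node
            total += passing * LINKS[i - 1]
    return total


def brute_force():
    """Score all m**n placements and return (cost, placement) of the cheapest."""
    filters, m = list(COST_N1), len(LINKS) + 1
    candidates = (dict(zip(filters, nodes))
                  for nodes in product(range(1, m + 1), repeat=len(filters)))
    return min(((plan_cost(p), p) for p in candidates), key=lambda t: t[0])


if __name__ == "__main__":
    print(plan_cost({"F1": 1, "F2": 1, "F3": 3, "F4": 4}))  # Example 2.2: about 777.8125
    print(brute_force())
```

Exhaustive search is only viable for tiny inputs (256 placements here), but it is a handy reference for checking faster algorithms. On these inputs its cheapest plan (F1 and F2 at N1, F3 at N2, F4 at N4, cost 747.8125) coincides with Example 3.7.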
Greedy algorithm: Notation
c(P, i) = the part of the total cost c(P) incurred at Ni, including transmission from Ni to Ni+1
Network link from Ni to Ni+1: modeled as a filter Fl_i with s(Fl_i) = 0 and c(Fl_i, i) = l_i
Example 3.3
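At N1: r(F1) = 400, r(F2) = 800, r(F3) = 2600, r(F4) = 5000; r(F1) < r(Fl1) = 700 < r(F2), so only F1 is placed at N1
At N2: r(F2) = 160, r(F3) = 520, r(F4) = 1000; r(F2) < r(Fl2) = 500 < r(F3), so F2 is placed at N2
At N3: r(F3) = 260, r(F4) = 500; r(F3) < r(Fl3) = 300 < r(F4), so F3 is placed at N3
At N4: r(F4) = 125, and the remaining filter F4 is placed at N4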
c(P) = 200 + 350 + 40 + 125 + 32.5 + 37.5 + 7.8125 = 792.8125
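A sketch of the greedy rule as we read it from Example 3.3: at each node Ni the outgoing link is a filter Fl_i with s = 0 and cost l_i, so its rank is just l_i, and the remaining filters ranked below it at Ni are placed there in rank order. The inputs are the values inferred from Example 2.2, and all function names are hypothetical.

```python
from math import prod

# Example inputs inferred from Examples 2.2 and 3.3 (assumption).
COST_N1 = {"F1": 200.0, "F2": 400.0, "F3": 1300.0, "F4": 2500.0}  # c(F, 1)
SEL = {"F1": 0.5, "F2": 0.5, "F3": 0.5, "F4": 0.5}                # s(F)
GAMMA = [1 / 5, 1 / 2, 1 / 4]   # c(F, i+1) = gamma_i * c(F, i)
LINKS = [700.0, 500.0, 300.0]   # l_i


def node_cost(f, i):
    """c(F, i) for a 1-based node index i."""
    return COST_N1[f] * prod(GAMMA[: i - 1])


def rank(f, i):
    """r(F) at node N_i."""
    return node_cost(f, i) / (1 - SEL[f])


def greedy_placement():
    """At each N_i keep, in rank order, the filters ranked below the link Fl_i.

    Since s(Fl_i) = 0, anything ordered after the link adds nothing to c(P, i),
    so each step minimizes c(P, i) for its own node.
    """
    m = len(LINKS) + 1
    remaining, placement = set(COST_N1), {}
    for i in range(1, m + 1):
        link_rank = LINKS[i - 1] if i < m else float("inf")  # last node takes the rest
        for f in sorted(remaining, key=lambda f: rank(f, i)):
            if rank(f, i) >= link_rank:
                break
            placement[f] = i
        remaining -= {f for f, n in placement.items() if n == i}
    return placement


def plan_cost(placement):
    """c(P): per-node filter costs plus link costs, weighted by surviving fraction."""
    m = len(LINKS) + 1
    total, passing = 0.0, 1.0
    for i in range(1, m + 1):
        for f in sorted((f for f, n in placement.items() if n == i), key=lambda f: rank(f, i)):
            total += passing * node_cost(f, i)
            passing *= SEL[f]
        if i < m:
            total += passing * LINKS[i - 1]
    return total


if __name__ == "__main__":
    p = greedy_placement()
    print(p, plan_cost(p))  # F1..F4 at N1..N4, cost about 792.8125
```

Each step is locally optimal, but treating everything after the link as free ignores the (γ-discounted) cost that deferred filters still incur at later nodes, which is why the greedy misses the cheaper plan of Example 3.7.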
Example 3.7
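Model links as filters (costs measured at N1): link l_i becomes a filter Fl_i with cost l_i / (γ_1 ⋯ γ_{i−1}) and selectivity γ_i, so r(Fl1) = 700 / (1 − 1/5) = 875. The links from N2 to N4 form one combined filter Fl2,4 with
$r(F_{l_{2,4}}) = \dfrac{l_2 + l_3}{\gamma_1 \, (1 - \gamma_2 \gamma_3)} = \dfrac{800}{(1/5)(7/8)} \approx 4571.43$
r(F1) = 400, r(F2) = 800, r(F3) = 2600, r(F4) = 5000, r(Fl1) = 875, r(Fl2,4) = 4571.4
r(F1) < r(F2) < r(Fl1) < r(F3) < r(Fl2,4) < r(F4), so F1 and F2 run at N1, F3 at N2, and F4 at N4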
c(P) = 200 + 200 + 175 + 65 + 100 + 7.8125 = 747.8125
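A sketch of the links-as-filters procedure, under stated assumptions: the slides show only the resulting ranks, so the conversion rule (link l_i becomes a filter with cost l_i / (γ_1 ⋯ γ_{i−1}) and selectivity γ_i, in N1 cost units) and the merging of out-of-order links are our reconstruction, chosen because they reproduce r(Fl1) = 875, r(Fl2,4) ≈ 4571.4 and c(P) = 747.8125. Inputs are the values inferred from Example 2.2; names are hypothetical.

```python
# Example inputs inferred from Examples 2.2, 3.3 and 3.7 (assumption).
COST_N1 = {"F1": 200.0, "F2": 400.0, "F3": 1300.0, "F4": 2500.0}  # c(F, 1)
SEL = {"F1": 0.5, "F2": 0.5, "F3": 0.5, "F4": 0.5}                # s(F)
GAMMA = [1 / 5, 1 / 2, 1 / 4]   # c(F, i+1) = gamma_i * c(F, i)
LINKS = [700.0, 500.0, 300.0]   # l_i


def rank(cost, sel):
    return float("inf") if sel >= 1.0 else cost / (1.0 - sel)


def link_filters():
    """Links as filters in N1 cost units, merged so their ranks never decrease.

    Returns a list of (cost, selectivity, node reached after the group).
    Links must keep chain order, so an earlier group that ranks above the next
    one is merged with it: cost c1 + s1 * c2, selectivity s1 * s2.
    """
    groups, scale = [], 1.0
    for i, (l, g) in enumerate(zip(LINKS, GAMMA), start=1):
        groups.append((l / scale, g, i + 1))
        scale *= g
        while len(groups) >= 2 and rank(*groups[-2][:2]) > rank(*groups[-1][:2]):
            c2, s2, end = groups.pop()
            c1, s1, _ = groups.pop()
            groups.append((c1 + s1 * c2, s1 * s2, end))
    return groups


def optimal_placement():
    """Sort filters and link groups together by rank; a filter runs at the node
    reached by the last link group before it. Returns (placement, c(P))."""
    items = [(rank(COST_N1[f], SEL[f]), 0, f, COST_N1[f], SEL[f]) for f in COST_N1]
    items += [(rank(c, s), 1, end, c, s) for c, s, end in link_filters()]
    items.sort()  # ties keep links in chain order (compared by node index)
    node, total, passing, placement = 1, 0.0, 1.0, {}
    for _, kind, key, cost, sel in items:
        total += passing * cost  # cost in N1 units already includes the gammas
        passing *= sel
        if kind == 1:
            node = key           # crossed the links up to node `key`
        else:
            placement[key] = node
    return placement, total


if __name__ == "__main__":
    print([round(rank(c, s), 2) for c, s, _ in link_filters()])  # [875.0, 4571.43]
    print(optimal_placement())  # F1, F2 at N1; F3 at N2; F4 at N4; about 747.8125
```

Merging adjacent out-of-order items is the classical technique for ordering under a chain of precedence constraints: if an earlier link ranks above the next one, some optimal order places them back to back, so the pair behaves as a single filter.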
Correlated filters: Definition
Conditional selectivity s(F|Q) = the fraction of tuples that satisfy F, given that they satisfy all the filters in Q
Reference: optimal ordering of correlated filters at a single node is NP-hard; an approximation algorithm is guaranteed to find an ordering whose cost is at most 4 times the optimal cost
An approximation ratio of 4 is the best possible unless P = NP
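The slide names the factor-4 guarantee but not the algorithm. A sketch of a greedy of the kind commonly analyzed for this problem: estimate conditional selectivities on a sample of tuples and repeatedly append the filter with the smallest c(F) / (1 − s(F | Q)), where Q is the set of filters already chosen. We assume, without the slide confirming it, that this is the algorithm behind the bound; the code only evaluates costs on the sample and proves nothing. Names and data are hypothetical.

```python
from itertools import permutations


def sample_cost(order, costs, sample):
    """Average per-tuple cost of running filters in `order` over sample tuples.

    Each sample entry maps filter name -> whether the tuple passes it;
    evaluation short-circuits at the first filter the tuple fails.
    """
    total = 0.0
    for outcome in sample:
        for f in order:
            total += costs[f]
            if not outcome[f]:
                break
    return total / len(sample)


def _score(f, costs, survivors):
    """c(F) / (1 - s(F | chosen filters)), estimated on the surviving tuples."""
    if not survivors:
        return costs[f]  # nothing left to drop; any order costs the same
    dropped = sum(1 for t in survivors if not t[f]) / len(survivors)
    return costs[f] / dropped if dropped > 0 else float("inf")


def greedy_order(costs, sample):
    """Repeatedly append the filter with the lowest cost per unit of conditional drop rate."""
    remaining, survivors, order = sorted(costs), list(sample), []
    while remaining:
        best = min(remaining, key=lambda f: _score(f, costs, survivors))
        order.append(best)
        remaining.remove(best)
        survivors = [t for t in survivors if t[best]]
    return order


if __name__ == "__main__":
    # Hypothetical sample: A and B are perfectly correlated, C is independent of them.
    costs = {"A": 1.0, "B": 1.0, "C": 2.0}
    sample = [
        {"A": False, "B": False, "C": False},
        {"A": False, "B": False, "C": True},
        {"A": True, "B": True, "C": False},
        {"A": True, "B": True, "C": True},
    ]
    order = greedy_order(costs, sample)
    best = min(permutations(costs), key=lambda p: sample_cost(p, costs, sample))
    print(order, sample_cost(order, costs, sample), sample_cost(best, costs, sample))
    # ['A', 'C', 'B'] 2.25 2.25
```

Because A and B are correlated here, B drops nothing once A has run, so the greedy moves C ahead of B: the order A, C, B costs 2.25 per tuple, versus 2.5 for the order A, B, C that rank ordering picks under an independence assumption (all marginal selectivities are 1/2).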
Correlated filters: Definition
Short-circuiting
Optimal solution
Tree hierarchy: [figure]
Each of the queries operates on different data. There is no sharing of computation or transmission among them, so each query can be optimized separately as a linear chain.
Joins: Problem
k different data streams are acquired by N1
Solution (Reference): sliding-window join using the MJoin operator at a single node; a join tree is left as future work
Query
W1 and W2 represent the lengths of the windows (time-based or tuple-based) on streams S1 and S2.
Joins
Join operator: [figure: illustration]
Selectivity s(⋈) = the fraction of the cross product that occurs in the join result
Cost: [figure: input streams with rates r1 and r2; the join outputs s(⋈) r1 r2]
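Join selectivity can be estimated straight from its definition on sample data. The sketch counts matching pairs in the cross product of two hypothetical samples; the output-rate helper simply evaluates s(⋈)·r1·r2, the expression labeling the join output in the cost figure. Both function names are our own.

```python
def join_selectivity(left, right, predicate):
    """Fraction of the cross product left x right whose pairs satisfy `predicate`."""
    if not left or not right:
        return 0.0
    matches = sum(1 for a in left for b in right if predicate(a, b))
    return matches / (len(left) * len(right))


def join_output_rate(sel, r1, r2):
    """Output rate s(join) * r1 * r2, as in the slide's cost figure."""
    return sel * r1 * r2


if __name__ == "__main__":
    s = join_selectivity([1, 2, 3], [2, 3, 4, 5], lambda a, b: a == b)
    print(s)  # 2 matching pairs out of 12
    print(join_output_rate(s, 10, 20))
```

Counting the full cross product is quadratic in the sample sizes, which is acceptable for estimating a statistic from small samples but not for evaluating the join itself.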
Joins
Notation: F_i = the filters on S_i that can be applied either before or after the join, |F_i| = n_i
F12 = the filters that can be applied only after the join
Extensions: Constrained nodes
Per-filter cost scaling: the ratio c(F, i+1) / c(F, i) may differ between filters. Modeling network links as filters then no longer applies, and the problem becomes NP-hard.
Conclusion: [Figure: Environment]
Operator placement problem: tradeoff
Lower computational costs: put filters on the nodes higher up
Lower transmission costs: put filters on the nodes lower down
Provided a greedy algorithm and an optimal algorithm, plus extensions
Theorem 3.2
F1 in P is chosen according to the theorem.
∵ Lemma 3.1 and s(Fl1) = 0, ∴ ∃ F'1 in P' s.t. c(P', 1) < c(P, 1)
∵ Theorem 2.1, ∴ c(P, 1) ≤ c(P', 1) → contradiction
Theorem 3.5
F1 in P is chosen according to the theorem.
∵ Lemma 3.4, ∴ ∃ P' s.t. c(P', 1) < c(P, 1)
∵ Theorem 2.1, ∴ c(P, 1) ≤ c(P', 1) → contradiction