Operator Placement for In-Network Stream Query Processing.

Date posted: 20-Dec-2015


Outline

  • Introduction
  • Preliminaries
  • Filter placement
  • Extensions
  • Conclusions

Introduction: In-network query processing

Consider a video surveillance application.

  • Environment
  • Target: suspicious activity (dark, movement)
  • Need:
      filter for calculating intensity (F1)
      filter for detecting sufficient motion (F2)

Introduction: Previous work

  • Push down all filters, since CPU cost << communication cost.
  • What if the queries involve expensive predicates?

Objective: place each filter at the "best" node

  • based on selectivity and cost
  • minimize the overall cost

Introduction: Operator placement problem

Tradeoff

  • Lower computational cost: put filters on the nodes higher up
  • Lower transmission cost: put filters on the nodes lower down

Candidates

  • m-level hierarchy, n filters
  • m^n possible solutions
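The size of the search space can be checked with a short sketch. It assumes each of the n filters is assigned independently to one of the m nodes; the function name `all_placements` is illustrative and not from the slides.

```python
from itertools import product

def all_placements(m: int, n: int):
    """Yield every assignment of n filters to nodes 1..m as a tuple of node indices."""
    return product(range(1, m + 1), repeat=n)

# A 4-level hierarchy with 4 filters, the size used in the later examples.
placements = list(all_placements(m=4, n=4))
print(len(placements))  # 256, i.e. 4 ** 4
```

Exhaustive search is feasible at this size but grows exponentially in the number of filters, which is the motivation for the polynomial-time algorithm.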

In this paper…

Key idea

  • Model network links as filters

Content

  • define the problem
  • provide a greedy algorithm that fails
  • present a polynomial-time optimal algorithm
  • extend to multiway stream joins…

Preliminaries

Consider a linear chain of nodes.

Notation

  • S = data acquired by node N1
  • F = { F1, F2, …, Fn }

Query

Cost Model: Three quantities

  • Selectivity of filter F: s(F)
      fraction of the tuples in stream S that are expected to satisfy F
      (a filter with input rate r has output rate r·s(F))
  • Cost of filter F: c(F, i)
      per-tuple cost of execution on node Ni
      c(F, i+1) = γi c(F, i), with γi ≤ 1 (if γi > 1 …)
  • Cost of network transmission: li
      per-tuple cost of transmitting from Ni to Ni+1
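The cost scaling between nodes can be sketched as follows. The per-link scaling factor is called `gamma` here, and the values 1/5, 1/2, 1/4 are the ones that appear as multipliers in Example 2.2; the function names are illustrative assumptions.

```python
from fractions import Fraction
from math import prod

# Assumed scaling factors between consecutive nodes (N1->N2, N2->N3, N3->N4),
# read off the multipliers used in Example 2.2.
GAMMA = [Fraction(1, 5), Fraction(1, 2), Fraction(1, 4)]

def cost_at_node(cost_at_n1, i, gamma=GAMMA):
    """Per-tuple cost of a filter on node N_i, given its per-tuple cost on N_1."""
    return cost_at_n1 * prod(gamma[: i - 1])

def output_rate(input_rate, selectivity):
    """Rate of tuples leaving a filter that sees input_rate tuples per unit time."""
    return input_rate * selectivity

print(cost_at_node(1300, 3))  # 130
print(cost_at_node(2500, 4))  # 125/2
```

Exact fractions are used so that the results match the hand calculations without rounding.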

Cost Model: Notation

  • P(F) = i if filter F is executed on Ni
  • Fi = { F | P(F) = i }
  • F’ = F’1, F’2, …, F’n’ (a sequence of filters)
  • c(F’, i) = the cost per tuple of executing F’ at node Ni
  • r(Fi) = Fi in rank order

Ref.: J. Hellerstein and M. Stonebraker. Predicate migration: Optimizing queries with expensive predicates. 1993.

Cost on a single node

Overall cost
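The two formulas for this slide are missing from the transcript. The following is a reconstruction, not the slide's own wording: it is the form that reproduces the arithmetic of Example 2.2 term by term, with the convention l_m = 0 (no link after the last node) added as an assumption.

```latex
% Cost of running the sequence F' = F'_1, ..., F'_{n'} on node N_i
c(F', i) = \sum_{j=1}^{n'} \Bigl( \prod_{k=1}^{j-1} s(F'_k) \Bigr)\, c(F'_j, i)

% Overall cost of plan P on a chain of m nodes; surviving tuples cross every link
c(P) = \sum_{i=1}^{m} \Bigl( \prod_{j=1}^{i-1} \prod_{F \in F_j} s(F) \Bigr)
       \Bigl[ c\bigl(r(F_i), i\bigr) + \Bigl( \prod_{F \in F_i} s(F) \Bigr)\, l_i \Bigr],
\qquad l_m = 0
```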

Example 2.2

s(F) = 1/2 for every filter F

c(P) = c(F1, 1)
     + s(F1) c(F2, 1)
     + s(F1) s(F2) [ l1 + l2 + c(F3, 3) ]
     + s(F1) s(F2) s(F3) [ l3 + c(F4, 4) ]

     = 200
     + (½) 400
     + (½)(½) [ 700 + 500 + (1/5)(1/2) 1300 ]
     + (½)(½)(½) [ 300 + (1/5)(1/2)(1/4) 2500 ]

     = 200 + 200 + 332.5 + 45.3125
     = 777.8125
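The arithmetic of Example 2.2 can be reproduced with a small cost evaluator. This is a sketch under stated assumptions: the numbers are read off the example, filters on the same node are run in increasing order of c/(1 − s) (the rank values quoted in Example 3.3 fit this formula), and all names (`plan_cost`, `SEL`, `COST_N1`, `GAMMA`, `LINK`) are illustrative.

```python
from fractions import Fraction
from math import prod

SEL = {f: Fraction(1, 2) for f in ("F1", "F2", "F3", "F4")}   # s(F)
COST_N1 = {"F1": 200, "F2": 400, "F3": 1300, "F4": 2500}      # c(F, 1)
GAMMA = [Fraction(1, 5), Fraction(1, 2), Fraction(1, 4)]       # cost scaling per hop
LINK = [700, 500, 300]                                         # l1, l2, l3

def plan_cost(placement):
    """Expected per-tuple cost of a plan; placement maps filter name -> node index."""
    total = Fraction(0)
    passing = Fraction(1)          # fraction of tuples that survived so far
    m = len(LINK) + 1
    for i in range(1, m + 1):
        scale = prod(GAMMA[: i - 1])
        here = [f for f, node in placement.items() if node == i]
        # Assumed rank order on a node: increasing c / (1 - s).
        here.sort(key=lambda f: COST_N1[f] * scale / (1 - SEL[f]))
        for f in here:
            total += passing * COST_N1[f] * scale
            passing *= SEL[f]
        if i < m:                  # surviving tuples are sent to the next node
            total += passing * LINK[i - 1]
    return total

example_2_2 = {"F1": 1, "F2": 1, "F3": 3, "F4": 4}
print(float(plan_cost(example_2_2)))  # 777.8125
```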

Filter Placement

1. Greedy algorithm

2. Optimal algorithm

Greedy algorithm: Notation

  • c(P, i) = part of the total cost c(P) incurred at Ni, including transmission from Ni to Ni+1
  • Network link Ni to Ni+1, modeled as a filter Fli: s(Fli) = 0, c(Fli, i) = li

Example 3.3

At N1: r(F1) = 400, r(F2) = 800, r(F3) = 2600, r(F4) = 5000, r(Fl1) = 700 > r(F1), so N1 runs F1 only

At N2: r(F2) = 160, r(F3) = 520, r(F4) = 1000, r(Fl2) = 500 > r(F2), so N2 runs F2 only

At N3: r(F3) = 260, r(F4) = 500, r(Fl3) = 300 > r(F3), so N3 runs F3 only

At N4: r(F4), so N4 runs F4

c(P) = 200 + 350 + 40 + 125 + 32.5 + 37.5 + 7.8125 = 792.8125
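The node-by-node decisions of Example 3.3 can be sketched in code. Assumptions: rank is c/(1 − s), the link out of the current node is treated as a filter with selectivity 0 and cost li (so its rank is li), a node runs every remaining filter whose rank does not exceed the link's rank, and ties go to the current node. The function name `greedy_placement` and the data names are illustrative.

```python
from fractions import Fraction

SEL = {f: Fraction(1, 2) for f in ("F1", "F2", "F3", "F4")}   # s(F)
COST_N1 = {"F1": 200, "F2": 400, "F3": 1300, "F4": 2500}      # c(F, 1)
GAMMA = [Fraction(1, 5), Fraction(1, 2), Fraction(1, 4)]       # cost scaling per hop
LINK = [700, 500, 300]                                         # l1, l2, l3

def rank(cost, sel):
    return cost / (1 - sel)

def greedy_placement():
    """Return (placement, total cost) of the node-by-node greedy plan."""
    cost_here = {f: Fraction(c) for f, c in COST_N1.items()}   # costs on current node
    placement, total, passing = {}, Fraction(0), Fraction(1)
    m = len(LINK) + 1
    for i in range(1, m + 1):
        by_rank = sorted(cost_here, key=lambda f: rank(cost_here[f], SEL[f]))
        for f in by_rank:
            # Stop once transmitting ranks lower than the next filter;
            # the last node has no outgoing link and runs everything left.
            if i < m and rank(cost_here[f], SEL[f]) > LINK[i - 1]:
                break
            placement[f] = i
            total += passing * cost_here[f]
            passing *= SEL[f]
            del cost_here[f]
        if i < m:
            total += passing * LINK[i - 1]
            cost_here = {f: c * GAMMA[i - 1] for f, c in cost_here.items()}
    return placement, total

placement, total = greedy_placement()
print(placement)      # {'F1': 1, 'F2': 2, 'F3': 3, 'F4': 4}
print(float(total))   # 792.8125
```

The result is more expensive than the plan of Example 2.2 (777.8125), which illustrates why the purely local decision is not sufficient here.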

Optimal algorithm: Notation

  • Network link Ni to Ni+1, modeled as a filter Fli: s(Fli) = γi, c(Fli, 1) = li / (γ1 γ2 … γi-1)

Optimal algorithm Short-circuiting

Rank

Cost scaleup
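The formulas for these items are missing from the transcript, so the following is a sketch under assumptions rather than the slide's definition. It takes rank as c/(1 − s), models each link as a filter whose selectivity is the cost scaling factor of that hop and whose cost is li expressed in N1 units, and merges consecutive link filters whenever an earlier one has the higher rank. These assumptions reproduce the ranks 875 and 4571.4 quoted in Example 3.7. All names are illustrative.

```python
from fractions import Fraction

GAMMA = [Fraction(1, 5), Fraction(1, 2), Fraction(1, 4)]   # cost scaling per hop
LINK = [700, 500, 300]                                      # l1, l2, l3

def rank(cost, sel):
    return cost / (1 - sel)

def link_filters():
    """One (from_node, to_node, cost in N1 units, selectivity) tuple per link."""
    out, scale = [], Fraction(1)
    for i, (l, g) in enumerate(zip(LINK, GAMMA), start=1):
        out.append((i, i + 1, l / scale, g))
        scale *= g
    return out

def short_circuit(links):
    """Merge adjacent link filters while a predecessor outranks its successor."""
    merged = []
    for link in links:
        merged.append(link)
        while len(merged) > 1 and rank(*merged[-2][2:]) > rank(*merged[-1][2:]):
            a_from, _, a_cost, a_sel = merged[-2]
            _, b_to, b_cost, b_sel = merged[-1]
            # Composite filter: run a, then b on the fraction a lets through.
            merged[-2:] = [(a_from, b_to, a_cost + a_sel * b_cost, a_sel * b_sel)]
    return merged

for frm, to, cost, sel in short_circuit(link_filters()):
    print(f"link N{frm}->N{to}: rank {float(rank(cost, sel)):.1f}")
```

With the example data the links N2 to N3 and N3 to N4 merge into one composite, so no filter is placed on N3.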

Optimal algorithm: Example 3.7

Model links as filters, where Fl2,4 combines the links from N2 to N4 (N3 is short-circuited):

r(F1) = 400, r(F2) = 800, r(F3) = 2600, r(F4) = 5000, r(Fl1) = 875, r(Fl2,4) = 4571.42857142857 ≈ 4571.4

r(F1) < r(F2) < r(Fl1) < r(F3) < r(Fl2,4) < r(F4)

c(P) = 200 + 200 + 175 + 65 + 100 + 7.8125 = 747.8125
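Once links are treated as filters, the whole plan is a single rank-ordered pipeline, and its cost can be evaluated like that of any filter sequence. The sketch below assumes rank is c/(1 − s), that the cost and selectivity entries for the two link filters are the ones consistent with the ranks quoted above (875 and 4571.4), and that all costs are expressed in N1 units. The names are illustrative.

```python
from fractions import Fraction

# (name, cost in N1 units, selectivity)
ITEMS = [
    ("F1", Fraction(200), Fraction(1, 2)),
    ("F2", Fraction(400), Fraction(1, 2)),
    ("F3", Fraction(1300), Fraction(1, 2)),
    ("F4", Fraction(2500), Fraction(1, 2)),
    ("Fl1", Fraction(700), Fraction(1, 5)),      # link N1 -> N2 (assumed modeling)
    ("Fl2,4", Fraction(4000), Fraction(1, 8)),   # links N2 -> N4, N3 skipped (assumed)
]

def rank(cost, sel):
    return cost / (1 - sel)

def ordered_plan(items):
    """Sort filters and link filters together by increasing rank."""
    return sorted(items, key=lambda it: rank(it[1], it[2]))

def pipeline_cost(sequence):
    """Expected per-tuple cost of running the sequence in the given order."""
    total, weight = Fraction(0), Fraction(1)
    for _, cost, sel in sequence:
        total += weight * cost
        weight *= sel
    return total

plan = ordered_plan(ITEMS)
print([name for name, _, _ in plan])  # ['F1', 'F2', 'Fl1', 'F3', 'Fl2,4', 'F4']
print(float(pipeline_cost(plan)))     # 747.8125
```

Reading the order back as a placement: filters before Fl1 run on N1, those between Fl1 and Fl2,4 run on N2, and those after Fl2,4 run on N4.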

Extensions

  • Correlated filters
  • Tree hierarchies
  • Joins
  • Other extensions

Correlated filters: Definition

  • Conditional selectivity s(F|Q) = the fraction of tuples that satisfy F given that they satisfy all the filters in Q

Reference: optimal ordering of correlated filters at a single node

  • The problem is NP-hard
  • An approximation algorithm is guaranteed to find a plan whose cost is at most 4 times the optimal cost
  • An approximation ratio of 4 is the best possible unless P = NP
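To make the definition of conditional selectivity concrete, here is a minimal sketch that estimates s(F|Q) from a sample of tuples. The sample data, the two predicates, and the function name are hypothetical and chosen only to echo the surveillance example.

```python
from fractions import Fraction

def conditional_selectivity(tuples, f, q):
    """Estimate s(F|Q): share of tuples satisfying f among those satisfying all of q."""
    survivors = [t for t in tuples if all(g(t) for g in q)]
    if not survivors:
        raise ValueError("no sample tuple satisfies every filter in Q")
    return Fraction(sum(1 for t in survivors if f(t)), len(survivors))

# Hypothetical sample of (intensity, motion) readings.
sample = [(10, 5), (20, 1), (80, 7), (15, 9), (90, 2), (12, 8)]
is_dark = lambda t: t[0] < 30
has_motion = lambda t: t[1] > 4

print(conditional_selectivity(sample, has_motion, []))         # 2/3
print(conditional_selectivity(sample, has_motion, [is_dark]))  # 3/4
```

In this sample the two filters are correlated: motion is more likely among dark frames than overall, so the product of the unconditional selectivities would misestimate the joint selectivity.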

Correlated filters Definition

Short-circuiting

Optimal solution

Tree hierarchy

Each of the queries operates on different data. There is no sharing of computation or transmission among them.

Joins: Problem

  • k different data streams acquired by N1

Solution

  • Reference: sliding-window join, MJoin operator
  • The join is executed at a single node; a join tree is left as future work

Query

W1 and W2 represent the lengths of the windows (time-based or tuple-based) on streams S1 and S2.

Joins: Join operator

Illustration: input streams with rates r1 and r2, output rate s(⋈) r1 r2

Selectivity: s(⋈) = the fraction of the cross product that occurs in the join result

Cost
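The definition of join selectivity can be illustrated with a direct computation over two small inputs. The data, the equality predicate, and the function name are hypothetical; windowing is ignored in this sketch.

```python
from fractions import Fraction
from itertools import product

def join_selectivity(left, right, predicate):
    """Fraction of the cross product left x right that satisfies the join predicate."""
    pairs = list(product(left, right))
    if not pairs:
        raise ValueError("both inputs must be non-empty")
    matches = sum(1 for a, b in pairs if predicate(a, b))
    return Fraction(matches, len(pairs))

left = [1, 2, 3, 4]
right = [2, 4, 6]
sel = join_selectivity(left, right, lambda a, b: a == b)
print(sel)                            # 1/6
print(sel * len(left) * len(right))   # 2, the number of joined pairs
```

The second line mirrors the output rate shown in the illustration: selectivity times the product of the two input sizes.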

Joins: Notation

  • Fi = filters that can be applied on Si either before or after the join; | Fi | = ni
  • F12 = filters that can be applied only after the join

Joins

Time complexity : O(n2n1m(n+m)log(n+m))

Extensions

  • Constrained nodes
  • Per-filter cost scaling: c(F, i+1) / c(F, i) may be different for different F. Modeling network links as filters no longer applies, and the problem becomes NP-hard.

Conclusion

  • Environment
  • Operator placement problem
      Tradeoff
        Lower computational cost: put filters on the nodes higher up
        Lower transmission cost: put filters on the nodes lower down
  • Provided: greedy algorithm and optimal algorithm
  • Extensions

Lemma 3.1

by (2)

F1 in P is chosen according to the theorem.

∵ Lemma 3.1 and s(Fl1) = 0 ∴ ∃ F’1 in P’ s.t. c( P’, 1 ) < c( P, 1 )

∵ Theorem 2.1 ∴ c( P, 1 ) ≦ c( P’, 1 ) → contradiction

Theorem 3.2

Lemma 3.4

[Equation involving the link costs li, lj and the filter cost c(F, 1)]

F1 in P is chosen according to the theorem.

∵ Lemma 3.4 ∴ ∃ P’ s.t. c( P’, 1 ) < c( P, 1 )

∵ Theorem 2.1 ∴ c( P, 1 ) ≦ c( P’, 1 ) → contradiction

Theorem 3.5

Suppose and the best

Moving the filters on node Ni to Ni-1

Moving the filters on node Ni to Ni+1

∵ P is best plan ∴ c( P) < c( P’) , c( P) < c( P”)

implies → contradiction

Lemma 3.6