Subscription Subsumption Evaluation for Content-Based Publish/Subscribe Systems Hojjat Jafarpour,...

Post on 15-Jan-2016


Subscription Subsumption Evaluation for Content-Based Publish/Subscribe Systems

Hojjat Jafarpour, Bijit Hore, Sharad Mehrotra, and Nalini Venkatasubramanian

2

Outline

• Problem definition and formulation
• Related work
• Exact subscription subsumption checking
• Approximate subscription subsumption checking
• Experimental evaluation
• Conclusions

3

Event-based pub/sub systems

[Figure: publish/subscribe systems; labels: Publish/Subscribe Service, Event]

4

Types of pub/sub systems

• Topic-based vs. content-based
• Centralized vs. distributed

5

Information dissemination in pub/sub systems

• Publication/subscription routing in distributed pub/sub

[Figure: broker network; labels: Subscriber 1, Subscriber 2, Publisher]

6

Reducing dissemination traffic

Goal: Preventing dissemination of redundant subscriptions

[Figure: broker network; labels: Subscriber 1, Subscriber 2, Subscriber 3, Publisher]

Preventing redundant subscription dissemination
• Reduces subscription forwarding traffic
• Reduces subscription table size in the broker
• Speeds up publication matching

7

Detection of redundant subscriptions: Covering and Subsumption

• Subscription covering is a pair-wise relationship between subscriptions
  • Subscription s2 covers subscription s1 iff all publications matching s1 also match s2
• Subscription subsumption is a generalization of covering
  • Subscription s is subsumed by subscription set T = {s1, s2, ..., sn} iff every publication matching s also matches at least one subscription in T

[Figure: two panels showing subscriptions s1 and s2, and s1, s2 and s3; s3 is subsumed by s1 ∪ s2 but not covered by either of them]

8

Problem formulation

Content space: d-dimensional space where each dimension represents a numeric attribute

Subscriptions are d-dimensional rectangles

Publications are d-dimensional points

Given a set of d-dimensional rectangles T = {s1, s2, .., sn}, is a new rectangle s contained in the disjunction (union) of rectangles in T ?
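The formulation above can be made concrete with a minimal sketch. It assumes closed intervals per attribute and represents a rectangle as a list of (low, high) pairs; the names `matches` and `covers` are hypothetical and do not come from the slides.

```python
from typing import List, Sequence, Tuple

# Assumed representation: one closed (low, high) interval per attribute.
Rect = List[Tuple[float, float]]

def matches(sub: Rect, pub: Sequence[float]) -> bool:
    """A publication (point) matches a subscription if it lies inside it."""
    return all(lo <= x <= hi for (lo, hi), x in zip(sub, pub))

def covers(outer: Rect, inner: Rect) -> bool:
    """Pair-wise covering: every point of `inner` also lies in `outer`."""
    return all(o_lo <= i_lo and i_hi <= o_hi
               for (o_lo, o_hi), (i_lo, i_hi) in zip(outer, inner))

# Two overlapping subscriptions and a third that straddles both.
s1 = [(0, 6), (0, 4)]
s2 = [(4, 10), (0, 4)]
s3 = [(2, 8), (1, 3)]

# s3 is covered by neither s1 nor s2 alone, although every point of s3
# lies in s1 or s2, which is the case pair-wise covering cannot detect.
covered_by_one = covers(s1, s3) or covers(s2, s3)
```

Pair-wise covering is a per-dimension interval comparison, so it costs O(d) per pair; the subsumption question asks about the union of all rectangles in T instead.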


10

Related work

• Pair-wise covering
  • For a new subscription s, check whether any previous subscription covers it
  • If not, forward the subscription to all other brokers in the network
• Probabilistic subsumption checking
  • For a new subscription s, randomly select d points in s
  • If all of these points are covered by previous subscriptions, assume s is subsumed
  • Complexity O(k·m·d), where k = number of subscriptions, m = number of dimensions and d = number of test points
  • False negatives may be generated, i.e., subscriptions that are not subsumed may be falsely assumed to be subsumed
  • May result in incorrect content routing
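The probabilistic check described above can be sketched as follows. This is an illustration under assumptions (uniform sampling, closed intervals, the hypothetical name `probably_subsumed`), not the cited authors' implementation.

```python
import random

def matches(sub, pub):
    """Point-in-rectangle test over closed (low, high) intervals."""
    return all(lo <= x <= hi for (lo, hi), x in zip(sub, pub))

def probably_subsumed(s, existing, num_points, rng=random):
    """Monte Carlo test: sample points in s and look for an uncovered one.

    Cost is num_points * len(existing) * dimensions, matching O(k*m*d).
    A False answer is always correct (an uncovered witness point was found).
    A True answer may be wrong if every sample happened to miss a gap.
    """
    for _ in range(num_points):
        point = [rng.uniform(lo, hi) for lo, hi in s]
        if not any(matches(t, point) for t in existing):
            return False
    return True

existing = [[(0, 6), (0, 4)], [(4, 10), (0, 4)]]
rng = random.Random(0)  # fixed seed only to make the sketch repeatable
inside = probably_subsumed([(2, 8), (1, 3)], existing, 20, rng)
outside = probably_subsumed([(20, 30), (20, 30)], existing, 5, rng)
```

The asymmetry between the two answers is the source of the routing errors mentioned above: a subscription wrongly reported as subsumed is never forwarded.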

[Figure: subscriptions s1, s2 and s3]


12

Exact Subscription Subsumption Checking – Key Observation

Checking whether a new subscription is covered by the union of previous subscriptions ≡ checking whether the new subscription intersects the uncovered region.

We partition the content space into positive and negative spaces
• The positive space is the part of the space that is covered by at least one existing subscription
• The negative space is the part of the space that is not covered by any existing subscription
• Both can be represented by a set of non-overlapping rectangles

Subscription s is subsumed iff it does not intersect the negative space

13

Representation of Negative Space & Subsumption Evaluation

We represent the negative space as a set of non-overlapping d-dimensional rectangles

If a new subscription intersects with any of these rectangles, it is not subsumed
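The intersection test above can be sketched in a few lines. The sketch assumes that rectangles which only touch along a boundary do not count as intersecting, and it scans the negative rectangles linearly; the names `intersects` and `is_subsumed` are hypothetical.

```python
def intersects(a, b):
    """True if rectangles a and b overlap with positive extent in every dimension."""
    return all(max(a_lo, b_lo) < min(a_hi, b_hi)
               for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b))

def is_subsumed(s, negative_rects):
    """s is subsumed iff it misses every rectangle of the negative space."""
    return not any(intersects(s, r) for r in negative_rects)

# Content space [0, 10] x [0, 10] with one existing subscription [0, 6] x [0, 10]:
# the uncovered region is the single rectangle [6, 10] x [0, 10].
negative = [[(6, 10), (0, 10)]]

inside_positive = is_subsumed([(1, 5), (2, 3)], negative)   # misses the negative space
crosses_boundary = is_subsumed([(5, 7), (2, 3)], negative)  # reaches into it
```

A spatial index over the negative rectangles would replace the linear scan in `is_subsumed` without changing the result.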

[Figure: negative space partitioned into non-overlapping rectangles r1 to r8]

14

Data structures & Complexity

The algorithm is exact: it always determines correctly whether a new subscription is subsumed

For efficient subsumption checking, the set of negative rectangles is indexed using an R-tree or KD-tree for fast retrieval

For n subscriptions in d-dimensional space, the algorithm generates O(n^d) negative rectangles

For a high-dimensional content space, the number of negative rectangles can grow quickly

To control the growth of the number of negative rectangles, we propose an approximate subsumption checking algorithm


16

Approximation algorithm

[Figure: negative rectangles r1 to r9 before and after adding a subscription; in the example k = 3]

• On adding a new subscription, restrict the number of new negative rectangles added to at most k
• At most O(k·n) negative rectangles after n active subscriptions
• Leads to no false negatives; may generate some false positives (correctness is not compromised)

17

Top-k rectangle selection criteria

Top-k selection

• We propose a benefit/cost model for selecting these rectangles
• Benefit of partitioning a negative rectangle with respect to a subscription: the volume of the intersecting region
• Cost: the number of new negative rectangles created
• We choose the top-k negative rectangles with the highest benefit-to-cost ratio for splitting and add the resulting pieces to the representation of the negative space
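The benefit/cost ranking above might be sketched as follows. Two points are assumptions rather than statements from the slides: the cost is counted as the number of sides on which the negative rectangle sticks out of the subscription (one piece per side in a simple slab decomposition), and a rectangle that is removed entirely (cost 0) is given an infinite ratio. All function names are hypothetical.

```python
import heapq
import math

def overlap_volume(r, s):
    """Benefit: volume of the region shared by negative rectangle r and subscription s."""
    volume = 1.0
    for (r_lo, r_hi), (s_lo, s_hi) in zip(r, s):
        side = min(r_hi, s_hi) - max(r_lo, s_lo)
        if side <= 0:
            return 0.0
        volume *= side
    return volume

def split_cost(r, s):
    """Cost: number of pieces left after cutting s out of r (assumed decomposition)."""
    return sum((r_lo < s_lo) + (s_hi < r_hi)
               for (r_lo, r_hi), (s_lo, s_hi) in zip(r, s))

def ratio(r, s):
    benefit = overlap_volume(r, s)
    if benefit == 0.0:
        return 0.0
    cost = split_cost(r, s)
    return math.inf if cost == 0 else benefit / cost

def top_k(negative_rects, s, k):
    """The k intersecting negative rectangles with the highest benefit/cost ratio."""
    hits = [r for r in negative_rects if overlap_volume(r, s) > 0]
    return heapq.nlargest(k, hits, key=lambda r: ratio(r, s))

negative = [[(0, 4), (0, 10)], [(4, 10), (0, 10)]]
s = [(2, 5), (2, 4)]
best = top_k(negative, s, 1)
```

In this example both rectangles would be split into three pieces, but the first shares twice as much volume with `s`, so it ranks first.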

18

Subscription Forwarding in Approximate Algorithm

• If the new subscription does not intersect any negative rectangle, it is covered
• Otherwise
  • Find all negative rectangles that intersect the subscription and sort them by benefit/cost
  • Select the first k negative rectangles and subtract the subscribed region from them
  • Update the representation of the negative space by replacing the k original rectangles with the new ones

(Algorithms for unsubscribing can be found in the paper)
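The forwarding steps above can be sketched end to end. This is an illustration that follows the "select the first k negative rectangles" wording; the slab-by-slab subtraction, the handling of a zero cost, and all function names are assumptions.

```python
import math

def overlap_volume(r, s):
    volume = 1.0
    for (r_lo, r_hi), (s_lo, s_hi) in zip(r, s):
        side = min(r_hi, s_hi) - max(r_lo, s_lo)
        if side <= 0:
            return 0.0
        volume *= side
    return volume

def subtract(r, s):
    """Return r minus s as non-overlapping rectangles (at most 2 per dimension)."""
    if overlap_volume(r, s) == 0.0:
        return [list(r)]
    pieces, core = [], list(r)
    for i, ((r_lo, r_hi), (s_lo, s_hi)) in enumerate(zip(r, s)):
        if r_lo < s_lo:                      # slab of r below s in dimension i
            pieces.append(core[:i] + [(r_lo, s_lo)] + core[i + 1:])
        if s_hi < r_hi:                      # slab of r above s in dimension i
            pieces.append(core[:i] + [(s_hi, r_hi)] + core[i + 1:])
        core[i] = (max(r_lo, s_lo), min(r_hi, s_hi))
    return pieces

def ratio(r, s):
    cost = len(subtract(r, s))
    return math.inf if cost == 0 else overlap_volume(r, s) / cost

def add_subscription_approx(negative, s, k):
    """Return (covered, updated negative space) after processing subscription s."""
    hits = [r for r in negative if overlap_volume(r, s) > 0]
    if not hits:
        return True, negative                # no intersection: s is covered
    hits.sort(key=lambda r: ratio(r, s), reverse=True)
    updated = [r for r in negative if overlap_volume(r, s) == 0]
    updated.extend(hits[k:])                 # left unsplit: negative space is overestimated
    for r in hits[:k]:
        updated.extend(subtract(r, s))
    return False, updated

negative = [[(0, 4), (0, 10)], [(4, 10), (0, 10)]]
covered, negative = add_subscription_approx(negative, [(2, 5), (2, 4)], 1)
```

Rectangles beyond the first k stay in the negative space unsplit, so a later subscription that falls in their already-subscribed part is still forwarded. The negative space is only ever overestimated, which is why a subscription reported as covered really is covered.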


20

Experimental evaluation

Simulation setup
• 10K subscriptions
• 2-, 3-, 4- and 5-dimensional space
• Each dimension in range [0, 1000]
• Zipfian distribution

21

Experimental evaluation

Measuring advantage of subsumption checking
• Subscription subsumption vs. covering
• More than 50% improvement in redundant subscription detection

[Figure: two panels, "Exact algorithm" and "Approximate algorithm (k = 50)"]

22

Storage overhead comparison (Exact vs Approximate)

Negative rectangle creation rate

23

Experimental evaluation

Effect of k in the approximate algorithm

Larger k value results in more reduction in redundant subscriptions

24

Experimental evaluation

Other Selection Metric Value Function

Considering both Benefit and Cost results in better subsumption checking

25

Conclusions

Efficient query subsumption checking can greatly improve the performance of pub/sub systems by reducing subscription routing traffic between brokers.

Maintaining the negative space as a set of disjoint rectangles leads to efficient subsumption checking by converting it to an intersection detection problem.

We proposed exact and approximate subsumption checking algorithms and compared their relative performance.

26

Thank You!

Questions?

27

Related work

• Ouksel et al. present a Monte Carlo type probabilistic algorithm for subsumption checking
  • For a new subscription s, randomly select d points in s
  • If all of these points are covered by previous subscriptions, assume s is subsumed
• Complexity is O(k·m·d), where k is the number of subscriptions, m is the number of dimensions and d is the number of test points
• False negatives: subscriptions that are not subsumed may be assumed to be subsumed
  • May result in incorrect content routing
• Our proposed approach prevents false negatives

[Figure: subscriptions s1, s2 and s3; the probabilistic check may mistakenly detect that s3 is subsumed]

28

Exact Subscription Subsumption Checking

Subsumption checking algorithm

Input:
  Set of negative rectangles R = {r1, r2, ..., rm}
  Subscription s

Find Rintersect, the set of negative rectangles that intersect s
If Rintersect = ∅, s is subsumed
Otherwise, for every ri ∈ Rintersect:
  R = R − {ri}
  Ri = ri − s, represented as a set of non-overlapping rectangles
  R = R ∪ Ri
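The pseudocode above translates into a short runnable sketch. How `ri − s` is decomposed into non-overlapping rectangles is not given on the slide; the slab-by-slab split below is one assumed choice, and the function names are hypothetical.

```python
def intersects(a, b):
    """Overlap with positive extent in every dimension."""
    return all(max(a_lo, b_lo) < min(a_hi, b_hi)
               for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b))

def subtract(r, s):
    """Return r minus s as a list of non-overlapping rectangles."""
    if not intersects(r, s):
        return [list(r)]
    pieces, core = [], list(r)
    for i, ((r_lo, r_hi), (s_lo, s_hi)) in enumerate(zip(r, s)):
        if r_lo < s_lo:                      # part of r below s in dimension i
            pieces.append(core[:i] + [(r_lo, s_lo)] + core[i + 1:])
        if s_hi < r_hi:                      # part of r above s in dimension i
            pieces.append(core[:i] + [(s_hi, r_hi)] + core[i + 1:])
        core[i] = (max(r_lo, s_lo), min(r_hi, s_hi))  # clip to s before next dimension
    return pieces

def add_subscription(negative, s):
    """Return (subsumed, updated negative space), following the steps above."""
    r_intersect = [r for r in negative if intersects(r, s)]
    if not r_intersect:
        return True, negative
    updated = [r for r in negative if not intersects(r, s)]
    for r in r_intersect:
        updated.extend(subtract(r, s))       # R = (R - {ri}) U (ri - s)
    return False, updated

# Start with the whole content space uncovered, then add three subscriptions.
negative = [[(0, 10), (0, 10)]]
first, negative = add_subscription(negative, [(0, 6), (0, 10)])
after_first = list(negative)
second, negative = add_subscription(negative, [(4, 10), (0, 10)])
third, negative = add_subscription(negative, [(2, 8), (1, 3)])
```

Each split yields at most two pieces per dimension, so a single negative rectangle is replaced by at most 2d new ones.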

29

Approximate Subscription Subsumption Checking

On adding a new subscription, the number of new negative rectangles added ≤ k

At most O(k.n) negative rectangles after n active subscriptions

In the following example we have k=3

[Figure: negative rectangles r1 to r12 before and after adding a subscription with k = 3]

30

Experimental evaluation

Simulation setup
• 10K subscriptions
• 2-, 3-, 4- and 5-dimensional space
• Each dimension in range [0, 1000]
• Zipfian distribution
• For the approximate algorithm, the default value of k is 50

31

Problem definition and formulation

• Content space: d-dimensional space where each dimension represents a numeric attribute
• Subscriptions are d-dimensional rectangles
• Publications are d-dimensional points

Example: Covering & Subsumption in 2-dimensional space

[Figure: left panel, s2 is covered by s1; right panel, s3 is subsumed by s1 ∪ s2 but not covered by either of them]