Subscription Subsumption Evaluation for Content-Based Publish/Subscribe Systems Hojjat Jafarpour, Bijit Hore, Sharad Mehrotra, and Nalini Venkatasubramanian
  • date post

    15-Jan-2016

Page 1

Subscription Subsumption Evaluation for Content-Based Publish/Subscribe Systems

Hojjat Jafarpour, Bijit Hore, Sharad Mehrotra, and Nalini Venkatasubramanian

Page 2

Outline

• Problem definition and formulation
• Related work
• Exact subscription subsumption checking
• Approximate subscription subsumption checking
• Experimental evaluation
• Conclusions

Page 3

Event-based pub/sub systems

• Publish/subscribe systems

[Figure: Publish/Subscribe Service; Event]

Page 4

Types of pub/sub systems

• Topic-based vs. Content-based
• Centralized vs. Distributed

Page 5

Information dissemination in pub/sub systems

• Publication/subscription routing in distributed pub/sub

[Figure: Publisher, Subscriber 1, Subscriber 2]

Page 6

Reducing dissemination traffic

Goal: Preventing dissemination of redundant subscriptions

[Figure: Publisher, Subscriber 1, Subscriber 2, Subscriber 3]

Preventing redundant subscription dissemination
• Reduces subscription forwarding traffic
• Reduces subscription table size in broker
• Speeds up publication matching

Page 7

Detection of redundant subscriptions: Covering and Subsumption

• Subscription covering is a pair-wise relationship between subscriptions
• Subscription s2 covers subscription s1 iff all publications matching s1 also match s2
• Subscription subsumption is a generalization of covering
• Subscription s is subsumed by subscription set T = {s1, s2, ..., sn} iff all publications matching s also match at least one of the subscriptions in T

[Figure: two panels, rectangles s1, s2 and rectangles s1, s2, s3]

s3 is subsumed by s1 ∪ s2 but not covered by either of them
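The difference between covering and subsumption can be illustrated with a small sketch. It assumes subscriptions are stored as one closed (low, high) interval per attribute; the function names and the concrete rectangles are hypothetical and chosen only to mirror the s1, s2, s3 picture. The grid test at the end is an illustration of subsumption on sample points, not a proof.

```python
from itertools import product

def matches(sub, point):
    """True if the publication (a point) falls inside the subscription rectangle."""
    return all(lo <= x <= hi for (lo, hi), x in zip(sub, point))

def covers(outer, inner):
    """True if every publication matching `inner` also matches `outer`."""
    return all(olo <= ilo and ihi <= ohi
               for (olo, ohi), (ilo, ihi) in zip(outer, inner))

# Hypothetical 2-dimensional subscriptions.
s1 = ((0, 6), (0, 10))
s2 = ((4, 10), (0, 10))
s3 = ((2, 8), (2, 8))

# Pair-wise covering fails: neither s1 nor s2 alone contains s3.
covered_by_one = covers(s1, s3) or covers(s2, s3)

# Yet every sampled publication inside s3 matches s1 or s2.
grid = product(range(2, 9), range(2, 9))
all_grid_points_match = all(matches(s1, p) or matches(s2, p) for p in grid)
```

A broker that only checks pair-wise covering would forward s3 here, although s1 and s2 together already attract every publication that s3 asks for.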

Page 8

Problem formulation

• Content space: d-dimensional space where each dimension represents a numeric attribute
• Subscriptions are d-dimensional rectangles
• Publications are d-dimensional points
• Given a set of d-dimensional rectangles T = {s1, s2, ..., sn}, is a new rectangle s contained in the disjunction (union) of the rectangles in T?


Page 10

Related work

Pair-wise covering
• For a new subscription s, check if any previous subscription covers it
• If not, forward the subscription to all other brokers in the network

Probabilistic subsumption checking
• For a new subscription s, randomly select d points in s
• If all of these points are covered by previous subscriptions, assume s is subsumed
• Complexity O(k.m.d), where k = # of subscriptions, m = # of dimensions and d = # of test points (on this slide d is the number of test points, not the dimensionality of the content space)
• False negatives may be generated, i.e., subscriptions that are not subsumed may be falsely assumed to be subsumed
• May result in incorrect content routing

[Figure: two panels, rectangles s1, s2, s3 and rectangles s1, s2]
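The sampling idea described above can be sketched as follows. This is only an illustration of the test as the slide summarizes it, not the published algorithm; the function names, the uniform sampling, and the rectangle encoding (one closed (low, high) interval per attribute) are assumptions.

```python
import random

def matches(sub, point):
    """True if the point falls inside the subscription rectangle."""
    return all(lo <= x <= hi for (lo, hi), x in zip(sub, point))

def probably_subsumed(s, existing, num_tests, rng=random):
    """Sample num_tests random points in s and test each against the existing subscriptions."""
    for _ in range(num_tests):
        point = tuple(rng.uniform(lo, hi) for lo, hi in s)
        if not any(matches(t, point) for t in existing):
            return False  # an uncovered point proves s is not subsumed
    return True  # no counterexample found: s is only assumed to be subsumed

rng = random.Random(0)
existing = [((0, 10), (0, 10))]
inside = probably_subsumed(((2, 3), (2, 3)), existing, 20, rng)
outside = probably_subsumed(((20, 30), (20, 30)), existing, 20, rng)
```

Each of the sampled points is compared with up to k subscriptions in m dimensions, which gives the O(k.m.d) cost quoted above. A `True` result can be wrong when the uncovered part of s is small enough that no sample lands in it.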


Page 12

Exact Subscription Subsumption Checking – Key Observation

• Checking if a new subscription is covered by the union of previous subscriptions is equivalent to checking that the new subscription does not intersect the uncovered region.
• We partition the content space into positive and negative spaces
  • Positive space: the parts of the space that are covered by at least one existing subscription
  • Negative space: the parts of the space that are not covered by any of the existing subscriptions
  • Both can be represented by a set of non-overlapping rectangles
• Subscription s is subsumed iff s does not intersect the negative space

Page 13

Representation of Negative Space & Subsumption Evaluation

• We represent the negative space as a set of non-overlapping d-dimensional rectangles
• If a new subscription intersects with any of these rectangles, it is not subsumed

[Figure: negative space partitioned into rectangles r1 to r8]
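The intersection test above can be sketched in a few lines. The sketch assumes rectangles are stored as one (low, high) interval per dimension and that two rectangles "intersect" only when they share a region of positive volume, so that touching boundaries do not count. The names are hypothetical, and a linear scan stands in for an index lookup.

```python
def overlaps(a, b):
    """True if rectangles a and b share a region of positive volume."""
    return all(alo < bhi and blo < ahi
               for (alo, ahi), (blo, bhi) in zip(a, b))

def is_subsumed(s, negative_rects):
    """s is subsumed iff it overlaps none of the negative rectangles."""
    return not any(overlaps(s, r) for r in negative_rects)

# Content space [0,10] x [0,10] with one existing subscription [0,6] x [0,10]:
# the uncovered part is the single negative rectangle [6,10] x [0,10].
negative = [((6, 10), (0, 10))]
inside = is_subsumed(((1, 5), (2, 3)), negative)    # stays in the covered part
crossing = is_subsumed(((5, 7), (2, 3)), negative)  # reaches into the negative space
```

Unlike the sampling approach, this test gives a definite answer as long as the negative rectangles describe the uncovered region exactly.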

Page 14

Data structures & Complexity

• The algorithm always detects whether a new subscription is subsumed or not
• For efficient subsumption checking, the set of negative rectangles is indexed using an R-Tree or KD-Tree for fast retrieval
• For n subscriptions in d-dimensional space, the algorithm generates O(n^d) negative rectangles
• For high-dimensional content spaces the number of negative rectangles can grow fast
• To control the growth of the number of negative rectangles we propose an approximate subsumption checking algorithm


Page 16

Approximation algorithm

[Figure: negative rectangles r1 to r9; in the example we have k = 3]

• On adding a new subscription, restrict the number of new negative rectangles added to at most k
• At most O(k.n) negative rectangles after n active subscriptions
• Leads to no false negatives; may generate some false positives, i.e., a subsumed subscription may be reported as not subsumed and forwarded unnecessarily (correctness is not compromised)

Page 17

Top-k rectangle selection criteria

• Top-k selection: we propose a model based on benefit/cost for selecting these rectangles.
• Benefit of partitioning a negative rectangle with respect to a subscription is the volume of the intersecting region.
• Cost is the number of new negative rectangles created.
• We choose the top-k negative rectangles with the highest benefit-to-cost ratio for splitting and add the resulting rectangles to the representation of the negative space.
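A minimal sketch of the benefit/cost ratio defined above. Two details are assumptions and not taken from the slides: the cost is counted for a slab-style split (one new rectangle per side on which the negative rectangle sticks out of the subscription), and a rectangle that lies entirely inside the subscription (cost 0) is given an infinite ratio so that it is always chosen. All names are hypothetical.

```python
import math

def overlap_volume(r, s):
    """Benefit: volume of the region shared by r and s (0 if they do not overlap)."""
    vol = 1.0
    for (rlo, rhi), (slo, shi) in zip(r, s):
        side = min(rhi, shi) - max(rlo, slo)
        if side <= 0:
            return 0.0
        vol *= side
    return vol

def split_cost(r, s):
    """Cost: rectangles left after cutting s out of r, assuming r and s overlap."""
    return sum((rlo < slo) + (shi < rhi)
               for (rlo, rhi), (slo, shi) in zip(r, s))

def benefit_cost_ratio(r, s):
    cost = split_cost(r, s)
    return math.inf if cost == 0 else overlap_volume(r, s) / cost

r = ((0, 10), (0, 10))   # a negative rectangle
s = ((2, 6), (5, 12))    # a new subscription
ratio = benefit_cost_ratio(r, s)
```

In this example the intersecting region is 4 x 5 and r protrudes from s on three sides, so the benefit is 20 and the cost is 3.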

Page 18

Subscription Forwarding in Approximate Algorithm

• If the new subscription does not intersect any negative rectangle, it is subsumed
• Otherwise
  • Find all negative rectangles that intersect the subscription and sort them by benefit/cost
  • Select the first k negative rectangles and subtract the subscribed region from them
  • Update the representation of the negative space by replacing the k original rectangles with the new ones

(Algorithms for unsubscribing can be found in the paper)
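The steps on this slide can be sketched as below. This follows the slide's step list, in which k limits how many negative rectangles are split. The slab-style subtraction, the infinite ratio for rectangles fully inside the subscription, and every function name are assumptions made for illustration; the paper may differ in these details.

```python
import math

def overlaps(a, b):
    """True if a and b share a region of positive volume."""
    return all(alo < bhi and blo < ahi for (alo, ahi), (blo, bhi) in zip(a, b))

def overlap_volume(r, s):
    """Volume of the region shared by two overlapping rectangles."""
    return math.prod(min(rhi, shi) - max(rlo, slo)
                     for (rlo, rhi), (slo, shi) in zip(r, s))

def subtract(r, s):
    """Disjoint rectangles covering r minus s (r and s must overlap)."""
    pieces, core = [], list(r)
    for i, ((rlo, rhi), (slo, shi)) in enumerate(zip(r, s)):
        if rlo < slo:  # slab of r below s along dimension i
            pieces.append(tuple(core[:i] + [(rlo, slo)] + core[i + 1:]))
        if shi < rhi:  # slab of r above s along dimension i
            pieces.append(tuple(core[:i] + [(shi, rhi)] + core[i + 1:]))
        core[i] = (max(rlo, slo), min(rhi, shi))  # clip before the next dimension
    return pieces

def approximate_add(s, negative, k):
    """Return (subsumed, updated negative rectangles) for new subscription s."""
    hits = [r for r in negative if overlaps(r, s)]
    if not hits:
        return True, negative  # s is subsumed, no update needed

    def ratio(r):
        cost = len(subtract(r, s))
        return math.inf if cost == 0 else overlap_volume(r, s) / cost

    chosen = sorted(hits, key=ratio, reverse=True)[:k]
    updated = [r for r in negative if r not in chosen]
    for r in chosen:
        updated.extend(subtract(r, s))
    return False, updated

space = [((0, 10), (0, 10))]  # no subscriptions yet: everything is negative
sub1, neg1 = approximate_add(((2, 6), (2, 6)), space, k=3)
sub2, neg2 = approximate_add(((3, 5), (3, 5)), neg1, k=3)
sub3, neg3 = approximate_add(((0, 10), (0, 1)), neg1, k=1)
```

In the last call three negative rectangles intersect the subscription but only one is split. The other two keep area that is now covered, so a later subscription falling there would be reported as not subsumed and forwarded. That is the false-positive case; a non-subsumed subscription is never reported as subsumed.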


Page 20

Experimental evaluation

Simulation setup
• 10K subscriptions
• 2, 3, 4 and 5 dimensional space
• Each dimension in range [0, 1000]
• Zipfian distribution

Page 21

Experimental evaluation

• Measuring advantage of subsumption checking: Subscription Subsumption vs. Covering
• More than 50% improvement in redundant subscription detection

[Figure panels: Exact algorithm; Approximate algorithm (k = 50)]

Page 22

Storage overhead comparison (Exact vs Approximate)

Negative rectangle creation rate

Page 23

Experimental evaluation

Effect of k in the approximate algorithm

Larger k value results in more reduction in redundant subscriptions

Page 24

Experimental evaluation

Other Selection Metric Value Function

Considering both Benefit and Cost results in better subsumption checking

Page 25

Conclusions

• Efficient query subsumption checking can greatly improve the performance of pub/sub systems by reducing subscription routing traffic between brokers.
• Maintaining the negative space as a set of disjoint rectangles leads to efficient subsumption checking by converting it to an intersection detection problem.
• We proposed exact and approximate subsumption checking algorithms and compared their relative performance.

Page 26

Thank You!

Questions?

Page 27

Related work

• Ouksel et al. present a Monte Carlo type probabilistic algorithm for subsumption checking
• For a new subscription s, randomly select d points in s
• If all of these points are covered by previous subscriptions, assume s is subsumed
• Has complexity O(k.m.d), where k is the number of subscriptions, m is the number of dimensions and d is the number of tests
• False negatives: subscriptions that are not subsumed may be assumed to be subsumed
• May result in incorrect content routing
• Our proposed approach prevents false negatives

[Figure: rectangles s1, s2, s3; may mistakenly detect that s3 is subsumed]

Page 28

Exact Subscription Subsumption Checking

Subsumption checking algorithm

Input:
  Set of negative rectangles: R = {r1, r2, …, rm}
  Subscription s

Find Rintersect: the set of negative rectangles that intersect s
If Rintersect = ∅, s is subsumed
Otherwise,
  For every ri ∈ Rintersect
    R = R − {ri}
    Ri = ri − s, represent Ri as a set of non-overlapping rectangles
    R = R ∪ Ri
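The pseudocode above can be turned into a short runnable sketch. The slide does not say how ri − s is decomposed into non-overlapping rectangles, so the slab-style split used here is an assumption, as are the function names and the linear scan that stands in for an index lookup.

```python
def overlaps(a, b):
    """True if a and b share a region of positive volume."""
    return all(alo < bhi and blo < ahi for (alo, ahi), (blo, bhi) in zip(a, b))

def subtract(r, s):
    """Disjoint rectangles covering r minus s (r and s must overlap)."""
    pieces, core = [], list(r)
    for i, ((rlo, rhi), (slo, shi)) in enumerate(zip(r, s)):
        if rlo < slo:  # part of r below s along dimension i
            pieces.append(tuple(core[:i] + [(rlo, slo)] + core[i + 1:]))
        if shi < rhi:  # part of r above s along dimension i
            pieces.append(tuple(core[:i] + [(shi, rhi)] + core[i + 1:]))
        core[i] = (max(rlo, slo), min(rhi, shi))  # clip before the next dimension
    return pieces

def exact_add(s, negative):
    """Return (subsumed, updated negative rectangles) for new subscription s."""
    hits = [r for r in negative if overlaps(r, s)]      # Rintersect
    if not hits:
        return True, negative                           # s is subsumed
    updated = [r for r in negative if not overlaps(r, s)]  # R - {ri}
    for r in hits:
        updated.extend(subtract(r, s))                  # R = R U (ri - s)
    return False, updated

negative = [((0, 10), (0, 10))]  # empty broker: the whole space is negative
sub1, negative = exact_add(((0, 6), (0, 10)), negative)
sub2, negative = exact_add(((4, 10), (0, 10)), negative)
sub3, negative = exact_add(((2, 8), (2, 8)), negative)
```

With these hypothetical rectangles, the first two subscriptions are not subsumed and together cover the whole content space, so the third is reported as subsumed even though neither of the first two covers it alone.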

Page 29

Approximate Subscription Subsumption Checking

• On adding a new subscription, the number of new negative rectangles added is at most k
• At most O(k.n) negative rectangles after n active subscriptions
• In the following example we have k = 3

[Figure: negative rectangles r1 to r12]

Page 30

Experimental evaluation

Simulation setup
• 10K subscriptions
• 2, 3, 4 and 5 dimensional space
• Each dimension in range [0, 1000]
• Zipfian distribution
• For the approximate algorithm, the default value of k is 50

Page 31

Problem definition and formulation

• Content space: d-dimensional space where each dimension represents a numeric attribute
• Subscriptions are d-dimensional rectangles
• Publications are d-dimensional points

Example: Covering & Subsumption in 2-dimensional space

[Figure: two panels, rectangles s1, s2 and rectangles s1, s2, s3]

• s2 is covered by s1
• s3 is subsumed by s1 ∪ s2 but not covered by either of them