Content based filtering, pub sub, bloom filters

Content based filtering, Pub/Sub & Bloom filter Presented By: Yara Ali


Transcript of Content based filtering, pub sub, bloom filters

Page 1: Content based filtering, pub   sub, bloom filters

Content based filtering, Pub/Sub & Bloom filter

Presented By: Yara Ali

Page 2: Content based filtering, pub   sub, bloom filters

Agenda

• Introduction

• Human Networks (HUMNETs)

• Content-based Publish/Subscribe

• Pub-sub service network

• Bloom filter-based pub-sub “B-SUB”

• B-SUB Components

• Bloom Filters (BF)

• How Bloom filters work?

• Temporal counting Bloom filter (TCBF)

• Problems with TCBF

• Decaying Factor

Page 3: Content based filtering, pub   sub, bloom filters

Introduction

• Distributed system: a system consisting of several connected computers that appear to be one computing entity.

Page 4: Content based filtering, pub   sub, bloom filters

Introduction .. Cont,

Communication mechanisms:

• Client/Server architecture

• Remote Procedure Call (RPC)

• Message-Oriented Middleware (MOM)

• Message queues

• Tuple space

• Publish/Subscribe architecture

Page 5: Content based filtering, pub   sub, bloom filters

Introduction .. Cont,

• Publish / Subscribe Architectures

1- Lists at server: Middleware is at the servers

2- Message broker: Middleware is in a separate unit

3- Broadcast & filter at client: Middleware is at the clients

Page 6: Content based filtering, pub   sub, bloom filters

Human Networks (HUMNETs)

• A HUMNET is a dynamic network composed of human-carried wireless devices.

• Applications in HUMNETs require content-based networking services: a style of communication that associates source and destination pairs based on actual content and interests, rather than letting source nodes specify the destination.

Page 7: Content based filtering, pub   sub, bloom filters

Content-based Publish/Subscribe (CBPS)

• Content-based matching is the problem of finding all the subscriptions that match a given notification.

• CBPS represents a compromise between the extremes of publisher-side filtering of messages (with events transmitted directly to interested subscribers) and subscriber-side filtering of messages (with events broadcast to all subscribers).

• Event delivery is the task of delivering the notification to the set of interested subscribers selected with content-based matching.
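A minimal sketch of content-based matching, assuming subscriptions are simple equality constraints on notification attributes. The subscriber names, attributes and the `match_all` helper are hypothetical and only illustrate the idea:

```python
def matches(subscription, notification):
    """True if every constraint in the subscription holds for the notification.

    Assumption: a subscription is a dict of attribute -> required value.
    Real CBPS systems usually support richer predicates (ranges, prefixes).
    """
    return all(notification.get(attr) == value for attr, value in subscription.items())


def match_all(subscriptions, notification):
    """Content-based matching: find all subscribers whose subscription matches."""
    return [name for name, sub in subscriptions.items() if matches(sub, notification)]


# Hypothetical subscriptions and notification.
subscriptions = {
    "alice": {"topic": "sports"},
    "bob": {"topic": "sports", "team": "Ajax"},
    "carol": {"topic": "music"},
}
notification = {"topic": "sports", "team": "Ajax", "score": "2-1"}

# Event delivery would then send the notification to exactly these subscribers.
interested = match_all(subscriptions, notification)
print(interested)
```

The matching step only selects the subscribers; delivering the notification to them is the separate event delivery task.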

Page 8: Content based filtering, pub   sub, bloom filters

Pub-sub service network

• Two Approaches :

1. Filter-based approach: Performs content-based filtering on intermediate routing servers to dynamically guide routing decisions.

2. Multicast-based approach: Delivers events through a few high-quality multicast groups that are pre-constructed to approximately match user interests.

Page 9: Content based filtering, pub   sub, bloom filters

Pub-sub service network…Cont,

Page 10: Content based filtering, pub   sub, bloom filters

Pub-sub service network…Cont,

• In the filter-based approach, routing decisions are made via successive content-based filtering at all nodes from source to destination: every pub-sub server along the way matches the event against remote subscriptions from other servers and then forwards it only in directions that lead to matching subscriptions.

• In the multicast-based approach, a limited number of multicast groups are computed before event transmission begins. For each event, the routing decision is made only once, at the publisher, which maps the event to the single appropriate group. The event is then multicast to the group, assuming IP multicast or application-level multicast support. Because only a limited number of multicast groups can be built, servers with different interests may be clustered into the same group, so events may be sent to uninterested servers as well.
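The per-hop decision in the filter-based approach can be sketched as follows. The routing table layout, the direction names and the equality-only subscriptions are assumptions made for illustration:

```python
def matches(subscription, event):
    # Assumption: equality-only constraints on event attributes.
    return all(event.get(attr) == value for attr, value in subscription.items())


def next_hops(routing_table, event):
    """Return the directions that lead to at least one matching subscription."""
    return [
        direction
        for direction, subscriptions in routing_table.items()
        if any(matches(sub, event) for sub in subscriptions)
    ]


# Hypothetical table held by one pub-sub server: direction -> remote subscriptions.
routing_table = {
    "north": [{"topic": "sports"}],
    "east": [{"topic": "music"}],
    "south": [{"topic": "sports", "team": "Ajax"}],
}
event = {"topic": "sports", "team": "PSV"}

hops = next_hops(routing_table, event)
print(hops)
```

Every server on the path repeats this check, which is what distinguishes it from the multicast-based approach, where the publisher picks a group once.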

Page 11: Content based filtering, pub   sub, bloom filters

Bloom filter-based pub-SUB “B-SUB”

• B-SUB is a content-based publish-subscribe system.

• In B-SUB, messages are identified by strings that summarize their contents, called keys.

Page 12: Content based filtering, pub   sub, bloom filters

Bloom filter-based pub-SUB “B-SUB” …Cont,

• B-SUB uses the pub/sub paradigm.

Page 13: Content based filtering, pub   sub, bloom filters

Bloom filter-based pub-SUB “B-SUB” …Cont,

• Advantages:

1- Frees users from addressing and routing tasks, which reduces the overall overhead in the system.

2- Message producers and consumers are separated.

3- Messages are forwarded only by brokers, which perform content matching for the users.

Page 14: Content based filtering, pub   sub, bloom filters

B-SUB Components

B-SUB components:

• Broker allocation

• Pub-Sub forwarding: interests propagation and message forwarding

• TCBF

Page 15: Content based filtering, pub   sub, bloom filters

B-SUB Components … Cont,

1- Broker Allocation:

• A group of socially active nodes is selected to be brokers.

• Normal users don’t participate in interest propagation or message forwarding.

• Brokers are responsible for collecting subscriptions and forwarding messages.

• A broker stores a TCBF, called the relay filter, for propagating other users’ interests.

2- Pub-Sub forwarding:

• It’s separated into two parts: interests propagation and message forwarding.

Page 16: Content based filtering, pub   sub, bloom filters

Bloom Filters (BF)

• A Bloom filter is a space-efficient probabilistic data structure for representing sets. It supports probabilistic membership queries, that is, testing whether an element is a member of the set.

• A BF maps a key through multiple hash functions onto a bit vector, setting a few of its bits. In B-SUB, users’ interests are represented as keys, and messages are identified by keys as well: strings that summarize their contents.

Page 17: Content based filtering, pub   sub, bloom filters

Bloom Filters (BF) … Cont,

• The locations of the set bits are determined by the hash functions.

• A query for a key checks whether all of the key’s hashed bits are set. If any of them is unset, the key is definitely not in the BF; if all are set, the key is probably in the BF (false positives are possible).
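A minimal sketch of insertion and querying. The filter size `m`, the number of hash functions `k` and the salted SHA-256 hashing are illustrative assumptions, not the parameters used by B-SUB:

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter with m bits and k hash functions (both assumed)."""

    def __init__(self, m=64, k=3):
        self.m = m
        self.k = k
        self.bits = [0] * m

    def _positions(self, key):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, key):
        # The hash functions determine which bits get set.
        for pos in self._positions(key):
            self.bits[pos] = 1

    def query(self, key):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(key))


bf = BloomFilter()
bf.insert("football")
bf.insert("jazz")
print(bf.query("football"), bf.query("jazz"))
```

An inserted key always answers True; a key that was never inserted usually answers False, but can answer True if other keys happen to have set all of its bits.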

Page 18: Content based filtering, pub   sub, bloom filters

Bloom Filters (BF) … Cont,

• A BF for a set of keys is obtained by sequentially inserting keys into the filter.

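[Figure: the filters for {K0} and {K1} are combined with “+” into a filter for {K0, K1}. The surviving labels are the set bits (1) and the counter values 10 and 20.]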

Page 19: Content based filtering, pub   sub, bloom filters

Bloom Filters (BF) … Cont,

• To merge multiple BFs we do a bit-wise OR on them.

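[Figure: the filters for {K0} and {K1} are merged (“M”) into a filter for {K0, K1}. The surviving labels are the set bits (1) and the counter value 10.]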
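Merging by bit-wise OR can be sketched with Python integers used as bit vectors. The sizes `M` and `K` and the hashing scheme are assumptions for illustration:

```python
import hashlib

M = 64  # filter size in bits (illustrative)
K = 3   # number of hash functions (illustrative)


def positions(key):
    """Bit positions for a key; salted SHA-256 is an assumed hashing scheme."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()[:8], "big") % M
        for i in range(K)
    ]


def make_filter(keys):
    """Build a filter by inserting the keys one after another."""
    bits = 0
    for key in keys:
        for pos in positions(key):
            bits |= 1 << pos
    return bits


def query(bits, key):
    return all((bits >> pos) & 1 for pos in positions(key))


def merge(a, b):
    """Merge two filters of the same size and hash functions with bit-wise OR."""
    return a | b


f0 = make_filter(["K0"])
f1 = make_filter(["K1"])
merged = merge(f0, f1)
print(query(merged, "K0"), query(merged, "K1"))
```

The merged filter is identical to the one obtained by inserting both keys into a single filter, which only holds when both filters share the same size and hash functions.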

Page 20: Content based filtering, pub   sub, bloom filters

Bloom Filters (BF) … Cont,

• The basic BF doesn’t support deletions because a set bit can’t be traced back to the keys that set it.

• The counting Bloom filter (CBF) was proposed to support deletion.

• In a CBF, each bit is associated with a counter that represents the number of keys hashed to it.

• To delete a key from a CBF, we decrement the counters of the key’s hashed bits. A bit is reset once its counter reaches 0.
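A minimal counting Bloom filter sketch along these lines. The sizes and hashing scheme are assumptions, and the guard in `delete` is an added safety check rather than part of the description above:

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter with one counter per bit, so keys can be deleted."""

    def __init__(self, m=64, k=3):
        self.m = m
        self.k = k
        self.counters = [0] * m

    def _positions(self, key):
        # A set, so a key whose hashes collide touches each counter only once.
        return {
            int.from_bytes(hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()[:8], "big") % self.m
            for i in range(self.k)
        }

    def insert(self, key):
        for pos in self._positions(key):
            self.counters[pos] += 1

    def query(self, key):
        # A bit counts as set while its counter is above zero.
        return all(self.counters[pos] > 0 for pos in self._positions(key))

    def delete(self, key):
        # Refuse to delete a key that does not appear to be present;
        # deleting a key that was never inserted would corrupt the counters.
        if not self.query(key):
            return False
        for pos in self._positions(key):
            self.counters[pos] -= 1
        return True


cbf = CountingBloomFilter()
cbf.insert("football")
cbf.insert("jazz")
cbf.delete("football")
print(cbf.query("jazz"))
```

Deleting "football" leaves "jazz" intact even if the two keys share a bit, because the shared counter only drops from 2 to 1.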

Page 21: Content based filtering, pub   sub, bloom filters

Bloom Filters (BF) … Cont,

• Messages in B-SUB are assumed to be small, on the order of hundreds of bytes. This assumption holds in social networking applications.

• Example: Twitter, a popular micro-blogging application, caps each post at 140 characters. If such a message is wrongly injected into the network, the wasted bandwidth is acceptable.

Page 22: Content based filtering, pub   sub, bloom filters

How Bloom filters work?“Message Forwarding”

• When a producer meets a consumer, the consumer reports its interests to the producer in a BF. The producer then queries all its messages against the filter and forwards every message that matches to the consumer.

• When a broker meets a producer, it forwards a BF to the producer. The producer queries that filter and determines the events that need to be transmitted.
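The producer side of such an encounter can be sketched as below. The message layout (a dict with `key` and `body`), the filter parameters and the helper names are hypothetical:

```python
import hashlib

M = 128  # filter size in bits (illustrative)
K = 3    # number of hash functions (illustrative)


def positions(key):
    """Bit positions for a key; salted SHA-256 is an assumed hashing scheme."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()[:8], "big") % M
        for i in range(K)
    ]


def make_filter(keys):
    bits = 0
    for key in keys:
        for pos in positions(key):
            bits |= 1 << pos
    return bits


def query(bits, key):
    return all((bits >> pos) & 1 for pos in positions(key))


def select_messages(messages, received_filter):
    """Producer side: keep the messages whose key matches the received filter."""
    return [msg for msg in messages if query(received_filter, msg["key"])]


# The consumer (or broker) reports its interests as a filter.
received_filter = make_filter(["jazz", "football"])

producer_messages = [
    {"key": "jazz", "body": "Concert tonight"},
    {"key": "politics", "body": "Election results"},
    {"key": "football", "body": "Match at 8pm"},
]
to_send = select_messages(producer_messages, received_filter)
print([msg["key"] for msg in to_send])
```

The producer never sees the interests themselves, only the filter, so a non-matching key such as "politics" can occasionally slip through as a false positive.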

Page 23: Content based filtering, pub   sub, bloom filters

How Bloom filters work?“Message Forwarding” … Cont,

• When a broker meets a consumer, the broker requests a BF containing the consumer’s interests, then forwards the matched messages to the consumer.

• Messages are removed from a broker’s memory after being forwarded. This prevents excessive copies in the network.

• A message’s lifetime is controlled by its time-to-live (TTL) value, which equals its maximum tolerable delay. The TTL is counted from the time the message was created.
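A sketch of a broker’s message buffer that follows these rules. The class and field names are hypothetical, and the `matches` callable stands in for a query against the consumer’s BF (an exact set is used here to keep the example short):

```python
from dataclasses import dataclass


@dataclass
class Message:
    key: str
    body: str
    created_at: float  # creation time, in seconds
    ttl: float         # maximum tolerable delay, in seconds

    def expired(self, now):
        # The TTL is counted from the creation time of the message.
        return now - self.created_at > self.ttl


class BrokerBuffer:
    def __init__(self):
        self.messages = []

    def store(self, message):
        self.messages.append(message)

    def forward_to(self, matches, now):
        """Return live messages whose key passes `matches` and drop them locally."""
        live = [m for m in self.messages if not m.expired(now)]
        delivered = [m for m in live if matches(m.key)]
        # Forwarded messages are removed to avoid extra copies in the network.
        self.messages = [m for m in live if not matches(m.key)]
        return delivered


buffer = BrokerBuffer()
buffer.store(Message("jazz", "Concert tonight", created_at=0, ttl=60))
buffer.store(Message("football", "Match at 8pm", created_at=0, ttl=10))
buffer.store(Message("news", "Election results", created_at=0, ttl=60))

interests = {"jazz", "football"}
delivered = buffer.forward_to(lambda key: key in interests, now=30)
print([m.key for m in delivered], [m.key for m in buffer.messages])
```

At time 30 the "football" message has outlived its TTL of 10 seconds, so only "jazz" is delivered and only "news" stays in the buffer.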

Page 24: Content based filtering, pub   sub, bloom filters

Temporal counting Bloom filter (TCBF)

• The TCBF is an extension of the BF, proposed to perform content-based networking tasks.

• It doesn’t support direct deletion of elements. It only supports temporal deletion: the filter constantly decrements the counter values of all its set bits, which is called decaying.

• B-SUB uses TCBFs to encode users’ interests and to embed the information brokers need to make forwarding decisions.

• B-SUB makes forwarding decisions by querying the TCBFs. (B-SUB can propagate interests by transmitting at most two TCBFs of dozens of bytes.)

• The only operations performed are hashing and table lookup.
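A minimal sketch of temporal deletion. It assumes that inserting a key sets its counters to a fixed initial value and that decaying subtracts the same amount from every counter; the initial value of 10, the sizes and the hashing scheme are illustrative, not taken from B-SUB:

```python
import hashlib


class TemporalCountingBloomFilter:
    """Counters are set on insertion and only ever decrease through decaying."""

    def __init__(self, m=64, k=3, initial=10.0):
        self.m = m
        self.k = k
        self.initial = initial
        self.counters = [0.0] * m

    def _positions(self, key):
        return {
            int.from_bytes(hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()[:8], "big") % self.m
            for i in range(self.k)
        }

    def insert(self, key):
        # Assumption: insertion (re)sets the counters to the initial value.
        for pos in self._positions(key):
            self.counters[pos] = self.initial

    def query(self, key):
        return all(self.counters[pos] > 0 for pos in self._positions(key))

    def decay(self, decaying_factor, elapsed):
        # Temporal deletion: every counter drops, and a bit disappears at zero.
        amount = decaying_factor * elapsed
        self.counters = [max(0.0, c - amount) for c in self.counters]


tcbf = TemporalCountingBloomFilter()
tcbf.insert("jazz")
tcbf.decay(decaying_factor=1.0, elapsed=4)
still_present = tcbf.query("jazz")
tcbf.decay(decaying_factor=1.0, elapsed=6)
gone = not tcbf.query("jazz")
print(still_present, gone)
```

There is no `delete` method: an interest leaves the filter only because nobody refreshed it before its counters decayed to zero.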

Page 25: Content based filtering, pub   sub, bloom filters

Problems with TCBF

• False positives (spam) occur because a key’s hashed bits can be accidentally set by other keys that have been put into the TCBF.

• Because of false positives, B-SUB may inject useless messages into the network.
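As a rough guide to how often this happens, the standard approximation for a plain Bloom filter can be written out. The symbols m (bits in the filter), k (hash functions) and n (keys inserted) are not from the slides, and a TCBF with decaying holds a varying number of live keys, so this is an estimate only:

```latex
p_{\mathrm{fp}} \approx \left(1 - e^{-kn/m}\right)^{k}
% Illustrative values (assumed): m = 256 bits, k = 4, n = 20
% kn/m = 80/256 = 0.3125, so p_fp \approx (1 - e^{-0.3125})^{4} \approx 0.005
```

With these assumed values, about one query in two hundred for an absent key would come back positive.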

Page 26: Content based filtering, pub   sub, bloom filters

Decaying Factor (DF)

• It’s the key parameter for adjusting B-SUB’s behavior.

• If decaying is not used, the counters of the set bits don’t change after being set, so no interests are ever removed.

• An obvious consequence is that a broker ends up carrying the interests of users that it rarely meets.

Page 27: Content based filtering, pub   sub, bloom filters

Decaying Factor (DF)…Cont,

• Suppose that each message has a delay limit of T. We should set the DF so that an interest is removed a time T after a consumer inserted it once.

• If the broker still contains the interest, then the broker has met a consumer interested in it within the last T.

• If a message is forwarded by the broker, it’s likely that the message will be delivered within T.
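One way to make this rule concrete, assuming (the slides do not say so) that a counter is set to an initial value C on insertion and decays linearly by DF per unit of time:

```latex
C - \mathrm{DF} \cdot T = 0 \quad\Longrightarrow\quad \mathrm{DF} = \frac{C}{T}
% Illustrative values (assumed): C = 10 and T = 3600 s give
% DF = 10/3600 \approx 0.0028 per second
```

A larger DF removes interests sooner than T, and a smaller DF keeps them longer than T, which is the case where a broker carries interests from users it rarely meets.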

Page 28: Content based filtering, pub   sub, bloom filters

References

• http://temple.academia.edu/YaxiongZhao/Papers/1043038/B-SUB_A_Practical_Bloom-Filter-Based_Publish-Subscribe_System_for_Human_Networks

• http://scholar.google.com/scholar?q=bloom+filters+in+publish+subscribe&hl=en&btnG=Search&as_sdt=1%2C5&as_sdtp=on

• http://en.wikipedia.org/wiki/Bloom_filter

• https://www.comp.nus.edu.sg/~david/Publications/debs2006.pdf

Page 29: Content based filtering, pub   sub, bloom filters

Thank You !