The Bloom Paradox Ori Rottenstreich Joint work with Isaac Keslassy Technion, Israel.

Page 1: The Bloom Paradox

Ori Rottenstreich

Joint work with Isaac Keslassy

Technion, Israel

Page 2: Problem Definition

• Requirement: a data structure at the user that quickly answers whether $x \in S$, where S is the local cache
• Solutions:
  o O(n): searching in a list
  o O(log(n)): searching in a sorted list
  o O(1): but with false positives / negatives

[Figure: the user accesses either the local cache S (cost = 1) or the central memory M with all elements (cost = 10).]

Page 3: Two Possible Errors

• False Positive: $x \notin S$, but the data structure answers $x \in S$
  o Results in a redundant access to the local cache.
  o Additional cost of 1.
• False Negative: $x \in S$, but the data structure answers $x \notin S$
  o Results in an expensive access to the central memory instead of the local cache.
  o Additional cost of 10 - 1 = 9.
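The two penalties follow directly from the cost model of the previous slide. A minimal Python sketch, assuming a cache access costs 1, a central-memory access costs 10, and a cache miss falls back to the central memory; `access_cost` and `error_penalty` are hypothetical helper names used only for illustration:

```python
# Cost model assumed from the slides: cache access = 1, central memory = 10,
# and a cache miss falls back to the central memory.
CACHE_COST = 1
MEMORY_COST = 10


def access_cost(in_cache: bool, answer_in_cache: bool) -> int:
    """Cost of one query, given the truth and the data structure's answer."""
    if answer_in_cache:
        # Try the local cache first; on a miss (false positive) pay for memory too.
        return CACHE_COST if in_cache else CACHE_COST + MEMORY_COST
    # Answer "not cached": go straight to central memory (wasteful if it was cached).
    return MEMORY_COST


def error_penalty(in_cache: bool) -> int:
    """Extra cost of the wrong answer compared with the correct one."""
    correct = access_cost(in_cache, answer_in_cache=in_cache)
    wrong = access_cost(in_cache, answer_in_cache=not in_cache)
    return wrong - correct


if __name__ == "__main__":
    print("false positive penalty:", error_penalty(in_cache=False))  # 1
    print("false negative penalty:", error_penalty(in_cache=True))   # 9
```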

Page 4: Bloom Filters (Bloom, 1970)

• Initialization: an array of m zero bits.
• Insertion: each of the n elements is hashed k times, and the corresponding bits are set.
• Query: hash the element k times and check that all k corresponding bits are set.
• False positive rate (probability) of approximately $(1 - e^{-kn/m})^k$.
• No false negatives.

[Figure: a bit array before and after inserting elements x, y and z, and a query for element w.]
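A minimal Python sketch of the three operations above, for illustration only. Assumptions not in the slides: the k indices are derived by double hashing a single SHA-256 digest (a common substitute for k independent hash functions), and bits are stored one per byte for readability. `BloomFilter` and `approx_fpr` are hypothetical names.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter with m bits and k hash functions (a sketch)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, for clarity rather than space

    def _positions(self, item: str):
        # Assumption: k indices from double hashing one SHA-256 digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def insert(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def query(self, item: str) -> bool:
        # True means "possibly in S"; False means "definitely not in S".
        return all(self.bits[pos] for pos in self._positions(item))


def approx_fpr(m: int, n: int, k: int) -> float:
    """Standard approximation (1 - e^{-kn/m})^k of the false positive rate."""
    return (1.0 - math.exp(-k * n / m)) ** k


if __name__ == "__main__":
    m, n, k = 10_000, 1_000, 7
    bf = BloomFilter(m, k)
    members = [f"elem-{i}" for i in range(n)]
    for x in members:
        bf.insert(x)
    assert all(bf.query(x) for x in members)  # no false negatives
    others = [f"other-{i}" for i in range(100_000)]
    measured = sum(bf.query(y) for y in others) / len(others)
    print(f"measured FPR {measured:.4f} vs approx {approx_fpr(m, n, k):.4f}")
```

Running the module inserts 1000 elements into a 10,000-bit filter with k = 7 and compares the measured false positive rate with the approximation, which is about 0.0082 for these parameters.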

Page 5: Bloom Filters are Widely Used

• Cache/Memory Framework
• Packet Classification
• Intrusion Detection
• Routing
• Accounting
• Beyond networking: Spell Checking, DNA Classification
• Can be found in:
  o Google's web browser Chrome
  o Google's database system BigTable
  o Facebook's distributed storage system Cassandra
  o Mellanox's IB Switch System

Page 6: The Bloom Paradox

Sometimes, it is better to disregard the Bloom filter results, and in fact not even to query it, thus making the Bloom filter useless.

Page 7: Outline

• Introduction to Bloom Filters
• The Bloom Paradox
  o The Bloom Paradox in Bloom Filters
  o Analysis of the Bloom Paradox
  o The Bloom Paradox in the Counting Bloom Filter
• Summary

Page 8: Bloom Paradox Example

• Parameters:
• Extreme case without locality: all elements have an equal probability of belonging to the cache.
  o Toy example

Page 9: Bloom Paradox Example

• Parameters:
• Let B be the set of elements that the Bloom filter indicates are in S.
  o In particular, $S \subseteq B$, since the Bloom filter has no false negatives.
• Intuition:

[Figure: the user consults the Bloom filter, which represents B, before accessing either the local cache S (cost = 1) or the central memory M with all elements (cost = 10).]

Page 11: Bloom Paradox Example

• Parameters:
• Let B be the set of elements that the Bloom filter indicates are in S.
  o In particular, $S \subseteq B$, since the Bloom filter has no false negatives.
• Surprise: the Bloom filter indicates the membership of |B| elements, but only |S| of them are indeed in S.

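Page 12: Bloom Paradox Example

• When the Bloom filter states that $x \in S$, it is wrong with probability $1 - |S|/|B|$, where B is the set of elements it indicates are in S.
• Average cost if we listen to the Bloom filter: $1 + 10 \cdot (1 - |S|/|B|)$ (a cache access, plus a central-memory access whenever the cache misses).
• Average cost if we don't: 10 (a direct central-memory access).

Listening costs more whenever the Bloom filter is wrong with probability above 0.9. Don't listen to the Bloom filter: the Bloom filter is useless!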
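The arithmetic of this example can be checked with a short Python sketch. The numbers below are hypothetical, not the slide's own parameters: a universe of $10^6$ elements, |S| = 1000 cached elements, and a false positive rate of 0.01 or 0.001. As an approximation, |B| (the set of elements the Bloom filter reports as members) is replaced by its expected value $|S| + (N - |S|)\cdot\mathrm{FPR}$. `toy_example_costs` is a hypothetical name.

```python
def toy_example_costs(universe: int, cached: int, fpr: float,
                      cache_cost: float = 1.0, memory_cost: float = 10.0):
    """Return (P(wrong), cost if we listen, cost if we don't) for a positive answer.

    Assumes no locality: every non-cached element independently passes the
    Bloom filter with probability fpr, and |B| is replaced by its expectation.
    """
    expected_b = cached + (universe - cached) * fpr  # expected |B|
    p_wrong = 1.0 - cached / expected_b              # P(x not in S | filter says x in S)
    listen = cache_cost + p_wrong * memory_cost      # cache first, memory on a miss
    ignore = memory_cost                             # always go to central memory
    return p_wrong, listen, ignore


if __name__ == "__main__":
    for fpr in (0.01, 0.001):
        p_wrong, listen, ignore = toy_example_costs(10**6, 1000, fpr)
        print(f"FPR={fpr}: P(wrong)={p_wrong:.3f}, listen={listen:.2f}, ignore={ignore:.2f}")
```

With a false positive rate of 0.01, a positive answer is wrong about 91% of the time and listening costs about 10.09 per query versus 10 for ignoring the filter, so the paradox appears. With 0.001, the filter is wrong about half the time and listening is clearly cheaper.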

Page 14: Costs of the Two Possible Errors

• The cost of a false positive: $C_{FP} = 1$
• The cost of a false negative: $C_{FN}$
• In the cache example: $C_{FN} = 10 - 1 = 9$

Page 18: Conditions for the Bloom Paradox

• Let $P(x \in S)$ be the a priori membership probability of x
  o i.e. before getting the answer of the Bloom filter
• Intuition: the Bloom paradox occurs more often when:
  o $P(x \in S)$ is small
  o the false positive rate is large (i.e. m/n is small)
  o $C_{FN}$ is small (because the Bloom filter implicitly assumes $C_{FN} = \infty$)
• Theorem 1: The Bloom paradox occurs if and only if
  $$P(x \in S) < \frac{1}{1 + \frac{C_{FN}}{C_{FP}} \cdot \mathrm{FPR}^{-1}}, \qquad \mathrm{FPR} \approx \left(1 - e^{-kn/m}\right)^{k}$$
• Boundaries of the Bloom Paradox: (for )

If and the Bloom paradox occurs if
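The condition in Theorem 1 can be motivated by comparing the expected error cost of the two policies for a query that the Bloom filter answers positively. In this sketch, $C_{FP}$ and $C_{FN}$ are the extra costs of a false positive and a false negative, and the false positive rate FPR is treated as a fixed probability for every $x \notin S$, which simplifies the setting. Listening pays $C_{FP}$ when the positive answer is wrong (probability $1 - q$); ignoring the filter pays $C_{FN}$ when x is in fact in S (probability $q$):

```latex
\begin{align*}
q &\triangleq P(x \in S \mid \text{positive answer})
   = \frac{P(x \in S)}{P(x \in S) + \bigl(1 - P(x \in S)\bigr)\,\mathrm{FPR}} \\[4pt]
\text{Bloom paradox} &\iff q\,C_{FN} < (1 - q)\,C_{FP} \\
&\iff P(x \in S)\,C_{FN} < \bigl(1 - P(x \in S)\bigr)\,\mathrm{FPR}\,C_{FP} \\
&\iff P(x \in S) < \frac{1}{1 + \frac{C_{FN}}{C_{FP}} \cdot \mathrm{FPR}^{-1}}
\end{align*}
```

The second step multiplies both sides by the (positive) denominator of $q$; the last step collects the $P(x \in S)$ terms.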

Page 19: Bloom Filter Improvements

• Theorem 1: The Bloom paradox occurs if and only if $P(x \in S) < \dfrac{1}{1 + \frac{C_{FN}}{C_{FP}} \cdot \mathrm{FPR}^{-1}}$, where $\mathrm{FPR} \approx (1 - e^{-kn/m})^k$ is the false positive rate.
• Use the formula to improve the Bloom filter
  o Only insert / query the Bloom filter if the formula expects it to be useful
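Theorem 1 translates directly into a threshold test. A hedged Python sketch of the improvement on this slide: `bloom_paradox_threshold` and `should_use_filter` are hypothetical names, the false positive rate uses the standard approximation $(1 - e^{-kn/m})^k$, and the sketch ignores the fact that skipping insertions would itself change n and hence the false positive rate.

```python
import math


def bloom_paradox_threshold(c_fn: float, c_fp: float, m: int, n: int, k: int) -> float:
    """A priori membership probability below which the Bloom paradox occurs
    (Theorem 1), with the approximation FPR = (1 - e^{-kn/m})^k."""
    fpr = (1.0 - math.exp(-k * n / m)) ** k
    return 1.0 / (1.0 + (c_fn / c_fp) / fpr)


def should_use_filter(p_prior: float, c_fn: float, c_fp: float,
                      m: int, n: int, k: int) -> bool:
    """Hypothetical policy: insert x into / query the Bloom filter only when its
    a priori membership probability is at least the paradox threshold."""
    return p_prior >= bloom_paradox_threshold(c_fn, c_fp, m, n, k)


if __name__ == "__main__":
    # Cache example: C_FN = 9, C_FP = 1; m/n = 10 bits per element, k = 7.
    t = bloom_paradox_threshold(c_fn=9, c_fp=1, m=10_000, n=1_000, k=7)
    print(f"threshold = {t:.2e}")
    for p in (1e-2, 1e-4):
        use = should_use_filter(p, 9, 1, 10_000, 1_000, 7)
        print(p, "use filter" if use else "skip filter")
```

For the cache example ($C_{FN} = 9$, $C_{FP} = 1$) with m/n = 10 and k = 7, the threshold is roughly $9.1 \cdot 10^{-4}$: under these assumptions, an element whose a priori membership probability is below about 0.09% is better served by going straight to the central memory.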

Page 22: Counting Bloom Filters (CBFs)

• Bloom filters do not support deletions of elements. Simply resetting bits might cause false negatives.
• The solution: Counting Bloom filters, which store an array of counters instead of bits.
  o Insertion: increment the k counters by one.
  o Deletion: decrement the k counters by one.
  o Query: check that the k counters are positive.
• The same false positive probability.
• Require too much memory, e.g. 57 bits per element for .

[Figure: inserting elements x and y increments each of their k counters by one.]
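A minimal Python sketch of a counting Bloom filter, for illustration only. Assumptions not in the slides: Python integers stand in for fixed-width counters (so counter overflow, which real CBFs must handle, is ignored), and the k indices are derived by double hashing a SHA-256 digest. `CountingBloomFilter` and `counter_values` are hypothetical names.

```python
import hashlib


class CountingBloomFilter:
    """Counting Bloom filter sketch: m counters instead of m bits, k hashes."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.counters = [0] * m  # unbounded ints; real CBFs use small fixed-width counters

    def _positions(self, item: str):
        # Assumption: k indices from double hashing one SHA-256 digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def insert(self, item: str) -> None:
        for pos in self._positions(item):
            self.counters[pos] += 1

    def delete(self, item: str) -> None:
        # Only valid for items that were inserted; deleting a non-member could
        # drive other members' counters to zero and create false negatives.
        for pos in self._positions(item):
            self.counters[pos] -= 1

    def query(self, item: str) -> bool:
        # Positive answer iff all k counters are positive.
        return all(self.counters[pos] > 0 for pos in self._positions(item))

    def counter_values(self, item: str):
        """The k counter values pointed to by the item's hash functions."""
        return [self.counters[pos] for pos in self._positions(item)]
```

Unlike resetting bits in a plain Bloom filter, decrementing after a deletion leaves every remaining member's counters positive, so no false negatives are introduced.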

Page 23: Counting Bloom Filter Query

• Query
  o Check that the k counters are positive.
  o Question: when the CBF answers positively for both y and z, which answer is more likely to be correct?

[Figure: a CBF counter array queried for elements y and z.]

Page 24: The Bloom Paradox in the Counting Bloom Filter

• Theorem 2: Let $c_1, \ldots, c_k$ denote the values of the counters pointed to by the set of k hash functions. Then the a posteriori membership probability $P(x \in S \mid c_1, \ldots, c_k)$ depends on the counters only through their product $\prod_{i=1}^{k} c_i$.

Only the counters product matters!
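The exact formula of Theorem 2 is not reproduced here, but the product structure can be illustrated with a hedged Python sketch under a Poisson approximation, which is an assumption of this sketch rather than the theorem's own derivation: if $x \notin S$, each of its counters is roughly Poisson with mean $\lambda = kn/m$; if $x \in S$, each counter is roughly $1 +$ Poisson($\lambda$). The likelihood ratio is then $\prod_i c_i / \lambda^k$, so only the product enters. `posterior_membership` is a hypothetical name.

```python
import math
from typing import Sequence


def posterior_membership(prior: float, counters: Sequence[int], n: int, m: int) -> float:
    """Approximate P(x in S | counters c_1..c_k) for a CBF with m counters,
    n elements and k = len(counters) hash functions.

    Assumption (Poisson approximation): if x is not in S, each counter is
    ~Poisson(lam) with lam = k*n/m; if x is in S, each counter is
    1 + Poisson(lam). Counters are treated as independent.
    """
    k = len(counters)
    if any(c == 0 for c in counters):
        return 0.0  # a zero counter rules out membership (no false negatives)
    lam = k * n / m
    # prod_i [lam^(c-1)/(c-1)!] / [lam^c/c!] = prod_i c_i / lam^k
    likelihood_ratio = math.prod(counters) / lam ** k
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)
```

Two queries with the same counters product, e.g. (1, 1, 1, 2, 2, 2) and (1, 1, 1, 1, 1, 8), get the same estimate. With the parameters of the next slide (n = 3328, m = 28485, k = 6), a prior of 0.03 and a counters product of 8, this sketch gives about 0.68; the slide reports ≈ 0.69, and the small gap may come from the rounded prior or from the Poisson approximation.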

Page 25: CBF Based Membership Probability

• Parameters: n = 3328, m = 28485, k = 6
  o Before checking the CBF, the a priori membership probability is ≈ 0.03.
  o The CBF indicates a counters product of 8, so the a posteriori membership probability is ≈ 0.69.

Page 26: Experimental Results

• Internet trace (equinix-chicago) with real hash functions.
• Counting Bloom filter parameters: $n = 2^{10}$, $m/n = 30$, $k = 5$, $2^{20}$ queries

Page 27: Concluding Remarks

• Discovery of the Bloom paradox
• Importance of the a priori membership probability
• Using the counters product to estimate the correctness of a positive indication of the CBF

Page 28: Thank You