A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

29
Computer Networks Seminar Spring 2007 A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives Miguel Jimeno Ken Christensen Department of Computer Science and Engineering University of South Florida Tampa, FL 33620 {mjimeno, christen}@cse.usf.edu

description

A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives. Miguel Jimeno Ken Christensen Department of Computer Science and Engineering University of South Florida Tampa, FL 33620 {mjimeno, christen}@cse.usf.edu. Outline. Introduction & Background - PowerPoint PPT Presentation

Transcript of A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

Page 1: A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

Computer Networks SeminarSpring 2007

A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

Miguel JimenoKen Christensen

Department of Computer Science and EngineeringUniversity of South Florida

Tampa, FL 33620{mjimeno, christen}@cse.usf.edu

Page 2: A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

2 Computer Networks SeminarSpring 2007

Introduction & Background Research Problem The SmartNIC The new Design: Best-of-N Method Analysis of Best-of-N Method Numerical Results & Experiments Evaluation Summary & Future Work

Outline

Page 3: A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

3 Computer Networks SeminarSpring 2007

The internet consumes 2% of all the electricity consumed in the US.[1]

An average PC consumes 120 W when fully powered-on.[10]

PCs could add 10% to the typical US residential consumption.

P2P Applications make the PC remain “on the net” all the time, (they are idle 99% of the time)

Introduction

[1]K. Kawamoto, J. Koomey, B. Nordman, R. Brown, M. Piette, M. Ting, and A. Meier, “Electricity Used by Office Equipment and Network Equipment in the U.S.: Detailed Report and Appendices,” Technical Report LBNL-45917, Energy Analysis Department, Lawrence Berkeley National Laboratory, 2001.

Page 4: A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

4 Computer Networks SeminarSpring 2007

Can a P2P application can be run in small, low-power microcontroller?

The PC could then be power managed. The microcontroller can’t store large list of file

names.

Bloom Filters: Bloom filters are a well known probabilistic data

structure for representing a list of file name strings.

Introduction

Page 5: A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

5 Computer Networks SeminarSpring 2007

Figure 1. Bloom filter of size m bits, and k = 4 hash functions.

Image Taken from [9]

False negatives are not possible, but there is a probability of generating false positives.

where m = size of the Bloom filter in bits, k = number of hash functions used to calculate

a Bloom filter, and s = number of bits set.

k

m

s]positivefalsePr[

Bloom Filters: A group of hash functions are used to map

elements into an array of bits.

Introduction

Page 6: A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

6 Computer Networks SeminarSpring 2007

Background

Bloom filters were first proposed by Bloom [2]

Kirsch et. al. proposed a way to calculate bloom filter with less hashing [7]

Lumetta et. al. used the Power of Two Choices to calculate the bloom filter [8]

[2] B. Bloom, “Space/Time Tradeoffs in Hash Coding with Allowable Errors,” Communications of the ACM, Vol. 13, No. 7, pp. 422-426, 1970.

Page 7: A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

7 Computer Networks SeminarSpring 2007

Introduction & Background Research Problem The SmartNIC The new Design: Best-of-N Method Analysis of Best-of-N Method Numerical Results & Experiments Evaluation Summary & Future Work

Outline

Page 8: A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

8 Computer Networks SeminarSpring 2007

We investigated new methods for reducing the probability of false positives for a Bloom filter for fixed m and n.

The target is the implementation of this structure in a power management proxy.

Research Problem

Page 9: A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

9 Computer Networks SeminarSpring 2007

Introduction & Background Research Problem The SmartNIC The new Design: Best-of-N Method Analysis of Best-of-N Method Numerical Results & Experiments Evaluation Summary & Future Work

Outline

Page 10: A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

10 Computer Networks SeminarSpring 2007

NICs support up to MAC layer, but can’t respond to higher-layer packets.

A PC needs to be fully powered-on in order to respond to packets.

Applications like P2P file sharing require the PC to be fully powered-on all the time.

To manage power in PCs running P2P applications:- We are studying the idea of using small controller to

proxy for a sleeping PC.

The SmartNIC

Page 11: A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

11 Computer Networks SeminarSpring 2007

This proxy will be able to maintain P2P TCP connections and respond to query messages.

We are exploring locating the controller on the NIC, so it’s a “SmartNIC”.

NIC (internal to PC)

Traffic flow from/to PC

Traffic flow from/to NIC

PC in sleepPC fully on

Figure 2. The SmartNIC with proxy capability

InternetInternet

(a) (b)

Proxying is enabledNIC (internal to PC)

Traffic flow from/to PC

Traffic flow from/to NIC

PC in sleepPC fully on

Figure 2. The SmartNIC with proxy capability

InternetInternet

(a) (b)

Proxying is enabled

The SmartNIC

Page 12: A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

12 Computer Networks SeminarSpring 2007

Introduction & Background Research Problem The SmartNIC The new Design: Best-of-N Method Analysis of Best-of-N Method Numerical Results & Experiments Evaluation Summary & Future Work

Outline

Page 13: A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

13 Computer Networks SeminarSpring 2007

Best-of-N method: N instances of a Bloom filter are generated and the instance with the least number of bits set to 1 is selected.

The “winner” hash group is used to test the bloom filter.

Bloom Filter

Generate a Bloom filter

instance

Final Instance

Select instance if number of bits set is the

smallest

New instance created

next seeded group of hash functions

Set of Strings

Strings to be inserted

N times

Figure 3: Best-of-N Method for Bloom filter

Bloom Filter

Bloom Filter

Generate a Bloom filter

instance

Generate a Bloom filter

instance

Final Instance

Final Instance

Select instance if number of bits set is the

smallest

Select instance if number of bits set is the

smallest

New instance created

next seeded group of hash functions

Set of Strings

Strings to be inserted

Set of StringsSet of Strings

Strings to be inserted

N times

Figure 3: Best-of-N Method for Bloom filter

The New Design: Best-of-N method

1) What improvement in Pr[false positive] can be achieved?

2) What is the computational cost to generate the filter?

Page 14: A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

14 Computer Networks SeminarSpring 2007

In order to compute N instances quickly, we developed a new pseudo-hashing method called “RNG hashing”.

This method, based on a Random Number Generator, generates multiple hashes from one initial “seed” hash.

The New Design: Best-of-N method

Page 15: A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

15 Computer Networks SeminarSpring 2007

Introduction & Background Research Problem The SmartNIC The new Design: Best-of-N Method Analysis of Best-of-N Method Numerical Results & Experiments Evaluation Summary & Future Work

Outline

Page 16: A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

16 Computer Networks SeminarSpring 2007

We define S to be the random variable for the number of bits set in a Bloom filter.

Using order statistics we can determine the distribution of the minimum value of the independent samples S1, S2, …, SN (selected as Best-of-N).

For order statistics, if f(s) and F(s) are known, then

)S,...,S,Smin(SS N)(min 211

Analysis of Best-of-N Method

Page 17: A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

17 Computer Networks SeminarSpring 2007

For a continuous distribution, )s(f))s(F(N)s(f Nmin

11

The mean can be computed as ds)s(sf]S[E minmin

Based on heuristic and empirical evidence, the distribution of S appears to be close to normal. Now we have that

where μ=E[S] and σ= σ[S]. We know that

kn

mm]S[E

111

2

2

2

1

12

1

22

1

sN

Nmin es

erfcN)s(f

Analysis of Best-of-N Method

Page 18: A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

18 Computer Networks SeminarSpring 2007

We derive knknknkn

m

mm

m

mm

m

mm

m

mm]S[

1221 222

The probability of false positive for our method is then:

k

min

m

]S[E]positivefalsePr[

where E[Smin] is computed by substituting above.

Analysis of Best-of-N Method

Page 19: A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

19 Computer Networks SeminarSpring 2007

Introduction & Background Research Problem The SmartNIC The new Design: Best-of-N Method Analysis of Best-of-N Method Numerical Results & Experiments Evaluation Summary & Future Work

Outline

Page 20: A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

20 Computer Networks SeminarSpring 2007

For a given m and n where k is chosen optimally, we study the probability of false positive as a function of N.

1.00

1.05

1.10

1.15

1.20

1.25

1.30

0 10 20 30 40 50 60 70 80 90 100

Number of instances (N )

Impr

ovem

ent f

acto

r

m/n = 32

m/n = 16

m/n = 8

m/n = 64

Figure 4. Improvement factor for various m /n

Numerical Results

30%

Page 21: A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

21 Computer Networks SeminarSpring 2007

4.0E-04

4.2E-04

4.4E-04

4.6E-04

4.8E-04

5.0E-04

0 10 20 30 40 50 60 70 80 90 100

Number of instances (N )

Pr[f

alse

pos

itive

]

Figure 5. Probability of false positive for m /n = 16

1.4E-07

1.6E-07

1.8E-07

2.0E-07

2.2E-07

2.4E-07

0 10 20 30 40 50 60 70 80 90 100

Number of instances (N )

Pr[f

alse

pos

itive

]

Figure 6. Probability of false positive for m /n = 32

For Figure 5, n = 1000 and m = 16,000. For Figure 6, same n, but m = 32,000

Numerical Results

Page 22: A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

22 Computer Networks SeminarSpring 2007

Introduction & Background Research Problem The SmartNIC The new Design: Best-of-N Method Analysis of Best-of-N Method Numerical Results & Experiments Evaluation Summary & Future Work

Outline

Page 23: A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

23 Computer Networks SeminarSpring 2007

Environment- Dell OptiPlex GX620 PC (Pentium4, 3.4 Ghz, 2 MBytes

cache) with 1 GByte RAM.- WindowsXP, gcc compiler (version 3.4.2 mingw-special

from Dev C++.- A list of 25,000 strings of unique music file names was

obtained using Bearshare 5.2. Response Variables

- Probability of false positive for the Bloom filter.- Execution time to generate a Bloom filter.

Experiments Evaluation

Page 24: A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

24 Computer Networks SeminarSpring 2007

Control variables- Hashing method used.

• CRC32, Md5, RNG Method, Kirsch Method- Bloom filter parameters m, n, and k.- Best-of-N parameter N.- Number of strings used in the string test set.

Experiments Description- False Positive Exp 1: Vary N, measure Prob. of False

Positive.- False Positive Exp 2: Vary N, measure False Pos.- Run-time experiment: Collect CPU time for each N.

Experiments Evaluation

Page 25: A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

25 Computer Networks SeminarSpring 2007

4.0E-04

4.1E-04

4.2E-04

4.3E-04

4.4E-04

4.5E-04

4.6E-04

0 10 20 30 40 50 60 70 80 90 100Number of instances (N )

Pr[

fals

e po

siti

ve]

AnalysisMD5CRC32RNG methodKirsch

Figure 7. Results from false positive experiment #1

0.0E+00

1.0E-01

2.0E-01

3.0E-01

4.0E-01

5.0E-01

6.0E-01

0 10 20 30 40 50 60 70 80 90 100

Number of instances (N )

CP

U T

ime

(sec

)

MD5CRC32RNG methodKirsch

Figure 8. Results from run time experiment

The experimental results for probability of false positive perfectly agree with the analysis.

CPU time results of RNG method were as good as Kirsch method, and better than CRC32.

Experiments Evaluation

Kirsch and RNG

Page 26: A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

26 Computer Networks SeminarSpring 2007

Introduction & Background Research Problem The SmartNIC The new Design: Best-of-N Method Analysis of Best-of-N Method Numerical Results & Experiments Evaluation Summary & Future Work

Outline

Page 27: A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

27 Computer Networks SeminarSpring 2007

Two Improvements to Bloom filters- A new Best-of-N method that reduces the probability of

false positive by generating N instances of a Bloom filter and selecting the best one.

- A new RNG hashing method that generates pseudo hashes given a single seed hash.

Bloom filters could be implemented in a power management proxy for P2P applications.

Savings of up to 85 Mill. could be obtained if 25% of PCs running P2P applications use SmartNICs.

Summary & Future Work

Page 28: A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

28 Computer Networks SeminarSpring 2007

3. A. Broder and M. Mitzenmacher, “Network Applications of Bloom Filters: A Survey,” Internet Mathematics, Vol. 1, No. 4, pp. 485-509, 2005.

4. Energy Information Administration, “U.S Household Electricity Report,” July 2005. Available: http://www.eia.doe.gov/emeu/reps/enduse/er01_us.html.

5. L. Fan, P. Cao, and J. Almeida, “Bloom Filters - The Math,” 2000. Available: http://www.cs.wisc.edu/~cao/ papers/summary-cache/node8.html.

6. A. Kirsch and M. Mitzenmacher, “Less Hashing, Same Performance: Building a Better Bloom Filter,” Technical Report TR-02-5, Computer Science Group, Harvard University, 2005.

7. S. Lumetta and M. Mitzenmacher, “Using the Power of Two Choices to Improve Bloom Filters,” unpublished, 2006. Available: http://www.eecs.harvard.edu/~michaelm/ postscripts/bftwo.ps.

8. A. Pagh, R. Pagh, and S. Rao, “An Optimal Bloom Filter Replacement,” Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 823-829, 2005.

9. http://www.cs.wisc.edu/~cao/papers/summary-cache/node8.html10. US Department of Energy, Energy Efficiency and Renewable Energy, “Estimating

Appliance and Home Electronic Energy Use,” 2005. Available: http://www.eere.energy.gov/consumer/your_home/appliances/index.cfm/mytopic=10040.

References

Page 29: A Power Management Proxy with a New Best-of-N Bloom Filter Design to Reduce False Positives

29 Computer Networks SeminarSpring 2007

Thanks!

I’ll be happy to answer any questions.