Bounds on the Independence Required for Cuckoo Hashingcseweb.ucsd.edu/~dakane/CuckooTalk.pdfBounds...

Bounds on the Independence Required for Cuckoo Hashing Daniel Kane Department of Mathematics Stanford University [email protected] April 18th, 2013 Joint work with Jeffrey Cohen. D. Kane (Stanford) Cuckoo Independence April 2013 1 / 30




Dictionary Data Structures

Problem: Given a universe U = {1, 2, . . . , N} and a subset S ⊆ U with |S| = n, build a data structure that, given x ∈ U, can determine whether x ∈ S.

A dynamic dictionary data structure also supports adding or removing elements from S.


Hashing

Basic solution:

Create an array A of size m > n.

Find a hash function (i.e. a deterministically computable function that behaves like a random function) h : U → [m].

For x ∈ S store x in A[h(x)].

Query for x by checking if x = A[h(x)].
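The basic solution above can be sketched in Python roughly as follows. This is only an illustration, not code from the talk: Python's built-in `hash` reduced modulo m is an assumed stand-in for a function that "behaves like a random function", the names `build_table` and `query` are hypothetical, and collisions are merely reported rather than resolved.

```python
def build_table(S, m):
    """Store each x in S at A[h(x)]; report collisions instead of resolving them."""
    A = [None] * m                 # array A of size m > n, None marks an empty slot
    h = lambda x: hash(x) % m      # stand-in hash function h : U -> [m] (assumption)
    collisions = []
    for x in S:
        slot = h(x)
        if A[slot] is None:
            A[slot] = x
        elif A[slot] != x:
            collisions.append((A[slot], x))   # two distinct keys with the same h value
    return A, h, collisions


def query(A, h, x):
    """x is reported present exactly when it sits at A[h(x)]."""
    return A[h(x)] == x
```

Keys that lose a collision are simply not stored in this sketch, which is exactly the gap the following slides address.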


Collisions

Problem: What if h(x) = h(y)? Cannot store two values at A[h(x)]. This is called a collision between x and y under h.

Solution:

Randomness of h ensures not too many collisions.

When you do have collisions, do one of:

Bucket for extra values.

Secondary hashing.

Have multiple possible locations.


Cuckoo Hashing

Cuckoo Hashing is a dynamic dictionary data structure developed by Pagh and Rodler in 2001.

Idea:

Two hash functions h1, h2.

Two arrays A1,A2 of length m = 4n.

Store x in either A1[h1(x)] or A2[h2(x)].


Queries and Removal

To determine if x ∈ S :

See if x is either A1[h1(x)] or A2[h2(x)].

To remove x from S :

If x = A1[h1(x)], set A1[h1(x)] to 0.

If x = A2[h2(x)], set A2[h2(x)] to 0.


Insertions

To add x to S :

Let x = x1

If A1[h1(x1)] is 0, set it to x1.


Insertions

If A1[h1(x1)] ≠ 0, let x2 = A1[h1(x1)] and set A1[h1(x1)] to x1.

If A2[h2(x2)] is 0, set it to x2.

Otherwise, let x3 = A2[h2(x2)], set A2[h2(x2)] to x2.


Insertions

Generally, trying to place xi (with the array index i read modulo 2, so placements alternate between A1 and A2).

If Ai[hi(xi)] = 0, set it to xi.

Otherwise, let xi+1 = Ai[hi(xi)], set Ai[hi(xi)] to xi, and try to place xi+1.


Insertions

Sometimes problems will crop up if you hit a cycle.

If a chain takes more than log(n) replacements, pick new hash functions and try again (known as rehashing).
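The insertion procedure, together with the query and removal rules from the earlier slides, can be sketched in Python along these lines. This is a hedged illustration rather than the speaker's implementation: the class name `CuckooTable` is hypothetical, empty cells are marked with `None` instead of 0, the cutoff is taken as roughly log2(n), and the hash functions are salted uses of Python's built-in `hash`, which is an assumption and not a k-independent family.

```python
import math
import random


class CuckooTable:
    """Two arrays A1, A2 of length m = 4n; x lives at A1[h1(x)] or A2[h2(x)]."""

    def __init__(self, n, seed=0):
        self.m = 4 * max(n, 1)
        self.max_steps = max(1, math.ceil(math.log2(max(n, 2))))  # about log(n) replacements
        self.rng = random.Random(seed)
        self.rehashes = 0
        self._new_hashes()
        self.A = [[None] * self.m, [None] * self.m]

    def _new_hashes(self):
        # Stand-in hash functions (assumption): fresh random salts for the built-in hash.
        self.salts = (self.rng.getrandbits(64), self.rng.getrandbits(64))

    def _h(self, i, x):
        return hash((self.salts[i], x)) % self.m

    def contains(self, x):
        return self.A[0][self._h(0, x)] == x or self.A[1][self._h(1, x)] == x

    def remove(self, x):
        for i in (0, 1):
            j = self._h(i, x)
            if self.A[i][j] == x:
                self.A[i][j] = None

    def insert(self, x):
        if self.contains(x):
            return
        i = 0
        for _ in range(self.max_steps + 1):
            j = self._h(i, x)
            x, self.A[i][j] = self.A[i][j], x   # place x, pick up the previous occupant
            if x is None:
                return                          # the cell was empty, chain ends
            i = 1 - i                           # evicted element goes to the other array
        self._rehash(x)                         # chain too long: pick new hash functions

    def _rehash(self, pending):
        items = [y for row in self.A for y in row if y is not None] + [pending]
        self.rehashes += 1
        self._new_hashes()
        self.A = [[None] * self.m, [None] * self.m]
        for y in items:
            self.insert(y)
```

Queries and removals touch at most two cells, while an insertion may walk a chain of evictions and, rarely, rebuild the whole table.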


Analysis

To have a chain of k − 1 replacements need:

x1, . . . , xk

h1(x1) = h1(x2), h2(x2) = h2(x3), . . .

Expected number of such chains is $n^k / m^{k-1} < n \, 4^{1-k}$.

$E\left[\sum_k \#\{\text{chains of length } k\}\right] = O(n)$.

Probably none of length more than log(n).

Also have trouble if there's a loop:

x1, . . . , xk

h1(x1) = h1(x2), h2(x2) = h2(x3), . . . , hi(xk) = hi(x1)

Expected number of such chains is $n^k / m^k < 4^{-k}$.

Expected number of loops is less than 1/3.
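The two bounds on this slide are geometric series, and their sums can be checked with exact arithmetic. The sketch below assumes m = 4n as on the Cuckoo Hashing slide; the function names and the truncation point K are illustrative choices, not part of the talk.

```python
from fractions import Fraction


def chain_bound(n, k):
    """Bound n^k / m^(k-1) on the expected number of chains on x1..xk, with m = 4n."""
    m = 4 * n
    return Fraction(n**k, m**(k - 1))


def loop_bound(n, k):
    """Bound n^k / m^k on the expected number of loops on x1..xk, with m = 4n."""
    m = 4 * n
    return Fraction(n**k, m**k)


n = 1000
K = 30  # truncate the geometric sums at k = K
total_chains = sum(chain_bound(n, k) for k in range(1, K + 1))
total_loops = sum(loop_bound(n, k) for k in range(1, K + 1))
```

With m = 4n the chain bound is n·4^(1−k), so the sum over k stays below 4n/3, which is O(n); the loop bound is 4^(−k), so the sum over k ≥ 1 stays below 1/3, matching the last line of the slide.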


Analysis

Expected cost of n insertions is O(n).

Rehash expected O(1) times on n inputs. Cost O(n) per rehash.

Tighter analysis shows probability of rehash is O(1/n).

Theorem

Cuckoo Hashing uses O(n log(N)) space, performs queries and removals in worst case O(1) time, and insertions in expected, amortized O(1) time.


k-Independence

One problem with this construction is that you cannot use truly random hash functions.

One common substitute is the following:

Definition

A k-independent family of hash functions is a probability distribution over functions h : U → [m] so that for every distinct x1, . . . , xk ∈ U and every y1, . . . , yk ∈ [m],

$\Pr_h(h(x_i) = y_i,\ 1 \le i \le k) = \frac{1}{m^k}.$

Simple constructions exist to sample from k-independent families.

Analysis of Cuckoo Hashing only requires that h1, h2 be chosen from log(n)-independent families.
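One well-known simple construction, shown here as a sketch and not necessarily the one the speaker has in mind, is a random polynomial of degree less than k over a prime field. It assumes U = [m] = {0, ..., p − 1} for a prime p; reducing the output further modulo a smaller m would only give approximate uniformity. The function names are hypothetical.

```python
import random


def make_poly_hash(coeffs, p):
    """Return h(x) = a_0 + a_1 x + ... + a_{k-1} x^(k-1) mod p for coeffs = [a_0, ..., a_{k-1}]."""
    def h(x):
        acc = 0
        for a in reversed(coeffs):   # Horner's rule: one multiplication per coefficient
            acc = (acc * x + a) % p
        return acc
    return h


def sample_poly_hash(k, p, rng=random):
    """Sample from the family by choosing the k coefficients uniformly from {0, ..., p-1}."""
    coeffs = [rng.randrange(p) for _ in range(k)]
    return make_poly_hash(coeffs, p)
```

For distinct inputs x1, ..., xk and any target values y1, ..., yk, exactly one polynomial of degree less than k interpolates them, so each outcome has probability 1/p^k. Evaluation takes k multiplications, which is why large k is costly.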


Required Independence

Unfortunately, using k-independent families for large k is impractical (common constructions require O(k) time per evaluation).

How much independence is actually required in order for Cuckoo Hashing to satisfy its standard guarantees?

k = log(n) is enough.

Is k = O(1) enough?

What about k = 2?


5-Independence is not Enough

Theorem (Cohen, K.)

For S ⊂ U with |S| = n, and $m \ll n^{26/25}$, there exists a 5-independent family of hash functions h : U → [m], so that for h1 and h2 picked independently from this family, Cuckoo Hashing is forced to rehash on S with high probability.


Collision Graphs

Assume S = U.

After picking h1, h2 define the collision graph, G .

V (G ) = S .

(x , y) ∈ E (G ) if and only if h1(x) = h1(y) or h2(x) = h2(y).

Forced to rehash if G has a pair of connected cycles.
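A small Python sketch may help make the collision graph concrete. The first function builds G exactly as defined above. The second function is a swapped-in technique, not the slide's cycle criterion: it uses the standard cell-graph reformulation (cells as vertices, one edge per element), under which placement fails exactly when some connected component has more edges than vertices. All function names are hypothetical, and S is assumed to be a list or other re-iterable collection.

```python
from itertools import combinations


def collision_graph(S, h1, h2):
    """Edges {x, y} of G: distinct x, y in S with h1(x) == h1(y) or h2(x) == h2(y)."""
    return {frozenset((x, y)) for x, y in combinations(S, 2)
            if h1(x) == h1(y) or h2(x) == h2(y)}


def must_rehash(S, h1, h2):
    """True if S cannot be stored with each x at A1[h1(x)] or A2[h2(x)]."""
    parent = {}

    def find(c):
        parent.setdefault(c, c)
        while parent[c] != c:
            parent[c] = parent[parent[c]]   # path halving
            c = parent[c]
        return c

    for x in S:                              # union the two candidate cells of x
        a, b = find((1, h1(x))), find((2, h2(x)))
        if a != b:
            parent[a] = b
    cells, edges = {}, {}
    for c in list(parent):                   # count cells per component
        r = find(c)
        cells[r] = cells.get(r, 0) + 1
    for x in S:                              # count elements per component
        r = find((1, h1(x)))
        edges[r] = edges.get(r, 0) + 1
    return any(edges[r] > cells[r] for r in edges)
```

For example, nine elements whose h1 values take three distinct values and whose h2 values take three distinct values occupy only six candidate cells, so `must_rehash` reports failure.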


Value-Oblivious Families

We only care about the collisions caused by a function, not the values.

Definition

We say that a family of hash functions h : U → [m] is value-oblivious if for every π ∈ S_m, the family selects h with the same probability that it selects π ∘ h.

Nice criterion for k-independence:

Proposition

Given a value-oblivious family of hash functions h : U → [m], the family is k-independent if and only if for all x1, . . . , xk ∈ U and any equivalence relation ∼ on the xi with c equivalence classes,

$\Pr_h(h(x_i) = h(x_j) \text{ for all } x_i \sim x_j) = \frac{1}{m^{k-c}}.$
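One direction of the proposition follows directly from the definition of k-independence. The sketch below assumes x_1, ..., x_k are distinct and writes Y_∼ (a name introduced here for illustration) for the set of tuples in [m]^k that are constant on each equivalence class; there are m^c such tuples, one free value per class.

```latex
\Pr_h\bigl(h(x_i) = h(x_j) \text{ for all } x_i \sim x_j\bigr)
  = \sum_{(y_1,\dots,y_k) \in Y_\sim} \Pr_h\bigl(h(x_i) = y_i,\ 1 \le i \le k\bigr)
  = \frac{m^{c}}{m^{k}}
  = \frac{1}{m^{k-c}}.
```

The converse direction, where value-obliviousness is needed, is not derived here.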


2-Independent Construction

Let S = V = $\mathbb{F}_3^\ell$.

Pick h by:

Pick v ∈ V.

For each line L = w + tv parallel to v, have h collide the elements of L with probability p.

No other collisions.

For each class of colliding elements, assign a random distinct value of [m].

Check 2-independence:

Probability that h(x) = h(y).

Probability 2/n that x − y is parallel to v .

Probability p that they collide if so.

Need 2p/n = 1/m, or p = n/(2m).
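The sampling procedure on this slide can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the name `sample_line_hash` is hypothetical, v is drawn uniformly from the nonzero vectors (so the parallel probability is 2/(n − 1), approximately the slide's 2/n), and m ≥ n is required so that every class can receive a distinct value.

```python
import itertools
import random


def sample_line_hash(ell, m, rng=random):
    """Sample h on V = F_3^ell from the line-collision family with p = n / (2m)."""
    n = 3 ** ell
    assert m >= n, "need at least n values so that all classes get distinct values"
    p = n / (2 * m)
    V = list(itertools.product(range(3), repeat=ell))
    v = rng.choice([u for u in V if any(u)])          # random nonzero direction (assumption)
    add = lambda a, b: tuple((s + t) % 3 for s, t in zip(a, b))

    cls = {}                                          # element -> representative of its class
    for w in V:
        if w in cls:
            continue
        line = [w, add(w, v), add(add(w, v), v)]      # the line {w + t v : t in F_3}
        merged = rng.random() < p                     # collide this whole line with probability p
        for u in line:
            cls[u] = w if merged else u               # otherwise each element is its own class
    reps = sorted(set(cls.values()))
    values = dict(zip(reps, rng.sample(range(m), len(reps))))   # distinct value per class
    return (lambda x: values[cls[x]]), v
```

Any two elements that collide under the returned h differ by ±v, which is the property used in the 2-independence check.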


Failure Rate

Pick v1, v2.

n/9 planes parallel to v1, v2.

If all 6 lines have collisions (probability $p^6$), must rehash.

Likely to happen if $m \ll n^{7/6}$.
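The threshold can be recovered with a short expectation calculation, sketched here under the assumptions that v1 and v2 are linearly independent and that p = n/(2m) as on the construction slide. Each plane parallel to v1, v2 has 9 elements, 3 lines parallel to v1 and 3 lines parallel to v2. If all 6 lines collide, h1 sends the 9 elements to only 3 cells of A1 and h2 sends them to only 3 cells of A2, so 9 elements compete for 6 cells. By linearity of expectation, the expected number of such planes is

```latex
\frac{n}{9} \cdot p^{6}
  = \frac{n}{9} \left( \frac{n}{2m} \right)^{6}
  = \frac{n^{7}}{576\, m^{6}},
```

which grows without bound when m is much smaller than n^{7/6}. This is only a heuristic for "likely to happen"; the expectation alone does not prove a high-probability statement.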


What Happened?

hi collides v with v ± vi

Addition by v1 commutes with addition by v2

Get cycles: v collides with v + v1, which collides with v + v1 + v2, which collides with v + v2, which collides with v.

Similar construction, colliding planes in characteristic 2, gives a 3-independent family.

More is needed for a 4-independent family, as 4-independence implies not too many 4-cycles.


5-Independent Family

In order to produce a 5-Independent family on which Cuckoo Hashing fails, we begin by making some reductions:

Find a family that merely produces many cycles with high probability, rather than a double cycle
- Split S into blocks
- Use family on each block
- Get many cycles
- Expect there to be a random collision between cycles from different blocks

Choose $h_1$ and $h_2$ from different families
- Split S into blocks
- On each block randomly select h from one of the two families
- On a quarter of the blocks $h_1$ is selected from the first family and $h_2$ from the second


Simplifying Assumptions

We will construct a value-oblivious family satisfying a slightly weaker version of 5-independence. Namely, for any $x_1, \ldots, x_5$ and any equivalence relation $\sim$ on them with $c$ equivalence classes, we ensure that

$$\Pr_h\bigl(h(x_i) = h(x_j) \text{ for all } x_i \sim x_j\bigr) \le \frac{1}{m^{5-c}}$$

(rather than ensuring equality; this is roughly equivalent to the weaker condition known as 5-Universality).

Essentially, we allow our hash function to have too few collisions of various types. This can be corrected for the family we produce via a somewhat messy modification.
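For comparison, a fully random function on five points meets this bound with equality: the hash values must be constant on each of the c classes, which leaves m^c of the m^5 possible assignments. The brute-force sketch below checks this for small m. The function names are hypothetical and the enumeration is only practical for tiny parameters.

```python
from fractions import Fraction
from itertools import product

def respects(values, classes):
    """True if the hash values are constant on every equivalence class."""
    return all(len({values[i] for i in cls}) == 1 for cls in classes)

def uniform_probability(m, classes, k=5):
    """Exact probability, over all m**k functions on k points, that every
    equivalence class is mapped to a single value."""
    total = 0
    good = 0
    for values in product(range(m), repeat=k):
        total += 1
        good += respects(values, classes)
    return Fraction(good, total)
```

For example, with m = 3 and the two classes {x1, x2} and {x3, x4, x5} (c = 2), the result is 1/27 = 1/m^3.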


Basic Construction

Split S into $2\ell$ columns $C_i$, each isomorphic to $[q]$

Have $h_i$ collide $x$ in $C_{i+1}$ only with $\pi_i(x)$ in $C_i$ for some $\pi_i \in S_q$

Ensure that $\pi_1 \pi_2 \cdots \pi_{2\ell} = 1$

[Figure: six columns, labeled Column 1 through Column 6, with labels $k_1, \ldots, k_6$.]
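The role of the product condition can be seen in a small sketch. It assumes the column indices wrap around (so the column after the last is the first) and picks all but one permutation uniformly, forcing the last one so that the product is the identity. This illustrates only the cycle-closing condition, not the independence conditions required of the actual family, and the function names are hypothetical.

```python
import random

def closed_permutation_chain(q, count, rng):
    """Return permutations pi_1..pi_count of range(q), as lists, with
    pi_1 o pi_2 o ... o pi_count equal to the identity."""
    perms = []
    for _ in range(count - 1):
        p = list(range(q))
        rng.shuffle(p)
        perms.append(p)
    # f = pi_1 o ... o pi_{count-1}, built by applying pi_{count-1} first.
    f = list(range(q))
    for p in reversed(perms):
        f = [p[y] for y in f]
    # The last permutation is the inverse of f, so the full product is 1.
    last = [0] * q
    for x, y in enumerate(f):
        last[y] = x
    perms.append(last)
    return perms

def walk_around(perms, x):
    """Follow x from the first column through pi_count, ..., pi_1 in turn."""
    for p in reversed(perms):
        x = p[x]
    return x
```

Every starting element returns to itself after one trip around the columns, which is the potential collision cycle.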


Conditions on the π’s

1. $\pi_1 \pi_2 \cdots \pi_{2\ell} = 1$

2. 5-Universality
- No pair of pairs collide with too high a probability
- For $i \ne j$ or $x \ne y$ we need $\pi_i(x)$ nearly independent of $\pi_j(y)$.
- Idea: pick $\pi_i$ from a 2-transitive family

3. $\pi_1, \pi_3, \ldots$ chosen independently of $\pi_2, \pi_4, \ldots$

Note: without the third condition, $h_1, h_2$ are not independent of each other and merely satisfy some joint $k$-independence. That case is easy: pick the $\pi_i$ uniformly in $S_q$ subject to $\prod \pi_i = 1$.


Basic Construction

So we need a relationship among elements of a 2-transitive permutation group in which half of the terms can be chosen independently of the other half.

Idea: use the group of affine linear transformations on $\mathbb{F}_q$. Have the relation:

$$[[a, b], [c, d]] = 1.$$

Have $h_1$ pick $a$ and $c$ and $h_2$ pick $b$ and $d$. Get relation

$$(c^{-1}a)\, b\, a^{-1}\, b^{-1}\, c\, d\, c^{-1}\, (d^{-1}b)\, a\, b^{-1}\, a^{-1}\, d\, c\, d^{-1} = 1.$$

Unfortunately, the fact that some of these terms are the same means that $\pi_i(x), \pi_j(y)$ are not independent (for example in cases where $\pi_i = \pi_j$). There are a few ways to fix this.
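The 14-term relation is a conjugate of the double commutator, and it holds in the affine group because every commutator of affine maps is a translation and translations commute. A quick numerical check is sketched below, representing x ↦ sx + t over F_q as the pair (s, t). The prime Q = 101 and all function names are assumptions made for the example.

```python
import random
from functools import reduce

Q = 101  # an assumed prime, so that the integers mod Q form a field

def compose(f, g):
    """(f o g)(x) = f(g(x)) for affine maps stored as (slope, shift)."""
    return ((f[0] * g[0]) % Q, (f[0] * g[1] + f[1]) % Q)

def inverse(f):
    """Inverse of x -> s*x + t, namely x -> s^{-1} * (x - t)."""
    s_inv = pow(f[0], -1, Q)  # modular inverse, Python 3.8+
    return (s_inv, (-s_inv * f[1]) % Q)

def product_of(maps):
    """Compose a list of affine maps in the order written."""
    return reduce(compose, maps, (1, 0))

def random_affine(rng):
    return (rng.randrange(1, Q), rng.randrange(Q))

def relation_word(a, b, c, d):
    """The 14 terms, alternating between h1's maps (a, c) and h2's (b, d)."""
    ai, bi, ci, di = (inverse(f) for f in (a, b, c, d))
    return [compose(ci, a), b, ai, bi, c, d, ci,
            compose(di, b), a, bi, ai, d, c, di]
```

The terms in odd positions depend only on a and c, and those in even positions only on b and d, which is why the relation can be split between the two hash functions.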


Fix

Still using the affine group:

$$(ax + \alpha) \circ (bx + \beta) \circ (a^{-1}x + \gamma) \circ (b^{-1}x + \delta) = x + \alpha + a\beta + ab\gamma + b\delta.$$

Compose a few copies of this to get the identity.
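The four-term composition can be expanded from the innermost map outward. This is a routine calculation, written out here for convenience:

```latex
\begin{align*}
(a^{-1}x + \gamma) \circ (b^{-1}x + \delta)
  &= a^{-1}b^{-1}x + a^{-1}\delta + \gamma, \\
(bx + \beta) \circ (a^{-1}b^{-1}x + a^{-1}\delta + \gamma)
  &= a^{-1}x + a^{-1}b\,\delta + b\gamma + \beta, \\
(ax + \alpha) \circ (a^{-1}x + a^{-1}b\,\delta + b\gamma + \beta)
  &= x + b\delta + ab\gamma + a\beta + \alpha.
\end{align*}
```

The result is a pure translation, so copies of it compose by adding their shifts.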


Full Construction

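Have $h_1$ pick $a \in \mathbb{F}_q^*$, $\alpha_1 + \alpha_2 + \alpha_3 = 0$, $\gamma_1 + \gamma_2 + \gamma_3 = 0$.

Have $h_2$ pick $b \in \mathbb{F}_q^*$, $\beta_1 + \beta_2 + \beta_3 = 0$, $\delta_1 + \delta_2 + \delta_3 = 0$.

$$\begin{aligned}
&(ax + \alpha_1) \circ (bx + \beta_1) \circ (a^{-1}x + \gamma_1) \circ (b^{-1}x + \delta_1) \\
{}\circ{} &(ax + \alpha_2) \circ (bx + \beta_2) \circ (a^{-1}x + \gamma_2) \circ (b^{-1}x + \delta_2) \\
{}\circ{} &(ax + \alpha_3) \circ (bx + \beta_3) \circ (a^{-1}x + \gamma_3) \circ (b^{-1}x + \delta_3) = 1.
\end{aligned}$$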

And this construction has all the properties we need.
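The 12-term identity can be checked numerically. Each block of four maps is a translation by α_k + aβ_k + abγ_k + bδ_k, and the zero-sum constraints make the three translations cancel. The sketch below assumes q is prime and stores x ↦ sx + t as (s, t). The function names are hypothetical, and the random choices only mimic the shape of the construction, not the exact distribution used in the paper.

```python
import random
from functools import reduce

def compose(f, g, q):
    """(f o g)(x) = f(g(x)) for affine maps (slope, shift) over F_q."""
    return ((f[0] * g[0]) % q, (f[0] * g[1] + f[1]) % q)

def zero_sum_triple(rng, q):
    """Three field elements summing to zero mod q."""
    x, y = rng.randrange(q), rng.randrange(q)
    return (x, y, (-x - y) % q)

def full_construction(q, rng):
    """Return the 12 affine maps in the order they are composed."""
    a, b = rng.randrange(1, q), rng.randrange(1, q)
    a_inv, b_inv = pow(a, -1, q), pow(b, -1, q)
    alphas, gammas = zero_sum_triple(rng, q), zero_sum_triple(rng, q)  # h1
    betas, deltas = zero_sum_triple(rng, q), zero_sum_triple(rng, q)   # h2
    maps = []
    for k in range(3):
        maps += [(a, alphas[k]), (b, betas[k]),
                 (a_inv, gammas[k]), (b_inv, deltas[k])]
    return maps

def product_of(maps, q):
    return reduce(lambda f, g: compose(f, g, q), maps, (1, 0))
```

Maps in odd positions use only h1's choices and maps in even positions use only h2's, so the two halves can be drawn independently.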


Finalizing the Construction

Split S into 12 columns of size $q$.

Pick $a, \alpha_i, \gamma_i$ (if choosing $h_1$) or $b, \beta_i, \delta_i$ (if choosing $h_2$). Produce $\pi_i$.

For element $x$ in $C_{i+1}$ and $\pi_i(x)$ in $C_i$, have them collide with probability $p$.
- Given two pairs of elements, need the probability that both collide to be at most $1/m^2$.
- Probability is $p^2 \Pr(\pi_i(x), \pi_j(y) \text{ have correct values}) \approx p^2/q^2$.
- Choose $p \approx q/m$.

Each element of the first column causes a possible collision cycle, which materializes with probability $p^{12}$. Expect about $x = q^{13}/m^{12}$ cycles.

Split S into $S_1, S_2$, run this construction independently on each half. Expect the number of overlapping cycles to be about $x^2/m \approx q^{26}/m^{25} \approx n^{26}/m^{25}$.

With some fiddling, we produce a 5-Independent family on which Cuckoo Hashing fails with high probability.
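The exponents in these estimates follow from a short calculation, sketched here under the stated choice p ≈ q/m and ignoring constant factors (q and n differ only by a constant, since S consists of a fixed number of columns of size q):

```latex
\begin{align*}
x &\approx q \cdot p^{12} = q \left(\frac{q}{m}\right)^{12} = \frac{q^{13}}{m^{12}}, \\
\frac{x^{2}}{m} &\approx \frac{q^{26}}{m^{24}} \cdot \frac{1}{m} = \frac{q^{26}}{m^{25}} \approx \frac{n^{26}}{m^{25}}.
\end{align*}
```

Up to constants, the last quantity exceeds 1 roughly when m is below n^{26/25}.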


Required k

We know that to guarantee that Cuckoo Hashing works we need at least 6-Independence, and at most O(log(n))-independence. Closing this gap further seems difficult because:

Similar lower bound constructions are difficult due to the lack of solvable 3-transitive permutation groups.

Better upper bounds are hard due to the difficulty of distinguishing:
- Joint Independence: $(h_1, h_2)$ chosen from a k-independent family of hash functions from $U \to [m] \times [m]$ (easy lower bound of $k = \Omega(\log(n))$).
- Sequential Independence: $h_1$ chosen from a k-independent family, then $h_2$ chosen from a k-independent family that may depend on $h_1$.
- Full independence: $h_1, h_2$ chosen independently from k-independent families.


Related Work

Dietzfelbinger and Schellbach showed that, for bad input sets, certain commonly used 2-Independent families will cause Cuckoo Hashing to fail.

Cohen and I also show that (modifying the algorithm slightly) it suffices to pick one function 2-Independent and the other log(n)-Independent.

Pătraşcu and Thorup show that Cuckoo Hashing can be made to work with efficient hash functions (store in $O(n^c)$ space, compute in $O(\log_n(N))$ time).


Jeffrey S. Cohen and Daniel M. Kane, "Bounds on the Independence Required for Cuckoo Hashing", in revision.

Martin Dietzfelbinger and Ulf Schellbach, "On risks of using cuckoo hashing with simple universal hash classes", ACM-SIAM Symposium on Discrete Algorithms (SODA '09), SIAM, 2009.

Mihai Pătraşcu and Mikkel Thorup, "The power of simple tabulation hashing", Proceedings of the 43rd ACM Symposium on Theory of Computing (STOC), pp. 1-10, 2011.
