Page 1: Tight Lower Bounds for the Distinct Elements Problem David Woodruff MIT dpwood@mit.edu Joint work with Piotr Indyk.

Tight Lower Bounds for the Distinct Elements Problem

David Woodruff

MIT

dpwood@mit.edu

Joint work with Piotr Indyk

Page 2

The Problem

• Stream of elements a1, …, an each in {1, …, m}

• Want F0 = # of distinct elements

• Elements in adversarial order

• Algorithms given one pass over stream

• Goal: Minimum-space algorithm

Example stream: 0 1 1 3 7 3 4 …

Page 3

A Trivial Algorithm

Stream: 0 1 1 3 7 3 4 …

• Keep m-bit characteristic vector v of stream

• j in stream ⇔ v_j = 1

• F0 = wt(10011011) = 5

• Space = m bits

Vector v: 00000000 → 10011011

Can we do better?
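The trivial algorithm can be sketched in a few lines of Python. This is only an illustration: the function name is made up, a Python integer stands in for the m-bit vector, and the universe is assumed to be {0, …, m−1} so that the slide's example stream (which contains 0) fits.

```python
def distinct_elements_bitvector(stream, m):
    """Exact F0 using an m-bit characteristic vector (a Python int used as a bitset)."""
    v = 0  # all m bits start at 0
    for a in stream:
        if not 0 <= a < m:
            raise ValueError(f"element {a} outside assumed universe [0, {m})")
        v |= 1 << a  # set bit a: element a has been seen
    # Number of set bits is wt(v) = F0; also return v as an m-character bit string.
    return bin(v).count("1"), format(v, f"0{m}b")


f0, bits = distinct_elements_bitvector([0, 1, 1, 3, 7, 3, 4], m=8)
print(f0, bits)
```

With bit 7 written leftmost, the resulting vector is the 10011011 shown on the slide.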

Page 4

Negative Results

• Any algorithm computing F0 exactly must use Ω(m) space [AMS96]

• Any deterministic alg. that outputs x with |F0 − x| < εF0 must use Ω(m) space [AMS96]

• What about randomized approximation algorithms?

Page 5

Rand. Approx. Algorithms for F0

• O(log log m / ε² + log m · log 1/ε) alg. outputs x with Pr[|F0 − x| < εF0] > ¾ [BJKST02]

• Lots of hashing tricks

Is this optimal?

• Previous lower bounds

• Ω(log m) [AMS96]

• Ω(1/ε) [Bar-Yossef]

• Open Problem of [BJKST02]: GAP: 1/ε << 1/ε²

Page 6

Idea Behind Lower Bounds

Alice: x ∈ {0,1}^m, stream s(x)

Bob: y ∈ {0,1}^m, stream s(y)

[Diagram: Alice runs the (1 ± ε) F0 algorithm A on s(x) and sends the internal state S of A to Bob, who continues running A on s(y).]

• Compute (1 ± ε) F0(s(x) ∘ s(y)) w.p. > ¾

• Idea: If can decide f(x,y) w.p. > ¾, space used by A at least f's rand. 1-way comm. complexity
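The way a streaming algorithm becomes a one-message protocol can be sketched as below. The class `ExactF0` is a hypothetical stand-in (an exact set-based counter, not the approximation algorithm discussed in the talk), and the JSON serialization is just one assumed way to turn the internal state into a message; the point is that the message length equals the space the algorithm uses.

```python
import json


class ExactF0:
    """Stand-in one-pass algorithm (hypothetical): exact F0 via a set."""

    def __init__(self, seen=()):
        self.seen = set(seen)

    def process(self, item):
        self.seen.add(item)

    def estimate(self):
        return len(self.seen)

    def dump_state(self):
        # Serialize the internal state S; its length is the communication cost.
        return json.dumps(sorted(self.seen)).encode()

    @classmethod
    def load_state(cls, message):
        return cls(json.loads(message.decode()))


def alice(stream_x):
    alg = ExactF0()
    for a in stream_x:
        alg.process(a)
    return alg.dump_state()  # the single message sent to Bob


def bob(message, stream_y):
    alg = ExactF0.load_state(message)  # resume from Alice's state
    for a in stream_y:
        alg.process(a)
    return alg.estimate()  # F0 of the concatenated stream


message = alice([0, 1, 1, 3])
print(bob(message, [7, 3, 4]), len(message))
```

Any lower bound on the message length needed to decide f therefore carries over to the space of the streaming algorithm.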

Page 7

Randomized 1-way comm. complexity

• Boolean function f: X × Y → {0,1}

• Alice has x ∈ X, Bob y ∈ Y. Bob wants f(x,y)

• Only 1 message sent: must be from Alice to Bob

• Comm. cost of protocol = expected length of longest message sent over all inputs.

• δ-error randomized 1-way comm. complexity of f, R_δ(f), is comm. cost of optimal protocol computing f w.p. ≥ 1 − δ

• How do we lower bound R_δ(f)?

Page 8

The VC Dimension [KNR]

• F = {f : X → {0,1}} family of Boolean functions

• f ∈ F is length-|X| "bit string"

• For S ⊆ X, shatter coefficient SC(f_S) of S is |{f|_S}_{f ∈ F}| = # distinct bit strings when F restricted to S

• SC(F, p) = max_{S ⊆ X, |S| = p} SC(f_S)

• If SC(f_S) = 2^{|S|}, S shattered by F

• VC Dimension of F, VCD(F), = size of largest S shattered by F
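For small families these definitions can be checked by brute force. The sketch below is illustrative only: functions are represented as bit tuples indexed by X = {0, …, n−1}, the helper names are invented, and the threshold family is an assumed toy example, not one from the talk.

```python
from itertools import combinations


def shatter_coefficient(family, S):
    """Number of distinct bit strings obtained by restricting each f in family to S."""
    return len({tuple(f[i] for i in S) for f in family})


def sc(family, n, p):
    """SC(F, p): maximum shatter coefficient over all subsets S of size p."""
    return max(shatter_coefficient(family, S) for S in combinations(range(n), p))


def vc_dimension(family, n):
    """Size of the largest S with all 2^|S| patterns realized (0 if none)."""
    d = 0
    for p in range(1, n + 1):
        if sc(family, n, p) == 2 ** p:
            d = p
    return d


# Toy family: threshold functions f_k(i) = 1 iff i >= k, on X = {0, 1, 2, 3}.
n = 4
thresholds = [tuple(1 if i >= k else 0 for i in range(n)) for k in range(n + 1)]
print(sc(thresholds, n, 2), vc_dimension(thresholds, n))
```

Thresholds never produce the pattern (1, 0) on a pair i < j, so no set of size 2 is shattered.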

Page 9

Shatter Coefficient Theorem

• Notation: For f: X × Y → {0,1}, define: f_X = { f_x(y) : Y → {0,1} | x ∈ X },

where f_x(y) = f(x,y)

• Theorem [BJKS]: For every f: X × Y → {0,1}, every p ≥ VCD(f_X),

R_{1/4}(f) = Ω(log(SC(f_X, p)))

Page 10

The Ω(1/ε) Lower Bound [Bar-Yossef]

• Alice has x ∈_R {0,1}^m, wt(x) = m/2

• Bob has y ∈_R {0,1}^m, wt(y) = εm and:

• Either wt(x ∧ y) = 0 (f(x,y) = 0) OR wt(x ∧ y) = εm (f(x,y) = 1)

• R_{1/4}(f) = Ω(VCD(f_X)) = Ω(1/ε) [Bar-Yossef]

• s(x), s(y) any streams w/ char. vectors x, y

• f(x,y) = 1 → F0(s(x) ∘ s(y)) = m/2

• f(x,y) = 0 → F0(s(x) ∘ s(y)) = m/2 + εm

• (1+ε')m/2 < (1 − ε')(m/2 + εm) for ε' = Θ(ε)

• Hence, can decide f → F0 alg. uses Ω(1/ε) space
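The gap between the two cases can be checked numerically. In this sketch the instance layout, the helper names, and the choice ε' = ε/2 are all assumptions made for illustration; any sufficiently small constant multiple of ε would serve as ε'.

```python
def f0_of_concatenation(x, y):
    """F0 of the concatenated streams = weight of the bitwise OR of x and y."""
    return sum(a | b for a, b in zip(x, y))


def make_instance(m, eps, contained):
    """Build x with wt(x) = m/2 and y with wt(y) = eps*m in one of the two promise cases."""
    half, k = m // 2, round(eps * m)
    x = [1] * half + [0] * (m - half)
    if contained:  # wt(x AND y) = eps*m, i.e. f(x, y) = 1
        y = [1] * k + [0] * (m - k)
    else:  # wt(x AND y) = 0, i.e. f(x, y) = 0
        y = [0] * half + [1] * k + [0] * (m - half - k)
    return x, y


m, eps = 1000, 0.1
eps_prime = eps / 2  # assumed constant inside eps' = Theta(eps)
f0_yes = f0_of_concatenation(*make_instance(m, eps, contained=True))
f0_no = f0_of_concatenation(*make_instance(m, eps, contained=False))
separated = (1 + eps_prime) * f0_yes < (1 - eps_prime) * f0_no
print(f0_yes, f0_no, separated)
```

When the inequality holds, the (1 ± ε') estimates for the two cases cannot overlap, so the estimate reveals f(x,y).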

Page 11

Our Results

• Remainder of talk: Ω(1/ε²) lower bound for ε = Ω(m^{-1/(9+k)}) for any k > 0.

→ O(log log m / ε² + log m · log 1/ε) upper bound almost optimal

IDEA: Reduce from protocol for computing dot product

Page 12

The Promise Problem

• X = {x ∈ [0,1]^t, ||x|| = 1 and ∃ y ∈ Y s.t. (x,y) ∈ Π}

• We lower bound R_{1/4}(f) via SC(f_X, t)

• t = Θ(1/ε²), Y = basis of unit vectors of R^t

Alice: x ∈ [0,1]^t, ||x|| = 1

Bob: y ∈ Y

Promise Problem Π: ⟨x,y⟩ = 0 (f(x,y) = 0) OR ⟨x,y⟩ = 2/t^{1/2} (f(x,y) = 1)

Page 13

Bounding SC(f_X, t)

• Theorem: SC(f_X, t/4) = 2^{Ω(t)}

• Proof:

1. ∀ T ⊂ Y s.t. |T| = t/4, put x_T = (2/t^{1/2}) · Σ_{e ∈ T} e

2. Define X_1 ⊂ X as X_1 = {x_T | T ⊂ Y, |T| = t/4}

3. Claim: ∀ s ∈ {0,1}^t w/ wt(s) = t/4, s ∈ truth tab. of f_{X_1}

4. Proof:

   1. Let s ∈ {0,1}^t with 1s in positions i_1, …, i_{t/4}

   2. Put T = {e_{i_1}, …, e_{i_{t/4}}}. ∀ e ∈ T, ⟨e, x_T⟩ = 2/t^{1/2} = 2ε

   3. ∀ e ∈ Y − T, ⟨e, x_T⟩ = 0

5. There are 2^{Ω(t)} such s.
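The construction in the proof can be checked for a small t. This is a sketch under assumptions: t = 8 is chosen only so that t/4 is an integer, basis vectors are identified with coordinate indices, and the function names are invented.

```python
import math
from itertools import combinations


def x_T(T, t):
    """x_T = (2 / sqrt(t)) * sum of the basis vectors e_i with i in T."""
    scale = 2 / math.sqrt(t)
    return [scale if i in T else 0.0 for i in range(t)]


def truth_table_row(x):
    """f(x, e_i) for every basis vector e_i: 1 if <x, e_i> = 2/sqrt(t), 0 if it is 0."""
    return tuple(1 if c > 0 else 0 for c in x)


t = 8
subsets = [set(T) for T in combinations(range(t), t // 4)]
rows = {truth_table_row(x_T(T, t)) for T in subsets}

# Every x_T is a unit vector, so it is a legal input for Alice.
norms_ok = all(
    math.isclose(math.sqrt(sum(c * c for c in x_T(T, t))), 1.0) for T in subsets
)
print(len(rows), math.comb(t, t // 4), norms_ok)
```

Each weight-t/4 string arises from exactly one T, so the number of distinct rows is the binomial coefficient C(t, t/4), which grows as 2^{Ω(t)}.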

Page 14

Bounding R1/4(f)

• Corollary: R_{1/4}(f) = Ω(t) = Ω(1/ε²)

• Reduction: we need protocol computing f with communication = space used by any (1 ± ε) F0 approx. alg.

Page 15

Reduction

• Recall:

• ⟨x,y⟩ = 0 if f(x,y) = 0

• ⟨x,y⟩ = 2/t^{1/2} if f(x,y) = 1

• Goal: Reduce "separation" of ⟨x,y⟩ to separation of F0(s(x) ∘ s(y)) for streams s(x), s(y) Alice/Bob can derive from x,y

• Use relation: ||y − x||² = ||y||² + ||x||² − 2⟨x,y⟩

• f(x,y) = 0 → ||y − x|| = 2^{1/2}

• f(x,y) = 1 → ||y − x|| < 2^{1/2}(1 − 1/t^{1/2}) = 2^{1/2}(1 − Θ(ε))
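A quick numeric check of the two distances, as a sketch: t = 16 and the particular vectors are assumptions picked so the arithmetic is exact (2/√t = 0.5), with x supported on the first t/4 coordinates.

```python
import math


def distance(x, y):
    """Euclidean distance ||y - x||."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def basis_vector(i, t):
    return [1.0 if j == i else 0.0 for j in range(t)]


t = 16
scale = 2 / math.sqrt(t)  # value of <x, y> when f(x, y) = 1
x = [scale] * (t // 4) + [0.0] * (t - t // 4)  # unit vector

d_zero = distance(x, basis_vector(t - 1, t))  # <x, y> = 0 case
d_one = distance(x, basis_vector(0, t))  # <x, y> = 2/sqrt(t) case
bound = math.sqrt(2) * (1 - 1 / math.sqrt(t))
print(d_zero, d_one, bound)
```

The f = 1 distance equals sqrt(2 − 4/√t), which lies strictly below the stated bound because sqrt(1 − u) < 1 − u/2 for 0 < u ≤ 1.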

Page 16

Overview of Reduction

Alice: x ∈ [0,1]^t, ||x|| = 1

Bob: y ∈ E

1. Low-distortion embedding φ: l_2^t → l_1^{poly(t)}, giving φ(x), φ(y)

2. Rational Approximation

3. Scale rationals to integers via s

4. Convert integer coords to unary to get {0,1} vectors x', y'

[Diagram: Alice runs the F0 alg on s(x') and passes its state to Bob, who continues on s(y') and obtains F0(s(x') ∘ s(y')).]

• F0(s(x') ∘ s(y')) can decide f(x,y) w.p. ≥ 3/4

Page 17

Embedding l_2^t into l_1^{poly(t)}

• A (1+γ)-distortion embedding φ: l_2^t → l_1^d is mapping s.t. ∀ p,q ∈ l_2^t,

• Theorem [FLM77]: ∀ γ ∃ a (1+γ)-distortion embedding φ: l_2^t → l_1^d with d = O(t · (log 1/γ) / γ²)

Page 18

Embedding l_2^t into l_1^d

Alice: x ∈ [0,1]^t, ||x|| = 1

Bob: y ∈ E

Low-distortion embedding φ: l_2^t → l_1^d, giving φ(x), φ(y)

• Using Theorem [FLM77], Alice/Bob get φ(x), φ(y) ∈ R^d with d = O(t · (log 1/γ) / γ²)

• γ specified later

Page 19

Rational Approximation

• z = z(t): N → N; assume z ≥ d

• Approximate each coord. of output of embedding by integer multiple of 1/z

Page 20

Scaling

• Alice (resp. Bob) multiplies each coord. of the rounded embedding φ̃(x) (resp. φ̃(y)) by z

• Obtains s(φ̃(x)) (resp. s(φ̃(y)))

• Claim: coords. are integers in range [−2z, 2z]

• Proof:

1. |φ̃(·)| ≤ |φ(·)| + d/z ≤ 2

2. |s(φ̃(·))| = z|φ̃(·)|

Page 21

Converting to Unary

• For i = 1 to d

• j ← s(φ̃(x))_i

• Replace s(φ̃(x))_i with 1^{2z+j} 0^{2z−j}

• Bob does same for s(φ̃(y))

• x', y' denote new length 4dz bitstrings

• wt(x') = |s(φ̃(x))|, wt(y') = |s(φ̃(y))|, Δ(x',y') = |s(φ̃(x)) − s(φ̃(y))|
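The rounding, scaling, and unary steps can be sketched together. The input vectors below are arbitrary stand-ins for the embedded points (the embedding itself is not computed here), and the function names and z = 4 are assumptions for illustration.

```python
def round_and_scale(v, z):
    """Round each coordinate to the nearest multiple of 1/z, then multiply by z."""
    return [round(c * z) for c in v]


def to_unary(ints, z):
    """Encode each integer j in [-2z, 2z] as 2z+j ones followed by 2z-j zeros."""
    bits = []
    for j in ints:
        if not -2 * z <= j <= 2 * z:
            raise ValueError(f"coordinate {j} outside [-{2 * z}, {2 * z}]")
        bits.extend([1] * (2 * z + j) + [0] * (2 * z - j))
    return bits


def hamming(a, b):
    """Hamming distance between two equal-length bit lists."""
    return sum(p != q for p, q in zip(a, b))


z = 4
su = round_and_scale([0.5, -0.25, 1.0], z)  # stand-in for Alice's scaled vector
sv = round_and_scale([0.0, 0.75, -1.0], z)  # stand-in for Bob's scaled vector
x_bits, y_bits = to_unary(su, z), to_unary(sv, z)
l1 = sum(abs(a - b) for a, b in zip(su, sv))
print(len(x_bits), hamming(x_bits, y_bits), l1)
```

Because each block is a run of ones followed by zeros, two blocks differ in exactly |j − j'| positions, so the Hamming distance of the bit strings equals the l1 distance of the integer vectors.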

Page 22

Reducing Δ(x',y') to F0

• Alice (Bob) chooses stream a_{x'} (a_{y'}) with char. vector x' (y').

• Lemma: If α_1 < wt(x'), wt(y') < α_2, then:

α_1 + Δ(x',y')/2 < F0(a_{x'} ∘ a_{y'}) < α_2 + Δ(x',y')/2

Follows from fact: F0(a_{x'} ∘ a_{y'}) = wt(x' ∨ y')
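The fact behind the lemma, together with the identity wt(x' ∨ y') = (wt(x') + wt(y') + Δ(x',y'))/2 that turns weight bounds into F0 bounds, can be checked on a toy pair. The bit strings and helper names below are illustrative assumptions.

```python
def stream_from_bits(bits):
    """One possible stream whose characteristic vector is `bits`."""
    return [i for i, b in enumerate(bits) if b]


def f0(stream):
    return len(set(stream))


def weight(bits):
    return sum(bits)


def hamming(a, b):
    return sum(p != q for p, q in zip(a, b))


x_bits = [1, 1, 1, 0, 0, 0, 1, 0]
y_bits = [1, 0, 0, 0, 1, 0, 1, 1]

f0_concat = f0(stream_from_bits(x_bits) + stream_from_bits(y_bits))
wt_or = weight([a | b for a, b in zip(x_bits, y_bits)])
half_sum = (weight(x_bits) + weight(y_bits) + hamming(x_bits, y_bits)) // 2
print(f0_concat, wt_or, half_sum)
```

Replacing both weights by a common lower or upper bound in the identity gives the two sides of the lemma.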

Page 23

Reducing Δ(x',y') to F0

• Use lemma to show:

• Set γ = Θ(ε), z = Θ(1/ε^5 log 1/ε) so that two cases distinguished by (1 ± Θ(ε)) F0 alg

Page 24

Conclusions

• a_{x'}, a_{y'} must be in universe of size ≥ 4zd = Θ(log(1/ε)/ε^9)

• Reduction only valid if 4zd ≤ m → Ω(1/ε²) bound for ε = Ω(m^{-1/(9+k)}) ∀ k > 0.

• Recently lower bound improved to:

• Ω(1/ε²) for ε ≥ m^{-1/2}, which is optimal

• Find set of vectors directly in Hamming space via involved prob. method argument