Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional...

60
1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan Joint with David Woodruff, IBM Almaden

Transcript of Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional...

Page 1: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

1-1

Tight Bounds for Distributed FunctionalMonitoring

Jan. 2012

Qin Zhang

MADALGO, Aarhus University

NII Shonan meeting, Japan

Joint with

David Woodruff, IBM Almaden

Page 2: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

2-1

The distributed streaming model

· · ·S1 S2 S3 Sk

time

Ccoordinator

sitesA(t) : set of ele-ments received up totime t from all sites.a

aAssume ≤ 1 itemcomes at each time unit.

(a.k.a. distributed functional/continuous monitoring)

Page 3: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

2-2

The distributed streaming model

· · ·S1 S2 S3 Sk

time

Ccoordinator

sitesA(t) : set of ele-ments received up totime t from all sites.a

aAssume ≤ 1 itemcomes at each time unit.

The coordinator needs tomaintain f(A(t)) for all t.

(a.k.a. distributed functional/continuous monitoring)

Page 4: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

2-3

The distributed streaming model

· · ·S1 S2 S3 Sk

time

Ccoordinator

sitesA(t) : set of ele-ments received up totime t from all sites.a

aAssume ≤ 1 itemcomes at each time unit.

The coordinator needs tomaintain f(A(t)) for all t.

Goal: minimize communication cost

(a.k.a. distributed functional/continuous monitoring)

Page 5: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

3-1

Problems

· · ·S1 S2 S3 Sk

time

Ccoordinator

sites

The Distributed Streaming Model

Static case (a one-shot/staticcomputation at the end)

• Top-k

• Heavy-hitter

• . . .

Dynamic case

• Samplings

• Frequent moments

(F0, F1, F2, . . .)

• Heavy-hitter

• Quantile

• Entropy

• Non-linear functions

• . . .

Page 6: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

4-1

What you would like to see:

• Efficient algorithms/protocols

• Practical heuristics

This talk

Page 7: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

4-2

What you would like to see:

• Efficient algorithms/protocols

• Practical heuristics

What you (probably) do not want to see:

• “Useless” impossibility results

• Complicated proofs

This talk

Page 8: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

4-3

What you would like to see:

• Efficient algorithms/protocols

• Practical heuristics

What you (probably) do not want to see:

• “Useless” impossibility results

• Complicated proofs

Unfortunately, in the next 30 minutes ...

This talk

Page 9: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

5-1

The multiparty communication model

x1 = 010011 x2 = 111011

x3 = 111111xk = 100011

We want to compute f(x1, x2, . . . , xk)

f can be bit-wise XOR, OR, AND, MAJ . . .

– A model for lower bounds

Page 10: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

5-2

The multiparty communication model

x1 = 010011 x2 = 111011

x3 = 111111xk = 100011

We want to compute f(x1, x2, . . . , xk)

f can be bit-wise XOR, OR, AND, MAJ . . .

Message passing: If x1talks to x2, others can-not hear.

Blackboard: One speaks,everyone else hears.

– A model for lower bounds

Page 11: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

5-3

The multiparty communication model

x1 = 010011 x2 = 111011

x3 = 111111xk = 100011

We want to compute f(x1, x2, . . . , xk)

f can be bit-wise XOR, OR, AND, MAJ . . .

Message passing: If x1talks to x2, others can-not hear.

Blackboard: One speaks,everyone else hears.

· · ·S1 S2 S3 Sk

Ccoordinator

sites

=

– A model for lower bounds

Page 12: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

6-1

Previously

Some works in the blackboard model. Almost nothing inthe message-passing model.

Page 13: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

6-2

Previously

1. Ω(nk) for the bitwise-XOR/OR/AND/MAJ.

2. Ω(nk) for connectivity.

Some works in the blackboard model. Almost nothing inthe message-passing model.

This SODA, with Jeff Phillips and Elad Verbin we proposeda general and elegant technique called “symmetrization”which works in both variants. In particular, we obtained(in the message-passing model)

Page 14: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

6-3

Previously

1. Ω(nk) for the bitwise-XOR/OR/AND/MAJ.

2. Ω(nk) for connectivity.

Some works in the blackboard model. Almost nothing inthe message-passing model.

This SODA, with Jeff Phillips and Elad Verbin we proposeda general and elegant technique called “symmetrization”which works in both variants. In particular, we obtained(in the message-passing model)

Artificial? Well ...

Page 15: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

6-4

Previously

1. Ω(nk) for the bitwise-XOR/OR/AND/MAJ.

2. Ω(nk) for connectivity.

Some works in the blackboard model. Almost nothing inthe message-passing model.

This SODA, with Jeff Phillips and Elad Verbin we proposeda general and elegant technique called “symmetrization”which works in both variants. In particular, we obtained(in the message-passing model)

Artificial? Well ...

In any case, let’s look at real important problems.

Page 16: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

7-1

Now, important problems

• Samplings

• Frequent moments

(F0, F1, F2, . . .)

• Heavy-hitter

• Quantile

• Entropy

• . . .

Page 17: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

7-2

Now, important problems

• Samplings

• Frequent moments

(F0, F1, F2, . . .)

• Heavy-hitter

• Quantile

• Entropy

• . . .

Solved

Page 18: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

7-3

Now, important problems

• Samplings

• Frequent moments

(F0, F1, F2, . . .)

• Heavy-hitter

• Quantile

• Entropy

• . . .

Solved

Our work

Page 19: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

8-1

Results

Page 20: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

8-2

Results

• (Almost) tight bounds for all these questions

• Static lower bounds (almost) match dynamic upper bounds.

(up to polylog factors)

Page 21: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

8-3

Results

• (Almost) tight bounds for all these questions

• Static lower bounds (almost) match dynamic upper bounds.

(up to polylog factors)

Today

Page 22: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

9-1

F0 upper bound(Cormode, Muthu and Yi 2008)

Page 23: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

10-1

The (1 + ε)-approximation F0 problem

We have k sites S1, S2, . . . , Sk. Si holds a set Xi.

Our goal: compute F0(∪i∈kXi) up to (1 + ε)-approximation.

1

59

457

28

57

6

10

How many distinct items?

A fundamental problem indata analysis.

Page 24: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

10-2

The (1 + ε)-approximation F0 problem

We have k sites S1, S2, . . . , Sk. Si holds a set Xi.

Our goal: compute F0(∪i∈kXi) up to (1 + ε)-approximation.

1

59

457

28

57

6

10

How many distinct items?

A fundamental problem indata analysis.

Current best UB: O(k/ε2) (Cormode, Muthu, Yi 2008)

Holds in the dynamic case.

Page 25: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

11-1

General idea for the one-shot computation

Each site generates a “sketch” via small-spacestreaming algorithms.

The coordinator combines (via communication)the sketches from the k sites to obtain a globalsketch, from which we can extract the answer.

Page 26: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

12-1

The FM sketch

Take a pair-wise independent random hash functionh : 1, . . . , n → 1, . . . , 2d, where 2d > n

Page 27: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

12-2

The FM sketch

Take a pair-wise independent random hash functionh : 1, . . . , n → 1, . . . , 2d, where 2d > n

For each incoming element x, compute h(x)

e.g., h(5) = 10101100010000

Count how many trailing zeros

Remember the max # trailing zeroes in any h(x)

Page 28: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

12-3

The FM sketch

Take a pair-wise independent random hash functionh : 1, . . . , n → 1, . . . , 2d, where 2d > n

For each incoming element x, compute h(x)

e.g., h(5) = 10101100010000

Count how many trailing zeros

Remember the max # trailing zeroes in any h(x)

Let Y be the max # trailing zeroes

Can show E[2Y ] = #distinct elements

Page 29: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

13-1

One-shot case, the FM sketch (cont.)

So 2Y is an unbiased estimator for # distinct elements

Page 30: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

13-2

One-shot case, the FM sketch (cont.)

So 2Y is an unbiased estimator for # distinct elements

However, has a large variance

Some techniques [Bar-Yossef et. al. 2002] can produce agood estimator that has probability 1−δ to be within relativeerror ε.

Space increased to O(1/ε2)

Page 31: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

13-3

One-shot case, the FM sketch (cont.)

So 2Y is an unbiased estimator for # distinct elements

However, has a large variance

Some techniques [Bar-Yossef et. al. 2002] can produce agood estimator that has probability 1−δ to be within relativeerror ε.

Space increased to O(1/ε2)

FM sketch has linearity

Y1 from A, Y2 from B, then 2maxY1,Y2 estimates # distinctitems in A ∪B.

Page 32: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

13-4

One-shot case, the FM sketch (cont.)

So 2Y is an unbiased estimator for # distinct elements

However, has a large variance

Some techniques [Bar-Yossef et. al. 2002] can produce agood estimator that has probability 1−δ to be within relativeerror ε.

Space increased to O(1/ε2)

FM sketch has linearity

Y1 from A, Y2 from B, then 2maxY1,Y2 estimates # distinctitems in A ∪B.

Thus, we can use it to design a one-shot algorithm withcommunication O(k/ε2)

Page 33: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

14-1

F0 lower bound

Page 34: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

15-1

The F0 problem

We have k sites S1, S2, . . . , Sk. Si holds a set Xi.

Our goal: compute F0(∪i∈kXi) up to (1 + ε)-approximation.

159

457

28 5

76

10

How many distinct items?

A fundamental problem indata analysis.

Page 35: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

15-2

The F0 problem

We have k sites S1, S2, . . . , Sk. Si holds a set Xi.

Our goal: compute F0(∪i∈kXi) up to (1 + ε)-approximation.

159

457

28 5

76

10

How many distinct items?

A fundamental problem indata analysis.

(Cormode, Muthu, Yi, 2008)

Holds in the dynamic case.

Our LB: Ω(k/ε2).Holds in the static and message-passing case.

Current best UB: O(k/ε2)

Previous LB: Ω(k) (Cormode, Muthu, Yi, 2008)

Ω(1/ε2) (reduction from Gap-Hamming)

Page 36: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

15-3

The F0 problem

We have k sites S1, S2, . . . , Sk. Si holds a set Xi.

Our goal: compute F0(∪i∈kXi) up to (1 + ε)-approximation.

159

457

28 5

76

10

How many distinct items?

A fundamental problem indata analysis.

(Cormode, Muthu, Yi, 2008)

Holds in the dynamic case.

Our LB: Ω(k/ε2).Holds in the static and message-passing case.

Tight!

Current best UB: O(k/ε2)

Previous LB: Ω(k) (Cormode, Muthu, Yi, 2008)

Ω(1/ε2) (reduction from Gap-Hamming)

Page 37: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

16-1

The proof framework

Step 1: We first introduce a simpler problem calledk-GAP-MAJ

Step 2: We compose k-GAP-MAJ with the Set Dis-jointness problem using information cost to prove alower bound for F0

Page 38: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

17-1

k-GAP-MAJ

We have k sites S1, S2, . . . , Sk. Si holds a bit Zi which is 1 w.p.β and 0 w.p. 1−β where ω(1/k) ≤ β ≤ 1/2 is a prefixed value.

Our goal: compute the following function.

GM(Z1, Z2, . . . , Zk) =

0, if

∑i∈[k] Zi ≤ βk −

√βk,

1, if∑

i∈[k] Zi ≥ βk +√βk,

∗, otherwise,

where “∗” means that the answer can be arbitrary.

Page 39: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

17-2

k-GAP-MAJ

We have k sites S1, S2, . . . , Sk. Si holds a bit Zi which is 1 w.p.β and 0 w.p. 1−β where ω(1/k) ≤ β ≤ 1/2 is a prefixed value.

Our goal: compute the following function.

GM(Z1, Z2, . . . , Zk) =

0, if

∑i∈[k] Zi ≤ βk −

√βk,

1, if∑

i∈[k] Zi ≥ βk +√βk,

∗, otherwise,

where “∗” means that the answer can be arbitrary.

Lemma 1: If a protocol P computes k-GAP-MAJ correctly w.p.0.9999, then w.p. Ω(1), the protocol has to learn at least Ω(k)of Zi each with Ω(1) bit (that is, H(Zi | Π) ≤ Hb(0.01β)).

Page 40: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

17-3

k-GAP-MAJ

We have k sites S1, S2, . . . , Sk. Si holds a bit Zi which is 1 w.p.β and 0 w.p. 1−β where ω(1/k) ≤ β ≤ 1/2 is a prefixed value.

Our goal: compute the following function.

GM(Z1, Z2, . . . , Zk) =

0, if

∑i∈[k] Zi ≤ βk −

√βk,

1, if∑

i∈[k] Zi ≥ βk +√βk,

∗, otherwise,

where “∗” means that the answer can be arbitrary.

Alternatively: I(Z1, Z2, . . . , Zk; Π) = Ω(k)

Lemma 1: If a protocol P computes k-GAP-MAJ correctly w.p.0.9999, then w.p. Ω(1), the protocol has to learn at least Ω(k)of Zi each with Ω(1) bit (that is, H(Zi | Π) ≤ Hb(0.01β)).

Page 41: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

18-1

Set disjointness (2-DISJ)

Alice Bob

x ∈ 0, 1n y ∈ 0, 1n

x ∩ y = ∅?

Page 42: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

18-2

Set disjointness (2-DISJ)

Alice Bob

x ∈ 0, 1n y ∈ 0, 1n

x ∩ y = ∅?

A classical hard instance:

Distribution µ: X and Y are both random subsets of size ` =(n+1)/4 from [n] such that |X∩Y | = 1 w.p. β and |X∩Y | = 0w.p. 1− β.

Razborov [1990] shows an Ω(n) for this hard distribution anderror β/100.

Page 43: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

19-1

Next step: Compose k-GAP-MAJ with 2-DISJ

· · ·S1 S2 S3 Sk

Ccoordinator

sites

n = Θ(1/ε2)` = (n+ 1)/4β = 1/kε2

Page 44: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

19-2

Next step: Compose k-GAP-MAJ with 2-DISJ

· · ·S1 S2 S3 Sk

Ccoordinator

sites

n = Θ(1/ε2)` = (n+ 1)/4β = 1/kε2

Step 1: Pick Y = y ⊂ [n] ofsize ` uniformly at random

Y

Page 45: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

19-3

Next step: Compose k-GAP-MAJ with 2-DISJ

· · ·S1 S2 S3 Sk

Ccoordinator

sites

n = Θ(1/ε2)` = (n+ 1)/4β = 1/kε2

Step 1: Pick Y = y ⊂ [n] ofsize ` uniformly at random

Step 2: Pick X1, . . . , Xk ⊂ [n] indepedently and randomly from µ|Y =y

Y

X1 X2 X3 Xk

Page 46: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

19-4

Next step: Compose k-GAP-MAJ with 2-DISJ

· · ·S1 S2 S3 Sk

Ccoordinator

sites

n = Θ(1/ε2)` = (n+ 1)/4β = 1/kε2

Step 1: Pick Y = y ⊂ [n] ofsize ` uniformly at random

Step 2: Pick X1, . . . , Xk ⊂ [n] indepedently and randomly from µ|Y =y

Y

X1 X2 X3 Xk

F0(X1, X2, . . . , Xk)?

Page 47: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

20-1

The proof

F0(X1, X2, . . . , Xk) ⇐⇒ k-GAP-MAJ(Z1, Z2, . . . , Zk)

(Zi = |Xi ∩ Y |)⇐⇒ learn Ω(k) Zi’s well

(by Lemma 1)

⇐⇒ need Ω(k/ε2) bits

(learning each Zi = |Xi ∩ Y | well needsΩ(n) = Ω(1/ε2) bits, by 2-DISJ)

· · ·S1 S2 S3 Sk

Ccoordinator

sites

Y

X1 X2 X3 Xk

Zi = |Xi ∩ Y |

1 w.p. β0 w.p. 1− β

Page 48: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

20-2

The proof

F0(X1, X2, . . . , Xk) ⇐⇒ k-GAP-MAJ(Z1, Z2, . . . , Zk)

(Zi = |Xi ∩ Y |)⇐⇒ learn Ω(k) Zi’s well

(by Lemma 1)

⇐⇒ need Ω(k/ε2) bits

(learning each Zi = |Xi ∩ Y | well needsΩ(n) = Ω(1/ε2) bits, by 2-DISJ)

· · ·S1 S2 S3 Sk

Ccoordinator

sites

Y

X1 X2 X3 Xk

Zi = |Xi ∩ Y |

Q.E.D.

1 w.p. β0 w.p. 1− β

Page 49: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

21-1

Proof:

1. Suppose Π does not satisfy this.

2. Since the Zi are independent given Π,∑k

i=1 Zi | Π is a sumof independent Bernoulli random variables.

3. Since most H(Zi | Π) are large, by anti-concentration, bothof the following events occur with constant probability:

•∑k

i=1 Zi | Π > βk +√βk,

•∑k

i=1 Zi | Π < βk −√βk.

4. So P can’t succeed with large probability.

Proof sketch of Lemma 1

Lemma 1: If a protocol P computes k-GAP-MAJ correctly w.p.0.9999, then w.p. Ω(1), for Ω(k) Zi’s, we haveH(Zi | Π) ≤ Hb(0.01β).

Page 50: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

22-1

F2 lower bound

Page 51: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

23-1

We have k sites S1, S2, . . . , Sk. Si holds a set Xi.

Our goal: compute F2(∪i∈kXi) up to (1 + ε)-approximation.

What’s the size of self-join?

Another fundamental problemin data analysis.

The F2 problem

2

7

9

2

4

2

2

7

9

2

4

2

Join

Page 52: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

23-2

We have k sites S1, S2, . . . , Sk. Si holds a set Xi.

Our goal: compute F2(∪i∈kXi) up to (1 + ε)-approximation.

What’s the size of self-join?

Another fundamental problemin data analysis.

The F2 problem

2

7

9

2

4

2

2

7

9

2

4

2

Join

Previous UB: O(k2/ε+ k1.5/ε3)(Cormode, Muthu, Yi 2008)

Our UB: O(k/poly(ε)), one way protocolHolds in the dynamic case.

Previous LB: Ω(k)Our LB: Ω(k/ε2).Holds in the static and blackboard case.

(Cormode, Muthu, Yi, 2008)

Page 53: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

23-3

We have k sites S1, S2, . . . , Sk. Si holds a set Xi.

Our goal: compute F2(∪i∈kXi) up to (1 + ε)-approximation.

What’s the size of self-join?

Another fundamental problemin data analysis.

The F2 problem

2

7

9

2

4

2

2

7

9

2

4

2

Join

Previous UB: O(k2/ε+ k1.5/ε3)(Cormode, Muthu, Yi 2008)

Our UB: O(k/poly(ε)), one way protocolHolds in the dynamic case.

Previous LB: Ω(k)Our LB: Ω(k/ε2).Holds in the static and blackboard case.

(Cormode, Muthu, Yi, 2008)Almost Tight!

Page 54: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

24-1

A quick glance: (1 + ε)-approximation F2

2-party gap-hamming: Alice has X = X1, X2, . . . , X1/ε2, Bob

has Y = Y1, Y2, . . . , Y1/ε2. They want to compute:

GHD(X,Y ) =

0, if

∑i∈[1/ε2] Xi ⊕ Yi ≤ 1/2ε2 − 1/ε,

1, if∑

i∈[1/ε2] Xi ⊕ Yi ≥ 1/2ε2 + 1/ε,

∗, otherwise,

where “∗” means that the answer can be arbitrary.

Page 55: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

24-2

A quick glance: (1 + ε)-approximation F2

2-party gap-hamming: Alice has X = X1, X2, . . . , X1/ε2, Bob

has Y = Y1, Y2, . . . , Y1/ε2. They want to compute:

GHD(X,Y ) =

0, if

∑i∈[1/ε2] Xi ⊕ Yi ≤ 1/2ε2 − 1/ε,

1, if∑

i∈[1/ε2] Xi ⊕ Yi ≥ 1/2ε2 + 1/ε,

∗, otherwise,

where “∗” means that the answer can be arbitrary.

k-DISJ: We have k sites S1, S2, . . . , Sk. Si holds a set Zi. Wepromise that either Zi (i = 1, . . . , k) are all disjoint, or they intersecton one element and the rest are all disjoint (sun-flower).

The goal is to find out which is the case.

Page 56: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

24-3

A quick glance: (1 + ε)-approximation F2

2-party gap-hamming: Alice has X = X1, X2, . . . , X1/ε2, Bob

has Y = Y1, Y2, . . . , Y1/ε2. They want to compute:

GHD(X,Y ) =

0, if

∑i∈[1/ε2] Xi ⊕ Yi ≤ 1/2ε2 − 1/ε,

1, if∑

i∈[1/ε2] Xi ⊕ Yi ≥ 1/2ε2 + 1/ε,

∗, otherwise,

where “∗” means that the answer can be arbitrary.

k-DISJ: We have k sites S1, S2, . . . , Sk. Si holds a set Zi. Wepromise that either Zi (i = 1, . . . , k) are all disjoint, or they intersecton one element and the rest are all disjoint (sun-flower).

The goal is to find out which is the case.

2 copies

k-XOR

Page 57: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

24-4

A quick glance: (1 + ε)-approximation F2

2-party gap-hamming: Alice has X = X1, X2, . . . , X1/ε2, Bob

has Y = Y1, Y2, . . . , Y1/ε2. They want to compute:

GHD(X,Y ) =

0, if

∑i∈[1/ε2] Xi ⊕ Yi ≤ 1/2ε2 − 1/ε,

1, if∑

i∈[1/ε2] Xi ⊕ Yi ≥ 1/2ε2 + 1/ε,

∗, otherwise,

where “∗” means that the answer can be arbitrary.

k-BTA

k-DISJ: We have k sites S1, S2, . . . , Sk. Si holds a set Zi. Wepromise that either Zi (i = 1, . . . , k) are all disjoint, or they intersecton one element and the rest are all disjoint (sun-flower).

The goal is to find out which is the case.

2 copies

k-XOR

compose via in-formation cost

Page 58: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

24-5

A quick glance: (1 + ε)-approximation F2

2-party gap-hamming: Alice has X = X1, X2, . . . , X1/ε2, Bob

has Y = Y1, Y2, . . . , Y1/ε2. They want to compute:

GHD(X,Y ) =

0, if

∑i∈[1/ε2] Xi ⊕ Yi ≤ 1/2ε2 − 1/ε,

1, if∑

i∈[1/ε2] Xi ⊕ Yi ≥ 1/2ε2 + 1/ε,

∗, otherwise,

where “∗” means that the answer can be arbitrary.

k-BTA

k-DISJ: We have k sites S1, S2, . . . , Sk. Si holds a set Zi. Wepromise that either Zi (i = 1, . . . , k) are all disjoint, or they intersecton one element and the rest are all disjoint (sun-flower).

The goal is to find out which is the case.

CC(k-BTA) = Ω(k/ε2)

2 copies

k-XOR

compose via in-formation cost

Page 59: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

24-6

A quick glance: (1 + ε)-approximation F2

2-party gap-hamming: Alice has X = X1, X2, . . . , X1/ε2, Bob

has Y = Y1, Y2, . . . , Y1/ε2. They want to compute:

GHD(X,Y ) =

0, if

∑i∈[1/ε2] Xi ⊕ Yi ≤ 1/2ε2 − 1/ε,

1, if∑

i∈[1/ε2] Xi ⊕ Yi ≥ 1/2ε2 + 1/ε,

∗, otherwise,

where “∗” means that the answer can be arbitrary.

k-BTA

k-DISJ: We have k sites S1, S2, . . . , Sk. Si holds a set Zi. Wepromise that either Zi (i = 1, . . . , k) are all disjoint, or they intersecton one element and the rest are all disjoint (sun-flower).

The goal is to find out which is the case.

CC(k-BTA) = Ω(k/ε2)

Finally, we reduce F2 to k-BTA.

2 copies

k-XOR

compose via in-formation cost

Page 60: Monitoring Tight Bounds for Distributed Functional...1-1 Tight Bounds for Distributed Functional Monitoring Jan. 2012 Qin Zhang MADALGO, Aarhus University NII Shonan meeting, Japan

25-1

The end

T HANK YOU

Q and A