Algorithms for Large Data Sets
Ziv Bar-Yossef
Lecture 13
June 22, 2005
http://www.ee.technion.ac.il/courses/049011
Data Streams: Reminder

[Figure: a data set x = (x1,…,xn), with n very large, streams through a computer program that uses limited memory and outputs an approximation of f(x).]

How do we prove memory lower bounds for data stream algorithms?
Communication Complexity [Yao 79]

[Figure: Alice (input a) and Bob (input b) exchange messages m1, m2, m3, m4; a referee reads the conversation and announces the output.]

Π(a,b) = (m1, m2, …): the "transcript" of protocol Π on input (a,b)
cost(Π) = Σi |mi|
CC(g) = min{ cost(Π) : Π computes g }
1-Way Communication Complexity

[Figure: Alice sends a single message m1 to Bob; Bob sends a single message m2 to the referee.]

Π(a,b) = (m1, m2): the "transcript" of Π on input (a,b)
cost(Π) = |m1| + |m2|
CC1(g) = min{ cost(Π) : Π is 1-way and computes g }
Data Stream Space Lower Bounds

f: A^n → B: a function
DS(f): data stream space complexity of f = minimum amount of memory used by any data stream algorithm computing f

g: A^{n/2} × A^{n/2} → B, g(x,y) = f(x∘y), where x∘y is the concatenation of x and y
CC1(g): 1-way communication complexity of g

Proposition: DS(f) ≥ Ω(CC1(g))
Corollary: In order to obtain a lower bound on DS(f), it suffices to prove a lower bound on CC1(g).
Reduction: 1-Way CC to DS

Proof of the proposition:
S = DS(f)
M: an S-space data stream algorithm for f

Lemma: There is a 1-way CC protocol Π for g whose cost is S + log(|B|).

Proof: Π works as follows:
  Alice runs M on x
  Alice sends the state of M to Bob (S bits)
  Bob continues the execution of M on y
  Bob sends the output of M to the referee (log(|B|) bits)
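A minimal sketch of this simulation in code, assuming a toy streaming algorithm whose entire state is one counter (the class and method names are illustrative, not from the lecture):

```python
# Sketch: turning an S-space streaming algorithm M into a 1-way protocol.

class StreamSum:
    """Toy data stream algorithm: its whole state is one running sum."""
    def __init__(self):
        self.state = 0
    def process(self, item):
        self.state += item
    def output(self):
        return self.state

def one_way_protocol(x, y):
    M = StreamSum()
    for item in x:                 # Alice runs M on her half x ...
        M.process(item)
    message = M.state              # ... and sends M's state (S bits) to Bob.

    M2 = StreamSum()
    M2.state = message             # Bob resumes M from Alice's state ...
    for item in y:                 # ... and finishes the stream on y.
        M2.process(item)
    return M2.output()             # Bob forwards the output (log|B| bits).

assert one_way_protocol([1, 2], [3, 4]) == 10
```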
Distinct Elements: Reminder

Input: a vector x ∈ {1,2,…,m}^n
DE(x) = number of distinct elements of x
Example: if x = (1,2,3,1,2,3), then DE(x) = 3

Space complexity of DE:
  Randomized approximate: O(log m) space
  Deterministic approximate: Ω(m) space
  Randomized exact: Ω(m) space
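For contrast with the lower bounds below, here is a rough sketch in the spirit of the O(log m)-space randomized approximation (a minimum-hash estimator; the specific hash and the averaging over trials are standard tricks assumed here, not details from the lecture):

```python
def approx_de(stream, trials=200):
    """Rough distinct-elements estimate from hash minima.

    Each trial hashes items to pseudo-uniform values in (0,1] and keeps
    only the minimum -- O(log m) bits once discretized.  With d distinct
    items, E[min of d uniforms] = 1/(d+1), so 1/avg_min - 1 estimates d.
    """
    mins = []
    for t in range(trials):
        mins.append(min((hash((t, item)) % 10**9 + 1) / 10**9
                        for item in stream))
    avg_min = sum(mins) / len(mins)
    return 1 / avg_min - 1

print(approx_de([1, 2, 3, 1, 2, 3] * 50))   # true answer: 3 (estimate is rough)
```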
The Equality Function

EQ: X × X → {0,1}
EQ(x,y) = 1 iff x = y

Theorem: CC1(EQ) ≥ Ω(log |X|)

Proof:
  Π: any 1-way protocol for EQ
  A(x): Alice's message on input x
  B(m,y): Bob's message when receiving message m from Alice and having input y
  Π(x,y) = (A(x), B(A(x),y)): transcript on (x,y)
Equality Lower Bound

Proof (cont.):
  Suppose |A(x)| < log|X| for all x ∈ X
  Then, # of distinct messages of Alice < 2^log|X| = |X|
  By the pigeonhole principle, there exist two distinct inputs x, x' ∈ X s.t. A(x) = A(x')
  Therefore, Π(x,x) = Π(x',x)
  But EQ(x,x) ≠ EQ(x',x). Contradiction.
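The pigeonhole step is easy to see concretely; a toy check (not from the lecture) with 3-bit inputs and 2-bit messages:

```python
from itertools import product

X = list(product([0, 1], repeat=3))      # |X| = 8

def alice_message(x):
    # An arbitrary 2-bit message function; since 2 < log|X| = 3 bits,
    # some two inputs must receive the same message.
    return (x[0], x[1] ^ x[2])

seen = {}
for x in X:
    m = alice_message(x)
    if m in seen:
        x_prime = seen[m]
        # Bob cannot distinguish x from x': the transcripts Pi(x, x)
        # and Pi(x', x) coincide, yet EQ(x,x) = 1 while EQ(x',x) = 0.
        print("colliding inputs fooling EQ:", x, x_prime)
        break
    seen[m] = x
```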
Combinatorial Designs

A family of subsets T1,…,Tk of a universe U s.t.
  1. For each i, |Ti| = |U|/4
  2. For each i ≠ j, |Ti ∩ Tj| ≤ |U|/8

[Figure: three sets T1, T2, T3 inside U, pairwise overlapping only slightly.]

Fact: There exist designs of size k = 2^Ω(|U|).
(These arise from constant rate, constant relative minimum distance binary error-correcting codes.)
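The existence fact can also be seen probabilistically; a sketch that samples random (|U|/4)-subsets and verifies the intersection condition (illustrative only; the lecture's construction goes through error-correcting codes):

```python
import random
from itertools import combinations

def random_design(universe_size=128, k=20, seed=0):
    """Sample k random (|U|/4)-subsets until the design property holds.

    For random subsets, |Ti ∩ Tj| concentrates around |U|/16, so the
    |U|/8 bound holds with high probability and few retries are needed.
    """
    rng = random.Random(seed)
    U = range(universe_size)
    while True:
        sets = [frozenset(rng.sample(U, universe_size // 4))
                for _ in range(k)]
        if all(len(s & t) <= universe_size // 8
               for s, t in combinations(sets, 2)):
            return sets

design = random_design()
print(len(design), "sets, each of size", len(design[0]))
```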
Reduction from EQ to DE

U = {1,2,…,m}
X = { T1,…,Tk }: design of size k = 2^Ω(m)
EQ: X × X → {0,1}

Note (viewing each input set as a stream listing its elements):
  If x = y, then DE(x∘y) = |x| = m/4
  If x ≠ y, then DE(x∘y) = |x ∪ y| ≥ 2·(m/4) − m/8 = 3m/8

Therefore: a deterministic data stream algorithm that approximates DE within a factor better than 3/2 with space S ⇒ a 1-way protocol for EQ with cost S + O(log m)

Conclusion: DS(DE) ≥ Ω(m), since CC1(EQ) ≥ Ω(log |X|) = Ω(m)
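A toy check of the gap the reduction exploits (m = 8, sets of size m/4 = 2 with pairwise intersections of size ≤ m/8 = 1; any approximation sharper than the 3/2 gap must tell the two cases apart):

```python
T1, T2 = {1, 2}, {2, 3}          # a tiny "design" over U = {1,...,8}

def de(stream):
    return len(set(stream))       # exact distinct-elements count

print(de(list(T1) + list(T1)))    # x == y : DE = m/4 = 2
print(de(list(T1) + list(T2)))    # x != y : DE = 3 >= 3m/8 = 3
```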
Randomized Communication Complexity

Alice & Bob are allowed to use random coin tosses
For any inputs x,y, the referee needs to find g(x,y) w.p. 1 − δ

RCC(g) = minimum cost of a randomized protocol that computes g with error ¼
RCC1(g) = minimum cost of a randomized 1-way protocol that computes g with error ¼

What is RCC1(EQ)? (Only O(log m) for m-bit inputs: randomization makes equality easy, as the fingerprinting sketch below suggests.)
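The standard answer is fingerprinting (a classical technique; this sketch is an illustration, not the lecture's protocol): Alice reduces her m-bit input modulo a random prime p = poly(m) and sends only (p, x mod p), so the message has O(log m) bits, and for x ≠ y the error is small because the nonzero integer x − y has at most m prime divisors.

```python
import random

def is_prime(p):
    return p > 1 and all(p % d for d in range(2, int(p**0.5) + 1))

def eq_protocol(x_bits, y_bits):
    """1-way randomized EQ: Alice sends (p, x mod p) -- O(log m) bits."""
    m = len(x_bits)
    x, y = int(x_bits, 2), int(y_bits, 2)
    # There are about m^2 / (2 ln m) primes below m^2, while a nonzero
    # x - y (< 2^m) has at most m prime divisors => error < 1/4.
    primes = [p for p in range(2, max(30, m * m)) if is_prime(p)]
    p = random.choice(primes)
    return (x % p) == (y % p)      # Bob compares fingerprints

x = "1011" * 8
y = "1011" * 7 + "1010"
print(eq_protocol(x, x))           # always True
print(eq_protocol(x, y))           # False w.p. >= 3/4
```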
RDS(f) = minimum amount of memory used by any randomized data stream algorithm computing f with error ¼

Lemma: RDS(f) ≥ Ω(RCC1(g))
Set Disjointness

U = { 1,2,…,m }
X = 2^U = {0,1}^m
Disj: X × X → {0,1}
Disj(x,y) = 1 iff x ∩ y ≠ ∅
Equivalently: Disj(x,y) = OR_{i=1..m} (xi AND yi)

Theorem [Kalyanasundaram-Schnitger 88]: RCC(Disj) ≥ Ω(m)
Reduction from Disj to DE

U = {1,2,…,m}
X = 2^U
Disj: X × X → {0,1}

Note: DE(x∘y) = |x ∪ y|. Hence:
  If x ∩ y = ∅, then DE(x∘y) = |x| + |y|
  If x ∩ y ≠ ∅, then DE(x∘y) < |x| + |y|

Therefore: a randomized data stream algorithm that computes DE exactly with space S ⇒ a 1-way randomized protocol for Disj with cost S + O(log m) (Alice also sends |x|, which takes O(log m) bits, so Bob can compare DE(x∘y) to |x| + |y|)

Conclusion: RDS(DE) ≥ Ω(m)
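The set identity behind this reduction, checked in a few lines (illustrative only):

```python
x, y = {1, 2, 3}, {3, 4}                      # intersecting sets
assert len(set(list(x) + list(y))) == len(x | y) < len(x) + len(y)

x, y = {1, 2}, {3, 4}                         # disjoint sets
assert len(set(list(x) + list(y))) == len(x) + len(y)
```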
Information Theory Primer

X: random variable on U
H(X) = entropy of X = amount of "uncertainty" in X (in bits)
  Ex: if X is uniform, then H(X) = log(|U|)
  Ex: if X is constant, then H(X) = 0

H(X | Y) = conditional entropy of X given Y = amount of uncertainty left in X after knowing Y
  Ex: H(X | X) = 0
  Ex: if X,Y are independent, H(X | Y) = H(X)

I(X ; Y) = H(X) – H(X | Y) = H(Y) – H(Y | X) = mutual information between X and Y

[Figure: Venn diagram of H(X,Y), split into H(X|Y), I(X;Y), and H(Y|X); H(X) and H(Y) are the two overlapping circles.]
Information Theory Primer (cont.)

Sub-additivity of entropy: H(X,Y) ≤ H(X) + H(Y), with equality iff X,Y are independent.

Conditional mutual information: I(X ; Y | Z) = H(X | Z) – H(X | Y,Z)
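All of these quantities are easy to compute for a small explicit joint distribution; a sketch (the probability table is made up for illustration):

```python
from math import log2
from collections import defaultdict

# A made-up joint distribution of (X, Y).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def H(dist):
    """Shannon entropy, in bits."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, axis):
    d = defaultdict(float)
    for xy, p in joint.items():
        d[xy[axis]] += p
    return d

HX, HY, HXY = H(marginal(joint, 0)), H(marginal(joint, 1)), H(joint)
I = HX + HY - HXY     # I(X;Y) = H(X) - H(X|Y) = H(X) + H(Y) - H(X,Y)
print(f"H(X)={HX:.3f}  H(Y)={HY:.3f}  H(X,Y)={HXY:.3f}  I(X;Y)={I:.3f}")
assert HXY <= HX + HY + 1e-9    # sub-additivity of entropy
```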
Information Complexity

g: A × B → C: a function
Π: a communication complexity protocol that computes g
μ: a distribution on A × B
(X,Y): random variable with distribution μ

Information cost of Π: icostμ(Π) = I(X,Y ; Π(X,Y))
Information complexity of g: ICμ(g) = min{ icostμ(Π) : Π computes g }

Lemma: For any μ, RCC(g) ≥ ICμ(g)
(A transcript of c bits carries at most c bits of information: I(X,Y ; Π(X,Y)) ≤ H(Π(X,Y)) ≤ cost(Π).)
Direct Sum for Information Complexity

We want to:
  Find a distribution μ on {0,1}^m × {0,1}^m
  Show that ICμ(Disj) ≥ Ω(m)

Recall that Disj(x,y) = OR_{i=1..m} (xi AND yi)

We will prove a "direct sum" theorem for ICμ(Disj): Disj is a "sum" of m independent copies of AND, hence the information complexity of Disj is m times the information complexity of AND.

We will define a distribution ν on {0,1} × {0,1} and then define μ = ν^m.

"Theorem" [Direct Sum]: ICμ(Disj) ≥ m · ICν(AND)
It would then suffice to prove an Ω(1) lower bound on ICν(AND).
Conditional Information Complexity

The direct sum cannot be proven directly for information complexity; a conditional variant is needed. Recall:
  μ: distribution on {0,1}^m × {0,1}^m
  (X,Y): random variable with distribution μ
  (X,Y) is product if X,Y are independent
  Z: some auxiliary random variable on a domain S
  (X,Y) is product conditioned on Z if for every z ∈ S, X,Y are independent conditioned on the event { Z = z }

Conditional information complexity of g given Z:
  CICμ(g | Z) = min{ I(X,Y ; Π(X,Y) | Z) : Π computes g }

Lemma: For any μ and Z, ICμ(g) ≥ CICμ(g | Z)
Input Distributions for Set Disjointness

ν: distribution on pairs in {0,1} × {0,1}
(U,V): random variable with distribution ν
(U,V) is generated as follows:
  Choose a uniform bit D
  If D = 0, choose U uniformly from {0,1} and set V = 0
  If D = 1, choose V uniformly from {0,1} and set U = 0

We define μ = ν^m

Note:
  ν is not product
  Conditioned on D, ν is product (and conditioned on Z = D^m, μ is product)
  For every (x,y) in the support of μ, Disj(x,y) = 0
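A quick sanity check of these properties by sampling (illustrative code):

```python
import random
from collections import Counter

def sample_nu(rng):
    """One draw of (D, (U, V)) from the distribution nu."""
    d = rng.randint(0, 1)
    if d == 0:
        return d, (rng.randint(0, 1), 0)   # U uniform, V = 0
    return d, (0, rng.randint(0, 1))       # V uniform, U = 0

rng = random.Random(0)
samples = [sample_nu(rng) for _ in range(100_000)]

# nu is not product: P(U=1) = P(V=1) = 1/4, yet P(U=1, V=1) = 0.
counts = Counter(uv for _, uv in samples)
print(counts[(1, 0)], counts[(0, 1)], counts[(1, 1)])   # ~25000, ~25000, 0

# On nu's support, U AND V = 0 in every coordinate, so any (x, y)
# in the support of mu = nu^m has Disj(x, y) = 0.
assert all(u & v == 0 for _, (u, v) in samples)
```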
Direct Sum for IC [Bar-Yossef, Jayram, Kumar, Sivakumar 02]

Theorem: CICμ(Disj | D^m) ≥ m · CICν(AND | D)

Proof outline:
  Decomposition step: I(X,Y ; Π(X,Y) | D^m) ≥ Σi I(Xi,Yi ; Π(X,Y) | D^m)
  Reduction step: I(Xi,Yi ; Π(X,Y) | D^m) ≥ CICν(AND | D)
Decomposition Step

I(X,Y ; Π(X,Y) | D^m)
  = H(X,Y | D^m) – H(X,Y | Π(X,Y), D^m)
  ≥ Σi H(Xi,Yi | D^m) – Σi H(Xi,Yi | Π(X,Y), D^m)
    (by independence of (X1,Y1),…,(Xm,Ym) and by sub-additivity of entropy)
  = Σi I(Xi,Yi ; Π(X,Y) | D^m)
Reduction Step

Want to show: I(Xi,Yi ; Π(X,Y) | D^m) ≥ CICν(AND | D)

I(Xi,Yi ; Π(X,Y) | D^m) = Σ_{d-i} Pr(D-i = d-i) · I(Xi,Yi ; Π(X,Y) | Di, D-i = d-i)

A protocol for computing AND(xi,yi):
  For all j ≠ i, Alice and Bob select Xj and Yj independently using dj (given Dj = dj, one of Xj,Yj is fixed to 0 and the other is a private uniform bit, so no communication is needed)
  Alice and Bob run the protocol Π on X = (X1,…,Xi-1, xi, Xi+1,…,Xm) and Y = (Y1,…,Yi-1, yi, Yi+1,…,Ym)
  Note that Disj(X,Y) = AND(xi,yi), since every coordinate j ≠ i satisfies Xj AND Yj = 0

icost of this protocol = I(Xi,Yi ; Π(X,Y) | Di, D-i = d-i)
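A sketch of this embedding in code (the function names and the stand-in for Π are illustrative assumptions; the point is that filling in the other coordinates from d-i turns any Disj protocol into an AND protocol):

```python
import random

def fill_coordinate(d_j, rng):
    """Sample (X_j, Y_j) from nu conditioned on D_j = d_j.

    One side is pinned to 0, so Alice and Bob can do this privately
    (Alice samples X_j, Bob samples Y_j) with no communication.
    """
    if d_j == 0:
        return rng.randint(0, 1), 0
    return 0, rng.randint(0, 1)

def embed_and_into_disj(x_i, y_i, i, d_minus_i, rng=random.Random(1)):
    pairs = [fill_coordinate(d_j, rng) for d_j in d_minus_i]
    pairs.insert(i, (x_i, y_i))              # plant the real inputs at i
    X = [u for u, _ in pairs]
    Y = [v for _, v in pairs]
    disj = any(u & v for u, v in zip(X, Y))  # stand-in for running Pi on (X, Y)
    return int(disj)

# Every other coordinate contributes 0, so Disj(X,Y) = AND(x_i, y_i):
d_minus_i = [0, 1, 1, 0, 1]
for a in (0, 1):
    for b in (0, 1):
        assert embed_and_into_disj(a, b, i=2, d_minus_i=d_minus_i) == (a & b)
```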