The Communication and Streaming Complexity of Computing the Longest Common and Increasing...

Post on 27-Mar-2015


The Communication and Streaming Complexity of Computing the Longest Common and Increasing Subsequences

Xiaoming Sun Tsinghua University

David Woodruff MIT

The Problem

• Stream of elements a_1, …, a_n ∈ Σ

• Algorithm given one pass over stream

• Problem: Compute the longest increasing subsequence (LIS) – in this case answer is (3,7)

0113734

Previous Work

• Let k be the length of the LIS of the stream

• There exists an algorithm which computes the LIS with O(k^2 log |Σ|) space [LNVZ05]

• Trivial Ω(k) lower bound

• Our first result: Improve both bounds to a tight Θ(k^2 log(|Σ|/k))

Our Lower Bound

Reduction from the indexing function:

• Alice holds x ∈ {0,1}^n; Bob holds i ∈ [n] = {1, 2, …, n}

• Bob must answer: what is x_i?

• Randomized 1-way communication is Ω(n)

• Alice: x ∈ {0,1}^n, constructs a stream A

• Bob: i ∈ [n] = {1, 2, …, n}, constructs a stream B

• What is x_i?

1. From LIS(A, B), Bob can get x_i

2. |LIS(A, B)| = k, where k is an input parameter

Alice

Alice uses x to create k-1 increasing sequences A_1, …, A_{k-1}

For each j, A_j has length j. Each bit of x is encoded in some sequence A_j

Every element in A_{k-1} is larger than every element in A_{k-2}, every element in A_{k-2} is larger than every element in A_{k-3}, etc.

Set A = A_{k-1}, …, A_2, A_1

[Figure: Alice's stream A, built from x ∈ {0,1}^n, plotted as value against position in stream; the blocks A_1, A_2, …, A_{k-1} are marked]

Bob

i ∈ [n]

Bob uses i to recover the index j of A_j, the sequence encoding x_i

Bob creates an increasing sequence B of length k-j

Every element in B is greater than every element of A_r if r < j, and less than every element of A_r if r > j

[Figure: Bob's stream B plotted as value against position in stream, shown relative to A_{j-1}, A_j, and A_{j+1}]

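• Alice: x ∈ {0,1}^n. Bob: i ∈ [n]. What is x_i?

• The combined stream is A followed by B: A_{k-1}, …, A_2, A_1, B

[Figure: the combined stream plotted as value against position in stream, with A_{j-1}, A_j, A_{j+1}, and B marked]

LIS(A, B) = A_j, B, and |LIS(A, B)| = k

But x_i is encoded in A_j, so Bob recovers x_i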
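The reduction can be checked on small instances. The sketch below is illustrative only: the block layout (block j occupies the values [j·width, (j+1)·width), with A_j drawn from the lower half of its block and B placed in the upper half) and the names `alice_stream`, `bob_stream`, `lis`, and `width` are assumptions, not taken from the slides. It does not model how the bits of x are packed into the A_j; it only verifies that LIS(A, B) = A_j, B.

```python
import bisect
import random

def lis(stream):
    """Return one longest strictly increasing subsequence (patience sorting)."""
    tails = []      # tails[l]: smallest last value of an increasing subsequence of length l+1
    tail_idx = []   # position in the stream of tails[l]
    prev = [-1] * len(stream)
    for i, x in enumerate(stream):
        pos = bisect.bisect_left(tails, x)
        if pos == len(tails):
            tails.append(x)
            tail_idx.append(i)
        else:
            tails[pos] = x
            tail_idx[pos] = i
        prev[i] = tail_idx[pos - 1] if pos > 0 else -1
    out = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        out.append(stream[i])
        i = prev[i]
    return out[::-1]

def alice_stream(k, width, rng):
    """Assumed layout: A_j is a random increasing sequence of length j
    in the lower half of block j. Requires width >= 2 * k."""
    blocks = {}
    for j in range(1, k):
        lo = j * width
        blocks[j] = sorted(rng.sample(range(lo, lo + width // 2), j))
    # A = A_{k-1}, ..., A_2, A_1: higher blocks come first in the stream.
    stream = [v for j in range(k - 1, 0, -1) for v in blocks[j]]
    return blocks, stream

def bob_stream(k, width, j):
    """Assumed layout: B has length k - j and sits in the upper half of block j,
    so it is above A_r for r <= j and below A_r for r > j."""
    lo = j * width + width // 2
    return list(range(lo, lo + (k - j)))

rng = random.Random(0)
k, width = 6, 20
blocks, A = alice_stream(k, width, rng)
for j in range(1, k):
    B = bob_stream(k, width, j)
    assert lis(A + B) == blocks[j] + B   # Bob reads A_j off the LIS
```

Because the blocks appear in decreasing order of value, an increasing subsequence can use at most one A_r, and only A_j extends by all of B to reach length k.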

• Thus, any streaming algorithm must use Ω(n) space.

• But what is n? We need to construct k-1 increasing sequences that are different for different x in {0,1}^n

• Assume |Σ| is large. Divide Σ into k-1 blocks of size |Σ|/(k-1)

• Let A_j be a random increasing sequence of length j in block j.

• The space to represent A_j is Ω(k log(|Σ|/k)) for j > k/2

• Set n = Θ(k^2 log(|Σ|/k)).
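One way to see where the count comes from is sketched below. It is a reconstruction, not the slides' own derivation: it assumes logs are base 2, writes m for the block size, and reads "|Σ| large" as |Σ| ≥ k^3.

```latex
% Assumptions (not from the slides): m = |\Sigma|/(k-1), logs base 2, |\Sigma| \ge k^3.
\begin{align*}
\#\{\text{choices for } A_j\} &= \binom{m}{j} \ge \left(\frac{m}{j}\right)^{j},
  \qquad m = \frac{|\Sigma|}{k-1}, \\
\log \binom{m}{j} &\ge j \log \frac{|\Sigma|}{(k-1)\,j}
  \ge \frac{k}{2} \log \frac{|\Sigma|}{k^{2}}
  \qquad \text{for } k/2 < j \le k-1, \\
\log \frac{|\Sigma|}{k^{2}} &\ge \frac{1}{2} \log \frac{|\Sigma|}{k}
  \qquad \text{when } |\Sigma| \ge k^{3}, \\
\sum_{k/2 < j \le k-1} \log \binom{m}{j} &= \Omega\!\left(k^{2} \log \frac{|\Sigma|}{k}\right).
\end{align*}
```

The sum has about k/2 terms, each of size Ω(k log(|Σ|/k)), which matches the value of n set above.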

Our Upper Bound

• When processing the stream, keep lists A[1], A[2], …, A[k].

• A[j] is an increasing subsequence of length j in the stream seen so far, with minimal last element.

• Let L[1], L[2], …, L[k] be the last elements of A[1], A[2], …, A[k]

• To process item x, find i for which L[i] < x < L[i+1], and replace A[i+1] with A[i], x

• So we have k arrays A[1], …, A[k], each of length at most k.

• Naively, this takes O(k^2 log |Σ|) space.

• But the A[i] are increasing, so we can compress each list by storing differences.

• Total space is O(k^2 log(|Σ|/k)).
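The update rule above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, lists are 0-indexed (so `lists[j-1]` plays the role of A[j]), and the difference encoding is shown as a plain list of gaps rather than a bit-packed representation, so it does not by itself achieve the stated space bound.

```python
import bisect
from itertools import accumulate

def streaming_lis(stream):
    """One pass over the stream; returns a longest strictly increasing subsequence."""
    lists = []   # lists[j-1] ~ A[j]: increasing subsequence of length j, minimal last element
    lasts = []   # lasts[j-1] ~ L[j]: last element of A[j]
    for x in stream:
        i = bisect.bisect_left(lasts, x)       # number of L[j] strictly below x
        if i < len(lasts) and lasts[i] == x:
            continue                           # x equals L[i+1]: nothing improves
        candidate = (lists[i - 1] if i > 0 else []) + [x]   # A[i], x
        if i == len(lists):
            lists.append(candidate)            # a longer subsequence than any seen so far
            lasts.append(x)
        else:
            lists[i] = candidate               # same length, smaller last element
            lasts[i] = x
    return lists[-1] if lists else []

def delta_encode(seq):
    """Store the first element and then the gaps between consecutive elements."""
    if not seq:
        return []
    return [seq[0]] + [b - a for a, b in zip(seq, seq[1:])]

def delta_decode(deltas):
    """Invert delta_encode by taking prefix sums."""
    return list(accumulate(deltas))

result = streaming_lis([5, 1, 2, 9, 3, 4])
assert result == [1, 2, 3, 4]
assert delta_decode(delta_encode(result)) == result
```

For a list of at most k increasing values from Σ, the gaps sum to at most |Σ|, which is what makes a compact encoding of the gaps possible.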

This talk

• First result: a tight space bound for the LIS problem

• Second result: tight bounds for longest common subsequence (LCS)

LCS Bounds

• Problem: Alice has a permutation π of [N], Bob has a permutation σ of [N]. Decide if |LCS(π, σ)| ≥ k.

• Previous space bound: Ω(k) [LNVZ05]

• Our space bound: Ω(N) for 3 ≤ k ≤ N/2

(holds for randomized O(1)-pass algorithms)

LCS Bounds

• Why can we only prove Ω(N) for 3 ≤ k ≤ N/2?

• If k = 2, reduces to equality test.

• If k is large, there are at most O(N^{2(N-k)}) permutations σ with |LCS(π, σ)| > k, so just use an equality test with error O(1/N^{2(N-k)})

Our Lower Bound

• Padding lemma: if for k = 3 the randomized communication complexity is Ω(N), then it's Ω(N) for all k ≤ N/2

• Proof: just pad each of the inputs by some common subsequence of length k-3

It remains to show high complexity for k = 3. We reduce from disjointness:

• Alice holds x ∈ {0,1}^n; Bob holds y ∈ {0,1}^n

• Is there an i such that x_i = y_i = 1?

• Randomized multi-way communication is Ω(n)

• Alice: x ∈ {0,1}^{N/3}, constructs π

• Bob: y ∈ {0,1}^{N/3}, constructs σ

• Want |LCS(π, σ)| ≥ 3 iff x and y are not disjoint, i.e., iff there is an i such that x_i = y_i = 1

Alice

x ∈ {0,1}^{N/3}

Divide 1, …, N into N/3 groups G_1 = (1, 2, 3), G_2 = (4, 5, 6), …, G_{N/3} = (N-2, N-1, N).

Use x to choose π_1, …, π_{N/3}

π_i acts on G_i = (m+1, m+2, m+3)

If x_i = 0, π_i(m+1, m+2, m+3) = (m+1, m+2, m+3).

If x_i = 1, π_i(m+1, m+2, m+3) = (m+1, m+3, m+2).

π = π_1, π_2, …, π_{N/3}

Bob

y ∈ {0,1}^{N/3}

Divide 1, …, N into N/3 groups G_1 = (1, 2, 3), G_2 = (4, 5, 6), …, G_{N/3} = (N-2, N-1, N).

Use y to choose σ_1, …, σ_{N/3}

σ_i acts on G_i = (m+1, m+2, m+3)

If y_i = 0, σ_i(m+1, m+2, m+3) = (m+3, m+2, m+1).

If y_i = 1, σ_i(m+1, m+2, m+3) = (m+1, m+3, m+2).

σ = σ_{N/3}, …, σ_1

π: π_1(G_1), π_2(G_2), π_3(G_3), …, π_{N/3}(G_{N/3})

σ: σ_{N/3}(G_{N/3}), …, σ_3(G_3), σ_2(G_2), σ_1(G_1)

Claim: |LCS(π, σ)| ≤ 3.

Proof: Use the fact that LCS(π, σ) intersects at most one G_i

Claim: |LCS(π, σ)| = 3 iff there is some i with x_i = y_i = 1

Proof: Use the way we defined π_i and σ_i

Thus, can decide disjointness, so Ω(N) communication.
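Both claims can be checked exhaustively for small N. The sketch below builds Alice's permutation (called π here) and Bob's permutation (called σ) from the group rules above and compares the LCS length against the disjointness answer. The function names and the quadratic-time LCS routine are illustrative choices, not part of the talk.

```python
from itertools import product

def alice_perm(x):
    """pi: groups in order G_1, ..., G_{N/3}; group i is kept or has its last two swapped."""
    perm = []
    for i, bit in enumerate(x):
        m = 3 * i
        perm += [m + 1, m + 3, m + 2] if bit else [m + 1, m + 2, m + 3]
    return perm

def bob_perm(y):
    """sigma: groups in reverse order G_{N/3}, ..., G_1; group i is reversed
    when y_i = 0 and has its last two swapped when y_i = 1."""
    perm = []
    for i in reversed(range(len(y))):
        m = 3 * i
        perm += [m + 1, m + 3, m + 2] if y[i] else [m + 3, m + 2, m + 1]
    return perm

def lcs_length(a, b):
    """Textbook dynamic program for the LCS length, one row at a time."""
    prev = [0] * (len(b) + 1)
    for u in a:
        cur = [0]
        for j, v in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if u == v else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

# Exhaustive check for N = 9 (three groups).
for x in product([0, 1], repeat=3):
    for y in product([0, 1], repeat=3):
        length = lcs_length(alice_perm(x), bob_perm(y))
        intersect = any(xi and yi for xi, yi in zip(x, y))
        assert length <= 3
        assert (length == 3) == intersect
```

Within a single group the four cases give LCS lengths 1 (x_i = y_i = 0), 2 (exactly one bit set), and 3 (x_i = y_i = 1), and reversing the group order prevents a common subsequence from spanning two groups.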

Other results

• Tight space bounds for computing the LIS length.

• Generalization to approximate LIS and LCS. Still many gaps here.

• Example: for approximating the LIS length, we have Ω(1/ε) and O(k log |Σ|). Recent work [GJKK07] has shown O(sqrt(N/ε) log |Σ|), but there is still a large gap.

Conclusion

• First result: a tight bound for the LIS

• Second result: an Ω(N) space bound for the LCS k-decision problem for 3 ≤ k ≤ N/2

• Other results for approximation problems

• Another open question: extend our lower bound for LIS to randomized multi-round