
On the streaming complexity of fundamental geometric problems

Arijit Bishnu ∗ Arijit Ghosh ∗ Gopinath Mishra ∗ Sandeep Sen †

March 20, 2018

Abstract

In this paper, we focus on lower bounds and algorithms for some basic geometric problems in the one-pass (insertion only) streaming model. The problems considered are grouped into three categories:

(i) Klee’s measure

(ii) Convex body approximation, geometric query, and

(iii) Discrepancy

Klee’s measure is the problem of finding the area of the union of hyperrectangles. Under convex body approximation, we consider the problems of convex hull, convex body approximation, and linear programming (LP) in fixed dimensions. The results for convex body approximation imply a property testing type result to find if a query point lies inside a convex polyhedron. Under discrepancy, we consider both the geometric and combinatorial discrepancy. For all the problems considered, we present (randomized) lower bounds on space. Most of our lower bounds are in terms of approximating the solution with respect to an error parameter ε. We provide approximation algorithms that closely match the lower bound on space for most of the problems.

1 Introduction

A data stream P = p1, . . . , pn is a sequence of data that can be read in increasing order of its indices i (i = 1, . . . , n) in one or more passes. In this paper, we consider the one-pass, insertion only streaming model. For us, P will typically be a set of points in R^d. Only a sketch S, that is either a subset of P or some information derived from it, can be stored; |S| ≪ |P|. As a machine model, streaming has just the bare essentials. Thus, impossibility results, in terms of lower bounds on the sketch size, become important. The seminal work of Alon et al. [AMS99] introduced the idea of lower bounds on space for approximating frequency moments. The focus on massive data applications has generated a lot of interest in streaming algorithms and related lower bounds [GM12, Mut05, Rou16, Woo04]. In this paper, we try to push the frontiers of streaming in computational geometry by addressing fundamental problems like Klee’s measure, convex body approximation, discrepancy, etc., both in terms of lower bounds and matching algorithms. We also consider promise problems and property testing kind of results for some problems.

1.1 Our computational model and notations

We will deal with points in R^d that can be represented as rationals with bounded bit precision. Thus, any point in our stream P comes from the universe [N]^d, where [x] denotes {1, . . . , x}. Computations take place in a word RAM whose word size can hold a point, any input parameter and a ⌈log n⌉-bit counter. The precision of the intermediate data generated is within a constant factor of the word size. The size of the stream |P| = n is not known beforehand, but standard techniques allow us to assume, without loss of generality, that it is.

∗ Indian Statistical Institute, Kolkata, India
† Indian Institute of Technology Delhi, India

arXiv:1803.06875v1 [cs.CG] 19 Mar 2018

Let [p, q] denote an interval between p and q in R and |[p, q]| its length. For a problem P, let O and O′ be the optimal and an algorithm-generated solution, respectively. By ε-additive and ε-multiplicative solutions to P, we mean |O′ − O| ≤ ε and |O′ − O| ≤ εO, respectively. A convex body K is said to be ε-approximated by a convex body K′ if d_H(K, K′) ≤ ε, where d_H(·, ·) denotes the Hausdorff distance. In our context, the diameter of K is bounded by 1. By an (ε, ρ)-additive solution, we mean an ε-additive solution that succeeds with probability at least 1 − ρ. Similarly, we define an (ε, ρ)-multiplicative solution. Also, by an (ε, ρ)-approximate solution, we mean an ε-approximate solution that succeeds with probability at least 1 − ρ. Typically ρ = 1/3, unless stated otherwise.

In almost all the problems considered in this paper, the optimal solution lies in [0, 1] and we provide lower bound results for ε-additive solutions. Note that in such cases, the lower bound results for ε-additive solutions are relatively stronger than their ε-multiplicative counterparts. We also provide one-pass algorithms for ε-additive solutions which can be converted into multi-pass algorithms for ε-multiplicative solutions. This can be achieved easily because of the following theorem. Note that upper bound results for ε-multiplicative solutions are relatively stronger than their ε-additive counterparts.

Theorem 1. Let P be a problem whose optimal solution is O ∈ (0, 1]. Let A be a one-pass algorithm that gives an ε-additive (ε ∈ (0, 1)) solution to P using space S(ε). We can design an $O\left(\log\frac{1}{O}\right)$-pass algorithm A′ that gives an ε-multiplicative solution to P and uses space S(ε · O).

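Proof. We design a p-pass algorithm A′ by invoking A repeatedly. We fix p later. In the k-th pass, k ∈ [p], we do the following.

We find an ε_k-additive solution to P using A, where $\varepsilon_k = \varepsilon/2^k$. Let O_k be the corresponding output. Note that |O_k − O| ≤ ε_k. Assuming $O_k \ge \frac{1}{2^k} + \varepsilon_k$, we can deduce the following.

\begin{align*}
O_k &\ge \frac{1}{2^k} + \varepsilon_k \\
O &\ge \frac{1}{2^k} && (\because\ |O_k - O| \le \varepsilon_k) \\
\varepsilon \cdot O &\ge \varepsilon_k && (\because\ \varepsilon_k = \varepsilon/2^k) \\
|O_k - O| &\le \varepsilon \cdot O && (\because\ |O_k - O| \le \varepsilon_k)
\end{align*}

If $O_k \ge \frac{1}{2^k} + \varepsilon_k$ holds in the k-th pass, then we return O_k as the ε-multiplicative solution to P.

Recall that A′ is a p-pass algorithm. So, $O_p \ge \frac{1}{2^p} + \varepsilon_p$ and $O_k < \frac{1}{2^k} + \varepsilon_k$ for each 1 ≤ k < p.

Now we show that for $p = \left\lceil \log\frac{1 + 2\varepsilon}{O} \right\rceil$, $O_p \ge \frac{1}{2^p} + \varepsilon_p$ holds.

\begin{align*}
p &\ge \log\frac{1 + 2\varepsilon}{O} \\
O &\ge \frac{1 + 2\varepsilon}{2^p} \\
O &\ge \frac{1}{2^p} + 2\varepsilon_p \\
O_p &\ge \frac{1}{2^p} + \varepsilon_p
\end{align*}

Note that S decreases with increase of ε and hence, the space used by A′ is max(S(ε_1), . . . , S(ε_p)) = S(ε_p). Observe that ε_p ≥ ε · O. Thus, the space used by A′ is bounded by S(ε · O).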
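The pass structure in the proof of Theorem 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `run_pass` is a hypothetical callable standing in for one pass of the additive algorithm A over the stream, and `max_passes` is a safety cap that is not part of the theorem.

```python
import math


def multiplicative_from_additive(run_pass, eps, max_passes=64):
    """Return (estimate, passes_used) with |estimate - O| <= eps * O.

    run_pass(eps_k) is a hypothetical stand-in for one pass of the additive
    algorithm A; it must return O_k with |O_k - O| <= eps_k.
    """
    for k in range(1, max_passes + 1):
        eps_k = eps / 2 ** k          # error budget of the k-th pass
        o_k = run_pass(eps_k)         # one pass over the stream
        # Stopping rule from the proof: O_k >= 1/2^k + eps_k.
        if o_k >= 1 / 2 ** k + eps_k:
            return o_k, k
    raise RuntimeError("optimum too small for the chosen max_passes")


def pass_bound(eps, optimum):
    """Number of passes p = ceil(log2((1 + 2*eps) / O)) used in the proof."""
    return math.ceil(math.log2((1 + 2 * eps) / optimum))


if __name__ == "__main__":
    true_value = 0.03
    # Adversarial additive solver: always errs low by the full budget.
    estimate, passes = multiplicative_from_additive(
        lambda e: true_value - e, eps=0.5)
    print(estimate, passes, pass_bound(0.5, true_value))
```

The loop stops at the first pass whose output clears the threshold, which is why the number of passes depends on the unknown optimum O rather than on a value fixed in advance.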

1.2 Our contributions and previous results

All lower bounds discussed are randomized lower bounds. In this paper, the term hardness implies that without a sketch size |S| = Ω(n), we cannot solve that problem. The problem statements and results follow.

Klee’s measure

• (Klee’s measure) Given a set of streaming axis-parallel hyperrectangles R = {R_1, . . . , R_n} in R^d, the Klee’s measure problem is to find the volume V of $\bigcup_{i=1}^{n} R_i$. We show that any (ε, ρ)-additive solution for Klee’s measure requires a space of $\Omega\left(\frac{1}{\varepsilon} + \log n\right)$ bits even for d = 1. We give an (ε, ρ)-additive solution for Klee’s measure by using $O\left(\frac{1}{\varepsilon^2}\log\frac{1}{\rho}\right)$ space for any constant d. For δ-fat¹ hyperrectangles, we provide a deterministic algorithm that uses $O\left(\frac{1}{\varepsilon^{d-1}\delta^{d}}\right)$ space and gives an ε-additive solution to Klee’s measure for constant dimension d.

The problem of Klee’s measure was first posed by V. Klee [Kle77] in 1977. Since then, there has been a series of works on Klee’s measure [Ben77, Cha10, Cha13, OY91, vLW81] in the RAM model. The best known algorithmic result in the RAM model, a time complexity of $O(n^{d/2})$, is by Chan [Cha13]. We highlight that Klee’s measure has connections to estimating F0, the number of distinct elements in a stream. A brief exposition on this is given in Section 2.1. A discrete version of Klee’s measure was studied for streaming in [TW12] followed by [SBV+15]. In both [TW12] and [SBV+15], we have a set of points in Z^d as a discrete universe U and a stream R of rectangles. The objective is to report the number of points of U present inside $\bigcup_{i=1}^{n} R_i$. Tirthapura and Woodruff [TW12] gave an (ε, ρ)-multiplicative solution to Klee’s measure that uses a space of $O\left(\left(\frac{1}{\varepsilon}\log(nU)\right)^{O(1)}\right)$. In [SBV+15], Sharma et al. improved the above result for a stream of two dimensional fat² rectangles using a space of $O\left(\frac{\log U}{\varepsilon^2}\right)$ bits. They also gave an algorithm for a stream of arbitrary rectangles that gives an output which is at most $\sqrt{\log U}$ times the optimal solution and uses a space of $O(\log^2 U \log\log U)$ bits. Note that the space complexity in [SBV+15] is independent of n. Also, the algorithms by Sharma et al. [SBV+15] have better update time per rectangle than that of Tirthapura and Woodruff [TW12].

We give the first lower bound for approximating Klee’s measure in R. Our algorithms for Klee’s measure work for the original setting of Klee’s measure in R^d, as opposed to the discrete versions in [SBV+15, TW12]. Our randomized algorithm assumes nothing about the input rectangles, but the deterministic streaming algorithm assumes a fatness condition.

Convex body approximation and geometric query

• (convex-body) Let CH(P) denote the convex hull of P. We strengthen the lower bound results by showing the promise version of convex hull to be hard, i.e., it is hard to distinguish between inputs having |CH(P)| = O(1) and |CH(P)| = Ω(n) in R^2.

Problems of convex body approximation require storing the approximated convex body. We show a space lower bound of $\Omega\left(\frac{1}{\sqrt{\varepsilon}}\right)$ bits for an (ε, ρ)-approximate solution to convex body approximation in R^2. To the best of our knowledge, this is the first data structure lower bound for convex body approximation. We design an ε-approximate solution for convex body approximation using a space of $O\left(\log^{d} n/\varepsilon^{\frac{d-1}{2}}\right)$ for a fixed dimension d, when it is given to us as a stream of hyperplanes. This implies an ε-additive one-pass deterministic streaming algorithm for low dimensional LP.

¹ A hyperrectangle R ⊂ [0, 1]^d is δ-fat if the length of each side is at least δ ∈ (0, 1].
² Here fat rectangle means $\frac{1}{C} \le \frac{\text{width}}{\text{height}} \le C$, for some constant C > 1.

• (geom-query-conv-poly) The decision problem is to detect if a convex polyhedron C, given as an input stream of at most n hyperplanes in R^d, contains a query point q given at the end. We show that this problem is hard and then obtain a property testing type result using the convex body approximation result.

In [CC07], Chan et al. gave multi-pass algorithms for computing the exact convex hull in R^2 and R^3 along with nearly matching lower bounds for some special class of deterministic algorithms in case of R^2, which was later generalized by Guha et al. [GM08]. Zhang [Zha05] showed a randomized lower bound of Ω(n) bits of space for the k-promise convex hull problem, where one knows beforehand that the convex hull has k points. We strengthen the lower bound results of the promise version of convex hull given in [Zha05] by showing that it is hard to distinguish between inputs having |CH(P)| = O(1) and |CH(P)| = Ω(n) in R^2.

Discrepancy

• (geometric-discrepancy) Given n points P as a stream, where each p_i ∈ [0, 1], the objective is to report D_g(P), the 1-dimensional geometric discrepancy [KN74] of P, defined as

\[ D_g(P) := \sup_{[p,q] \subseteq [0,1]} \left|\, |[p,q]| - \frac{n_{pq}}{n} \,\right|, \]

where n_pq is the number of points in [p, q]. We show that any (ε, ρ)-additive solution to D_g(P) requires a space bound of $\Omega\left(\frac{1}{\varepsilon}\right)$ bits. We present a matching ε-additive deterministic algorithm.

• (color-discrepancy) Given n points P as a stream, where each p_i ∈ [0, 1], and a color label red or blue on each point, the objective is to report the 1-dimensional color discrepancy [Mat99] of P, denoted and defined as

\[ D_c(P) = \sup_{I \subseteq [0,1]} |R(I) - B(I)|, \]

where R(I) and B(I) denote the number of red and blue points of P, respectively, that belong to the interval I. We show that any (ε, ρ)-multiplicative solution to D_c(P) admits a space lower bound of Ω(n) bits, where 0 < ε < 1/5. If P arrives in a sorted order, D_c(P) can be computed in constant space.
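To make the two definitions concrete, here is a small Python sketch. It is not the paper's streaming algorithm for geometric discrepancy: `geometric_discrepancy` is an offline brute-force evaluation of the definition, added only for illustration. `color_discrepancy_sorted` shows one way the constant-space claim for sorted input could be realized (a maximum-subarray scan); it assumes the points have distinct positions and arrive in sorted order, and both function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def geometric_discrepancy(points):
    """Brute-force D_g for points in [0, 1] (offline, O(n^2) time)."""
    xs = sorted(points)
    n = len(xs)
    best = 0.0
    # Too many points for the length: closed intervals ending at data points.
    for i in range(n):
        for j in range(i, n):
            best = max(best, (j - i + 1) / n - (xs[j] - xs[i]))
    # Too few points for the length: intervals whose endpoints approach
    # candidates from the inside, so only strictly interior points count.
    cands = sorted(set([0.0, 1.0] + xs))
    for a_idx, a in enumerate(cands):
        for b in cands[a_idx + 1:]:
            inside = bisect_left(xs, b) - bisect_right(xs, a)
            best = max(best, (b - a) - inside / n)
    return best


def color_discrepancy_sorted(colors):
    """D_c for a position-sorted stream of 'R'/'B' labels, in O(1) space."""
    best = max_end = min_end = 0
    for c in colors:
        v = 1 if c == "R" else -1
        max_end = max(max_end + v, v)   # best red surplus ending here
        min_end = min(min_end + v, v)   # best blue surplus ending here
        best = max(best, max_end, -min_end)
    return best


if __name__ == "__main__":
    print(geometric_discrepancy([0.25, 0.75]))
    print(color_discrepancy_sorted("RRBRR"))
```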

The only work prior to ours considering discrepancy in the streaming model has been the work by Agarwal et al. [AMP+06]. They defined discrepancy in the context of spatial scan statistics and gave lower bounds and algorithmic results with respect to that. We stick to the conventional definition of both geometric and combinatorial discrepancy, and our lower bound results are stronger than those of Agarwal et al. [AMP+06]. We also remark that an (ε, ρ)-additive solution to geometric-discrepancy and an (εn, ρ)-additive solution to color-discrepancy can be found using the algorithm for all quantile estimation [KLL16]; the space required in both cases is $O\left(\frac{1}{\varepsilon}\log\log\frac{1}{\varepsilon}\right)$. Note that the bound achieved by our algorithm for geometric-discrepancy is $O\left(\frac{1}{\varepsilon}\right)$ and we are seeking an (ε, ρ)-multiplicative solution for color-discrepancy.

1.3 A brief review

Historically, the works of Morris [Mor78], Munro and Paterson [MP78] and Flajolet et al. [FM85] were precursors to the work of Alon et al. [AMS99], where the idea of lower bounds on space for approximating frequency moments was considered. Lower bounds in streaming have been an active area of interest since then [HRR98, Mut05, Rou16].

In computational geometry, researchers have looked at one-pass streaming algorithms for fundamental problems like convex hull [HS08, RR15], minimum empty circle [AHV04, AS15, CP14, ZC06], diameter of a point set [AS15, AC14, Cha06, FKZ04, Ind03], other extent measures like width, annulus, bounding box, cylindrical shell [AHV04, AN16, Cha06], clustering [HM04], deterministic ε-net and ε-approximations [BCEG07] and their randomized versions [FIS08], robust statistics on geometric data [BCEG07], geometric queries [BCEG07, BCNS15, STZ06], interval geometry [CP15, EHR12], minimum weight matching, k-medians and Euclidean minimum spanning tree weight [FIS08, Ind04a]. As mentioned in [Ind04b], most of the algorithms for these problems follow either the merge and reduce technique [AHV04, AS15, AC14, Cha06] or low distortion randomized embeddings [FIS08, Ind03, Ind04a]. In two recent works, polynomial methods have been used to find the approximate width of a streaming point set [AN16] and ε-kernels [Cha16] for dynamic streaming (streaming with both insertion and deletion). On the line of the merge and reduce technique, the seminal work of Agarwal et al. [AHV04] on ε-kernels, that is, a coreset for extent measure kind of problems, led to a series of works based on coresets in streaming [AS15, AC14, Cha06, Cha09, CP14, Zar11]. To avoid repetition, we refer the reader to [Cha16] for a nice summary of this line of work.

A streaming algorithm for convex hull in R^2 was considered in [HS08], where by storing 2r + 1 points one can obtain a distance error of O(D/r^2) (D is the diameter of the point set) between the original and the reported convex hull. Another streaming algorithm with an error bound on the area of the convex hull of the given points was proposed in [RR15]. A multi-pass streaming algorithm, as the name suggests, can make more than one pass over the data stream. Convex hull, linear programming (LP) [CC07, GM08] and skyline [SLNX09] problems have been studied under the multi-pass model.

Compared to algorithms for geometric problems in streaming, there has not been much study on lower bounds in streaming. There are mostly two types of lower bound results: the usual space lower bound of streaming [FKZ04, SLNX09] and the trade-off between approximation ratio and space [AS15, BCEG07, ZC06]. Feigenbaum et al. [FKZ04] show that any exact algorithm for computing the diameter of a set of points requires Ω(n) bits of space. Zadeh and Chan [ZC06] proved a lower bound of $(1 + \sqrt{2})/2$ on the approximation factor of any deterministic algorithm for the minimum enclosing ball (MEB) that at any time stores only one enclosing ball. Agarwal and Sharathkumar [AS15] deduce lower bounds using the communication complexity model [KN97, Rou16] by defining α-approximate variants of MEB, diameter, coreset and width. Here α > 1 is a multiplicative parameter on the radius of the MEB, the MEB of the coreset, the diameter and the width of the slab containing the points. Defining these α-approximate variants allows them to deduce lower bounds on space in terms of bits in the communication complexity model and relate approximation bounds to the space by obtaining suitable values of α. All of their lower bound results are in the following framework: any streaming algorithm that maintains an α-MEB, an α-diameter, an α-coreset or an α-width for a set of points in R^d, for α < C (where C is a constant depending on d and is different for each of the problems), with probability at least 2/3 requires $\Omega(\min\{n, \exp(d^{1/3})\})$ bits of storage. Apart from the above, Bagchi et al. [BCEG07] showed that it is not possible to approximate the range counting problem in polylogarithmic space.

All our lower bounds will be stated in number of bits, which is consistent with the streaming model, whereas the upper bounds will be stated in number of words. The lower bounds will be based on communication complexity arguments by using the results on the Index and Disj problems. In any instance of the Index problem, Alice has x ∈ {0, 1}^n and Bob has an integer (index) i. The goal is to compute the i-th bit of x, i.e., x_i. We say Index(x, i) = 1 if and only if x_i = 1. In an instance of the Disj problem, both Alice and Bob have bit vectors x, y ∈ {0, 1}^n. The goal is to determine whether there exists i ∈ [n] such that x_i = y_i = 1. We say Disj(x, y) = 0 if and only if there exists i ∈ [n] such that x_i = y_i = 1. Index is hard in one-way communication complexity and Disj is hard in two-way communication complexity. The following, stated as a theorem, will be useful for us.

Theorem 2. [KN97, Rou16]

(a) Every randomized one-way protocol that solves the Index problem with probability at least 1 − ρ uses Ω(n) bits of communication.

(b) Every randomized two-way protocol that solves the Disj problem with probability at least 1 − ρ uses Ω(n) bits of communication.
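For readers who prefer code to notation, the two functions can be written out directly. This is only a restatement of the definitions above as Python (the names `index_fn` and `disj` are chosen here for illustration); it says nothing about communication cost, which is what Theorem 2 is about.

```python
def index_fn(x, i):
    """Index(x, i): the i-th bit of x, with i counted from 1 as in the text."""
    return x[i - 1]


def disj(x, y):
    """Disj(x, y) = 0 iff some position holds a 1 in both x and y."""
    return 0 if any(a == 1 and b == 1 for a, b in zip(x, y)) else 1


if __name__ == "__main__":
    print(index_fn([0, 1, 1, 0, 1], 2))
    print(disj([0, 1, 1, 0, 1], [1, 0, 0, 0, 1]))
    print(disj([1, 0, 0, 0, 1], [0, 1, 0, 1, 0]))
```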

2 KLEE’S MEASURE

This section begins with a discussion that shows the importance of Klee’s measure in the sense that it is related to F0 estimation. Next, we present lower bounds along with randomized and deterministic algorithms.

2.1 Connection of Klee’s measure to F0 estimation

Let us consider a stream R, where each R_i ⊆ [N]^d, i.e., corners of each rectangle have integer coordinates. Recall that Klee’s measure of $\bigcup_{i=1}^{n} R_i$ is denoted as V. Our objective is to report the estimate $\hat{V}$ of the volume of $\bigcup_{i=1}^{n} R_i$ such that $|\hat{V} - V| \le \varepsilon V$. The following result will be of importance.

Let F0 be the number of distinct elements present in a stream such that each element in the stream is from a universe U. Then, there exists a one-pass randomized streaming algorithm ALG that finds F0′ such that |F0 − F0′| ≤ εF0 and uses $O\left(\frac{1}{\varepsilon^2} + \log|U|\right)$ bits of space, where ε is an input parameter [KNW10]. This is the optimal algorithm w.r.t. space for F0 estimation [AMS99, Woo04].

Here corners of each R_i have integer coordinates. This implies that each rectangle is a disjoint union of the unit hypercubes that lie inside it. So, Klee’s measure V is the number of distinct unit hypercubes in $\bigcup_{i=1}^{n} R_i$. On receiving a hyperrectangle R_i in the stream, we give all the unit cubes inside R_i as inputs to ALG. At the end of the stream, we report the output produced by ALG as $\hat{V}$. Observe that $|\hat{V} - V| \le \varepsilon V$. Note that here the universe for F0 estimation is [N]^d.

Hence, we have the following observation.

Observation 3. Let a stream of hyperrectangles be such that corners of each rectangle have integer coordinates in [N]^d. Then, there exists a randomized one-pass streaming algorithm that outputs $\hat{V}$ such that $|\hat{V} - V| \le \varepsilon V$ with high probability and uses $O\left(\frac{1}{\varepsilon^2} + d\log N\right)$ bits of space.
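The reduction behind Observation 3 can be sketched as follows. The sketch is hedged in one important way: `ExactDistinctCounter` is a hypothetical, exact, linear-space stand-in for ALG, used so that the example is self-contained and checkable; it is not the small-space F0 sketch of [KNW10]. Any distinct-elements estimator exposing the same `add`/`estimate` interface could be substituted.

```python
from itertools import product


def unit_cubes(lo, hi):
    """Lower corners of all unit cubes inside the integer box [lo, hi]."""
    return product(*(range(a, b) for a, b in zip(lo, hi)))


class ExactDistinctCounter:
    """Exact stand-in for ALG (assumption: linear space, for illustration)."""

    def __init__(self):
        self._seen = set()

    def add(self, item):
        self._seen.add(item)

    def estimate(self):
        return len(self._seen)


def klee_measure_integer(rectangles, counter=None):
    """Feed every unit cube of every rectangle to a distinct-count routine."""
    if counter is None:
        counter = ExactDistinctCounter()
    for lo, hi in rectangles:
        for cube in unit_cubes(lo, hi):
            counter.add(cube)
    return counter.estimate()


if __name__ == "__main__":
    stream = [((0, 0), (2, 2)), ((1, 1), (3, 3))]
    print(klee_measure_integer(stream))
```

Note that a rectangle may contain many unit cubes, so the per-rectangle update time of this naive feeding step can be large; the sketch illustrates only the correspondence between volume and distinct cubes.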

One can also reduce F0 estimation to the Klee’s measure problem in which corners of each rectangle have integer coordinates. So, the space used by the corresponding algorithm of Observation 3 is optimal.

Figure 1: Reduction idea for Theorem 4. Here n = 5. In both (a) and (b), Alice’s input x is shown in the left figure and Bob’s input y is shown in the middle one. In (a), x = 01101 and y = 10001; Disj(x, y) = 0; l_A = 3/5 and l_B = 3/5; the total Klee’s measure is shown in the right figure and the value is 4/5 < l_A + l_B. In (b), x = 10001 and y = 01010; Disj(x, y) = 1; l_A = 2/5 and l_B = 2/5; the total Klee’s measure is shown in the right figure and the value is 4/5 = l_A + l_B.

2.2 Lower bound

Theorem 4. For every ε ∈ (0, 0.1), there exists a positive integer n such that any randomized one-pass streaming algorithm that outputs an (ε, ρ)-additive solution to Klee’s measure for all streams of at most 2n intervals in R uses $\Omega\left(\frac{1}{\varepsilon} + \log n\right)$ bits of space.

Proof. We prove the theorem by proving the following.

(a) For every ε ∈ (0, 0.1), there exists a positive integer n such that any randomized one-pass streaming algorithm that outputs an (ε, ρ)-additive solution to Klee’s measure for all streams of at most 2n intervals in R uses $\Omega\left(\frac{1}{\varepsilon}\right)$ bits of space.

(b) For every ε ∈ (0, 0.1), there exists a positive integer n such that any randomized one-pass streaming algorithm that outputs an (ε, ρ)-additive solution to Klee’s measure for all streams of at most 2n intervals in R uses Ω(log n) bits of space.

(a) Let $n = \left\lfloor \frac{1}{2\varepsilon} \right\rfloor - 1$ and A be a streaming algorithm that returns an (ε, ρ)-additive solution to Klee’s measure and uses space $s = o\left(\frac{1}{\varepsilon}\right)$. The following one-way communication protocol can solve Disj using o(n) bits of communication. Alice will process her input x as follows (see Figure 1). For each i, if x_i = 1 then she gives the interval $\left[\frac{i-1}{n}, \frac{i}{n}\right]$ as input to A; otherwise, she does nothing. Let the corresponding set of intervals generated by Alice be S_A and l_A be the Klee’s measure of S_A. Alice will send the current memory state of A and l_A to Bob. Similarly, Bob will process his input y. Let the corresponding set of intervals generated by Bob be S_B and l_B be its Klee’s measure. Observe that Disj(x, y) = 1 if the length of S_A ∪ S_B is l_A + l_B, and in this case A returns at least l_A + l_B − ε. Disj(x, y) = 0 if the length of S_A ∪ S_B is at most $l_A + l_B - \frac{1}{n} < l_A + l_B - 2\varepsilon$, and in this case A returns less than l_A + l_B − ε. So, we report Disj(x, y) = 1 if and only if A gives output at least l_A + l_B − ε.
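The decision rule of part (a) can be simulated in a few lines. This is a sketch under assumptions: the streaming algorithm A is replaced by a pluggable callable `klee` (by default the exact union length), the communication between Alice and Bob is not modelled, and exact rationals are used so that the threshold comparison is not affected by rounding. All function names are hypothetical.

```python
from fractions import Fraction


def intervals_from_bits(bits):
    """Bit i (0-based) set to 1 maps to the interval [i/n, (i+1)/n]."""
    n = len(bits)
    return [(Fraction(i, n), Fraction(i + 1, n))
            for i, b in enumerate(bits) if b == 1]


def union_length(intervals):
    """Exact length of a union of intervals on the line (sort and sweep)."""
    total, reach = Fraction(0), None
    for lo, hi in sorted(intervals):
        if reach is None or lo > reach:
            total += hi - lo
            reach = hi
        elif hi > reach:
            total += hi - reach
            reach = hi
    return total


def decide_disj(x, y, eps, klee=union_length):
    """Report Disj(x, y) from an eps-additive estimate of Klee's measure."""
    s_a, s_b = intervals_from_bits(x), intervals_from_bits(y)
    l_a, l_b = union_length(s_a), union_length(s_b)  # known exactly locally
    estimate = klee(s_a + s_b)                       # stand-in for A's output
    return 1 if estimate >= l_a + l_b - eps else 0


if __name__ == "__main__":
    eps = Fraction(1, 12)  # floor(1 / (2 * eps)) - 1 = 5 = n
    print(decide_disj([0, 1, 1, 0, 1], [1, 0, 0, 0, 1], eps))
    print(decide_disj([1, 0, 0, 0, 1], [0, 1, 0, 1, 0], eps))
```

Passing a `klee` callable that perturbs the exact length by up to ε is a simple way to check that the threshold still separates the two cases.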

(b) Let 𝒱 be a family of {0, 1}^n vectors such that for all x ∈ 𝒱, $\sum_{i=1}^{n} x_i = \frac{n}{2}$, and for all x, y ∈ 𝒱 with x ≠ y, there are at most n/4 indices where both x and y have 1. Such a family 𝒱 with |𝒱| = 2^{Ω(n)} exists [AMS99].

We define the Equality function as follows. Alice and Bob get vectors x and y, respectively, from 𝒱 and the objective is to decide whether x = y. Formally, Equality(x, y) = 1 if and only if x = y. It is well known that the two-way private coin randomized communication complexity of Equality is Ω(log n) [AMS99, Rou16, KN97].

Let n > 2^{1/ε} and A be a streaming algorithm that returns an (ε, ρ)-additive solution to Klee’s measure using space o(log n) bits. The following one-way communication protocol can solve Equality using o(log n) bits of communication. Both Alice and Bob process their inputs exactly as in part (a). Let S_A, S_B, l_A, l_B have the same meaning as in (a). Observe that if Equality(x, y) = 1, then the length of S_A ∪ S_B is 0.5 and in this case A returns at most 0.5 + 0.1 = 0.6. If Equality(x, y) = 0, then the length of S_A ∪ S_B is at least 0.75 and in this case A returns at least 0.75 − 0.1 = 0.65. Therefore, Equality(x, y) = 1 if and only if A gives output at most 0.6.

Remark 1. • Multipass lower bound: We can also carry out both of the above reductions, from Disj and Equality, for a p-pass streaming algorithm, using space s, for Klee’s measure in R. Observe that the induced protocol for Disj requires at most 2ps bits of communication. This implies $s = \Omega\left(\frac{1}{p}\left(\frac{1}{\varepsilon} + \log n\right)\right)$.

2.3 Algorithms for Klee’s measure in [0, 1]d

Randomized algorithm

Let us consider a stream R, where each R_i ⊂ [0, 1]^d. Our objective is to output $\hat{V}$ such that $|\hat{V} - V| \le \varepsilon$ with probability at least 1 − ρ, where 0 < ε, ρ < 1. We generate and store M random points p_1, . . . , p_M ∈ [0, 1]^d. For each point p_j, j ∈ [M], we maintain a binary indicator random variable X_j. X_j = 1 if and only if p_j lies on or inside some R_i ∈ R. At the end of the stream, we report $\hat{V} = \frac{X}{M}$, where $X = \sum_{j=1}^{M} X_j$.

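As we have chosen M points uniformly at random, the probability that a random point is present in some rectangle in the stream is the same as the volume of $\bigcup_{i=1}^{n} R_i$. So, Pr(X_j = 1) = V and E(X) = MV. Now apply the standard Chernoff bound.

\begin{align*}
\Pr\left(|\hat{V} - V| \ge \varepsilon\right) &= \Pr\left(|X - MV| \ge M\varepsilon\right) \\
&\le 2\exp\left(-\frac{(\varepsilon/V)^2}{2 + \varepsilon/V}\, MV\right) \\
&= 2\exp\left(-\frac{\varepsilon^2}{2V + \varepsilon}\, M\right) \\
&\le 2\exp\left(-\frac{\varepsilon^2}{2 + \varepsilon}\, M\right) \le \rho.
\end{align*}

The last inequality holds whenever $M \ge \frac{2 + \varepsilon}{\varepsilon^2}\ln\frac{2}{\rho} = O\left(\frac{1}{\varepsilon^2}\log\frac{1}{\rho}\right)$. Hence, we have the following theorem.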

Theorem 5. There exists a randomized one-pass streaming algorithm that outputs an (ε, ρ)-additive solution to Klee’s measure in [0, 1]^d and uses $O\left(\frac{1}{\varepsilon^2}\log\frac{1}{\rho}\right)$ space.
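A minimal Python sketch of the sampling algorithm behind Theorem 5 follows. The class and function names are hypothetical, the `seed` parameter is added only to make runs reproducible, and the sample size uses ln(2/ρ) so that the two-sided Chernoff bound is at most ρ (this changes only the constant, not the asymptotic space).

```python
import math
import random


def sample_size(eps, rho):
    """M >= (2 + eps) / eps^2 * ln(2 / rho), from the Chernoff bound."""
    return math.ceil((2 + eps) / eps ** 2 * math.log(2 / rho))


class KleeSampler:
    """Stores M random points of [0, 1]^d and one hit bit per point."""

    def __init__(self, d, eps, rho, seed=None):
        rng = random.Random(seed)
        m = sample_size(eps, rho)
        self.points = [tuple(rng.random() for _ in range(d)) for _ in range(m)]
        self.hit = [False] * m          # X_j for each sample point p_j

    def insert(self, lo, hi):
        """Process one hyperrectangle given by its two opposite corners."""
        for j, p in enumerate(self.points):
            if not self.hit[j] and all(a <= c <= b
                                       for a, c, b in zip(lo, p, hi)):
                self.hit[j] = True

    def estimate(self):
        """Fraction of sample points covered by some rectangle seen so far."""
        return sum(self.hit) / len(self.points)


if __name__ == "__main__":
    sampler = KleeSampler(d=2, eps=0.05, rho=0.001, seed=1)
    sampler.insert((0.0, 0.0), (0.5, 0.5))
    sampler.insert((0.25, 0.25), (0.75, 0.75))
    print(sampler.estimate())  # true volume of the union is 0.4375
```

The space is dominated by the M stored points, and each update touches every point, so the update time per rectangle is O(M d) in this naive form.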

Deterministic algorithm

A hyperrectangle R ⊂ [0, 1]^d is δ-fat if the length of each side is at least δ ∈ (0, 1). Our objective is to report a $\hat{V}$ such that $V - \varepsilon \le \hat{V} \le V$ for a given ε ∈ (0, 1). The main result is stated in the following theorem.

Theorem 6. There exists a deterministic one-pass streaming algorithm that takes a stream of δ-fat hyperrectangles in [0, 1]^d as input, and outputs $\hat{V}$ such that $V - \varepsilon \le \hat{V} \le V$, using $O\left(\frac{2^{d^2+d}}{\varepsilon^{d-1}\delta^{d}}\right)$ space, where ε ∈ (0, 1) is an input parameter.

Proof. The crux of the proof is to subdivide [0, 1]^d into 1/δ^d hyperboxes, each of size δ × . . . × δ as shown in Figure 2; denote each such hyperbox as H_δ^d. Each δ-fat hyperrectangle of the stream will intersect at least one corner of some H_δ^d; there are 2^d such corners for each H_δ^d. For any corner, a subset of hyperrectangles of R will be called anchored if each member of the subset intersects that corner point. A set of hyperrectangles is anchored for a hyperbox if each hyperrectangle in the set intersects at least one corner of the hyperbox. The next claim, proved later, finds the estimate of Klee’s measure for hyperrectangles that are anchored for a hyperbox.

Algorithm 1: Klee(d, ε)

Input: A stream of hyperrectangles R = R_1, . . . , R_n in [0, 1]^d.
Output: Klee’s measure of the stream

begin
    Subdivide [0, 1]^d into 1/δ^d hyperboxes, each of size δ × . . . × δ as shown in Figure 2; denote each such hyperbox as H_δ^d.
    Magnify each dimension of [0, 1]^d by 1/δ so that each H_δ^d becomes [0, 1]^d.
    Let B be the set of all magnified H_δ^d’s.
    Call in parallel 1/δ^d copies of ALG(d, ε), one for each H_δ^d ∈ B. (ALG(d, ε) is given as Algorithm 2.)
    for (each hyperrectangle R in the stream) do
        Find all H_δ^d ∈ B that intersect R and give R ∩ H_δ^d as input to the corresponding ALG(d, ε) algorithm for H_δ^d.
    end
    Find the sum of outputs produced by all 1/δ^d copies of ALG(d, ε), multiply by δ^d and return.
end

Figure 2: The figure to the left shows the placement of some input rectangles in [0, 1]^d, the figure in the middle shows an anchored rectangle at the origin in [0, 1]^d, and the figure to the right shows the projection of the intersections of all rectangles with a particular strip to [0, 1]^{d−1}.

Claim 7. Let R_a be a stream of hyperrectangles anchored for [0, 1]^d. There exists a deterministic one-pass streaming algorithm that outputs $\hat{V}_a$ such that $V_a - \varepsilon \le \hat{V}_a \le V_a$ and uses $O\left(\frac{2^{d^2+d}}{\varepsilon^{d-1}}\right)$ space, where V_a is the Klee’s measure of R_a and ε ∈ (0, 1) is an input parameter.

To put the proof of Theorem 6 in Claim 7’s context of [0, 1]^d, we magnify each dimension of [0, 1]^d by 1/δ so that each H_δ^d becomes [0, 1]^d. The error term will also change accordingly. So, our problem boils down to finding the Klee’s measure in [0, 1/δ]^d within an additive error of ε/δ^d. Let B be the set of all magnified H_δ^d. Recall that each hyperrectangle R ∈ R is δ-fat. Therefore, for each H_δ^d ∈ B, if R ∩ H_δ^d ≠ ∅, then R ∩ H_δ^d is anchored at (at least) one of the corners of H_δ^d.

We start, in parallel, 1/δ^d copies of the d-dimensional anchored Klee’s measure algorithm, one for each H_δ^d ∈ B. On receiving a rectangle R, we find all H_δ^d ∈ B that intersect R and give R ∩ H_δ^d as input to the corresponding algorithm for H_δ^d. Refer to the pseudocode given in Algorithm 1. A total additive error of ε/δ^d can be achieved if we can ensure an additive error of at most ε for each H_δ^d ∈ B. By Claim 7, this can be achieved by using $O\left(\frac{2^{d^2+d}}{\varepsilon^{d-1}}\right)$ space for each H_δ^d. Hence, the total amount of space required is $O\left(\frac{2^{d^2+d}}{\varepsilon^{d-1}\delta^{d}}\right)$.

Proof of Claim 7. We will prove it by using induction on d, where d ≥ 1. Let ALG(ε, d) be the algorithm that solves the problem within an additive error of E(ε, d) using S(ε, d) space. We have to prove that E(ε, d) ≤ ε and $S(\varepsilon, d) \le \frac{2^{d^2+d}}{\varepsilon^{d-1}}$, where d ≥ 1. For the base case of d = 1, each interval of the stream in [0, 1] is anchored either at 0 or 1. Our algorithm ALG(ε, 1) maintains the rightmost (leftmost) extreme point left (right) of all intervals anchored at 0 (1). At the end of the stream, we report min(left + right, 1). Observe that we output the exact Klee’s measure without any error using constant space. Hence, E(ε, 1) = 0 ≤ ε and S(ε, 1) = O(1).
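The base case can be sketched as below. One interpretive assumption is made and should be read as such: for the reported value min(left + right, 1) to equal the measure, `left` is taken to be the length covered starting from 0 and `right` the length covered ending at 1 (that is, 1 minus the leftmost endpoint of the intervals anchored at 1). The class name is hypothetical.

```python
class AnchoredIntervals:
    """Exact Klee's measure of intervals in [0, 1] anchored at 0 or at 1."""

    def __init__(self):
        self.left = 0.0    # length covered from 0 by intervals anchored at 0
        self.right = 0.0   # length covered up to 1 by intervals anchored at 1

    def insert(self, a, b):
        """Insert [a, b]; it must touch 0 or 1 (possibly both)."""
        if a != 0.0 and b != 1.0:
            raise ValueError("interval is not anchored at 0 or 1")
        if a == 0.0:
            self.left = max(self.left, b)
        if b == 1.0:
            self.right = max(self.right, 1.0 - a)

    def measure(self):
        # The two covered pieces may overlap, hence the cap at 1.
        return min(self.left + self.right, 1.0)


if __name__ == "__main__":
    alg = AnchoredIntervals()
    for interval in [(0.0, 0.25), (0.5, 1.0), (0.0, 0.125)]:
        alg.insert(*interval)
    print(alg.measure())
```

Only two numbers are stored regardless of the stream length, which matches S(ε, 1) = O(1).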

Assuming the statement is true for all dimensions less than or equal to d − 1, we show that it is also true for dimension d. We divide the entire space [0, 1]^d along the d-th dimension into 1/ε′ strips, each of the form [0, 1]^{d−1} × [(i − 1)ε′, iε′], where ε′ = ε/2^{d+1} and 1 ≤ i ≤ 1/ε′. We start 1/ε′ copies of the algorithm ALG(ε/2, d − 1), one for each strip.

On receiving an anchored hyperrectangle R ⊂ [0, 1]^d, we assign it to one of the 2^d corners with which it intersects. Without loss of generality, assume that it is anchored at the origin. Let R cut through p strips (0 ≤ p ≤ 1/ε′) and extend for a distance of q (0 ≤ q < ε′) in the last strip along the d-th dimension. Refer to Figure 2. Thus R can be decomposed as R = R′ × [0, pε′ + q], where R′ (of dimension d − 1) ⊂ [0, 1]^{d−1}. Divide R into p parts of the form R′ × [(j − 1)ε′, jε′], where j ∈ [p], and assign R′ × [(j − 1)ε′, jε′] to the j-th strip. The part of a hyperrectangle we are assigning to a strip is of length ε′ along dimension d. So, each anchored hyperrectangle in [0, 1]^d assigned to a strip can be thought of as a (d − 1)-dimensional hyperrectangle with a length of ε′ along the d-th dimension. Observe that the projection of hyperrectangles of the form R′ × [(j − 1)ε′, jε′], that belong to a particular strip j, to the (d − 1)-dimensional space is nothing but R′. See Figure 2 for an example. For each j ∈ [p], R′ is given as an input to the corresponding ALG(ε/2, d − 1) of the j-th strip. At the end of the stream, we compute the sum of outputs produced by all 1/ε′ recursive calls ALG(ε/2, d − 1) and then multiply by ε′ to get the final estimate of Klee’s measure. Refer to the pseudocode given in Algorithm 2.
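The strip decomposition can be illustrated in its simplest non-trivial setting. The sketch below is deliberately restricted and is not the full recursive algorithm: it assumes d = 2 and that every rectangle is anchored at the origin, so the (d − 1)-dimensional subproblem in each strip reduces to remembering the largest width seen. The class name and the rounding of the number of strips up to an integer are choices made here for illustration.

```python
import math


class AnchoredKlee2D:
    """Strip-based estimate for rectangles [0, w] x [0, h] in [0, 1]^2."""

    def __init__(self, eps):
        # 1/eps' strips with eps' = eps / 2^(d+1) and d = 2, rounded up.
        self.num_strips = math.ceil(8 / eps)
        # One base-case state per strip: widest interval anchored at 0.
        self.max_width = [0.0] * self.num_strips

    def insert(self, width, height):
        # p = number of strips the rectangle cuts through completely.
        full = min(int(height * self.num_strips), self.num_strips)
        for j in range(full):
            self.max_width[j] = max(self.max_width[j], width)
        # The top sliver of height q < eps' is ignored, which can only
        # make the estimate smaller than the true measure.

    def estimate(self):
        # Sum of the per-strip measures, multiplied by the strip height.
        return sum(self.max_width) / self.num_strips


if __name__ == "__main__":
    alg = AnchoredKlee2D(eps=0.125)
    alg.insert(0.5, 0.3)
    alg.insert(0.2, 0.9)
    print(alg.estimate())  # the true area of the union is 0.27
```

Because origin-anchored rectangles are nested within each strip boundary, the ignored slivers overlap heavily and their total contribution stays below one strip height, which is the intuition behind the error analysis that follows.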

Our algorithm incurs two types of errors:

(i) Error incurred from the recursive calls to the lower dimensions.

(ii) The error incurred at this level of the recursion corresponding to the top part of R, i.e., R′ × [pε′, pε′ + q] (as shown by the shaded parts in Figure 2) that we just ignore.

Algorithm 2: ALG(d, ε): d-dimensional anchored Klee’s measure, d > 1

Input: A stream of hyperrectangles in [0, 1]^d anchored either at 0 or 1.
Output: Klee’s measure of the stream

begin
    Divide the entire space [0, 1]^d along the d-th dimension into 1/ε′ strips, where ε′ = ε/2^{d+1}, such that each strip is of the form [0, 1]^{d−1} × [(i − 1)ε′, iε′], 1 ≤ i ≤ 1/ε′.
    Start 1/ε′ copies of the algorithm ALG(d − 1, ε/2), one for each strip.
    for (each anchored hyperrectangle R in the stream) do
        Assign R to one of the 2^d corners with which it intersects. W.l.o.g., assume that it is anchored at the origin.
        Let R = R′ × [0, pε′ + q], where R′ ⊂ [0, 1]^{d−1}, 0 ≤ p ≤ 1/ε′ and 0 ≤ q < ε′. Divide R into p parts of the form R′ × [(j − 1)ε′, jε′], where j ∈ [p], and assign R′ × [(j − 1)ε′, jε′] to the j-th strip. Ignore the remaining portion of R, i.e., R′ × [pε′, pε′ + q] (the shaded part in Figure 2).
        For each j ∈ [p], give input R′ to the corresponding ALG(d − 1, ε/2) of the j-th strip.
    end
    Find the sum of outputs produced by all 1/ε′ recursive calls ALG(d − 1, ε/2), multiply by ε′ and return.
end

To analyze the error of type (ii), notice that the top parts of the hyperrectangles anchored at a corner have a special structure, as these hyperrectangles form a partial order under inclusion. So, the errors are additive and can be at most $\varepsilon'$. Hence, the total error with respect to all $2^d$ corners is at most $2^d\varepsilon'$. Now to estimate the error of type (i), we observe that the error in the calculation of Klee’s measure due to one strip (because of the multiplication by $\varepsilon'$) is $\varepsilon' E(\varepsilon/2, d-1)$. So, the total error with respect to all the $1/\varepsilon'$ strips is $\frac{1}{\varepsilon'}\left(\varepsilon' E(\varepsilon/2, d-1)\right) = E(\varepsilon/2, d-1)$, which by the induction hypothesis is at most $\varepsilon/2$. Now, considering the fact that $\varepsilon' = \varepsilon/2^{d+1}$, the total error is given by

$$E(\varepsilon, d) = 2^d\varepsilon' + E(\varepsilon/2, d-1) \le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$

With the space requirement for one strip being $S(\varepsilon/2, d-1)$ by the induction hypothesis, the total space requirement with respect to all $1/\varepsilon'$ strips is $\frac{1}{\varepsilon'} S(\varepsilon/2, d-1)$, along with the bookkeeping of the $1/\varepsilon'$ instances of ALG(ε/2, d − 1). Therefore,

$$S(\varepsilon, d) \le \frac{1}{\varepsilon'} S(\varepsilon/2, d-1) + \log\left(\frac{1}{\varepsilon'}\right) = \frac{2^{d+1}}{\varepsilon} S(\varepsilon/2, d-1) + \log\left(\frac{2^{d+1}}{\varepsilon}\right) \le \frac{2^{d^2+d}}{\varepsilon^{d-1}}.$$
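The strip decomposition of Algorithm 2 can be sketched for the special case d = 2. This is a minimal illustration under stated assumptions, not the authors' implementation: the class names `Anchored1D` and `AnchoredKlee2D` are hypothetical, the one-dimensional base case is assumed to be the exact rule "longest interval from each end, capped at 1", exact rationals are used in place of a bit-level space accounting, and 1/ε′ is assumed to be an integer.

```python
from fractions import Fraction
from math import floor


class Anchored1D:
    """Assumed exact base case: intervals in [0, 1] anchored at 0 or at 1."""

    def __init__(self):
        self.left = Fraction(0)   # longest interval [0, a] seen so far
        self.right = Fraction(0)  # longest interval [1 - a, 1] seen so far

    def add(self, length, anchored_at_one):
        if anchored_at_one:
            self.right = max(self.right, length)
        else:
            self.left = max(self.left, length)

    def measure(self):
        # The two longest intervals may overlap, so the union is capped at 1.
        return min(Fraction(1), self.left + self.right)


class AnchoredKlee2D:
    """Strip decomposition of Algorithm 2, specialised to d = 2 (hypothetical sketch)."""

    def __init__(self, eps):
        self.eps_prime = Fraction(eps) / 8         # eps' = eps / 2^(d+1) with d = 2
        self.num_strips = int(1 / self.eps_prime)  # assumes 1/eps' is an integer
        self.strips = [Anchored1D() for _ in range(self.num_strips)]

    def add(self, width, height, corner):
        """Rectangle of the given width and height anchored at corner (cx, cy) in {0, 1}^2."""
        cx, cy = corner
        p = floor(Fraction(height) / self.eps_prime)  # number of fully covered strips
        for j in range(p):
            # Strips are counted from the anchoring side along the second axis.
            idx = j if cy == 0 else self.num_strips - 1 - j
            self.strips[idx].add(Fraction(width), anchored_at_one=(cx == 1))
        # The remaining sliver of height q < eps' is ignored, as in the algorithm.

    def estimate(self):
        return self.eps_prime * sum(s.measure() for s in self.strips)
```

Because the ignored slivers only remove area, the estimate in this sketch never exceeds the true measure; the additive gap is what the error analysis bounds.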

3 CONVEX-BODY and GEOM-QUERY-CONV-POLY

In this section, we first strengthen the lower bound result on convex hull by showing that even the promise version is hard. Next, we derive lower bounds for convex body approximation and point existence queries in a convex polygon, followed by algorithms for them. To the best of our knowledge, the lower bound on convex body approximation is the first of its kind. As a consequence of our result on convex body approximation, we can solve LP in the streaming model and design a property testing type result for geometric queries.

3.1 Lower bounds

Lower bound for promise version of convex hull

Theorem 8. Any randomized one-pass streaming algorithm for convex hull that distinguishes between inputs having $|CH(P)| = O(1)$ and $|CH(P)| = \Omega(n)$ in $\mathbb{R}^2$ with probability $1 - \rho$ uses $\Omega(n)$ bits of space.

Proof. If there exists a randomized algorithm A that distinguishes between $|CH(P)| = O(1)$ and $|CH(P)| = \Omega(n)$ with probability $1 - \rho$ and uses $o(n)$ bits, we can show the existence of a randomized protocol that solves Index and uses $o(n)$ bits. Consider a regular convex polygon of $2n$ vertices, $V = \{v_i : 1 \le i \le 2n\}$, as shown in Figure 3. We construct the input point set P to A by taking each $v_{2i-1}$, $1 \le i \le n$. We also add $v_{2i}$ if and only if the i-th bit of Alice’s input $x_i = 1$. We send the current sketch to Bob. Let j be Bob’s query index. Now another $n + 2$ points are given as input to A such that n of them are in convex position inside the triangle $\triangle v_{2j-1}v_{2j}v_{2j+1}$ and the other two are such that no vertex of $V \setminus \{v_{2j-1}, v_{2j}, v_{2j+1}\}$ can be on $CH(P)$, as shown in Figure 3. Observe that the last $n + 2$ points are placed in such a way that if $v_{2j}$ is on $CH(P)$ then $|CH(P)| = 5$; otherwise, $|CH(P)| = n + 4 = \Omega(n)$. Note that by construction $v_{2j}$ is on $CH(P)$ if and only if $x_j = 1$. Hence, $x_j = 1$ if $|CH(P)| = 5$ and $x_j = 0$ if $|CH(P)| = \Omega(n)$.

Figure 3: $u$, $v$ and the dotted points inside the triangle $\triangle v_{2j-1}v_{2j}v_{2j+1}$ are the last $n + 2$ points.

Lower bound for convex body approximation and geometric query

Theorem 9. For every $\varepsilon > 0$, there exists a positive integer n such that any (one-pass streaming) algorithm that $(\varepsilon, \rho)$-approximates a convex polygon K requires $\Omega\left(\frac{1}{\sqrt{\varepsilon}}\right)$ bits of space, where K is given as a stream of at most $2n$ straight lines in $\mathbb{R}^2$.

Figure 4: Reduction idea for Theorem 9.

Proof. Without loss of generality, assume that $0 < \varepsilon < \frac{1}{2}$. Let $n = \left\lceil \frac{\pi}{2\sqrt{2\varepsilon}} \right\rceil$ and A be a (streaming) algorithm, as stated, using $o\left(\frac{1}{\sqrt{\varepsilon}}\right)$ bits. Now we can design a protocol that solves Index using space $o(n)$. See Figure 4 for the following discussion. Alice and Bob know a circle C of diameter 1 and $2n$ points $p_1, \ldots, p_{2n}$ placed evenly on its circumference. We process Alice’s input x as follows. If $x_i = 1$, then we give two line segments $p_{2i-1}p_{2i}$ and $p_{2i}p_{2i+1}$ as inputs to A. If $x_i = 0$, we give the line segment $p_{2i-1}p_{2i+1}$ as an input to A. We send the current memory state of Alice to Bob. Let K be the actual convex polygon and $K'$ be the convex polygon generated by A such that $d_H(K, K') \le \varepsilon$. Observe that $x_i = 1$ implies $p_{2i} \in K$, i.e., there exists a point $q \in K'$ such that $d(p_{2i}, q) \le \varepsilon$, where $d(p, q)$ is the Euclidean distance between p and q. If $x_i = 0$, then $p_{2i} \notin K$ and

$$d(p_{2i}, K) = \min_{q \in K} d(p_{2i}, q) = \frac{1 - \cos\left(\frac{\pi}{n}\right)}{2} = \sin^2\left(\frac{\pi}{2n}\right) \ge \sin^2\sqrt{2\varepsilon} > \varepsilon, \quad \text{as } 0 < \varepsilon < \tfrac{1}{2}.$$

Let j be the query index of Bob. We report $x_j = 1$ if and only if $C(p_{2j}) \cap K' \ne \emptyset$, where $C(p)$ denotes the circle of radius $\varepsilon$ centred at p.
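The distance value used in the proof can be checked with a short worked derivation. The symbols $r$ (radius of C) and $O$ (centre of C) are notation introduced here for illustration only.

```latex
% Adjacent points among p_1, ..., p_{2n} subtend an angle 2\pi/(2n) = \pi/n at the centre O.
% The chord p_{2i-1}p_{2i+1} is perpendicular to the ray O p_{2i} and lies at distance
% r\cos(\pi/n) from O, while p_{2i} lies at distance r on the same ray. With r = 1/2:
\[
  d\bigl(p_{2i},\, p_{2i-1}p_{2i+1}\bigr)
  = r - r\cos\!\left(\frac{\pi}{n}\right)
  = \frac{1-\cos\left(\frac{\pi}{n}\right)}{2}
  = \frac{2\sin^{2}\!\left(\frac{\pi}{2n}\right)}{2}
  = \sin^{2}\!\left(\frac{\pi}{2n}\right),
\]
% using the half-angle identity 1 - \cos\theta = 2\sin^{2}(\theta/2) with \theta = \pi/n.
```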

The proof for the lower bound of geom-query-conv-poly is similar to the proof of Theorem 9.

Theorem 10. Every randomized one-pass streaming algorithm that solves geom-query-conv-poly with probability $1 - \rho$ uses $\Omega(n)$ bits of space.

Proof. Let A be a streaming algorithm, as stated in the Theorem, using $o(n)$ bits. Now we can design a protocol that solves Index using $o(n)$ bits of space. See Figure 5 for the following discussion. Alice and Bob know a circle C of diameter 1 and $2n$ points $p_1, \ldots, p_{2n}$ placed evenly on its circumference. We process Alice’s input x as follows. If $x_i = 1$, then we give two line segments $p_{2i-1}p_{2i}$ and $p_{2i}p_{2i+1}$ as inputs to A. If $x_i = 0$, we give the line segment $p_{2i-1}p_{2i+1}$ as an input to A. We send the current memory state of Alice to Bob.

Let K be the actual convex polygon. Observe that $x_i = 1$ implies $p_{2i} \in K$. If $x_i = 0$, then $p_{2i} \notin K$. Let j be the query index of Bob. We report $x_j = 1$ if and only if A reports $p_{2j} \in K$.
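The encoding step of this reduction can be illustrated with a small simulation. It is a sketch only: an exact membership test stands in for the streaming algorithm A, so it shows that the polygon K encodes Alice's bits and makes no statement about space. The function names are hypothetical.

```python
import math


def circle_points(n):
    """2n points p_1..p_2n placed evenly on a circle of diameter 1 (list index 0 is p_1)."""
    return [(0.5 * math.cos(math.pi * k / n), 0.5 * math.sin(math.pi * k / n))
            for k in range(2 * n)]


def polygon_from_bits(x):
    """Vertices of K in counter-clockwise order: every p_{2i-1}, plus p_{2i} when x_i = 1."""
    n = len(x)
    pts = circle_points(n)
    verts = []
    for i in range(1, n + 1):
        verts.append(pts[2 * i - 2])        # p_{2i-1}
        if x[i - 1] == 1:
            verts.append(pts[2 * i - 1])    # p_{2i}
    return verts


def contains(verts, q):
    """Closed point-in-convex-polygon test for counter-clockwise vertices."""
    m = len(verts)
    for k in range(m):
        (ax, ay), (bx, by) = verts[k], verts[(k + 1) % m]
        # A negative cross product means q lies strictly to the right of edge a -> b.
        if (bx - ax) * (q[1] - ay) - (by - ay) * (q[0] - ax) < 0:
            return False
    return True


def bob_answer(x, j):
    """Bob's report for query index j (1-based): 1 if and only if p_{2j} is in K."""
    n = len(x)
    return 1 if contains(polygon_from_bits(x), circle_points(n)[2 * j - 1]) else 0
```

For example, with `x = [1, 0, 1, 1, 0]` the call `bob_answer(x, j)` recovers each bit of `x` in turn.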

Figure 5: Reduction idea for Theorem 10.

3.2 Algorithms

Convex body approximation

We state Dudley’s result [Dud74] and some observations about the Hausdorff distance that will be useful for the algorithm.

Lemma 11. [Dud74] A convex body K can be $\varepsilon$-approximated by a polytope P with $1/\varepsilon^{(d-1)/2}$ facets. $\varepsilon$-approximation in this context means $d_H(K, P) \le \varepsilon$.

Observation 12. Let $K_1, K_2, K'_1$ and $K'_2$ be convex bodies with $K_1 \subseteq K'_1$, $K_2 \subseteq K'_2$ and $d_H(K_1, K'_1) \le \varepsilon_1$, $d_H(K_2, K'_2) \le \varepsilon_2$. Then $d_H(K_1 \cap K_2, K'_1 \cap K'_2) \le \max(\varepsilon_1, \varepsilon_2)$.

Observation 13. Let $K$, $K'$ and $K''$ be convex bodies such that $d_H(K, K') \le \varepsilon_1$ and $d_H(K', K'') \le \varepsilon_2$. Then, $d_H(K, K'') \le \varepsilon_1 + \varepsilon_2$.

Note that the approximated convex body in Lemma 11 is always a superset of the original one. Given a convex body K in $\mathbb{R}^d$ and a parameter $\varepsilon > 0$, let A be a non-streaming algorithm that outputs a convex polytope $K'$ such that $d_H(K, K') \le \varepsilon$ and A stores $1/\varepsilon^{(d-1)/2}$ facets. Agarwal et al. [AHV04] used Bentley-Saxe’s dynamization technique [BS80] to maintain the approximate extent measures of a point set. We adapt these ideas for hyperplanes. Let a convex body K be given as a stream of hyperplanes in $\mathbb{R}^d$. K is contained in a unit ball B. Note that K is the intersection of all hyperplanes in the stream. The objective is to store a convex body $K'$ using a sub-linear number of facets that will $\varepsilon$-approximate K, i.e., $d_H(K, K') \le \varepsilon$.

We partition the processed stream of hyperplanes $H = \langle H_1, \ldots, H_n \rangle$ into t parts $\mathcal{H}_1, \ldots, \mathcal{H}_t$, $t \le \lceil \log n \rceil$. For each $i \in [t]$, $|\mathcal{H}_i| = 2^{r_i}$ for some non-negative integer $r_i < \lceil \log n \rceil$. We say $r_i$ is the rank of $\mathcal{H}_i$. We maintain our data structure in such a way that the rank of $\mathcal{H}_i$ is equal to the rank of $\mathcal{H}_j$ if and only if $i = j$. Let $C_i = \bigcap_{H \in \mathcal{H}_i} H$ be the convex polytope of hyperplanes in $\mathcal{H}_i$. As we cannot store $\mathcal{H}_i$ or $C_i$, we maintain some approximation $C'_i$ of $C_i$ such that $d_H(C_i, C'_i) \le f(r_i)$, where $f(r) = \varepsilon/(2r^2)$ if $r \ne 0$ and $f(0) = 0$. We store $C'_i$ and $r_i$. Let $\mathcal{C} = \{C'_1, \ldots, C'_t\}$.

Now consider the situation when we have to process $H_{n+1}$. We set $\mathcal{H}_0 = \{H_{n+1}\}$ and add $C'_0 = C_0 = H_{n+1} \cap B$ to $\mathcal{C}$ as an $f(0)$-approximation of $C_0$. If there exist $C'_i, C'_j \in \mathcal{C}$ such that $r_i = r_j = r$ for some $r < \lceil \log(n+1) \rceil$, then using algorithm A, we find $C'$ as an approximation of $C'_i \cap C'_j$ such that $d_H(C', C'_i \cap C'_j) \le f(r+1)$. Note that the actual convex body corresponding to $C'$, i.e., $C_i \cap C_j$, is of rank $r + 1$. We update $\mathcal{C} = (\mathcal{C} \setminus \{C'_i, C'_j\}) \cup \{C'\}$. We repeat the same process until all members of $\mathcal{C}$ have distinct rank. Now our target is that at the end, we should have each $C'_i$ as an approximation of $C_i$ such that $d_H(C_i, C'_i) \le \varepsilon$. As a result, we get $K' = \bigcap_{i=1}^{t} C'_i$ as an approximation of $K = \bigcap_{i=1}^{t} C_i$ such that $d_H(K, K') \le \varepsilon$ by Observation 12. Note that the number of times a $C_i$ is approximated is at most its rank. Hence by Observation 13, we have

$$d_H(C_i, C'_i) \le \sum_{l=1}^{r} f(l) = \sum_{l=1}^{r} \frac{\varepsilon}{2l^2} \le \varepsilon.$$

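By Lemma 11, the amount of space used to store $C'_i$ is $O\left(\frac{1}{f(r)^{(d-1)/2}}\right)$, where r is the rank of $\mathcal{H}_i$.

We consider only $r > 0$, because there can be at most one $C_i$ of rank 0 and the corresponding approximation stores only 1 facet. Hence, the total amount of space we use is

$$1 + \sum_{i=1}^{t} |C'_i| = O\left(\sum_{l=1}^{\lceil \log n \rceil} \frac{1}{(f(l))^{(d-1)/2}}\right) = O\left(\sum_{l=1}^{\lceil \log n \rceil} \frac{1}{(\varepsilon/2l^2)^{(d-1)/2}}\right) = O\left(\frac{\log^d n}{\varepsilon^{(d-1)/2}}\right).$$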

We summarize the above discussion in the following Theorem.

Theorem 14. A convex body K, given as a stream of n hyperplanes, can be $\varepsilon$-approximated deterministically by a polytope P with $O\left(\frac{\log^d n}{\varepsilon^{(d-1)/2}}\right)$ facets.
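The rank bookkeeping behind this construction behaves like a binary counter, which the following sketch illustrates. It is an assumption-laden illustration: `LogarithmicMethod` is a hypothetical name, and the callback `approximate(a, b, rank)` stands in for running algorithm A on the intersection of two stored sketches; no geometry is performed here.

```python
def f(r, eps):
    """Error budget of a rank-r sketch: f(0) = 0 and f(r) = eps / (2 r^2) otherwise."""
    return 0.0 if r == 0 else eps / (2 * r * r)


class LogarithmicMethod:
    """Keeps at most one sketch per rank, merging equal ranks as a binary counter does."""

    def __init__(self, approximate):
        # approximate(a, b, rank) is a placeholder for algorithm A applied to a ∩ b,
        # expected to return a sketch within f(rank) of that intersection.
        self.approximate = approximate
        self.by_rank = {}  # rank -> stored sketch; ranks are kept distinct

    def insert(self, item):
        sketch, rank = item, 0
        # Two sketches of equal rank r are merged into one sketch of rank r + 1.
        while rank in self.by_rank:
            other = self.by_rank.pop(rank)
            rank += 1
            sketch = self.approximate(other, sketch, rank)
        self.by_rank[rank] = sketch

    def ranks(self):
        return sorted(self.by_rank)
```

With a merge callback that simply concatenates lists, inserting 13 items leaves sketches of ranks 0, 2 and 3, matching the binary expansion 1101 of 13. The budget `f` keeps the accumulated error below `eps` because the sum of 1/(2l²) over all l ≥ 1 is π²/12 < 1.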

Low dimensional LP

Chan and Chen [CC07] gave a multi-pass algorithm to find an exact solution to LP, whose one-pass counterpart admits a lower bound of $\Omega(n)$ bits of space [GM08].

In the streaming setting of LP, constraints arrive as a stream of hyperplanes. Note that in LP, the feasible region, i.e., the intersection of all constraints (hyperplanes), is a convex body. Due to Theorem 14, given a set of constraints as a stream, we can maintain an $\varepsilon$-approximation of the convex body, i.e., the feasible region of the LP, using polylogarithmic space (see footnote 3). The $\varepsilon$-approximation along with the objective function can be used to find an $\varepsilon$-additive solution to LP. Note that the $\varepsilon$-approximation of the convex body is a superset of the original convex body. So, the extreme point of the approximated convex body in the direction of the objective function vector may lie outside the feasible region. One can shift the facets of the approximated convex body a distance $\varepsilon$ inwards to get a feasible solution as well. An added advantage is that the objective function need not be known beforehand. In summary, we have the following Corollary to Theorem 14.

Corollary 15. There exists a one-pass deterministic streaming algorithm that outputs an $\varepsilon$-additive solution to LP using $O\left(\frac{\log^d n}{\varepsilon^{(d-1)/2}}\right)$ space for a fixed dimension d. The objective function may be revealed at the end of the stream of constraints.

Footnote 3: $O\left(\frac{\log^d n}{\varepsilon^{(d-1)/2}}\right)$.

Property testing result for GEOM-QUERY-CONV-POLY

By the method discussed in Theorem 14, we approximate K by a convex body $K'$ such that $K \subseteq K'$ and $d_H(K, K') \le \varepsilon$. By Theorem 14, the amount of space required to maintain $K'$ is $O\left(\frac{\log^d n}{\varepsilon^{(d-1)/2}}\right)$ facets. Note that at the end of the stream we have $K'$ as our sketch and our objective is to answer correctly for a query point q if $d_H(q, K) \ge \varepsilon$. For geom-query-conv-poly, we report (i) $q \in K$ if $q \in K'$ and $d_H(q, K') \ge \varepsilon$, (ii) $q \notin K$ if $q \notin K'$ and (iii) an arbitrary answer, otherwise. See Figure 6. Summarizing the above discussion, we have the following Corollary to Theorem 14.

Figure 6: K is the original convex body. $K'$ is the $\varepsilon$-approximation to K, with $d_H(K, K') \le \varepsilon$. $K''$ is shown as an $\varepsilon$-shrink of K. The regions inside $K''$ and outside $K'$ are the zones for q where we can correctly answer.

Corollary 16. There exists a deterministic one-pass streaming algorithm that, given a convex polyhedron K as a stream of hyperplanes and a query point q such that $d_H(q, K) \ge \varepsilon$, solves geom-query-conv-poly by storing $O\left(\frac{\log^d n}{\varepsilon^{(d-1)/2}}\right)$ facets.

4 Discrepancy problems

The proofs in this Section will require the notion of star discrepancy [KN74]. In star discrepancy, all other problem specifications in the definition of discrepancy remain the same, but each interval is constrained to have its left end point at 0.

Thus, the problem of star-geometric-discrepancy is to report

$$D^*_g(P) := \sup_{[0,q] \subseteq [0,1]} \left| q - \frac{n_{0q}}{n} \right|,$$

and the problem of star-color-discrepancy is to report

$$D^*_c(P) := \max_{I = [0,x] \subseteq [0,1)} |R(I) - B(I)|,$$

which is the same as

$$D^*_c(P) = \max_{I_p = [0,p] :\, p \in P} |R(I_p) - B(I_p)|$$

because only the discrepancy values at points of the stream P matter. The following relations are known between the values of geometric discrepancy $D_g(P)$ and color discrepancy $D_c(P)$ and their star variants.

Fact 17. [KN74] $D^*_g(P) \le D_g(P) \le 2D^*_g(P)$ and $D^*_c(P) \le D_c(P) \le 2D^*_c(P)$.

Fact 18. [KN74] Let $x_1 < x_2 < x_3 < \ldots < x_n \in [0,1]$ be a sorted sequence of the points in P. Then, the supremum in the definition of $D^*_g$ can be replaced by a maximum operation as $D^*_g(P) = \max_{i \in [n]} \left| x_i - \frac{2i-1}{2n} \right|$.

4.1 Problem GEOMETRIC-DISCREPANCY

Theorem 19. For any $\varepsilon > 0$, there exists a positive integer n such that any one-pass randomized streaming algorithm that outputs an $(\varepsilon, \rho)$-additive solution to geometric-discrepancy for all streams of length n requires $\Omega\left(\frac{1}{\varepsilon}\right)$ bits of space.

Proof. Let P be the stream. We need the following claim that will be proved later.

Claim 20. For any $\varepsilon > 0$, there exists a positive integer n such that any one-pass randomized streaming algorithm that outputs an approximate solution D to star-geometric-discrepancy with probability $\rho$, such that $D^*_g(P) - \varepsilon \le D \le 2D^*_g(P) + \varepsilon$, requires $\Omega\left(\frac{1}{\varepsilon}\right)$ bits of space, where P is the input stream of length $2n$.

Let there exist an algorithm as stated in Theorem 19 that returns $D'$ as an $\varepsilon$-additive error solution to geometric-discrepancy and uses $o\left(\frac{1}{\varepsilon}\right)$ bits of space. Observe that $D_g(P) - \varepsilon \le D' \le D_g(P) + \varepsilon$. Now using Fact 17, we have $D' \le D_g(P) + \varepsilon \le 2D^*_g(P) + \varepsilon$ and $D' \ge D_g(P) - \varepsilon \ge D^*_g(P) - \varepsilon$. So, we can report $D'$ as D, i.e., the solution to our star-geometric-discrepancy problem satisfying $D^*_g(P) - \varepsilon \le D \le 2D^*_g(P) + \varepsilon$. Note that we are using $o\left(\frac{1}{\varepsilon}\right)$ bits, which contradicts Claim 20.

Figure 7: Reduction idea for Claim 20. Here $n = 3$. In (a), Alice’s input x = 010 and Bob’s input y = 100; Disj(x, y) = 1; $D^*_g(P) = \frac{1}{12}$. In (b), x = 011 and y = 110; Disj(x, y) = 0; $D^*_g(P) = \frac{1}{6} + \frac{1}{24}$.

Proof of Claim 20. Let $n = \left\lceil \frac{1}{32\varepsilon} \right\rceil$ and A be a streaming algorithm that gives output, as stated, using $o\left(\frac{1}{\varepsilon}\right)$ bits of space. We design a protocol for solving Disj using $o(n)$ bits. The idea is to generate points at varying intervals as per the inputs of Alice and Bob so that Disj can be solved by looking at the separation of the discrepancy values. We process Alice’s bit vector x as follows. See Figure 7, where 1A (1B) denotes an input of 1 for Alice (Bob) and 0A (0B) denotes an input of 0 for Alice (Bob). For each $i \in [n]$, we give $z_i$ as input to A such that $z_i = \frac{4i-3}{4n} + \frac{1}{4n}$ if $x_i = 0$ and $z_i = \frac{4i-3}{4n} - \frac{1}{4n}$, otherwise. Let $Z_A = \cup_{i=1}^{n} \{z_i\}$. We send the current memory state to Bob. We process Bob’s input y as follows and give another n inputs to A. We give input $z_{n+i} = \frac{4i-1}{4n}$ if $y_i = 0$ and $z_{n+i} = \frac{4i-3}{4n} - \frac{1}{8n}$ if $y_i = 1$. Let $Z_B = \cup_{i=1}^{n} \{z_{n+i}\}$ and $Z = Z_A \cup Z_B$. In total, we have given Z, having $2n$ inputs, to A. Let $Z_s = \langle z'_1, \ldots, z'_{2n} \rangle$ be the sequence of the points in Z sorted in increasing order. Recalling Fact 18, note that $D^*_g(P) = \max_{i \in [2n]} D_i$, where $D_i = \left| \frac{2i-1}{4n} - z'_i \right|$. Let $J = \{j : y_j = 1\}$ be the set of indices where Bob has an input of 1. Let $A(j)$, $B(j)$, $j \in J$, be the indices of the inputs in $Z_s$ corresponding to $z_j$ (Alice) and $z_{n+j}$ (Bob), respectively. Note that $B(j) = A(j) + 1$ if $x_j = 1$ and $B(j) = A(j) - 1$, otherwise. One can see that $D_i \le \frac{1}{4n}$ for all $i \in [2n]$ with $i \ne A(j), B(j)$, $j \in J$. If $x_j = 1$, then $B(j) = A(j) + 1$, $D_{A(j)} = \frac{1}{4n}$ and $D_{B(j)} = \frac{1}{2n} + \frac{1}{8n}$. If $x_j = 0$, then $B(j) = A(j) - 1$, $D_{A(j)} = \frac{1}{4n}$ and $D_{B(j)} = \frac{1}{8n}$. Hence, if Disj(x, y) = 0, i.e., there exists an index $i \in [n]$ such that $x_i = y_i = 1$, then $D^*_g(P) = \frac{1}{2n} + \frac{1}{8n}$ and in this case A reports at least $\frac{1}{2n} + \frac{1}{8n} - \varepsilon$, i.e., $19\varepsilon$. If Disj(x, y) = 1, then $D^*_g(P) \le \frac{1}{4n}$ and in this case A reports at most $2D^*_g(P) + \varepsilon \le \frac{1}{2n} + \varepsilon$, i.e., $17\varepsilon$. So, we report Disj(x, y) = 1 if and only if A reports at most $17\varepsilon$.
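The point placement in this reduction can be replayed on the two instances of Figure 7. The sketch below is illustrative only: the function names are hypothetical, exact rationals replace the streaming algorithm A, and the discrepancy is evaluated with the maximum from Fact 18 as it is used in the proof.

```python
from fractions import Fraction


def reduction_points(x, y):
    """Points given to A for Alice's bits x and Bob's bits y (both of length n)."""
    n = len(x)
    pts = []
    for i in range(1, n + 1):            # Alice's n points z_1..z_n
        base = Fraction(4 * i - 3, 4 * n)
        shift = Fraction(1, 4 * n)
        pts.append(base + shift if x[i - 1] == 0 else base - shift)
    for i in range(1, n + 1):            # Bob's n points z_{n+1}..z_{2n}
        if y[i - 1] == 0:
            pts.append(Fraction(4 * i - 1, 4 * n))
        else:
            pts.append(Fraction(4 * i - 3, 4 * n) - Fraction(1, 8 * n))
    return pts


def star_discrepancy(points):
    """max_i |x_i - (2i - 1)/(2m)| over the m sorted points, as in Fact 18."""
    m = len(points)
    return max(abs(z - Fraction(2 * i - 1, 2 * m))
               for i, z in enumerate(sorted(points), start=1))
```

For Figure 7(a), `star_discrepancy(reduction_points([0, 1, 0], [1, 0, 0]))` evaluates to 1/12, and for Figure 7(b), `star_discrepancy(reduction_points([0, 1, 1], [1, 1, 0]))` evaluates to 5/24 = 1/6 + 1/24.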

We have been able to design a deterministic one-pass (multi-pass) streaming algorithm for an $\varepsilon$-additive (multiplicative) solution to geometric-discrepancy using the bucketing technique. The result is stated in the following Theorem.

Theorem 21. There exists a one-pass deterministic streaming algorithm for an $\varepsilon$-additive solution to geometric-discrepancy using $O\left(\frac{1}{\varepsilon}\right)$ space, where $\varepsilon \in (0, 1)$ is an input parameter.

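Proof. We use the bucketing technique to solve geometric-discrepancy in $\mathbb{R}$. Given an $\varepsilon \in (0, 1)$, we create $\lceil \frac{1}{\varepsilon} \rceil$ buckets. Let $\mathcal{B} = \{B_i = [(i-1)\varepsilon, \min(i\varepsilon, 1)) : i \in [\lceil \frac{1}{\varepsilon} \rceil]\}$ be the set of buckets. We also maintain $\lceil \frac{1}{\varepsilon} \rceil$ counters $c_i$, $i \in [\lceil \frac{1}{\varepsilon} \rceil]$, such that $c_i$ maintains the number of points in $B_i$. On receiving a point in the stream, we only increase the counter of the corresponding bucket.

At the end of the stream, we know the value of n and the values of the $c_i$’s, $i \in [\lceil \frac{1}{\varepsilon} \rceil]$. Let $C_i$ denote the number of points in $\cup_{j=1}^{i} B_j$, i.e., $C_i = \sum_{j=1}^{i} c_j$. Note that only the $c_i$’s (and $C_i$’s) are maintained, not the exact coordinates of the points. We take $y_i = (2i-1)\frac{\varepsilon}{2}$ as the representative coordinate for each point in $B_i$. So, the amount of space we are using is $O\left(\frac{1}{\varepsilon}\right)$.

For buckets $B_i$ and $B_j$, $i \le j$, consider any interval $[a, b] \subseteq [0, 1]$ that spans from $B_i$ to $B_j$, i.e., $a \in B_i$ and $b \in B_j$. So, if we know the exact value of the number of points present in $[a, b]$, i.e., $n_{ab}$, we can report $\left| |[y_i, y_j]| - \frac{n_{ab}}{n} \right|$ as an $\varepsilon$-additive solution to $\left| |[a, b]| - \frac{n_{ab}}{n} \right|$, as $|y_i - a|, |y_j - b| \le \frac{\varepsilon}{2}$. But we do not know the exact $n_{ab}$. However, $C_{i-1} + 1 \le n_{ab} \le C_j$, where $C_0 = 0$, as $[a, b]$ spans from $B_i$ to $B_j$. So,

$$\max_{C_{i-1}+1 \le n_{ab} \le C_j} \left( \left| |[y_i, y_j]| - \frac{n_{ab}}{n} \right| \right) = \max\left( \left| |[y_i, y_j]| - \frac{C_{i-1}+1}{n} \right|, \left| |[y_i, y_j]| - \frac{C_j}{n} \right| \right)$$

is an $\varepsilon$-additive approximate solution to $\max_{a \in B_i, b \in B_j} \left| |[a, b]| - \frac{n_{ab}}{n} \right|$. Observe that $D_g(P) = \max_{1 \le i \le j \le \lceil 1/\varepsilon \rceil} \max_{a \in B_i, b \in B_j} \left| |[a, b]| - \frac{n_{ab}}{n} \right|$. Hence, we report

$$\max_{1 \le i \le j \le \lceil 1/\varepsilon \rceil} \max\left( \left| |[y_i, y_j]| - \frac{C_{i-1}+1}{n} \right|, \left| |[y_i, y_j]| - \frac{C_j}{n} \right| \right)$$

as our $\varepsilon$-additive solution to $D_g(P)$. This concludes the proof.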
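The one-pass part of this proof, i.e., the counters, their prefix sums and the representative coordinates, can be sketched as follows. This is a hypothetical illustration that assumes buckets of width ε and places a point equal to 1 in the last bucket; the final maximisation over bucket pairs is not reproduced here.

```python
from fractions import Fraction
from itertools import accumulate
from math import ceil


class BucketCounters:
    """ceil(1/eps) counters over [0, 1]; only counts are stored, never coordinates."""

    def __init__(self, eps):
        self.eps = Fraction(eps)
        self.m = ceil(1 / self.eps)      # number of buckets
        self.counts = [0] * self.m       # counts[i - 1] plays the role of c_i
        self.n = 0                       # stream length seen so far

    def add(self, x):
        # 0-based bucket index; the point 1 is placed in the last bucket (an assumption).
        i = min(int(Fraction(x) / self.eps), self.m - 1)
        self.counts[i] += 1
        self.n += 1

    def prefix_counts(self):
        """C_1, ..., C_m with C_i = c_1 + ... + c_i."""
        return list(accumulate(self.counts))

    def representative(self, i):
        """Representative coordinate y_i = (2i - 1) * eps / 2 of bucket B_i (1-based i)."""
        return (2 * i - 1) * self.eps / 2
```

The state is the list of counters plus the stream length, so the number of stored values is ⌈1/ε⌉ + 1 regardless of n.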

4.2 Problem COLOR-DISCREPANCY

Theorem 22. Any one-pass streaming algorithm that returns an $(\varepsilon, \rho)$-multiplicative solution to color-discrepancy uses $\Omega(n)$ bits of space, where P is the input stream and $0 < \varepsilon < \frac{1}{5}$.

Proof. We need the following claim, where P is the stream.

Figure 8: Reduction idea for Claim 23. Here $n = 3$. In (a), Alice’s input x = 100 and Bob’s input y = 010; Disj(x, y) = 1; $D^*_c(P) = 2$. In (b), x = 011 and y = 110; Disj(x, y) = 0; $D^*_c(P) = 3$.

Claim 23. Any one-pass streaming algorithm that outputs an approximate solution D to star-color-discrepancy with probability $\rho$ such that $(1 - \varepsilon)D^*_c(P) \le D \le 2(1 + \varepsilon)D^*_c(P)$ uses $\Omega(n)$ bits of space, where P is the input stream of length n and $0 < \varepsilon < \frac{1}{5}$.

Let there exist an algorithm as stated in Theorem 22 that outputs $D'$ for color-discrepancy and uses $o(n)$ bits of space. Now using Fact 17, we have $D' \le (1 + \varepsilon)D_c(P) \le 2(1 + \varepsilon)D^*_c(P)$ and $D' \ge (1 - \varepsilon)D_c(P) \ge (1 - \varepsilon)D^*_c(P)$. So, we can report $D'$ as our D, i.e., the approximate solution to our star-color-discrepancy problem satisfying $(1 - \varepsilon)D^*_c(P) \le D \le 2(1 + \varepsilon)D^*_c(P)$. Note that we are using $o(n)$ bits, which contradicts Claim 23.

Remark 2. Multipass lower bound: Using a line of argument similar to that of Remark 1, we have the following.

• Any p-pass algorithm that computes an $\varepsilon$-multiplicative solution to color-discrepancy (geometric-discrepancy) requires $\Omega\left(\frac{n}{p}\right)$ bits of space, where $0 < \varepsilon < \frac{1}{5}$.

• Any p-pass algorithm that computes an $\varepsilon$-additive solution to color-discrepancy (geometric-discrepancy) requires $\Omega\left(\frac{1}{\varepsilon p}\right)$ bits of space, where $0 < \varepsilon < 1$.

Proof of Claim 23. We show a reduction from Disj. Let A be an algorithm that correctly solves star-color-discrepancy, as stated, with probability 2/3 and uses $o(n)$ bits of space. Now we can design a protocol by suitably placing “red” and “blue” points and looking for a separation of discrepancy values to solve Disj. We process each bit of Alice’s input x as follows. See Figure 8. If $x_i = 1$, we give inputs $\frac{i-1}{n} + \frac{1}{7n}$ and $\frac{i}{n}$, labeled as “red” and “blue”, respectively, to A. Otherwise, we give points $\frac{i-1}{n} + \frac{2}{7n}$ and $\frac{i}{n}$, labeled as “blue” and “red”, respectively, as inputs. We send the current memory status of A to Bob. Bob processes his input y as follows. If $y_i = 1$, Bob gives four inputs to A: $\frac{i-1}{n} + \frac{3}{7n}$ and $\frac{i-1}{n} + \frac{4}{7n}$, both labeled as “red”; $\frac{i-1}{n} + \frac{5}{7n}$ and $\frac{i-1}{n} + \frac{6}{7n}$, both labeled as “blue”. If $y_i = 0$, Bob does nothing. As discussed at the beginning of this Section, only the discrepancy values at the input points matter. By construction of the input instance of star-color-discrepancy, each point in P is of the form $\frac{i-1}{n} + \frac{j}{7n}$ for some $i \in [n]$, $0 \le j \le 6$. Recall that $I_p = [0, p]$. Observe that $|R(I_p) - B(I_p)| = 0$ if $p = \frac{i}{n}$ for some i; $|R(I_p) - B(I_p)| = 1$ if $p = \frac{i-1}{n} + \frac{1}{7n}$ or $p = \frac{i-1}{n} + \frac{2}{7n}$ for some i. Let $J = \{k : y_k = 1\}$. Only for $i \in J$, we have $I_p$ such that $p = \frac{i-1}{n} + \frac{j}{7n}$, where $3 \le j \le 6$.

For $j \in J$, observe that if $x_j = 1$, then

$$\begin{aligned}
|R(I_p) - B(I_p)| &= 2 \quad \text{for } p = \frac{j-1}{n} + \frac{3}{7n},\\
|R(I_p) - B(I_p)| &= 3 \quad \text{for } p = \frac{j-1}{n} + \frac{4}{7n},\\
|R(I_p) - B(I_p)| &= 2 \quad \text{for } p = \frac{j-1}{n} + \frac{5}{7n},\\
\text{and } |R(I_p) - B(I_p)| &= 1 \quad \text{for } p = \frac{j-1}{n} + \frac{6}{7n}.
\end{aligned}$$

If $x_j = 0$, then

$$\begin{aligned}
|R(I_p) - B(I_p)| &= 0 \quad \text{for } p = \frac{j-1}{n} + \frac{3}{7n},\\
|R(I_p) - B(I_p)| &= 1 \quad \text{for } p = \frac{j-1}{n} + \frac{4}{7n},\\
|R(I_p) - B(I_p)| &= 0 \quad \text{for } p = \frac{j-1}{n} + \frac{5}{7n},\\
\text{and } |R(I_p) - B(I_p)| &= 1 \quad \text{for } p = \frac{j-1}{n} + \frac{6}{7n}.
\end{aligned}$$

If Disj(x, y) = 0, i.e., there exists an index i such that $x_i = y_i = 1$, then $D^*_c(P) = 3$ and in this case A returns at least $(1 - \varepsilon)D^*_c(P)$, i.e., more than $\frac{12}{5}$. If Disj(x, y) = 1, then $D^*_c(P) = 1$ and in this case A returns at most $2(1 + \varepsilon)D^*_c(P)$, i.e., less than $\frac{12}{5}$. Hence, we report Disj(x, y) = 1 if and only if A gives output less than $\frac{12}{5}$.
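The case analysis in this proof can be replayed by generating the coloured points and scanning them in sorted order. The sketch is illustrative: the function names are hypothetical, colours are encoded as +1 (red) and −1 (blue), and an exact computation of the star colour discrepancy stands in for the algorithm A.

```python
from fractions import Fraction

RED, BLUE = +1, -1


def reduction_points(x, y):
    """Coloured points (position, colour) produced by Alice's bits x and Bob's bits y."""
    n = len(x)
    pts = []
    for i in range(1, n + 1):
        base = Fraction(i - 1, n)
        if x[i - 1] == 1:
            pts += [(base + Fraction(1, 7 * n), RED), (Fraction(i, n), BLUE)]
        else:
            pts += [(base + Fraction(2, 7 * n), BLUE), (Fraction(i, n), RED)]
        if y[i - 1] == 1:
            pts += [(base + Fraction(3, 7 * n), RED), (base + Fraction(4, 7 * n), RED),
                    (base + Fraction(5, 7 * n), BLUE), (base + Fraction(6, 7 * n), BLUE)]
    return pts


def star_color_discrepancy(pts):
    """max over input points p of |R([0, p]) - B([0, p])|."""
    best = running = 0
    for _, colour in sorted(pts):   # all positions are distinct by construction
        running += colour
        best = max(best, abs(running))
    return best
```

On the intersecting instance x = 011, y = 110 the scan returns 3, and on a disjoint instance such as x = 100, y = 001 it returns 1, which are the two values separated by the threshold 12/5 in the proof.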

4.3 Problem COLOR-DISCREPANCY for sorted sequence

By Theorem 22, approximating color-discrepancy needs $\Omega(n)$ bits. But if the stream P arrives in sorted order, we can compute $D_c(P)$ using $O(1)$ space.

Theorem 24. color-discrepancy can be solved exactly by a one-pass deterministic streaming algorithm using $O(1)$ space when P is sorted.

Proof. Let $I_{opt} = [p, q]$ be an interval where $D_c(P) = |R(I_{opt}) - B(I_{opt})|$ is attained. Then $p, q \in P$ and $D_c(P) \ne 0$, i.e., $R(I_{opt}) \ne B(I_{opt})$. Also, if $R(I_{opt}) > B(I_{opt})$ then p, q are red points, and if $R(I_{opt}) < B(I_{opt})$ then p, q are blue points. To establish the theorem, we need the following Claim.

Claim 25. $D_c(P) = \max_{p \in P} \left( R([0, p]) - B([0, p]) \right) - \min_{q \in P} \left( R([0, q]) - B([0, q]) \right) + 1$.

The algorithm keeps track of the number of red (denoted as #R) and blue (denoted as #B) points seen so far. It also maintains two other variables, max and min. On receiving an input, it increments #R or #B accordingly and then updates max with max(max, #R − #B) and min with min(min, #R − #B). Observe that $D_c(P) = \max - \min + 1$ by Claim 25.

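Proof of Claim 25.

$$\begin{aligned}
D_c(P) &= \max_{I \subset [0,1]} |R(I) - B(I)| \\
&= \max_{p, q \in P} |R([p, q]) - B([p, q])| \\
&= \max\Bigl( \max_{p, q \in P,\; R([p,q]) > B([p,q])} \bigl( R([p, q]) - B([p, q]) \bigr),\; \max_{p, q \in P,\; B([p,q]) > R([p,q])} \bigl( B([p, q]) - R([p, q]) \bigr) \Bigr) \\
&= \max\Bigl( \max_{p, q \in P,\; R([p,q]) > B([p,q])} \bigl( (R([0, q]) - R([0, p]) + 1) - (B([0, q]) - B([0, p])) \bigr), \\
&\qquad\quad \max_{p, q \in P,\; B([p,q]) > R([p,q])} \bigl( (B([0, q]) - B([0, p]) + 1) - (R([0, q]) - R([0, p])) \bigr) \Bigr) \\
&= \max_{p, q \in P} \bigl| (R([0, q]) - R([0, p])) - (B([0, q]) - B([0, p])) \bigr| + 1 \\
&= \max_{p, q \in P} \bigl| (R([0, q]) - B([0, q])) - (R([0, p]) - B([0, p])) \bigr| + 1 \\
&= \max_{p \in P} \bigl( R([0, p]) - B([0, p]) \bigr) - \min_{q \in P} \bigl( R([0, q]) - B([0, q]) \bigr) + 1
\end{aligned}$$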
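A constant-space scan of a sorted coloured stream can be sketched as below. Note that this is a variant chosen for the sketch, not a transcription of Claim 25: it counts the empty prefix (difference 0) among the prefix differences and returns max − min without the +1 term. That convention is an assumption made here so that the result can be compared against an exhaustive check on short sequences. The function names are hypothetical.

```python
def color_discrepancy_sorted(colours):
    """One pass over the colours ('R' or 'B') of points in sorted order, O(1) state."""
    diff = hi = lo = 0            # hi = lo = 0 accounts for the empty prefix
    for c in colours:
        diff += 1 if c == 'R' else -1   # diff is #R - #B over the points seen so far
        hi = max(hi, diff)
        lo = min(lo, diff)
    return hi - lo


def color_discrepancy_brute(colours):
    """Maximum of |#R - #B| over all contiguous runs of points, for checking only."""
    vals = [1 if c == 'R' else -1 for c in colours]
    return max(abs(sum(vals[i:j]))
               for i in range(len(vals))
               for j in range(i + 1, len(vals) + 1))
```

The scan keeps three integers, so its state does not grow with the stream beyond the bits needed to store counts up to n.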

References

[AC14] S. Arya and T. M. Chan. Better Epsilon-Dependencies for Offline Approximate Nearest Neighbor Search, Euclidean Minimum Spanning Trees, and Epsilon-Kernels. In SoCG, page 416, 2014.

[AHV04] P. K. Agarwal, S. Har-Peled, and K. R. Varadarajan. Approximating extent measures of points. J. ACM, 51(4):606–635, 2004.

[AMP+06] D. Agarwal, A. McGregor, J. M. Phillips, S. Venkatasubramanian, and Z. Zhu. Spatial Scan Statistics: Approximations and Performance Study. In KDD, pages 24–33, 2006.

[AMS99] N. Alon, Y. Matias, and M. Szegedy. The Space Complexity of Approximating the Frequency Moments. J. Comput. Syst. Sci., 58(1):137–147, 1999.

[AN16] A. Andoni and H. L. Nguyen. Width of Points in the Streaming Model. ACM Trans. Algorithms, 12(1):5, 2016.

[AS15] P. K. Agarwal and R. Sharathkumar. Streaming Algorithms for Extent Problems in High Dimensions. Algorithmica, 72(1):83–98, 2015.

[BCEG07] A. Bagchi, A. Chaudhary, D. Eppstein, and M. T. Goodrich. Deterministic sampling and range counting in geometric data streams. ACM Trans. Algorithms, 3(2), 2007.

[BCNS15] A. Bishnu, A. Chakrabarti, S. C. Nandy, and S. Sen. On Density, Threshold and Emptiness Queries for Intervals in the Streaming Model. In FSTTCS, pages 336–349, 2015.

[Ben77] J. L. Bentley. Algorithms for Klee’s rectangle problem. Unpublished manuscript, 1977.

[BS80] J. L. Bentley and J. B. Saxe. Decomposable Searching Problems I: Static-to-Dynamic Transformation. J. Algorithms, 1(4):301–358, 1980.

[CC07] T. M. Chan and E. Y. Chen. Multi-Pass Geometric Algorithms. Discrete & Computational Geometry, 37(1):79–102, 2007.

[Cha06] T. M. Chan. Faster core-set constructions and data-stream algorithms in fixed dimensions. Comput. Geom., 35(1-2):20–35, 2006.

[Cha09] T. M. Chan. Dynamic Coresets. Discrete & Computational Geometry, 42(3):469–488, 2009.

[Cha10] T. M. Chan. A (slightly) faster algorithm for Klee’s measure problem. Comput. Geom., 43(3):243–250, 2010.

[Cha13] T. M. Chan. Klee’s Measure Problem Made Easy. In FOCS, pages 410–419, 2013.

[Cha16] T. M. Chan. Dynamic Streaming Algorithms for Epsilon-Kernels. In SoCG, pages 27:1–27:11, 2016.

[CP14] T. M. Chan and V. Pathak. Streaming and dynamic algorithms for minimum enclosing balls in high dimensions. Comput. Geom., 47(2):240–247, 2014.

[CP15] S. Cabello and P. Perez-Lantero. Interval Selection in the Streaming Model. In WADS, pages 127–139, 2015.

[Dud74] R. M. Dudley. Metric entropy of some classes of sets with differentiable boundaries. Journal of Approximation Theory, 10(3):227–236, 1974.

[EHR12] Y. Emek, M. M. Halldorsson, and A. Rosen. Space-Constrained Interval Selection. In ICALP, pages 302–313, 2012.

[FIS08] G. Frahling, P. Indyk, and C. Sohler. Sampling in Dynamic Data Streams and Applications. Int. J. Comput. Geometry Appl., 18(1/2):3–28, 2008.

[FKZ04] J. Feigenbaum, S. Kannan, and J. Zhang. Computing Diameter in the Streaming and Sliding-Window Models. Algorithmica, 41(1):25–41, 2004.

[FM85] P. Flajolet and G. N. Martin. Probabilistic Counting Algorithms for Data Base Applications. J. Comput. Syst. Sci., 31(2):182–209, 1985.

[GM08] S. Guha and A. McGregor. Tight Lower Bounds for Multi-pass Stream Computation Via Pass Elimination. In ICALP, pages 760–772, 2008.

[GM12] S. Guha and A. McGregor. Graph Synopses, Sketches, and Streams: A Survey. PVLDB, 5(12):2030–2031, 2012.

[HM04] S. Har-Peled and S. Mazumdar. On coresets for k-means and k-median clustering. In STOC, pages 291–300, 2004.

[HRR98] M. R. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on Data Streams. In External Memory Algorithms, Proceedings of a DIMACS Workshop, pages 107–118, 1998.

[HS08] J. Hershberger and S. Suri. Adaptive sampling for geometric problems over data streams. Comput. Geom., 39(3):191–208, 2008.

[Ind03] P. Indyk. Better Algorithms for High-dimensional Proximity Problems via Asymmetric Embeddings. In SODA, pages 539–545, 2003.

[Ind04a] P. Indyk. Algorithms for Dynamic Geometric Problems over Data Streams. In STOC, pages 373–380, 2004.

[Ind04b] P. Indyk. Streaming Algorithms for Geometric Problems. In FSTTCS, pages 32–34, 2004.

[Kle77] V. Klee. Can the measure of $\bigcup_{i=1}^{n} [a_i, b_i]$ be computed in less than $O(n \log n)$ steps? Amer. Math. Monthly, 1977.

[KLL16] Z. S. Karnin, K. J. Lang, and E. Liberty. Optimal Quantile Approximation in Streams. In FOCS, pages 71–78, 2016.

[KN74] L. Kuipers and H. Niederreiter. Uniform Distribution of Sequences. John Wiley & Sons, 1974.

[KN97] E. Kushilevitz and N. Nisan. Communication Complexity. Cambridge University Press, 1997.

[KNW10] D. M. Kane, J. Nelson, and D. P. Woodruff. An optimal algorithm for the distinct elements problem. In PODS, pages 41–52, 2010.

[Mat99] J. Matousek. Geometric Discrepancy: An Illustrated Guide. Springer, 1999.

[Mor78] R. Morris. Counting Large Numbers of Events in Small Registers. Commun. ACM, 21(10):840–842, 1978.

[MP78] J. I. Munro and M. S. Paterson. Selection and Sorting with Limited Storage. In FOCS, pages 253–258, 1978.

[Mut05] S. Muthukrishnan. Data Streams: Algorithms and Applications. Foundations and Trends in Theoretical Computer Science, 1(2), 2005.

[OY91] M. H. Overmars and C.-K. Yap. New Upper Bounds in Klee’s Measure Problem. SIAM J. Comput., 20(6):1034–1045, 1991.

[Rou16] T. Roughgarden. Communication Complexity (for Algorithm Designers). Foundations and Trends in Theoretical Computer Science, 11(3-4):217–404, 2016.

[RR15] R. A. Rufai and D. S. Richards. A Streaming Algorithm for the Convex Hull. In CCCG, 2015.

[SBV+15] G. Sharma, C. Busch, R. Vaidyanathan, S. Rai, and J. L. Trahan. Efficient transformations for Klee’s measure problem in the streaming model. Comput. Geom., 48(9):688–702, 2015.

[SLNX09] A. D. Sarma, A. Lall, D. Nanongkai, and J. Xu. Randomized Multi-pass Streaming Skyline Algorithms. PVLDB, 2(1):85–96, 2009.

[STZ06] S. Suri, C. D. Toth, and Y. Zhou. Range Counting over Multidimensional Data Streams. Discrete & Computational Geometry, 36(4):633–655, 2006.

[TW12] S. Tirthapura and D. P. Woodruff. Rectangle efficient aggregation in spatial data stream. In PODS, pages 283–294, 2012.

[vLW81] J. van Leeuwen and D. Wood. The measure problem for rectangular ranges in d-space. J. Algorithms, 2, pages 282–300, 1981.

[Woo04] D. P. Woodruff. Optimal Space Lower Bounds for all Frequency Moments. In SODA, pages 167–175, 2004.

[Zar11] H. Zarrabi-Zadeh. An Almost Space-Optimal Streaming Algorithm for Coresets in Fixed Dimensions. Algorithmica, 60(1):46–59, 2011.

[ZC06] H. Zarrabi-Zadeh and T. M. Chan. A Simple Streaming Algorithm for Minimum Enclosing Balls. In CCCG, 2006.

[Zha05] J. Zhang. Massive Data Streams in Graph Theory and Computational Geometry. PhD thesis, Yale University, USA, 2005.
