arXiv:1803.08777v1 [cs.DS] 23 Mar 2018
Data Streams with Bounded Deletions
Rajesh Jayaram
Carnegie Mellon University
David P. Woodruff
Carnegie Mellon University
Abstract
Two prevalent models in the data stream literature are the insertion-only and turnstile models. Unfortunately, many important streaming problems require a Θ(log(n)) multiplicative factor more space for turnstile streams than for insertion-only streams. This complexity gap often arises because the underlying frequency vector f is very close to 0, after accounting for all insertions and deletions to items. Signal detection in such streams is difficult, given the large number of deletions.
In this work, we propose an intermediate model which, given a parameter α ≥ 1, lower bounds the norm ‖f‖_p by a 1/α-fraction of the L_p mass of the stream had all updates been positive. Here, for a vector f, ‖f‖_p = (∑_{i=1}^n |f_i|^p)^{1/p}, and the value of p we choose depends on the application. This gives a fluid medium between insertion-only streams (with α = 1) and turnstile streams (with α = poly(n)), and allows for analysis in terms of α.
We show that for streams with this α-property, for many fundamental streaming problems we can replace a O(log(n)) factor in the space usage for algorithms in the turnstile model with a O(log(α)) factor. This is true for identifying heavy hitters, inner product estimation, L_0 estimation, L_1 estimation, L_1 sampling, and support sampling. For each problem, we give matching or nearly matching lower bounds for α-property streams. We note that in practice, many important turnstile data streams are in fact α-property streams for small values of α. For such applications, our results represent significant improvements in efficiency for all the aforementioned problems.
Contents

1 Introduction
  1.1 Our Contributions
  1.2 Our Techniques
  1.3 Preliminaries
2 Frequency Estimation via Sampling
  2.1 Count-Sketch Sampling Simulator
  2.2 Inner Products
3 L1 Heavy Hitters
4 L1 Sampling
  4.1 The L1 Sampler
5 L1 Estimation
  5.1 Strict Turnstile L1 Estimation
  5.2 General Turnstile L1 Estimator
6 L0 Estimation
  6.1 Review of Unbounded Deletion Case
  6.2 Dealing With Small L0
  6.3 The Algorithm for α-Property Streams
  6.4 Our Constant Factor L0 Estimator for α-Property Streams
7 Support Sampling
8 Lower Bounds
9 Conclusion
A Sketch of L2 heavy hitters algorithm
1 Introduction
Data streams have become increasingly important in modern applications, where the sheer size of a dataset imposes stringent restrictions on the resources available to an algorithm. Examples of such datasets include internet traffic logs, sensor networks, financial transaction data, database logs, and scientific data streams (such as huge experiments in particle physics, genomics, and astronomy). Given their prevalence, there is a substantial body of literature devoted to designing extremely efficient one-pass algorithms for important data stream problems. We refer the reader to [9, 50] for surveys of these algorithms and their applications.
Formally, a data stream is given by an underlying vector f ∈ R^n, called the frequency vector, which is initialized to 0^n. The frequency vector then receives a stream of m updates of the form (i_t, Δ_t) ∈ [n] × {−M, ..., M} for some M > 0 and t ∈ [m]. The update (i_t, Δ_t) causes the change f_{i_t} ← f_{i_t} + Δ_t. For simplicity, we make the common assumption that log(mM) = O(log(n)), though our results generalize naturally to arbitrary n, m [12].
Two well-studied models in the data stream literature are the insertion-only model and the turnstile model. In the former model, it is required that Δ_t > 0 for all t ∈ [m], whereas in the latter Δ_t can be any integer in {−M, ..., M}. It is known that there are significant differences between these models. For instance, identifying an index i ∈ [n] for which |x_i| > (1/10) ∑_{j=1}^n |x_j| can be accomplished with only O(log(n)) bits of space in the insertion-only model [10], but requires Ω(log²(n)) bits in the turnstile model [38]. This log(n) gap between the complexity in the two models occurs in many other important streaming problems.
Due to this "complexity gap", it is perhaps surprising that no intermediate streaming model had been systematically studied in the literature before. For motivation on the usefulness of such a model, we note that nearly all of the lower bounds for turnstile streams involve inserting a large number of items before deleting nearly all of them [41, 38, 39, 37]. This behavior seems unlikely in practice, as the resulting norm ‖f‖_p becomes arbitrarily small regardless of the size of the stream. In this paper, we introduce a new model which avoids the lower bounds for turnstile streams by lower bounding the norm ‖f‖_p. We remark that while similar notions of bounded-deletion streams have been considered for their practical applicability [26] (see also [21], where a bound on the maximum number of edges that could be deleted in a graph stream was useful), to the best of our knowledge there is no comprehensive theoretical study of data stream algorithms in this setting.
Formally, we define the insertion vector I ∈ R^n to be the frequency vector of the substream of positive updates (Δ_t ≥ 0), and the deletion vector D ∈ R^n to be the entry-wise absolute value of the frequency vector of the substream of negative updates. Then f = I − D by definition. Our model is as follows.
Definition 1. For α ≥ 1 and p ≥ 0, a data stream f satisfies the L_p α-property if ‖I + D‖_p ≤ α‖f‖_p.
For p = 1, the definition simply asserts that the final L_1 norm of f must be no less than a 1/α fraction of the total weight of updates in the stream, ∑_{t=1}^m |Δ_t|. For strict turnstile streams, this is equivalent to the number of deletions being less than a (1 − 1/α) fraction of the number of insertions, hence a bounded-deletion stream.
For p = 0, the α-property simply states that ‖f‖_0, the number of non-zero coordinates at the end of the stream, is no smaller than a 1/α fraction of the number of distinct elements seen in the stream (known as the F_0 of the stream). Importantly, note that for both cases this constraint need only hold at the time of query, and not necessarily at every point in the stream.
Observe that for α = 1 we recover the insertion-only model, whereas for α = mM in the L_1 case or α = n in the L_0 case we recover the turnstile model (with the minor exception of streams with ‖f‖_p = 0). So α-property streams are a natural parameterized intermediate model between the
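To make these definitions concrete, the following sketch (our own illustrative code, not from the paper) computes the L_1- and L_0-α of a completed stream directly from the insertion and deletion vectors:

```python
from collections import defaultdict

def alpha_values(updates):
    """updates: list of (i, delta) pairs; returns (alpha_1, alpha_0)."""
    I = defaultdict(int)  # insertion vector I
    D = defaultdict(int)  # deletion vector D (entry-wise absolute value)
    for i, delta in updates:
        if delta >= 0:
            I[i] += delta
        else:
            D[i] -= delta
    f = {i: I[i] - D[i] for i in set(I) | set(D)}   # f = I - D
    l1_f = sum(abs(v) for v in f.values())          # ||f||_1
    l1_ID = sum(I.values()) + sum(D.values())       # ||I + D||_1
    F0 = len(f)                                     # distinct items seen
    L0 = sum(1 for v in f.values() if v != 0)       # ||f||_0
    alpha1 = l1_ID / l1_f if l1_f else float("inf")
    alpha0 = F0 / L0 if L0 else float("inf")
    return alpha1, alpha0

# 10 insertions and 5 deletions to item 0, 3 insertions to item 1:
# ||I + D||_1 = 18 and ||f||_1 = 8, so alpha_1 = 2.25; both items
# survive, so alpha_0 = F0 / L0 = 1.0.
a1, a0 = alpha_values([(0, +1)] * 10 + [(0, -1)] * 5 + [(1, +3)])
```

A stream in which all mass is deleted has ‖f‖_p = 0, and the ratio degenerates to infinity, matching the "minor exception" noted above.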
Problem          | Turnstile L.B.    | α-Property U.B.          | Citation   | Notes
ε-Heavy Hitters  | Ω(ε^-1 log²(n))   | O(ε^-1 log(n) log(α))    | [38]       | Strict-turnstile, succeeds w.h.p.
ε-Heavy Hitters  | Ω(ε^-1 log²(n))   | O(ε^-1 log(n) log(α))    | [38]       | General-turnstile, δ = O(1)
Inner Product    | Ω(ε^-1 log(n))    | O(ε^-1 log(α))           | Theorem 21 | General-turnstile
L1 Estimation    | Ω(log(n))         | O(log(α))                | Theorem 16 | Strict-turnstile
L1 Estimation    | Ω(ε^-2 log(n))    | O(ε^-2 log(α) + log(n))  | [39]       | General-turnstile
L0 Estimation    | Ω(ε^-2 log(n))    | O(ε^-2 log(α) + log(n))  | [39]       | General-turnstile
L1 Sampling      | Ω(log²(n))        | O(log(n) log(α))         | [38]       | General-turnstile (*)
Support Sampling | Ω(log²(n))        | O(log(n) log(α))         | [41]       | Strict-turnstile

Figure 1: The best known lower bounds (L.B.) for classic data stream problems in the turnstile model, along with the upper bounds (U.B.) for α-property streams from this paper. The notes specify whether a U.B./L.B. pair applies to the strict or general turnstile model. For simplicity, we have suppressed log log(n) and log(1/ε) terms, and all results are for δ = O(1) failure probability, unless otherwise stated. (*) L1 sampling note: strong α-property, with ε = Θ(1) for both U.B. and L.B.
insertion-only and turnstile models. For clarity, we use the term unbounded-deletion stream to refer to a (general or strict) turnstile stream which does not satisfy the α-property for α = o(n).
For many applications of turnstile data streaming algorithms, the streams in question are in fact α-property streams for small values of α. For instance, in network traffic monitoring it is useful to estimate differences between network traffic patterns across distinct time intervals or routers [50]. If f^1_i, f^2_i represent the number of packets sent between the i-th [source, destination] IP address pair in the first and second intervals (or routers), then the stream in question is f^1 − f^2. In realistic systems, the traffic behavior will not be identical across days or routers, and even differences as small as 0.1% in overall traffic behavior (i.e., ‖f^1 − f^2‖_1 > 0.001‖f^1 + f^2‖_1) will result in α < 1000 (which is significantly smaller than the theoretical universe size of n ≈ 2^256 potential IP address pairs in IPv6).
A similar case for small α can be made for differences between streams whenever these differences are not arbitrarily small. This includes applications in streams of financial transactions, sensor network data, and telecommunication call records [34, 20], as well as for identifying newly trending search terms, detecting DDoS attacks, and estimating the spread of malicious worms [51, 24, 48, 44, 58].
A setting in which α is likely even smaller is database analytics. For instance, an important tool for database synchronization is Remote Differential Compression (RDC) [55, 3], which allows similar files to be compared between a client and server by transmitting only the differences between them. For files given by large data streams, one can feed these differences back into sketches of the file to complete the synchronization. Even if as much as half of the file must be resynchronized between client and server (an inordinately large fraction for typical RDC applications), streaming algorithms with α = 2 would suffice to recover the data.
For L_0, there are important applications of streams with bounded ratio α = F_0/L_0. For example, L_0 estimation is applicable to networks of cheap moving sensors, such as those monitoring wildlife
movement or water-flow patterns [34]. In such networks, some degree of clustering is expected (in regions of abundant food, points of water-flow accumulation), and these clusters will be consistently occupied by sensors, resulting in a bounded ratio of inactive to active regions. Furthermore, in networking one often needs to estimate the number of distinct IP addresses with active network connections at a given time [27, 50]. Here we also observe some degree of clustering on high-activity IPs, with persistently active IPs likely resulting in an L_0 to F_0 ratio much larger than 1/n (where n is the universe size of IP addresses).
In many of the above applications, α can even be regarded as a constant when compared with n. For such applications, the space improvements detailed in Figure 1 are considerable, and reduce the space complexity of the problems nearly or exactly to known upper bounds for insertion-only streams [10, 49, 40, 38].
Finally, we remark that in some cases it is not unreasonable to assume that the magnitude of every coordinate would be bounded by some fraction of the updates to it. For instance, in the case of RDC it seems likely that none of the files would be totally removed. We summarize this stronger guarantee as the strong α-property.
Definition 2. For α ≥ 1, a data stream f satisfies the strong α-property if I_i + D_i ≤ α|f_i| for all i ∈ [n].
Note that this property is strictly stronger than the L_p α-property for any p ≥ 0. In particular, it forces f_i ≠ 0 if i is updated in the stream. In this paper, however, we focus primarily on the more general α-property of Definition 1, and use "α-property" to refer to Definition 1 unless explicitly stated otherwise. Nevertheless, we show that our lower bounds for L_p heavy hitters, L_1 estimation, L_1 sampling, and inner product estimation all hold even for the more restricted strong α-property streams.
1.1 Our Contributions
We show that for many well-studied streaming problems, we can replace a log(n) factor in algorithms for general turnstile streams with a log(α) factor for α-property streams. This is a significant improvement for small values of α. Our upper bound results, along with the lower bounds for the unbounded deletion case, are given in Figure 1. Several of our results come from the introduction of a new data structure, CSSampSim (Section 2), which produces point queries for the frequencies f_i with small additive error. Our improvements from CSSampSim and other L_1 problems are the result of new sampling techniques for α-property streams.
While sampling of streams has been studied in many papers, most have been in the context of insertion-only streams (see, e.g., [15, 18, 19, 17, 23, 31, 42, 45, 57]). Notable examples of the use of sampling in the presence of deletions in a stream are [16, 30, 33, 28, 29]. We note that these works are concerned with unbiased estimators and do not provide the (1 ± ε)-approximate relative error guarantees with small space that we obtain. They are also concerned with unbounded-deletion streams, whereas our algorithms exploit the α-property of the underlying stream to obtain considerable savings.
In addition to upper bounds, we give matching or nearly matching lower bounds in the α-property setting for all the problems we consider. In particular, for the L_1-related problems (heavy hitters, L_1 estimation, L_1 sampling, and inner product estimation), we show that these lower bounds hold even for the stricter case of strong α-property streams.
We also demonstrate that for general turnstile streams, obtaining a constant approximation of the L_1 still requires Ω(log(n)) bits for α-property streams. For streams with unbounded deletions, there is an Ω(ε^-2 log(n)) lower bound for (1 ± ε)-approximation [39]. Although we cannot remove
this log(n) factor for α-property streams, we are able to show an upper bound of Õ(ε^-2 log(α) + log(n)) bits of space for strong α-property streams, where the Õ notation hides log(1/ε) + log log(n) factors. We thus separate the dependence of ε^-2 and log(n) in the space complexity, illustrating an additional benefit of α-property streams, and show a matching lower bound for strong α-property streams.
1.2 Our Techniques
Our results for L_1 streaming problems in Sections 2 to 5 are built off of the observation that for α-property streams the number of insertions and deletions made to a given i ∈ [n] are both upper bounded by α‖f‖_1. We can then think of sampling updates to i as a biased coin flip with bias at least 1/2 + (α‖f‖_1)^{-1}. By sampling poly(α/ε) stream updates and scaling up, we can recover the difference between the number of insertions and deletions up to an additive ε‖f‖_1 error.
To exploit this fact, in Section 2 we introduce a data structure, CSSampSim, inspired by the classic Countsketch of [14], which simulates running each row of Countsketch on a small uniform sample of stream updates. Our data structure does not correspond to a valid instantiation of Countsketch on any stream, since we sample different stream updates for different rows of Countsketch. Nevertheless, we show via a Bernstein inequality that our data structure obtains the Countsketch guarantee plus an ε‖f‖_1 additive error, with only a logarithmic dependence on ε in the space. This results in more efficient algorithms for the L_1 heavy hitters problem (Section 3), and is also used in our L_1 sampling algorithm (Section 4). We are able to argue that the counters used in our algorithms can be represented with far fewer than log(n) bits because we sample a very small number of stream updates.
Additionally, we demonstrate that sampling poly(α/ε) updates preserves the inner product between α-property streams f, g up to an additive ε‖f‖_1‖g‖_1 error. Then by hashing the sampled universe down modulo a sufficiently large prime, we show that the inner product remains preserved, allowing us to estimate it in O(ε^-1 log(α)) space (Section 2.2).
Our algorithm for L_1 estimation (Section 5) utilizes our biased coin observation to show that sampling will recover the L_1 of a strict turnstile α-property stream. To carry out the sampling in o(log(n)) space, we give an alternate analysis of the well-known Morris counting algorithm, obtaining better space but worse error bounds. This allows us to obtain a rough estimate of the position in the stream so that elements can be sampled with the correct probability. For L_1 estimation in general turnstile streams, we analyze a virtual stream which corresponds to scaling our input stream by Cauchy random variables, argue that it still has the α-property, and apply our sampling analysis for L_1 estimation to it.

Our results for the L_0 streaming problems in Sections 6 and 7 mainly exploit the α-property in subsampling algorithms. Namely, many data structures for L_0 streaming problems subsample the universe [n] at log(n) levels, corresponding to the log(n) possible thresholds which could be O(1)-approximations of the L_0. If, however, an O(1) approximation were known in advance, we could immediately subsample to this level and remove the log(n) factor from the space bound. For α-property streams, we note that the non-decreasing values F^t_0 of F_0 after t updates must be bounded in the interval [L^t_0, O(α)L^t_0]. Thus, by employing an O(1) estimator R^t of F^t_0, we show that it suffices to subsample to only the O(log(α/ε)) levels which are closest to log(R^t) at time t, from which our space improvement follows.
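As a toy illustration of this level-selection idea (a hypothetical helper of ours; the window width below is just one concrete choice of the O(log(α/ε)) term):

```python
import math

def active_levels(R_t, alpha, eps, n):
    """Subsampling levels to keep, given a constant-factor F0 estimate R_t."""
    center = int(math.log2(max(R_t, 1)))
    width = int(math.log2(alpha / eps)) + 2   # an O(log(alpha/eps)) window
    lo = max(0, center - width)
    hi = min(int(math.log2(n)), center + width)
    return list(range(lo, hi + 1))

# With R_t = 1024, alpha = 4, eps = 0.5 and universe n = 2^20, only
# levels 5..15 are maintained instead of all of 0..20.
levels = active_levels(1024, 4, 0.5, 2 ** 20)
```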
1.3 Preliminaries
If g is any function of the updates of the stream, for t ∈ [m] we write g^t to denote the value of g after the updates (i_1, Δ_1), ..., (i_t, Δ_t). For Sections 2 to 5, it will suffice to assume Δ_t ∈ {−1, 1} for all t ∈ [m]. For general updates, we can implicitly consider them to be several consecutive updates in {−1, 1}, and our analysis will hold in this expanded stream. This implicit expanding of updates is only necessary for our algorithms which sample updates with some probability p. If an update with |Δ_t| > 1 arrives, we update our data structures with the value sign(Δ_t) · Bin(|Δ_t|, p), where Bin(n, p) is the binomial random variable on n trials with success probability p, which has the same effect. In this unit-update setting, the L_1 α-property reduces to m ≤ α‖f^m‖_1, so this is the definition which we will use for the L_1 α-property. We use the term unbounded-deletion stream to refer to streams without the α-property (or equivalently, streams that have the α = poly(n)-property and can also have all-0 frequencies). For Sections 2 to 5, we will consider only the L_1 α-property, and thus drop the "L_1" in these sections for simplicity; for Sections 6 and 7 we will consider only the L_0 α-property, and similarly drop the "L_0" there.

We call a vector y ∈ R^n k-sparse if it has at most k non-zero entries. For a given vector f ∈ R^n, let Err^k_p(f) = min_{k-sparse y} ‖f − y‖_p. In other words, Err^k_p(f) is the p-norm of f with the k heaviest entries removed. We call the argument minimizer of ‖f − y‖_p the best k-sparse approximation to f. Finally, we use the term with high probability (w.h.p.) to describe events that occur with probability 1 − n^{-c}, where c is a constant. Events occur with low probability (w.l.p.) if they are the complement of a w.h.p. event. We will often ignore the precise constant c when it only affects our memory usage by a constant factor.
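As a small worked example of the Err^k_p definition (an illustrative helper of ours, assuming a dense vector):

```python
def err_k_p(f, k, p):
    """p-norm of f with its k largest-magnitude entries removed."""
    rest = sorted((abs(x) for x in f), reverse=True)[k:]
    return sum(x ** p for x in rest) ** (1.0 / p)

# For f = (10, -7, 3, 0, 4), the best 2-sparse approximation keeps 10
# and -7, leaving entries of magnitude (4, 3, 0), so
# Err^2_2(f) = sqrt(4^2 + 3^2) = 5.0
```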
2 Frequency Estimation via Sampling
In this section, we will develop many of the tools needed for answering approximate queries about α-property streams. Primarily, we develop a data structure, CSSampSim, inspired by the classic Countsketch of [14], that computes frequency estimates of items in the stream by sampling. This data structure will immediately result in an improved heavy hitters algorithm in Section 3, and is at the heart of our L_1 sampler in Section 4. In this section, we will write α-property to refer to the L_1 α-property throughout.

Firstly, for α-property streams, the following observation is crucial to many of our results. Given a fixed item i ∈ [n], by sampling at least poly(α/ε) stream updates we can preserve f_i (after scaling) up to an additive ε‖f‖_1 error.
Lemma 1 (Sampling Lemma). Let f be the frequency vector of a general turnstile stream with the α-property, and let f̃ be the frequency vector of a uniformly sampled substream scaled up by 1/p, where each update is sampled uniformly with probability p > α² ε^{-3} log(δ^{-1})/m. Then with probability at least 1 − δ for i ∈ [n], we have

  |f̃_i − f_i| < ε‖f‖_1.

Moreover, we have ∑_{i=1}^n f̃_i = ∑_{i=1}^n f_i ± ε‖f‖_1.
Proof. Assume we sample each update to the stream independently with probability p. Let f^+_i, f^-_i be the number of insertions and deletions of element i respectively, so f_i = f^+_i − f^-_i. Let X^+_j indicate that the j-th insertion to item i is sampled. First, if ε‖f‖_1 < f^+_i, then by Chernoff bounds:

  Pr[ |(1/p) ∑_{j=1}^{f^+_i} X^+_j − f^+_i| ≥ ε‖f‖_1 ] ≤ 2 exp( −p f^+_i (ε‖f‖_1)² / (3(f^+_i)²) ) ≤ exp( −p ε³ m / α² )
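The Sampling Lemma can be sanity-checked numerically. The toy experiment below is ours, not the paper's; at this tiny scale the lemma's lower bound on p would exceed 1, so we instead average repeated trials to observe the same concentration:

```python
import random

random.seed(0)
n, alpha, eps = 100, 2.0, 0.1
# An alpha-property stream: item 0 receives 600 insertions and 300
# deletions (f_0 = 300); items 1..99 receive 7 insertions each.
stream = ([(0, +1)] * 600 + [(0, -1)] * 300
          + [(i, +1) for i in range(1, n) for _ in range(7)])
random.shuffle(stream)
m, f1 = len(stream), 300 + 7 * (n - 1)   # m = 1593, ||f||_1 = 993
# Unit-update alpha-property: m <= alpha * ||f||_1, i.e. 1593 <= 1986.

p, trials, total = 0.5, 50, 0.0
for _ in range(trials):
    est = 0.0
    for i, delta in stream:              # sample each update w.p. p
        if i == 0 and random.random() < p:
            est += delta / p             # rescale by 1/p
    total += est
avg = total / trials                     # concentrates near f_0 = 300
```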
CSSampSim (CSSS)

Input: sensitivity parameters k ≥ 1, ε ∈ (0, 1).

1. Set S = Θ((α²/ε²) T² log(n)), where T ≥ 4/ε² + log(n).
2. Instantiate a d × 6k Countsketch table A, for d = O(log(n)). For each table entry a_ij ∈ A, store two values a^+_ij and a^-_ij, both initialized to 0.
3. Select 4-wise independent hash functions h_i : [n] → [6k], g_i : [n] → {1, −1}, for i ∈ [d].
4. Set p ← 0, and start a log(n)-bit counter to store the position in the stream.
5. On update (i_t, Δ_t):
   (a) If t = 2^r S + 1 for any r ≥ 1, then for every entry a_ij ∈ A set a^+_ij ← Bin(a^+_ij, 1/2), a^-_ij ← Bin(a^-_ij, 1/2), and p ← p + 1.
   (b) Sample (i_t, Δ_t) with probability 2^{-p}. If sampled, then for i ∈ [d]:
       i. If Δ_t g_i(i_t) > 0, set a^+_{i,h_i(i_t)} ← a^+_{i,h_i(i_t)} + Δ_t g_i(i_t).
       ii. Else set a^-_{i,h_i(i_t)} ← a^-_{i,h_i(i_t)} + |Δ_t g_i(i_t)|.
6. On query for f_j: return y*_j = median{ 2^p g_i(j) · (a^+_{i,h_i(j)} − a^-_{i,h_i(j)}) | i ∈ [d] }.

Figure 2: Our data structure to simulate running Countsketch on a uniform sample of the stream.
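The distinctive step in Figure 2 is 5(a): instead of fixing a sampling rate in advance, CSSS halves its retained sample with Bin(·, 1/2) each time the stream length doubles, so every counter holds O(S) samples no matter how long the stream gets. A single-counter sketch of this trick (our own toy code; S and the update count are arbitrary choices):

```python
import random

random.seed(2)

def binom(count, q):
    """Bin(count, q) using only the stdlib."""
    return sum(random.random() < q for _ in range(count))

S = 64       # space target; Figure 2 sets S = Theta((alpha^2/eps^2) T^2 log n)
aplus = 0    # one counter a^+_{ij}; the real table keeps d x 6k of them
p = 0        # current level: updates are retained with probability 2^-p
for t in range(1, 100001):            # 100,000 unit insertions to this counter
    if t == 2 ** (p + 1) * S + 1:     # the stream doubled: halve the sample
        aplus = binom(aplus, 0.5)
        p += 1
    if random.random() < 2 ** -p:     # retain this update at the current level
        aplus += 1
# Every update survives to the end with probability exactly 2^-p, so
# 2^p * aplus is an unbiased estimate of the true count 100,000, while
# the counter itself stays around S.
```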
where the last inequality holds because f^+_i ≤ m ≤ α‖f‖_1. Taking p ≥ α² log(1/δ)/(ε³ m) gives the desired probability δ. Now if ε‖f‖_1 ≥ f^+_i, then

  Pr[ (1/p) ∑_{j=1}^{f^+_i} X^+_j ≥ f^+_i + ε‖f‖_1 ] ≤ exp( −p f^+_i ε‖f‖_1 / f^+_i ) ≤ exp( −p ε m / α ) ≤ δ

for the same value of p. Applying the same argument to f^-_i, we obtain f̃_i = f^+_i − f^-_i ± 2ε‖f‖_1 = f_i ± 2ε‖f‖_1 as needed after rescaling ε. For the final statement, we can consider all updates to the stream as being made to a single element i, and then simply apply the same argument given above.
2.1 Count-Sketch Sampling Simulator
We now describe the Countsketch algorithm of [14], which is a simple yet classic algorithm in the data stream literature. For a parameter k and d = O(log(n)), it creates a d × 6k matrix A initialized to 0, and for every row i ∈ [d] it selects hash functions h_i : [n] → [6k] and g_i : [n] → {1, −1} from 4-wise independent uniform hash families. On every update (i_t, Δ_t), it hashes this update into every row i ∈ [d] by updating a_{i,h_i(i_t)} ← a_{i,h_i(i_t)} + g_i(i_t)Δ_t. It then outputs y* where y*_j = median{ g_i(j) a_{i,h_i(j)} | i ∈ [d] }. The guarantee of one row of the Countsketch is as follows.

Lemma 2. Let f ∈ R^n be the frequency vector of any general turnstile stream hashed into a d × 6k Countsketch table. Then with probability at least 2/3, for a given j ∈ [n] and row i ∈ [d] we have |g_i(j) a_{i,h_i(j)} − f_j| < k^{-1/2} Err^k_2(f). It follows that if d = O(log(n)), then with high probability, for all j ∈ [n] we have |y*_j − f_j| < k^{-1/2} Err^k_2(f), where y* is the estimate of Countsketch. The space required is O(k log²(n)) bits.
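For reference, the update and query rules just described fit in a few lines. This is a standard Countsketch sketch of ours, with Python's integer-tuple hash standing in for the 4-wise independent families (an illustrative assumption, not the paper's construction):

```python
class CountSketch:
    def __init__(self, d, w):
        self.d, self.w = d, w                 # d rows, w = 6k columns
        self.table = [[0] * w for _ in range(d)]

    def _h(self, r, x):                       # bucket hash for row r
        return hash((r, x)) % self.w

    def _g(self, r, x):                       # sign hash for row r
        return 1 if hash((r, x, 1)) % 2 else -1

    def update(self, i, delta):               # a_{r,h_r(i)} += g_r(i) * delta
        for r in range(self.d):
            self.table[r][self._h(r, i)] += self._g(r, i) * delta

    def query(self, j):                       # median_r g_r(j) * a_{r,h_r(j)}
        ests = sorted(self._g(r, j) * self.table[r][self._h(r, j)]
                      for r in range(self.d))
        return ests[len(ests) // 2]

cs = CountSketch(d=7, w=60)
for _ in range(500):
    cs.update(0, 1)                # heavy item: f_0 = 500
for i in range(1, 40):
    cs.update(i, 2)                # 39 light items of weight 2
# cs.query(0) recovers f_0 up to the signed weight of colliding light items.
```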
We now introduce a data structure which simulates running Countsketch on a uniform sample of the stream of size poly(α log(n)/ε). The full data structure is given in Figure 2. Note that for a fixed row, each update is chosen with probability at least 2^{-p} ≥ S/(2m) = Ω(α² T² log(n)/(ε² m)). We will use this fact to apply Lemma 1 with ε′ = ε/T and δ = 1/poly(n). The parameter T will be poly(log(n)/ε), and we introduce it as a new symbol purely for clarity.
Now, the updates to each row in CSSS are sampled independently from the other rows; thus CSSS may not represent running Countsketch on a single valid sample. However, each row independently
contains the result of running a row of Countsketch on a valid sample of the stream. Since the Countsketch guarantee holds with probability 2/3 for each row, and we simply take the median of O(log(n)) rows to obtain high probability, it follows that the output of CSSS will still satisfy an additive error bound w.h.p. if each row also satisfies that error bound with probability 2/3.

By Lemma 1 with sensitivity parameter ε/T, we know that we preserve the weight of all items f_i with |f_i| ≥ (1/T)‖f‖_1 up to a (1 ± ε) factor w.h.p. after sampling and rescaling. For all smaller elements, however, we obtain error additive in ε‖f‖_1/T. This gives rise to a natural division of the coordinates of f. Let big ⊂ [n] be the set of i with |f_i| ≥ (1/T)‖f‖_1, and let small ⊂ [n] be the complement. Let E ⊂ [n] be the top k heaviest coordinates in f, and let s ∈ R^n be a fixed sample vector of f after rescaling by p^{-1}. Since Err^k_2(s) = min_{k-sparse y} ‖s − y‖_2, it follows that Err^k_2(s) ≤ ‖s_{big\E}‖_2 + ‖s_small‖_2. Furthermore, by Lemma 1, ‖s_{big\E}‖_2 ≤ (1 + ε)‖f_{big\E}‖_2 ≤ (1 + ε)Err^k_2(f). So it remains to upper bound ‖s_small‖_2, which we do in the following technical lemma.

The intuition for the lemma is that ‖f_small‖_2 is maximized when all the L_1 weight is concentrated in T elements, so that ‖f_small‖_2 ≤ (T(‖f‖_1/T)²)^{1/2} = ‖f‖_1/T^{1/2}. By the α-property, we know that the number of insertions made to the elements of small is bounded by α‖f‖_1. Thus, computing the variance of ‖s_small‖²_2 and applying Bernstein's inequality, we obtain a similar upper bound for ‖s_small‖_2.
Lemma 3. If s is the rescaled frequency vector resulting from uniformly sampling, with probability p ≥ S/(2m) where S = Ω(α² T² log(n)) and T = Ω(log(n)), a general turnstile stream f with the α-property, then ‖s_small‖_2 ≤ 2T^{-1/2}‖f‖_1 with high probability.
Proof. Fix f ∈ R^n, and assume that ‖f‖_1 > S (otherwise we could just sample all the updates, and our counters in Countsketch would never exceed log(αS)). For any i ∈ [n], let f_i = f^+_i − f^-_i, so that f = f^+ − f^-, and let s_i be our rescaled sample of f_i. By definition, for each i ∈ small we have f_i < (1/T)‖f‖_1. Thus the quantity ‖f_small‖²_2 is maximized by having T coordinates equal to (1/T)‖f‖_1, so ‖f_small‖²_2 ≤ T(‖f‖_1/T)² ≤ ‖f‖²_1/T. Now note that if we condition on the fact that s_i ≤ (2/T)‖f‖_1 for all i ∈ small, which occurs with probability greater than 1 − n^{-5} by Lemma 1, then since E[∑_i |s_i|⁴] = O(n⁴), all of the following expectations change by a factor of at most (1 ± 1/n) by conditioning on this fact. Thus we can safely ignore this conditioning and proceed by analyzing E[‖s_small‖²_2] and E[‖s_small‖⁴_4] without the condition, but use the conditioning when applying Bernstein's inequality later.
We have that |s_i| = |(1/p) ∑_{j=1}^{f^+_i + f^-_i} X_ij|, where X_ij is an indicator random variable that is ±1 if the j-th update to f_i is sampled, and 0 otherwise, where the sign depends on whether the update was an insertion or a deletion, and p ≥ S/(2m) is the probability of sampling an update. Then E[s_i] = (1/p) E[∑_{j=1}^{f^+_i + f^-_i} X_ij] = f_i. Furthermore, we have

  E[s_i²] = (1/p²) E[( ∑_{j=1}^{f^+_i + f^-_i} X_ij )²]
          = (1/p²) ( ∑_j E[X²_ij] + ∑_{j1 ≠ j2} E[X_{ij1} X_{ij2}] )
          = (1/p)(f^+_i + f^-_i) + ( f^+_i(f^+_i − 1) + f^-_i(f^-_i − 1) − 2f^+_i f^-_i ).

Substituting f^-_i = f^+_i − f_i in part of the above equation gives E[s_i²] = (1/p)(f^+_i + f^-_i) + f_i² + f_i − 2f^+_i ≤ (1/p)(f^+_i + f^-_i) + 2f_i². So E[∑_{i ∈ small} s_i²] ≤ (1/p)(‖f^+_small‖_1 + ‖f^-_small‖_1) + 2‖f_small‖²_2, which is at most (α/p)‖f‖_1 + 2‖f‖²_1/T by the α-property of f and the upper bound on ‖f_small‖²_2. Now E[s_i⁴] = (1/p⁴) E[( ∑_{j=1}^{f^+_i + f^-_i} X_ij )⁴], which we can write as

  (1/p⁴) E[ ∑_j X⁴_ij + 4 ∑_{j1 ≠ j2} X³_{ij1} X_{ij2} + 12 ∑_{j1,j2,j3 distinct} X²_{ij1} X_{ij2} X_{ij3} + ∑_{j1,j2,j3,j4 distinct} X_{ij1} X_{ij2} X_{ij3} X_{ij4} ].   (1)

We analyze Equation (1) term by term. First note that E[∑_j X⁴_ij] = p(f^+_i + f^-_i), and E[∑_{j1≠j2} X³_{ij1} X_{ij2}] = p²( f^+_i(f^+_i − 1) + f^-_i(f^-_i − 1) − 2f^+_i f^-_i ). Substituting f^-_i = f^+_i − f_i, we obtain E[∑_{j1≠j2} X³_{ij1} X_{ij2}] ≤ 2p² f_i². Now for the third term, we have

  E[∑_{j1,j2,j3 distinct} X²_{ij1} X_{ij2} X_{ij3}] = p³( f^+_i(f^+_i − 1)(f^+_i − 2) + f^-_i(f^-_i − 1)(f^-_i − 2) + f^+_i f^-_i(f^-_i − 1) + f^-_i f^+_i(f^+_i − 1) − 2 f^+_i f^-_i(f^+_i − 1) − 2 f^+_i f^-_i(f^-_i − 1) ),

which after the same substitution is upper bounded by 10p³ max{f^+_i, |f_i|} f_i² ≤ 10p³ α‖f‖_1 f_i², where the last inequality follows from the α-property of f. Finally, the last term is

  E[∑_{j1,j2,j3,j4 distinct} X_{ij1} X_{ij2} X_{ij3} X_{ij4}] = p⁴( f^+_i(f^+_i − 1)(f^+_i − 2)(f^+_i − 3) + f^-_i(f^-_i − 1)(f^-_i − 2)(f^-_i − 3) + 6 f^+_i(f^+_i − 1) f^-_i(f^-_i − 1) − 4( f^+_i(f^+_i − 1)(f^+_i − 2) f^-_i + f^-_i(f^-_i − 1)(f^-_i − 2) f^+_i ) ).

Making the same substitution allows us to bound this above by p⁴(24f_i⁴ − 12f_i f^+_i + 12(f^+_i)²) ≤ p⁴(36f_i⁴ + 24(f^+_i)²).
Now f satisfies the α-property, so ‖f^+_small‖_1 + ‖f^-_small‖_1 ≤ α‖f‖_1. Summing the bounds from the last paragraph over all i ∈ small, we obtain

  E[‖s_small‖⁴_4] ≤ (1/p³) α‖f‖_1 + (8/p²)‖f_small‖²_2 + (120α/p)‖f_small‖²_2 ‖f‖_1 + 36‖f_small‖⁴_4 + 24α²‖f‖²_1.

Now 1/p ≤ 2m/S ≤ (2α/S)‖f‖_1 by the α-property. Applying this with the facts that ‖f_small‖²_2 is maximized at ‖f‖²_1/T, and similarly ‖f_small‖⁴_4 is maximized at T(‖f‖_1/T)⁴ = ‖f‖⁴_1/T³, we have that E[‖s_small‖⁴_4] is at most

  (8α⁴/S³)‖f‖⁴_1 + (36α²/(S²T))‖f‖⁴_1 + (240α²/(ST))‖f‖⁴_1 + (36/T³)‖f‖⁴_1 + 24α²‖f‖²_1.

Since we have ‖f‖_1 > S > α²T², the above expectation is upper bounded by (300/T³)‖f‖⁴_1. We now apply Bernstein's inequality. Using the conditioning from earlier, we have the upper bound s_i ≤ 2‖f‖_1/T, so plugging this into Bernstein's inequality yields

  Pr[ |‖s_small‖²_2 − E[‖s_small‖²_2]| > ‖f‖²_1/T ] ≤ exp( − (‖f‖⁴_1/(2T²)) / (300‖f‖⁴_1/T³ + 2‖f‖³_1/(3T²)) ) ≤ exp( −T/604 ).

Finally, T = Ω(log(n)), so the above probability is poly(1/n) for T = c log(n) and a sufficiently large constant c. Since the expectation E[‖s_small‖²_2] is at most (α/p)‖f‖_1 + 2‖f‖²_1/T ≤ 3‖f‖²_1/T, it follows that ‖s_small‖_2 ≤ 2‖f‖_1/√T with high probability, which is the desired result.
Applying the result of Lemma 3, along with the bound on Err^k_2(s) from the preceding paragraphs, we obtain the following corollary.

Corollary 1. With high probability, if s is as in Lemma 3, then Err^k_2(s) ≤ (1 + ε)Err^k_2(f) + 2T^{-1/2}‖f‖_1.
Now we analyze the error from CSSS. Observe that each row in CSSS contains the result of hashing a uniform sample into 6k buckets. Let s^i ∈ R^n be the frequency vector, after scaling by 1/p, of the sample hashed into the i-th row of CSSS, and let y^i ∈ R^n be the estimate of s^i taken from the i-th row of CSSS. Let π(i) : [n] → [O(log(n))] be the row from which Countsketch returns its estimate for f_i, meaning y*_i = y^{π(i)}_i.
Theorem 1. For ε > 0 and k > 1, with high probability, when run on a general turnstile stream f ∈ R^n with the α-property, CSSS with 6k columns and O(log(n)) rows returns y* such that, for every i ∈ [n], we have

  |y*_i − f_i| ≤ 2( k^{-1/2} Err^k_2(f) + ε‖f‖_1 ).

It follows that if y is the best k-sparse approximation to y*, then Err^k_2(f) ≤ ‖f − y‖_2 ≤ 5( k^{1/2} ε‖f‖_1 + Err^k_2(f) ) with high probability. The space required is O(k log(n) log(α log(n)/ε)).
Proof. Set S = Θ((α²/ε²) T² log(n)) as in Figure 2, and set T = 4/ε² + O(log(n)). Fix any i ∈ [n]. Now CSSS samples updates uniformly with probability p ≥ S/(2m), so applying Lemma 1 to our sample s^j_i of f_i for each row j and union bounding, with high probability we have s^j_i = f_i ± (ε/T)‖f‖_1 = s^q_i for all rows j, q. Then by the Countsketch guarantee of Lemma 2, for each row q we have y^q_i = f_i ± ( (ε/T)‖f‖_1 + k^{-1/2} max_j Err^k_2(s^j) ) with probability 2/3. Thus y*_i = y^{π(i)}_i = f_i ± ( (ε/T)‖f‖_1 + k^{-1/2} max_j Err^k_2(s^j) ) with high probability. Now noting that √T ≥ 2/ε, we apply Corollary 1 to Err^k_2(s^j) and union bound over all j ∈ [O(log(n))] to obtain max_j Err^k_2(s^j) ≤ (1 + ε)Err^k_2(f) + ε‖f‖_1 w.h.p., and union bounding over all i ∈ [n] gives |y*_i − f_i| ≤ 2( k^{-1/2} Err^k_2(f) + ε‖f‖_1 ) for all i ∈ [n] with high probability.
For the second claim, note that $\mathrm{Err}_2^k(f) \leq \|f - f'\|_2$ for any $k$-sparse vector $f'$, from which the first inequality follows. Now if the top $k$ coordinates are the same in $y^*$ as in $f$, then $\|f - y\|_2$ is at most $\mathrm{Err}_2^k(f)$ plus the L2 error from estimating the top $k$ elements, which is at most $(4k(\epsilon\|f\|_1 + k^{-1/2}\mathrm{Err}_2^k(f))^2)^{1/2} \leq 2(k^{1/2}\epsilon\|f\|_1 + \mathrm{Err}_2^k(f))$. In the worst case the top $k$ coordinates of $y^*$ are disjoint from the top $k$ in $f$. Applying the triangle inequality, the error is at most the error on the top $k$ coordinates of $y^*$ plus the error on the top $k$ in $f$. Thus $\|f - y\|_2 \leq 5(k^{1/2}\epsilon\|f\|_1 + \mathrm{Err}_2^k(f))$ as required.
For the space bound, note that the Countsketch table has $O(k\log(n))$ entries, each of which stores two counters that hold $O(S)$ samples in expectation. The counters never exceed $\mathrm{poly}(S) = \mathrm{poly}(\alpha\log(n)/\epsilon)$ w.h.p. by Chernoff bounds, and so can be stored using $O(\log(\alpha\log(n)/\epsilon))$ bits each (we can simply terminate if a counter gets too large).
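To make the Countsketch machinery underlying CSSS concrete, here is a minimal Python sketch of a plain Countsketch table with point queries taken as the median over rows. It is an illustration only: CSSS additionally feeds each row a uniform sample of the stream, and we use fully random hashing in place of the limited-independence hash functions of the text.

```python
import random
from statistics import median

class CountSketch:
    """Plain Countsketch: d rows of k buckets; a point query returns the
    median of the d per-row unbiased estimates.  (Fully random hashing is
    used for simplicity in place of 2-/4-wise independence, and the
    per-row uniform sampling done by CSSS is omitted.)"""

    def __init__(self, k, d, seed=0):
        self.k, self.d = k, d
        self.rng = random.Random(seed)
        self.h = [{} for _ in range(d)]      # bucket hash per row
        self.sign = [{} for _ in range(d)]   # sign hash per row
        self.table = [[0] * k for _ in range(d)]

    def _hash(self, row, i):
        if i not in self.h[row]:
            self.h[row][i] = self.rng.randrange(self.k)
            self.sign[row][i] = self.rng.choice((-1, 1))
        return self.h[row][i], self.sign[row][i]

    def update(self, i, delta):
        # Stream update (i, delta): add signed delta to one bucket per row.
        for r in range(self.d):
            b, s = self._hash(r, i)
            self.table[r][b] += s * delta

    def query(self, i):
        # Estimate f_i as the median over rows of sign * bucket value.
        ests = []
        for r in range(self.d):
            b, s = self._hash(r, i)
            ests.append(s * self.table[r][b])
        return median(ests)
```

A heavy coordinate is then recovered up to additive error proportional to the L2 norm of the tail, matching the $k^{-1/2}\mathrm{Err}_2^k$ term in the guarantee above.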
We now address how the error term of Theorem 1 can be estimated so as to bound the potential error. This will be necessary for our L1 sampling algorithm. We first state the following well-known fact about norm estimation [56], and give a proof for completeness.
Lemma 4. Let $R \in \mathbb{R}^k$ be a row of the Countsketch matrix with $k$ columns run on a stream with frequency vector $f$. Then with probability $99/100$, we have
$$\sum_{i=1}^k R_i^2 = (1 \pm O(k^{-1/2}))\|f\|_2^2$$
Proof. Let $\mathbb{1}(E)$ be the indicator function that is $1$ if the event $E$ is true, and $0$ otherwise. Let $h : [n] \to [k]$ and $g : [n] \to \{1, -1\}$ be 4-wise independent hash functions which specify the row of Countsketch. Then
$$\mathbb{E}\Big[\sum_{i=1}^k R_i^2\Big] = \mathbb{E}\Big[\sum_{j=1}^k \Big(\sum_{i=1}^n \mathbb{1}(h(i) = j)g(i)f_i\Big)^2\Big] = \mathbb{E}\Big[\sum_{i=1}^n f_i^2\Big] + \mathbb{E}\Big[\sum_{j=1}^k \sum_{i_1 \neq i_2} \mathbb{1}(h(i_1) = j)\mathbb{1}(h(i_2) = j)g(i_1)g(i_2)f_{i_1}f_{i_2}\Big]$$
By the 2-wise independence of $g$, the second quantity is $0$, and we are left with $\|f\|_2^2$. By a similar computation using the full 4-wise independence, we can show that $\mathrm{Var}(\sum_{i=1}^k R_i^2) = 2(\|f\|_2^4 - \|f\|_4^4)/k \leq 2\|f\|_2^4/k$. Then by Chebyshev's inequality, we obtain $\Pr[|\sum_{i=1}^k R_i^2 - \|f\|_2^2| > 10\sqrt{2}\|f\|_2^2/\sqrt{k}] < 1/100$ as needed.
Lemma 5. For $k > 1$, $\epsilon \in (0, 1)$, given an $\alpha$-property stream $f$, there is an algorithm that can produce an estimate $v$ such that $\mathrm{Err}_2^k(f) \leq v \leq 45k^{1/2}\epsilon\|f\|_1 + 20\mathrm{Err}_2^k(f)$ with high probability. The space required is the space needed to run two instances of CSSS with parameters $k, \epsilon$.
Proof. By Lemma 4, the L2 of row $i$ of CSSS with a constant number of columns will be a $(1 \pm 1/2)$-approximation of $\|s^i\|_2$ with probability $99/100$, where $s^i$ is the scaled-up sampled vector
corresponding to row $i$. Our algorithm is to run two copies of CSSS on the entire stream, say CSSS1 and CSSS2, with $k$ columns and sensitivity $\epsilon$. At the end of the stream we compute $y^*$ and $y$ from CSSS1, where $y$ is the best $k$-sparse approximation to $y^*$. We then feed $-y$ into CSSS2. The resulting L2 of the $i$-th row of CSSS2 is $(1 \pm 1/2)\|s^{i,2} - y\|_2$ (after rescaling of $s^{i,2}$) with probability $99/100$, where $s^{i,2}$ is the sample corresponding to the $i$-th row of CSSS2.

Now let $T = 4/\epsilon^2$ as in Theorem 1, and let $s^i$ be the sample corresponding to the $i$-th row of CSSS1. Then for any $i$, $\|s^i_{small}\|_2 \leq 2T^{-1/2}\|f\|_1$ w.h.p. by Lemma 3, and the fact that $\|f_{small}\|_2 \leq T^{-1/2}\|f\|_1$ follows from the definition of $small$. Furthermore, by Lemma 1, we know w.h.p. that $|s^i_j - f_j| < \epsilon|f_j|$ for all $j \in big$, and thus $\|s^i_{big} - f_{big}\|_2 \leq \epsilon\|f\|_1$. Then by the reverse triangle inequality we have $\big|\|s^i - y\|_2 - \|f - y\|_2\big| \leq \|s^i - f\|_2 \leq \|s^i_{small} - f_{small}\|_2 + \|s^i_{big} - f_{big}\|_2 \leq 5\epsilon\|f\|_1$. Thus $\|s^i - y\|_2 = \|f - y\|_2 \pm 5\epsilon\|f\|_1$ for all rows $i$ w.h.p., so the L2 of the $i$-th row of CSSS2 at the end of the algorithm is a value $v_i$ such that $\frac{1}{2}(\|f - y\|_2 - 5\epsilon\|f\|_1) \leq v_i \leq 2(\|f - y\|_2 + 5\epsilon\|f\|_1)$ with probability $9/10$ (note that the bounds do not depend on $i$). Taking $v = 2 \cdot \mathrm{median}(v_1, \ldots, v_{O(\log(n))}) + 5\epsilon\|f\|_1$, it follows that $\|f - y\|_2 \leq v \leq 4\|f - y\|_2 + 25\epsilon\|f\|_1$ with high probability. Applying the upper and lower bounds on $\|f - y\|_2$ given in Theorem 1 yields the desired result. The space required is the space needed to run two instances of CSSS, as stated.
2.2 Inner Products
Given two streams $f, g \in \mathbb{R}^n$, the problem of estimating the inner product $\langle f, g\rangle = \sum_{i=1}^n f_ig_i$ has attracted considerable attention in the streaming community for its use in estimating the size of join and self-join relations for databases [5, 53, 52]. For unbounded deletion streams, to obtain an $\epsilon\|f\|_1\|g\|_1$-additive error approximation, the best known result requires $O(\epsilon^{-1}\log(n))$ bits with a constant probability of success [22]. We show in Theorem 21 that $\Omega(\epsilon^{-1}\log(\alpha))$ bits are required even when $f, g$ are strong $\alpha$-property streams. This also gives a matching $\Omega(\epsilon^{-1}\log(n))$ lower bound for the unbounded deletion case.
In Theorem 2, we give a matching upper bound for $\alpha$-property streams up to $\log\log(n)$ and $\log(1/\epsilon)$ terms. We first prove the following technical lemma, which shows that inner products are preserved under a sufficiently large sample.
Lemma 6. Let $f, g \in \mathbb{R}^n$ be two $\alpha$-property streams with lengths $m_f, m_g$ respectively. Let $f', g' \in \mathbb{R}^n$ be unscaled uniform samples of $f$ and $g$, sampled with probability $p_f \geq s/m_f$ and $p_g \geq s/m_g$ respectively, where $s = \Omega(\alpha^2/\epsilon^2)$. Then with probability $99/100$, we have $\langle p_f^{-1}f', p_g^{-1}g'\rangle = \langle f, g\rangle \pm \epsilon\|f\|_1\|g\|_1$.
Proof. We have $\mathbb{E}[\langle p_f^{-1}f', p_g^{-1}g'\rangle] = \sum_i \mathbb{E}[p_f^{-1}f'_i]\mathbb{E}[p_g^{-1}g'_i] = \langle f, g\rangle$. Now the $(f'_ig'_i)$'s are independent, and thus we need only compute $\mathrm{Var}(p_f^{-1}p_g^{-1}f'_ig'_i)$. We have $\mathrm{Var}(p_f^{-1}p_g^{-1}f'_ig'_i) = (p_f^{-1}p_g^{-1})^2\mathbb{E}[(f'_i)^2]\mathbb{E}[(g'_i)^2] - (f_ig_i)^2$. Let $X_{i,j}$ be $\pm 1$ if the $j$-th update to $f_i$ is sampled, where the sign indicates whether the update was an insertion or deletion. Let $f^+, f^-$ be the insertion and deletion vectors of $f$ respectively (see Definition 1), and let $F = f^+ + f^-$. Then $f = f^+ - f^-$, and $f'_i = \sum_{j=1}^{f^+_i + f^-_i} X_{ij}$. We have
$$\mathbb{E}[(f'_i)^2/p_f^2] = \mathbb{E}\Big[p_f^{-2}\Big(\sum_{j=1}^{f^+_i + f^-_i} X_{ij}^2 + \sum_{j_1 \neq j_2} X_{ij_1}X_{ij_2}\Big)\Big] \leq p_f^{-1}F_i + (f^+_i)^2 + (f^-_i)^2 - 2f^+_if^-_i = p_f^{-1}F_i + f_i^2$$
Now define $g^+, g^-$ and $G = g^+ + g^-$ similarly. Note that the $\alpha$-property states that $m_f = \|F\|_1 \leq \alpha\|f\|_1$ and $m_g = \|G\|_1 \leq \alpha\|g\|_1$. Moreover, it gives $p_f^{-1} \leq m_f/s \leq \epsilon^2\|f\|_1/\alpha$, and $p_g^{-1} \leq m_g/s \leq \epsilon^2\|g\|_1/\alpha$. Then $\mathrm{Var}(\langle p_f^{-1}f', p_g^{-1}g'\rangle) = \sum_{i=1}^n \mathrm{Var}(p_f^{-1}p_g^{-1}f'_ig'_i) \leq \sum_{i=1}^n (f_i^2p_g^{-1}G_i + g_i^2p_f^{-1}F_i + p_f^{-1}p_g^{-1}F_iG_i)$. First note that $\sum_{i=1}^n p_g^{-1}f_i^2G_i \leq p_g^{-1}\|G\|_1\|f\|_1^2 \leq \epsilon^2\|g\|_1^2\|f\|_1^2$, and similarly $\sum_{i=1}^n p_f^{-1}g_i^2F_i \leq \epsilon^2\|g\|_1^2\|f\|_1^2$. Finally, we have $\sum_{i=1}^n p_f^{-1}p_g^{-1}F_iG_i \leq p_f^{-1}p_g^{-1}\|F\|_1\|G\|_1 \leq \epsilon^4\|f\|_1^2\|g\|_1^2$, so the variance is at most $3\epsilon^2\|f\|_1^2\|g\|_1^2$. Then by Chebyshev's inequality:
$$\Pr\big[\big|\langle p_f^{-1}f', p_g^{-1}g'\rangle - \langle f, g\rangle\big| > 30\epsilon\|f\|_1\|g\|_1\big] \leq 1/100$$
and rescaling $\epsilon$ by a constant gives the desired result.
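A small simulation of Lemma 6 can make the estimator concrete (an illustrative sketch with hypothetical stream contents and sampling probabilities, not the parameters the lemma requires): independently sample each unit update of the two streams, rescale by the inverse sampling probabilities, and take the dot product of the resulting vectors.

```python
import random

def sampled_inner_product(f_updates, g_updates, n, p_f, p_g, seed=0):
    """Estimate <f, g> as in Lemma 6: sample each unit update of the two
    streams independently (probabilities p_f, p_g), accumulate the
    sampled frequency vectors, rescale by 1/p_f and 1/p_g, and take
    the dot product."""
    rng = random.Random(seed)
    fp, gp = [0.0] * n, [0.0] * n
    for i, delta in f_updates:
        if rng.random() < p_f:
            fp[i] += delta
    for i, delta in g_updates:
        if rng.random() < p_g:
            gp[i] += delta
    return sum((fi / p_f) * (gi / p_g) for fi, gi in zip(fp, gp))

# Two identical insertion-only streams: f_i = g_i = 400 for all i < 50.
n = 50
updates = [(i % n, 1) for i in range(20000)]
est = sampled_inner_product(updates, updates, n, 0.5, 0.5, seed=2)
# true <f, g> = 50 * 400^2 = 8,000,000
```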
Our algorithm obtains such a sample as needed for Lemma 6 by sampling in exponentially increasing intervals. Next, we hash the universe down by a sufficiently large prime to avoid collisions in the samples, and then run an inner product estimator in the smaller universe. Note that this hashing is not pairwise independent, as that would require the space to be at least $\log n$ bits; rather, the hashing just has the property that it preserves distinctness with good probability. We now prove a fact that we will need for our desired complexity.
Lemma 7. Given a $\log(n)$-bit integer $x$, the value $x \pmod p$ can be computed using only $\log\log(n) + \log(p)$ bits of space.
Proof. We initialize a counter $c \leftarrow 0$. Let $x_1, x_2, \ldots, x_{\log(n)}$ be the bits of $x$, where $x_1$ is the least significant bit. We set $y_1 = 1$, and at every step $t \in [\log(n)]$ of our algorithm we store $y_t, y_{t-1}$, where we compute $y_t$ as $y_t = 2y_{t-1} \pmod p$. This can be done in $O(\log(p))$ space. Our algorithm then takes $\log(n)$ steps, where on the $t$-th step it checks if $x_t = 1$, and if so it updates $c \leftarrow c + y_t \pmod p$, and otherwise leaves $c$ unchanged. At the end we have $c = \sum_{t=1}^{\log(n)} x_t 2^{t-1} \pmod p = x \pmod p$ as desired, and $c$ never exceeds $2p$, so it can be stored using $\log(p)$ bits of space. The only other value stored is the index $t \in [\log(n)]$, which can be stored using $\log\log(n)$ bits as stated.
We now recall the Countsketch data structure (see Section 2). Countsketch picks a 2-wise independent hash function $h : [n] \to [k]$ and a 4-wise independent hash function $\sigma : [n] \to \{1, -1\}$, and creates a vector $A \in \mathbb{R}^k$. Then on every update $(i_t, \Delta_t)$ to a stream $f \in \mathbb{R}^n$, it updates $A_{h(i_t)} \leftarrow A_{h(i_t)} + \sigma_{i_t}\Delta_t$, where $\sigma_i = \sigma(i)$. Countsketch can be used to estimate inner products. In the following lemma, we show that if we feed uniform samples of $\alpha$-property streams $f, g$ into two instances of Countsketch, denoted $A$ and $B$, then $\langle A, B\rangle$ is a good approximation of $\langle f, g\rangle$ after rescaling $A$ and $B$ by the inverse of the sampling probability.
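The shared-hash construction just described can be sketched as follows (a simplified single-row version, with fully random hash functions standing in for the 2-/4-wise independent $h$ and $\sigma$, and with the sampling step omitted):

```python
import random

def countsketch_pair(f_updates, g_updates, n, k, seed=0):
    """Single-row Countsketch vectors A, B for two streams sharing the
    SAME bucket hash h and sign hash sigma, so that <A, B> estimates
    <f, g>."""
    rng = random.Random(seed)
    h = [rng.randrange(k) for _ in range(n)]
    sigma = [rng.choice((-1, 1)) for _ in range(n)]
    A, B = [0.0] * k, [0.0] * k
    for i, delta in f_updates:
        A[h[i]] += sigma[i] * delta
    for i, delta in g_updates:
        B[h[i]] += sigma[i] * delta
    return A, B

# f = g: one item of frequency 100 plus 199 items of frequency 1.
updates = [(0, 1)] * 100 + [(i, 1) for i in range(1, 200)]
A, B = countsketch_pair(updates, updates, n=200, k=1000, seed=5)
est = sum(a * b for a, b in zip(A, B))
# true <f, g> = 100^2 + 199 = 10199; cross-bucket collisions add noise
```

The shared hash is essential: with independent hashes the diagonal terms $\sigma_i^2 f_i g_i$ would not line up and $\langle A, B\rangle$ would be pure noise.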
Lemma 8. Let $f', g'$ be uniformly sampled rescaled vectors of general-turnstile $\alpha$-property streams $f, g$ with lengths $m_f, m_g$ and sampling probabilities $p_f, p_g$ respectively. Suppose that $p_f \geq s/m_f$ and $p_g \geq s/m_g$, where $s = \Omega(\alpha^2\log^7(n)\epsilon^{-10})$. Then let $A \in \mathbb{R}^k$ be the Countsketch vector run on $f'$, and let $B \in \mathbb{R}^k$ be the Countsketch vector run on $g'$, where $A, B$ share the same hash function $h$ and $k = \Theta(1/\epsilon)$. Then
$$\sum_{i=1}^k A_iB_i = \langle f, g\rangle \pm \epsilon\|f\|_1\|g\|_1$$
with probability $11/13$.
Proof. Set $k = 100^2/\epsilon$. Let $Y_{ij}$ indicate $h(i) = h(j)$, and let $X_{ij} = \sigma_i\sigma_jf'_ig'_jY_{ij}$. We have $\sum_{i=1}^k A_iB_i = \langle f', g'\rangle + \sum_{i \neq j} X_{ij}$. Let $\epsilon_0 = \Theta(\epsilon^3/\log^2(n))$ and let $T = \Theta(\log(n)/\epsilon^2)$, so we can write $s = \Omega(\alpha^2\log(n)T^2/\epsilon_0^2)$ (this will align our notation with that of Lemma 3). Define $F = \{i \in [n] : |f_i| \geq \|f\|_1/T\}$ and $G = \{i \in [n] : |g_i| \geq \|g\|_1/T\}$. We now bound the term $|\sum_{i \neq j, (i,j) \in F \times G} X_{ij}| \leq \sum_{(i,j) \in F \times G} Y_{ij}|f'_ig'_j|$.

Now by Lemma 1 and union bounding over all $i \in [n]$, we have $\|f' - f\|_\infty \leq \epsilon_0\|f\|_1/T$ and $\|g' - g\|_\infty \leq \epsilon_0\|g\|_1/T$ with high probability, and we condition on this now. Thus for every $(i,j) \in F \times G$, we have $f'_i = (1 \pm \epsilon_0)f_i$ and $g'_j = (1 \pm \epsilon_0)g_j$. It follows that $\sum_{(i,j) \in F \times G} Y_{ij}|f'_ig'_j| \leq 2\sum_{(i,j) \in F \times G} Y_{ij}|f_i||g_j|$ (using only $\epsilon_0 < 1/4$). Since $\mathbb{E}[Y_{ij}] = 1/k$, we have
$$\mathbb{E}\Big[\sum_{(i,j) \in F \times G} Y_{ij}|f_i||g_j|\Big] \leq \frac{1}{k}\|f\|_1\|g\|_1$$
By Markov's inequality with $k = 100^2/\epsilon$, it follows that $|\sum_{i \neq j, (i,j) \in F \times G} X_{ij}| \leq 2\sum_{(i,j) \in F \times G} Y_{ij}|f_i||g_j| \leq \epsilon\|f\|_1\|g\|_1$ with probability greater than $1 - (1/1000 + O(n^{-c})) > 99/100$, where the $O(n^{-c})$ comes from conditioning on the high probability events from Lemma 1. Call this event $\mathcal{E}_1$.
Now let $F^C = [n] \setminus F$ and $G^C = [n] \setminus G$. Let $\mathcal{A} \subset [n]^2$ be the set of all $(i,j)$ with $i \neq j$ and such that either $i \in F^C$ or $j \in G^C$. Now consider the variables $\{X_{ij}\}_{i<j}$, and let $X_{ij}, X_{pq}$ be two such distinct variables. Then $\mathbb{E}[X_{ij}X_{pq}] = \mathbb{E}[f'_ig'_jf'_pg'_qY_{ij}Y_{pq}\sigma_i\sigma_j\sigma_p\sigma_q]$. Now the variables $\sigma$ are independent from $Y_{ij}Y_{pq}$, which are determined by $h$. Since $i < j$ and $p < q$ and $(i,j) \neq (p,q)$, it follows that one of $i, j, p, q$ is unique among them. WLOG it is $i$, so by 4-wise independence of $\sigma$ we have $\mathbb{E}[f'_ig'_jf'_pg'_qY_{ij}Y_{pq}\sigma_i\sigma_j\sigma_p\sigma_q] = \mathbb{E}[f'_ig'_jf'_pg'_qY_{ij}Y_{pq}\sigma_j\sigma_p\sigma_q]\mathbb{E}[\sigma_i] = 0 = \mathbb{E}[X_{ij}]\mathbb{E}[X_{pq}]$. Thus the variables $\{X_{ij}\}_{i<j}$ (and symmetrically $\{X_{ij}\}_{i>j}$) are uncorrelated, so
$$\mathrm{Var}\Big(\sum_{i<j,\, (i,j) \in \mathcal{A}} X_{ij}\Big) = \sum_{i<j,\, (i,j) \in \mathcal{A}} \mathrm{Var}(X_{ij}) \leq \sum_{(i,j) \in \mathcal{A}} (f'_ig'_j)^2/k$$
Since $\mathbb{E}[\sum_{i<j, (i,j) \in \mathcal{A}} X_{ij}] = 0$, by Chebyshev's inequality with $k = 100^2/\epsilon$, we have $|\sum_{i<j, (i,j) \in \mathcal{A}} X_{ij}| \leq (\epsilon\sum_{(i,j) \in \mathcal{A}}(f'_ig'_j)^2)^{1/2}$ with probability $99/100$. So by the union bound and a symmetric argument for $i > j$, we have $|\sum_{(i,j) \in \mathcal{A}} X_{ij}| \leq |\sum_{i<j, (i,j) \in \mathcal{A}} X_{ij}| + |\sum_{i>j, (i,j) \in \mathcal{A}} X_{ij}| \leq 2(\epsilon\sum_{(i,j) \in \mathcal{A}}(f'_ig'_j)^2)^{1/2}$
with probability $1 - (2/100) = 49/50$. Call this event $\mathcal{E}_2$. We have
$$\sum_{(i,j) \in \mathcal{A}} (f'_ig'_j)^2 = \sum_{i \neq j,\, (i,j) \in F \times G^C} (f'_ig'_j)^2 + \sum_{i \neq j,\, (i,j) \in F^C \times G} (f'_ig'_j)^2 + \sum_{i \neq j,\, (i,j) \in F^C \times G^C} (f'_ig'_j)^2$$
Now the last term is at most $\|f'_{F^C}\|_2^2\|g'_{G^C}\|_2^2$, which is at most $16\|f\|_1^2\|g\|_1^2/T^2 \leq \epsilon^4\|f\|_1^2\|g\|_1^2$ w.h.p. by Lemma 3 applied to both $f$ and $g$ and union bounding (note that $F^C, G^C$ are exactly the set $small$ in Lemma 3 for their respective vectors). We hereafter condition on this w.h.p. event that $\|f'_{F^C}\|_2^2 \leq 4\|f\|_1^2/T$ and $\|g'_{G^C}\|_2^2 \leq 4\|g\|_1^2/T$ (note $T > 1/\epsilon^2$).

Now as noted earlier, w.h.p. we have $f'_i = (1 \pm \epsilon_0)f_i$ and $g'_j = (1 \pm \epsilon_0)g_j$ for $i \in F$ and $j \in G$. Thus $\|f'_F\|_2^2 = (1 \pm O(\epsilon_0))\|f_F\|_2^2 < 2\|f\|_1^2$ and $\|g'_G\|_2^2 \leq 2\|g\|_1^2$. Now the term $\sum_{i \neq j, (i,j) \in F \times G^C} (f'_ig'_j)^2$ is at most $\sum_{i \in F} (f'_i)^2\|g'_{G^C}\|_2^2 \leq O(\epsilon^2)\|f'_F\|_2^2\|g\|_1^2 \leq O(\epsilon^2)\|f\|_1^2\|g\|_1^2$. Applying a symmetric argument, we obtain $\sum_{i \neq j, (i,j) \in F^C \times G} (f'_ig'_j)^2 \leq O(\epsilon^2)\|f\|_1^2\|g\|_1^2$ with high probability. Thus each of the three terms is $O(\epsilon^2)\|f\|_1^2\|g\|_1^2$, so
$$\sum_{(i,j) \in \mathcal{A}} (f'_ig'_j)^2 = O(\epsilon^2)\|f\|_1^2\|g\|_1^2$$
with high probability. Call this event $\mathcal{E}_3$. Now by the union bound, $\Pr[\mathcal{E}_1 \cap \mathcal{E}_2 \cap \mathcal{E}_3] > 1 - (1/50 + 1/100 + O(n^{-c})) > 24/25$. Conditioned on this, the error term $|\sum_{i \neq j} X_{ij}| \leq |\sum_{i \neq j, (i,j) \in F \times G} X_{ij}| + |\sum_{(i,j) \in \mathcal{A}} X_{ij}|$ is at most $\epsilon\|f\|_1\|g\|_1 + 2(\epsilon\sum_{(i,j) \in \mathcal{A}}(f'_ig'_j)^2)^{1/2} = \epsilon\|f\|_1\|g\|_1 + 2(O(\epsilon^3)\|f\|_1^2\|g\|_1^2)^{1/2} = O(\epsilon)\|f\|_1\|g\|_1$ as desired. Thus with probability $24/25$ we have $\sum_{i=1}^k A_iB_i = \langle f', g'\rangle \pm O(\epsilon)\|f\|_1\|g\|_1$. By Lemma 6, noting that in the notation of this lemma the vectors $f', g'$ have already been scaled by $p_f^{-1}$ and $p_g^{-1}$, we have $\langle f', g'\rangle = \langle f, g\rangle \pm \epsilon\|f\|_1\|g\|_1$ with probability $99/100$. Altogether, with probability $1 - (1/25 + 1/100) > 11/13$ we have $\sum_{i=1}^k A_iB_i = \langle f, g\rangle \pm O(\epsilon)\|f\|_1\|g\|_1$, which is the desired result after a constant rescaling of $\epsilon$.
We now apply Lemma 8 to complete the proof of our main theorem.
Theorem 2. Given two general-turnstile stream vectors $f, g$ with the $\alpha$-property, there is a one-pass algorithm which with probability $11/13$ produces an estimate $\mathrm{IP}(f, g)$ such that $\mathrm{IP}(f, g) = \langle f, g\rangle \pm O(\epsilon)\|f\|_1\|g\|_1$, using $O(\epsilon^{-1}\log(\frac{\alpha\log(n)}{\epsilon}))$ bits of space.
Proof. Let $s = \Theta(\alpha^2\log^7(n)/\epsilon^{10})$, and let $I_r = [s^r, s^{r+2}]$. For every $r = 1, 2, \ldots, \log_s(n)$, we choose a random prime $P \in [D, D^3]$ where $D = 100s^4$. We then do the following for the stream $f$ (and apply the same procedure for $g$). For every interval $I_r$ and time step $t \in I_r$, we sample the $t$-th update $(i_t, \Delta_t)$ to $f$ with probability $s^{-r}$ (we assume $\Delta_t \in \{1, -1\}$; bigger updates are expanded into multiple such unit updates). If sampled, we let $i'_t$ be the $\log(|P|)$-bit identity obtained by taking $i_t$ modulo $P$. We then feed the unscaled update $(i'_t, \Delta_t)$ into an instance of Countsketch with $k = O(1/\epsilon)$ buckets. Call this instance $CS^f_r$. At any time step $t$, we only store the instances $CS^f_r$ and $CS^f_{r+1}$ such that $t \in I_r \cap I_{r+1}$. At the end of the stream it is time step $m_f$; fix $r$ such that $m_f \in I_r \cap I_{r+1}$.

Now let $f' \in \mathbb{R}^n$ be the (scaled up by $s^r$) sample of $f$ taken in $I_r$. Let $f'' \in \mathbb{R}^P$ be the (unscaled) stream on $|P|$ items that is actually fed into $CS^f_r$. Then $CS^f_r$ is run on the stream $f''$, which has a universe of $|P|$ items. Let $F \in \mathbb{R}^k$ be the Countsketch vector from the instance $CS^f_r$. Let $\bar{f}$ be the frequency vector of $f$ restricted to the suffix of the stream $s^r, s^r + 1, \ldots, m_f$ (these were the updates that were being sampled from while running $CS_r$). Since $m_f \in I_{r+1}$, we have $m_f \geq s^{r+1}$, so the L1 mass of the prefix of the stream $1, 2, \ldots, s^r$ is at most $m_f/s < \epsilon\|f\|_1$ (by the $\alpha$-property), and removing it changes the L1 by at most $\epsilon\|f\|_1$. It follows that $\|f - \bar{f}\|_1 \leq O(\epsilon)\|f\|_1$ and thus $\|\bar{f}\|_1 = (1 \pm O(\epsilon))\|f\|_1$. If we let $\bar{g}$ be defined analogously by replacing $f$ with $g$ in the previous paragraphs, we obtain $\|g - \bar{g}\|_1 \leq O(\epsilon)\|g\|_1$ and $\|\bar{g}\|_1 = (1 \pm O(\epsilon))\|g\|_1$ as well.
Now with high probability we sampled fewer than $2s^{r+2}/s^r = 2s^2$ distinct identities when creating $f'$, so $f'$ is $2s^2$-sparse. Let $J \subset [n] \times [n]$ be the set of pairs of indices $(i, j)$ with $f'_i, f'_j$ non-zero. Then $|J| < 2s^4$. Let $Q_{i,j}$ be the event that $i - j = 0 \pmod P$. For this to happen, $P$ must divide the difference. By standard results on the density of primes, there are at least $s^8$ primes in $[D, D^3]$, and since $|i - j| \leq n$ it follows that $|i - j|$ has at most $\log(n)$ prime factors. So $\Pr[Q_{i,j}] < \log(n)/s^8$. Let $Q = \cup_{(i,j) \in J} Q_{i,j}$; then by the union bound $\Pr[Q] < s^{-3}$. It follows that no two sampled identities collide when hashed to the universe of $P$ elements with probability $1 - 1/s^3$.
Let $\mathrm{supp}(f') \subset [n]$ be the support of $f'$ (the non-zero indices of $f'$). Conditioned on $Q$ not occurring, we have $p_f^{-1}f''_{i \bmod P} = f'_i$ for all $i \in \mathrm{supp}(f')$, and $f''_i = 0$ if $i \neq j \pmod P$ for all $j \in \mathrm{supp}(f')$. Thus there is a bijection between $\mathrm{supp}(f')$ and $\mathrm{supp}(f'')$, and the values of the respective coordinates of $f', p_f^{-1}f''$ are equal under this bijection. Let $g', g'', \bar{g}, m_g, p_g$ be defined analogously to $f', f'', \bar{f}, m_f, p_f$ by replacing $f$ with $g$ in the past paragraphs, and let $G \in \mathbb{R}^k$ be the Countsketch vector obtained by running Countsketch on $g''$ just as we did to obtain $F \in \mathbb{R}^k$.
Conditioned on no collisions occurring for both $f''$ and $g''$ (call these events $Q_f$ and $Q_g$ respectively), which together hold with probability $1 - O(s^{-3})$ by a union bound, the non-zero entries of $p_f^{-1}f''$ and $f'$, and of $p_g^{-1}g''$ and $g'$, are identical. Thus the scaled-up Countsketch vectors $p_f^{-1}F$ and $p_g^{-1}G$ obtained from running Countsketch on $f'', g''$ are identical in distribution to those obtained by running Countsketch on $f', g'$. This holds because Countsketch hashes the non-zero coordinates 4-wise independently into $k$ buckets $A, B \in \mathbb{R}^k$. Thus, conditioned on $Q_f, Q_g$, we can assume that $p_f^{-1}F$ and $p_g^{-1}G$ are the result of running Countsketch on $f'$ and $g'$. We claim that $p_f^{-1}p_g^{-1}\langle F, G\rangle$ is the desired estimator.
Now recall that $f'$ is a uniform sample of $\bar{f}$, as $g'$ is of $\bar{g}$. Then applying the Countsketch error of Lemma 8, we have $p_f^{-1}p_g^{-1}\langle F, G\rangle = \langle \bar{f}, \bar{g}\rangle \pm O(\epsilon)\|\bar{f}\|_1\|\bar{g}\|_1 = \langle \bar{f}, \bar{g}\rangle \pm O(\epsilon)\|f\|_1\|g\|_1$ with probability $11/13$. Now since $\|f - \bar{f}\|_1 \leq O(\epsilon)\|f\|_1$ and $\|g - \bar{g}\|_1 \leq O(\epsilon)\|g\|_1$, each of the cross terms $\sum_i f_i(\bar{g}_i - g_i)$, $\sum_i g_i(\bar{f}_i - f_i)$, and $\sum_i (\bar{f}_i - f_i)(\bar{g}_i - g_i)$ is bounded by $O(\epsilon)\|f\|_1\|g\|_1$, so $\langle \bar{f}, \bar{g}\rangle = \langle f, g\rangle \pm O(\epsilon)\|f\|_1\|g\|_1$. Thus
$$p_f^{-1}p_g^{-1}\langle F, G\rangle = \langle f, g\rangle \pm O(\epsilon)\|f\|_1\|g\|_1$$
as required. This last fact holds deterministically using only the $\alpha$-property, so the probability of success is $11/13$ as stated.
For the space, at most $2s^2$ samples were sent to each of $f'', g''$ with high probability. Thus the length of each stream was at most $\mathrm{poly}(\alpha\log(n)/\epsilon)$, and each stream had a universe of $P = \mathrm{poly}(\alpha\log(n)/\epsilon)$ items. Thus each counter in $A, B \in \mathbb{R}^k$ can be stored with $O(\log(\alpha\log(n)/\epsilon))$ bits, so storing $A, B$ requires $O(\epsilon^{-1}\log(\alpha\log(n)/\epsilon))$ bits. Note that we can safely terminate if too many samples are sent and the counters become too large, as this happens with probability $O(\mathrm{poly}(1/n))$. The 4-wise independent hash function $h : [P] \to [k]$ used to create $A, B$ requires $O(\log(\alpha\log(n)/\epsilon))$ bits.
Next, by Lemma 7, the space required to hash the $\log(n)$-bit identities down to $[P]$ is $\log(P) + \log\log(n)$, which is dominated by the space for Countsketch. Finally, we can assume that $s$ is a power of two, so $p_f^{-1}, p_g^{-1} = \mathrm{poly}(s) = 2^q$ can be stored by just storing the exponent $q$, which takes $\log\log(n)$ bits. To sample, we then flip $\log(n)$ coins sequentially, and keep a counter to check whether the number of heads reaches $q$ before a tail is seen.
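The final coin-flipping step can be sketched as:

```python
import random

def sample_event(q, rng):
    """Decide an event of probability 2^(-q) with fair coin flips:
    succeed iff q heads appear before the first tail.  Only the
    exponent q and a small counter need to be stored."""
    heads = 0
    while heads < q:
        if rng.random() < 0.5:   # heads
            heads += 1
        else:
            return False
    return True

rng = random.Random(42)
trials = 100_000
rate = sum(sample_event(3, rng) for _ in range(trials)) / trials
# rate should be close to 2^-3 = 0.125
```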
3 L1 Heavy Hitters
As an application of the Countsketch sampling algorithm presented in the last section, we give an improved upper bound for the classic L1 $\epsilon$-heavy hitters problem in the $\alpha$-property setting. Formally, given $\epsilon \in (0, 1)$, the L1 $\epsilon$-heavy hitters problem asks to return a subset of $[n]$ that contains all items $i$ such that $|f_i| \geq \epsilon\|f\|_1$, and no items $j$ such that $|f_j| < (\epsilon/2)\|f\|_1$.
The heavy hitters problem is one of the most well-studied problems in the data stream literature. For general turnstile unbounded deletion streams, there is a known lower bound of $\Omega(\epsilon^{-1}\log(n)\log(\epsilon n))$ (see [8], in the language of compressed sensing, and [38]), and the Countsketch of [14] gives a matching upper bound (assuming $\epsilon^{-1} = o(n)$). In the insertion-only case, however, the problem can be solved using $O(\epsilon^{-1}\log(n))$ bits [10], and for the strictly harder L2 heavy hitters problem (where $\|f\|_1$ is replaced with $\|f\|_2$ in the problem definition), there is an $O(\epsilon^{-2}\log(1/\epsilon)\log(n))$-bit algorithm [11]. In this section, we beat the lower bounds for unbounded deletion streams in the $\alpha$-property case. We first run a subroutine to obtain a value $R = (1 \pm 1/8)\|f\|_1$ with probability $1 - \delta$. To do this, we use the following algorithm from [39].
Fact 1 ([39]). There is an algorithm which gives a $(1 \pm \epsilon)$ multiplicative approximation with probability $1 - \delta$ of the value $\|f\|_1$ using space $O(\epsilon^{-2}\log(n)\log(1/\delta))$.
Next, we run an instance of CSSS with parameters $k = 32/\epsilon$ and $\epsilon/32$ to obtain our estimate $y^*$ of $f$. This requires space $O(\epsilon^{-1}\log(n)\log(\frac{\alpha\log(n)}{\epsilon}))$, and by Theorem 1 gives an estimate $y^* \in \mathbb{R}^n$ such that $|y^*_i - f_i| < 2(\sqrt{\epsilon/32}\,\mathrm{Err}_2^k(f) + \epsilon\|f\|_1/32)$ for all $i \in [n]$ with high probability. We then return all items $i$ with $|y^*_i| \geq 3\epsilon R/4$. Since the top $1/\epsilon$ elements do not contribute to $\mathrm{Err}_2^k(f)$, the quantity is maximized by having $k$ elements with weight $\|f\|_1/k$, so $\mathrm{Err}_2^k(f) \leq k^{-1/2}\|f\|_1$. Thus $\|y^* - f\|_\infty < (\epsilon/8)\|f\|_1$.

Given this, it follows that for any $i \in [n]$, if $|f_i| \geq \epsilon\|f\|_1$ then $|y^*_i| > (7\epsilon/8)\|f\|_1 > (3\epsilon/4)R$. Similarly, if $|f_i| < (\epsilon/2)\|f\|_1$, then $|y^*_i| < (5\epsilon/8)\|f\|_1 < (3\epsilon/4)R$. So our algorithm correctly distinguishes $\epsilon$-heavy hitters from items with weight less than $(\epsilon/2)\|f\|_1$. The probability of failure is $O(n^{-c})$ from CSSS and $\delta$ from estimating $R$, and the space required is $O(\epsilon^{-1}\log(n)\log(\frac{\alpha\log(n)}{\epsilon}))$ for running CSSS and $O(\log(n)\log(1/\delta))$ to obtain the estimate $R$. This gives the following theorem.
Theorem 3. Given $\epsilon \in (0, 1)$, there is an algorithm that solves the $\epsilon$-heavy hitters problem for general turnstile $\alpha$-property streams with probability $1 - \delta$ using space $O(\epsilon^{-1}\log(n)\log(\frac{\alpha\log(n)}{\epsilon}) + \log(n)\log(1/\delta))$.
Now note that for strict turnstile streams, we can compute $R = \|f\|_1$ exactly with probability $1$ using an $O(\log(n))$-bit counter. Since the error bounds from CSSS hold with high probability, we obtain the following result.
Theorem 4. Given $\epsilon \in (0, 1)$, there is an algorithm that solves the $\epsilon$-heavy hitters problem for strict turnstile $\alpha$-property streams with high probability using space $O(\epsilon^{-1}\log(n)\log(\alpha\log(n)/\epsilon))$.
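The reporting step of the two theorems above can be sketched as follows (a toy check with a hypothetical frequency vector, feeding in estimates that meet the $(\epsilon/8)\|f\|_1$ additive-error guarantee rather than running an actual CSSS):

```python
def report_heavy_hitters(y_star, R, eps):
    """Reporting step of Theorems 3 and 4: return every index whose
    estimated frequency crosses the 3*eps*R/4 threshold, where R is a
    (1 +/- 1/8)-approximation of ||f||_1."""
    thresh = 0.75 * eps * R
    return {i for i, y in enumerate(y_star) if abs(y) >= thresh}

# Hypothetical frequency vector with ||f||_1 = 100:
f = [0.0] * 100
f[3] = 30.0                      # an eps-heavy hitter for eps = 0.1
f[7] = 4.0                       # below the eps/2 threshold
for i in range(10, 76):
    f[i] = 1.0                   # 66 light items
eps, l1 = 0.1, 100.0
# Estimates meeting the (eps/8)*||f||_1 additive guarantee of the text:
y_star = [x + 0.9 * (eps / 8) * l1 for x in f]
heavy = report_heavy_hitters(y_star, R=l1, eps=eps)
```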
4 L1 Sampling
Another problem of interest is the problem of designing Lp samplers. First introduced by Monemizadeh and Woodruff in [47], it has since been observed that Lp samplers lead to alternative algorithms for many important streaming problems, such as heavy hitters, Lp estimation, and finding duplicates [7, 47, 38].
Formally, given a data stream frequency vector $f$, the problem of returning an $\epsilon$-approximate relative error uniform Lp sampler is to design an algorithm that returns an index $i \in [n]$ such that
$$\Pr[i = j] = (1 \pm \epsilon)\frac{|f_j|^p}{\|f\|_p^p}$$
for every $j \in [n]$. An approximate Lp sampler is allowed to fail with some probability $\delta$; however, in this case it must not output any index. For the case of $p = 1$, the best known upper bound is $O(\epsilon^{-1}\log(\epsilon^{-1})\log^2(n)\log(\delta^{-1}))$ bits of space, and there is also an $\Omega(\log^2(n))$ lower bound for Lp samplers with $\epsilon = O(1)$ for any $p$ [38]. In this section, using the data structure CSSS of Section 2, we design an L1 sampler for strict-turnstile strong L1 $\alpha$-property streams using $O(\epsilon^{-1}\log(\epsilon^{-1})\log(n)\log(\frac{\alpha\log(n)}{\epsilon})\log(\delta^{-1}))$ bits of space. Throughout the section we use $\alpha$-property to refer to the L1 $\alpha$-property.
4.1 The L1 Sampler
Our algorithm employs the technique of precision sampling in a similar fashion as the L1 sampler of [38]. The idea is to scale every item $f_i$ by $1/t_i$, where $t_i \in [0, 1]$ is a uniform random variable, and return any index $i$ such that $z_i = |f_i|/t_i > \frac{1}{\epsilon}\|f\|_1$, since this occurs with probability exactly $\epsilon\frac{|f_i|}{\|f\|_1}$.
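The crossing probability at the heart of precision sampling is easy to check empirically (an illustrative simulation with a toy frequency vector):

```python
import random

def crosses(fi, l1, eps, t):
    """Precision sampling test: index i crosses the threshold iff
    |f_i|/t exceeds ||f||_1/eps, i.e. iff t < eps*|f_i|/||f||_1."""
    return abs(fi) / t > l1 / eps

rng = random.Random(0)
f = [5.0, 3.0, 2.0]              # ||f||_1 = 10
l1, eps, trials = 10.0, 0.1, 200_000
rate = sum(crosses(f[0], l1, eps, rng.random())
           for _ in range(trials)) / trials
# empirical crossing rate of f_0 should be near eps*|f_0|/||f||_1 = 0.05
```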
$\alpha$L1Sampler: L1 sampling algorithm for strict-turnstile strong $\alpha$-property streams.

Initialization:
1. Instantiate CSSS with $k = O(\log(\epsilon^{-1}))$ columns and parameter $\epsilon' = \epsilon^3/\log^2(n)$.
2. Select $k = O(\log(\frac{1}{\epsilon}))$-wise independent uniform scaling factors $t_i \in [0, 1]$ for $i \in [n]$.
3. Run CSSS on the scaled input $z$, where $z_i = f_i/t_i$.
4. Keep $\log(n)$-bit counters $r, q$ to store $r = \|f\|_1$ and $q = \|z\|_1$.

Recovery:
1. Compute the estimate $y^*$ via CSSS.
2. Via the algorithm of Lemma 5, compute $v$ such that $\mathrm{Err}_2^k(z) \leq v \leq 45\frac{k^{1/2}\epsilon^3}{\log^2(n)}\|z\|_1 + 20\mathrm{Err}_2^k(z)$.
3. Find $i$ with $|y^*_i|$ maximal.
4. If $v > k^{1/2}r + 45\frac{k^{1/2}\epsilon^3}{\log^2(n)}q$, or $|y^*_i| < \max\{\frac{1}{\epsilon}r, \frac{(c/2)\epsilon^2}{\log^2(n)}q\}$ where the constant $c > 0$ is as in Proposition 1, output FAIL; otherwise output $i$ and $t_iy^*_i$ as the estimate for $f_i$.

Figure 3: Our L1 sampling algorithm with success probability $\Theta(\epsilon)$
One can then run a traditional Countsketch on the scaled stream $z$ to determine when an element passes this threshold.
In this section, we will adapt this idea to strong $\alpha$-property streams (Definition 2). The necessity of the strong $\alpha$-property arises from the fact that if $f$ has the strong $\alpha$-property, then any coordinate-wise scaling $z$ of $f$ still has the $\alpha$-property with probability $1$. Thus the stream $z$ given by $z_i = f_i/t_i$ has the $\alpha$-property (in fact, it again has the strong $\alpha$-property, but we will only need the fact that $z$ has the $\alpha$-property). Our full L1 sampler is given in Figure 3.
By running CSSS to find the heavy hitters of $z$, we introduce additive error $O(\epsilon'\|z\|_1) = O(\frac{\epsilon^3}{\log^2(n)}\|z\|_1)$, but as we will see, the heaviest item in $z$ is an $\Omega(\epsilon^2/\log^2(n))$-heavy hitter with probability $1 - O(\epsilon)$ conditioned on an arbitrary value of $t_i$, so this error will only be an $O(\epsilon)$ fraction of the weight of the maximum weight element. Note that we use the term $c$-heavy hitter for $c \in (0, 1)$ to denote an item with weight at least $c\|z\|_1$. Our algorithm then attempts to return an item $z_i$ which crosses the threshold $\|f\|_1/\epsilon$, and we will be correct in doing so if the tail error $\mathrm{Err}_2^k(z)$ from CSSS is not too great.
To determine if this is the case, since we are in the strict turnstile case we can compute $r = \|f\|_1$ and $q = \|z\|_1$ exactly by keeping $\log(n)$-bit counters (note, however, that we will only need constant factor approximations of these). Next, using the result of Lemma 5 we can accurately estimate $\mathrm{Err}_2^k(z)$, and abort if it is too large in Recovery Step 4 of Figure 3. If the conditions of this step hold, we will be guaranteed that if $i$ is the maximal element, then $y^*_i = (1 \pm O(\epsilon))z_i$. This allows us to sample $\epsilon$-approximately, as well as guarantee that our estimate of $z_i$ has relative error $\epsilon$. We now begin the analysis of our L1 sampler. First, the proof of the following fact can be found in [38].
Lemma 9. Given that the values of $t_i$ are $k = \log(1/\epsilon)$-wise independent, then conditioned on an arbitrary fixed value $t = t_l \in [0, 1]$ for a single $l \in [n]$, we have $\Pr[20\mathrm{Err}_2^k(z) > k^{1/2}\|f\|_1] = O(\epsilon + n^{-c})$.
The following proposition shows that the $\epsilon^3/\log^2(n)$ term in the additive error of our CSSS will be an $\epsilon$ fraction of the maximal element with probability $1 - O(\epsilon)$.
Proposition 1. There exists some constant $c > 0$ such that, conditioned on an arbitrary fixed value $t = t_l \in [0, 1]$ for a single $l \in [n]$, if $j$ is such that $|z_j|$ is maximal, then with probability $1 - O(\epsilon)$ we have $|z_j| \geq \frac{c\epsilon^2}{\log^2(n)}\|z\|_1$.
Proof. Let $f'_i = f_i$, $z'_i = z_i$ for all $i \neq l$, and let $f'_l = 0 = z'_l$. Let $I_t = \{i \in [n] \setminus \{l\} \mid z_i \in [\frac{\|f'\|_1}{2^{t+1}}, \frac{\|f'\|_1}{2^t})\}$. Then $\Pr[i \in I_t, i \neq l] = 2^t|f_i|/\|f'\|_1$, so $\mathbb{E}[|I_t|] = 2^t$, and by Markov's inequality $\Pr[|I_t| > \log(n)\epsilon^{-1}2^t] < \epsilon/\log(n)$. By the union bound, $\sum_{t \in [\log(n)]}\sum_{i \in I_t} |z_i| \leq \log^2(n)\|f'\|_1/\epsilon$ with probability $1 - \epsilon$. Call this event $\mathcal{E}$. Now for $i \neq l$ let $X_i$ indicate $z_i \geq \epsilon\|f'\|_1/2$. Then $\mathrm{Var}(X_i) = (2/\epsilon)|f_i|/\|f'\|_1 - ((2/\epsilon)|f_i|/\|f'\|_1)^2 < \mathbb{E}[X_i]$, and so pairwise independence of the $t_i$ is enough to conclude that $\mathrm{Var}(\sum_{i \neq l} X_i) < \mathbb{E}[\sum_{i \neq l} X_i] = 2/\epsilon$. So by Chebyshev's inequality,
$$\Pr\Big[\Big|\sum_{i \neq l} X_i - 2\epsilon^{-1}\Big| > \epsilon^{-1}\Big] < 2\epsilon$$
so there is at least one and at most $3/\epsilon$ items in $z'$ with weight at least $\epsilon\|f'\|_1/2$ with probability $1 - O(\epsilon)$. Call these "heavy items". By the union bound, both this and $\mathcal{E}$ occur with probability $1 - O(\epsilon)$. Now the largest heavy item will be greater than the average of them, and thus it will be an $\epsilon/3$-heavy hitter among the heavy items. Moreover, we have shown that the non-heavy items in $z'$ have weight at most $\log^2(n)\|f'\|_1/\epsilon$ with probability $1 - \epsilon$, so it follows that the maximum item in $z'$ will have weight at least $\frac{\epsilon^2}{2\log^2(n)}\|z'\|_1$ with probability $1 - O(\epsilon)$.

Now if $z_l$ is less than the heaviest item in $z'$, then that item will still be an $\frac{\epsilon^2}{4\log^2(n)}$-heavy hitter. If $z_l$ is greater, then $z_l$ will be an $\frac{\epsilon^2}{4\log^2(n)}$-heavy hitter, which completes the proof with $c = 1/4$.
Lemma 10. The probability that $\alpha$L1Sampler outputs the index $i \in [n]$ is $(\epsilon \pm O(\epsilon^2))\frac{|f_i|}{\|f\|_1} + O(n^{-c})$. The relative error of the estimate of $f_i$ is $O(\epsilon)$ with high probability.
Proof. Ideally, we would like to output $i \in [n]$ with $|z_i| \geq \epsilon^{-1}r$, as this happens if $t_i \leq \epsilon|f_i|/r$, which occurs with probability precisely $\epsilon|f_i|/r$. We now examine what could go wrong and cause us to output $i$ when this condition is not met, or vice-versa. We condition first on the fact that $v$ satisfies $\mathrm{Err}_2^k(z) \leq v \leq \frac{45k^{1/2}\epsilon^3}{\log^2(n)}\|z\|_1 + 20\mathrm{Err}_2^k(z)$ as in Lemma 5 with parameter $\epsilon' = \epsilon^3/\log^2(n)$, and on the fact that $|y^*_j - z_j| \leq 2(k^{-1/2}\mathrm{Err}_2^k(z) + \frac{\epsilon^3}{\log^2(n)}\|z\|_1)$ for all $j \in [n]$ as detailed in Theorem 1, each of which occurs with high probability.
The first type of error is if $z_i \geq \epsilon^{-1}\|f\|_1$ but the algorithm fails and we do not output $i$. First, condition on the fact that $20\mathrm{Err}_2^k(z) < k^{1/2}\|f\|_1$ and that $|z_{j^*}| \geq \frac{c\epsilon^2}{\log^2(n)}\|z\|_1$, where $j^* \in [n]$ is such that $|z_{j^*}|$ is maximal, which by the union bound using Lemma 9 and Proposition 1 together hold with probability $1 - O(\epsilon)$ conditioned on any value of $t_i$. Call this conditional event $\mathcal{E}$. Since $v \leq 20\mathrm{Err}_2^k(z) + \frac{45k^{1/2}\epsilon^3}{\log^2(n)}\|z\|_1$ w.h.p., it follows that conditioned on $\mathcal{E}$ we have $v \leq k^{1/2}\|f\|_1 + \frac{45k^{1/2}\epsilon^3}{\log^2(n)}\|z\|_1$. So the algorithm does not fail due to the first condition in Recovery Step 4 of Figure 3. Since $v \geq \mathrm{Err}_2^k(z)$ w.h.p., we now have $\frac{1}{k^{1/2}}\mathrm{Err}_2^k(z) \leq \|f\|_1 + \frac{45\epsilon^3}{\log^2(n)}\|z\|_1$, and so $|y^*_j - z_j| \leq 2(\frac{1}{k^{1/2}}\mathrm{Err}_2^k(z) + \frac{\epsilon^3}{\log^2(n)}\|z\|_1) \leq 2\|f\|_1 + \frac{92\epsilon^3}{\log^2(n)}\|z\|_1$ for all $j \in [n]$.

The second way we output FAIL when we should not have is if $|y^*_i| < \frac{(c/2)\epsilon^2}{\log^2(n)}\|z\|_1$ but $|z_i| \geq \epsilon^{-1}\|f\|_1$. Now $\mathcal{E}$ gives us that $|z_{j^*}| \geq \frac{c\epsilon^2}{\log^2(n)}\|z\|_1$ where $|z_{j^*}|$ is maximal in $z$, and since $|y^*_i|$ was maximal in $y^*$, it follows that $|y^*_i| \geq |y^*_{j^*}| > |z_{j^*}| - (2\|f\|_1 + \frac{92\epsilon^3}{\log^2(n)}\|z\|_1)$. But $|z_{j^*}|$ is maximal, so $|z_{j^*}| \geq |z_i| \geq \epsilon^{-1}\|f\|_1$. The two lower bounds on $|z_{j^*}|$ give us $2\|f\|_1 + \frac{92\epsilon^3}{\log^2(n)}\|z\|_1 = O(\epsilon)|z_{j^*}| < |z_{j^*}|/2$, so $|y^*_i| \geq |z_{j^*}|/2 \geq \frac{(c/2)\epsilon^2}{\log^2(n)}\|z\|_1$. So conditioned on $\mathcal{E}$, this type of failure can never occur. Thus the probability that we output FAIL for either of the last two reasons when $|z_i| \geq \epsilon^{-1}\|f\|_1$ is $O(\epsilon^2\frac{|f_i|}{\|f\|_1})$ as needed. So we can now assume that $|y^*_i| > \frac{(c/2)\epsilon^2}{\log^2(n)}\|z\|_1$. Given this, if an index $i$ was returned we must have $|y^*_i| > \frac{1}{\epsilon}r = \frac{1}{\epsilon}\|f\|_1$ and $|y^*_i| > \frac{(c/2)\epsilon^2}{\log^2(n)}\|z\|_1$. These two facts together imply that our additive error from CSSS is at most $O(\epsilon)|y^*_i|$, and thus at most $O(\epsilon)|z_i|$, so $|y^*_i - z_i| \leq O(\epsilon)|z_i|$.
With this in mind, another source of error is if we output $i$ with $y^*_i \geq \epsilon^{-1}r$ but $z_i < \epsilon^{-1}r$, so we should not have output $i$. This can only happen if $z_i$ is close to the threshold $r\epsilon^{-1}$. Since the additive error from our Countsketch is $O(\epsilon)|z_i|$, it must be that $t_i$ lies in the interval $\frac{|f_i|}{r}(1/\epsilon + O(1))^{-1} \leq t_i \leq \frac{|f_i|}{r}(1/\epsilon - O(1))^{-1}$, which occurs with probability $\frac{O(1)}{1/\epsilon^2 - O(1)}\frac{|f_i|}{r} = O(\epsilon^2\frac{|f_i|}{\|f\|_1})$ as needed.
Finally, an error can occur if we should have output $i$ because $z_i \geq \epsilon^{-1}\|f\|_1$, but we output another index $i' \neq i$ because $y^*_{i'} > y^*_i$. This can only occur if $t_{i'} < (1/\epsilon - O(1))^{-1}\frac{|f_{i'}|}{r}$, which occurs with probability $O(\epsilon\frac{|f_{i'}|}{\|f\|_1})$. By the union bound, the probability that such an $i'$ exists is $O(\epsilon)$, and by pairwise independence this bound holds conditioned on the fact that $z_i > \epsilon^{-1}r$. So the probability of this type of error is $O(\epsilon^2\frac{|f_i|}{\|f\|_1})$ as needed.
Altogether, this gives the stated Ē« |fi|āfā1 (1 Ā± O(Ē«)) + O(nāc) probability of outputting i, where
the O(nāc) comes from conditioning on the high probability events. For the O(Ē«) relative errorestimate, if we return an index i we have shown that our additive error from CSSS was at mostO(Ē«)|zi|, thus tiyāi = (1Ā±O(Ē«))tizi = (1Ā±O(Ē«))fi as needed.
Theorem 5. For ε, δ > 0, there is an O(ε)-relative error one-pass L1 sampler for α-property streams which also returns an O(ε)-relative error approximation of the returned item. The algorithm outputs FAIL with probability at most δ, and the space is O((1/ε) log(1/ε) log(n) log(α log(n)/ε) log(1/δ)).

Proof. By the last lemma, it follows that the prior algorithm fails with probability at most 1 − ε + O(n^{-c}). Conditioned on the fact that an index i is output, the probability that i = j is (1 ± O(ε))|f_j|/‖f‖_1 + O(n^{-c}). By running O((1/ε) log(1/δ)) copies of this algorithm in parallel and returning the first index returned by the copies, we obtain an O(ε) relative error sampler with failure probability at most δ. The O(ε) relative error estimation of f_i follows from Lemma 10.

For space, note that CSSS requires O(k log(n) log(α log(n)/ε)) = O(log(n) log(1/ε) log(α log(n)/ε)) bits of space, which dominates the cost of storing r, q, the cost of computing v via Lemma 5, and the cost of storing the randomness needed to compute the k-wise independent scaling factors t_i. Running O((1/ε) log(1/δ)) copies in parallel gives the stated space bound.

Remark 1. Note that the only action taken by our algorithm which requires more space in the general turnstile case is the L1 estimation step, obtaining r, q in step 2 of the Recovery in Figure 3. Note that r, q need only be constant factor approximations in our proof, and such constant factor approximations can be obtained with high probability using O(log²(n)) bits (see Fact 1). This gives an O((1/ε) log(1/ε) log(n) log(α log(n)/ε) + log²(n))-bit algorithm for the general turnstile case.
5 L1 Estimation

We now consider the well-studied L1 estimation problem in the α-property setting (in this section we write α-property to refer to the L1 α-property). We remark that in the general turnstile unbounded deletion setting, an O(1) estimation of ‖f‖_1 can be accomplished in O(log(n)) space [39]. We show in Section 8, however, that even for α as small as 3/2, estimating ‖f‖_1 in general turnstile α-property streams still requires Ω(log(n)) bits of space. Nevertheless, in Section 5.2 we show that for α-property general turnstile streams there is an algorithm using Õ(ε^{-2} log(α) + log(n)) bits of space, where Õ hides log(1/ε) and log log(n) terms, thereby separating the ε^{-2} and log(n) factors. Furthermore, we show a nearly matching lower bound of Ω((1/ε²) log(ε²α)) for the problem (Theorem 14).
αL1Estimator: Input (ε, δ) to estimate the L1 of an α-property strict turnstile stream.
1. Initialization: Set s ← O(α²δ^{-1} log³(n)/ε²), and initialize a Morris-Counter v_t with parameter δ′. Define I_j = [s^j, s^{j+2}].
2. Processing: on update u_t, for each j such that v_t ∈ I_j, sample u_t with probability s^{-j}.
3. For each update u_t sampled while v_t ∈ I_j, keep counters c_j^+, c_j^- initialized to 0. Store all positive updates sampled in c_j^+, and (the absolute value of) all negative updates sampled in c_j^-.
4. If v_t ∉ I_j for any j, delete the counters c_j^+, c_j^-.
5. Return: s^{j*}(c_{j*}^+ − c_{j*}^-), where j* is such that c_{j*}^+, c_{j*}^- have existed the longest (the stored counters which have been receiving updates for the most time steps).

Figure 4: L1 estimator for strict turnstile α-property streams.
5.1 Strict Turnstile L1 Estimation
Now for strict turnstile α-property streams, we show that the problem can be solved with O(log(α)) bits. Ideally, to do so we would sample poly(α log(n)/ε) updates uniformly from the stream and apply Lemma 1. To do this without knowing the length of the stream in advance, we sample in exponentially increasing intervals, throwing away a prefix of the stream. At any given time, we sample at two different rates in two overlapping intervals, and upon termination we return the estimate given by the sample corresponding to the interval from which we have been sampling the longest. We first give a looser analysis of the well-known Morris counting algorithm.

Lemma 11. There is an algorithm, Morris-Counter, that given δ ∈ (0, 1) and a sequence of m events, produces non-decreasing estimates ṽ_t of t such that

(δ/(12 log(m))) t ≤ ṽ_t ≤ (1/δ) t

for a fixed t ∈ [m] with probability 1 − δ. The algorithm uses O(log log(m)) bits of space.
Proof. The well-known Morris counter algorithm is as follows. We initialize a counter v_0 = 0, and on each update t we set v_t = v_{t−1} + 1 with probability 1/2^{v_{t−1}}, and otherwise set v_t = v_{t−1}. The estimate of t at time t is 2^{v_t} − 1. It is easily shown that E[2^{v_t}] = t + 1, and thus by Markov's bound, Pr[2^{v_t} − 1 > t/δ] < δ.

For the lower bound, consider any interval E_i = [2^i, 2^{i+1}], and suppose our estimate of t = 2^i is less than 6(2^i δ/log(n)). Then we expect to sample at least 3 log(n)/δ updates in E_i at 1/2 the current rate of sampling (note that the sampling rate would decrease below this only if we did sample more than 2 updates). Then by Chernoff bounds, with high probability with respect to n and δ we sample at least two updates. Thus our relative error decreases by a factor of at least 2 by the time we reach the interval E_{i+1}. Union bounding over all log(m) = O(log(n)) intervals, the estimate never drops below (δ/(12 log(n))) t for all t ∈ [m] w.h.p. in n and δ. The counter is O(log log(n)) bits with the same probability, which gives the stated space bound.
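As an illustration, here is a minimal simulation of the standard Morris counter (our sketch; the variant analyzed above only needs the looser guarantee of Lemma 11):

```python
import random

def morris_estimate(stream_len, seed=0):
    """Run a Morris counter over stream_len events.

    Keeps only v, which is roughly log2(t), so the state is O(log log t)
    bits; returns (v, 2^v - 1), the counter and the estimate of t."""
    rng = random.Random(seed)
    v = 0
    for _ in range(stream_len):
        if rng.random() < 2.0 ** (-v):  # increment with probability 1/2^v
            v += 1
    return v, 2 ** v - 1
```

Since E[2^{v_t}] = t + 1, averaging several independent copies recovers t; a single copy already satisfies the weaker multiplicative guarantee above with good probability.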
Our full L1 estimation algorithm is given in Figure 4. Note that the value s^{j*} can be returned symbolically by storing s and j*, without explicitly computing the entire value. Also observe that we can assume that s is a power of 2 by rescaling, and sample with probability s^{-i} by flipping log(s)·i fair coins sequentially and sampling only if all are heads, which requires O(log log(n)) bits of space.
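The coin-flipping trick for one sampling decision reads as follows (our sketch, taking s = 2^{log_s}):

```python
import random

def keep_with_prob_s_pow(i, log_s, rng):
    """Return True with probability s^{-i} = 2^{-(i*log_s)} for s = 2^log_s,
    by flipping i*log_s fair coins and keeping only if all are heads."""
    for _ in range(i * log_s):
        if rng.random() < 0.5:  # tails: reject immediately
            return False
    return True
```

Only a counter over the at most log(s)·i flips needs to be kept, which is the source of the O(log log(n)) space bound.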
Theorem 6. The algorithm αL1Estimator gives a (1 ± ε) approximation of the value ‖f‖_1 of a strict turnstile stream with the α-property with probability 1 − δ, using O(log(α/ε) + log(1/δ) + log log(n)) bits of space.
Proof. Let τ = 12 log²(m)/δ. By the union bound applied to Lemma 11, with probability 1 − δ the Morris counter will produce estimates ṽ_t such that t/τ ≤ ṽ_t ≤ τt at all points t = s^i/τ and t = τs^i for i = 1, …, log(m)/log(s). Conditioned on this, I_j will be initialized by time τs^j and not deleted before s^{j+2}/τ for every j = 1, 2, …, log(n)/log(s). Thus, upon termination, the oldest set of counters c_{j*}^+, c_{j*}^- must have been receiving samples from an interval of size at least m − 2τm/s, with sampling probability s^{-j*} ≥ s/(2τm). Since s ≥ 2τα²/ε², it follows by Lemma 1 that s^{j*}(c_{j*}^+ − c_{j*}^-) = ∑_{i=1}^n f̂_i ± ε‖f̂‖_1 w.h.p., where f̂ is the frequency vector of all updates after time t*, and t* is the time step at which c_{j*}^+, c_{j*}^- started receiving updates. By the correctness of our Morris counter, we know that t* < 2τm/s < ε‖f‖_1, where the last inequality follows from the size of s and the α-property, so the number of updates we missed before initializing c_{j*}^+, c_{j*}^- is at most ε‖f‖_1. Since f is a strict turnstile stream, ∑_{i=1}^n f̂_i = ‖f‖_1 ± t* = (1 ± O(ε))‖f‖_1 and ‖f̂‖_1 = (1 ± O(ε))‖f‖_1. After rescaling ε we obtain s^{j*}(c_{j*}^+ − c_{j*}^-) = (1 ± ε)‖f‖_1 as needed.

For space, conditioned on the success of the Morris counter, which requires O(log log(n)) bits of space, we never sample from an interval I_j for more than τs^{j+2} steps, and thus the maximum expected number of samples is τs², which is at most s³ with high probability by Chernoff bounds. Union bounding over all intervals, we never have more than s³ samples in any interval with high probability. At any given time we store counters for at most 2 intervals, so the space required is O(log(s) + log log(m)) = O(log(α/ε) + log(1/δ) + log log(n)) as stated.
Remark 2. Note that if an update Δ_t to some coordinate i_t arrives with |Δ_t| > 1, our algorithm must implicitly expand Δ_t into updates in {−1, 1} by updating the counters by Sign(Δ_t)·Bin(|Δ_t|, s^{-j}) for some j. Note that computing this requires O(log(|Δ_t|)) bits of working memory, which is potentially larger than O(log(α log(n)/ε)). However, if the updates are streamed to the algorithm using O(log(|Δ_t|)) bits, then it is reasonable to allow the algorithm at least this much working memory. Once computed, this working memory is no longer needed and does not factor into the space complexity of maintaining the sketch of αL1Estimator.
5.2 General Turnstile L1 Estimator
In [39], an O(ε^{-2} log(n))-bit algorithm is given for general turnstile L1 estimation. We show how modifications to this algorithm can result in improved algorithms for α-property streams. We state their algorithm in Figure 5, along with the results given in [39]. Here D_1 is the distribution of a 1-stable random variable. In [35, 39], the variables X = tan(θ) are used, where θ is drawn uniformly from [−π/2, π/2]. We refer the reader to [35] for a further discussion of p-stable distributions.

Lemma 12 (A.6 [39]). The entries of A, A′ can be generated to precision δ = Θ(ε/m) using O(k log(n/ε)) bits.
Theorem 7 (Theorem 2.2 [39]). The algorithm above can be implemented using precision δ in the variables A_{i,j}, A′_{i,j}, and thus precision δ in the entries y_i, y′_i, such that the output L satisfies L = (1 ± ε)‖f‖_1 with probability 3/4, where δ = Θ(ε/m). In this setting, we have y′_med = Θ(1)‖f‖_1, and

|(1/r) ∑_{i=1}^r cos(y_i/y′_med) − e^{−‖f‖_1/y′_med}| ≤ O(ε).
1. Initialization: Generate random matrices A ∈ R^{r×n} and A′ ∈ R^{r′×n} of variables drawn from D_1, where r = Θ(1/ε²) and r′ = Θ(1). The variables A_{i,j} are k-wise independent, for k = Θ(log(1/ε)/log log(1/ε)), and the variables A′_{i,j} are k′-wise independent for k′ = Θ(1). For i ≠ i′, the seeds used to generate the variables {A_{i,j}}_{j=1}^n and {A_{i′,j}}_{j=1}^n are pairwise independent.
2. Processing: Maintain the vectors y = Af and y′ = A′f.
3. Return: Let y′_med = median{|y′_i|}_{i=1}^{r′}. Output L = y′_med(−ln((1/r) ∑_{i=1}^r cos(y_i/y′_med))).

Figure 5: L1 estimator of [39] for general turnstile unbounded deletion streams.
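To make the estimator concrete, here is a small simulation of Figure 5 (our simplification: fully independent Cauchy entries, a dense input vector, and no precision truncation or limited independence):

```python
import math
import random

def l1_estimate(f, r=5000, r_prime=101, seed=1):
    """Median-of-|y'| / ln-cos estimator for ||f||_1 using 1-stable sketches."""
    rng = random.Random(seed)
    cauchy = lambda: math.tan(math.pi * (rng.random() - 0.5))  # X = tan(theta)
    y = [sum(cauchy() * fq for fq in f) for _ in range(r)]              # y = Af
    y_prime = [sum(cauchy() * fq for fq in f) for _ in range(r_prime)]  # y' = A'f
    y_med = sorted(abs(v) for v in y_prime)[r_prime // 2]  # median |y'_i| = Theta(||f||_1)
    avg_cos = sum(math.cos(yi / y_med) for yi in y) / r    # ~ exp(-||f||_1 / y_med)
    return y_med * (-math.log(avg_cos))
```

The key identity is the Cauchy characteristic function E[cos(tC)] = e^{−|t|}: each y_i is distributed as ‖f‖_1 times a standard Cauchy, so the averaged cosine concentrates around e^{−‖f‖_1/y_med}, and taking −ln and rescaling by y_med recovers ‖f‖_1.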
We demonstrate that this algorithm can be implemented with reduced space complexity for α-property streams by sampling to estimate the values y_i, y′_i. We first prove an alternative version of our earlier sampling lemma.

Lemma 13. Suppose a sequence of I insertions and D deletions is made to a single item, and let m = I + D be the total number of updates. Then if X is the rescaled result of sampling updates with probability p = Ω(γ^{-3} log(n)/m), with high probability

X = (I − D) ± γm.

Proof. Let X^+ be the number of positive samples and X^- the number of negative samples, so that X = p^{-1}(X^+ − X^-). First suppose I > γm. Then Pr[|p^{-1}X^+ − I| > γm] < 2 exp(−γ²pI/3) < 2 exp(−γ³pm/3) = 1/poly(n). Next, if I < γm, we have Pr[p^{-1}X^+ > 2γm] < exp(−γmp/3) < 1/poly(n), as needed. A similar bound shows that p^{-1}X^- = D ± O(γ)m, thus X = p^{-1}(X^+ − X^-) = I − D ± O(γm), as desired after rescaling γ.
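A quick numerical check of Lemma 13 (our sketch; γ plays the role of the additive-error parameter):

```python
import random

def sampled_signed_sum(I, D, p, seed=7):
    """Sample each of I insertions and D deletions with probability p and
    return the rescaled difference p^{-1}(X+ - X-), which estimates I - D."""
    rng = random.Random(seed)
    x_plus = sum(rng.random() < p for _ in range(I))
    x_minus = sum(rng.random() < p for _ in range(D))
    return (x_plus - x_minus) / p
```

With I = 60000 and D = 59000 the true difference is 1000 while m = 119000; the estimate lands within a small additive fraction of m, even though the relative error in I − D itself can be large, which is exactly the form of guarantee the lemma gives.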
Theorem 8. There is an algorithm that, given a general turnstile α-property stream f, produces an estimate L = (1 ± O(ε))‖f‖_1 with probability 2/3, using O(ε^{-2} log(α log(n)/ε) + log(1/ε) log(n)/log log(1/ε)) bits of space.
Proof. Using Lemma 12, we can generate the matrices A, A′ using O(k log(n/ε)) = O(log(1/ε) log(n/ε)/log log(1/ε)) bits of space, with precision δ = Θ(ε/m). Every time an update (i_t, Δ_t) arrives, we compute the update η_i = Δ_t A_{i,i_t} to y_i for i ∈ [r], and the update η′_{i′} = Δ_t A′_{i′,i_t} to y′_{i′} for i′ ∈ [r′]. We think of each y_i and y′_{i′} as a stream on one variable, and we will sample from these streams and apply Lemma 13. We condition on the success of the algorithm of Theorem 7, which occurs with probability 3/4. Conditioned on this, the estimator L of Figure 5 is a (1 ± ε) approximation, so we need only show that we can estimate L in small space.

Now the number of updates to y_i is ∑_{q=1}^n |A_{iq}|F_q, where F_q = I_q + D_q is the number of updates to f_q. Conditioned on max_{j∈[n]} |A_{ij}| = O(n²), which occurs with probability 1 − O(1/n) by a union bound, we have E[|A_{iq}|] = Θ(log(n)) (see, e.g., [35]). Then E[∑_{q=1}^n |A_{iq}|F_q] = O(log(n))‖F‖_1 is the expected number of updates to y_i, so by Markov's inequality ∑_{q=1}^n |A_{iq}|F_q = O(log(n)‖F‖_1/ε²) with probability 1 − 1/(100(r + r′)), and so with probability 99/100, by a union bound, this holds for all i ∈ [r] and i′ ∈ [r′]. We condition on this now.

Our algorithm then is as follows. We scale each update η_i, η′_i up by δ^{-1} to make it an integer, and sample updates to each y_i with probability p = Ω(ε_0^{-3} log(n)/m), storing the result in a counter c_i. Note that scaling by δ^{-1} only blows up the size of the stream by a factor of m/ε. Furthermore, if we can (1 ± ε) approximate the L1 of this scaled stream, then scaling our estimate down by a factor of δ gives a (1 ± ε) approximation of the actual L1, so we can assume from now on that all updates are integers.

Let ỹ_i = p^{-1}c_i. We know by Lemma 13 that ỹ_i = y_i ± ε_0(∑_{q=1}^n |A_{iq}|F_q) = y_i ± O(ε_0 log(n)‖F‖_1/ε²) with high probability. Setting ε_0 = ε³/(α log(n)), we have |ỹ_i − y_i| < (ε/α)‖F‖_1 ≤ ε‖f‖_1 by the α-property, for all i ∈ [r]. Note that we can deal with the issue of not knowing the length of the stream by sampling in exponentially increasing intervals [s¹, s³], [s², s⁴], … of size s = poly(α/ε_0) as in Figure 4, throwing out an ε_0 fraction of the stream. Since our error is already an additive ε_0 fraction of the length of the stream, our error does not change. We run the same routine to obtain estimates ỹ′_i of y′_i with the same error guarantee, and output
L′ = ỹ′_med(−ln((1/r) ∑_{i=1}^r cos(ỹ_i/ỹ′_med)))

where ỹ′_med = median{|ỹ′_i|}_{i=1}^{r′}. By Theorem 7, we have y′_med = Θ(‖f‖_1), thus ỹ′_med = y′_med ± ε‖f‖_1 since |ỹ′_i − y′_i| < ε‖f‖_1 for all i. Using the fact that y′_med = Θ(‖f‖_1) by Theorem 7, we have:
L′ = (y′_med ± ε‖f‖_1)(−ln((1/r) ∑_{i=1}^r cos(y_i/(y′_med(1 ± O(ε))) ± ε‖f‖_1/(y′_med(1 ± O(ε))))))

= (y′_med ± ε‖f‖_1)(−ln((1/r) ∑_{i=1}^r cos(y_i/y′_med) ± O(ε)))
where the last equality follows from the angle formula cos(ν + β) = cos(ν)cos(β) − sin(ν)sin(β) and the Taylor series expansions of sin and cos. Next, since |(1/r) ∑_{i=1}^r cos(y_i/y′_med) − e^{−‖f‖_1/y′_med}| < O(ε) and y′_med = Θ(‖f‖_1) by Theorem 7, it follows that (1/r) ∑_{i=1}^r cos(y_i/y′_med) = Θ(1). So, using the fact that ln(1 ± O(ε)) = Θ(ε), this is (y′_med ± ε‖f‖_1)(−ln((1/r) ∑_{i=1}^r cos(y_i/y′_med)) ± O(ε)), which is L ± O(ε)‖f‖_1, where L is the output of Figure 5, which satisfies L = (1 ± ε)‖f‖_1 by Theorem 7. It follows that our estimate satisfies L′ = (1 ± O(ε))‖f‖_1, which is the desired result. Note that we conditioned only on the success of Figure 5, which occurs with probability 3/4, on the bound on the number of updates to every y_i, y′_i, which holds with probability 99/100, and on high probability events; by the union bound our result holds with probability 2/3 as needed.

For space, generating the entries of A, A′ requires O(log(1/ε) log(n/ε)/log log(1/ε)) bits as noted, which dominates the cost of storing δ. Moreover, every counter c_i, c′_i is at most poly(p ∑_{q=1}^n |A_{iq}|F_q) = poly(α log(n)/ε) with high probability, and can thus be stored with O(log(α log(n)/ε)) bits each by storing the counter and separately storing p (which is the same for every counter). As there are O(1/ε²) counters, the total space is as stated.
6 L0 Estimation
The problem of estimating the support size of a stream is known as L0 estimation; in other words, this is the problem of estimating L0 = |{i ∈ [n] | f_i ≠ 0}|. L0 estimation is a fundamental problem for network traffic monitoring, query optimization, and database analytics [54, 1, 25]. The problem also has applications in detecting DDoS attacks [4] and port scans [24].

For general turnstile streams, Kane, Nelson, and Woodruff gave an O(ε^{-2} log(n)(log(ε^{-1}) + log log(n)))-bit algorithm with constant probability of success [40], which nearly matches the known lower bound of Ω(ε^{-2} log(ε²n)) [39]. For insertion-only streams, they also demonstrated an O(ε^{-2} +
L0Estimator: L0 estimation algorithm. Input: ε > 0.
1. Set K = 1/ε², and instantiate a log(n) × K matrix B to 0.
2. Fix random h_1 ∈ H_2([n], {0, …, n − 1}), h_2 ∈ H_2([n], [K³]), h_3 ∈ H_k([K³], [K]), and h_4 ∈ H_2([K³], [K]), for k = Ω(log(ε^{-1})/log log(ε^{-1})).
3. Randomly choose a prime p ∈ [D, D³], for D = 100K log(mM), and a vector u ∈ F_p^K.
4. On update (i, Δ): set
B_{lsb(h_1(i)), h_3(h_2(i))} ← (B_{lsb(h_1(i)), h_3(h_2(i))} + Δ·u_{h_4(h_2(i))}) (mod p)
5. Return: run RoughL0Estimator to obtain R ∈ [L0, 110L0]. Set T = |{j ∈ [K] | B_{i*,j} ≠ 0}|, where i* = max{0, log(16R/K)}. Return the estimate L̃0 = (32R/K)·(ln(1 − T/K)/ln(1 − 1/K)).

Figure 6: L0 estimation algorithm of [40].
log(n)) upper bound. In this section we show that the ideas of [40] can be adapted to yield more efficient algorithms for general turnstile L0 α-property streams. For the rest of the section, we will simply write α-property to refer to the L0 α-property.

The idea of the algorithm stems from the observation that if A = Θ(K), then the number of non-empty bins after hashing A balls into K bins is well concentrated around its expectation. Treating this expectation as a function of A and inverting it, one can then recover A with good probability. By treating the (non-zero) elements of the stream as balls, we can hash the universe down into K = 1/ε² bins and recover L0 if L0 = Θ(K). The primary challenge will be to ensure this last condition. In order to do so, we subsample the elements of the stream at log(n) levels, and simultaneously run an O(1) estimator R of the L0. To recover a (1 ± ε) approximation, we use R to index into the level of subsampling corresponding to a substream with Θ(K) non-zero elements. We then invert the number of non-empty bins and scale up by a factor accounting for the degree of subsampling.
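The inversion step is driven by the occupancy formula E[T] = K(1 − (1 − 1/K)^A) for the number T of non-empty bins after A balls are thrown into K bins; solving for A gives ln(1 − T/K)/ln(1 − 1/K). A small simulation with a fully random hash (our sketch) illustrates it:

```python
import math
import random

def occupancy_estimate(n_balls, K=10000, seed=3):
    """Throw n_balls into K bins at random, then invert the expected
    occupancy to estimate n_balls from the count of non-empty bins."""
    rng = random.Random(seed)
    occupied = {rng.randrange(K) for _ in range(n_balls)}
    T = len(occupied)  # number of non-empty bins
    return math.log(1 - T / K) / math.log(1 - 1 / K)
```

In the regime A ≤ K/20 used below, T concentrates tightly, so the inverted value is close to the true ball count.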
6.1 Review of Unbounded Deletion Case
For sets U, V and integer k, let H_k(U, V) denote some k-wise independent hash family of functions mapping U into V. Assuming that |U|, |V| are powers of 2, such hash functions can be represented using O(k log(|U| + |V|)) bits [13] (without loss of generality we assume n, ε^{-1} are powers of 2 for the remainder of the section). For x ∈ Z_{≥0}, we write lsb(x) to denote the (0-based index of) the least significant bit of x written in binary. For instance, lsb(6) = 1 and lsb(5) = 0. We set lsb(0) = log(n). In order to fulfill the algorithmic template outlined above, we need to obtain a constant factor approximation R to L0. This is done using the following result, which can be found in [40].
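The lsb function can be computed with a standard bit trick (our helper; note that for a uniform hash value, lsb equals ℓ with probability 2^{−(ℓ+1)}, which is what makes it suitable for geometric subsampling):

```python
def lsb(x, log_n=32):
    """0-based index of the least significant set bit of x.
    By convention, lsb(0) = log(n)."""
    if x == 0:
        return log_n
    return (x & -x).bit_length() - 1  # x & -x isolates the lowest set bit
```

For instance, lsb(6) = 1 and lsb(5) = 0, matching the examples above.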
Lemma 14. Given a fixed constant δ > 0, there is an algorithm, RoughL0Estimator, that with probability 1 − δ outputs a value R satisfying L0 ≤ R ≤ 110L0, using space O(log(n) log log(n)).
The main algorithm then subsamples the stream at log(n) levels. This is accomplished by choosing a hash function h_1 : [n] → {0, …, n − 1}, and subsampling an item i at level lsb(h_1(i)). Then at each level of subsampling, the updates to the subsampled items are hashed k-wise independently into K = 1/ε² bins, for k = Ω(log(ε^{-1})/log log(ε^{-1})). The entire data structure is then represented by a log(n) × K matrix B. The matrix B is stored modulo a sufficiently large prime p, and the
updates to the rows are scaled via a random linear function to reduce the probability that deletions to one item cancel with insertions to another, resulting in false negatives in the number of buckets hit by items from the support. At the termination of the algorithm, we count the number T of non-empty bins in the i*-th row of B, where i* = max{0, log(16R/K)}. We then return the value L̃0 = (32R/K)·(ln(1 − T/K)/ln(1 − 1/K)). The full algorithm is given in Figure 6. First, the following lemma can be
found in [40].
Lemma 15. There exists a constant ε_0 such that the following holds. Let A balls be mapped into K = 1/ε² bins using a random h ∈ H_k([A], [K]), for k = c log(1/ε)/log log(1/ε) with a sufficiently large constant c. Let X be a random variable which counts the number of non-empty bins after all balls are mapped, and assume 100 ≤ A ≤ K/20 and ε ≤ ε_0. Then E[X] = K(1 − (1 − K^{-1})^A) and

Pr[|X − E[X]| ≤ 8εE[X]] ≥ 4/5.

Let A be the log(n) × K bit matrix such that A_{i,j} = 1 iff there is at least one v ∈ [n] with f_v ≠ 0 such that lsb(h_1(v)) = i and h_3(h_2(v)) = j. In other words, A_{i,j} is an indicator bit which is 1 if an element from the support of f is hashed to the entry B_{i,j} in the above algorithm. Clearly if B_{i,j} ≠ 0, then A_{i,j} ≠ 0; however, the other direction may not always hold. The proofs of the following facts and lemmas can be found in [40]; we give them here for completeness.
Fact 2. Let t, r > 0 be integers. Pick h ∈ H_2([r], [t]). For any S ⊂ [r], E[∑_{i=1}^t binom(|h^{-1}(i) ∩ S|, 2)] ≤ |S|²/(2t), where binom(a, 2) = a(a − 1)/2.

Proof. Let X_{i,j} be an indicator variable which is 1 if h(i) = j. Utilizing linearity of expectation, the desired expectation is t ∑_{i<i′, i,i′∈S} E[X_{i,1}]E[X_{i′,1}] = t·binom(|S|, 2)·(1/t²) ≤ |S|²/(2t).

Fact 3. Let F_q be a finite field and v ∈ F_q^d a non-zero vector. Then, if w ∈ F_q^d is selected uniformly at random, we have Pr[v·w = 0] = 1/q, where v·w is the inner product over F_q.

Proof. The set of vectors orthogonal to v is a linear subspace V ⊂ F_q^d of dimension d − 1, and therefore contains q^{d−1} points. Thus Pr[w ∈ V] = q^{d−1}/q^d = 1/q as needed.
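Fact 3 can be verified exhaustively for a small field; for example, over F_5 with d = 2, any fixed non-zero v is orthogonal to exactly q^{d−1} = 5 of the q^d = 25 vectors w (a brute-force check, ours):

```python
from itertools import product

def orthogonal_count(v, q):
    """Count the vectors w in F_q^d with <v, w> = 0 over F_q."""
    d = len(v)
    return sum(
        1
        for w in product(range(q), repeat=d)
        if sum(vi * wi for vi, wi in zip(v, w)) % q == 0
    )
```

For instance, orthogonal_count((2, 3), 5) returns 5, i.e. a 1/q fraction of the 25 vectors, as the fact predicts.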
Lemma 16 (Lemma 6 of [40]). Assuming that L0 ≥ K/32, with probability 3/4, for all j ∈ [K] we have A_{i*,j} = 0 if and only if B_{i*,j} = 0. Moreover, the space required to store each B_{i,j} is O(log log(n) + log(1/ε)).

Proof. The space bound follows from the choice of p = O(D³), and thus it suffices to bound the probability that B_{i*,j} = 0 when A_{i*,j} ≠ 0. Define I_{i*} = {j ∈ [n] | lsb(h_1(j)) = i*, f_j ≠ 0}. This is the set of non-zero coordinates of f which are subsampled to row i* of B. Now condition on R ∈ [L0, 110L0], which occurs with arbitrarily large constant probability. Then E[|I_{i*}|] ≤ K/32, and using the pairwise independence of h_1 we have Var(|I_{i*}|) < E[|I_{i*}|]. So by Chebyshev's inequality, Pr[|I_{i*}| ≤ K/20] = 1 − O(1/K), which we now condition on. Given this, since the range of h_2 has size K³, the indices of I_{i*} are perfectly hashed by h_2 with probability 1 − O(1/K) = 1 − o(1), an event we call Q and condition on occurring.

Since we choose a prime p ∈ [D, D³] with D = 100K log(mM), for mM larger than some constant, by standard results on the density of primes there are at least Ω(K² log²(mM)) primes in the interval [D, D³]. Since each f_j has magnitude at most mM, and thus at most log(mM) prime factors, it follows that f_j ≠ 0 (mod p) with probability 1 − O(1/K²) = 1 − o(1). Union bounding over all j ∈ I_{i*}, it follows that p does not divide |f_j| for any j ∈ I_{i*} with probability 1 − o(1). Call this event Q′ and condition on it occurring. Also let Q′′ be the event that h_4(h_2(j)) ≠ h_4(h_2(j′)) for all distinct j, j′ ∈ I_{i*} such that h_3(h_2(j)) = h_3(h_2(j′)).
To bound Pr[¬Q′′], let X_{j,j′} indicate h_3(h_2(j)) = h_3(h_2(j′)), and let X = ∑_{j<j′} X_{j,j′}. By Fact 2 with r = K³, t = K and |S| = |I_{i*}| < K/20, we have E[X] ≤ K/800. Let Z = {(j, j′) | h_3(h_2(j)) = h_3(h_2(j′))}. For (j, j′) ∈ Z, let Y_{j,j′} indicate h_4(h_2(j)) = h_4(h_2(j′)), and let Y = ∑_{(j,j′)∈Z} Y_{j,j′}. By the pairwise independence of h_4 and our conditioning on Q, we have E[Y] = ∑_{(j,j′)∈Z} Pr[h_4(h_2(j)) = h_4(h_2(j′))] = |Z|/K = X/K. Now, conditioned on X < 20E[X] = K/40, which occurs with probability 19/20 by Markov's inequality, we have E[Y] ≤ 1/40, so Pr[Y ≥ 1] ≤ 1/40. So we have shown that Q′′ holds with probability at least (19/20)(39/40) ≥ 7/8.

Now for each j ∈ [K] such that A_{i*,j} = 1, we can view B_{i*,j} as the dot product of the vector v, which is f restricted to the coordinates of I_{i*} that hashed to j, with a random vector w over F_p, which is u restricted to the coordinates of I_{i*} that hashed to j. Conditioned on Q′, it follows that v is non-zero, and conditioned on Q′′, it follows that w is indeed uniformly random. So by Fact 3 with q = p, union bounding over all K counters B_{i*,j}, we have that B_{i*,j} ≠ 0 whenever A_{i*,j} ≠ 0 with probability at least 1 − K/p ≥ 99/100. Altogether, the success probability is then (7/8)(99/100) − o(1) ≥ 3/4 as desired.
Theorem 9. Assuming that L0 > K/32, the value returned by L0Estimator is a (1 ± ε) approximation of the L0, using space O(ε^{-2} log(n)(log(1/ε) + log log(n)) log(1/δ)), with 3/4 success probability.
Proof. By Lemma 16, we have shown that T^A := |{j ∈ [K] | A_{i*,j} ≠ 0}| = T with probability 3/4, where T is as in Figure 6. So it suffices to show that L̃0^A = (32R/K)·(ln(1 − T^A/K)/ln(1 − 1/K)) is a (1 ± ε) approximation.

Condition on the event E where R ∈ [L0, 110L0], which occurs with large constant probability. Let I_{i*} = {j ∈ [n] | lsb(h_1(j)) = i*, f_j ≠ 0}. Then E[|I_{i*}|] = L0/2^{i*+1} = L0K/(32R) (assuming L0 > K/32), and Var(|I_{i*}|) < E[|I_{i*}|] by the pairwise independence of h_1. Then K/3520 ≤ E[|I_{i*}|] ≤ K/32 by E, and by Chebyshev's inequality K/4224 ≤ |I_{i*}| ≤ K/20 with probability 1 − O(1/K) = 1 − o(1). Call this event E′, and condition on it. We then condition on the event E′′ that the indices of I_{i*} are perfectly hashed by h_2, meaning they do not collide with each other in any bucket. Given E′, by the pairwise independence of h_2 the event E′′ also occurs with probability 1 − O(1/K).

Conditioned on E′ ∧ E′′, T^A is a random variable counting the number of bins hit by at least one ball under a k-wise independent hash function, where there are C = |I_{i*}| balls, K bins, and k = Ω(log(K/ε)/log log(K/ε)). Then by Lemma 15, we have T^A = (1 ± 8ε)K(1 − (1 − 1/K)^C) with probability 4/5. So

ln(1 − T^A/K) = ln((1 − 1/K)^C ± 8ε(1 − (1 − 1/K)^C)).

Since we condition on K/4224 ≤ C ≤ K/32, it follows that (1 − 1/K)^C = Θ(1), so the above is ln((1 ± O(ε))(1 − 1/K)^C) = C ln(1 − 1/K) ± O(ε), and since ln(1 + x) = Θ(|x|) for |x| < 1/2, we have L̃0^A = 32RC/K + O(εR). Now the latter term is O(εL0), since R = Θ(L0), so it suffices to show the concentration of C. Since Var(C) ≤ E[C] by the pairwise independence of h_1, Chebyshev's inequality gives Pr[|C − L0K/(32R)| ≥ (c/√K)E[C]] ≤ E[C]/((c²/K)E[C]²) = K/(c²E[C]) = O(1/c²), and this probability can be made arbitrarily small by increasing c, so set c such that the probability is at most 1/100. Note that 1/√K = ε, so it follows that C = (1 ± O(ε))L0K/(32R). From this we conclude that L̃0^A = (1 ± O(ε))L0. By the union bound the events E ∧ E′ ∧ E′′ occur with arbitrarily large constant probability, say 99/100, and conditioned on this we have shown that L̃0^A = (1 ± ε)L0 with probability 4/5 − 1/100 = 79/100. Finally, L̃0^A = L̃0 with probability 3/4, and so together we obtain the desired result with probability 1 − 21/100 − 1/100 − 1/4 = 53/100. Running this algorithm O(1) times in parallel and outputting the median gives the desired probability.
αL0Estimator: L0 estimator for α-property streams.
1. Initialize an instance of L0Estimator, constructing only the top 2 log(4α/ε) rows of B. Let all parameters and hash functions be as in L0Estimator.
2. Initialize αStreamRoughL0Est to obtain a value L̃0^t ∈ [L0^t, 8αL0] for all t ∈ [m], and set L̃0^t ← max{L̃0^t, 8 log(n)/log log(n)}.
3. Update the matrix B as in Figure 6, but only store the rows with index i such that i = log(16L̃0^t/K) ± 2 log(4α/ε).
4. Return: run αStreamConstL0Est to obtain R ∈ [L0, 100L0], and set T = |{j ∈ [K] | B_{i*,j} ≠ 0}|, where i* = log(16R/K). Return the estimate L̃0 = (32R/K)·(ln(1 − T/K)/ln(1 − 1/K)).

Figure 7: Our L0 estimation algorithm for α-property streams with L0 > K/32.
6.2 Dealing With Small L0
In the prior section it was assumed that L0 ≥ K/32 = ε^{-2}/32. We handle the estimation when this is not the case in the same way as [40]. We consider two cases. First, if L0 ≤ 100, we can perfectly hash the elements into O(1) buckets and recover the L0 exactly with large constant probability by counting the number of non-zero buckets, as each non-zero item will be hashed to its own bucket with good probability (see Lemma 21).

Now for K/32 > L0 > 100, a similar algorithm to that of the last section is used, except that we use only one row of B and no subsampling. In this case, we set K′ = 2K and create a vector B′ of length K′. We then run the algorithm of the last section, but update B′_j instead of B_{i,j} every time B_{i,j} would be updated. In other words, B′_j is the j-th column of B collapsed, so the updates to all items in [n] are hashed into a bucket of B′. Let I = {i ∈ [n] | f_i ≠ 0}. Note that the only fact about i* that the proof of Lemma 16 uses is that E[|I_{i*}|] < K/32, and since |I| = L0 < K/32, this is still the case. Thus by the same argument given in Lemma 16, with probability 3/4 we can recover a bitvector A from B′ satisfying A_j = 1 iff there is some v ∈ [n] with f_v ≠ 0 and h_3(h_2(v)) = j. Then if T^A is the number of non-zero bits of A, it follows by a similar argument as in Theorem 9 that L′_0 = ln(1 − T^A/K′)/ln(1 − 1/K′) = (1 ± ε)L0 for 100 < L0 < K/32. So if L′_0 > K′/32 = K/16, we return the output of the algorithm from the last section; otherwise we return L′_0. The space required to store B′ is O(ε^{-2}(log log(n) + log(1/ε))), giving the following lemma.

Lemma 17. Let ε > 0 be given and let δ > 0 be a fixed constant. Then there is a subroutine using O(ε^{-2}(log(ε^{-1}) + log log(n)) + log(n)) bits of space which with probability 1 − δ either returns a (1 ± ε) approximation to L0, or returns LARGE, with the guarantee that L0 > ε^{-2}/16.
6.3 The Algorithm for Ī±-Property Streams
We now give a modified version of the algorithm in Figure 6 for L0 α-property streams; our algorithm is given in Figure 7. We note first that the return value of the unbounded deletion algorithm depends only on the row i* = log(16R/K), and so we need only ensure that this row is stored. Our L0 α-property implies that if L0^t is the L0 value at time t, then we must have L0^m = L0 ≥ (1/α)L0^t. So if we can obtain an O(α) approximation R_t to L0^t, then at time t we need only maintain and sample the rows of the matrix with index within c log(α/ε) distance of i_t = log(16R_t/K), for some small constant c.

By doing this, the output of our algorithm will be the same as the output of L0Estimator when run on the suffix of the stream beginning at the time when we first begin sampling to the row i*. Since we begin sampling to this row when the current L0^t is less than an ε fraction of the final L0, it will follow that the L0 of this suffix is an ε-relative error approximation of the L0 of the entire stream. Thus by the correctness of L0Estimator, the output of our algorithm will be a (1 ± ε)² approximation.
To obtain an O(α) approximation to L0^t at all points t in the stream, we employ another algorithm of [40], which gives an O(1) estimation of the F0 value at all points in the stream, where F0 = |{i ∈ [n] | f_i^t ≠ 0 for some t ∈ [m]}|. So by definition, for any time t ∈ [m] we have F0^t ≤ F0 = ‖I + D‖_0 ≤ α‖f‖_0 = αL0 by the α-property, and also by definition F0^t ≥ L0^t at all times t. These two facts together imply that [F0^t, 8F0^t] ⊂ [L0^t, 8αL0].
Lemma 18 ([40]). There is an algorithm, RoughF0Est, that with probability 1 − δ outputs non-decreasing estimates F̃0^t such that F̃0^t ∈ [F0^t, 8F0^t] for all t ∈ [m] such that F0^t ≥ max{8, log(n)/log log(n)}, where m is the length of the stream. The space required is O(log(n) log(1/δ)) bits.

Corollary 2. There is an algorithm, αStreamRoughL0Est, that with probability 1 − δ on an α-property stream outputs non-decreasing estimates L̃0^t such that L̃0^t ∈ [L0^t, 8αL0] for all t ∈ [m] such that F0^t ≥ max{8, log(n)/log log(n)}, where m is the length of the stream. The space required is O(log(n) log(1/δ)) bits.

Note that the approximation is only promised for t such that F0^t ≥ max{8, log(n)/log log(n)}. To handle this, we give a subroutine which produces the L0 exactly for F0 < 8 log(n)/log log(n) using O(log(n)) bits. Our main algorithm will assume that F0 > 8 log(n)/log log(n), and initialize its estimate of L0^0 to L̃0^0 = 8 log(n)/log log(n), where L̃0^t ∈ [L0^t, 8αL0] is the estimate produced by αStreamRoughL0Est.
Lemma 19. Given c ≥ 1, there is an algorithm that with probability 49/50 returns the L0 exactly if F0 ≤ c, and returns LARGE if F0 > c. The space required is O(c log(c) + c log log(n) + log(n)) bits.
Proof. The algorithm chooses a random hash function h ∈ H2([n], [C]) for some C = Θ(c^2). Every time an update (i_t, Δ_t) arrives, the algorithm hashes the identity i_t and keeps a counter, initialized to 0, for each identity h(i_t) seen. The counter for h(i_t) is incremented by all updates Δ_τ such that h(i_τ) = h(i_t). Furthermore, all counters are stored mod p, where p is a random prime picked in the interval [P, P^3] for P = 100^2 c log(mM). Finally, if at any time the algorithm has more than c counters stored, it returns LARGE. Otherwise, the algorithm reports the number of non-zero counters at the end of the stream.
To prove correctness, first note that at most F0 items will ever be seen in the stream by definition. Suppose F0 ≤ c. By the pairwise independence of h, and scaling C by a sufficiently large constant factor, with probability 99/100 none of the F0 ≤ c identities will be hashed to the same bucket. Condition on this now. Let I ⊆ [n] be the set of non-zero indices of f. Our algorithm will correctly report |I| if p does not divide f_i for any i ∈ I. Now for mM larger than some constant, by standard results on the density of primes there are at least 100c^2 log^2(mM) primes in the interval [P, P^3]. Since each f_i has magnitude at most mM, and thus has at most log(mM) prime factors, it follows that p does not divide f_i with probability 1 − 1/(100c^2). Since |I| = L0 ≤ F0 ≤ c, union bounding over all i ∈ I, it follows that p does not divide f_i for any i ∈ I with probability 99/100. Thus our algorithm succeeds with probability 1 − (1/100 + 1/100) > 49/50.
If F0 > c, then conditioned on no collisions among the first c + 1 distinct items seen in the stream, which again occurs with probability 99/100 for sufficiently large C = Θ(c^2), the algorithm will necessarily see c + 1 distinct hashed identities once the (c+1)-st item arrives, and correctly return LARGE.
29
For the space, each hashed identity requires O(log(c)) bits to store, and each counter requires O(log(P)) = O(log(c log(n))) bits to store. There are at most c pairs of identities and counters, and the hash function h can be stored using O(log(n)) bits, giving the stated bound.
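The counting scheme in this proof is simple enough to sketch directly. The toy version below is a hedged illustration, not the paper's implementation: the function name exact_small_l0, the fixed prime standing in for the random p ∈ [P, P^3], and the bucket count 100c^2 are our own illustrative choices. Identities are hashed pairwise-independently into Θ(c^2) buckets, each bucket keeps its net frequency mod a prime, and more than c stored counters triggers LARGE.

```python
import random

def exact_small_l0(stream, c, seed=0):
    """Toy version of the Lemma 19 routine: report L0 exactly when F0 <= c,
    and LARGE once more than c distinct (hashed) identities appear."""
    rng = random.Random(seed)
    P = 2_147_483_647          # fixed stand-in for the random prime p in [P, P^3]
    C = 100 * c * c            # Theta(c^2) buckets makes collisions unlikely
    a, b = rng.randrange(1, P), rng.randrange(P)
    h = lambda i: ((a * i + b) % P) % C   # pairwise-independent bucket hash
    counters = {}              # bucket -> net frequency of its items, mod P
    for item, delta in stream:
        bkt = h(item)
        counters[bkt] = (counters.get(bkt, 0) + delta) % P
        if len(counters) > c:  # more than c identities stored: F0 > c
            return "LARGE"
    return sum(1 for v in counters.values() if v != 0)

updates = [(1, +1), (2, +1), (3, +1), (3, -1)]   # item 3 is fully deleted
result = exact_small_l0(updates, c=5)            # 2, barring a rare hash collision
```

A fully deleted item leaves a zero counter and is not counted, while a surviving item's counter is non-zero unless the prime happens to divide its frequency, which the random choice of prime makes unlikely.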
Finally, to remove the log(n) log log(n) memory overhead of running the RoughL0Estimator procedure to determine the row i*, we show that the exact same O(1) approximation of the final L0 can be obtained using O(log(α log(n)) log log(n) + log(n)) bits of space for α-property streams. We defer the proof of the following lemma to Section 6.4.
Lemma 20. Given a fixed constant δ, there is an algorithm, αStreamConstL0Est, that with probability 1 − δ when run on an α-property stream outputs a value R satisfying L0 ≤ R ≤ 100L0, using space O(log(α) log log(n) + log(n)).
Theorem 10. There is an algorithm that gives a (1 ± ε) approximation of the L0 value of a general turnstile stream with the α-property, using space O((1/ε^2) log(α/ε)(log(1/ε) + log log(n)) + log(n)), with 2/3 success probability.
Proof. The case of L0 < K/32 can be handled by Lemma 17 with probability 49/50, thus we can assume L0 ≥ K/32. Now if F0 < 8 log(n)/log log(n), we can use Lemma 19 with c = 8 log(n)/log log(n) to compute the L0 exactly. Conditioning on the success of Lemma 19, which occurs with probability 49/50, we will know whether or not F0 < 8 log(n)/log log(n), and can correctly output the result corresponding to the correct algorithm. So we can assume that F0 > 8 log(n)/log log(n). Then by the α-property, it follows that L0 > 8 log(n)/(α log log(n)).
Let t* be the time step when our algorithm initialized and began sampling to row i*. Conditioned on the success of αStreamRoughL0Est and αStreamConstL0Est, which by the union bound together occur with probability 49/50 for constant δ, we argue that t* exists.
First, we know that i* = log(16R/K) > log(16L0/K) > log(16 · (8 log(n)/log log(n))/K) − log(α) by the success of αStreamConstL0Est. Moreover, at the start of the algorithm we initialize all rows with indices i = log(16 · (8 log(n)/log log(n))/K) ± 2 log(4α/ε), so if i* < log(16 · (8 log(n)/log log(n))/K) + 2 log(4α/ε) then we initialize i* at the very beginning (time t* = 0). Next, if i* > log(16 · (8 log(n)/log log(n))/K) + 2 log(4α/ε), then we initialize i* at the first time t when L̃0^t ≥ R(ε/(4α))^2. We know that by termination L̃0^m ∈ [L0, 8αL0], since F0 > 8 log(n)/log log(n) and therefore by the end of the stream αStreamRoughL0Est will give its promised approximation. So our final estimate satisfies L̃0^m ≥ L0^m ≥ L0 ≥ R/100 > R(ε/(4α))^2. Thus i* will always be initialized at some time t*.

Now because the estimates L̃0^t are non-decreasing and we have L̃0^m ∈ [L0, 8αL0], it follows that L̃0^t < 8αL0 for all t. Then, since L̃0^t < max{8αL0, 8 log(n)/log log(n)} < L0(4α/ε)^2, it follows that at the termination of our algorithm the row i* was not deleted, and will therefore be stored at the end of the stream.
Now, at time t* − 1, right before row i* was initialized, we have L0^(t*−1) ≤ L̃0^(t*−1) < R(ε/(4α))^2, and since R ≤ 100L0 we have L0^(t*−1)/L0 ≤ O(ε^2). It follows that the L0 of the stream suffix starting at the time step t* is a value L̄0 such that L̄0 = (1 ± O(ε^2))L0^m. Since our algorithm produces the same output as running L0Estimator on this suffix, we obtain a (1 ± ε) approximation of L̄0 by the proof of Theorem 9 with probability 3/4, which in turn is a (1 ± ε)^2 approximation of the actual L0, so the desired result follows after rescaling ε. Thus the probability of success is 1 − (3/50 + 1/4) > 2/3.
For space, note that we only ever store O(log(α/ε)) rows of the matrix B, each with entries of value at most the prime p = O((K log(n))^3), and thus storing all rows of the matrix requires O((1/ε^2) log(α/ε)(log(1/ε) + log log(n))) bits. The space required to run αStreamConstL0Est is an additional additive O(log(α) log log(n) + log(n)) bits. The cost of storing the hash functions h1, h2, h3, h4 is O(log(n) + log^2(1/ε)), which dominates the cost of running αStreamRoughL0Est. Along with the cost of storing the matrix, this dominates the space required to run the small L0 algorithm of Lemma 17 and the small F0 algorithm of Lemma 19 with c on the order of O(log(n)/log log(n)). Putting these together yields the stated bound.
6.4 Our Constant Factor L0 Estimator for Ī±-Property Streams
In this section we prove Lemma 20. Our algorithm αStreamConstL0Est is a modification of the RoughL0Estimator of [40], which gives the same approximation for turnstile streams. Their algorithm subsamples the stream at log(n) levels, and our improvement comes from the observation that for α-property streams we need only consider O(log(α)) levels at a time. Both algorithms utilize the following lemma, which states that if the L0 is at most some small constant c, then it can be computed exactly using O(c^2 log log(mM)) space. The lemma follows from picking a random prime p = Θ(log(mM) log log(mM)) and pairwise independently hashing the universe into [Θ(c^2)] buckets. Each bucket is a counter which contains the sum of frequencies modulo p of updates to the universe which land in that bucket. The L0 estimate of the algorithm is the total number of non-zero counters. The maximum estimate is returned after O(log(1/η)) trials.
Lemma 21 (Lemma 8, [40]). There is an algorithm which, given the promise that L0 ≤ c, outputs L0 exactly with probability at least 1 − η using O(c^2 log log(n)) space, in addition to needing to store O(log(1/η)) pairwise independent hash functions mapping [n] onto [c^2].
We now describe the whole algorithm RoughL0Estimator along with our modifications to it. The algorithm is very similar to the main algorithm of Figure 7. First, a random hash function h : [n] → [n] is chosen from a pairwise independent family. For each 0 ≤ j ≤ log(n), a substream S_j is created which consists of the indices i ∈ [n] with lsb(h(i)) = j. For any t ∈ [m], let S_j^(t→) denote the substream of S_j restricted to the updates t, t+1, ..., m, and similarly let L0^(t→) denote the L0 of the stream suffix t, t+1, ..., m. Let L0(S) denote the L0 of the substream S.
We then initialize the algorithm αStreamRoughL0Est, which by Corollary 2 gives non-decreasing estimates L̃0^t ∈ [L0^t, 8αL0] at all times t such that F0 > 8 log(n)/log log(n), with probability 99/100. Redefine L̃0^t ← max{L̃0^t, 8 log(n)/log log(n)}, and let U_t ⊆ [log(n)] denote the set of indices i such that i = log(L̃0^t) ± 2 log(α/ε), for some constant ε specified later.
Then at time t, for each S_j with j ∈ U_t, we run an instantiation of Lemma 21 with c = 132 and η = 1/16 on S_j, and all instantiations share the same O(log(1/η)) hash functions h_1, ..., h_Θ(log(1/η)). If j ∈ U_t but j ∉ U_(t+1), then we throw away all data structures related to S_j at time t + 1. Similarly, if j enters U_t at time t, we initialize a new instantiation of Lemma 21 for S_j at time t.
To obtain the final L0 estimate for the entire stream, the largest value j ∈ U_m with j < log(2L̃0^m) such that B_j declares L0(S_j) > 8 is found. Then the L0 estimate is L̃0 = 20000/99 · 2^j, and if no such j exists the estimate is L̃0 = 50. Note that the main difference between our algorithm and RoughL0Estimator is that RoughL0Estimator sets U_t = [log(n)] for all t ∈ [m], so our proof of Lemma 20 will follow along the lines of [40].
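The level structure that both estimators rely on is easy to visualize. The following hedged sketch (the toy universe size, hash parameters, and the level helper are our own illustrative assumptions) assigns each identity the level lsb(h(i)) for a pairwise-independent h, so that level j receives roughly a 2^(−(j+1)) fraction of the distinct items:

```python
import random

NBITS = 16                                   # toy universe [n] with n = 2**NBITS

def lsb(x, nbits=NBITS):
    """Index of the lowest set bit; by convention lsb(0) = nbits."""
    if x == 0:
        return nbits
    j = 0
    while x % 2 == 0:
        x //= 2
        j += 1
    return j

def level(item, a, b, p=(1 << 31) - 1, nbits=NBITS):
    """Substream level of an item: lsb of a pairwise-independent hash value,
    so Pr[level = j] is about 2**-(j+1) and E[L0(S_j)] about L0 / 2**(j+1)."""
    return lsb(((a * item + b) % p) % (1 << nbits), nbits)

rng = random.Random(1)
p = (1 << 31) - 1
a, b = rng.randrange(1, p), rng.randrange(p)
items = rng.sample(range(1 << 30), 10_000)   # 10,000 distinct live identities
counts = [0] * (NBITS + 1)
for i in items:
    counts[level(i, a, b)] += 1
# counts[0] lands near 5000 and counts[1] near 2500: the geometric decay that
# lets a level j with E[L0(S_j)] = Theta(1) pin down L0 up to a constant.
```

This is exactly why only the O(log(α)) levels near log(L̃0^t) matter for α-property streams: far below that index a level is saturated, and far above it a level is empty.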
Proof of Lemma 20. The space required to store the hash function h is O(log(n)), and each of the O(log(1/η)) = O(1) hash functions h_i takes log(n) bits to store. The remaining space to store a single B_j is O(log log(n)) by Lemma 21, and thus storing all B_j's for j ∈ U_t at any time t requires at most O(|U_t| log log(n)) = O(log(α) log log(n)) bits (since ε = O(1)), giving the stated bound.
We now argue correctness. First, for F0 ≤ 8 log(n)/log log(n), we can run the algorithm of Lemma 19 to produce the L0 exactly using less space than stated above. So we condition on the success of this algorithm, which occurs with probability 49/50, and assume F0 > 8 log(n)/log log(n). This gives L0 > 8 log(n)/(α log log(n)) by the α-property, and it follows that L̃0^m ≤ 8αL0.

Now for any t ∈ [m], E[L0(S_j^(t→))] = L0^(t→)/2^(j+1) if j < log(n), and E[L0(S_j^(t→))] = L0^(t→)/n if j = log(n). At the end of the algorithm we have all data structures stored for the B_j's with j ∈ U_m. Now let j ∈ U_m be such that j < log(2L̃0^m), and observe that B_j will be initialized at a time t_j such that L̃0^(t_j) ≥ 2^j(ε/α)^2, which clearly occurs before the algorithm terminates. If j = log(8 log(n)/log log(n)) ± 2 log(α/ε), then j ∈ U_0 so t_j = 0, and otherwise t_j is such that L0^(t_j) ≤ L̃0^(t_j) ≤ (ε/α)^2 2^j ≤ (ε/α)^2 · 2L̃0^m = O(ε^2/α)L0. So L0^(t_j)/L0 = O(ε^2). This means that when B_j was initialized, the value L0^(t_j) was at most an O(ε^2) fraction of the final L0, from which it follows that L0^(t_j→) = (1 ± ε)L0 after rescaling ε. Thus the expected output of B_j (if it has not been deleted by termination) for j < log(2L0) is E[L0(S_j^(t_j→))] = (1 ± ε)L0/2^(j+1).
Now let j* be the largest j satisfying E[L0(S_j^(t_j→))] ≥ 1. Then j* < log(2L0) < log(2L̃0^m), and observe that 1 ≤ E[L0(S_(j*)^(t_(j*)→))] ≤ 2(1 + ε) (since the expectations decrease geometrically with ratio (1 ± ε)/2). Then for any j with log(2L̃0^m) > j > j*, by Markov's inequality we have Pr[L0(S_j^(t_j→)) > 8] < (1 + ε)/(8 · 2^(j−j*−1)). By the union bound, the probability that any such j ∈ (j*, log(2L̃0^m)) has L0(S_j^(t_j→)) > 8 is at most ((1 + ε)/8) · ∑_(j=j*+1)^(log(2L̃0^m)) 2^(−(j−j*−1)) ≤ (1 + ε)/4. Now let j** < j* be the largest j such that E[L0(S_j^(t_j→))] ≥ 50, and write Y = L0(S_(j**)^(t_(j**)→)). Since the expectations are geometrically decreasing by a factor of 2 (up to a factor of 1 ± ε), we have 100(1 + ε) ≥ E[Y] ≥ 50, and by the pairwise independence of h we have Var[Y] ≤ E[Y], so by Chebyshev's inequality we have

Pr[ |Y − E[Y]| < 3√(E[Y]) ] > 8/9

Then assuming this holds and setting ε = 1/100, we have

Y > 50 − 3√50 > 28 and Y < 100(1 + ε) + 3√(100(1 + ε)) < 132
What we have shown is that for every j with log(2L̃0^m) > j > j*, with probability at least (3/4)(1 − ε/3) we will have L0(S_j^(t_j→)) ≤ 8. Since we only consider returning 2^j for j ∈ U_m with j < log(2L̃0^m), it follows that we will not return L̃0 = 20000/99 · 2^j for any j > j*. In addition, we have shown that with probability 8/9 we will have 28 < L0(S_(j**)^(t_(j**)→)) < 132, and by our choice of c = 132 and η = 1/16, it follows that B_(j**) will output the exact value L0(S_(j**)^(t_(j**)→)) > 8 with probability at least 1 − (1/9 + 1/16) > 13/16 by Lemma 21. Hence, noting that ε = 1/100, with probability at least 1 − (3/16 + (1/4)(1 + ε)) ≥ 14/25 we output 2^j for some j** ≤ j ≤ j* for which j ∈ U_m. Observe that since U_m contains all indices i = log(L̃0^m) ± 2 log(α/ε), and since L0 < L̃0^m < 8αL0, it follows that all j ∈ [j**, j*] will be in U_m at termination for a sufficiently small constant ε = O(1).

Now since (1 + 1/100)L0/2 > 2^(j*) > (1 − 1/100)L0/4, and (1 + 1/100)L0/100 > 2^(j**) > (1 − 1/100)L0/200, it follows that (99/100)L0/200 < 2^j < 99L0/200, and thus 20000/99 · 2^j ∈ [L0, 100L0], as desired. If such a j** does not exist then L0 < 50, and 50 ∈ [L0, 100L0]. Note that because of the α-property, unless the stream is empty (m = 0) we must have L0 ≥ 1, so our approximation is always within the correct range. Finally, if F0 ≤ 8 log(n)/log log(n) then with
α-SupportSampler: support sampling algorithm for α-property streams.
Initialization:
1. Set s ← 205k, and initialize a linear sketch function J : R^n → R^q for q = O(s) via Lemma 22.
2. Select random h ∈ H2([n], [n]), set I_j = {i ∈ [n] | h(i) ≤ 2^j}, and set ε = 1/48.
Processing:
1. Run αStreamRoughL0Est of Corollary 2 with δ = 1/12 to obtain non-decreasing R_t ∈ [L0^t, 8αL0].
2. Let B_t = { j ∈ [log(n)] | j = log(ns/(3R_t)) ± 2 log(α/ε) or j ≥ log(ns log log(n)/(24 log(n))) }.
3. Let t_j ∈ [m] be the first time t such that j ∈ B_t (if one exists). Let t′_j be the first time step t′ > t_j such that j ∉ B_(t′) (if one exists).
4. For all j ∈ B_t, maintain the linear sketch x_j = J(f^(t_j:t′_j)|_(I_j)).
Recovery:
1. For each j ∈ B_m at the end of the stream, attempt to invert x_j into f^(t_j:m)|_(I_j) via Lemma 22. Return all strictly positive coordinates of all successfully recovered f^(t_j:m)|_(I_j)'s.

Figure 8: Our support sampling algorithm for α-property streams.
probability 49/50 Lemma 19 produces the L0 exactly, and for larger F0 we output the result of the algorithm just stated. This brings the overall success probability to (14/25) · (49/50) > 13/25. Running O(log(1/δ)) = O(1) copies of the algorithm and returning the median, we can amplify the probability 13/25 to 1 − δ.
7 Support Sampling
The problem of support sampling asks: given a stream vector f ∈ R^n and a parameter k ≥ 1, return a set U ⊆ [n] of size at least min{k, ‖f‖0} such that for every i ∈ U we have f_i ≠ 0. Support samplers are needed crucially as subroutines for many dynamic graph streaming algorithms, such as connectivity, bipartiteness, minimum spanning trees, min-cut, cut sparsifiers, spanners, and spectral sparsifiers [2]. They have also been applied to solve maximum matching [43], as well as hyperedge connectivity [32]. A more comprehensive study of their usefulness in dynamic graph applications can be found in [41].
For strict turnstile streams, an Ω(k log^2(n/k)) lower bound is known [41], and for general turnstile streams there is an O(k log^2(n))-bit algorithm [38]. In this section we demonstrate that for L0 α-property streams in the strict-turnstile case, more efficient support samplers exist. For the rest of the section, we write α-property to refer to the L0 α-property, and we use the notation defined at the beginning of Section 6.1.
First consider the following template for the unbounded deletion case (as in [38]). First, we subsample the set of items [n] at log(n) levels, where at level j the set I_j ⊆ [n] is subsampled with expected size |I_j| = 2^j. Let f|_(I_j) be the vector f restricted to the coordinates of I_j (and 0 elsewhere). Then for each I_j, the algorithm creates a small sketch x_j of the vector f|_(I_j). If f|_(I_j) is sparse, we can use techniques from sparse recovery to recover f|_(I_j) and report all the non-zero coordinates. We first state the following well-known result which we utilize for this recovery.
Lemma 22 ([38]). Given 1 ≤ s ≤ n, there is a linear sketch J : R^n → R^q with q = O(s) and a recovery algorithm such that, for f ∈ R^n, if f is s-sparse then the recovery algorithm returns f on input J(f), and otherwise it returns DENSE with high probability. The space required is O(s log(n)) bits.
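To make the template concrete, here is a hedged toy implementation in which an exact dictionary plays the role of the linear sketch J of Lemma 22, so it illustrates the level structure rather than the O(s log n) space bound; the capacity s = 16k, the hash parameters, and the name support_sample are our own illustrative choices:

```python
import random

def support_sample(stream, k, n, seed=0):
    """Toy level-subsampling support sampler.  Level j tracks f restricted to
    I_j = {i : h(i) <= 2**j}; a plain dict stands in for the linear sketch J
    of Lemma 22, and a capacity cap mimics the sketch size q = O(s)."""
    rng = random.Random(seed)
    p = (1 << 31) - 1
    a, b = rng.randrange(1, p), rng.randrange(p)
    h = lambda i: ((a * i + b) % p) % n        # pairwise-independent hash into [n]
    levels = n.bit_length() + 1                # top level contains all of [n]
    f_lvl = [dict() for _ in range(levels)]    # exact stand-ins for sketches x_j
    for item, delta in stream:
        for j in range(levels):
            if h(item) <= (1 << j):
                d = f_lvl[j]
                d[item] = d.get(item, 0) + delta
                if d[item] == 0:               # fully deleted items vanish
                    del d[item]
    s = 16 * k                                 # toy stand-in for sketch capacity
    want = min(k, len(f_lvl[-1]))              # min(k, ||f||_0); top level is exact
    for j in range(levels):                    # smallest level that is big enough
        if want <= len(f_lvl[j]) <= s:
            return sorted(f_lvl[j])
    return []                                  # recovery failed (rare in the toy)

# 100 insertions; items 0..49 are then fully deleted, so supp(f) = {50,...,99}.
updates = [(i, +1) for i in range(100)] + [(i, -1) for i in range(50)]
sample = support_sample(updates, k=5, n=256)
```

Every returned index is genuinely in the support (the dicts only hold non-zero coordinates), which mirrors why a strict-turnstile suffix's positive coordinates are safe to report.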
Next, observe that for L0 α-deletion streams, the value F0^t is at least L0^t and at most αL0 for every t ∈ [m]. Therefore, if we are given an estimate of F0^t, we show it will suffice to only subsample at O(log(α)) levels at a time. In order to estimate F0^t we utilize the estimator αStreamRoughL0Est from Corollary 2 of Section 6. For t′ ≥ t, let f^(t:t′) ∈ R^n be the frequency vector of the stream of updates t to t′. We use the notation given in our full algorithm in Figure 8. Notice that since R_t is non-decreasing, once j is removed from B_t at time t′_j it will never enter again. So at the time of termination, we have x_j = J(f^(t_j:m)|_(I_j)) for all j ∈ B_m.
Theorem 11. Given a strict turnstile stream f with the L0 α-property and k ≥ 1, the algorithm α-SupportSampler outputs a set U ⊆ [n] such that f_i ≠ 0 for every i ∈ U, and such that with probability 1 − δ we have |U| ≥ min{k, ‖f‖0}. The space required is O(k log(n) log(1/δ)(log(α) + log log(n))) bits.
Proof. First condition on the success of αStreamRoughL0Est, which occurs with probability 11/12. Set i* = min(⌈log(ns/(3L0))⌉, log(n)). We first argue that t_(i*) exists. Now x_(i*) would be initialized as soon as R_t ≥ L0(ε/α)^2, but R_t ≥ L0^t, so this holds before termination of the algorithm. Furthermore, for x_(i*) to have been deleted by the end of the algorithm we would need R_t > (α/ε)^2 L0, but we know R_t < 8αL0, so this can never happen. Finally, if L0 ≤ F0 < 8 log(n)/log log(n) and our αStreamRoughL0Est fails, then note i* ≥ ⌈log(ns log log(n)/(24 log(n)))⌉, so we store i* for the entire algorithm.

Now we have L0^(t_(i*)) ≤ R_(t_(i*)) < (ε/α)L0, and thus L0^(t_(i*))/L0 < ε. Since f is a turnstile stream, it follows that the number of strictly positive coordinates in f^(t_(i*):m) is at least L0 − L0^(t_(i*)) and at most L0. Thus there are (1 ± ε)L0 strictly positive coordinates in f^(t_(i*):m). By the same argument, we have ‖f^(t_(i*):m)‖0 = (1 ± ε)L0.
Let X_i indicate the event that (f^(t_(i*):m)|_(I_(i*)))_i ≠ 0, and let X = ∑_i X_i. Using the pairwise independence of h, the X_i's with f_i^(t_(i*):m) ≠ 0 are pairwise independent, so we obtain Var(X) < E[X] = ‖f^(t_(i*):m)‖0 · E[|I_(i*)|]/n. First assume L0 > s. Then ns/(3L0) ≤ E[|I_(i*)|] < 2ns/(3L0), so for ε < 1/48 we have E[X] ∈ [15s/48, 33s/48]. Then √(E[X]) < (1/8)E[X], so by Chebyshev's inequality, Pr[|X − E[X]| > (1/4)E[X]] < 1/4, and thus ‖f^(t_(i*):m)|_(I_(i*))‖0 ≤ (15/16)s with probability 3/4. In this case, f^(t_(i*):m)|_(I_(i*)) is s-sparse, so we recover the vector w.h.p. by Lemma 22. Now if L0 ≤ s then F0 < αs, so the index i′ = log(n) will be stored for the entire algorithm. Thus x_(i′) = J(f), and since f is s-sparse we recover f exactly w.h.p. and return all non-zero elements. So we can assume that L0 > s.
It suffices now to show that there are at least k strictly positive coordinates in f^(t_(i*):m)|_(I_(i*)). Since the number of strictly positive coordinates in f^(t_(i*):m) is also (1 ± ε)L0, letting X′_i indicate the event that (f^(t_(i*):m)|_(I_(i*)))_i > 0 and using the same inequalities as in the last paragraph, it follows that there are at least s/15 > k strictly positive coordinates with probability 3/4. Since the stream is a strict-turnstile stream, every strictly positive coordinate of a suffix of the stream must be in the support of f, so we successfully return at least k coordinates from the support with probability at least 1 − (1/4 + 1/4 + 1/12 + 1/12) = 1/3. Running O(log(1/δ)) copies in parallel and setting U to be the union of all coordinates returned, it follows that with probability 1 − δ at least min{k, ‖f‖0} distinct coordinates will be returned.
For memory, for each of the O(log(1/δ)) copies we subsample at O(log(α) + log log(n)) different levels, and at each level we sketch into a vector of size O(k), each coordinate of which takes log(n) bits to store, which gives our desired bound. This dominates the additive O(log(n)) bits needed to run αStreamRoughL0Est.
8 Lower Bounds
We now show matching or nearly matching lower bounds for all problems we have considered. Ourlower bounds all follow via reductions from one-way randomized communication complexity. Weconsider both the public coin model, where Alice and Bob are given access to infinitely many sharedrandom bits, as well as the private coin model, where they do not have shared randomness.
We first state the communication complexity problems we will be reducing from. The first such problem we use is the Augmented-Indexing problem (Ind), which is defined as follows. Alice is given a vector y ∈ {0,1}^n, and Bob is given an index i* ∈ [n] together with the values y_(i*+1), ..., y_n. Alice then sends a message M to Bob, from which Bob must output the bit y_(i*) correctly. A correct protocol for Ind is one for which Bob correctly outputs y_(i*) with probability at least 2/3. The communication cost of a correct protocol is the maximum size of the message M that the protocol specifies to deliver. This problem has a well-known lower bound of Ω(n) (see [46], or [39]).
Lemma 23 (Miltersen et al. [46]). The one-way communication cost of any protocol for Augmented Indexing (Ind) in the public coin model that succeeds with probability at least 2/3 is Ω(n).
We now present the second communication complexity result which we will use for our reductions. We define the problem Equality as follows. Alice is given y ∈ {0,1}^n and Bob is given x ∈ {0,1}^n, and they are required to decide whether x = y. This problem has a well-known Ω(log(n))-bit lower bound when shared randomness is not allowed (see e.g. [6], where it is used).

Lemma 24. The one-way communication complexity of Equality in the private coin model with 2/3 success probability is Ω(log(n)).
We begin with the hardness of the heavy hitters problem in the strict-turnstile setting. Our hardness result holds not just for α-property streams, but even for the special case of strong α-property streams (Definition 2). The result matches our upper bound for normal α-property streams from Theorem 4 up to log log(n) and log(1/ε) terms.
Theorem 12. For p ≥ 1 and ε ∈ (0, 1), any one-pass Lp heavy hitters algorithm for strong L1 α-property streams in the strict turnstile model which returns a set containing all i ∈ [n] such that |f_i| ≥ ε‖f‖_p and no i ∈ [n] such that |f_i| < (ε/2)‖f‖_p, with probability at least 2/3, requires Ω(ε^(−p) log(nε^p) log(α)) bits.
Proof. Suppose there is such an algorithm which succeeds with probability at least 2/3, and consider an instance of augmented indexing. Alice receives y ∈ {0,1}^d, and Bob gets i* ∈ [d] and y_j for j > i*. Set D = 6, let X be the set of all subsets of [n] with ⌈1/(2ε)^p⌉ elements, and set d = log_6(α/4) · ⌊log(|X|)⌋. Alice divides y into r = log_6(α/4) contiguous chunks y^1, y^2, ..., y^r, each containing ⌊log(|X|)⌋ bits. She uses y^j as an index into X to determine a subset x_j ⊆ [n] with |x_j| = ⌈1/(2ε)^p⌉. Thinking of each x_j as a binary vector in R^n, Alice defines the vector v ∈ R^n by

v = (αD + 1)x_1 + (αD^2 + 1)x_2 + ··· + (αD^r + 1)x_r

She then creates a stream and inserts the necessary items so that the current frequency vector of the stream is v. She then sends the state of her heavy hitters algorithm to Bob, who wants to learn the bit y_(i*), which lies in chunk y^j for some j = j(i*). Knowing y_(i*+1), ..., y_d already, he can compute u = αD^(j+1)x_(j+1) + αD^(j+2)x_(j+2) + ··· + αD^r x_r. Bob then runs the stream which subtracts u from the current stream, resulting in a final frequency vector of f = v − u. He then runs his heavy hitters algorithm to obtain a set S ⊆ [n] of heavy hitters.
We now argue that, if correct, his algorithm must produce S = x_j. Note that some items in [n] may belong to multiple sets x_i, and would then be inserted as part of multiple x_i's, so we must deal with this possibility. For p ≥ 1, the weight ‖f‖_p^p is maximized by having x_1 = x_2 = ··· = x_j, and thus

‖f‖_p^p ≤ (1/(2ε)^p) (∑_(i=1)^j (αD^i + 1))^p ≤ ε^(−p)(αD^(j+1)/10 + 1)^p ≤ ε^(−p) α^p D^(jp)

and for every k ∈ x_j we have |f_k|^p ≥ (αD^j + 1)^p ≥ ε^p ‖f‖_p^p, as desired. Furthermore, ‖f‖_p^p ≥ |x_j| α^p D^(jp) = ⌈1/(2ε)^p⌉ α^p D^(jp), and the weight of any element k′ ∈ [n] \ x_j is at most (∑_(i=1)^(j−1) αD^i + 1)^p ≤ (αD^j/5 + (j−1))^p < α^p D^(jp)/4^p ≤ (ε/2)^p ‖f‖_p^p for α ≥ 3, so f_(k′) will not be an ε/2-heavy hitter. Thus, if correct, Bob's heavy hitters algorithm will return S = x_j. So Bob obtains x_j, and thus recovers the chunk y^j which indexed x_j, and can compute the relevant bit y_(i*) in y^j and return this value successfully. Hence Bob solves Ind with probability at least 2/3.
Now observe that at the end of the stream each coordinate has frequency at least 1 and received fewer than 3α^2 updates (assuming updates have magnitude 1). Thus this stream on [n] items has the strong (3α^2)-property. Additionally, the frequencies of the stream are always non-negative, so the stream is a strict turnstile stream. It follows by Lemma 23 that any heavy hitters algorithm for strict turnstile strong α-property streams requires Ω(d) = Ω(log(√(α/3)) log(|X|)) = Ω(ε^(−p) log(α) log(nε^p)) bits, as needed.
Next, we demonstrate the hardness of estimating the L1 norm in the α-property setting. First, we show that the problem of L1 estimation in the general turnstile model requires Ω(log(n)) bits even for α-property streams with α = O(1). We also give a lower bound of Ω((1/ε^2) log(ε^2 α)) bits for general turnstile L1 estimation of strong α-property streams.
Theorem 13. For any α ≥ 3/2, any algorithm that produces an estimate L̃1 ∈ (1 ± 1/16)‖f‖1 of a general turnstile stream f with the L1 α-property with probability 2/3 requires Ω(log(n)) bits of space.
Proof. Let G be a family of t = 2^(Ω(n/2)) = 2^(n′) subsets of [n/2], each of size n/8, such that no two sets have more than n/16 elements in common. As noted in [6], the existence of G follows from standard results in coding theory, and can be derived via a simple counting argument. We now reduce from Equality, where Alice has y ∈ {0,1}^(n′) and Bob has x ∈ {0,1}^(n′). Alice can use y to index into G to obtain a subset s_y ⊆ [n/2], and similarly Bob obtains s_x ⊆ [n/2] via x. Let y′, x′ ∈ {0,1}^n be the characteristic vectors of s_y, s_x respectively, padded with n/2 0's at the end.

Now Alice creates a stream f on n elements by first inserting y′, and then inserting the vector v where v_i = 1 for i > n/2 and 0 otherwise. She then sends the state of her streaming algorithm to Bob, who deletes x′ from the stream. Now if x = y, then x′ = y′ and ‖f‖1 = ‖y′ + v − x′‖1 = n/2. On the other hand, if x ≠ y, then each of s_x, s_y has at least n/16 elements in one and not in the other. Thus ‖y′ − x′‖1 ≥ n/8, so ‖f‖1 ≥ 5n/8. Thus a streaming algorithm that produces L̃1 ∈ (1 ± 1/16)‖f‖1 with probability 2/3 can distinguish between the two cases, and therefore solve Equality, giving an Ω(log(n′)) lower bound. Since n′ = Ω(n), it follows that such an L1 estimator requires Ω(log(n)) bits, as stated.

Finally, note that at most 3n/4 unit increments were made to f, and ‖f‖1 ≥ n/2, so f indeed has the α = 3/2 property.
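The gap between the two cases can be verified on a toy instance; the following is a hedged sanity check (n = 16 and the particular sets are our own illustrative choices), using the fact that here ‖f‖1 = n/2 + |s_y △ s_x|:

```python
n = 16
v = [0] * (n // 2) + [1] * (n // 2)          # Alice's all-ones second half

def stream_l1(sy, sx):
    """||y' + v - x'||_1 for the characteristic vectors y', x' of sy, sx."""
    f = list(v)
    for i in sy:
        f[i] += 1                            # Alice inserts y'
    for i in sx:
        f[i] -= 1                            # Bob deletes x'
    return sum(abs(c) for c in f)

# Equal sets give L1 = n/2; sets sharing at most n/16 elements give L1 >= 5n/8,
# a gap that any (1 +/- 1/16)-approximation separates.
equal_case = stream_l1({0, 1}, {0, 1})       # n/2 = 8
diff_case = stream_l1({0, 1}, {1, 2})        # 5n/8 = 10 on this instance
```

Here the sets have size n/8 = 2 and share exactly n/16 = 1 element, matching the promise of the coding-theoretic family G.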
Theorem 14. Any algorithm that produces an estimate L̃ ∈ (1 ± ε)‖f‖1 with probability 11/12 of a general turnstile stream f ∈ R^n with the strong L1 α-property requires Ω((1/ε^2) log(ε^2 α)) bits of space.
To prove Theorem 14, we first define the following communication complexity problem.
Definition 3. In the Gap-Ham communication complexity problem, Alice is given x ∈ {0,1}^n and Bob is given y ∈ {0,1}^n. Bob is promised that either ‖x − y‖1 < n/2 − √n (a NO instance) or that ‖x − y‖1 > n/2 + √n (a YES instance), and must decide which instance holds.
Our proof of Theorem 14 will use the following reduction, and is similar to the lower bound proof for unbounded deletion streams in [39].
Theorem 15 ([36], [59] Section 4.3). There is a reduction from Ind to Gap-Ham such that decidingGap-Ham with probability at least 11/12 implies a solution to Ind with probability at least 2/3.Furthermore, in this reduction the parameter n in Ind is within a constant factor of that for thereduced Gap-Ham instance.
We are now ready to prove Theorem 14.
Proof. Set k = ⌈1/ε^2⌉ and let t = ⌊log(αε^2)⌋. The reduction is from Ind. Alice receives x ∈ {0,1}^(kt), and Bob obtains i* ∈ [kt] and x_j for j > i*. Alice conceptually breaks her string x up into t contiguous blocks b_i of size k. Bob's index i* lies in some block b_(j*) for j* = j*(i*), and Bob knows all the bits of the blocks b_i for i > j*. Alice then applies the reduction of Theorem 15 on each block b_i separately to obtain new vectors y_i of length ck for i ∈ [t], where c ≥ 1 is some small constant. Let β = c^2 ε^(−2) α. Alice then creates a stream f on ckt items by inserting the update ((i, j), β2^i + 1) for all (i, j) such that (y_i)_j = 1. Here we are using (i, j) ∈ [t] × [ck] to index into [ckt]. Alice then computes the values v_i = ‖y_i‖1 for i ∈ [t], and sends v_1, ..., v_t to Bob, along with the state of the streaming algorithm run on f.

Upon receiving this, since Bob knows the bits y_z for z ≥ i*, Bob can run the same reductions on the blocks b_i as Alice did to obtain y_i for i > j*. He can then make the deletions ((i, j), −β2^i) for all (i, j) such that i > j* and (y_i)_j = 1, leaving these coordinates with f_((i,j)) = 1. Bob then performs the reduction from Ind to Gap-Ham specifically on the block b_(j*) to obtain a vector y(B) of length ck, such that deciding whether ‖y(B) − y_(j*)‖1 > ck/2 + √(ck) or ‖y(B) − y_(j*)‖1 < ck/2 − √(ck) with probability 11/12 will allow Bob to solve the instance of Ind on block b_(j*) with index i*. Then for each i such that y(B)_i = 1, Bob makes the update ((j*, i), −β2^(j*)) to the stream f. He then runs an L1 approximation algorithm to obtain L̃ = (1 ± ε)‖f‖1 with probability 11/12. Let A be the number of indices i such that y(B)_i > (y_(j*))_i, let B be the number of indices with y(B)_i < (y_(j*))_i, let C be the number of indices with y(B)_i = 1 = (y_(j*))_i, and let D be the number of indices (i, j) with i > j* such that (y_i)_j = 1. Then we have
‖f‖1 = β2^(j*) A + (β2^(j*) + 1)B + C + D + ∑_(i<j*) v_i(β2^i + 1)
Let Z = C + D + B, and note that Z < ckt < β. Let η = ∑_(i<j*) v_i(β2^i + 1). Bob can compute η exactly knowing the values v_1, ..., v_t. Rearranging terms, we have ‖y(B) − y_(j*)‖1 = (‖f‖1 − Z − η)/(β2^(j*)) = (‖f‖1 − η)/(β2^(j*)) ± 1. Recall that Bob must decide whether ‖y(B) − y_(j*)‖1 > ck/2 + √(ck) or ‖y(B) − y_(j*)‖1 < ck/2 − √(ck). Thus, in order to solve this instance of Gap-Ham it suffices to obtain an additive √(ck)/8 approximation of ‖f‖1/(β2^(j*)). Now note that

‖f‖1/(β2^(j*)) ≤ ckt(β2^(j*))^(−1) + (β2^(j*))^(−1) ∑_(i=1)^(j*) β2^i · ck ≤ 1 + 4ck

where the ckt < β term comes from the fact that every coordinate that is ever inserted will have magnitude at least 1 at the end of the stream. Taking ε′ = √(ck)/(40ck) = 1/(40√(ck)) = O(ε), it follows that a (1 ± ε′) approximation of ‖f‖1 gives a √(ck)/8 additive approximation of ‖f‖1/(β2^(j*)), as required. Thus suppose there is an algorithm A that can obtain such a (1 ± ε′) approximation with probability 11/12. By the reduction of Theorem 15 and the hardness of Ind in Lemma 23, it follows that the protocol just described requires Ω(kt) bits of communication. Since the only information communicated between Alice and Bob other than the state of the streaming algorithm was the set v_1, ..., v_t, which can be sent with O(t log(k)) = o(kt) bits, it follows that A must have used Ω(kt) = Ω(ε^(−2) log(αε^2)) bits of space, which is the desired lower bound.
Now note that every coordinate i that was updated in the stream had final magnitude |f_i| ≥ 1. Furthermore, no item was inserted more than β2^t + 1 < α^2 c^2 + 1 times, thus the stream has the strong O(α^2)-property. We have proven that any algorithm that gives a (1 ± ε) approximation of ‖f‖1, where f is a strong α-property stream, with probability 11/12 requires Ω(ε^(−2) log(ε^2 √α)) = Ω(ε^(−2) log(ε^2 α)) bits, which completes the proof.
We now give a matching lower bound for L1 estimation of strict turnstile strong α-property streams. This exactly matches our upper bound of Theorem 6, which holds for the more general α-property setting.
Theorem 16. For ε ∈ (0, 1/2) and α < n, any algorithm which gives an ε-relative-error approximation of the L1 of a strong L1 α-property stream in the strict turnstile setting with probability at least 2/3 must use Ω(log(α) + log(1/ε) + log log(n)) bits.
Proof. The reduction is from Ind. Alice, given x ∈ {0, 1}^t where t = log_10(α/4), constructs a stream u ∈ R^t such that u_i = α10^i x_i + 1. She then sends the state of the stream u to Bob who, given j ∈ [t] and x_{j+1}, . . . , x_t, subtracts off v, where v_i = α10^i x_i for i ≥ j + 1 and v_i = 0 otherwise. He then runs the L1 estimation algorithm on u − v, and obtains a value L such that L = (1 ± ε)L_1 with probability 2/3. We argue that if L = (1 ± ε)L_1 (for ε < 1/2) then L > (1 − ε)(α10^j) > α10^j/2 iff x_j = 1. If x_j = 1 then (u_j − v_j) = α10^j + 1, and since L > (1 − ε)L_1 the result follows. If x_j = 0, then the total L1 is at most α/4 + α∑_{i=1}^{j−1} 10^i < α10^j/9 + α/4 < α10^j/3, so L < (1 + ε)L_1 < α10^j/2, as needed to solve Ind. Note that each coordinate has frequency at least 1 at the end of the stream, and no coordinate received more than α² updates. Thus the stream has the strong α²-property. By Lemma 23, it follows that any one-pass algorithm for constant-factor L1 estimation of a strict turnstile strong α-property stream requires Ω(log(√α)) = Ω(log(α)) bits.
Finally, note that in the restricted insertion-only case (i.e., α = 1), estimating the L1 norm means estimating the value m given only the promise that m ≤ M = poly(n). There are log(M)/ε powers of (1 + ε) that could potentially be a (1 ± ε) approximation of m, so representing the solution requires log(log(M)/ε) = Ω(log log(n) + log(1/ε)) bits of space, which gives the rest of the stated lower bound.
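As a sanity check on this reduction, the following sketch simulates Alice's encoding and Bob's decoding against an exact L1 value perturbed adversarially by a (1 ± ε) factor (the concrete α, t, and ε below are illustrative choices, not values from the paper):

```python
import random

def decode_l1_bit(alpha, x, j, eps=0.4):
    """Decode bit x_j from an adversarially perturbed exact L1 value,
    following the reduction in the proof of Theorem 16."""
    # Alice's stream: u_i = alpha * 10^i * x_i + 1  (coordinates 1..t)
    u = [alpha * 10**i + 1 if xi else 1 for i, xi in enumerate(x, start=1)]
    # Bob knows x_{j+1}, ..., x_t and subtracts alpha * 10^i * x_i for i > j
    v = [alpha * 10**i if (xi and i > j) else 0
         for i, xi in enumerate(x, start=1)]
    l1 = sum(ui - vi for ui, vi in zip(u, v))
    # worst-case (1 +/- eps) estimate, pushed toward the threshold
    L = l1 * (1 - eps) if x[j - 1] else l1 * (1 + eps)
    return 1 if L > alpha * 10**j / 2 else 0

random.seed(7)
alpha, t = 10**6, 5  # needs t <= log10(alpha/4) ~ 5.4
for _ in range(100):
    x = [random.randint(0, 1) for _ in range(t)]
    j = random.randint(1, t)
    assert decode_l1_bit(alpha, x, j) == x[j - 1]
```

The decoding succeeds for every input because the mass α·10^j of coordinate j dominates the at most α10^j/9 + α/4 mass of all lower coordinates, exactly as in the proof.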
We now prove a lower bound on L0 estimation. Our lower bound matches our upper bound of Theorem 10 up to log log(n) and log(1/ε) multiplicative factors, and a log(n) additive term. To do so, we use the following theorem of [39], which rests on a one-way two-party communication complexity lower bound.

Theorem 17 (Theorem A.2 of [39]). Any one-pass algorithm that gives a (1 ± ε) multiplicative approximation of the L0 of a strict turnstile stream with probability at least 11/12 requires Ω(ε^{−2} log(ε²n)) bits of space.
Setting n = O(Ī±) will give us:
Theorem 18. For ε ∈ (0, 1/2) and α < n/log(n), any one-pass algorithm that gives a (1 ± ε) multiplicative approximation of the L0 of an L0 α-property stream in the strict turnstile setting with probability at least 11/12 must use Ω(ε^{−2} log(ε²α) + log log(n)) bits of space.
Proof. We can first construct the same stream used in the communication complexity lower bound of Theorem 17 on n = α − 1 elements, and then allow Bob to insert a final dummy element at the end of the stream with frequency 1. The stream now has α elements, and the L0 of the resulting stream, call it R, is exactly 1 larger than the initial L0 (which we will continue to refer to as L0). Moreover, this stream has the L0 α-property, since the final frequency vector is non-zero and there are α items in the stream. If we then obtain an estimate R̃ = (1 ± ε)R = (1 ± ε)(L0 + 1), then the original L0 = 0 if R̃ < (1 + ε). Otherwise, R̃ − 1 = (1 ± O(ε))L0, and a constant-factor rescaling of ε gives the desired approximation of the initial L0. By Theorem 17, such an approximation requires Ω(ε^{−2} log(ε²α)) bits of space, as needed. The Ω(log log(n)) lower bound follows from the proof of Theorem 16, replacing the upper bound M ≥ m with n.
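The dummy-element correction is simple arithmetic; a minimal sketch (the values of ε and L0 below are arbitrary illustrative choices):

```python
def recover_l0(R_est, eps):
    """Undo the +1 dummy element, given R_est = (1 +/- eps)(L0 + 1)."""
    if R_est < 1 + eps:        # only possible when the original L0 = 0
        return 0.0
    return R_est - 1           # = L0 +/- eps*(L0 + 1) = (1 +/- O(eps)) * L0

eps = 0.1
for L0 in [0, 1, 5, 1000]:
    for err in (-0.09, 0.09):  # any estimate strictly within (1 +/- eps)
        est = (1 + err) * (L0 + 1)
        rec = recover_l0(est, eps)
        assert (rec == 0) if L0 == 0 else (abs(rec - L0) <= 2 * eps * L0)
```

The recovered value has relative error at most eps·(L0 + 1)/L0 ≤ 2·eps once L0 ≥ 1, which is the constant-factor rescaling mentioned in the proof.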
Next, we give lower bounds for L1 sampling and support sampling. Our lower bound for L1 samplers holds in the more restricted strong α-property setting, and for such streams we show that even samplers which return an index from a distribution with variation distance at most 1/6 from the L1 distribution |f_i|/‖f‖_1 require Ω(log(n) log(α)) bits. In this setting, taking ε = o(1), this bound matches our upper bound from Theorem 5 up to log log(n) terms. For α = o(n), our support sampling lower bound matches our upper bound in Theorem 11.
Theorem 19. Any one-pass L1 sampler of a strong L1 α-property stream f in the strict turnstile model with an output distribution that has variation distance at most 1/6 from the L1 distribution |f_i|/‖f‖_1, and which succeeds with probability 2/3, requires Ω(log(n) log(α)) bits of space.
Proof. Consider the same strict turnstile strong O(α²)-property stream constructed by Alice and Bob in Theorem 12, with ε = 1/2. Then X = [n], viewed as the set of all one-item subsets of [n]. If i* ∈ [n] is Bob's index, then let j = j(i*) be the block y_j such that y_{i*} ∈ y_j. The block y_j has log(|X|) = log(n) bits, and Bob's goal is to determine the set x_j ⊂ [n] of exactly one item which is indexed by y_j. The sole item k ∈ x_j will be a 1/2-heavy hitter, and no other item will be a 1/4-heavy hitter, so Bob can run O(1) parallel L1 samplers and find the item k′ that is returned the most times by his samplers. If his sampler functions as specified, having at most 1/6 variation distance from the L1 distribution, then k′ = k with large constant probability, and Bob can recover the bit representation x_j of k, from which he recovers the bit y_{i*} ∈ y_j as needed. Since Alice's string had length Ω(log(α) log(|X|)) = Ω(log(α) log(n)), we obtain the stated lower bound.
Theorem 20. Any one-pass support sampler that outputs an arbitrary i ∈ [n] such that f_i ≠ 0 of an L0 α-property stream, with failure probability at most 1/3, requires Ω(log(n/α) log(α)) bits of space.
Proof. The reduction is again from Ind. Alice receives y ∈ {0, 1}^d, for d = ⌊log(n/α)⌋ log(α/4), and breaks it into blocks y_1, . . . , y_{log(α/4)}, each of size ⌊log(n/α)⌋. She then initializes a stream vector f ∈ R^n, and breaks f into ⌊4n/α⌋ blocks of size α/4, say B_1, . . . , B_{⌊4n/α⌋}. She uses y_i as an index into a block B_j for j = j(i), and then inserts 2^i distinct items into block B_j, each exactly once, and sends the state of her algorithm over to Bob. Bob wants to determine y_{i*} for his fixed i* ∈ [d]. Let j be such that y_{i*} ∈ y_j, and let B_k be the block indexed by y_j. He knows y_{j+1}, . . . , y_{log(α/4)}, and
can delete the corresponding items from f that were inserted by Alice. At the end, the block B_k has 2^j items in it, and the total number of distinct items in the stream is less than 2^{j+1}. Moreover, no other block has more than 2^{j−1} items.
Now suppose Bob had access to an algorithm that produces a uniformly random non-zero item at the end of the stream, and that reports FAIL with probability at most 1/3. He could then run O(1) such algorithms, and pick the block B_{k′} such that more than 4/10 of the returned indices lie in B_{k′}. If his algorithms are correct, then we must have B_{k′} = B_k with large constant probability, from which he can recover y_j and his bit y_{i*}, thus solving Ind and giving the Ω(d) = Ω(log(n/α) log(α)) lower bound by Lemma 23.
We now show how such a random index from the support of f can be obtained using only a support sampler. Alice and Bob can use public randomness to agree on a uniformly random permutation π : [n] → [n], which gives a random relabeling of the items in the stream. Then, Alice creates the same stream as before, but under this relabeling and inserting the items into the stream in a randomized order, using randomness separate from that used by the streaming algorithm. In other words, wherever she would have inserted i ∈ [n] before, Alice now inserts π(i), at a random position in the stream. Bob then receives the state of the streaming algorithm, and similarly deletes the items he would have deleted before, but under the relabeling π and in a randomized order.
Let i_1, . . . , i_r ∈ [n] be the items inserted by Alice that were not deleted by Bob, ordered by the order in which they were inserted into the stream. If Bob were then to run a support sampler on this stream, he would obtain an arbitrary i = g(i_1, . . . , i_r) ∈ {i_1, . . . , i_r}, where g is a (possibly randomized) function of the ordering of the sequence i_1, . . . , i_r. The randomness used by the streaming algorithm is separate from the randomness which generated the relabeling π and the randomness which determined the ordering of the items inserted into and deleted from the stream. Thus, even conditioned on the randomness of the streaming algorithm, any ordering and labeling of the surviving items i_1, . . . , i_r is equally likely. In particular, i is equally likely to be any of i_1, . . . , i_r. It follows that π^{−1}(i) is a uniformly random element of the support of f, which is precisely what Bob needed to solve Ind, completing the proof.
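The relabel-and-shuffle argument can be simulated directly: even if the support sampler's choice is an adversarial function of insertion order and labels, the returned item is uniform over the support after undoing the permutation. A minimal sketch, with the fixed rule 'return the first-inserted survivor' standing in for an arbitrary g:

```python
import random
from collections import Counter

def uniform_support_sample(support, n, rng):
    """One run of the reduction: a public random relabeling pi plus a
    random insertion order; an adversarial 'support sampler' then returns
    the first-inserted surviving item (any order/label-based rule works)."""
    pi = list(range(n))
    rng.shuffle(pi)                        # random relabeling pi
    survivors = [pi[i] for i in support]   # items not deleted by Bob
    rng.shuffle(survivors)                 # random insertion order
    picked = survivors[0]                  # adversarial, but depends only
                                           # on order and labels
    inv = {label: i for i, label in enumerate(pi)}
    return inv[picked]                     # undo the relabeling

rng = random.Random(0)
counts = Counter(uniform_support_sample([3, 7, 42], 100, rng)
                 for _ in range(3000))
assert set(counts) == {3, 7, 42}              # every support item occurs
assert all(c > 800 for c in counts.values())  # roughly uniform (exp. 1000)
```

Because the sampler never sees π or the insertion order except through the stream itself, conditioning on its randomness leaves every labeling of the survivors equally likely, which is what the empirical counts reflect.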
Finally, we show that estimating inner products even for strong α-property streams requires Ω(ε^{−1} log(α)) bits of space. Setting α = n, we obtain an Ω(ε^{−1} log(n)) lower bound for unbounded deletion streams, which our upper bound beats for small α.
Theorem 21. Any one-pass algorithm that runs on two strong L1 α-property streams f, g in the strict turnstile setting and computes a value IP(f, g) such that IP(f, g) = ⟨f, g⟩ ± ε‖f‖_1‖g‖_1 with probability 2/3 requires Ω(ε^{−1} log(α)) bits of space.
Proof. The reduction is from Ind. Alice has y ∈ {0, 1}^d, where y is broken up into log_10(α)/4 blocks of size 1/(8ε), where the indices of the i-th block are called B_i. Bob wants to learn y_{i*} and is given y_j for j > i*; let j* be such that i* ∈ B_{j*}. Alice creates a stream f on d items, and if i ∈ B_j, then Alice inserts the items needed to make f_i = b_i·10^j + 1, where b_i = α if y_i = 0, and b_i = 2α otherwise. She creates a second stream g = 0⃗ ∈ R^d, and sends the state of her streaming algorithm over to Bob. For every y_i ∈ B_j = B_{j(i)} that Bob knows, he subtracts off b_i·10^j, leaving f_i = 1. He then sets g_{i*} = 1, and obtains IP(f, g) via his estimator. We argue that with probability 2/3, IP(f, g) ≥ 3α10^{j*}/2 iff y_{i*} = 1. Note that the error is always at most

  ε‖f‖_1‖g‖_1 = ε‖f‖_1 ≤ ε(d + (8ε)^{−1}(2α10^{j*}) + ∑_{j<j*} ∑_{i∈B_j} 2α10^j)
              ≤ ε((8ε)^{−1} log(α)/4 + (4ε)^{−1}α10^{j*} + ε^{−1}α10^{j*}/32)
              < α10^{j*}/3

Now since ⟨f, g⟩ = (y_{i*} + 1)α10^{j*} + 1, if y_{i*} = 0 then, if the inner product algorithm succeeds with probability 2/3, we must have IP(f, g) ≤ α10^{j*} + 1 + α10^{j*}/3 < 3α10^{j*}/2, and similarly if y_{i*} = 1 we have IP(f, g) ≥ 2α10^{j*} + 1 − α10^{j*}/3 > 3α10^{j*}/2, as needed. So Bob can recover y_{i*} with probability 2/3 and solve Ind, giving an Ω(d) lower bound via Lemma 23. Since each item in f received at most 5α² updates and had final frequency 1, this stream has the strong 5α²-property, and g was insertion-only. Thus obtaining such an estimate of the inner product between strong α-property streams requires Ω(ε^{−1} log(√α)) = Ω(ε^{−1} log(α)) bits, as stated.
9 Conclusion
We have shown that for bounded deletion streams, many important L0 and L1 streaming problems can be solved more efficiently. For L1, the fact that the f_i's are approximately preserved under sampling poly(α) updates paves the way for our results, whereas for L0 we utilize the fact that O(log(α)) levels of sub-sampling are sufficient for several algorithms. Interestingly, it is unclear whether improved algorithms for α-property streams exist for L2 problems, such as L2 heavy hitters or L2 estimation. The difficulty stems from the fact that ‖f‖_2 is not preserved in any form under sampling, and thus L2 guarantees seem to require different techniques from those used in this paper.
However, we note that by utilizing the heavy hitters algorithm of [11], one can solve the general turnstile L2 heavy hitters problem for α-property streams in O(α² log(n) log(α)) space. A proof is sketched in Appendix A. Clearly a polynomial dependence on α is not desirable; however, for the applications in which α is a constant this still represents a significant improvement over the Ω(log²(n)) lower bound for turnstile streams. We leave it as an open question whether the optimal dependence on α can be made logarithmic.
Additionally, it is possible that problems in dynamic geometric data streams (see [34]) would benefit from bounding the number of deletions in the streams in question. We note that several of these geometric problems directly reduce to problems in the data stream model studied here, for which our results directly apply.
Finally, it would be interesting to see whether analogous models with lower bounds on the "signal size" of the input would result in improved algorithms for other classes of sketching algorithms. In particular, determining the appropriate analogue of the α-property for linear algebra problems, such as low-rank approximation and regression, could be a fruitful direction for further research.
References
[1] Swarup Acharya, Phillip B. Gibbons, Viswanath Poosala, and Sridhar Ramaswamy. The Aqua approximate query answering system. In ACM SIGMOD Record, volume 28, pages 574–576. ACM, 1999.
[2] Kook Jin Ahn, Sudipto Guha, and Andrew McGregor. Analyzing graph structure via linear measurements. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '12, pages 459–467, Philadelphia, PA, USA, 2012. Society for Industrial and Applied Mathematics. URL: http://dl.acm.org/citation.cfm?id=2095116.2095156.
[3] Miklos Ajtai, Randal Chilton Burns, Ronald Fagin, and Larry Joseph Stockmeyer. System and method for differential compression of data from a plurality of binary sources, April 16, 2002. US Patent 6,374,250.
[4] Aditya Akella, Ashwin Bharambe, Mike Reiter, and Srinivasan Seshan. Detecting DDoS attacks on ISP networks.
[5] Noga Alon, Phillip B. Gibbons, Yossi Matias, and Mario Szegedy. Tracking join and self-join sizes in limited storage. In Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 10–20. ACM, 1999.
[6] Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. In Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, pages 20–29. ACM, 1996.
[7] Alexandr Andoni, Robert Krauthgamer, and Krzysztof Onak. Streaming algorithms from precision sampling. arXiv preprint arXiv:1011.1263, 2010.
[8] Khanh Do Ba, Piotr Indyk, Eric Price, and David P. Woodruff. Lower bounds for sparse recovery. In Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms, pages 1190–1197. SIAM, 2010.
[9] Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, and Jennifer Widom. Models and issues in data stream systems. In Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 1–16. ACM, 2002.
[10] Arnab Bhattacharyya, Palash Dey, and David P. Woodruff. An optimal algorithm for l1-heavy hitters in insertion streams and related problems. In Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 385–400. ACM, 2016.
[11] Vladimir Braverman, Stephen R. Chestnut, Nikita Ivkin, Jelani Nelson, Zhengyu Wang, and David P. Woodruff. BPTree: an l2 heavy hitters algorithm using constant memory. arXiv preprint arXiv:1603.00759, 2016.
[12] Vladimir Braverman, Stephen R. Chestnut, Nikita Ivkin, and David P. Woodruff. Beating CountSketch for heavy hitters in insertion streams. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing, pages 740–753. ACM, 2016.
[13] J. Lawrence Carter and Mark N. Wegman. Universal classes of hash functions. Journal of Computer and System Sciences, 18(2):143–154, 1979.
[14] Moses Charikar, Kevin Chen, and Martin Farach-Colton. Finding frequent items in data streams. Automata, Languages and Programming, pages 784–784, 2002.
[15] Edith Cohen. Stream sampling for frequency cap statistics. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 159–168. ACM, 2015.
[16] Edith Cohen, Graham Cormode, and Nick Duffield. Don't let the negatives bring you down: sampling from streams of signed updates. ACM SIGMETRICS Performance Evaluation Review, 40(1):343–354, 2012.
[17] Edith Cohen, Graham Cormode, and Nick G. Duffield. Structure-aware sampling: Flexible and accurate summarization. PVLDB, 4(11):819–830, 2011. URL: http://www.vldb.org/pvldb/vol4/p819-cohen.pdf.
[18] Edith Cohen, Nick Duffield, Haim Kaplan, Carsten Lund, and Mikkel Thorup. Stream sampling for variance-optimal estimation of subset sums. In Proceedings of the twentieth annual ACM-SIAM Symposium on Discrete Algorithms, pages 1255–1264. Society for Industrial and Applied Mathematics, 2009.
[19] Edith Cohen, Nick G. Duffield, Haim Kaplan, Carsten Lund, and Mikkel Thorup. Algorithms and estimators for summarization of unaggregated data streams. J. Comput. Syst. Sci., 80(7):1214–1244, 2014. URL: https://doi.org/10.1016/j.jcss.2014.04.009, doi:10.1016/j.jcss.2014.04.009.
[20] Graham Cormode, Theodore Johnson, Flip Korn, Shan Muthukrishnan, Oliver Spatscheck, and Divesh Srivastava. Holistic UDAFs at streaming speeds. In Proceedings of the 2004 ACM SIGMOD international conference on Management of data, pages 35–46. ACM, 2004.
[21] Graham Cormode, Hossein Jowhari, Morteza Monemizadeh, and S. Muthukrishnan. The sparse awakens: Streaming algorithms for matching size estimation in sparse graphs. arXiv preprint arXiv:1608.03118, 2016.
[22] Graham Cormode and Shan Muthukrishnan. An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms, 55(1):58–75, 2005.
[23] Cristian Estan and George Varghese. New directions in traffic measurement and accounting: Focusing on the elephants, ignoring the mice. ACM Trans. Comput. Syst., 21(3):270–313, 2003. URL: http://doi.acm.org/10.1145/859716.859719, doi:10.1145/859716.859719.
[24] Cristian Estan, George Varghese, and Mike Fisk. Bitmap algorithms for counting active flows on high speed links. In Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement, pages 153–166. ACM, 2003.
[25] Schkolnick Finkelstein, Mario Schkolnick, and Paolo Tiberio. Physical database design for relational databases. ACM Transactions on Database Systems (TODS), 13(1):91–128, 1988.
[26] M. Garofalakis, J. Gehrke, and R. Rastogi. Data Stream Management: Processing High-Speed Data Streams. Data-Centric Systems and Applications. Springer Berlin Heidelberg, 2016. URL: https://books.google.com/books?id=qiSpDAAAQBAJ.
[27] Minos Garofalakis, Johannes Gehrke, and Rajeev Rastogi. Data stream management: A brave new world. pages 1–9, 01 2016.
[28] Rainer Gemulla, Wolfgang Lehner, and Peter J. Haas. A dip in the reservoir: Maintaining sample synopses of evolving datasets. In Proceedings of the 32nd International Conference on Very Large Data Bases, Seoul, Korea, September 12-15, 2006, pages 595–606, 2006. URL: http://dl.acm.org/citation.cfm?id=1164179.
[29] Rainer Gemulla, Wolfgang Lehner, and Peter J. Haas. Maintaining Bernoulli samples over evolving multisets. In Proceedings of the Twenty-Sixth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 11-13, 2007, Beijing, China, pages 93–102, 2007. URL: http://doi.acm.org/10.1145/1265530.1265544, doi:10.1145/1265530.1265544.
[30] Rainer Gemulla, Wolfgang Lehner, and Peter J. Haas. Maintaining bounded-size sample synopses of evolving datasets. VLDB J., 17(2):173–202, 2008. URL: https://doi.org/10.1007/s00778-007-0065-y, doi:10.1007/s00778-007-0065-y.
[31] Phillip B. Gibbons and Yossi Matias. New sampling-based summary statistics for improving approximate query answers. In SIGMOD 1998, Proceedings ACM SIGMOD International Conference on Management of Data, June 2-4, 1998, Seattle, Washington, USA, pages 331–342, 1998. URL: http://doi.acm.org/10.1145/276304.276334, doi:10.1145/276304.276334.
[32] Sudipto Guha, Andrew McGregor, and David Tench. Vertex and hyperedge connectivity in dynamic graph streams. In Proceedings of the 34th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 241–247. ACM, 2015.
[33] Peter J. Haas. Data-stream sampling: Basic techniques and results. In Data Stream Management - Processing High-Speed Data Streams, pages 13–44. 2016. URL: https://doi.org/10.1007/978-3-540-28608-0_2, doi:10.1007/978-3-540-28608-0_2.
[34] Piotr Indyk. Algorithms for dynamic geometric problems over data streams. In Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, pages 373–380. ACM, 2004.
[35] Piotr Indyk. Stable distributions, pseudorandom generators, embeddings, and data stream computation. Journal of the ACM (JACM), 53(3):307–323, 2006.
[36] Thathachar S. Jayram, Ravi Kumar, and D. Sivakumar. The one-way communication complexity of Hamming distance. Theory of Computing, 4(1):129–135, 2008.
[37] Thathachar S. Jayram and David P. Woodruff. Optimal bounds for Johnson-Lindenstrauss transforms and streaming problems with subconstant error. ACM Transactions on Algorithms (TALG), 9(3):26, 2013.
[38] Hossein Jowhari, Mert Saglam, and Gabor Tardos. Tight bounds for lp samplers, finding duplicates in streams, and related problems. In Proceedings of the Thirtieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS '11, pages 49–58, New York, NY, USA, 2011. ACM. URL: http://doi.acm.org/10.1145/1989284.1989289, doi:10.1145/1989284.1989289.
[39] Daniel M. Kane, Jelani Nelson, and David P. Woodruff. On the exact space complexity of sketching and streaming small norms. In Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms, pages 1161–1178. SIAM, 2010.
[40] Daniel M. Kane, Jelani Nelson, and David P. Woodruff. An optimal algorithm for the distinct elements problem. In Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 41–52. ACM, 2010.
[41] Michael Kapralov, Jelani Nelson, Jakub Pachocki, Zhengyu Wang, David P. Woodruff, and Mobin Yahyazadeh. Optimal lower bounds for universal relation, and for samplers and finding duplicates in streams. arXiv preprint arXiv:1704.00633, 2017.
[42] Donald Ervin Knuth. The Art of Computer Programming, Volume II: Seminumerical Algorithms, 3rd Edition. Addison-Wesley, 1998. URL: http://www.worldcat.org/oclc/312898417.
[43] Christian Konrad. Maximum matching in turnstile streams. In Algorithms - ESA 2015, pages 840–852. Springer, 2015.
[44] Anukool Lakhina, Mark Crovella, and Christophe Diot. Mining anomalies using traffic feature distributions. In ACM SIGCOMM Computer Communication Review, volume 35, pages 217–228. ACM, 2005.
[45] Gurmeet Singh Manku and Rajeev Motwani. Approximate frequency counts over data streams. PVLDB, 5(12):1699, 2012. URL: http://vldb.org/pvldb/vol5/p1699_gurmeetsinghmanku_vldb2012.pdf.
[46] Peter Bro Miltersen, Noam Nisan, Shmuel Safra, and Avi Wigderson. On data structures and asymmetric communication complexity. In Proceedings of the twenty-seventh annual ACM symposium on Theory of computing, pages 103–111. ACM, 1995.
[47] Morteza Monemizadeh and David P. Woodruff. 1-pass relative-error lp-sampling with applications. In Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms, pages 1143–1160. SIAM, 2010.
[48] David Moore, 2001. URL: http://www.caida.org/research/security/code-red/.
[49] Robert Morris. Counting large numbers of events in small registers. Communications of the ACM, 21(10):840–842, 1978.
[50] Shanmugavelayutham Muthukrishnan et al. Data streams: Algorithms and applications. Foundations and Trends® in Theoretical Computer Science, 1(2):117–236, 2005.
[51] Rob Pike, Sean Dorward, Robert Griesemer, and Sean Quinlan. Interpreting the data: Parallel analysis with Sawzall. Scientific Programming, 13(4):277–298, 2005.
[52] Florin Rusu and Alin Dobra. Pseudo-random number generation for sketch-based estimations. ACM Transactions on Database Systems (TODS), 32(2):11, 2007.
[53] Florin Rusu and Alin Dobra. Sketches for size of join estimation. ACM Transactions on Database Systems (TODS), 33(3):15, 2008.
[54] Amit Shukla and Prasad Deshpande. Storage estimation for multidimensional aggregates in the presence of hierarchies.
[55] Dan Teodosiu, Nikolaj Bjorner, Yuri Gurevich, Mark Manasse, and Joe Porkka. Optimizing file replication over limited-bandwidth networks using remote differential compression. 2006.
[56] Mikkel Thorup and Yin Zhang. Tabulation based 4-universal hashing with applications to second moment estimation.
[57] Jeffrey Scott Vitter. Random sampling with a reservoir. ACM Trans. Math. Softw., 11(1):37–57, 1985. URL: http://doi.acm.org/10.1145/3147.3165, doi:10.1145/3147.3165.
[58] Arno Wagner and Bernhard Plattner. Entropy based worm and anomaly detection in fast IP networks. In Enabling Technologies: Infrastructure for Collaborative Enterprise, 2005. 14th IEEE International Workshops on, pages 172–177. IEEE, 2005.
[59] David Paul Woodruff. Efficient and private distance approximation in the communication and streaming models. PhD thesis, Massachusetts Institute of Technology, 2007.
A Sketch of L2 heavy hitters algorithm
We first note that the BPTree algorithm of [11] solves the L2 heavy hitters problem in space O(ε^{−2} log(n) log(1/ε)) with probability 2/3. Recall that the L2 version of the problem asks us to return a set S ⊂ [n] containing all i such that |f_i| ≥ ε‖f‖_2 and no j such that |f_j| < (ε/2)‖f‖_2. Also recall that the α-property states that ‖I + D‖_2 ≤ α‖f‖_2, where I is the vector of the stream restricted to the positive updates and D is the entry-wise absolute value of the vector of the stream restricted to the negative updates. It follows that if i ∈ [n] is an ε-heavy hitter then ‖I + D‖_2 ≤ α‖f‖_2 ≤ (α/ε)|f_i| ≤ (α/ε)|I_i + D_i|. So in the insertion-only stream I + D, where every update is positive, the item i must be an (ε/α)-heavy hitter.
Using this observation, we can solve the problem as follows. First run BPTree with ε′ = ε/α to obtain a set S ⊂ [n] containing all i such that |I_i + D_i| ≥ (ε/α)‖I + D‖_2 and no j such that |I_j + D_j| < (ε/2α)‖I + D‖_2. It follows that |S| = O(α²/ε²). In parallel, we run an instance of CountSketch on the original stream f with O(1/ε²) rows and O(log(α/ε)) columns. For a fixed i, this gives an estimate of |f_i| with additive error (ε/4)‖f‖_2 with probability 1 − O(ε²/α²). At the end, we query the CountSketch for every i ∈ S, and return only those i with estimated weight at least (3ε/4)‖f‖_2. By union bounding over all |S| queries, with constant probability the CountSketch will correctly identify all i ∈ S that are ε-heavy hitters, and will throw out all items in S with weight less than (ε/2)‖f‖_2. Since all ε-heavy hitters are in S, this solves the problem. The space required to run BPTree is O((α²/ε²) log(n) log(α/ε)), which dominates the cost of the CountSketch.
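A toy, self-contained sketch of this two-stage approach follows. It is meant only to show the candidate-then-filter structure: exact counters stand in for BPTree, exact L2 norms replace their streaming estimates, and all constants (number of rows, width, thresholds) are illustrative rather than the paper's parameters:

```python
import random
import statistics

class CountSketch:
    """Minimal CountSketch: each row hashes items to signed buckets;
    a point query takes the median of the per-row signed counters."""
    def __init__(self, rows, width, seed=0):
        r = random.Random(seed)
        self.width = width
        self.table = [[0.0] * width for _ in range(rows)]
        self.salt = [(r.getrandbits(30), r.getrandbits(30))
                     for _ in range(rows)]

    def _loc(self, row, i):
        a, b = self.salt[row]
        return hash((a, i)) % self.width, (1 if hash((b, i)) & 1 else -1)

    def update(self, i, delta):
        for row in range(len(self.table)):
            col, sgn = self._loc(row, i)
            self.table[row][col] += sgn * delta

    def estimate(self, i):
        ests = []
        for row in range(len(self.table)):
            col, sgn = self._loc(row, i)
            ests.append(sgn * self.table[row][col])
        return statistics.median(ests)

def l2_heavy_hitters(updates, n, eps, alpha, seed=0):
    """Stage 1: an eps-heavy item of f is (eps/alpha)-heavy in the
    insertion-only stream I + D (exact counters stand in for BPTree).
    Stage 2: filter the candidates with CountSketch estimates of |f_i|.
    Exact L2 norms are used here purely to keep the illustration short."""
    mass = [0.0] * n                      # coordinates of I + D
    f = [0.0] * n                         # true frequencies (for the norms)
    cs = CountSketch(rows=21, width=8 * round(1 / eps**2), seed=seed)
    for i, delta in updates:
        mass[i] += abs(delta)
        f[i] += delta
        cs.update(i, delta)
    l2_mass = sum(m * m for m in mass) ** 0.5
    l2_f = sum(x * x for x in f) ** 0.5
    cands = [i for i in range(n) if mass[i] >= (eps / alpha) * l2_mass]
    return [i for i in cands if abs(cs.estimate(i)) >= 0.75 * eps * l2_f]

# item 0 is a planted eps-heavy hitter; items 1..50 cancel completely,
# so the stream satisfies the alpha-property with alpha = 4
updates = [(0, 150.0), (0, -50.0), (51, 1.0)]
updates += [(i, 2.0) for i in range(1, 51)] + [(i, -2.0) for i in range(1, 51)]
print(l2_heavy_hitters(updates, n=100, eps=0.25, alpha=4.0))  # -> [0]
```

Stage 1 prunes the universe to the small candidate set S; stage 2 then only needs per-query accuracy on |S| items, which is why the CountSketch can be so narrow relative to the full turnstile setting.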