Norm Estimation with Applications: A Survey
David Woodruff
IBM Almaden
Outline
1. The streaming model
2. Norm estimation
   a. Problems
   b. Results
   c. Upper bounds
   d. Lower bounds
3. Open questions
Data Stream Model [FM, AMS]
• Model
  • A large object x, modeled as a vector
    • Could be a graph, matrix, set of points, etc.
  • x = (x_1, x_2, …, x_n) starts off as 0^n
  • Stream of m updates (j_1, v_1), …, (j_m, v_m)
    • Update (j, v) causes the change x_j = x_j + v
    • v ∈ {−M, −M+1, …, M}
  • Order and number of updates are arbitrary
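As a concrete illustration of the model (not one of the survey's algorithms), the updates can be replayed against an explicit vector; the whole point of streaming is to avoid storing x like this:

```python
# Minimal illustration of the turnstile data stream model:
# x starts as the zero vector and receives additive updates (j, v).
def apply_stream(n, updates):
    """Apply a stream of updates (j, v) to x = 0^n and return x."""
    x = [0] * n
    for j, v in updates:
        x[j] += v  # update (j, v) causes x_j = x_j + v
    return x

# A stream may update coordinates in any order, any number of times.
stream = [(2, 5), (0, 3), (2, -1), (4, 7), (0, 2)]
x = apply_stream(5, stream)
print(x)  # -> [5, 0, 4, 0, 7]
```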
Application – IP session data
Source     Destination  Bytes  Duration  Protocol
18.6.7.1   19.7.3.2     40K    28        http
10.6.2.3   12.3.4.8     20K    18        ftp
11.1.0.6   11.6.8.2     58K    22        http
12.3.1.5   14.7.0.1     30K    32        http
…          …            …      …         …
AT&T collects 100+ GBs of NetFlow data every day
Application – IP Session Data
• AT & T needs to process massive stream of network data
• Traffic estimation: What fraction of network IP addresses are active?
  → Distinct elements computation
• Traffic analysis: What are the 100 IP addresses with the most traffic?
  → Frequent items computation
• Security/Denial of Service: Are there any IP addresses witnessing a spike in traffic?
  → Skewness computation
Algorithm Goals
• Space complexity: minimize the memory used by the streaming algorithm
  • n, m, and M are large
• Pass complexity: minimize the number of passes over the data
  • In many cases, only 1 pass is possible
• Computation: minimize the time spent per stream update
  • Ideally constant time
Outline
1. The streaming model
2. Norm estimation
   a. Problems
   b. Results
   c. Upper bounds
   d. Lower bounds
3. Open questions
Vector Norm Estimation
• Problem – lp-norms
  • Compute (Σ_{j=1..n} |x_j|^p)^(1/p) = |x|_p
  • p = 0 counts the number of non-zero entries of x
  • p = 1 is the Manhattan norm
  • p = 2 is the Euclidean norm
  • p = 3 measures the skewness
  • p = 4 measures the kurtosis
  • p = ∞ is the maximum norm
Applications by norm:
- p = 0: estimating the number of distinct elements; query planning and optimization
- p = 1: measuring distances between distributions; embedding other metrics into it (EMD, edit distance, etc.)
- p = 2: geometric problems (clustering, nearest neighbor, etc.); databases (self-join size)
- p = 3: testing distribution skewness (easier than via the l1 norm); detecting Denial of Service attacks; Ely Porat: "I know that Google is interested in compressed sensing with lp guarantees for p > 2"
- p = 4: use high accuracy for estimating |x|_4; the Long-Term Capital Management hedge fund was bailed out in the late 90s because it underestimated kurtosis
- p = ∞: finding the most frequent items
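To keep the definitions concrete, here is a direct (non-streaming) computation of the lp-norms above; the streaming challenge is to approximate these without ever storing x:

```python
def lp_norm(x, p):
    """Compute |x|_p for p = 0 (count of nonzeros), p in (0, inf), or p = inf."""
    if p == 0:
        return sum(1 for v in x if v != 0)   # number of non-zero entries
    if p == float('inf'):
        return max(abs(v) for v in x)        # maximum norm
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

x = [3, 0, -4, 0, 12]
print(lp_norm(x, 0))             # -> 3 (non-zero entries)
print(lp_norm(x, 1))             # -> 19.0 (Manhattan: 3+4+12)
print(lp_norm(x, 2))             # -> 13.0 (Euclidean: sqrt(9+16+144))
print(lp_norm(x, float('inf')))  # -> 12 (maximum)
```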
Other Applications of lp-Norms
• lp for p ∈ (0,1)
  – Entropy estimation [HNO]
    – Entropy = Σ_j q_j log(1/q_j), where q_j = |x_j|/|x|_1
    – Estimate |x|_p for p ∈ (0,1)
• lp for p ∈ [1, ∞)
  – Regression: min_x |Ax−b|_p
    – b_i = A_i x + Noise_i
  – p = 1 is used to ignore outliers!
  – p = ∞ is used to find outliers!
  – General p allows tuning
• Private norm estimation [FIMNSW, IW, MM, W]
Matrix Norms
• Operator norms of an n x d matrix A
  • Compute |A|_p = max_x |Ax|_p/|x|_p
  • p = 1 is the maximum l1-norm of a column
  • p = 2 is the spectral norm
  • p = ∞ is the maximum l1-norm of a row
• Entrywise norms
  • Compute |A|_p = (Σ_{i,j} |A_ij|^p)^(1/p)
  • p = 2 is the Frobenius norm, also denoted |A|_F
• Schatten norms
  – p = 1 is the nuclear norm

Applications:
- Numerical linear algebra: approximate matrix product; low-rank approximation
- Optimization: minimize rank(X) subject to A(X) = B
Mixed Norms
• Mixed norm of an n x d matrix A
  • Compute lp(lq(A)) = (Σ_{i=1..n} |A_i|_q^p)^(1/p)
• Sum-norm
  • lp(X(A)) = (Σ_{i=1..n} |A_i|_X^p)^(1/p)

Applications:
- lp(l0(A)) is useful for multigraphs [CM]
- lp(l2(A)) is used in k-median, k-means, and generalizations
- Earthmover distance [ABIW]
- l1-regression [SW]
Outline
1. The streaming model
2. Norm estimation
   a. Problems
   b. Results
   c. Upper bounds
   d. Lower bounds
3. Open questions
Initial Observations
Any exact computation
• of a vector norm requires Ω(n) space
• of a matrix norm requires Ω(nd) space
Any deterministic computation
• of a vector norm requires Ω(n) space
• of a matrix norm requires Ω(nd) space
How do we cope?
• Output an estimate Φ with |x|_p ≤ Φ ≤ (1+ε)|x|_p
• Allow randomness and a small probability δ of error
Vector Norm Estimation
- Use O*(f) to denote f · poly(log(n/δ)/ε)
- Assume n, m, M are polynomially related

Rough bounds (algorithms are 1-pass; lower bounds hold for O*(1)-pass algorithms):

                 Space            Update time
lp, p ∈ [0,2]    O*(1)            O*(1)        [I]
lp, p > 2        Θ*(n^(1-2/p))    O*(1)        [IW, SS, BJKS]
Vector Norm Estimation
Refined bounds for δ = 1/100:
• p = 0: O(ε^(-2) log(n) (log 1/ε + log log n)) space, O(1) time; Ω(ε^(-2) log n) space [KNW]
• p ∈ (0,2): O(ε^(-2) log n) space, O(log^2(1/ε) log log(1/ε)) time; Ω(ε^(-2) log n) space [KNPW]
• p = 2: O(ε^(-2) log n) space, O(1) time; Ω(ε^(-2) log n) space [AMS, KNW, TZ]
• p > 2: O(ε^(-2) n^(1-2/p) log^2 n / min(log n, ε^(4/p-2))) space, O(log n) time; Ω(n^(1-2/p) log n + ε^(-2) + n^(1-2/p) ε^(-2/p)) space [G, JW, BJKS]
For general δ, the space bounds get multiplied by log 1/δ [JW]
Mixed Norms [CM, JW, AKO, BIKW, MW]
[Figure: complexity of estimating lp(lq(A)) for an n x d matrix A, as a function of p and q. Depending on the region of the (p, q) plane, the complexity ranges from easy (O*(1)) through n^(1-1/p), d^(1-2/q), and n^(1-q/p), up to n^(1-2/p) d^(1-2/q).]
Matrix Norms
Operator norms
• |A|_1 in Θ*(d) space
• |A|_2 in O*(d^2) space, 1 pass
• |A|_∞ in Θ*(n) space
Entrywise norms
• Space same as for vectors, e.g., |A|_F in O*(1) space
Schatten norms
• |A|_p = (Σ_{i=1..n} σ_i^p)^(1/p); doable in Θ*(d) space if n = d, A is the Laplacian of a graph, and no negative values occur in the stream [KL]
Outline
1. The streaming model
2. Norm estimation
   a. Problems
   b. Results
   c. Upper bounds
   d. Lower bounds
3. Open questions
Vector Norm Estimation
• Can estimate the lp-norm for every p ≥ 0 with the same data structure (with different parameters)! [IW]
• Optimal in space and time up to O*(1) factors
• More generally: obtain entire histogram of the values
Histogramming
• Let S_i = {j : (1+ε)^i ≤ |x_j| < (1+ε)^(i+1)}
• The sizes |S_i| summarize the coordinate values of x
  – Small histogram: only O(log(n)/ε) distinct i
  – Many, many applications
• |x|_p^p ≈ Σ_i |S_i| · (1+ε)^(ip)
• Find a data structure for estimating the |S_i|
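The histogram identity can be sanity-checked offline: rounding each nonzero |x_j| down to the nearest power of (1+ε) under-counts |x|_p^p by at most a (1+ε)^p factor. A small Python check (illustrative only, not part of the streaming algorithm):

```python
import math

def histogram_estimate(x, p, eps):
    """Estimate |x|_p^p from the class sizes |S_i|,
    where S_i = {j : (1+eps)^i <= |x_j| < (1+eps)^(i+1)}."""
    sizes = {}
    for v in x:
        if v != 0:  # zero entries belong to no class
            i = math.floor(math.log(abs(v), 1 + eps))
            sizes[i] = sizes.get(i, 0) + 1
    return sum(s * (1 + eps) ** (i * p) for i, s in sizes.items())

x, p, eps = [1.0, 3.5, -2.0, 8.0, 0.0], 2, 0.1
exact = sum(abs(v) ** p for v in x)
est = histogram_estimate(x, p, eps)
# Rounding down under-estimates by at most a (1+eps)^p factor:
assert est <= exact + 1e-9
assert exact <= est * (1 + eps) ** p + 1e-9
```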
Three Ideas
1. Sign vector σ ∈ {−1,1}^n
   • For any fixed x, |⟨σ, x⟩| ≈ |x|_2
2. Bucketing
   • Given r buckets b_1, …, b_r, randomly hash the coordinates of x into the buckets
   • Let x(b_k) be the restriction of x to bucket k
   • E[|x(b_k)|_2^2] = |x|_2^2 / r
3. Subsampling
   • For j = 1, 2, …, log n
     – Randomly sample a set T_j of 2^j coordinates of x
     – Let x(T_j) be the restriction of x to the coordinates in T_j
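Idea 1 can be seen in isolation: for a random sign vector σ, E[⟨σ,x⟩^2] = |x|_2^2, so averaging ⟨σ,x⟩^2 over independent trials estimates |x|_2^2. A toy illustration (the trial count and seed are arbitrary choices, not parameters from the survey):

```python
import random

def ams_l2_squared_estimate(x, trials, seed=0):
    """Average <sigma, x>^2 over random sign vectors sigma in {-1,+1}^n.
    Each trial is an unbiased estimator of |x|_2^2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        dot = sum(v * rng.choice((-1, 1)) for v in x)
        total += dot * dot
    return total / trials

x = [3.0, -1.0, 2.0, 5.0, -2.0]
true = sum(v * v for v in x)          # |x|_2^2 = 43
est = ams_l2_squared_estimate(x, trials=5000)
assert abs(est - true) / true < 0.2   # crude concentration check
```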
The Data Structure
For j = 1, …, log n:
1. Choose a random set T_j of 2^j coordinates of x
2. Randomly hash the coordinates of x(T_j) into r buckets
3. For each bucket b_k, maintain ⟨σ_j, x(T_j)(b_k)⟩, where σ_j ∈ {−1, 1}^n
That’s all folks!
Space ≈ r
Time ≈ 1
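A minimal Python sketch of the update path of this data structure. The seeded per-coordinate hashing below is an illustrative stand-in for the limited-independence hash functions and exact subsampling probabilities of the actual algorithms:

```python
import random

class LevelSketch:
    """For each level j = 1..log n: subsample ~2^j coordinates, hash the
    subsampled coordinates into r buckets, and keep <sigma_j, x(T_j)(b_k)>
    for each bucket. Space ~ r * log n counters; each stream update touches
    at most one counter per level."""
    def __init__(self, n, r):
        self.n, self.r = n, r
        self.levels = max(1, n.bit_length())
        self.counters = [[0.0] * r for _ in range(self.levels)]

    def _hashes(self, j, coord):
        # Seeded hash of (level, coordinate): deterministically decides
        # membership in T_j, the bucket, and the sign sigma_j[coord].
        h = random.Random(j * 2654435761 + coord)
        in_Tj = h.random() < min(1.0, 2.0 ** (j + 1) / self.n)
        return in_Tj, h.randrange(self.r), h.choice((-1, 1))

    def update(self, coord, value):
        """Process stream update (coord, value): x_coord += value."""
        for j in range(self.levels):
            in_Tj, bucket, sign = self._hashes(j, coord)
            if in_Tj:
                self.counters[j][bucket] += sign * value

sk = LevelSketch(n=1024, r=16)
for coord, value in [(3, 5), (100, -2), (3, 1), (777, 4)]:
    sk.update(coord, value)
```

Because the hashes are deterministic per (level, coordinate), updating a coordinate by +v and then −v cancels exactly, as required for turnstile streams.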
Why it Works
• Suppose |S_i| (1+ε)^(ip) ≥ ε^2 |x|_p^p / log n
  (If not, then S_i contributes little to |x|_p^p and can be ignored)
• Consider j so that 2^j |S_i| / n ≈ 1
• |x(T_j)|_p^p ≈ 2^j |x|_p^p / n
• If k ∈ S_i ∩ T_j, then |x_k|^p ≥ ε^2 |x(T_j)|_p^p / log n,
  or |x_k| ≥ ε^(2/p) |x(T_j)|_p / log^(1/p) n
• For p ≤ 2: |x_k| ≥ ε^(2/p) |x(T_j)|_2 / log^(1/p) n
• For p > 2: |x_k| ≥ ε^(2/p) |x(T_j)|_2 / (n^(1/2-1/p) log^(1/p) n)
Wrapping Up
• For each S_i, look at the appropriate level j of subsampling to find S_i ∩ T_j
• E[|S_i ∩ T_j|] = |S_i| · 2^j / n
• Scale by n/2^j to estimate |S_i|
• Output Σ_i |S_i| · (1+ε)^(ip)
An Aside
• We obtain samples from each S_i for which |S_i| · (1+ε)^(ip) ≥ ε^2 |x|_p^p / log n
Sampling algorithm
1. Choose S_i with probability |S_i| · (1+ε)^(ip) / |x|_p^p
2. Output a sample from S_i
• Chooses a k ∈ [n] with probability ≈ |x_k|^p / |x|_p^p
  – almost – known as lp-sampling [MW]
  – useful in sublinear-time algorithms for minimum enclosing ball and classification [CHW]
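An offline caricature of lp-sampling, for intuition: pick index k with probability exactly |x_k|^p / |x|_p^p. The streaming samplers of [MW] achieve this only approximately and in small space; the code below just illustrates the target distribution:

```python
import random

rng = random.Random(0)

def lp_sample(x, p):
    """Return index k with probability |x_k|^p / |x|_p^p (offline version)."""
    weights = [abs(v) ** p for v in x]
    total = sum(weights)
    t = rng.random() * total
    acc = 0.0
    for k, w in enumerate(weights):
        acc += w
        if t < acc:
            return k
    return len(x) - 1  # guard against floating-point round-off

x, p = [3.0, 0.0, -4.0, 1.0], 2
counts = [0] * len(x)
for _ in range(10000):
    counts[lp_sample(x, p)] += 1
# empirical frequencies should be close to [9/26, 0, 16/26, 1/26]
```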
Mixed Norms [JW]
• l_p^p(l_q(A)) = Σ_j (Σ_k |A_jk|^q)^(p/q)
• S_i = {j : (1+ε)^i ≤ Σ_k |A_jk|^q < (1+ε)^(i+1)}
Algorithm
1. lq-sample from A, treated as a vector
2. Use the row indices of the samples to estimate |S_i|
Matrix Norms
• Spectral norm of an n x d matrix A
  – |A|_2 = max_{unit x} |Ax|_2
• Compute S·A, where S is an O*(d) x n matrix of random signs
  – |SAx|_2 ≈ |Ax|_2 for all x
• Output max_{unit x} |SAx|_2
• Can do faster [AC]
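A hedged sketch of this approach in plain Python: a random sign matrix S with k rows, scaled by 1/√k, approximately preserves |Ax|_2 for all x, so the spectral norm of S·A tracks that of A. The dimensions, the choice k = 60, and the power-iteration stand-in for the maximization are illustrative choices, not the construction of [AC]:

```python
import random

def spectral_norm(A, iters=500, seed=0):
    """Estimate |A|_2 (top singular value) via power iteration on A^T A.
    A is given as a list of rows."""
    n, d = len(A), len(A[0])
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(d)]
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(d)) for i in range(n)]   # A v
        u = [sum(A[i][j] * w[i] for i in range(n)) for j in range(d)]   # A^T (A v)
        norm_u = sum(t * t for t in u) ** 0.5
        v = [t / norm_u for t in u]
    w = [sum(A[i][j] * v[j] for j in range(d)) for i in range(n)]
    return sum(t * t for t in w) ** 0.5

def sign_sketch(A, k, seed=1):
    """Return (1/sqrt(k)) * S A for a k x n random sign matrix S."""
    n, d = len(A), len(A[0])
    rng = random.Random(seed)
    S = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(k)]
    return [[sum(S[i][t] * A[t][j] for t in range(n)) / k ** 0.5
             for j in range(d)] for i in range(k)]

rng = random.Random(2)
A = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]   # 200 x 3
exact = spectral_norm(A)
approx = spectral_norm(sign_sketch(A, k=60))  # sketch has only 60 rows
assert abs(approx - exact) / exact < 0.5      # crude check of the embedding
```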
Outline
1. The streaming model
2. Norm estimation
   a. Problems
   b. Results
   c. Upper bounds
   d. Lower bounds
3. Open questions
1-Round Communication Complexity
Alice holds x; Bob holds y and asks: what is f(x,y)?
• Alice sends a single message M(x) to Bob
• Bob outputs a function of M(x) and y
• Bob’s output should equal f(x,y) with constant probability (over the randomness of the protocol)
• The communication cost CC(f) is |M(x)|, maximized over inputs and random bits
Reduction to Streaming
Alice holds x and Bob holds y. Alice runs streaming algorithm A on a stream s(x), sends the state of A to Bob, and Bob continues running A on a stream s(y).
If you can solve f(x,y) from A(s(x) ∘ s(y)), then the space of A is at least CC(f)
Canonical Indexing Problem
Alice holds x ∈ {0,1}^n; Bob holds i ∈ {1, 2, …, n} and asks: what is x_i?
CC(Indexing) = Ω(n)
Ω(1/ε^2) Bound
Alice holds x ∈ {−ε, ε}^(1/ε^2); Bob holds y = e_i and asks: what is |x−y|_p?
|x−y|_p^p = (1/ε^2 − 1)ε^p + (1 − x_i)^p
Solves Indexing for p ≥ 2, so an Ω(1/ε^2) bound
For p < 2, see Amit’s talk
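The displayed identity is easy to verify numerically: every coordinate j ≠ i contributes ε^p to |x−y|_p^p, and coordinate i contributes (1 − x_i)^p, which reveals the bit x_i. A quick check with illustrative parameters:

```python
import random

eps, p = 0.25, 2               # so n = 1/eps^2 = 16
n = int(1 / eps ** 2)
rng = random.Random(0)
x = [rng.choice((-eps, eps)) for _ in range(n)]   # Alice's input
i = 7
y = [0.0] * n
y[i] = 1.0                                        # Bob's input: e_i

lhs = sum(abs(x[j] - y[j]) ** p for j in range(n))
rhs = (n - 1) * eps ** p + (1 - x[i]) ** p
assert abs(lhs - rhs) < 1e-12
# (1 - x[i])^p differs between x[i] = eps and x[i] = -eps,
# so a good estimate of |x - y|_p recovers the bit x_i.
```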
Ω(n^(1-2/p)) Bound for p ≥ 2 [SS, BJKS]
Alice holds x ∈ {1, 2, …, n}^n; Bob holds y ∈ {1, 2, …, n}^n and asks: what is |x−y|_p?
Promise: either all i satisfy x_i − y_i ∈ {0,1}, or there is a j for which x_j − y_j ≥ n^(1/p)
Communication is Ω(n^(1-2/p))
The proof bounds the information the message reveals about the input:
for every block of n^(2/p) coordinates, the message reveals 1 bit of information
Outline
1. The streaming model
2. Norm estimation
   a. Problems
   b. Results
   c. Upper bounds
   d. Lower bounds
3. Open questions
lp-Norms in Other Models
- sliding window, time-decayed, out-of-order
- read/write streams, annotations
- distributed functional monitoring
- compressed sensing
A Universal Data Structure
For j = 1, …, log n:
1. Choose a random set T_j of 2^j coordinates of x
2. Randomly hash the coordinates of x(T_j) into r buckets
3. For each bucket b_k, maintain ⟨σ_j, x(T_j)(b_k)⟩, where σ_j ∈ {−1, 1}^n
In what sense is this data structure optimal for all functions of the form Σ_i f(x_i)?
Good progress on this [BO], but still open
Other Norms
• Earthmover distance (EMD)
  – Given n green and n blue points in O(1) dimensions
    [Figure: example point sets with EMD(·, ·) = 6 + 3√2]
  – Output a (1+ε)-approximation to the min-cost perfect matching
  – O(n) space upper bound, Ω(log n) lower bound
  – Some progress [ABIW]
The Future
We’ve made progress. Improving ε and log n factors is important in practice.
Future themes?
- more complicated norms and problems from optimization
- emphasis on sketching for improving time
Bibliography
• [ABIW] Andoni, Do Ba, Indyk, W, FOCS, 2009.
• [AC] Ailon, Chazelle, STOC, 2006.
• [AMS] Alon, Matias, Szegedy, STOC, 1996.
• [AKO] Andoni, Krauthgamer, Onak, preprint.
• [BJKS] Bar-Yossef et al., FOCS, 2002.
• [BO] Braverman, Ostrovsky, STOC, 2010.
• [CHW] Clarkson, Hazan, W, FOCS, 2010.
• [CM] Cormode, Muthukrishnan, PODS, 2005.
• [FIMNSW] Feigenbaum et al., ICALP, 2001.
• [FM] Flajolet, Martin, FOCS, 1983.
• [G] Ganguly, preprint.
• [HNO] Harvey, Nelson, Onak, FOCS, 2008.
• [I] Indyk, FOCS, 2000.
• [IW] Indyk, W, STOC, 2005.
• [JW] Jayram, W, FOCS, 2009.
• [JW] Jayram, W, SODA, 2011.
• [KNW] Kane, Nelson, W, SODA, 2010.
• [KNPW] Kane, Nelson, Porat, W, STOC, 2011.
• [MM] Madeira, Muthukrishnan, FSTTCS, 2009.
• [MW] Monemizadeh, W, SODA, 2010.
• [SS] Saks, Sun, STOC, 2002.
• [SW] Sohler, W, STOC, 2011.
• [W] W, STOC, 2011.