
CS 361A 1

CS 361A (Advanced Data Structures and Algorithms)

Lecture 19 (Dec 5, 2005)

Nearest Neighbors: Dimensionality Reduction and

Locality-Sensitive Hashing

Rajeev Motwani

CS 361A 2

Metric Space

• Metric Space (M,D)
  – For points p,q in M, D(p,q) is the distance from p to q
  – Only reasonable model for high-dimensional geometric space

• Defining Properties
  – Reflexive: D(p,q) = 0 if and only if p = q
  – Symmetric: D(p,q) = D(q,p)
  – Triangle Inequality: D(p,q) ≤ D(p,r) + D(r,q)

• Interesting Cases
  – M: points in d-dimensional space
  – D: Hamming distance or Euclidean/Lp norms

CS 361A 3

High-Dimensional Near Neighbors

• Nearest Neighbors Data Structure
  – Given – N points P = {p1, …, pN} in metric space (M,D)
  – Queries – “Which point p ∈ P is closest to point q?”
  – Complexity – Tradeoff of preprocessing space against query time

• Applications
  – vector quantization
  – multimedia databases
  – data mining
  – machine learning
  – …

CS 361A 4

Known Results

  Query Time         Storage           Technique           Paper
  dN                 dN                Brute force
  2^d log N          N^(2^d + 1)       Voronoi diagram     Dobkin-Lipton 76
  d^(d/2) log N      N^(d/2)           Random sampling     Clarkson 88
  d^5 log N          N^d               Combination         Meiser 93
  log^(d-1) N        N log^(d-1) N     Parametric search   Agarwal-Matousek 92

• Some expressions are approximate
• Bottom line – exponential dependence on d

CS 361A 5

Approximate Nearest Neighbor

• Exact Algorithms
  – Benchmark – brute force needs space O(N), query time O(N)
  – Known Results – exponential dependence on dimension
  – Theory/Practice – no better than brute-force search

• Approximate Near Neighbors
  – Given – N points P = {p1, …, pN} in metric space (M,D)
  – Given – error parameter ε > 0
  – Goal – for query q with nearest neighbor p, return a point r such that
      D(q,r) ≤ (1+ε) D(q,p)

• Justification
  – Mapping objects to a metric space is heuristic anyway
  – Get tremendous performance improvement

CS 361A 6

Results for Approximate NN

  Query Time           Storage            Technique                            Paper
  d^d ε^(-d)           dN                 Balanced Trees                       Arya et al 94
  d^2 polylog(N,d)     N^(2d)             Random Projection                    Kleinberg 97
  dN polylog(N,d)      N polylog(N,d)     Random Projection                    Kleinberg 97
  log^3 N              N^(1/ε^2)          Search Trees + Dimension Reduction   Indyk-Motwani 98
  dN^(1/(1+ε))         N^(1+1/(1+ε))      Locality-Sensitive Hashing           Indyk-Motwani 98
  External Memory      External Memory    Locality-Sensitive Hashing           Gionis-Indyk-Motwani 99

• Will show main ideas of last 3 results
• Some expressions are approximate

CS 361A 7

Approximate r-Near Neighbors

• Given – N points P = {p1, …, pN} in metric space (M,D)
• Given – error parameter ε > 0, distance threshold r > 0
• Query
  – If no point p with D(q,p) < r, return FAILURE
  – Else, return any p’ with D(q,p’) < (1+ε)r

• Application – Solving Approximate Nearest Neighbor
  – Assume maximum distance is R
  – Run in parallel for r = 1, (1+ε), (1+ε)^2, (1+ε)^3, …, R (see the sketch below)
  – Time/space – O(log R) overhead
  – [Indyk-Motwani] – reduce to O(polylog N) overhead
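A minimal Python sketch of this reduction, assuming a hypothetical oracle approx_r_near(q, r) that returns a point within (1+ε)r of q whenever some point lies within r of q, and None otherwise. The slide builds the structures for all thresholds in parallel; here they are simply tried in increasing order, which still shows where the roughly log_(1+ε) R overhead comes from.

```python
def approx_nearest(q, approx_r_near, eps, R):
    """Reduce Approximate Nearest Neighbor to Approximate r-Near Neighbor by
    sweeping the geometric sequence of thresholds r = 1, (1+eps), (1+eps)^2, ..., R."""
    r = 1.0
    while r <= R * (1 + eps):
        p = approx_r_near(q, r)    # hypothetical oracle for a single threshold r
        if p is not None:
            return p               # some point within (1+eps)*r of q
        r *= (1 + eps)             # next threshold
    return None                    # no point within distance R of q
```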

CS 361A 8

Hamming Metric

• Hamming Space
  – Points in M: bit-vectors {0,1}^d (can generalize to {0,1,2,…,q}^d)
  – Hamming Distance: D(p,q) = number of positions where p,q differ

• Remarks
  – Simplest high-dimensional setting
  – Still useful in practice
  – In theory, as hard (or easy) as Euclidean space
  – Trivial in low dimensions

• Example
  – Hypercube in d = 3 dimensions
  – {000, 001, 010, 011, 100, 101, 110, 111}

CS 361A 9

Dimensionality Reduction

• Overall Idea
  – Map from high to low dimensions
  – Preserve distances approximately
  – Solve Nearest Neighbors in the new space
  – Performance improvement at the cost of approximation error

• Mapping?
  – Hash function family H = {H1, …, Hm}
  – Each Hi: {0,1}^d → {0,1}^t with t << d
  – Pick HR from H uniformly at random
  – Map each point in P using the same HR
  – Solve the NN problem on HR(P) = {HR(p1), …, HR(pN)}

CS 361A 10

Reduction for Hamming Spaces

Theorem: For any r and small ε > 0, there is a hash family H such that for any p,q and random HR ∈ H, with probability > 1-δ,
  – D(p,q) ≤ r        ⟹  D(HR(p), HR(q)) ≤ (c + ε/20)·t
  – D(p,q) ≥ (1+ε)r   ⟹  D(HR(p), HR(q)) ≥ (c + ε/10)·t
provided t ≥ C ε^(-2) log(2/δ) for some constant C.

[Figure: distances up to r map below the threshold (c + ε/20)t, while distances of at least (1+ε)r map above (c + ε/10)t]

CS 361A 11

Remarks

• For fixed threshold r, can distinguish between
  – Near: D(p,q) < r
  – Far: D(p,q) > (1+ε)r

• For N points, need δ = O(1/N^2) (union bound over all pairs of points)

• Yet, can reduce to an O(log N)-dimensional space while approximately preserving distances

• Works even if points are not known in advance

CS 361A 12

Hash Family

• Projection Function
  – Let S be an ordered multiset of s indexes from {1,…,d}
  – p|S: {0,1}^d → {0,1}^s projects p onto the s chosen coordinates
  – Example: d = 5, p = 01100; s = 3, S = {2,2,4} ⟹ p|S = 110

• Choosing hash function HR in H (see the sketch below)
  – Repeat for i = 1,…,t
    • Pick Si at random (with replacement) from {1,…,d}
    • Pick a random hash function fi: {0,1}^s → {0,1}
    • hi(p) = fi(p|Si)
  – HR(p) = (h1(p), h2(p), …, ht(p))

• Remark – note similarity to Bloom Filters
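A small Python sketch of this construction (the names make_projection_hash and HR are illustrative, not from the slides). The random functions f_i are simulated lazily with memoized coin flips, so the same returned closure must be used to map every point, exactly as the slides require a single shared HR.

```python
import random

def make_projection_hash(d, s, t, seed=0):
    """Sample HR = (h_1, ..., h_t): each h_i projects a d-bit vector onto a
    random multiset S_i of s coordinates (chosen with replacement) and then
    applies a random function f_i: {0,1}^s -> {0,1}, simulated lazily here."""
    rng = random.Random(seed)
    index_sets = [[rng.randrange(d) for _ in range(s)] for _ in range(t)]
    tables = [dict() for _ in range(t)]              # lazy random functions f_i

    def HR(p):                                       # p: sequence of d bits
        out = []
        for i in range(t):
            pattern = tuple(p[j] for j in index_sets[i])    # p|S_i
            if pattern not in tables[i]:
                tables[i][pattern] = rng.randrange(2)       # f_i(p|S_i)
            out.append(tables[i][pattern])
        return tuple(out)                            # HR(p) in {0,1}^t
    return HR

# Example: HR = make_projection_hash(d=5, s=3, t=8); HR((0, 1, 1, 0, 0))
```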

CS 361A 13

Illustration of Hashing

[Figure: a bit-vector p is projected onto index sets S1,…,St to get p|S1,…,p|St; each projection is fed through a random function f1,…,ft to produce the bits h1(p),…,ht(p), which together form HR(p)]

CS 361A 14

Analysis I

• Choose a random index set S

• Claim: For any p,q
    Pr[p|S = q|S] = (1 - D(p,q)/d)^s

• Why?
  – p,q differ in D(p,q) bit positions
  – Need all s indexes of S to avoid these positions
  – Sampling with replacement from {1,…,d}

CS 361A 15

Analysis II

• Choose s = d/r

• Since 1 - x ≤ e^(-x) for |x| < 1, we obtain
    Pr[p|S = q|S] = (1 - D(p,q)/d)^s ≤ e^(-D(p,q)/r)

• Thus
  – D(p,q) ≥ (1+ε)r  ⟹  Pr[p|S = q|S] ≤ e^(-1-ε/3)
  – D(p,q) ≤ r       ⟹  Pr[p|S = q|S] ≥ e^(-1)

CS 361A 16

Analysis III

• Recall hi(p) = fi(p|Si)

• Thus
    Pr[hi(p) ≠ hi(q)] = (1/2)·Pr[p|Si ≠ q|Si] = (1/2)·(1 - Pr[p|Si = q|Si])

• Choosing c = (1/2)(1 - e^(-1))
  – D(p,q) ≤ r       ⟹  Pr[hi(p) ≠ hi(q)] ≤ (1/2)(1 - e^(-1)) = c
  – D(p,q) ≥ (1+ε)r  ⟹  Pr[hi(p) ≠ hi(q)] ≥ (1/2)(1 - e^(-1-ε/3)) ≥ c + ε/6

CS 361A 17

Analysis IV

• Recall HR(p) = (h1(p), h2(p), …, ht(p))

• D(HR(p), HR(q)) = number of i’s where hi(p), hi(q) differ

• By linearity of expectation
    E[D(HR(p), HR(q))] = Σ_i Pr[hi(p) ≠ hi(q)] = t·Pr[hi(p) ≠ hi(q)]

• Thus
  – D(p,q) ≤ r       ⟹  E[D(HR(p), HR(q))] ≤ c·t
  – D(p,q) ≥ (1+ε)r  ⟹  E[D(HR(p), HR(q))] ≥ (c + ε/6)·t

• Theorem almost proved

• For a high-probability bound, need the Chernoff Bound

CS 361A 18

Chernoff Bound

• Consider Bernoulli random variables X1, X2, …, Xn
  – Values are 0-1
  – Pr[Xi = 1] = x and Pr[Xi = 0] = 1 - x

• Define X = X1 + X2 + … + Xn with E[X] = nx

• Theorem: For independent X1, …, Xn and any 0 < β < 1,
    Pr[ |X - nx| ≥ βnx ] ≤ 2e^(-β² nx / 3)

CS 361A 19

Analysis V

• Define
  – Xi = 0 if hi(p) = hi(q), and 1 otherwise
  – n = t
  – Then X = X1 + X2 + … + Xt = D(HR(p), HR(q))

• Case 1 [D(p,q) ≤ r ⟹ x ≤ c]
    Pr[X ≥ (c + ε/20)t] ≤ Pr[|X - tx| ≥ εtc/20] ≤ 2e^(-(ε/20)² ct/3)

• Case 2 [D(p,q) ≥ (1+ε)r ⟹ x ≥ c + ε/6]
    Pr[X ≤ (c + ε/10)t] ≤ Pr[|X - tx| ≥ εtc/20] ≤ 2e^(-(ε/20)² ct/3)

• Observe – sloppy bounding of constants in Case 2

CS 361A 20

Putting it all together

• Recall t = C ε^(-2) log(2/δ)

• Thus, error probability ≤ 2e^(-(ε/20)² ct/3) = 2e^(-(cC/1200) log(2/δ))

• Choosing C = 1200/c gives error probability ≤ 2e^(-log(2/δ)) = δ

• Theorem is proved!!

CS 361A 21

Algorithm I

• Set error probability δ = 1/poly(N) ⟹ t = O(ε^(-2) log N)

• Select hash HR and map each point p ↦ HR(p)

• Processing query q
  – Compute HR(q)
  – Find the nearest neighbor HR(p) of HR(q)
  – If D(q,p) ≤ (1+ε)r then return p, else FAILURE

• Remarks
  – Brute force for finding HR(p) implies query time O(ε^(-2) N log N)
  – Need another approach for lower dimensions

CS 361A 22

Algorithm II

• Fact – Exact nearest neighbors in {0,1}^t requires
  – Space O(2^t)
  – Query time O(t)

• How?
  – Precompute/store answers to all queries
  – Number of possible queries is 2^t

• Since t = O(ε^(-2) log N), we get 2^t = N^(O(1/ε²))

• Theorem – In Hamming space {0,1}^d, can solve approximate nearest neighbor with
  – Space N^(O(1/ε²))
  – Query time O(ε^(-2) log N)

CS 361A 23

Different Metric

• Many applications have “sparse” points
  – Many dimensions but few 1’s
  – Example – points ↔ documents, dimensions ↔ words
  – Better to view as “sets”

• Previous approach would require large s

• For sets A, B, define sim(A,B) = |A ∩ B| / |A ∪ B|

• Observe
  – A = B ⟹ sim(A,B) = 1
  – A, B disjoint ⟹ sim(A,B) = 0

• Question – Handling D(A,B) = 1 - sim(A,B)?

CS 361A 24

Min-Hash

• Random permutations π1, …, πt of the universe (dimensions)

• Define mapping hj(A) = min_{a ∈ A} πj(a)

• Fact: Pr[hj(A) = hj(B)] = sim(A,B)

• Proof? – already seen!!

• Overall hash function (see the sketch below)

HR(A) = (h1(A), h2(A), …, ht(A))
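A minimal Python sketch of min-hashing as defined above (function names are illustrative). Each permutation is stored explicitly, which is fine for small universes; in practice one would substitute hash-based pseudo-permutations.

```python
import random

def make_minhash(universe_size, t, seed=0):
    """Sample t random permutations pi_1, ..., pi_t of the universe and return HR
    with HR(A) = (h_1(A), ..., h_t(A)), where h_j(A) = min over a in A of pi_j(a)."""
    rng = random.Random(seed)
    perms = []
    for _ in range(t):
        perm = list(range(universe_size))
        rng.shuffle(perm)                  # explicit permutation of the universe
        perms.append(perm)

    def HR(A):
        return tuple(min(perm[a] for a in A) for perm in perms)
    return HR

# The fraction of agreeing coordinates estimates sim(A,B) = |A∩B|/|A∪B|:
# HR = make_minhash(1000, t=200)
# A, B = {1, 2, 3, 4}, {2, 3, 4, 5}
# est = sum(x == y for x, y in zip(HR(A), HR(B))) / 200    # close to 3/5
```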

CS 361A 25

Min-Hash Analysis

• Select t = C ε^(-2) log(1/δ)

• Hamming Distance
  – D(HR(A), HR(B)) = number of j’s such that hj(A) ≠ hj(B)

• Theorem: For any A, B
    Pr[ |D(HR(A), HR(B)) - (1 - sim(A,B))·t| ≥ εt ] ≤ δ

• Proof? – Exercise (apply Chernoff Bound)

• Obtain – ANN algorithm similar to earlier result

CS 361A 26

Generalization

• Goal
  – abstract the technique used for Hamming space
  – enable application to other metric spaces
  – handle Dynamic ANN

• Dynamic Approximate r-Near Neighbors
  – Fix – threshold r
  – Query – if any point is within distance r of q, return some point within distance (1+ε)r
  – Allow insertions/deletions of points in P

• Recall – earlier method required preprocessing all possible queries in the hash range space…

CS 361A 27

Locality-Sensitive Hashing

• Fix – metric space (M,D), threshold r, error ε

• Choose – probability parameters Q1 > Q2 > 0

• Definition – Hash family H = {h: M → S} for (M,D) is called (r, ε, Q1, Q2)-sensitive if, for random h and for any p,q in M
  – D(p,q) ≤ r       ⟹  Pr[h(p) = h(q)] ≥ Q1
  – D(p,q) ≥ (1+ε)r  ⟹  Pr[h(p) = h(q)] ≤ Q2

• Intuition
  – p,q near ⟹ likely to collide
  – p,q far ⟹ unlikely to collide

CS 361A 28

Examples

• Hamming Space M = {0,1}^d
  – point p = b1…bd
  – H = {hi(b1…bd) = bi, for i = 1…d}
  – sampling one bit at random
  – Pr[hi(q) = hi(p)] = 1 - D(p,q)/d

• Set Similarity D(A,B) = 1 - sim(A,B)
  – Recall sim(A,B) = |A ∩ B| / |A ∪ B|
  – H = {hπ : hπ(A) = min_{a ∈ A} π(a)}
  – Pr[h(A) = h(B)] = 1 - D(A,B)

CS 361A 29

Multi-Index Hashing

• Overall Idea
  – Fix LSH family H
  – Boost the Q1, Q2 gap by defining G = H^k
  – Using G, each point hashes into l buckets

• Intuition
  – r-near neighbors likely to collide
  – few non-near pairs in any bucket

• Define
  – G = { g | g(p) = h1(p) h2(p) … hk(p) }
  – Hamming metric ⟹ sample k random bits

CS 361A 30

Example (l = 4)

[Figure: points p, q, r hashed into four tables by g1, g2, g3, g4, where each gj concatenates h1, …, hk]

CS 361A 31

Overall Scheme

• Preprocessing
  – Prepare a hash table for the range of G
  – Select l hash functions g1, g2, …, gl

• Insert(p) – add p to buckets g1(p), g2(p), …, gl(p)

• Delete(p) – remove p from buckets g1(p), g2(p), …, gl(p)

• Query(q)
  – Check buckets g1(q), g2(q), …, gl(q)
  – Report the nearest of (say) the first 3l points

• Complexity (see the sketch below)
  – Assume – computing D(p,q) needs O(d) time
  – Assume – storing p needs O(d) space
  – Insert/Delete/Query time – O(dlk)
  – Preprocessing/Storage – O(dN + Nlk)
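A compact Python sketch of this scheme for the Hamming metric (class and method names are illustrative). Each g_j is implemented, as the slide suggests, by sampling k random bit positions, and the query stops after roughly 3l candidates.

```python
import random
from collections import defaultdict

class HammingLSH:
    """Multi-index LSH for bit-vectors: each g_j concatenates k randomly sampled
    bit positions; a point is stored in l buckets, and a query inspects its l
    buckets and reports the nearest of roughly the first 3*l candidates."""

    def __init__(self, d, k, l, seed=0):
        rng = random.Random(seed)
        self.gs = [[rng.randrange(d) for _ in range(k)] for _ in range(l)]
        self.tables = [defaultdict(list) for _ in range(l)]
        self.l = l

    def _key(self, j, p):
        return tuple(p[i] for i in self.gs[j])        # g_j(p): k sampled bits

    def insert(self, p):
        for j in range(self.l):
            self.tables[j][self._key(j, p)].append(p)

    def delete(self, p):
        for j in range(self.l):
            self.tables[j][self._key(j, p)].remove(p)

    def query(self, q):
        best, best_d, seen = None, None, 0
        for j in range(self.l):
            for p in self.tables[j].get(self._key(j, q), []):
                dist = sum(a != b for a, b in zip(p, q))     # Hamming distance
                if best is None or dist < best_d:
                    best, best_d = p, dist
                seen += 1
                if seen >= 3 * self.l:                       # cap the query cost
                    return best
        return best
```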

CS 361A 32

Collision Probability vs. Distance

• For a single g ∈ G = H^k: P_coll = Q^k, where Q is the collision probability of one h
• Over l independent hash tables: P_coll = 1 - (1 - Q^k)^l

[Figure: collision probability as a function of distance, staying near 1 (at least Q1) for distances up to r and dropping toward 0 (at most Q2) beyond (1+ε)r]

CS 361A 33

Multi-Index versus Error

• Set l = N^z where z = log(1/Q1) / log(1/Q2)

Theorem: For l = N^z, any query returns an r-near neighbor correctly with probability at least 1/6.

• Consequently (ignoring k = O(log N) factors)
  – Time O(dN^z)
  – Space O(N^(1+z))
  – Hamming Metric ⟹ z ≤ 1/(1+ε)
  – Boost Probability – use several parallel hash tables

CS 361A 34

Analysis

• Define (for fixed query q)
  – p* – any point with D(q,p*) < r
  – FAR(q) – all p with D(q,p) > (1+ε)r
  – BUCKET(q,j) – all p with gj(p) = gj(q)
  – Event Esize: Σ_{j=1..l} |FAR(q) ∩ BUCKET(q,j)| ≤ 3l
    (query cost bounded by O(dl))
  – Event ENN: gj(p*) = gj(q) for some j
    (nearest point in the l buckets is an r-near neighbor)

• Analysis
  – Show: Pr[Esize] = x ≥ 2/3 and Pr[ENN] = y ≥ 1/2
  – Thus: Pr[not(Esize and ENN)] ≤ (1-x) + (1-y) < 5/6

CS 361A 35

Analysis – Bad Collisions

• Choose k = log_{1/Q2} N, so that Q2^k = 1/N

• Fact: p ∈ FAR(q) ⟹ Pr[p ∈ BUCKET(q,j)] ≤ Q2^k = 1/N

• Clearly
    E[ |FAR(q) ∩ BUCKET(q,j)| ] ≤ N · (1/N) = 1
    E[ Σ_{j=1..l} |FAR(q) ∩ BUCKET(q,j)| ] ≤ l

• Markov Inequality – Pr[X > r·E[X]] < 1/r, for X > 0

• Lemma 1
    Pr[Esize] = Pr[ Σ_{j=1..l} |FAR(q) ∩ BUCKET(q,j)| ≤ 3l ] ≥ 1 - 1/3 = 2/3

CS 361A 36

Analysis – Good Collisions

• Observe
    Pr[gj(p*) = gj(q)] ≥ Q1^k = Q1^(log_{1/Q2} N) = N^(-log(1/Q1)/log(1/Q2)) = N^(-z)

• Since l = N^z
    Pr[ENN] = 1 - Pr[gj(p*) ≠ gj(q) for all j] ≥ 1 - (1 - N^(-z))^(N^z) ≥ 1 - 1/e > 1/2

• Lemma 2: Pr[ENN] > 1/2

CS 361A 37

Euclidean Norms

• Recall
  – x = (x1, x2, …, xd) and y = (y1, y2, …, yd) in R^d
  – L1-norm: ||x - y||_1 = Σ_{i=1..d} |xi - yi|
  – Lp-norm (for p > 1): ||x - y||_p = ( Σ_{i=1..d} |xi - yi|^p )^(1/p)

CS 361A 38

Extension to L1-Norm

• Round coordinates to {1,…,M}

• Embed L1-{1,…,M}^d into Hamming-{0,1}^(dM)

• Unary Mapping
    (x1, …, xd) ↦ 1^(x1) 0^(M-x1) … 1^(xd) 0^(M-xd)
    (y1, …, yd) ↦ 1^(y1) 0^(M-y1) … 1^(yd) 0^(M-yd)
  so the Hamming distance between the images equals the L1 distance between the points (see the sketch below)

• Apply algorithm for Hamming spaces
  – Error due to rounding is 1/M ⟹ choose M = Ω(1/ε)
  – Space/time overhead due to mapping of d ⟹ dM
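A tiny Python illustration of the unary mapping (the function name is illustrative); it assumes coordinates have already been rounded to integers in {1,…,M}.

```python
def unary_embed(x, M):
    """Map a point with integer coordinates in {1,...,M}^d to {0,1}^(d*M):
    coordinate x_i becomes x_i ones followed by M - x_i zeros, so Hamming
    distance between images equals L1 distance between points."""
    bits = []
    for xi in x:
        bits.extend([1] * xi + [0] * (M - xi))
    return bits

# Example with M = 4: x = (2, 1), y = (3, 3)
# unary_embed(x, 4) -> [1,1,0,0, 1,0,0,0]
# unary_embed(y, 4) -> [1,1,1,0, 1,1,1,0]
# Hamming distance = 3 = |2-3| + |1-3| = L1 distance
```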

CS 361A 39

Extension to L2-Norm

• Observe
  – Little difference between the L1-norm and L2-norm for high d
  – Additional error is small

• More generally – Lp, for 1 ≤ p ≤ 2
  – [Figiel et al 1977, Johnson-Schechtman 1982]
  – Can embed Lp into L1
  – Dimensions d ⟹ O(d)
  – Distances preserved within factor (1+ε)
  – Key Idea – random rotation of the space

CS 361A 40

Improved Bounds

• [Indyk-Motwani 1998]
  – For any Lp-norm
  – Query Time – O(log^3 N)
  – Space – N^(O(1/ε²))

• Problem – impractical

• Today – only a high-level sketch

CS 361A 41

Better Reduction

• Recall
  – Reduced Approximate Nearest Neighbors to Approximate r-Near Neighbors
  – Space/Time Overhead – O(log R)
  – R = maximum distance in the metric space

• Ring-Cover Trees
  – Remove the dependence on R
  – Reduce the overhead to O(polylog N)

CS 361A 42

Approximate r-Near Neighbors

• Idea
  – Impose a regular grid on R^d
  – Decompose into cubes of side length s
  – Label cubes with points at distance < r

• Data Structure
  – Query q – determine the cube containing q
  – Cube labels – candidate r-near neighbors

• Goals
  – Small s ⟹ lower error
  – Fewer cubes ⟹ smaller storage

CS 361A 43

[Figure: a grid over the plane with points p1, p2, p3; each cell is labeled with the points within distance r of it]

CS 361A 44

Grid Analysis

• Assume r = 1

• Choose s = ε/√d

• Cube diameter = s√d = ε

• Number of cubes to label per point = O(ε^(-d)) (a volume argument: Vol(ball of radius 1+ε) / s^d)

Theorem – For any Lp-norm, can solve Approximate r-Near Neighbors using (see the sketch below)
  – Space – O(dN ε^(-d))
  – Time – O(d)
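A toy Python version of this grid structure (names are illustrative; L2 distance is used for concreteness). The labeling loop enumerates all cells within roughly r+ε of each point, so storage grows like (1/ε)^d per point, in line with the theorem, while a query is a single cell lookup.

```python
import itertools
import math
from collections import defaultdict

def build_grid(points, r=1.0, eps=0.5):
    """Grid structure for approximate r-near neighbors: cells of side eps/sqrt(d)
    (diameter eps); every cell within roughly r + eps of a point is labeled with
    that point, and a query looks up only the cell containing q."""
    d = len(points[0])
    s = eps / math.sqrt(d)                    # cell side, so cell diameter = eps
    grid = defaultdict(list)
    reach = math.ceil((r + eps) / s)          # how many cells away to label
    for p in points:
        base = [math.floor(c / s) for c in p]
        for off in itertools.product(range(-reach, reach + 1), repeat=d):
            cell = tuple(b + o for b, o in zip(base, off))
            center = [(ci + 0.5) * s for ci in cell]
            if math.dist(center, p) <= r + eps:      # conservative labeling
                grid[cell].append(p)

    def query(q):
        cell = tuple(math.floor(c / s) for c in q)
        candidates = grid.get(cell, [])       # labels = candidate r-near points
        return min(candidates, key=lambda p: math.dist(p, q), default=None)
    return query
```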

CS 361A 45

Dimensionality Reduction

[Johnson-Lindenstrauss 84, Frankl-Maehara 88] For p ∈ [1,2], can map the points in P into a subspace of dimension O(ε^(-2) log N) while preserving all inter-point distances to within a factor (1 ± ε)

• Proof idea – project onto random lines (see the sketch below)

• Result for NN
  – Space – dN^(O(1/ε²))
  – Time – O(polylog N)
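A short numpy sketch of such a random projection (the function name and the constant 4 in the target dimension are illustrative choices, not the constants from the cited papers).

```python
import numpy as np

def jl_project(P, eps, seed=0):
    """Project N points in R^d onto t = O(eps^-2 log N) random directions
    (a scaled Gaussian matrix); with high probability all pairwise distances
    are preserved to within a (1 +/- eps) factor."""
    P = np.asarray(P, dtype=float)                        # shape (N, d)
    N, d = P.shape
    t = max(1, int(np.ceil(4.0 * np.log(N) / eps ** 2)))  # target dimension
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((d, t)) / np.sqrt(t)          # random projection matrix
    return P @ A                                          # shape (N, t)
```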

CS 361A 46

References

• Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. P. Indyk and R. Motwani. STOC 1998.
• Similarity Search in High Dimensions via Hashing. A. Gionis, P. Indyk, and R. Motwani. VLDB 1999.