Summer School on Hashing '14
Locality Sensitive Hashing
Alex Andoni (Microsoft Research)
Nearest Neighbor Search (NNS)
• Preprocess: a set of points P
• Query: given a query point q, report a point p ∈ P with the smallest distance to q
Approximate NNS
• r-near neighbor: given a new point q, report a point p ∈ P s.t. dist(p, q) ≤ cr
  (c-approximate) if there exists a point at distance ≤ r
• Randomized: such a point is returned with 90% probability
[Figure: query q, a near neighbor p within radius r, and the approximation radius cr]
Heuristic for Exact NNS
• r-near neighbor: given a new point q, report a set of points with:
  • all points p s.t. dist(p, q) ≤ r (each with 90% probability)
  • the set may also contain some approximate neighbors p s.t. dist(p, q) ≤ cr
• Can filter out bad answers
[Figure: query q, near neighbor p within radius r, approximation radius cr]
Locality-Sensitive Hashing [Indyk-Motwani'98]
• Random hash function g s.t. for any points p, q:
  • Close: when dist(p, q) ≤ r, the collision probability Pr[g(p) = g(q)] = P1 is "not-so-small"
  • Far: when dist(p, q) > cr, the collision probability Pr[g(p) = g(q)] = P2 is "small"
• Use several hash tables: n^ρ of them, where
      ρ = log(1/P1) / log(1/P2)
Locality sensitive hash functions
• Hash function g is usually a concatenation of "primitive" functions:
  g(p) = ⟨h1(p), h2(p), …, hk(p)⟩
• Example: Hamming space
  • h(p) = p_i, i.e., choose bit i for a random coordinate i
  • g chooses k bits at random
  • Pr[h(p) = h(q)] = 1 − dist(p, q)/d
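As an illustrative sketch (the function names and parameters are mine, not from the slides), the Hamming-space construction can be written as:

```python
import random

def make_hamming_g(d, k, seed=None):
    """Sample g = <h_1, ..., h_k>: each primitive h_i reads one
    randomly chosen bit, so g reads k random coordinates of p."""
    rng = random.Random(seed)
    coords = [rng.randrange(d) for _ in range(k)]
    def g(p):                      # p: sequence of d bits
        return tuple(p[i] for i in coords)
    return g

# One primitive collides with probability 1 - dist(p, q)/d, so the
# concatenation g collides with probability (1 - dist(p, q)/d)**k.
```

For p, q at Hamming distance 1 out of d = 8 bits, a single coordinate agrees with probability 7/8, so a k = 3 concatenation collides about (7/8)³ ≈ 67% of the time.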
Full algorithm
• Data structure is just L hash tables:
  • Each hash table uses a fresh random function g_i(p) = ⟨h_{i,1}(p), …, h_{i,k}(p)⟩
  • Hash all dataset points into the table
• Query:
  • Check for collisions in each of the L hash tables
  • until we encounter a point within distance cr
• Guarantees:
  • Space: O(nL), plus space to store the points
  • Query time: O(L · (k + d)) (in expectation)
  • 50% probability of success.
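A minimal end-to-end sketch of this data structure for Hamming space (function and variable names are mine, chosen for illustration):

```python
import random
from collections import defaultdict

def build(points, d, k, L, seed=0):
    """Build L hash tables; table i uses a fresh random g_i
    (here: k random bit positions, as in Hamming-space LSH)."""
    rng = random.Random(seed)
    tables = []
    for _ in range(L):
        coords = [rng.randrange(d) for _ in range(k)]
        buckets = defaultdict(list)
        for idx, p in enumerate(points):
            buckets[tuple(p[i] for i in coords)].append(idx)
        tables.append((coords, buckets))
    return tables

def query(q, points, tables, cr):
    """Scan colliding points table by table; stop at the first
    point within Hamming distance cr of q."""
    for coords, buckets in tables:
        for idx in buckets.get(tuple(q[i] for i in coords), []):
            if sum(a != b for a, b in zip(points[idx], q)) <= cr:
                return idx
    return None                    # no cr-near point found
```

Note the distance check inside the loop: collisions only propose candidates, and a candidate is reported only after verifying it is within cr.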
Analysis of LSH Scheme
• Choice of parameters k, L?
  • L hash tables, each with g(p) = ⟨h1(p), …, hk(p)⟩
  • set k s.t. Pr[collision of far pair] = P2^k = 1/n
  • then Pr[collision of close pair] = P1^k = (P2^k)^ρ = n^(−ρ)
  • Hence L = n^ρ "repetitions" (tables) suffice!
[Figure: collision probability as a function of distance; at k = 1 the probabilities at r and cr are P1 and P2, at k = 2 they drop to P1² and P2², so concatenation sharpens the gap]
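Concretely, the parameter choice can be sketched as follows (a toy calculation; the values of n, P1, P2 are picked arbitrarily for illustration):

```python
import math

def lsh_params(n, P1, P2):
    """Pick k so a far pair collides with prob. P2**k <= 1/n;
    rho = log(1/P1)/log(1/P2) then gives L = n**rho tables."""
    k = math.ceil(math.log(n) / math.log(1 / P2))
    rho = math.log(1 / P1) / math.log(1 / P2)
    L = math.ceil(n ** rho)
    return k, rho, L

k, rho, L = lsh_params(n=10**6, P1=0.9, P2=0.5)
# n = 10^6, P1 = 0.9, P2 = 0.5 gives k = 20 and only L = 9 tables
```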
Analysis: Correctness
• Let p* be an r-near neighbor of q
  • If p* does not exist, the algorithm can output anything
• Algorithm fails when: the near neighbor p* is not in the searched buckets
• Probability of failure:
  • Probability q and p* do not collide in one hash table: ≤ 1 − P1^k = 1 − n^(−ρ)
  • Probability they do not collide in any of the L = n^ρ hash tables: at most
    (1 − n^(−ρ))^(n^ρ) ≤ 1/e
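The last step uses the standard fact (1 − 1/L)^L ≤ 1/e; a quick numeric check (my own, not from the slides):

```python
import math

# Missing all L = n**rho tables happens with probability
# (1 - n**(-rho))**(n**rho), which is at most 1/e for every L >= 1.
for L in (1, 10, 100, 10_000):
    miss = (1.0 - 1.0 / L) ** L
    assert miss <= 1.0 / math.e
# Repeating the whole structure O(1) times then boosts the success
# probability to any desired constant.
```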
Analysis: Runtime
• Runtime dominated by:
  • Hash function evaluation: O(L · k) time
  • Distance computations to points in buckets
• Distance computations:
  • Care only about far points, at distance > cr
  • In one hash table, the probability a far point collides with q is at most P2^k = 1/n
  • Expected number of far points in a bucket: n · (1/n) = 1
  • Over L hash tables, the expected number of far points is L
• Total: O(L · (k + d)) in expectation
NNS for Euclidean space [Datar-Immorlica-Indyk-Mirrokni'04]
• Hash function g is a concatenation of "primitive" functions:
  g(p) = ⟨h1(p), …, hk(p)⟩
• LSH function h:
  • pick a random line ℓ, project point p onto ℓ, and quantize:
    h(p) = ⌊(⟨a, p⟩ + b) / w⌋
  • a is a random Gaussian vector
  • b random in [0, w)
  • w is a parameter (e.g., 4)
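A minimal sketch of one such primitive function (names are mine; this is an illustration of the project-shift-quantize idea, not the paper's code):

```python
import math
import random

def make_euclidean_h(d, w=4.0, seed=None):
    """One primitive h(p) = floor((<a, p> + b) / w): project p onto a
    random Gaussian direction a, shift by b ~ U[0, w), and quantize
    into cells of width w."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(d)]
    b = rng.uniform(0.0, w)
    def h(p):
        return math.floor((sum(ai * pi for ai, pi in zip(a, p)) + b) / w)
    return h
```

Because ⟨a, p − q⟩ is Gaussian with standard deviation ‖p − q‖, points much closer than w land in the same cell far more often than points much farther than w.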
Optimal* LSH [A-Indyk'06]
• Regular grid → grid of balls
  • p can hit empty space, so take more such grids until p is in a ball
• Need (too) many grids of balls
  • Start by projecting into dimension t
• Analysis gives ρ = 1/c² + o(1)
• Choice of reduced dimension t?
  • Tradeoff between
    • # hash tables, n^ρ, and
    • time to hash, t^O(t)
• Total query time: dn^(1/c² + o(1))
[Figure: a 2D grid of balls, and the point p projected into R^t]
Proof idea
• Claim: ρ = 1/c², i.e., P(r) ≈ exp(−A·r²)
  • P(r) = probability of collision when ‖p − q‖ = r
• Intuitive proof:
  • Projection approximately preserves distances [JL]
  • P(r) = intersection / union
  • P(r) ≈ probability a random point u falls beyond the dashed line
  • Fact (high dimensions): the x-coordinate of u has a nearly Gaussian distribution
    → P(r) ≈ exp(−A·r²)
[Figure: two balls centered at p and q at distance r; P(r) is the volume of their intersection divided by the volume of their union]
LSH Zoo
• Hamming distance:
  • h: pick a random coordinate (or several) [IM98]
• Manhattan distance:
  • h: random grid [AI'06]
• Jaccard distance between sets: J(A, B) = 1 − |A ∩ B| / |A ∪ B|
  • h: pick a random permutation π on the universe;
    min-wise hashing: h(A) = min_{a ∈ A} π(a) [Bro'97, BGMZ'97]
• Angle: sim-hash [Cha'02, …]
[Figure: min-wise hashing example on the shingle sets {be, not, or, to} of "To be or not to be" and {not, or, to, Simons} of "To Simons or not to Simons", under the permutation π = (be, to, Simons, or, not)]
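The min-wise hashing line can be made concrete with the slide's own example (a toy sketch; the helper name is mine):

```python
import random

def minhash(s, perm):
    """Min-wise hash of set s: its earliest element under the
    permutation perm (a list giving the universe in permuted order)."""
    return min(s, key=perm.index)

A = {"be", "not", "or", "to"}                 # shingles of "To be or not to be"
B = {"not", "or", "to", "Simons"}             # shingles of "To Simons or not to Simons"
perm = ["be", "to", "Simons", "or", "not"]    # the permutation from the figure

# Over a uniformly random permutation, Pr[minhash(A) == minhash(B)]
# equals the Jaccard similarity |A ∩ B| / |A ∪ B| = 3/5,
# i.e. Jaccard distance 2/5.
```

Under this particular permutation minhash(A) is "be" while minhash(B) is "to", so these two sets do not collide here; averaged over random permutations they collide about 60% of the time.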
LSH in the wild
• If we want exact NNS, what is c?
  • Can choose any parameters L, k
  • Correct as long as the near neighbor still collides in some table
• Performance:
  • trade-off between # of tables and # of false positives
  • will depend on dataset "quality"
  • Can tune L, k to optimize for a given dataset
[Figure: increasing k gives fewer false positives, decreasing L gives fewer tables; "safety not guaranteed"]
Time-Space Trade-offs

  Space   | Time                 | Comment                          | Reference
  high    | low: 1 mem lookup    |                                  | [KOR'98, IM'98, Pan'06]
          |                      | lower bound: ω(1) memory lookups | [AIP'06]
          |                      | lower bound: ω(1) memory lookups | [PTW'08, PTW'10]
  medium  | medium               | ρ = 1/c (Hamming)                | [IM'98]
          |                      | ρ = 1/c² (Euclidean)             | [DIIM'04, AI'06]
          |                      | lower bound: LSH exponent tight  | [MNP'06, OWZ'11]
  low     | high                 |                                  | [Ind'01, Pan'06]
LSH is tight…
leave the rest to cell-probe lower bounds?
Data-dependent Hashing! [A-Indyk-Nguyen-Razenshteyn'14]
• NNS in Hamming space (ℓ1) with query time n^ρ, space and preprocessing n^(1+ρ), for
  ρ = 7/(8c) + O(1/c^(3/2)) + o(1)
  • optimal LSH: ρ = 1/c [IM'98]
• NNS in Euclidean space (ℓ2) with:
  ρ = 7/(8c²) + O(1/c³) + o(1)
  • optimal LSH: ρ = 1/c² [AI'06]
A look at LSH lower bounds
• LSH lower bounds in Hamming space
  • Fourier analytic [Motwani-Naor-Panigrahy'06]
• [O'Donnell-Wu-Zhou'11]:
  • fix any distribution over hash functions
  • Far pair: random points at distance ≈ εd
  • Close pair: random pair at distance ≈ εd/c
  • Get ρ ≥ 1/c
Why not an NNS lower bound?
• Suppose we try to generalize [OWZ'11] to NNS
• Pick a random dataset
  • All the "far points" are concentrated in a small ball
  • Easy to see at preprocessing: the actual near neighbor is close to the center of the minimum enclosing ball
Intuition
• Data-dependent LSH:
  • hashing that depends on the entire given dataset!
• Two components:
  • a "nice" geometric configuration with small ρ
  • a reduction from a general dataset to this "nice" geometric configuration
Nice Configuration: "sparsity"
• All points are on a sphere of radius R
  • random pairs of points are at distance ≈ R√2
• Lemma 1: spherical LSH achieves ρ = 1/(2c² − 1) in this configuration
• "Proof":
  • obtained via "cap carving"
  • similar to "ball carving" [KMS'98, AI'06]
• Lemma 1′: the same for radius ≈ cr/√2
Reduction: into spherical LSH
• Idea: apply a few rounds of "regular" LSH first
  • ball carving [AI'06], with parameters set as if the target approximation were 5c, so the query-time cost of this stage is negligible
• Intuitively:
  • far points are unlikely to collide
  • this partitions the data into buckets of small diameter
  • find the minimum enclosing ball of each bucket
  • finally apply spherical LSH on this ball!
Two-level algorithm
• L hash tables, each with:
  • hash function g(p) = ⟨h1(p), …, hl(p), s1(p), …, sm(p)⟩
  • the h's are "ball carving LSH" (data independent)
  • the s's are "spherical LSH" (data dependent)
• The final ρ is an "average" of the exponents from levels 1 and 2
Details
• Inside a bucket, need to ensure the "sparse" case:
  • 1) drop all "far pairs"
  • 2) find the minimum enclosing ball (MEB)
  • 3) partition by "sparsity" (distance from the center)
1) Far points
• In a level-1 bucket:
  • set parameters as if looking for a larger approximation
  • the probability a pair survives the level-1 hashing decreases with its distance: pairs at distance r survive with much higher probability than pairs at distance cr or beyond
  • expected number of surviving collisions between far pairs is constant
  • throw out all pairs that violate this
2) Minimum Enclosing Ball
• Use Jung's theorem:
  • a pointset with diameter Δ has an MEB of radius at most Δ/√2 (in high dimensions)
3) Partition by "sparsity"
• Inside a bucket, points are at bounded distance from each other (far pairs were dropped)
• "Sparse" LSH does not work directly in this case
• Need to partition by the distance to the center
• Partition into spherical shells of width 1
Practice of NNS
• Data-dependent partitions…
• Practice:
  • trees: kd-trees, quad-trees, ball-trees, rp-trees, PCA-trees, sp-trees…
  • often no guarantees
• Theory?
  • assuming more about the data: PCA-like algorithms "work" [Abdullah-A-Kannan-Krauthgamer'14]
Finale
• LSH: geometric hashing
• Data-dependent hashing can do better!
• Better upper bound?
  • multi-level improves a bit, but not too much
  • what is the best possible ρ?
• Best partitions in theory and practice?
• LSH for other metrics:
  • edit distance, Earth-Mover Distance, etc.
• Lower bounds (cell probe?)
Open question:
• Practical variant of ball-carving hashing?
• Design a space partition of R^t that is
  • efficient: point location in poly(t) time
  • qualitative: regions are "sphere-like":
    Pr[needle of length 1 is not cut] ≥ Pr[needle of length c is not cut]^(1/c²)