Neighbor-Sensitive Hashingpyongjoo/resources/YongjooPark-knn.pdf · Neighbor-Sensitive Hashing...

27
Neighbor-Sensitive Hashing Yongjoo Park Michael Cafarella Barzan Mozafari University of Michigan, Ann Arbor Nov 05, 2015 1/9

Transcript of Neighbor-Sensitive Hashingpyongjoo/resources/YongjooPark-knn.pdf · Neighbor-Sensitive Hashing...

Page 1: Neighbor-Sensitive Hashingpyongjoo/resources/YongjooPark-knn.pdf · Neighbor-Sensitive Hashing Yongjoo Park Michael Cafarella Barzan Mozafari University of Michigan, Ann Arbor Nov

Neighbor-Sensitive Hashing

Yongjoo Park Michael Cafarella Barzan Mozafari

University of Michigan, Ann Arbor

Nov 05, 2015

1 / 9

Page 2: Neighbor-Sensitive Hashingpyongjoo/resources/YongjooPark-knn.pdf · Neighbor-Sensitive Hashing Yongjoo Park Michael Cafarella Barzan Mozafari University of Michigan, Ann Arbor Nov

Introduction Our Approach Results

k-Nearest Neighbors Search (kNN)

Database

q = {120, 65, . . . , 212} v1 = {. . .} v2 = {. . .} v3 = {. . .}

v4 = {. . .} v5 = {. . .} v6 = {. . .}

q v1 v2 v3 v4 v5 v6

0distance from q

Exact kNN for high-dimensional vectors is Slow.

I (Kashyap KDD’11) took more than 1 min for 10M objects.

2 / 9

Page 3: Neighbor-Sensitive Hashingpyongjoo/resources/YongjooPark-knn.pdf · Neighbor-Sensitive Hashing Yongjoo Park Michael Cafarella Barzan Mozafari University of Michigan, Ann Arbor Nov

Introduction Our Approach Results

k-Nearest Neighbors Search (kNN)

Database

q = {120, 65, . . . , 212}

v1 = {. . .} v2 = {. . .} v3 = {. . .}

v4 = {. . .} v5 = {. . .} v6 = {. . .}

q v1 v2 v3 v4 v5 v6

0distance from q

Exact kNN for high-dimensional vectors is Slow.

I (Kashyap KDD’11) took more than 1 min for 10M objects.

2 / 9

Page 4: Neighbor-Sensitive Hashingpyongjoo/resources/YongjooPark-knn.pdf · Neighbor-Sensitive Hashing Yongjoo Park Michael Cafarella Barzan Mozafari University of Michigan, Ann Arbor Nov

Introduction Our Approach Results

k-Nearest Neighbors Search (kNN)

Database

q = {120, 65, . . . , 212} v1 = {. . .} v2 = {. . .} v3 = {. . .}

v4 = {. . .} v5 = {. . .} v6 = {. . .}

q v1 v2 v3 v4 v5 v6

0distance from q

Exact kNN for high-dimensional vectors is Slow.

I (Kashyap KDD’11) took more than 1 min for 10M objects.

2 / 9

Page 5: Neighbor-Sensitive Hashingpyongjoo/resources/YongjooPark-knn.pdf · Neighbor-Sensitive Hashing Yongjoo Park Michael Cafarella Barzan Mozafari University of Michigan, Ann Arbor Nov

Introduction Our Approach Results

k-Nearest Neighbors Search (kNN)

Database

q = {120, 65, . . . , 212} v1 = {. . .} v2 = {. . .} v3 = {. . .}

v4 = {. . .} v5 = {. . .} v6 = {. . .}

q v1 v2 v3 v4 v5 v6

0distance from q

Exact kNN for high-dimensional vectors is Slow.

I (Kashyap KDD’11) took more than 1 min for 10M objects.

2 / 9

Page 6: Neighbor-Sensitive Hashingpyongjoo/resources/YongjooPark-knn.pdf · Neighbor-Sensitive Hashing Yongjoo Park Michael Cafarella Barzan Mozafari University of Michigan, Ann Arbor Nov

Introduction Our Approach Results

k-Nearest Neighbors Search (kNN)

Database

q = {120, 65, . . . , 212} v1 = {. . .} v2 = {. . .} v3 = {. . .}

v4 = {. . .} v5 = {. . .} v6 = {. . .}

q v1 v2 v3 v4 v5 v6

0distance from q

Exact kNN for high-dimensional vectors is Slow.

I (Kashyap KDD’11) took more than 1 min for 10M objects.

2 / 9

Page 7: Neighbor-Sensitive Hashingpyongjoo/resources/YongjooPark-knn.pdf · Neighbor-Sensitive Hashing Yongjoo Park Michael Cafarella Barzan Mozafari University of Michigan, Ann Arbor Nov

Introduction Our Approach Results

Hashing-based kNN

Goal:Make Identical

Regular Search

{201, 101, 43, 188, . . .}

{answer set}

Hash Function

Hamming Search

101001,010010,100011

Hash Function

{answer set}Hash Function

Hamming Search

101001,010010,100011

3 / 9

Page 8: Neighbor-Sensitive Hashingpyongjoo/resources/YongjooPark-knn.pdf · Neighbor-Sensitive Hashing Yongjoo Park Michael Cafarella Barzan Mozafari University of Michigan, Ann Arbor Nov

Introduction Our Approach Results

Hashing-based kNN

Goal:Make Identical

Regular Search

{201, 101, 43, 188, . . .}

{answer set}

Hash Function

Hamming Search

101001,010010,100011

Hash Function

{answer set}Hash Function

Hamming Search

101001,010010,100011

3 / 9

Page 9: Neighbor-Sensitive Hashingpyongjoo/resources/YongjooPark-knn.pdf · Neighbor-Sensitive Hashing Yongjoo Park Michael Cafarella Barzan Mozafari University of Michigan, Ann Arbor Nov

Introduction Our Approach Results

Hashing-based kNN

Goal:Make Identical

Regular Search

{201, 101, 43, 188, . . .}

{answer set}

Hash Function

Hamming Search

101001,010010,100011

Hash Function

{answer set}Hash Function

Hamming Search

101001,010010,100011

3 / 9

Page 10: Neighbor-Sensitive Hashingpyongjoo/resources/YongjooPark-knn.pdf · Neighbor-Sensitive Hashing Yongjoo Park Michael Cafarella Barzan Mozafari University of Michigan, Ann Arbor Nov

Introduction Our Approach Results

Hashing-based kNN

Goal:Make Identical

Regular Search

{201, 101, 43, 188, . . .}

{answer set}

Hash Function

Hamming Search

101001,010010,100011

Hash Function

{answer set}Hash Function

Hamming Search

101001,010010,100011

3 / 9

Page 11: Neighbor-Sensitive Hashingpyongjoo/resources/YongjooPark-knn.pdf · Neighbor-Sensitive Hashing Yongjoo Park Michael Cafarella Barzan Mozafari University of Michigan, Ann Arbor Nov

Introduction Our Approach Results

Previous Approaches

2004, Locality Sensitive Hashing (LSH), Datar et al.

2009, Spectral Hashing (SH), Weiss et al.

2011, Anchor Graph Hashing (AGH), Liu et al.

2012, Spherical Hashing (SpH), Heo at al.Kernelized Supervised Hashing (KSH), Liu et al.

2013, Compressed Hashing (CH), Lin et al.Complementary Projection Hashing (CPH), Jin et al.

2014, Data Sensitive Hashing (DSH), Jagadish et al.

q v1 v2 v3 v4 v5 v6 v7 v8

0distance from q

code:0000 1000 1100 1110 1111

h1 h2 h3 h4

LSH

Good idea for:

I Sorting all data items

Remember:

I We are interested in only k items.

4 / 9

Page 12: Neighbor-Sensitive Hashingpyongjoo/resources/YongjooPark-knn.pdf · Neighbor-Sensitive Hashing Yongjoo Park Michael Cafarella Barzan Mozafari University of Michigan, Ann Arbor Nov

Introduction Our Approach Results

Previous Approaches

2004, Locality Sensitive Hashing (LSH), Datar et al.

2009, Spectral Hashing (SH), Weiss et al.

2011, Anchor Graph Hashing (AGH), Liu et al.

2012, Spherical Hashing (SpH), Heo at al.Kernelized Supervised Hashing (KSH), Liu et al.

2013, Compressed Hashing (CH), Lin et al.Complementary Projection Hashing (CPH), Jin et al.

2014, Data Sensitive Hashing (DSH), Jagadish et al.

q v1 v2 v3 v4 v5 v6 v7 v8

0distance from q

code:0000 1000 1100 1110 1111

h1 h2 h3 h4

LSH

Good idea for:

I Sorting all data items

Remember:

I We are interested in only k items.

4 / 9

Page 13: Neighbor-Sensitive Hashingpyongjoo/resources/YongjooPark-knn.pdf · Neighbor-Sensitive Hashing Yongjoo Park Michael Cafarella Barzan Mozafari University of Michigan, Ann Arbor Nov

Introduction Our Approach Results

Previous Approaches

2004, Locality Sensitive Hashing (LSH), Datar et al.

2009, Spectral Hashing (SH), Weiss et al.

2011, Anchor Graph Hashing (AGH), Liu et al.

2012, Spherical Hashing (SpH), Heo at al.Kernelized Supervised Hashing (KSH), Liu et al.

2013, Compressed Hashing (CH), Lin et al.Complementary Projection Hashing (CPH), Jin et al.

2014, Data Sensitive Hashing (DSH), Jagadish et al.

q v1 v2 v3 v4 v5 v6 v7 v8

0distance from q

code:0000 1000 1100 1110 1111

h1 h2 h3 h4

LSH

Good idea for:

I Sorting all data items

Remember:

I We are interested in only k items.

4 / 9

Page 14: Neighbor-Sensitive Hashingpyongjoo/resources/YongjooPark-knn.pdf · Neighbor-Sensitive Hashing Yongjoo Park Michael Cafarella Barzan Mozafari University of Michigan, Ann Arbor Nov

Introduction Our Approach Results

Previous Approaches

2004, Locality Sensitive Hashing (LSH), Datar et al.

2009, Spectral Hashing (SH), Weiss et al.

2011, Anchor Graph Hashing (AGH), Liu et al.

2012, Spherical Hashing (SpH), Heo at al.Kernelized Supervised Hashing (KSH), Liu et al.

2013, Compressed Hashing (CH), Lin et al.Complementary Projection Hashing (CPH), Jin et al.

2014, Data Sensitive Hashing (DSH), Jagadish et al.

q v1 v2 v3 v4 v5 v6 v7 v8

0distance from q

code:0000 1000 1100 1110 1111

h1 h2 h3 h4

LSH

Good idea for:

I Sorting all data items

Remember:

I We are interested in only k items.

4 / 9

Page 15: Neighbor-Sensitive Hashingpyongjoo/resources/YongjooPark-knn.pdf · Neighbor-Sensitive Hashing Yongjoo Park Michael Cafarella Barzan Mozafari University of Michigan, Ann Arbor Nov

Introduction Our Approach Results

Previous Approaches

2004, Locality Sensitive Hashing (LSH), Datar et al.

2009, Spectral Hashing (SH), Weiss et al.

2011, Anchor Graph Hashing (AGH), Liu et al.

2012, Spherical Hashing (SpH), Heo at al.Kernelized Supervised Hashing (KSH), Liu et al.

2013, Compressed Hashing (CH), Lin et al.Complementary Projection Hashing (CPH), Jin et al.

2014, Data Sensitive Hashing (DSH), Jagadish et al.

q v1 v2 v3 v4 v5 v6 v7 v8

0distance from q

code:0000 1000 1100 1110 1111

h1 h2 h3 h4

LSH

Good idea for:

I Sorting all data items

Remember:

I We are interested in only k items.

4 / 9

Page 16: Neighbor-Sensitive Hashingpyongjoo/resources/YongjooPark-knn.pdf · Neighbor-Sensitive Hashing Yongjoo Park Michael Cafarella Barzan Mozafari University of Michigan, Ann Arbor Nov

Introduction Our Approach Results

Previous Approaches

2004, Locality Sensitive Hashing (LSH), Datar et al.

2009, Spectral Hashing (SH), Weiss et al.

2011, Anchor Graph Hashing (AGH), Liu et al.

2012, Spherical Hashing (SpH), Heo at al.Kernelized Supervised Hashing (KSH), Liu et al.

2013, Compressed Hashing (CH), Lin et al.Complementary Projection Hashing (CPH), Jin et al.

2014, Data Sensitive Hashing (DSH), Jagadish et al.

q v1 v2 v3 v4 v5 v6 v7 v8

0distance from q

code:0000 1000 1100 1110 1111

h1 h2 h3 h4

LSH

Good idea for:

I Sorting all data items

Remember:

I We are interested in only k items.

4 / 9

Page 17: Neighbor-Sensitive Hashingpyongjoo/resources/YongjooPark-knn.pdf · Neighbor-Sensitive Hashing Yongjoo Park Michael Cafarella Barzan Mozafari University of Michigan, Ann Arbor Nov

Introduction Our Approach Results

Previous Approaches

2004, Locality Sensitive Hashing (LSH), Datar et al.

2009, Spectral Hashing (SH), Weiss et al.

2011, Anchor Graph Hashing (AGH), Liu et al.

2012, Spherical Hashing (SpH), Heo at al.Kernelized Supervised Hashing (KSH), Liu et al.

2013, Compressed Hashing (CH), Lin et al.Complementary Projection Hashing (CPH), Jin et al.

2014, Data Sensitive Hashing (DSH), Jagadish et al.

q v1 v2 v3 v4 v5 v6 v7 v8

0distance from q

code:0000 1000 1100 1110 1111

h1 h2 h3 h4

LSH

Good idea for:

I Sorting all data items

Remember:

I We are interested in only k items.

4 / 9

Page 18: Neighbor-Sensitive Hashingpyongjoo/resources/YongjooPark-knn.pdf · Neighbor-Sensitive Hashing Yongjoo Park Michael Cafarella Barzan Mozafari University of Michigan, Ann Arbor Nov

Introduction Our Approach Results

Previous Approaches

2004, Locality Sensitive Hashing (LSH), Datar et al.

2009, Spectral Hashing (SH), Weiss et al.

2011, Anchor Graph Hashing (AGH), Liu et al.

2012, Spherical Hashing (SpH), Heo at al.Kernelized Supervised Hashing (KSH), Liu et al.

2013, Compressed Hashing (CH), Lin et al.Complementary Projection Hashing (CPH), Jin et al.

2014, Data Sensitive Hashing (DSH), Jagadish et al.

q v1 v2 v3 v4 v5 v6 v7 v8

0distance from q

code:0000 1000 1100 1110 1111

h1

h2 h3 h4

LSH

Good idea for:

I Sorting all data items

Remember:

I We are interested in only k items.

4 / 9

Page 19: Neighbor-Sensitive Hashingpyongjoo/resources/YongjooPark-knn.pdf · Neighbor-Sensitive Hashing Yongjoo Park Michael Cafarella Barzan Mozafari University of Michigan, Ann Arbor Nov

Introduction Our Approach Results

Previous Approaches

2004, Locality Sensitive Hashing (LSH), Datar et al.

2009, Spectral Hashing (SH), Weiss et al.

2011, Anchor Graph Hashing (AGH), Liu et al.

2012, Spherical Hashing (SpH), Heo at al.Kernelized Supervised Hashing (KSH), Liu et al.

2013, Compressed Hashing (CH), Lin et al.Complementary Projection Hashing (CPH), Jin et al.

2014, Data Sensitive Hashing (DSH), Jagadish et al.

q v1 v2 v3 v4 v5 v6 v7 v8

0distance from q

code:0000 1000 1100 1110 1111

h1 h2 h3 h4

LSH

Good idea for:

I Sorting all data items

Remember:

I We are interested in only k items.

4 / 9

Page 20: Neighbor-Sensitive Hashingpyongjoo/resources/YongjooPark-knn.pdf · Neighbor-Sensitive Hashing Yongjoo Park Michael Cafarella Barzan Mozafari University of Michigan, Ann Arbor Nov

Introduction Our Approach Results

Previous Approaches

2004, Locality Sensitive Hashing (LSH), Datar et al.

2009, Spectral Hashing (SH), Weiss et al.

2011, Anchor Graph Hashing (AGH), Liu et al.

2012, Spherical Hashing (SpH), Heo at al.Kernelized Supervised Hashing (KSH), Liu et al.

2013, Compressed Hashing (CH), Lin et al.Complementary Projection Hashing (CPH), Jin et al.

2014, Data Sensitive Hashing (DSH), Jagadish et al.

q v1 v2 v3 v4 v5 v6 v7 v8

0distance from q

code:0000 1000 1100 1110 1111

h1 h2 h3 h4

LSH

Good idea for:

I Sorting all data items

Remember:

I We are interested in only k items.

4 / 9

Page 21: Neighbor-Sensitive Hashingpyongjoo/resources/YongjooPark-knn.pdf · Neighbor-Sensitive Hashing Yongjoo Park Michael Cafarella Barzan Mozafari University of Michigan, Ann Arbor Nov

Introduction Our Approach Results

Previous Approaches

2004, Locality Sensitive Hashing (LSH), Datar et al.

2009, Spectral Hashing (SH), Weiss et al.

2011, Anchor Graph Hashing (AGH), Liu et al.

2012, Spherical Hashing (SpH), Heo at al.Kernelized Supervised Hashing (KSH), Liu et al.

2013, Compressed Hashing (CH), Lin et al.Complementary Projection Hashing (CPH), Jin et al.

2014, Data Sensitive Hashing (DSH), Jagadish et al.

q v1 v2 v3 v4 v5 v6 v7 v8

0distance from q

code:0000 1000 1100 1110 1111

h1 h2 h3 h4

LSH

Good idea for:

I Sorting all data items

Remember:

I We are interested in only k items.

4 / 9

Page 22: Neighbor-Sensitive Hashingpyongjoo/resources/YongjooPark-knn.pdf · Neighbor-Sensitive Hashing Yongjoo Park Michael Cafarella Barzan Mozafari University of Michigan, Ann Arbor Nov

Introduction Our Approach Results

Our Approach

q v1 v2 v3 v4 v5 v6 v7 v8

code:0000 1000 1100 1110 1111

h1 h2 h3 h4

0distance from q

LSH

q v1 v2 v3 v4 v5 v6 v7 v8

code:0000 1000 1100 1110 1111

h1 h2 h3 h4

0distance from q

NSH

More Separators between Neighbors

I Neighbors close to the query are better distinguishedI Low Resolution for distant items =⇒ No Problem!

5 / 9

Page 23: Neighbor-Sensitive Hashingpyongjoo/resources/YongjooPark-knn.pdf · Neighbor-Sensitive Hashing Yongjoo Park Michael Cafarella Barzan Mozafari University of Michigan, Ann Arbor Nov

Introduction Our Approach Results

Our Approach

q v1 v2 v3 v4 v5 v6 v7 v8

code:0000 1000 1100 1110 1111

h1 h2 h3 h4

0distance from q

LSH

q v1 v2 v3 v4 v5 v6 v7 v8

code:0000 1000 1100 1110 1111

h1 h2 h3 h4

0distance from q

NSH

More Separators between Neighbors

I Neighbors close to the query are better distinguishedI Low Resolution for distant items =⇒ No Problem!

5 / 9

Page 24: Neighbor-Sensitive Hashingpyongjoo/resources/YongjooPark-knn.pdf · Neighbor-Sensitive Hashing Yongjoo Park Michael Cafarella Barzan Mozafari University of Michigan, Ann Arbor Nov

Introduction Our Approach Results

Algorithmic Details

q v1v2 v3 q v1v2 v3Hash

(a) Regular Hashing

q v1v2 v3 q v1 v2v3 q v1 v2v3NST Hash

(b) Hashing with NST

A special Non-Linear Transformation

I Expand the distance between NeighborsI Effectively, more hash functions for Neighbors

6 / 9

Page 25: Neighbor-Sensitive Hashingpyongjoo/resources/YongjooPark-knn.pdf · Neighbor-Sensitive Hashing Yongjoo Park Michael Cafarella Barzan Mozafari University of Michigan, Ann Arbor Nov

Introduction Our Approach Results

Recall Improvement over State-of-the-art

16 32 64 128 256

020406080

100

Improvement over:

Hashcode Length

Rec

all

Impr

ovem

ent

(%)

LSH AGH CH SH

CPH SpH DSH KSH

Hashcode size vs. Recall Improvement

I Up to 10% recall improvement over SpH [2]I Up to 15% recall improvement over LSH [1]

(The SIFT dataset)

7 / 9

Page 26: Neighbor-Sensitive Hashingpyongjoo/resources/YongjooPark-knn.pdf · Neighbor-Sensitive Hashing Yongjoo Park Michael Cafarella Barzan Mozafari University of Michigan, Ann Arbor Nov

Introduction Our Approach Results

Speed Improvement over State-of-the-art

10 20 30 40 50

−200

20406080

100

Improvement over:

Target Recall (%)

Tim

eRed

uction

(%)

LSH AGH CH SH

CPH SpH DSH KSH

Target Recall vs. Time Reduction

I Up to 22% time reduction over CPH [3]I Up to 67% time reduction over LSH [1]

(The SIFT dataset)

8 / 9

Page 27: Neighbor-Sensitive Hashingpyongjoo/resources/YongjooPark-knn.pdf · Neighbor-Sensitive Hashing Yongjoo Park Michael Cafarella Barzan Mozafari University of Michigan, Ann Arbor Nov

Introduction Our Approach Results

Questions

[1] M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni.Locality-sensitive hashing scheme based on p-stable distributions.In SoCG, 2004.

[2] J.-P. Heo, Y. Lee, J. He, S.-F. Chang, and S.-E. Yoon.Spherical hashing.In CVPR, 2012.

[3] Z. Jin, Y. Hu, Y. Lin, D. Zhang, S. Lin, D. Cai, and X. Li.Complementary projection hashing.In ICCV, 2013.

9 / 9