Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL

Max-kernel search How to search for just about anything? Parikshit Ram

description

Max-kernel search: How to search for just about anything? Nearest neighbor search is a well-studied and widely used task in computer science and is pervasive in everyday applications. While search is not synonymous with learning, it is a crucial tool for the most nonparametric forms of learning. Nearest neighbor search can be used directly for many learning tasks: classification, regression, density estimation, and outlier detection. Search is also the computational bottleneck in other learning tasks such as clustering and dimensionality reduction. Key to nearest neighbor search is the notion of “near”-ness, or similarity. Mercer kernels form a class of general nonlinear similarity functions and are widely used in machine learning. They can define a notion of similarity between pairs of objects of arbitrary type and have been successfully applied to a wide variety of object types: fixed-length data, images, text, time series, and graphs. I will present a technique for nearest neighbor search with this class of similarity functions that is provably efficient, hence facilitating faster learning on larger data.

Transcript of Parikshit Ram – Senior Machine Learning Scientist, Skytree at MLconf ATL

Max-kernel search: How to search for just about anything?

Parikshit Ram

Similarity search

● Set of objects R
● Query q
● Similarity function
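The three ingredients on this slide (a set of objects R, a query q, and a similarity function) are enough to write the baseline search as a linear scan. The following is a minimal sketch, not code from the talk; the function name `most_similar`, the dot-product similarity, and the toy data are all assumptions made for illustration.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def most_similar(
    query: T,
    objects: Sequence[T],
    similarity: Callable[[T, T], float],
) -> Tuple[int, float]:
    """Return (index, score) of the object most similar to the query."""
    if not objects:
        raise ValueError("objects must be non-empty")
    best_index, best_score = 0, similarity(query, objects[0])
    # Linear scan: one similarity evaluation per object in the set.
    for i in range(1, len(objects)):
        score = similarity(query, objects[i])
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score


def dot(x, y):
    """Hypothetical similarity function: the plain dot product."""
    return sum(a * b for a, b in zip(x, y))


# Toy data, invented for this example.
R = [(1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
q = (1.0, 1.0)
index, score = most_similar(q, R, dot)
```

Because the similarity function is passed in as a parameter, the same scan works for any object type, which is also why its cost grows linearly with the size of R.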

Finding similar images

Drug discovery

http://fineartamerica.com

Movie recommendations

Similarity search is ubiquitous

● Machine learning
● Computer vision
● Theory
● Databases
● Information retrieval
● Web application
● Collaborative filtering
● Scientific computing

Search-based classification

k-nearest-neighbor classification/regression
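A k-nearest-neighbor classifier can be sketched in a few lines: find the k labeled points closest to the query and take a majority vote. This is a generic illustration rather than the speaker's implementation; the Euclidean distance, the function name `knn_classify`, and the toy points and labels are assumptions.

```python
import math
from collections import Counter


def knn_classify(query, points, labels, k=3):
    """Predict a label for `query` by majority vote among its k nearest points."""
    # Rank all labeled points by Euclidean distance to the query.
    order = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy labeled data, invented for this example.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
labels = ["RomCom fan", "RomCom fan", "RomCom fan",
          "Kids movie fanatic", "Kids movie fanatic"]

predicted = knn_classify((0.3, 0.3), points, labels, k=3)
```

For regression, the vote would be replaced by an average of the neighbors' target values. Either way, no model is trained; all the work happens in the search.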

Search-based classification

“RomCom fan”

“Kids movie fanatic”

Search-based outlier detection
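One common way to turn search into outlier detection is to score each object by the distance to its k-th nearest neighbor: isolated objects get large scores. The slide does not say which scoring rule the speaker had in mind, so the sketch below is an assumption, as are the function name `knn_outlier_scores` and the toy points.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each point by the distance to its k-th nearest other point."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(
            math.dist(p, other) for j, other in enumerate(points) if j != i
        )
        scores.append(dists[k - 1])
    return scores


# Four points on a unit square plus one far-away point (invented data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
scores = knn_outlier_scores(points, k=2)
outlier_index = max(range(len(points)), key=scores.__getitem__)
```

This version runs one brute-force search per point, so it is quadratic in the number of points; a search index is what makes the approach practical on large collections.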

Search-based ML

Advantage
● nonparametric - lets the data speak
● no need to train complex models

Key ingredient
● notion of similarity (domain/data-specific)

Main challenge: efficiency
● Sheer size of the data
● Varied data types

Properties of similarity functions

● symmetry

The dissimilarity is the size of the set-theoretic difference (the slide's figure shows the values 3 and 1)

● self-similarity

We do not really care about this.
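The set-difference example shows how a natural dissimilarity can fail to be symmetric. The sketch below uses two hypothetical sets, chosen so that the two directions give 3 and 1, the values left over from the slide's figure; the actual sets on the slide are not preserved.

```python
def difference_dissimilarity(a: set, b: set) -> int:
    """Number of elements of `a` that are missing from `b`."""
    return len(a - b)


# Hypothetical sets, invented for this example.
A = {"a", "b", "c", "d"}
B = {"d", "e"}

forward = difference_dissimilarity(A, B)   # elements in A but not in B
backward = difference_dissimilarity(B, A)  # elements in B but not in A
symmetric = len(A ^ B)                     # symmetric difference, order-independent
```

Here `forward` and `backward` differ, so this dissimilarity is not symmetric, while the symmetric difference gives the same value whichever set comes first.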

● Metrics: used everywhere
● Bregman divergences: widely used for distributions
● Mercer kernels: widely used in ML for a variety of objects and problems
● ???: not quite explored in search or ML

Breadth of Kernel Functions

Objects         Kernel Functions
Images          linear, polynomial, Gaussian, Pyramid match
Documents       cosine
Sequences       p-spectrum kernel, alignment score
Trees           subtree, syntactic, partial tree
Graphs          random walk
Time series     cross-correlation, dynamic time-warping
Natural Lang.   convolution, decomposition, lexical semantic

What is a Kernel Function?

In words
A pairwise symmetric function

● Correlation in a richer but hidden feature space
● Cannot access the hidden space

[Figure: a hidden mapping from the object space to the hidden space]
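For a few simple kernels the hidden space can be written out, which makes the "correlation in a hidden feature space" idea concrete. The sketch below uses the homogeneous quadratic kernel K(x, y) = (x · y)^2 as an illustrative choice (it is not named on the slide) and checks that it equals an ordinary dot product after mapping each vector to all pairwise products of its coordinates. The function names are assumptions.

```python
import itertools


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def quadratic_kernel(x, y):
    """K(x, y) = (x . y)^2, computed in the object space only."""
    return dot(x, y) ** 2


def hidden_map(x):
    """Explicit hidden mapping for the quadratic kernel: all products x_i * x_j."""
    return [a * b for a, b in itertools.product(x, repeat=2)]


x = (1.0, 2.0, 3.0)
y = (0.5, -1.0, 2.0)

via_kernel = quadratic_kernel(x, y)
via_hidden_space = dot(hidden_map(x), hidden_map(y))
```

The two values agree, but the kernel needs 3 multiplications for the dot product while the explicit map works in 9 dimensions. For kernels such as the Gaussian, the hidden space is infinite-dimensional, so only the kernel route is available.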

Max-kernel Search

Find the object in R most similar to q with respect to a kernel
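Written as a formula, and using K for the kernel and \varphi for the hidden mapping (symbols assumed here, since the slide's own notation is not preserved), the problem and its relation to nearest neighbor search in the hidden space are:

```latex
\begin{align*}
r^{*} &= \arg\max_{r \in R} K(q, r),
  \qquad K(q, r) = \langle \varphi(q), \varphi(r) \rangle \\
\| \varphi(q) - \varphi(r) \|^{2} &= K(q, q) + K(r, r) - 2\,K(q, r)
\end{align*}
```

The second line shows that the max-kernel object is the nearest neighbor in the hidden space only when K(r, r) is the same for every r in R. In general the two problems have different answers, which is why metric search tools cannot simply be reused.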

Existing methods

● Brute-force (parallel/distributed)
  ○ Domain-specific optimizations
● Coerce data to use metrics
  ○ Only approximate

No standard search tools!

Understanding kernels

If two objects are similar to each other,

then they are (nearly) equally similar to the query q
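The slide's IF/THEN formulas are not preserved. One standard way to make the statement precise, assuming a Mercer kernel K with hidden mapping \varphi and objects p, p' (notation assumed here), follows from the Cauchy-Schwarz inequality:

```latex
\begin{align*}
\lvert K(q, p) - K(q, p') \rvert
  &= \lvert \langle \varphi(q),\, \varphi(p) - \varphi(p') \rangle \rvert \\
  &\le \| \varphi(q) \| \; \| \varphi(p) - \varphi(p') \| \\
  &= \sqrt{K(q, q)} \; d_K(p, p'),
\qquad
d_K(p, p') = \sqrt{K(p, p) + K(p', p') - 2\,K(p, p')}
\end{align*}
```

So if p and p' are close in the kernel-induced distance d_K, their kernel values with any query q differ by at most that distance times \sqrt{K(q, q)}. Every quantity in the bound is computed from kernel evaluations alone, without access to the hidden space.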

Indexing our collection

Multi-resolution index in O(n log n) time

Cover Tree (BKL 2006)

How to Search with this Index?

[Figure: query q and index nodes p, p', p'']

Safely ignore a large chunk (potentially millions)
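The pruning idea can be sketched with a much simpler index than a cover tree. The code below is an illustration under stated assumptions, not the talk's algorithm: it groups objects into a single level of balls around a few centers, records each ball's radius in the kernel-induced distance, and skips a whole ball when the bound K(q, center) + radius * sqrt(K(q, q)) cannot beat the best value found so far. The linear kernel, the evenly spaced choice of centers, and all names are hypothetical.

```python
import math
import random


def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))


def kernel_distance(kernel, x, y):
    """Distance in the hidden space, computed from kernel values only."""
    squared = kernel(x, x) + kernel(y, y) - 2.0 * kernel(x, y)
    return math.sqrt(max(squared, 0.0))


def build_ball_index(objects, kernel, num_balls):
    """Flat, single-level index (a stand-in for a cover tree)."""
    step = max(1, len(objects) // num_balls)
    centers = objects[::step][:num_balls]  # hypothetical choice of centers
    balls = [{"center": c, "radius": 0.0, "members": []} for c in centers]
    for obj in objects:
        dists = [kernel_distance(kernel, obj, c) for c in centers]
        nearest = min(range(len(centers)), key=dists.__getitem__)
        balls[nearest]["members"].append(obj)
        balls[nearest]["radius"] = max(balls[nearest]["radius"], dists[nearest])
    return balls


def max_kernel_search(query, balls, kernel):
    """Exact max-kernel search that skips balls whose bound is too small."""
    query_norm = math.sqrt(kernel(query, query))
    # Upper bound on K(query, r) for every member r of each ball.
    bounds = [
        (kernel(query, b["center"]) + b["radius"] * query_norm, b) for b in balls
    ]
    bounds.sort(key=lambda pair: pair[0], reverse=True)
    best_obj, best_val, evaluated = None, -math.inf, 0
    for bound, ball in bounds:
        if bound <= best_val:
            break  # no remaining ball can contain a better object
        for obj in ball["members"]:
            evaluated += 1
            val = kernel(query, obj)
            if val > best_val:
                best_obj, best_val = obj, val
    return best_obj, best_val, evaluated


# Four well-separated clusters of 50 points each (invented data).
random.seed(0)
cluster_means = [(0, 0), (5, 5), (-5, 5), (5, -5)]
R = [
    (random.gauss(cx, 0.1), random.gauss(cy, 0.1))
    for cx, cy in cluster_means
    for _ in range(50)
]
q = (4.0, 4.5)

balls = build_ball_index(R, linear_kernel, num_balls=4)
best_obj, best_val, evaluated = max_kernel_search(q, balls, linear_kernel)
```

On this toy data the search returns the same maximum as a full scan while evaluating the kernel on only part of the collection. A cover tree applies the same bound recursively at many resolutions, which is where the larger savings come from.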

Results: Efficiency

● Widely applicable algorithm
● Performance data/kernel-dependent

[Plot: Improvement, with axis marks at 10x and 10000x]

Results: Sublinear Query Time

Bigger data implies bigger efficiency gains

[Plot: Improvement vs. Object set size]

Can We Prove it? What Makes Search Hard?

Thm. For a set R of n objects, the query time is

● expansion constant
  ○ the distribution of the data
● directional concentration constant
  ○ the distribution of a kernel-induced transformation of the data
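The theorem's formula is not preserved in the transcript. For reference, the expansion constant as it is usually defined in the cover tree literature is given below, assuming the talk uses that standard definition; the symbols c, d, and B_R are notation chosen here, with d standing for whatever distance the index is built on.

```latex
\begin{align*}
B_R(p, \rho) &= \{\, r \in R : d(p, r) \le \rho \,\} \\
c &= \min \bigl\{\, c' \ge 2 \;:\;
      \lvert B_R(p, 2\rho) \rvert \le c'\, \lvert B_R(p, \rho) \rvert
      \ \text{for all } p \in R,\ \rho > 0 \,\bigr\}
\end{align*}
```

Intuitively, c is small when doubling the radius of any ball never multiplies the number of objects inside it by much, which is a statement about how the data is distributed rather than about its size.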

Code/tutorial for Fast Exact Max-Kernel Search

version 1.0.5, http://www.mlpack.org, Ryan R. Curtin

Endnote

● Search is an essential tool for ML
● Exploring different types of similarity functions increases the applicability and quality of search
● Kernels are widely applicable similarity functions
  ○ now we have provably fast max kernel search

Email: [email protected]