
Page 1: A Fast Image Retrieval System with Adjustable Objectives · big-data-berlin.dima.tu-berlin.de/fileadmin/news/... · Distributed Handy Deep Online Transductive Classification Regression

© 2013 Berlin Big Data Center • All Rights Reserved

A Fast Image Retrieval System with Adjustable Objectives

TU Berlin IDA, TU Berlin/DFKI DIMA, Fraunhofer HHI

Page 2

Page 3

IDA’s Role in BBDC

AP1: Material science, AP2: Video mining, AP3: Language processing, AP4: Image analysis in medicine

IDA

DIMA (AP7/8)

Trivially parallelizable (pre-processing, linear regression, etc.)

Non-trivial, further scalability, justification

New technologies

AP6: Scalable ML. Develop new methods to be added to Technology X.

AP5: Statistical DA & ML. Summarize requirements from application partners.

Data: Huge, Complex-structured, Multi-modal, Distributed, Streaming, Partially-observed

Page 4

Requirements & Solutions

Requirements: Efficient, Distributed, Handy, Deep, Online, Transductive + Classification, Regression, Clustering, Inference, Retrieval

Data: Huge, Complex-structured, Multi-modal, Distributed, Streaming, Partially-observed

Solutions: Data indexing, Deep learning, Decomposable optimization, Multi-modal analysis

AP1: Material science, AP2: Video mining, AP3: Language processing, AP4: Image analysis in medicine

Page 5

Projects done

Data indexing: Multi-purpose Locality Sensitive Hashing (mpLSH)

Deep learning: Deep Tensor Neural Networks; Explaining Non-linear Machine Learning

Decomposable optimization: Multi-class SVM for Extreme Classification; Parallel Matrix Factorization; Polynomial-time Message Passing for High-order Potentials; Performance Guarantee of Approximate Bayesian Learning

Multi-modal analysis: Multi-modal Source Power Co-modulation (mSPoC); Transductive Conditional Random Field Regression (TCRFR)

Please see our poster!

Page 6

Multiple-purpose Locality Sensitive Hashing

Page 7

Nearest neighbor search (NNS): a naive implementation (linear scan) requires $O(N)$ time.

Background: LSH for NNS

Nearest neighbor search (NNS)

Naive implementation takes $\sim O(N)$ computation time:

$\hat{x} = \operatorname{argmin}_n \|q - x_n\|_2.$

[Figure: samples x1 to x6 and query q]

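import numpy as np

# Linear scan: X holds the N samples as columns, q is the query vector.
nn_id = -1
nn_dist = float("inf")
for n in range(N):
    dist = np.linalg.norm(X[:, n] - np.ravel(q))
    if dist < nn_dist:  # keep the closest sample seen so far
        nn_id = n
        nn_dist = dist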
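The linear scan on this slide can also be written without the Python loop. This is a minimal sketch, assuming (as in the loop) that X stores the N samples as columns and q is a single query vector; the function name nearest_neighbor is hypothetical. It still evaluates all N distances, so it remains O(N); NumPy only moves the loop into compiled code.

```python
import numpy as np


def nearest_neighbor(X: np.ndarray, q: np.ndarray) -> tuple[int, float]:
    """Exact NNS by linear scan over the columns of X (O(N) distance evaluations)."""
    q = np.ravel(q)  # accept q of shape (L,) or (L, 1)
    # Euclidean distance from q to every sample (column) at once.
    dists = np.linalg.norm(X - q[:, None], axis=0)
    n = int(np.argmin(dists))
    return n, float(dists[n])


rng = np.random.default_rng(0)
X = rng.standard_normal((16, 1000))            # L = 16 dimensions, N = 1000 samples
q = X[:, 42] + 0.01 * rng.standard_normal(16)  # a slightly perturbed copy of sample 42
nn_id, nn_dist = nearest_neighbor(X, q)
print(nn_id, nn_dist)
```

This is the exact baseline that LSH tries to beat: the answer is always correct, but the cost grows linearly with N.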

Search only the samples in the bucket h(q)!

Shinichi Nakajima, Technische Universität Berlin

Introduction of a NIPS 2014 Paper

Page 8

Locality sensitive hashing (LSH)

Random projections provide good LSH functions.

LSH enables approximate NNS in sublinear time.


Background: LSH for NNS

Simple Construction of LSH

LSH enables a sublinear-time probabilistic algorithm:

$h: \mathbb{R}^L \to \mathbb{N}$ s.t. $P(h(x) = h(x')) > P(h(x) = h(x''))$ if $\|x - x'\|_2 < \|x - x''\|_2$,

i.e., similar samples tend to be assigned to the same bucket.

[Figure: samples x1 to x6 and query q]

h(x)    assigned samples
0 0 0   -
0 0 1   x1, x6, q
0 1 0   x3
0 1 1   -
1 0 0   -
1 0 1   x4
1 1 0   -
1 1 1   x2, x5

Search only the samples in the bucket h(q)!
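The bucket table on this slide can be reproduced with sign random projections, one simple way to build the "random projection" LSH functions mentioned here. This is a minimal sketch under stated assumptions: each bit is [a_t^T x >= 0] for a Gaussian direction a_t, which is locality sensitive for the angle between vectors, so the example normalizes samples to unit norm, where angular and L2 orderings coincide. The names make_sign_hash, build_index and lsh_search are hypothetical, and a practical index would use several hash tables rather than one.

```python
import numpy as np
from collections import defaultdict


def make_sign_hash(dim, n_bits, seed=0):
    """Return h(x): signs of n_bits random projections, read as an n_bits-bit integer."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n_bits, dim))      # one random direction per bit
    place = 1 << np.arange(n_bits - 1, -1, -1)  # first bit is the most significant

    def h(x):
        bits = (A @ x >= 0).astype(np.int64)
        return int(bits @ place)

    return h


def build_index(X, h):
    """Bucket table: hash code -> indices of the samples (columns of X) in that bucket."""
    buckets = defaultdict(list)
    for n in range(X.shape[1]):
        buckets[h(X[:, n])].append(n)
    return buckets


def lsh_search(X, q, h, buckets):
    """Approximate NNS: compute distances only to the samples in bucket h(q)."""
    candidates = buckets.get(h(q), [])
    if not candidates:
        return None  # empty bucket; a fuller system would probe more buckets or tables
    dists = np.linalg.norm(X[:, candidates] - q[:, None], axis=0)
    return candidates[int(np.argmin(dists))]


rng = np.random.default_rng(1)
X = rng.standard_normal((16, 5000))
X /= np.linalg.norm(X, axis=0)                  # unit-norm samples
h = make_sign_hash(dim=16, n_bits=8)
buckets = build_index(X, h)
q = X[:, 7] + 0.01 * rng.standard_normal(16)    # query close to sample 7
print("scanned", len(buckets.get(h(q), [])), "of", X.shape[1], "->", lsh_search(X, q, h, buckets))
```

With 8 bits there are at most 256 buckets, so a query typically scans only a small fraction of the 5000 samples; the price is that a true neighbor that falls into a different bucket is missed.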

Page 9

Locality sensitive hashing (LSH)

Only samples in the same bucket need to be evaluated.


Page 10

Motivation: Different similarities require different LSH codes.

L2-LSH for L2 similarity [Datar et al. 2004]

sign-LSH for cosine similarity [Goemans & Williamson 1995]

simple-LSH for inner product (IP) similarity [Neyshabur & Srebro 2015]
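The three hash families listed above can be sketched side by side. This follows their standard textbook forms, and the parameter values (T = 16 hashes, bucket width r = 4.0) are illustrative assumptions: L2-LSH quantizes a shifted random projection, floor((a^T x + b) / r); sign-LSH keeps only the sign of a^T x; simple-LSH first maps data and query onto the unit sphere so that sign-LSH on the transformed vectors reflects inner products. The function names are hypothetical.

```python
import numpy as np


def l2_lsh(X, A, b, r):
    """L2-LSH: h(x) = floor((a^T x + b) / r), with a ~ N(0, I) and b ~ U[0, r)."""
    return np.floor((A @ X + b[:, None]) / r).astype(np.int64)


def sign_lsh(X, A):
    """sign-LSH: one bit [a^T x >= 0] per random Gaussian direction a."""
    return (A @ X >= 0).astype(np.uint8)


def simple_lsh_transform(X, q):
    """simple-LSH: P(x) = [x/M; sqrt(1 - ||x/M||^2)], Q(q) = [q/||q||; 0].

    M is the largest sample norm, so every transformed vector has unit norm and
    P(x)^T Q(q) = x^T q / (M ||q||): the angle between P(x) and Q(q) shrinks as
    the inner product x^T q grows, which is what sign-LSH is sensitive to.
    """
    Xs = X / np.linalg.norm(X, axis=0).max()
    extra = np.sqrt(np.clip(1.0 - np.sum(Xs**2, axis=0), 0.0, None))
    P = np.vstack([Xs, extra])
    Q = np.append(q / np.linalg.norm(q), 0.0)
    return P, Q


rng = np.random.default_rng(0)
L, N, T, r = 8, 100, 16, 4.0                       # illustrative sizes and bucket width
X = rng.standard_normal((L, N))
q = rng.standard_normal(L)
A = rng.standard_normal((T, L))

codes_l2 = l2_lsh(X, A, rng.uniform(0.0, r, T), r)  # integer codes, shape (T, N)
codes_cos = sign_lsh(X, A)                          # binary codes, shape (T, N)
P, Q = simple_lsh_transform(X, q)
A_ip = rng.standard_normal((T, L + 1))              # projections for the augmented space
codes_ip = sign_lsh(P, A_ip)
query_code_ip = sign_lsh(Q[:, None], A_ip)[:, 0]
```

Each family needs its own codes (integers for L2-LSH, bits of differently transformed vectors for sign-LSH and simple-LSH), which is the motivation for a single multi-purpose code.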

Page 11

Motivation: Different similarities require different LSH codes.

General LSH coding for multiple purposes?

Multi-purpose similarity:

Page 12

Multi-purpose LSH (mpLSH): Different similarities require different LSH codes.

Multi-purpose similarity:

Feature augmentation for metric transform

LSH coding by random projection

Multi-metric search by cover tree

Common part / Similarity-dependent part

Multi-metric in code space:
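One simple way to picture "feature augmentation for metric transform" with a common part and a similarity-dependent part is the classic augmentation that turns L2 search into inner-product search. Since ||q - x||^2 = ||q||^2 - 2 q^T x + ||x||^2, minimizing a weighted mix gamma ||q_L2 - x||^2 - 2 eta q_IP^T x over x is the same as maximizing the inner product between [x; ||x||^2] and [gamma q_L2 + eta q_IP; -gamma/2]. This is an illustrative sketch under that assumption, not necessarily the augmentation used by mpLSH; the weights gamma and eta and the function names are hypothetical.

```python
import numpy as np


def augment_data(X):
    """Common (query-independent) part: x -> [x; ||x||^2] for each column of X."""
    return np.vstack([X, np.sum(X**2, axis=0)])


def augment_query(q_l2, q_ip, gamma, eta):
    """Similarity-dependent part: [gamma * q_l2 + eta * q_ip; -gamma / 2].

    Maximizing its inner product with augment_data(X) minimizes
    gamma * ||q_l2 - x||^2 - 2 * eta * q_ip^T x over the samples x.
    """
    return np.append(gamma * q_l2 + eta * q_ip, -0.5 * gamma)


rng = np.random.default_rng(0)
X = rng.standard_normal((5, 200))
q_l2, q_ip = rng.standard_normal(5), rng.standard_normal(5)
X_aug = augment_data(X)                     # computed once, at indexing time
for gamma, eta in [(1.0, 0.0), (0.0, 1.0), (0.7, 0.3)]:
    scores = X_aug.T @ augment_query(q_l2, q_ip, gamma, eta)
    print(gamma, eta, int(np.argmax(scores)))
```

The data side is built once at indexing time; changing the weights only changes the query vector, which is what makes query-time adjustment cheap.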

Page 13

Multi-purpose LSH (mpLSH): Theoretical and empirical performance validation:

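Theorem 1. For weights chosen, for all $w$, such that $L_{\mathrm{mp}}(\{q^{(w)}\}, x)$ is the cosine similarity, it holds that

$P\big(D_{\mathrm{CA}}(\{q^{(w)}\}, x) = 0\big) = F_{\mathrm{sign}}\big(1 + \tfrac{1}{2} L_{\mathrm{mp}}(\{q^{(w)}\}, x)\big)^{T}.$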
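Theorem 1 builds on the basic collision property of sign random projections [Goemans & Williamson 1995]: for a Gaussian direction a, P(sign(a^T x) = sign(a^T y)) = 1 - theta(x, y) / pi, where theta is the angle between x and y, so T independent bits all agree with probability (1 - theta / pi)^T. The slide does not define F_sign, so this sketch only checks the standard per-bit fact by Monte Carlo; the function names are hypothetical.

```python
import numpy as np


def sign_collision_prob(x, y):
    """Per-bit collision probability of sign random projections: 1 - angle(x, y) / pi."""
    cos = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
    return 1.0 - np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi


def empirical_collision_rate(x, y, n_trials=200_000, seed=0):
    """Fraction of random Gaussian directions a with sign(a^T x) == sign(a^T y)."""
    A = np.random.default_rng(seed).standard_normal((n_trials, x.size))
    return float(np.mean((A @ x >= 0) == (A @ y >= 0)))


x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])                          # angle pi/4, so analytically 3/4 per bit
p = sign_collision_prob(x, y)
print(p, empirical_collision_rate(x, y), p ** 8)  # last value: all 8 bits agree
```

The probability decreases monotonically with the angle, which is exactly the LSH property for cosine similarity; raising it to the power T shows how longer codes sharpen the separation between near and far pairs.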

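Theorem 2. For weights chosen, for all $w$, such that $L_{\mathrm{mp}}(\{q^{(w)}\}, x)$ is the IP similarity (in particular $\eta^{(w)} = 0$), the expectation of the mp-LSH-CA code similarity is bounded as

$\frac{2\sqrt{L-1}}{\pi}\big(2 + L_{\mathrm{mp}}(\{q^{(w)}\}, x)\big) \le \frac{\mathbb{E}\big[\hat{D}_{\mathrm{CA}}(\{q^{(w)}\}, x)\big]}{T} \le \sqrt{L-1}\,\big(2 + L_{\mathrm{mp}}(\{q^{(w)}\}, x)\big) + 1.$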

[Figure panels: L2, IP, Mixed; legend: Proposed 1, Proposed 2]

(Conditional) LSH property theoretically guaranteed.

Approximate search performance validated on several real datasets.

Computational and memory efficiency demonstrated on large data (100M samples).

Page 14

Multi-purpose LSH (mpLSH)

Applications:
General-purpose indexing at the data collection phase (without a fixed analysis plan).
Material search with optimized properties (e.g., stability, utility).
Image/video retrieval with adjustable queries (e.g., preference + closeness).

Characteristics:
Random-projection based (data-independent).
Query-time weight adjustment supported by a cover tree.
Memory requirement similar to sign-LSH.

[Diagram: Data collection; Indexing (time-consuming); NN search (sub-linear time); L2-LSH, sign-LSH, simple-LSH → Multi-purpose LSH!]
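Because sign-type codes are binary, their memory cost is easy to estimate: N samples with T-bit codes need about N * T / 8 bytes. This small sketch assumes T = 64 bits (a hypothetical choice; the slides do not state the code length) and uses NumPy's packbits; Hamming distances between packed codes are computed with XOR and a bit count. The function names are hypothetical.

```python
import numpy as np


def pack_codes(bits):
    """Pack a (T, N) array of 0/1 hash bits into an (N, ceil(T/8)) uint8 array."""
    return np.packbits(np.asarray(bits, dtype=np.uint8).T, axis=1)


def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())


def code_memory_bytes(n_samples, n_bits):
    """Bytes needed to store n_bits-bit codes for n_samples samples."""
    return n_samples * ((n_bits + 7) // 8)


rng = np.random.default_rng(0)
bits = rng.standard_normal((64, 1000)) >= 0  # stand-in for 64 sign-LSH bits per sample
codes = pack_codes(bits)
print(codes.shape, codes.nbytes)             # (1000, 8) 8000
print(hamming(codes[0], codes[1]))
print(code_memory_bytes(100_000_000, 64) / 1e9, "GB for 100M samples")  # 0.8 GB for 100M samples
```

At 64 bits per sample, codes for 100M samples fit in well under a gigabyte, independent of the original feature dimension.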

Page 15

Demo: Image retrieval with adjustable objective (http://bbdcdemo.bbdc.tu-berlin.de)

Retrieve images similar to a user-provided query (L2-query) while taking a user-preference query (IP-query) into account. The mixing weight is adjusted in real time.

[Screenshot panels: Closest to L2-query; Best match to IP-query; Most relevant to mixed query]

Query search in ~100 msec!

Please try the demo!
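A brute-force reference helps make the demo's adjustable objective concrete. This is a hypothetical sketch, not the demo's implementation: it assumes each image is represented by a feature vector (a column of X) and ranks images by the mixed score (1 - w) * ||q_L2 - x||^2 - w * q_IP^T x with a mixing weight w in [0, 1], lower being better. The function name mixed_topk and the score form are assumptions.

```python
import numpy as np


def mixed_topk(X, q_l2, q_ip, w, k=5):
    """Indices of the k samples (columns of X) with the lowest mixed score.

    score(x) = (1 - w) * ||q_l2 - x||^2 - w * q_ip^T x, with 0 <= w <= 1.
    """
    score = (1.0 - w) * np.sum((X - q_l2[:, None]) ** 2, axis=0) - w * (q_ip @ X)
    top = np.argpartition(score, k)[:k]  # the k best, in no particular order
    return top[np.argsort(score[top])]   # sorted from best to worst


rng = np.random.default_rng(0)
X = rng.standard_normal((32, 10_000))             # stand-in image features
q_l2 = X[:, 3] + 0.05 * rng.standard_normal(32)   # "find images like this one"
q_ip = rng.standard_normal(32)                    # "user preference" direction
for w in (0.0, 0.5, 1.0):
    print(w, mixed_topk(X, q_l2, q_ip, w, k=3))
```

This exhaustive version touches every image for each new weight; the slides' point is that mpLSH answers such re-weighted queries approximately in sub-linear time from a single index.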