A Fast Image Retrieval System with Adjustable...
© 2013 Berlin Big Data Center • All Rights Reserved
A Fast Image Retrieval System with Adjustable Objectives
TU Berlin IDA, TU Berlin/DFKI DIMA, Fraunhofer HHI
IDA's Role in BBDC
AP1: Material science
AP2: Video mining
AP3: Language processing
AP4: Image analysis in medicine
IDA
DIMA (AP7/8)
Trivially parallelizable (pre-processing, linear regression, etc.)
Non-trivial, further scalability, justification
New technologies
AP6: Scalable ML. Develop new methods to be added to Technology X.
AP5: Statistical DA & ML. Summarize requirements from application partners.
Data: Huge, Complex-structured, Multi-modal, Distributed, Streaming, Partially-observed
Requirements & Solutions
Requirements
Efficient, Distributed, Handy, Deep, Online, Transductive
+
Classification, Regression, Clustering, Inference, Retrieval
Data: Huge, Complex-structured, Multi-modal, Distributed, Streaming, Partially-observed
Solutions
Data indexing
Deep learning
Decomposable optimization
Multi-modal analysis
AP1: Material science
AP2: Video mining
AP3: Language processing
AP4: Image analysis in medicine
Projects done
Data indexing: Multi-purpose Locality Sensitive Hashing (mpLSH)
Deep learning: Deep Tensor Neural Networks; Explaining Non-linear Machine Learning
Decomposable optimization: Multi-class SVM for Extreme Classification; Parallel Matrix Factorization; Polynomial-time Message Passing for High-order Potentials; Performance Guarantee of Approximate Bayesian Learning
Multi-modal analysis: Multi-modal Source Power Co-modulation (mSPoC); Transductive Conditional Random Field Regression (TCRFR)
Please see our poster!
Multi-purpose Locality Sensitive Hashing
Background: LSH for NNS
Nearest neighbor search (NNS)
A naive implementation (linear scan) takes O(N) computation time:
$\hat{x} = \operatorname*{argmin}_{n} \| q - x_n \|_2$
[Figure: scatter plot of the samples x1, ..., x6 and the query q]
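import numpy as np

# Linear scan over the N columns of X (one sample per column); q is the query.
N = X.shape[1]
nn_id = -1
nn_dist = float("inf")
for n in range(N):
    dist = np.linalg.norm(X[:, n] - np.ravel(q))
    if dist < nn_dist:
        nn_id = n
        nn_dist = dist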
Search only the samples in the bucket h(q)!
Shinichi Nakajima, Technische Universität Berlin
Introduction of a NIPS 2014 Paper
Locality sensitive hashing (LSH)
Random projections provide good LSH functions.
LSH enables approximate NNS in sublinear time.
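The claim that random projections make good LSH functions can be illustrated with a minimal sketch. It assumes Gaussian sign random projections; the dimension `L`, the bit count `T`, and the helper name `sign_code` are hypothetical choices for illustration, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
L, T = 16, 64                      # data dimension and number of hash bits (hypothetical)
A = rng.standard_normal((T, L))    # one Gaussian random projection per bit

def sign_code(x):
    """T-bit code: bit t is 1 when the t-th random projection of x is non-negative."""
    return (A @ x >= 0).astype(np.uint8)

x = rng.standard_normal(L)
x_near = x + 0.1 * rng.standard_normal(L)   # small perturbation of x
x_far = rng.standard_normal(L)              # unrelated sample

# Count how many of the T bits agree with the code of x.
agree_near = int(np.sum(sign_code(x) == sign_code(x_near)))
agree_far = int(np.sum(sign_code(x) == sign_code(x_far)))
print(agree_near, agree_far)
```

A nearby sample tends to share most bits with x, while an unrelated sample shares roughly half of them, which is the behavior a hash-based index relies on.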
Background: LSH for NNS
Simple Construction of LSH
LSH enables a sublinear-time probabilistic algorithm:
$h : \mathbb{R}^L \to \mathbb{N}$ s.t. $P(h(x) = h(x')) > P(h(x) = h(x''))$ if $\| x - x' \|_2 < \| x - x'' \|_2$,
i.e., similar samples tend to be assigned to the same bucket.
[Figure: scatter plot of the samples x1, ..., x6 and the query q]
h(x)     assigned samples
0 0 0    -
0 0 1    x1, x6, q
0 1 0    x3
0 1 1    -
1 0 0    -
1 0 1    x4
1 1 0    -
1 1 1    x2, x5
Search only the samples in the bucket h(q)!
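The bucket lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation behind the slides: the hash `h` uses sign random projections, and the sizes `L`, `N`, `T` and the function name `approx_nn` are hypothetical.

```python
from collections import defaultdict
import numpy as np

rng = np.random.default_rng(0)
L, N, T = 8, 1000, 6               # dimension, sample count, bits per code (hypothetical sizes)
X = rng.standard_normal((L, N))    # one sample per column, as in the linear scan
A = rng.standard_normal((T, L))    # random projections defining h

def h(x):
    """Bucket key: the T sign bits of the random projections of x."""
    return tuple((A @ x >= 0).astype(int).tolist())

# Indexing: store each sample id under its bucket key.
buckets = defaultdict(list)
for n in range(N):
    buckets[h(X[:, n])].append(n)

def approx_nn(q):
    """Scan only the bucket h(q); returns (id, distance), or (-1, inf) if the bucket is empty."""
    nn_id, nn_dist = -1, float("inf")
    for n in buckets.get(h(q), []):
        dist = np.linalg.norm(X[:, n] - q)
        if dist < nn_dist:
            nn_id, nn_dist = n, dist
    return nn_id, nn_dist

q = X[:, 42] + 0.01 * rng.standard_normal(L)   # query close to sample 42
print(approx_nn(q), len(buckets.get(h(q), [])))
```

The search is approximate: if the true nearest neighbor falls into a different bucket than q, it is missed. Practical systems usually reduce this risk by using several hash tables or by probing neighboring buckets.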
Only samples in the same bucket need to be evaluated.
Motivation
Different similarities require different LSH codes.
L2-LSH for L2 similarity [Datar et al. 2004]
sign-LSH for cosine similarity [Goemans & Williamson 1995]
simple-LSH for inner product (IP) similarity [Neyshabur & Srebro 2015]
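To make the difference between the three codes concrete, here is a minimal sketch of one hash function from each family, written from the standard textbook definitions. The dimension `L`, the bucket width `r`, and the function names are hypothetical, and the simple-LSH variant assumes data vectors have been rescaled so that their norm is at most 1.

```python
import numpy as np

rng = np.random.default_rng(0)
L = 4                                  # data dimension (hypothetical)
a = rng.standard_normal(L)             # Gaussian random projection for L2-LSH and sign-LSH
r = 1.0                                # L2-LSH bucket width (hypothetical)
b = rng.uniform(0.0, r)                # L2-LSH random offset in [0, r)
a_aug = rng.standard_normal(L + 1)     # projection for the augmented simple-LSH vectors

def l2_lsh(x):
    """L2-LSH: quantized random projection, floor((a.x + b) / r)."""
    return int(np.floor((a @ x + b) / r))

def sign_lsh(x):
    """sign-LSH: sign of a random projection; depends only on the direction of x."""
    return 1 if a @ x >= 0 else -1

def simple_lsh(x, is_query=False):
    """simple-LSH: data vectors (norm <= 1) get sqrt(1 - ||x||^2) appended,
    queries are normalized and get 0 appended; then sign-LSH is applied."""
    if is_query:
        x_aug = np.append(x / np.linalg.norm(x), 0.0)
    else:
        x_aug = np.append(x, np.sqrt(max(0.0, 1.0 - x @ x)))
    return 1 if a_aug @ x_aug >= 0 else -1

x = np.array([0.3, -0.2, 0.1, 0.4])
print(l2_lsh(x), sign_lsh(x), simple_lsh(x), simple_lsh(x, is_query=True))
```

Because each family transforms and quantizes the data differently, an index built with one of them cannot directly serve queries under another similarity.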
A general LSH coding for multiple purposes?
Multi-purpose similarity:
Multi-purpose LSH (mpLSH)
Feature augmentation for metric transform
LSH coding by random projection
Multi-metric search by cover tree
Common part
Similarity-dependent part
Multi-metric in code space:
Multi-purpose LSH (mpLSH)
Theoretical and empirical performance validation:
Theorem 1: For $\gamma^{(w)} = \lambda^{(w)} = 0, \forall w$, i.e., $L_{mp}(\{q^{(w)}\}, x)$ is the cosine similarity, it holds that
$P\left( D_{CA}(\{q^{(w)}\}, x) = 0 \right) = F^{\mathrm{sign}}\left( 1 + \tfrac{1}{2} L_{mp}(\{q^{(w)}\}, x) \right)^{T}$.
Theorem 2: For $\gamma^{(w)} = \eta^{(w)} = 0, \forall w$, i.e., $L_{mp}(\{q^{(w)}\}, x)$ is the IP similarity, the expectation of the mp-LSH-CA code similarity is bounded below and above by affine functions of $2 + L_{mp}(\{q^{(w)}\}, x)$.
[Figure: three panels titled L2, IP, and Mixed; the legend includes Proposed 1 and Proposed 2]
(Conditional) LSH property theoretically guaranteed.
Approximate search performance validated on several real data sets.
Computational and memory efficiency proven on large data (100M samples).
Multi-purpose LSH (mpLSH)
Applications:
General-purpose indexing at the data collection phase (without a fixed analysis plan).
Material search with optimized properties (e.g., stability, utility, etc.).
Image/video retrieval with an adjustable query (e.g., preference + closeness).
Characteristics:
Random projection based (data-independent).
Query-time weight adjustment supported by cover tree.
Similar memory requirement to sign-LSH.
Data collection
Indexing (time-consuming)
NN search (sub-linear time)
L2-LSH, sign-LSH, simple-LSH
Multi-purpose LSH!
Demo: Image retrieval with adjustable objective
http://bbdcdemo.bbdc.tu-berlin.de
Retrieve images similar to a user-provided query (L2-query), taking a user-preference query (IP-query) into account. The mixing weight is adjusted in real time.
Closest to L2-query
Best match to IP-query
Most relevant to mixed query
Query search in ~100 msec!
Please try the demo!
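The effect of the mixing weight can be illustrated with a brute-force reference sketch. It is not the demo's implementation, which relies on mpLSH and a cover tree rather than a full scan. The objective below, a weighted sum of the squared L2 distance to the L2-query and the negative inner product with the IP-query, is an assumed form chosen for illustration, and `mixed_retrieval`, `weight`, and `k` are hypothetical names.

```python
import numpy as np

def mixed_retrieval(X, q_l2, q_ip, weight, k=5):
    """Return ids of the k samples minimizing the assumed objective
    (1 - weight) * ||q_l2 - x||^2 - weight * <q_ip, x>.
    X holds one sample per column; weight lies in [0, 1]."""
    sq_dist = np.sum((X - q_l2[:, None]) ** 2, axis=0)   # L2 closeness term
    ip = q_ip @ X                                        # IP preference term
    score = (1.0 - weight) * sq_dist - weight * ip
    return np.argsort(score)[:k]

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 200))
q_l2, q_ip = X[:, 0], rng.standard_normal(8)
for weight in (0.0, 0.5, 1.0):
    print(weight, mixed_retrieval(X, q_l2, q_ip, weight))
```

With `weight = 0.0` the ranking reduces to pure L2 nearest neighbor search, and with `weight = 1.0` it reduces to maximum inner product search. Intermediate values trade closeness against preference, which corresponds to the slider in the demo.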