Similarity Learning for High Dimensional Sparse Data
Motivation
Measuring distance or similarity is a key component of AI, machine learning, pattern recognition, data mining, etc. Examples: nearest neighbor classification, clustering, information retrieval...
How to define good distances between objects?
Metric learning [1]: automatically learn a distance or similarity function from data (e.g., from must-link/cannot-link relations).
Main contributions
A similarity model that can efficiently learn high-dimensional similarities in the original space by parameterizing the similarity measure as a convex combination of rank-one matrices with specific sparsity structures.
Derivation of scalable algorithms for the proposed formulations, with time/memory cost independent of the data dimensionality.
Appealing optimization and generalization guarantees.
Experiments on high-dimensional real data showing the model's potential in classification, dimensionality reduction, and data exploration.
Classification
Feature Selection and Sparsity
More experiments in the paper + MATLAB code available!
Dimension Reduction
Experiments
Approach: Formulation, Optimization, Theoretical Analysis
Limited features selected: as iterations proceed, the number of selected features tends to converge. Extremely sparse similarity matrix: only 0.0006% of entries are non-zero.
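The selected features can be read off directly from the basis representation. A minimal sketch, assuming the learner keeps the chosen feature pairs in a list (`pairs` is an assumed bookkeeping structure, not from the released code):

    def selected_features(pairs):
        # pairs: list of (i, j) feature pairs of the active bases after K
        # iterations; their union contains at most 2K distinct features.
        return sorted({f for (i, j) in pairs for f in (i, j)})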
A single run of HDSL also serves as dimensionality reduction, outperforming PCA and random projection in terms of k-NN classification test error.
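Why this works: since M = sum_k w_k u_k u_k^T with w_k >= 0 (a convex combination of PSD bases), the map phi(x) = [sqrt(w_k) * (u_k . x)]_k satisfies phi(x) . phi(x') = S_M(x, x'), giving a K-dimensional embedding. A hedged sketch, where the (weights, U) storage format is an assumed interface:

    import numpy as np

    def embed(X, weights, U):
        # X: (n, d) data; weights: (K,) convex-combination coefficients w_k >= 0;
        # U: (K, d) rows u_k = e_i +/- e_j from the K selected bases, so that
        # M = sum_k w_k * u_k u_k^T. Returns an (n, K) embedding satisfying
        # phi(x) . phi(x') = S_M(x, x').
        return (X @ U.T) * np.sqrt(weights)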
References
[1] Bellet, A., Habrard, A., and Sebban, M. A Survey on Metric Learning for Feature Vectors and Structured Data. Technical report, arXiv:1306.6709, 2013.
[2] Jaggi, M. Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization. In ICML, 2013.
[3] Gao, X., et al. SOML: Sparse Online Metric Learning with Application to Image Retrieval. In AAAI, 2014.
[4] Schultz, M., and Joachims, T. Learning a Distance Metric from Relative Comparisons. In NIPS, 2004.
[5] Goldberger, J., Roweis, S., Hinton, G., and Salakhutdinov, R. Neighbourhood Components Analysis. In NIPS, 2004.
[6] Weinberger, K. Q., and Saul, L. K. Distance Metric Learning for Large Margin Nearest Neighbor Classification. JMLR, 2009.
[7] Kedem, D., Tyree, S., Weinberger, K., Sha, F., and Lanckriet, G. Non-linear Metric Learning. In NIPS, 2012.
[8] Shen, C., et al. Positive Semidefinite Metric Learning Using Boosting-like Algorithms. JMLR, 2012.
[9] Shi, Y., Bellet, A., and Sha, F. Sparse Compositional Metric Learning. In AAAI, 2014.
Kuan Liu, Aurélien Bellet, Fei Sha
Time/memory cost independent of the data dimensionality (d).
Learning sparse similarities efficiently based on Frank-Wolfe (FW).
i.e., only one similarity basis (two features) is added or removed at each iteration. This gives compact storage of M and efficient computation of the objective function, active constraints, etc.
Frank-Wolfe Algorithm [2]
Setting: assume the data are high dimensional and sparse: x_1, ..., x_n in R^d, with d large and each x_i having few non-zero features, plus triplet constraints T = {(x_t, y_t, z_t)}.
Goal: learn a bilinear similarity S_M(x, x') = x^T M x', with M constrained to D_lambda, the convex hull of the scaled basis set lambda*B.
Similarity basis: given a feature pair (i, j), B contains the 4-sparse rank-one matrices P^(ij) = (e_i + e_j)(e_i + e_j)^T and N^(ij) = (e_i - e_j)(e_i - e_j)^T.
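A minimal NumPy sketch of these bases and of the bilinear similarity they parameterize (function names are illustrative, not from the released code):

    import numpy as np

    def basis(i, j, d, sign=+1):
        # 4-sparse rank-one basis for the feature pair (i, j), i != j:
        # sign=+1 -> P^(ij) = (e_i + e_j)(e_i + e_j)^T,
        # sign=-1 -> N^(ij) = (e_i - e_j)(e_i - e_j)^T.
        u = np.zeros(d)
        u[i], u[j] = 1.0, float(sign)
        return np.outer(u, u)

    def similarity(M, x, xp):
        # Bilinear similarity S_M(x, x') = x^T M x'.
        return float(x @ M @ xp)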
Learning objective:
    min over M in D_lambda of f(M) = (1/|T|) * sum_{(x_t, y_t, z_t) in T} l(S_M(x_t, y_t) - S_M(x_t, z_t))    (1)
where l is the smoothed hinge loss, penalizing triplets for which x_t is not more similar to y_t than to z_t by a margin of 1.
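A naive dense sketch of one possible FW loop for (1), reusing basis from the sketch above. This is illustration only: the variable names and the triplets interface are assumptions, and the paper's implementation exploits data sparsity rather than the O(d^2) scan and dense M used here.

    import numpy as np

    def smoothed_hinge_grad(u):
        # Derivative of the smoothed hinge loss l(u):
        # l(u) = 0 if u >= 1; (1 - u)^2 / 2 if 0 < u < 1; 1/2 - u if u <= 0.
        if u >= 1.0:
            return 0.0
        if u <= 0.0:
            return -1.0
        return u - 1.0

    def frank_wolfe(X, triplets, lam, iters):
        # X: (n, d) data; triplets: list of index triples (t, y, z) meaning
        # "x_t should be more similar to x_y than to x_z".
        n, d = X.shape
        M = lam * basis(0, 1, d)  # start from an arbitrary vertex of D_lambda
        for k in range(iters):
            # Gradient of (1): G = (1/|T|) sum_t l'(u_t) * x_t (x_y - x_z)^T.
            G = np.zeros((d, d))
            for (t, y, z) in triplets:
                diff = X[y] - X[z]
                G += smoothed_hinge_grad(X[t] @ M @ diff) * np.outer(X[t], diff)
            G /= len(triplets)
            # Greedy step: pick the vertex lam * B of D_lambda minimizing <G, B>,
            # using <G, basis(i, j, d, s)> = G_ii + G_jj + s * (G_ij + G_ji).
            best_score, best_B = -np.inf, None
            for i in range(d):
                for j in range(i + 1, d):
                    for s in (+1, -1):
                        score = -(G[i, i] + G[j, j] + s * (G[i, j] + G[j, i]))
                        if score > best_score:
                            best_score, best_B = score, basis(i, j, d, s)
            # Convex update keeps M in D_lambda; standard step size from [2].
            gamma = 2.0 / (k + 2.0)
            M = (1.0 - gamma) * M + gamma * lam * best_B
        return M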
Sparsity: at any iteration k, M^(k) has rank at most k+1, with at most 4(k+1) non-zero entries, using at most 2(k+1) distinct features.
Convergence: let M* be an optimal solution to (1). With the standard FW step size gamma_k = 2/(k + 2), for each iteration k, f(M^(k)) - f(M*) <= 2*C_f/(k + 2), where C_f is the curvature constant of f [2], i.e., an O(1/k) rate.
Generalization: the number of iterations k gives a tradeoff between optimization error and model complexity. For each iteration k, the expected risk is bounded by the empirical risk plus a complexity term that grows with k (recall that M^(k) uses at most 2(k+1) distinct features) and shrinks with the number of training triplets.
Existing methods
Most algorithms learn a Mahalanobis distance
    d_M(x, x') = sqrt((x - x')^T M (x - x')),
where M is a D x D positive semi-definite (PSD) matrix. This is expensive in high dimensions: one must learn D^2 parameters and ensure that M stays PSD, which takes O(D^3) time.
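For reference, a one-function sketch of this standard distance (the dense M is exactly what makes the O(D^2) parameter count above unavoidable):

    import numpy as np

    def mahalanobis(M, x, xp):
        # d_M(x, x') = sqrt((x - x')^T M (x - x')); M must be PSD for this
        # to be a valid pseudo-metric. Storing M alone costs O(D^2) memory.
        diff = x - xp
        return float(np.sqrt(diff @ M @ diff))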
Current approaches:
Learn a diagonal matrix [3][4].
Explicit low-rank decomposition [5][6][7].
Rank-one matrix decomposition [8][9].
Low-dimensional projection-based methods.
Illustration: M as a convex combination of sparse rank-one bases, e.g. M = 0.125 B_1 + 0.25 B_2 + ... + 0.125 B_K.
In k-NN classification on very high-dimensional datasets, our method achieves lower test errors than other state-of-the-art similarity learning approaches.