-
Multiple Kernel Learning and Its Applications
Yen-Yu Lin, Research Center for Information Technology Innovation
Academia Sinica
-
Outline
Introduction
Motivation
Proposed Approach
Experimental Results
Conclusions
-
Computer Vision Researches
Goal of computer vision: establish systems that can see, perceive, and interpret the world like humans
Digit/Face recognition
handwritten digit recognition
face recognition
-
Face Detection
-
Image Clustering/Retrieval
shape / texture / color
-
People Counting
-
Object Recognition/Detection/Segmentation
Presence of bicycles? Yes
Presence of horses? No
Presence of persons? Yes
Person: (185,62,279,199)
Horse: (90,78,403,336)
recognition / detection / segmentation
-
The Problem
Many vision applications deal with data of multiple classes
Supervised learning (with labeled training data): object recognition and face detection
Unsupervised learning (with unlabeled training data): image clustering and data visualization
Semi-supervised learning (with partially labeled training data): retrieval with feedback and metric learning with side information
Difficulties
Diverse and broad data categories
Large intraclass variations
-
Diverse & Broad Categories
Caltech-101 database: 101 object categories
One additional category of background
-
Large Intraclass Variations
Pascal VOC Challenge
airplane
dog
bus
boat
-
Observation and the Main Theme
Feature representation = image descriptor + distance function
No single feature representation suffices to explain the complexity of the whole data
Improve the performance of vision applications by using multiple feature representations
-
Difficulties in Feature Representation Combination
Data have diverse forms
Histogram
Graph or constellation model
Tensor
Bag of feature vectors
-
Difficulties in Feature Representation Combination
Data are of high dimensions
Curse of dimensionality: for a given sample size, there is a maximum number of features above which the performance of the underlying predictor will degrade rather than improve
Computational issues
-
Outline
Introduction
Motivation
Proposed Approach
Experimental Results
Conclusions
-
Motivation
bag of features
histogram
2D matrix
1. diverse forms  2. high dimensions
a unified space of lower dimensions
-
Kernel as the Unified Representation
Multiple features result in diverse forms of representations Vector, histogram, bag-of-features, 2D matrix, high order tensor,
We transform data under each feature representation to a kernel matrix by
kinds of features will lead to kernels
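As a concrete sketch of this step (an illustration, not necessarily the exact kernels used in the talk), the snippet below builds one kernel matrix per representation: an RBF kernel for vector data and an exponentiated chi-square kernel for histograms.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """RBF kernel K[i, j] = exp(-gamma * ||x_i - x_j||^2) for vector data."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def chi2_kernel(H, gamma=1.0):
    """Exponentiated chi-square kernel, a common choice for histograms."""
    num = (H[:, None, :] - H[None, :, :]) ** 2
    den = H[:, None, :] + H[None, :, :] + 1e-12
    d = 0.5 * np.sum(num / den, axis=2)
    return np.exp(-gamma * d)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))            # vector representation
H = rng.random((5, 16))
H /= H.sum(axis=1, keepdims=True)      # histogram representation
kernels = [rbf_kernel(X), chi2_kernel(H)]   # one kernel per representation
```

Each representation, whatever its original form, ends up as an N x N kernel matrix, which is the unified interface the rest of the pipeline works with.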
-
Multiple Kernel Learning (MKL)
MKL: learning a kernel machine with multiple kernels
Introduced by [Cristianini et al. 01], [Lanckriet et al. 02], [Bach et al. 04]
With data {x_i}_{i=1..N} and base kernels {k_m}_{m=1..M}, the learned model is of the form
  f(x) = sum_i alpha_i * sum_m beta_m * k_m(x_i, x) + b,  with beta_m >= 0
Task of MKL: optimize both the sample coefficients {alpha_i} and the kernel weights {beta_m}
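A minimal sketch of evaluating such a learned model (names hypothetical; the coefficients and weights are assumed to have been learned already):

```python
import numpy as np

def mkl_decision(alpha, b, beta, kernel_cols):
    """f(x) = sum_i alpha_i * sum_m beta_m * k_m(x_i, x) + b.
    kernel_cols[m][i] holds k_m(x_i, x) for the m-th base kernel."""
    ensemble_col = sum(b_m * col for b_m, col in zip(beta, kernel_cols))
    return float(alpha @ ensemble_col + b)

# toy example: two base kernels, three training samples
alpha = np.array([0.5, -0.25, 0.1])
beta = np.array([0.7, 0.3])
cols = [np.array([1.0, 0.2, 0.1]), np.array([0.9, 0.3, 0.0])]
val = mkl_decision(alpha, 0.0, beta, cols)
```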
-
MKL and Feature Fusion
Represent data under each descriptor by a kernel matrix
Feature fusion = learning an ensemble kernel K = sum_m beta_m * K_m
Fusion is carried out in the domain of kernel matrices
beta_m: the importance of descriptor m
Example descriptors: Gabor wavelet, color histogram, SIFT
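The fusion step amounts to a weighted sum of per-descriptor kernel matrices; a minimal sketch (nonnegative weights keep the ensemble kernel positive semidefinite):

```python
import numpy as np

def fuse_kernels(kernels, beta):
    """Ensemble kernel K = sum_m beta_m * K_m; beta_m >= 0 preserves PSD-ness."""
    beta = np.asarray(beta, dtype=float)
    if np.any(beta < 0):
        raise ValueError("kernel weights must be nonnegative")
    return sum(b * K for b, K in zip(beta, kernels))

K1 = np.eye(3)                 # stand-ins for two descriptor kernels
K2 = np.ones((3, 3))
K = fuse_kernels([K1, K2], [0.5, 0.5])
```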
-
Motivation
bag of features
histogram
2D matrix
1. diverse forms  2. high dimensions
a unified space of lower dimensions
-
Which Dimensionality Reduction (DR) Method?
Unsupervised DR methods
PCA: principal component analysis
LPP: locality preserving projections [He & Niyogi 03]
Supervised DR methods
LDA: linear discriminant analysis
LDE: local discriminant embedding [Chen et al. 05]
Semi-supervised DR methods
ARE: augmented relation embedding [Lin et al. 05]
SDA: semi-supervised discriminant analysis [Cai et al. 07]
Graph embedding: a unified view of many DR methods
-
Graph Embedding
Graph embedding [Yan et al. 07]
By specifying a particular affinity matrix W and constraint matrix W', a set of DR methods can be expressed by graph embedding:
  V* = argmin_V  sum_{i,j} ||V^T x_i - V^T x_j||^2 * w_ij
  subject to     sum_{i,j} ||V^T x_i - V^T x_j||^2 * w'_ij = 1
where w_ij and w'_ij are the entries of W and W'.

             supervised    unsupervised    semi-supervised
  Gaussian   LDA           PCA             SDA
  manifold   LDE / MFA     LPP             ARE
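In the linear case, graph embedding reduces to a generalized eigenvalue problem. A sketch under assumed heat-kernel affinities and a complete constraint graph (both illustrative choices, not a specific method from the talk):

```python
import numpy as np
from scipy.linalg import eigh

def graph_embedding(X, W, Wp, dim=2):
    """Solve min_V tr(V^T X L X^T V) s.t. V^T X L' X^T V = I,
    with graph Laplacians L = D - W and L' = D' - W'.
    X is d x N (one column per sample)."""
    L = np.diag(W.sum(axis=1)) - W
    Lp = np.diag(Wp.sum(axis=1)) - Wp
    A = X @ L @ X.T
    B = X @ Lp @ X.T + 1e-6 * np.eye(X.shape[0])  # regularize for stability
    vals, vecs = eigh(A, B)           # generalized eigenproblem, ascending
    return vecs[:, :dim]              # directions minimizing the objective

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 20))                        # 6-dim features, 20 samples
D2 = ((X.T[:, None, :] - X.T[None, :, :]) ** 2).sum(-1)
W = np.exp(-D2)                                     # heat-kernel affinities
np.fill_diagonal(W, 0.0)
Wp = np.ones((20, 20)) - np.eye(20)                 # complete constraint graph
V = graph_embedding(X, W, Wp, dim=2)
```

Different choices of W and W' recover the different DR methods in the table above.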
-
Outline
Introduction
Motivation
Proposed Approach
Experimental Results
Conclusions
-
Main Idea
diverse forms → multiple kernel learning
high dimensions → graph embedding
1/2
-
Main Idea
diverse forms → multiple kernel learning
high dimensions → graph embedding
combined: MKL-DR
2/2
-
On Integrating MKL into Graph Embedding
MKL-DR: integrate multiple kernel learning (MKL) into training process of dimensionality reduction (DR) methods
1. The ensemble kernel is a linear combination of the base ones
2. Data are mapped to the induced RKHS
3. Prove that each projection vector lies in the span of the mapped data
4. Prove that all the operations in graph embedding can be accomplished via the kernel trick
-
Constrained Optimization for MKL-DR
The resulting constrained optimization problem is
  min_{A, beta}  sum_{i,j} ||A^T K^(i) beta - A^T K^(j) beta||^2 * w_ij
  subject to     sum_{i,j} ||A^T K^(i) beta - A^T K^(j) beta||^2 * w'_ij = 1
                 beta_m >= 0, m = 1, ..., M
where K^(i) is the N x M matrix with entries K^(i)[n, m] = k_m(x_n, x_i), the columns of A are the sample-coefficient vectors of the projection, and beta = [beta_1, ..., beta_M]^T collects the kernel weights
-
Optimization of MKL-DR
An alternating optimization procedure
On optimizing A with beta fixed:
the optimal A is obtained by solving a generalized eigenvalue problem
On optimizing beta with A fixed:
a non-convex QCQP (quadratically constrained quadratic program)
solved via SDP (semidefinite programming) relaxation
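The alternation can be sketched as follows. Note that the beta-step below is a simple normalized heuristic standing in for the SDP-relaxed QCQP that MKL-DR actually solves; only the overall structure and the A-step (a generalized eigenvalue problem) follow the slides.

```python
import numpy as np
from scipy.linalg import eigh

def mkl_dr_alternate(kernels, W, Wp, dim=2, iters=5):
    """Schematic alternating loop for MKL-DR-style training.
    A-step: generalized eigenvalue problem (as in the slides).
    beta-step: placeholder heuristic, NOT the SDP relaxation."""
    M, N = len(kernels), kernels[0].shape[0]
    beta = np.full(M, 1.0 / M)
    L = np.diag(W.sum(axis=1)) - W          # graph Laplacians
    Lp = np.diag(Wp.sum(axis=1)) - Wp
    for _ in range(iters):
        K = sum(b * Km for b, Km in zip(beta, kernels))   # ensemble kernel
        S = K @ L @ K + 1e-6 * np.eye(N)
        Sp = K @ Lp @ K + 1e-6 * np.eye(N)
        _, vecs = eigh(S, Sp)
        A = vecs[:, :dim]                                 # A-step (GEP)
        # placeholder beta-step: favor kernels with a small graph objective
        obj = np.array([np.trace(A.T @ Km @ L @ Km @ A) for Km in kernels])
        beta = 1.0 / (obj + 1e-12)
        beta /= beta.sum()
    return A, beta

rng = np.random.default_rng(1)
N = 12
def toy_kernel():
    G = rng.normal(size=(N, 4))
    return G @ G.T + np.eye(N)              # random PSD kernel
kernels = [toy_kernel(), toy_kernel()]
B = rng.random((N, N))
W = (B + B.T) / 2
np.fill_diagonal(W, 0.0)                    # random affinity graph
Wp = np.ones((N, N)) - np.eye(N)            # complete constraint graph
A, beta = mkl_dr_alternate(kernels, W, Wp)
```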
-
Training Phase of MKL-DR
Graph Laplacian: unified representation of DR methods
Kernel matrix: unified representation of descriptors
MKL-DR: optimize beta by SDP; optimize A by GEP
-
Testing Phase of MKL-DR
Image descriptors → feature spaces → RKHSs → Euclidean space
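In the testing phase, a new sample is projected via its base-kernel values against the training set, z = A^T K(x) beta. A minimal sketch (names hypothetical):

```python
import numpy as np

def project_sample(A, beta, kernel_values):
    """z = A^T (K(x) beta), where kernel_values[m][n] = k_m(x_n, x)
    against the N training samples, and A is N x dim."""
    Kx = np.column_stack(kernel_values)   # N x M
    return A.T @ (Kx @ beta)

A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])   # N=3, dim=2
beta = np.array([0.5, 0.5])
vals = [np.array([0.2, 0.4, 0.6]), np.array([0.0, 0.2, 0.4])]
z = project_sample(A, beta, vals)
```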
-
Impacts
From the perspective of DR methods:
Many existing DR methods can consider multiple kernels (features)
E.g., PCA --> Kernel PCA --> MKL PCA
Systematic feature selection across different spaces
From the perspective of the MKL framework:
From hinge loss to diverse objective functions of DR methods
E.g., maximizing the projected variances in PCA
Extend MKL from supervised applications to unsupervised and semi-supervised ones
-
Outline
Introduction
Motivation
Proposed Approach
Experimental Results
Conclusions
-
Supervised Object Categorization - Dataset
Caltech-101 dataset
Multi-class classification problem (102 classes)
-
Supervised Object Categorization - Input
Ten kernels (descriptors)
GB / GB-Dist: based on the geometric blur descriptor
SIFT-Dist / SIFT-SPM: based on SIFT descriptor
SS-Dist / SS-SPM: based on self-similarity descriptor
C2-SWP / C2-ML: based on biologically inspired features
PHOG: based on PHOG descriptor
GIST: based on GIST descriptor
Dimensionality reduction method
Local discriminant embedding (LDE) [Chen et al. 05]
-
Supervised Object Categorization - Results
Nearest neighbor rule for classification
1/2
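The nearest-neighbor rule used for classification here can be sketched as follows (features in the learned projected space assumed):

```python
import numpy as np

def nn_classify(Z_train, y_train, Z_test):
    """1-NN rule: assign each test point the label of its closest
    training point (squared Euclidean distance in the projected space)."""
    d2 = ((Z_test[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=-1)
    return y_train[d2.argmin(axis=1)]

Z_train = np.array([[0.0, 0.0], [1.0, 1.0]])
y_train = np.array([0, 1])
Z_test = np.array([[0.1, -0.1], [0.9, 1.2]])
pred = nn_classify(Z_train, y_train, Z_test)
```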
-
Object Recognition - Results
single feature
multiple features
2/2
-
Unsupervised Image Clustering - Dataset
20 classes from Caltech-101 dataset
-
Unsupervised Image Clustering - Input
Ten kernels
GB / GB-Dist: based on the geometric blur descriptor
SIFT-Dist / SIFT-SPM: based on SIFT descriptor
SS-Dist / SS-SPM: based on self-similarity descriptor
C2-SWP / C2-ML: based on biologically inspired features
PHOG: based on PHOG descriptor
GIST: based on GIST descriptor
Dimensionality reduction method
Locality preserving projections (LPP) [He & Niyogi 03]
-
Unsupervised Image Clustering - Results
2-D visualization of the projected space
kernel LPP with kernel GB-Dist
kernel LPP with kernel GIST
MKL-LPP with all the ten kernels
1/2
-
Unsupervised Image Clustering - Results
Affinity propagation [Frey and Dueck 07]
Performance evaluation [NMI / ACC %]
NMI: normalized mutual information
ACC: accuracy rate
2/2
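These two measures can be computed as follows (a sketch: ACC uses Hungarian matching between clusters and classes, NMI is derived from the joint label distribution):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_acc(y_true, y_pred):
    """Accuracy under the best one-to-one cluster-to-class assignment."""
    D = max(y_true.max(), y_pred.max()) + 1
    hits = np.zeros((D, D), dtype=int)
    for t, p in zip(y_true, y_pred):
        hits[p, t] += 1
    row, col = linear_sum_assignment(-hits)   # maximize matched counts
    return hits[row, col].sum() / len(y_true)

def nmi(y_true, y_pred):
    """Normalized mutual information between two labelings."""
    eps = 1e-12
    joint = np.zeros((y_true.max() + 1, y_pred.max() + 1))
    for t, p in zip(y_true, y_pred):
        joint[t, p] += 1
    joint /= len(y_true)
    pt, pp = joint.sum(axis=1), joint.sum(axis=0)
    mi = np.sum(joint * np.log((joint + eps) / (np.outer(pt, pp) + eps)))
    ht = -np.sum(pt * np.log(pt + eps))
    hp = -np.sum(pp * np.log(pp + eps))
    return mi / np.sqrt(ht * hp)
```

A permuted-but-perfect clustering scores 1.0 on both measures, which is why the matching step in ACC is needed.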
-
Semi-supervised Face Recognition - Dataset
CMU PIE face database
We divide the 68 subjects into four equal-size groups
Variations: lighting, rotation, occlusion, profile
-
Semi-supervised Face Recognition - Input
Four descriptors (kernels)
DeLight: based on the delighting algorithm by [Gross et al. 03]
LBP: a rotation-invariant operator [Ojala et al. 00]
RsLTS: least trimmed squares with 20% outliers allowed
RsL2: pixel intensities with Euclidean distance
Dimensionality reduction method
Semi-supervised discriminant analysis (SDA) [Cai et al. 07]
-
Semi-supervised Face Recognition - Input
Example face images under the DeLight and LBP representations
-
Semi-supervised Face Recognition - Results
Nearest neighbor rule for classification
-
Outline
Introduction
Motivation
Proposed Approach
Experimental Results
Conclusions
-
Conclusions
MKL-DR provides a unified and compact view of data with multiple feature representations
Applied to a broad set of vision applications
A general framework for data analysis:
Adopt a graph-based dimensionality reduction method
Choose a proper set of features
Diverse objective functions of MKL:
Extend MKL to unsupervised and semi-supervised learning problems
Generalize many existing DR methods to consider multiple kernels
-
Thank You
Email: [email protected]
Yen-Yu Lin
Tel: (02) 2787-2392