
  • Multiple Kernel Learning and Its Applications

    Yen-Yu Lin (林彥宇), Research Center for Information Technology Innovation

    Academia Sinica

  • Outline

    Introduction

    Motivation

    Proposed Approach

    Experimental Results

    Conclusions

  • Computer Vision Research

    Goal of computer vision: build systems that can see, perceive, and interpret the world as humans do

    Digit/face recognition

    handwritten digit recognition

    face recognition

  • Face Detection


  • Image Clustering/Retrieval

    shape, texture, color

  • People Counting


  • Object Recognition/Detection/Segmentation


    Presence of bicycles? Yes

    Presence of horses? No

    Presence of persons? Yes

    Person: (185,62,279,199)

    Horse: (90,78,403,336)

    recognition / detection / segmentation

  • The Problem

    Many vision applications deal with data of multiple classes

    Supervised learning (with labeled training data): object recognition and face detection

    Unsupervised learning (with unlabeled training data): image clustering and data visualization

    Semi-supervised learning (with partially labeled training data): retrieval with feedback and metric learning with side information

    Difficulties: diverse and broad data categories; large intraclass variations

  • Diverse & Broad Categories

    Caltech-101 database: 101 object categories

    One additional category of background

  • Large Intraclass Variations

    Pascal VOC Challenge

    airplane

    dog

    bus

    boat

  • Observation and The Main Theme

    Feature representation: image descriptor and distance function

    No single feature representation suffices to capture the complexity of the whole data set

    Improve the performance of vision applications by using multiple feature representations

  • Difficulties in Feature Representation Combination

    Data have diverse forms

    Histogram

    Graph or constellation model

    Tensor

    Bag of feature vectors

  • Difficulties in Feature Representation Combination

    Data are of high dimensions

    Curse of dimensionality: for a given sample size, there is a maximum number of features above which the performance of the underlying predictor will degrade rather than improve

    Computational issues

    (figure: predictor performance versus dimensionality)

  • Outline

    Introduction

    Motivation

    Proposed Approach

    Experimental Results

    Conclusions

  • Motivation

    bag of features

    histogram

    2D matrix

    1. diverse forms; 2. high dimensions

    a unified space of lower dimensions

  • Kernel as the Unified Representation

    Multiple features result in diverse forms of representations: vector, histogram, bag of features, 2D matrix, high-order tensor, etc.

    We transform the data under each feature representation into a kernel matrix via a kernel function defined on that representation (a small sketch follows this slide)

    M kinds of features lead to M base kernels
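    As an illustration of this step, here is a minimal Python sketch; the RBF-on-distance construction, the default bandwidth, and the names distance_to_kernel and pairwise_distances_per_feature are assumptions for illustration, not details from the talk.

    import numpy as np

    def distance_to_kernel(D, gamma=None):
        # D: N x N pairwise-distance matrix computed under one feature representation
        # (histogram distance, bag-of-features matching cost, etc.)
        D2 = D ** 2
        if gamma is None:
            gamma = 1.0 / np.mean(D2)          # a common default bandwidth
        K = np.exp(-gamma * D2)                # RBF-style kernel built from distances
        # for non-Euclidean distances, K may need regularization to stay positive semidefinite
        return K

    # M kinds of features -> M base kernel matrices of the same size N x N
    # base_kernels = [distance_to_kernel(D) for D in pairwise_distances_per_feature]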

  • Multiple Kernel Learning (MKL)

    MKL: learning a kernel machine with multiple base kernels, introduced by [Cristianini et al. 01], [Lanckriet et al. 02], [Bach et al. 04]

    With data $\{x_n\}_{n=1}^{N}$ and base kernels $\{k_m\}_{m=1}^{M}$, the learned model is of the form $f(x) = \sum_{n=1}^{N} \alpha_n \sum_{m=1}^{M} \beta_m k_m(x, x_n) + b$, with $\beta_m \ge 0$

    Task of MKL: optimize both the sample coefficients $\{\alpha_n\}$ and the kernel weights $\{\beta_m\}$

  • MKL and Feature Fusion

    Represent data under each descriptor by a kernel matrix

    Feature fusion = learning an ensemble kernel; the fusion is carried out in the domain of kernel matrices:

    $K = \sum_{m=1}^{M} \beta_m K_m, \qquad \beta_m \ge 0$

    $\beta_m$: the importance of descriptor $m$ (e.g., Gabor wavelet, color histogram, SIFT)
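    A minimal sketch of this fusion step; the function name ensemble_kernel and the equal-weight example are illustrative assumptions.

    import numpy as np

    def ensemble_kernel(base_kernels, beta):
        # base_kernels: list of M kernel matrices (N x N), one per descriptor
        # beta: M nonnegative weights, one per descriptor
        beta = np.asarray(beta, dtype=float)
        assert np.all(beta >= 0), "kernel weights are constrained to be nonnegative"
        return sum(b * K for b, K in zip(beta, base_kernels))

    # Example: start from equal weights over the M descriptors
    # K = ensemble_kernel(base_kernels, np.ones(len(base_kernels)) / len(base_kernels))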

  • Motivation

    bag of features

    histogram

    2D matrix

    1. diverse forms; 2. high dimensions

    a unified space of lower dimensions

  • Which Dimensionality Reduction (DR) Method?

    Unsupervised DR methods

    PCA: principal component analysis

    LPP: locality preserving projections [He & Niyogi 03]

    Supervised DR methods

    LDA: linear discriminant analysis

    LDE: local discriminant embedding [Chen et al. 05]

    Semi-supervised DR methods

    ARE: augmented relation embedding [Lin et al. 05]

    SDA: semi-supervised discriminant analysis [Cai et al. 07]

    Graph embedding: a unified view of many DR methods

  • Graph Embedding

    Graph embedding [Yan et al. 07]: find the projection $v$ by solving

    $\min_{v}\ \sum_{i,j} w_{ij}\, \| v^\top x_i - v^\top x_j \|^2 \quad \text{subject to} \quad \sum_{i,j} w'_{ij}\, \| v^\top x_i - v^\top x_j \|^2 = 1$

    where $W = [w_{ij}]$ and $W' = [w'_{ij}]$ are the affinity matrices of two graphs defined over the data

    By specifying particular $W$ and $W'$, a set of DR methods can be expressed by graph embedding, including

                  supervised    unsupervised    semi-supervised
      Gaussian    LDA           PCA             SDA
      manifold    LDE / MFA     LPP             ARE
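    As a standard identity (not specific to this talk), the pairwise objective above can be rewritten with graph Laplacians, which is what the generalized eigenvalue step later relies on:

    $\sum_{i,j} w_{ij}\, \| v^\top x_i - v^\top x_j \|^2 = 2\, v^\top X L X^\top v, \qquad L = D - W,\quad D = \mathrm{diag}(W\mathbf{1}),$

    so the solution is given by the generalized eigenvalue problem $X L X^\top v = \lambda\, X L' X^\top v$ (smallest eigenvalues), with $L'$ the Laplacian of $W'$.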

  • Outline

    Introduction

    Motivation

    Proposed Approach

    Experimental Results

    Conclusions

  • Main Idea

    diverse forms → multiple kernel learning

    high dimensions → graph embedding

    1/2

  • Main Idea

    high dimensions → graph embedding

    diverse forms → multiple kernel learning

    MKL-DR

    2/2

  • On Integrating MKL into Graph Embedding

    MKL-DR: integrate multiple kernel learning (MKL) into training process of dimensionality reduction (DR) methods

    1. The ensemble kernel is the linear combination of base ones

    2. Data are mapped to the induced RKHS

    3. Prove that each projection vector lies in the span of the mapped data

    4. Prove that all the operations in graph embedding can be accomplished via the kernel trick

  • Constrained Optimization for MKL-DR

    The resulting constrained optimization problem is

    $\min_{A,\ \beta}\ \sum_{i,j} w_{ij}\, \| A^\top \mathbb{K}^{(i)} \beta - A^\top \mathbb{K}^{(j)} \beta \|^2 \quad \text{subject to} \quad \sum_{i,j} w'_{ij}\, \| A^\top \mathbb{K}^{(i)} \beta - A^\top \mathbb{K}^{(j)} \beta \|^2 = 1,\ \ \beta \ge 0$

    where $\mathbb{K}^{(i)} \in \mathbb{R}^{N \times M}$ has entries $\mathbb{K}^{(i)}_{n,m} = k_m(x_n, x_i)$, the columns of $A \in \mathbb{R}^{N \times P}$ hold the sample coefficients of the projection vectors, $\beta \in \mathbb{R}^{M}$ holds the kernel weights, and $W$, $W'$ are the affinity matrices specified by the adopted DR method

  • Optimization of MKL-DR

    An alternating optimization procedure

    On optimizing $A$ with $\beta$ fixed:

    The optimal $A$ is obtained by solving a generalized eigenvalue problem

    On optimizing $\beta$ with $A$ fixed:

    Non-convex QCQP (quadratically constrained quadratic programming)

    SDP (semi-definite programming)-relaxation
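    To make the A-step concrete, here is a minimal NumPy/SciPy sketch of optimizing A with the kernel weights fixed, under the formulation reconstructed above; the names a_step, kernels, W, W_prime and the small ridge term are illustrative assumptions, not the author's code.

    import numpy as np
    from scipy.linalg import eigh

    def a_step(kernels, beta, W, W_prime, num_dims):
        # kernels: list of M symmetric N x N base kernel matrices
        # beta: M nonnegative kernel weights (held fixed in this step)
        # W, W_prime: N x N affinity matrices of the adopted DR method
        N = kernels[0].shape[0]
        # Column i of U equals K^(i) beta, since (sum_m beta_m K_m)[:, i] = sum_m beta_m k_m(., x_i)
        U = sum(b * K for b, K in zip(beta, kernels))

        def scatter(Aff):
            # sum_{i,j} a_ij (u_i - u_j)(u_i - u_j)^T = 2 U (D - Aff) U^T, with D = diag(row sums)
            L = np.diag(Aff.sum(axis=1)) - Aff
            return 2.0 * U @ L @ U.T

        S_w, S_wp = scatter(W), scatter(W_prime)
        # Minimizing the objective under the W' constraint -> smallest generalized eigenvalues;
        # a small ridge keeps the right-hand matrix positive definite for eigh.
        vals, vecs = eigh(S_w, S_wp + 1e-8 * np.eye(N))
        return vecs[:, :num_dims]  # columns: sample coefficients of the projection vectors (A)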

  • Training Phase of MKL-DR

    Graph Laplacian: the unified representation of DR methods

    Kernel matrices: the unified representation of descriptors

    MKL-DR alternately optimizes the kernel weights by SDP and the projection coefficients by GEP

  • Testing Phase of MKL-DR

    Feature space of each image descriptor → RKHS → Euclidean space
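    Following the reconstructed formulation above, projecting a test sample reduces to one small computation; this sketch and the name project_test_sample are illustrative assumptions.

    import numpy as np

    def project_test_sample(kernel_values, A, beta):
        # kernel_values: N x M matrix with entry [n, m] = k_m(x_n, x_test),
        #   the test sample evaluated against the N training samples under each of the M kernels
        # A: N x P sample-coefficient matrix learned by MKL-DR
        # beta: M learned kernel weights
        return A.T @ (kernel_values @ beta)  # P-dimensional embedding of the test sample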

  • Impacts

    From the perspective of DR methods

    Many existing DR methods can consider multiple kernels (features)

    E.g., PCA --> Kernel PCA --> MKL PCA

    Systematic feature selection across different spaces

    From the perspective of the MKL framework

    From hinge loss to the diverse objective functions of DR methods

    E.g., maximizing the projected variances in PCA

    Extend MKL from supervised applications to unsupervised and semi-supervised ones

  • Outline

    Introduction

    Motivation

    Proposed Approach

    Experimental Results

    Conclusions

  • Supervised Object Categorization - Dataset

    Caltech-101 dataset

    Multi-class classification problem (102 classes)

  • Supervised Object Categorization - Input

    Ten kernels (descriptors)

    GB / GB-Dist: based on the geometric blur descriptor

    SIFT-Dist / SIFT-SPM: based on SIFT descriptor

    SS-Dist / SS-SPM: based on self-similarity descriptor

    C2-SWP / C2-ML: based on biologically inspired features

    PHOG: based on PHOG descriptor

    GIST: based on GIST descriptor

    Dimensionality reduction method

    Local discriminant embedding (LDE) [Chen et al. 05]

  • Supervised Object Categorization - Results

    Nearest neighbor rule for classification

    1/2
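    For reference, a minimal sketch of this evaluation protocol in the projected space; the embedding variable names and the use of scikit-learn are assumptions, not the talk's actual setup.

    from sklearn.neighbors import KNeighborsClassifier

    # train_embed, test_embed: MKL-DR embeddings of training and test images (rows = samples)
    def nearest_neighbor_accuracy(train_embed, train_labels, test_embed, test_labels):
        clf = KNeighborsClassifier(n_neighbors=1).fit(train_embed, train_labels)
        return clf.score(test_embed, test_labels)  # fraction of correctly classified test images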

  • Object Recognition - Results

    single feature

    multiple features

    1/2

  • Unsupervised Image Clustering - Dataset

    20 classes from Caltech-101 dataset

  • Unsupervised Image Clustering - Input

    Ten kernels

    GB / GB-Dist: based on the geometric blur descriptor

    SIFT-Dist / SIFT-SPM: based on SIFT descriptor

    SS-Dist / SS-SPM: based on self-similarity descriptor

    C2-SWP / C2-ML: based on biologically inspired features

    PHOG: based on PHOG descriptor

    GIST: based on GIST descriptor

    Dimensionality reduction method

    Locality preserving projections (LPP) [He & Niyogi 03]

  • Unsupervised Image Clustering - Results

    2-D visualization of the projected space

    kernel LPP with the GB-Dist kernel

    kernel LPP with the GIST kernel

    MKL-LPP with all ten kernels

    1/2

  • Unsupervised Image Clustering - Results

    Affinity propagation [Frey and Dueck 07]

    Performance evaluation [NMI / ACC %] (a small evaluation sketch follows this slide)

    NMI: normalized mutual information

    ACC: accuracy rate

    2/2
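    A minimal sketch of computing these two clustering scores; normalized_mutual_info_score is scikit-learn's implementation of NMI, while the ACC helper below (Hungarian matching of clusters to classes) is an assumed, commonly used definition rather than the talk's exact code.

    import numpy as np
    from scipy.optimize import linear_sum_assignment
    from sklearn.metrics import normalized_mutual_info_score

    def clustering_accuracy(labels_true, labels_pred):
        # ACC: best one-to-one matching between clusters and classes (Hungarian algorithm)
        labels_true, labels_pred = np.asarray(labels_true), np.asarray(labels_pred)
        classes, clusters = np.unique(labels_true), np.unique(labels_pred)
        counts = np.zeros((len(clusters), len(classes)), dtype=int)
        for i, c in enumerate(clusters):
            for j, k in enumerate(classes):
                counts[i, j] = np.sum((labels_pred == c) & (labels_true == k))
        rows, cols = linear_sum_assignment(-counts)  # maximize the matched counts
        return counts[rows, cols].sum() / len(labels_true)

    # nmi = normalized_mutual_info_score(labels_true, labels_pred)
    # acc = clustering_accuracy(labels_true, labels_pred)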

  • Semi-supervised Face Recognition - Dataset

    CMU PIE face database

    We divide the 68 subjects into four equal-size groups

    The four groups: lighting, rotation, occlusion, and profile

  • Semi-supervised Face Recognition - Input

    Four descriptors (kernels)

    DeLight: based on the delighting algorithm of [Gross et al. 03]

    LBP: A rotation-invariant operator [Ojala et al. 00]

    RsLTS: Least trimmed squares with 20% outliers allowed

    RsL2: Pixel intensities with Euclidean distance

    Dimensionality reduction method

    Semi-supervised discriminant analysis (SDA) [Cai et al. 07]

  • Semi-supervised Face Recognition - Input

    (example face images under the DeLight and LBP representations)

  • Semi-supervised Face Recognition - Results

    Nearest neighbor rule for classification

  • Outline

    Introduction

    Motivation

    Proposed Approach

    Experimental Results

    Conclusions

  • Conclusions

    MKL-DR provides a unified and compact view of data with multiple feature representations

    Applied to a broad set of vision applications

    A general framework for data analysis

    Adopt a graph-based dimensionality reduction method

    Choose a proper set of features

    Diverse objective functions of MKL

    Extend MKL to unsupervised and semi-supervised learning problems

    Generalize many existing DR methods to consider multiple kernels

  • Thank You

    Email: [email protected]

    Yen-Yu Lin (林彥宇)

    Tel: (02) 2787-2392