
  • Multiple Kernel Learning and Its Applications

    Yen-Yu Lin (林彥宇), Research Center for Information Technology Innovation

    Academia Sinica

  • Outline

    Introduction

    Motivation

    Proposed Approach

    Experimental Results

    Conclusions

  • Computer Vision Research

    Goal of computer vision: build systems that can see, perceive, and interpret the world as humans do

    Digit/face recognition

    handwritten digit recognition

    face recognition

  • Face Detection


  • Image Clustering/Retrieval

    shape, texture, color

  • People Counting


  • Object Recognition/Detection/Segmentation


    Presence of bicycles? Yes

    Presence of horses? No

    Presence of persons? Yes

    Person: (185,62,279,199)

    Horse: (90,78,403,336)

    recognition / detection / segmentation

  • The Problem

    Many vision applications deal with data of multiple classes

    Supervised learning (with labeled training data): object recognition and face detection

    Unsupervised learning (with unlabeled training data): image clustering and data visualization

    Semi-supervised learning (with partially labeled training data): retrieval with feedback and metric learning with side information

    Difficulties: diverse and broad data categories; large intraclass variations

  • Diverse & Broad Categories

    Caltech-101 database: 101 object categories

    One additional category of background

  • Large Intraclass Variations

    Pascal VOC Challenge

    airplane

    dog

    bus

    boat

  • Observation and The Main Theme

    Feature representation: image descriptor and distance function

    No single feature representation suffices to capture the complexity of the whole data set

    Improve the performance of vision applications by using multiple feature representations

  • Difficulties in Feature Representation Combination

    Data have diverse forms

    Histogram

    Graph or constellation model

    Tensor

    Bag of feature vectors

  • Difficulties in Feature Representation Combination

    Data are of high dimensions

    Curse of dimensionality: for a given sample size, there is a maximum number of features above which the performance of the underlying predictor will degrade rather than improve

    Computational issues

    (figure: predictor performance versus dimensionality)

  • Outline

    Introduction

    Motivation

    Proposed Approach

    Experimental Results

    Conclusions

  • Motivation

    bag of features

    histogram

    2D matrix

    1. diverse forms; 2. high dimensions

    a unified space of lower dimensions

  • Kernel as the Unified Representation

    Multiple features result in diverse forms of representations: vector, histogram, bag of features, 2D matrix, high-order tensor, etc.

    We transform the data under each feature representation into a kernel matrix via a kernel function defined on that representation (a small sketch follows this slide)

    M kinds of features lead to M base kernels
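    As an illustration of this step, here is a minimal Python sketch; the RBF-on-distance construction, the default bandwidth, and the names distance_to_kernel and pairwise_distances_per_feature are assumptions for illustration, not details from the talk.

    import numpy as np

    def distance_to_kernel(D, gamma=None):
        # D: N x N pairwise-distance matrix computed under one feature representation
        # (histogram distance, bag-of-features matching cost, etc.)
        D2 = D ** 2
        if gamma is None:
            gamma = 1.0 / np.mean(D2)          # a common default bandwidth
        K = np.exp(-gamma * D2)                # RBF-style kernel built from distances
        # for non-Euclidean distances, K may need regularization to stay positive semidefinite
        return K

    # M kinds of features -> M base kernel matrices of the same size N x N
    # base_kernels = [distance_to_kernel(D) for D in pairwise_distances_per_feature]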

  • Multiple Kernel Learning (MKL)

    MKL: learning a kernel machine with multiple base kernels, introduced by [Cristianini et al. 01], [Lanckriet et al. 02], [Bach et al. 04]

    With data $\{x_n\}_{n=1}^{N}$ and base kernels $\{k_m\}_{m=1}^{M}$, the learned model is of the form $f(x) = \sum_{n=1}^{N} \alpha_n \sum_{m=1}^{M} \beta_m k_m(x, x_n) + b$, with $\beta_m \ge 0$

    Task of MKL: optimize both the sample coefficients $\{\alpha_n\}$ and the kernel weights $\{\beta_m\}$

  • MKL and Feature Fusion

    Represent data under each descriptor by a kernel matrix

    Feature fusion = learning an ensemble kernel; the fusion is carried out in the domain of kernel matrices:

    $K = \sum_{m=1}^{M} \beta_m K_m, \qquad \beta_m \ge 0$

    $\beta_m$: the importance of descriptor $m$ (e.g., Gabor wavelet, color histogram, SIFT)
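    A minimal sketch of this fusion step; the function name ensemble_kernel and the equal-weight example are illustrative assumptions.

    import numpy as np

    def ensemble_kernel(base_kernels, beta):
        # base_kernels: list of M kernel matrices (N x N), one per descriptor
        # beta: M nonnegative weights, one per descriptor
        beta = np.asarray(beta, dtype=float)
        assert np.all(beta >= 0), "kernel weights are constrained to be nonnegative"
        return sum(b * K for b, K in zip(beta, base_kernels))

    # Example: start from equal weights over the M descriptors
    # K = ensemble_kernel(base_kernels, np.ones(len(base_kernels)) / len(base_kernels))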

  • Motivation

    bag of features

    histogram

    2D matrix

    1. diverse forms; 2. high dimensions

    a unified space of lower dimensions

  • Which Dimensionality Reduction (DR) Method?

    Unsupervised DR methods

    PCA: principal component analysis

    LPP: locality preserving projections [He & Niyogi 03]

    Supervised DR methods

    LDA: linear discriminant analysis

    LDE: local discriminant embedding [Chen et al. 05]

    Semi-supervised DR methods

    ARE: augmented relation embedding [Lin et al. 05]

    SDA: semi-supervised discriminant analysis [Cai et al. 07]

    Graph embedding: a unified view of many DR methods

  • Graph Embedding

    Graph embedding [Yan et al. 07]: find the projection $v$ by solving

    $\min_{v}\ \sum_{i,j} w_{ij}\, \| v^\top x_i - v^\top x_j \|^2 \quad \text{subject to} \quad \sum_{i,j} w'_{ij}\, \| v^\top x_i - v^\top x_j \|^2 = 1$

    where $W = [w_{ij}]$ and $W' = [w'_{ij}]$ are the affinity matrices of two graphs defined over the data

    By specifying particular $W$ and $W'$, a set of DR methods can be expressed by graph embedding, including

                  supervised    unsupervised    semi-supervised
      Gaussian    LDA           PCA             SDA
      manifold    LDE / MFA     LPP             ARE
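    As a standard identity (not specific to this talk), the pairwise objective above can be rewritten with graph Laplacians, which is what the generalized eigenvalue step later relies on:

    $\sum_{i,j} w_{ij}\, \| v^\top x_i - v^\top x_j \|^2 = 2\, v^\top X L X^\top v, \qquad L = D - W,\quad D = \mathrm{diag}(W\mathbf{1}),$

    so the solution is given by the generalized eigenvalue problem $X L X^\top v = \lambda\, X L' X^\top v$ (smallest eigenvalues), with $L'$ the Laplacian of $W'$.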

  • Outline

    Introduction

    Motivation

    Proposed Approach

    Experimental Results

    Conclusions

  • Main Idea

    diverse forms → multiple kernel learning

    high dimensions → graph embedding

    1/2

  • Main Idea

    high dimensions → graph embedding

    diverse forms → multiple kernel learning

    MKL-DR

    2/2

  • On Integrating MKL into Graph Embedding

    MKL-DR: integrate multiple kernel learning (MKL) into training process of dimensionality reduction (DR) methods

    1. The ensemble kernel is the linear combination of base ones

    2. Data are mapped to the induced RKHS

    3. Prove that each projection vector lies in the span of the mapped data

    4. Prove that all the operations in graph embedding can be accomplished via the kernel trick

  • Constrained Optimization for MKL-DR

    The resulting constrained optimization problem is

    $\min_{A,\ \beta}\ \sum_{i,j} w_{ij}\, \| A^\top \mathbb{K}^{(i)} \beta - A^\top \mathbb{K}^{(j)} \beta \|^2 \quad \text{subject to} \quad \sum_{i,j} w'_{ij}\, \| A^\top \mathbb{K}^{(i)} \beta - A^\top \mathbb{K}^{(j)} \beta \|^2 = 1,\ \ \beta \ge 0$

    where $\mathbb{K}^{(i)} \in \mathbb{R}^{N \times M}$ has entries $\mathbb{K}^{(i)}_{n,m} = k_m(x_n, x_i)$, the columns of $A \in \mathbb{R}^{N \times P}$ hold the sample coefficients of the projection vectors, $\beta \in \mathbb{R}^{M}$ holds the kernel weights, and $W$, $W'$ are the affinity matrices specified by the adopted DR method

  • Optimization of MKL-DR

    An alternating optimization procedure

    On optimizing $A$ with $\beta$ fixed:

    The optimal $A$ is obtained by solving a generalized eigenvalue problem

    On optimizing $\beta$ with $A$ fixed:

    Non-convex QCQP (quadratically constrained quadratic programming)

    SDP (semi-definite programming)-relaxation
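    To make the A-step concrete, here is a minimal NumPy/SciPy sketch of optimizing A with the kernel weights fixed, under the formulation reconstructed above; the names a_step, kernels, W, W_prime and the small ridge term are illustrative assumptions, not the author's code.

    import numpy as np
    from scipy.linalg import eigh

    def a_step(kernels, beta, W, W_prime, num_dims):
        # kernels: list of M symmetric N x N base kernel matrices
        # beta: M nonnegative kernel weights (held fixed in this step)
        # W, W_prime: N x N affinity matrices of the adopted DR method
        N = kernels[0].shape[0]
        # Column i of U equals K^(i) beta, since (sum_m beta_m K_m)[:, i] = sum_m beta_m k_m(., x_i)
        U = sum(b * K for b, K in zip(beta, kernels))

        def scatter(Aff):
            # sum_{i,j} a_ij (u_i - u_j)(u_i - u_j)^T = 2 U (D - Aff) U^T, with D = diag(row sums)
            L = np.diag(Aff.sum(axis=1)) - Aff
            return 2.0 * U @ L @ U.T

        S_w, S_wp = scatter(W), scatter(W_prime)
        # Minimizing the objective under the W' constraint -> smallest generalized eigenvalues;
        # a small ridge keeps the right-hand matrix positive definite for eigh.
        vals, vecs = eigh(S_w, S_wp + 1e-8 * np.eye(N))
        return vecs[:, :num_dims]  # columns: sample coefficients of the projection vectors (A)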

  • Training Phase of MKL-DR

    Graph Laplacian: the unified representation of DR methods

    Kernel matrices: the unified representation of descriptors

    MKL-DR alternately optimizes the kernel weights by SDP and the projection coefficients by GEP

  • Testing Phase of MKL-DR

    Feature space of each image descriptor → RKHS → Euclidean space
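    Following the reconstructed formulation above, projecting a test sample reduces to one small computation; this sketch and the name project_test_sample are illustrative assumptions.

    import numpy as np

    def project_test_sample(kernel_values, A, beta):
        # kernel_values: N x M matrix with entry [n, m] = k_m(x_n, x_test),
        #   the test sample evaluated against the N training samples under each of the M kernels
        # A: N x P sample-coefficient matrix learned by MKL-DR
        # beta: M learned kernel weights
        return A.T @ (kernel_values @ beta)  # P-dimensional embedding of the test sample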

  • Impacts

    From the perspective of DR methods

    Many existing DR methods can consider multiple kernels (features)

    E.g., PCA --> Kernel PCA --> MKL PCA

    Systematic feature selection across different spaces

    From the perspective of the MKL framework

    From hinge loss to the diverse objective functions of DR methods

    E.g., maximizing the projected variances in PCA

    Extend MKL from supervised applications to unsupervised and semi-supervised ones

  • Outline

    Introduction

    Motivation

    Proposed Approach

    Experimental Results

    Conclusions

  • Supervised Object Categorization - Dataset

    Caltech-101 dataset

    Multi-class classification problem (102 classes)

  • Supervised Object Categorization - Input

    Ten kernels (descriptors)

    GB / GB-Dist: based on the geometric blur descriptor

    SIFT-Dist / SIFT-SPM: based on SIFT descriptor

    SS-Dist / SS-SPM: based on self-similarity descriptor

    C2-SWP / C2-ML: based on biologically inspired features

    PHOG: based on PHOG descriptor

    GIST: based on GIST descriptor

    Dimensionality reduction method

    Local discriminant embedding (LDE) [Chen et al. 05]

  • Supervised Object Categorization - Results

    Nearest neighbor rule for classification

    1/2
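    For reference, a minimal sketch of this evaluation protocol in the projected space; the embedding variable names and the use of scikit-learn are assumptions, not the talk's actual setup.

    from sklearn.neighbors import KNeighborsClassifier

    # train_embed, test_embed: MKL-DR embeddings of training and test images (rows = samples)
    def nearest_neighbor_accuracy(train_embed, train_labels, test_embed, test_labels):
        clf = KNeighborsClassifier(n_neighbors=1).fit(train_embed, train_labels)
        return clf.score(test_embed, test_labels)  # fraction of correctly classified test images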

  • Object Recognition - Results

    single feature

    multiple features

    1/2

  • Unsupervised Image Clustering - Dataset

    20 classes from Caltech-101 dataset

  • Unsupervised Image Clustering - Input

    Ten kernels

    GB / GB-Dist: based on the geometric blur descriptor

    SIFT-Dist / SIFT-SPM: based on SIFT descriptor

    SS-Dist / SS-SPM: based on self-similarity descriptor

    C2-SWP / C2-ML: based on biologically inspired features

    PHOG: based on PHOG descriptor

    GIST: based on GIST descriptor

    Dimensionality reduction method

    Locality preserving projections (LPP) [He & Niyogi 03]

  • Unsupervised Image Clustering - Results

    2-D visualization of the projected space

    kernel LPP with the GB-Dist kernel

    kernel LPP with the GIST kernel

    MKL-LPP with all ten kernels

    1/2

  • Unsupervised Image Clustering - Results

    Affinity propagation [Frey and Dueck 07]

    Performance evaluation [NMI / ACC %] (a small evaluation sketch follows this slide)

    NMI: normalized mutual information

    ACC: accuracy rate

    2/2
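    A minimal sketch of computing these two clustering scores; normalized_mutual_info_score is scikit-learn's implementation of NMI, while the ACC helper below (Hungarian matching of clusters to classes) is an assumed, commonly used definition rather than the talk's exact code.

    import numpy as np
    from scipy.optimize import linear_sum_assignment
    from sklearn.metrics import normalized_mutual_info_score

    def clustering_accuracy(labels_true, labels_pred):
        # ACC: best one-to-one matching between clusters and classes (Hungarian algorithm)
        labels_true, labels_pred = np.asarray(labels_true), np.asarray(labels_pred)
        classes, clusters = np.unique(labels_true), np.unique(labels_pred)
        counts = np.zeros((len(clusters), len(classes)), dtype=int)
        for i, c in enumerate(clusters):
            for j, k in enumerate(classes):
                counts[i, j] = np.sum((labels_pred == c) & (labels_true == k))
        rows, cols = linear_sum_assignment(-counts)  # maximize the matched counts
        return counts[rows, cols].sum() / len(labels_true)

    # nmi = normalized_mutual_info_score(labels_true, labels_pred)
    # acc = clustering_accuracy(labels_true, labels_pred)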

  • Semi-supervised Face Recognition - Dataset

    CMU PIE face database

    We divide the 68 subjects into four equal-size groups

    The four groups: lighting, rotation, occlusion, and profile

  • Semi-supervised Face Recognition - Input

    Four descriptors (kernels)

    DeLight: based on the delighting algorithm of [Gross et al. 03]

    LBP: A rotation-invariant operator [Ojala et al. 00]

    RsLTS: Least trimmed squares with 20% outliers allowed

    RsL2: Pixel intensities with Euclidean distance

    Dimensionality reduction method

    Semi-supervised discriminant analysis (SDA) [Cai et al. 07]

  • Semi-supervised Face Recognition - Input

    (example face images under the DeLight and LBP representations)

  • Semi-supervised Face Recognition - Results

    Nearest neighbor rule for classification

  • Outline

    Introduction

    Motivation

    Proposed Approach

    Experimental Results

    Conclusions

  • Conclusions

    MKL-DR provides a unified and compact view of data with multiple feature representations

    Applied to a broad set of vision applications

    A general framework for data analysis

    Adopt a graph-based dimensionality reduction method

    Choose a proper set of features

    Diverse objective functions of MKL

    Extend MKL to unsupervised and semi-supervised learning problems

    Generalize many existing DR methods to consider multiple kernels

  • Thank You

    Email: [email protected]

    Yen-Yu Lin (林彥宇)

    Tel: (02) 2787-2392