
Multi-Kernel Multi-Label Learning with Max-Margin Concept Network IJCAI-2011 1 Wei Zhang, 1 Xiangyang Xue, 2 Jianping Fan , 1 Xiaojing Huang, 1 Bin Wu, 1 Mingjie Liu 1 Fudan University, China; 2 UNCC, USA {weizh, xyxue}@fudan.edu.cn


Page 1:

Multi-Kernel Multi-Label Learning with Max-Margin Concept Network

IJCAI-2011

1 Wei Zhang, 1 Xiangyang Xue, 2 Jianping Fan, 1 Xiaojing Huang, 1 Bin Wu, 1 Mingjie Liu

1 Fudan University, China; 2 UNCC, USA

{weizh, xyxue}@fudan.edu.cn

Page 2:

Motivation

Overview

Concept Network Construction

The Proposed Model

Multi-Kernel Multi-Label Learning

Experiments

Conclusions

Content

Page 3:

Semantic richness means a single label cannot describe a sample adequately, so multi-label learning is necessary.

When multiple labels are available for a single sample, there can be strong inter-label correlations.

Similarity diversity cannot be characterized effectively by a single kernel, so multi-kernel learning is necessary.

Motivation

Page 4:

Inter-label dependency and similarity diversity are leveraged simultaneously in the proposed method.

A concept network is constructed to capture inter-label correlations for classifier training.

A max-margin approach is used to formulate the feature-label associations and the label-label correlations.

Specific kernels are learned not only for each label but also for each pair of the inter-related labels.

Overview

Page 5:

A concept network is constructed to characterize the inter-label correlations and to learn the inter-related classifiers.

◦ Each concept corresponds to one node in the concept network.

◦ If two concepts are inter-related, there is an edge between the corresponding two nodes.

Empirical conditional probabilities: [equation]

If [condition], then [equation]
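The construction described on this slide can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `build_concept_network`, the `threshold` parameter, and the rule "link two concepts when either conditional probability reaches the threshold" are all assumptions, since the original condition is not preserved in the transcript.

```python
import numpy as np

def build_concept_network(Y, threshold=0.6):
    """Estimate P(label j | label i) from a binary label matrix and
    link label pairs whose conditional probability is high.

    Y: (n_samples, n_labels) array with entries in {0, 1}.
    threshold: hypothetical cut-off (an assumption, not from the slides).
    """
    Y = np.asarray(Y, dtype=float)
    counts = Y.sum(axis=0)          # how often each label occurs
    co = Y.T @ Y                    # co[i, j] = samples carrying both i and j
    # P[i, j] = P(j present | i present); rows of unseen labels stay 0.
    P = np.divide(co, counts[:, None],
                  out=np.zeros_like(co), where=counts[:, None] > 0)
    c = Y.shape[1]
    # Assumed edge rule: connect i and j if either direction is strong.
    edges = [(i, j) for i in range(c) for j in range(i + 1, c)
             if max(P[i, j], P[j, i]) >= threshold]
    return P, edges

# Toy label matrix: 4 samples, 3 concepts.
Y = [[1, 1, 0],
     [1, 1, 0],
     [1, 0, 1],
     [0, 0, 1]]
P, edges = build_concept_network(Y, threshold=0.6)
print(edges)
```

In this toy example concept 1 always co-occurs with concept 0, so that pair is linked, while the other pairs fall below the assumed threshold.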

Concept Network Construction

Page 6:

Our model captures the feature-concept associations and the inter-concept correlations in a unified framework: [equation]

◦ The node and edge feature maps are functions mapping sample features x to kernel spaces with respect to the node and the edge, respectively.

The Proposed Model

Page 7:

By considering both site and edge potentials in a unified framework, we leverage the associations between features and labels as well as the correlations among labels and their dependence on the features.

To learn the proposed model, the objective function is defined as: [equation]

◦ where [equation]

◦ and constraints [equation]

Max-Margin Method for Model Learning

Page 8:

We factor the proposed global model formulation as the sum of local models: [equation]

where [equation]

Our optimization can be approximately decoupled into c interdependent sub-problems: [equation]
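To make the factorization concrete, the sketch below assumes a generic pairwise form, F(x, y) = sum_i y_i s_i + sum_{(i,j) in E} y_i y_j e_ij, with labels in {-1, +1}. The slides' own equations are not preserved, so this form, the names `global_score` and `local_score`, and the numeric potentials are illustrative assumptions. It shows that such a global score equals the sum of per-label local scores when each edge term is split evenly between its two endpoints.

```python
import numpy as np

def global_score(y, node, edge, edges):
    """Assumed global score: site terms plus pairwise edge terms."""
    site = float(np.dot(y, node))
    pair = sum(y[i] * y[j] * edge[(i, j)] for (i, j) in edges)
    return site + pair

def local_score(i, y, node, edge, edges):
    """Local model for label i: its site term plus half of every
    edge term it takes part in (the other half goes to the neighbor)."""
    total = y[i] * node[i]
    for (a, b) in edges:
        if i in (a, b):
            total += 0.5 * y[a] * y[b] * edge[(a, b)]
    return total

# Hypothetical potentials, already evaluated at some sample x.
node = np.array([0.8, -0.2, 0.5])          # site potentials s_i
edges = [(0, 1), (1, 2)]                   # concept-network edges
edge = {(0, 1): 0.6, (1, 2): -0.4}         # edge potentials e_ij
y = np.array([1, 1, -1])                   # one label configuration

total = global_score(y, node, edge, edges)
parts = sum(local_score(i, y, node, edge, edges) for i in range(len(y)))
print(total, parts)                        # the two values agree
```

Each local score depends on the labels of neighboring concepts, which is why the resulting sub-problems are interdependent rather than independent.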

Learning Interdependent Classifiers

Page 9:

The dual of the optimization problem is as follows: [equation]

where [equation]

We employ a multi-kernel technique to implement both the concept-specific and the pairwise concept-specific feature mappings, so that similarity diversity can be effectively characterized.

Similarity Diversity by Multi-kernel

Page 10:

We first define an original kernel, independent of label information, using a Gaussian kernel, and decompose its Gram matrix by spectral decomposition: [equation]

To incorporate the label information, we learn the concept-specific kernel matrix for each label by maximizing the similarities between data with the same label: [equation]

To sufficiently leverage the correlations among the concepts and their dependence on the input features, the pairwise label-specific kernel matrix can be learned by: [equation]

Both the concept-specific kernel matrix and the pairwise label-specific kernel matrix share a common basis with the original kernel K: [equation]
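The idea of kernels that share the eigenbasis of the original Gaussian kernel can be sketched as below. The learning rule here is a stand-in chosen for illustration, not the authors' optimization: eigenvector weights are set proportional to the squared alignment between each eigenvector and the label vector, which maximizes the inner product between the kernel and y y^T under a unit-norm constraint on the weights. The function names, `sigma`, and the toy data are assumptions.

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    """Gram matrix of the Gaussian kernel exp(-||a - b||^2 / (2 sigma^2))."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def label_specific_kernel(K, y):
    """Reweight the eigenvectors of K using labels y in {-1, +1}.

    Stand-in rule (an assumption): weight_k is proportional to
    (v_k . y)^2, so eigenvectors aligned with the labels dominate.
    """
    lam, V = np.linalg.eigh(K)             # K = V diag(lam) V^T
    align = (V.T @ y) ** 2                 # nonnegative alignments
    mu = align / np.linalg.norm(align)     # unit-norm spectrum
    K_label = (V * mu) @ V.T               # V diag(mu) V^T, same basis as K
    return K_label, lam, V

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))                # toy features
y = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)

K = gaussian_gram(X, sigma=1.0)
K_label, lam, V = label_specific_kernel(K, y)

# Spectral decomposition reproduces K, and the learned kernel is PSD.
print(np.allclose(K, (V * lam) @ V.T))
print(np.linalg.eigvalsh(K_label).min() >= -1e-10)
```

Because both matrices are diagonal in the same eigenbasis, they commute, which is one way to check the shared-basis property numerically. A pairwise kernel could be built the same way from a target vector that encodes two labels jointly, though how the slides define that target is not recoverable here.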

Multi-kernel Learning

Page 11:

For any new image, the inference problem is to find the optimal label configuration

The size of the multi-label space is exponential in the number of classes, so it is intractable to enumerate all possible label configurations to find the best one.

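We employ an approximate inference technique, iterated conditional modes (ICM):

i) Initialize a multi-label configuration.

ii) In each iteration, given the current values of the other labels, we sequentially update each label using the local model: if the local model favors assigning the label, the label is set to present; otherwise it is set to absent.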
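The two steps above can be sketched as follows, under an assumed pairwise score F(y) = sum_i y_i s_i + sum_{(i,j) in E} y_i y_j e_ij with labels in {-1, +1}. The score form, the initialization from the sign of the site potentials, the function names, and the numbers are illustrative assumptions. A brute-force search over all 2^c configurations is included only to show what ICM avoids; it is feasible here because c = 3.

```python
import numpy as np
from itertools import product

def icm(node, edge, edges, max_iters=20):
    """Iterated conditional modes for an assumed pairwise label score."""
    c = len(node)
    nbrs = {i: [] for i in range(c)}
    for (i, j) in edges:
        nbrs[i].append((j, edge[(i, j)]))
        nbrs[j].append((i, edge[(i, j)]))
    # i) Initialize from the site potentials alone (an assumed choice).
    y = np.where(np.asarray(node) > 0, 1, -1)
    # ii) Sweep over labels, fixing the others, until nothing changes.
    for _ in range(max_iters):
        changed = False
        for i in range(c):
            # Every term of the score involving y_i is y_i * field.
            field = node[i] + sum(w * y[j] for j, w in nbrs[i])
            new = 1 if field > 0 else -1
            if new != y[i]:
                y[i] = new
                changed = True
        if not changed:
            break
    return y

def brute_force(node, edge, edges):
    """Exhaustive search over all 2^c configurations (small c only)."""
    best, best_y = -np.inf, None
    for y in product([-1, 1], repeat=len(node)):
        s = sum(y[i] * node[i] for i in range(len(node)))
        s += sum(y[i] * y[j] * edge[(i, j)] for (i, j) in edges)
        if s > best:
            best, best_y = s, y
    return np.array(best_y)

node = [0.8, -0.2, 0.5]                    # hypothetical site potentials
edges = [(0, 1), (1, 2)]
edge = {(0, 1): 0.6, (1, 2): -0.3}         # hypothetical edge potentials

print(icm(node, edge, edges))              # ICM result
print(brute_force(node, edge, edges))      # exact optimum for comparison
```

In this example the site potential alone would switch the second label off, but its positive correlation with the first label flips it on during the first sweep. ICM only guarantees a local optimum in general; it happens to match the exhaustive search on this small case.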

Model Inference

Page 12:

We compare our method on web page classification with state-of-the-art methods: RML [Petterson and Caetano, 2010]; ML-KNN [Zhang and Zhou, 2007]; Tang's method [Tang et al., 2009]; and RankSVM [Elisseeff and Weston, 2002].

Experiments

Page 13:

We consider other real applications in experiments: image annotation, music emotion tagging, and gene categorization.

Experiments

Page 14:

A concept network is constructed for characterizing the inter-label correlations effectively.

The max-margin technique captures the feature-label associations and the label-label correlations.

By decoupling the multi-label learning task into interdependent sub-problems label by label, the proposed method learns multiple interrelated classifiers jointly.

Specific kernels not only for each label but also for each pair of inter-related labels are learned to embed the label information and the inter-label correlations.

Conclusions: Inter-label dependency and similarity diversity are simultaneously leveraged in multi-kernel multi-label learning.

Page 15:

Thanks a lot!

Q & A ?