Class distributions on SOM surfaces for feature extraction and object retrieval


Transcript of Class distributions on SOM surfaces for feature extraction and object retrieval

Page 1: Class distributions on SOM surfaces for feature extraction and  object retrieval

Intelligent Database Systems Lab

National Yunlin University of Science and Technology (國立雲林科技大學)

Class distributions on SOM surfaces for feature extraction and object retrieval

Advisor: Dr. Hsu
Graduate: Kuo-min Wang
Authors: Jorma T. Laaksonen*, J. Markus Koskela, Erkki Oja

2005 Expert Systems with Applications.

Page 2: Class distributions on SOM surfaces for feature extraction and  object retrieval


Outline
Motivation
Objective
Introduction
Class Distributions
BMU Probabilities
BMU Entropy
SOM Surface Convolutions
Multiple Feature Extractions
Bayesian Decision Estimation
Personal Opinions

Page 3: Class distributions on SOM surfaces for feature extraction and  object retrieval


Motivation

A Self-Organizing Map (SOM) is typically trained in unsupervised mode, using a large batch of training data.

Even from the same data, qualitatively different distributions can be obtained by using different feature extraction techniques.

Page 4: Class distributions on SOM surfaces for feature extraction and  object retrieval


Objective

We use such distributions for comparing different classes and different feature representations of the data in our content-based image retrieval system PicSOM.

The information-theoretic measures of entropy and mutual information are suggested to evaluate the compactness of a distribution and the independence of two distributions.

Page 5: Class distributions on SOM surfaces for feature extraction and  object retrieval


Introduction

Image retrieval comprises segmentation, feature extraction, representation, and query processing.

Segmentation: dividing an image into distinct regions, most often by finding the edges of the objects in the image and then deciding whether each region is meaningful.

Feature extraction: computing the features of a region of an image. Extraction is directly tied to the representation, since different representations require different extraction methods.

Typical features: color, shape, texture.

Page 6: Class distributions on SOM surfaces for feature extraction and  object retrieval


Introduction (cont.)

We study how object class histograms on SOMs can be given interpretations in terms of probability densities and information-theoretic measures: entropy and mutual information (Cover & Thomas, 1991).

A good feature: the class is heavily concentrated on only a few nearby map elements, giving a low value of entropy.

The mutual information of two features' distributions is a measure of how independent those features are.

Page 7: Class distributions on SOM surfaces for feature extraction and  object retrieval


Class Distributions

Normalized to unit sum, the hit frequencies give a discrete histogram, which is a sample estimate of the probability distribution of the class on the SOM surface.

The shape of the distribution depends on several factors:
The distribution of the original data, since the very-high-dimensional pattern space cannot be controlled.
The feature extraction technique in use, which affects the metrics and the distribution of all the generated feature vectors.
Feature invariance: some pattern space directions are retained better than others.

Working properly, semantically similar patterns will be mapped nearer to each other.

Page 8: Class distributions on SOM surfaces for feature extraction and  object retrieval


Class Distributions (cont.)

The overall shape of the training set, after it has been mapped from the original data space to the feature vector space, determines the overall organization of the SOM.

The class distribution of the studied object subset or class is then seen relative to the overall shape of the feature vector distribution.

Page 9: Class distributions on SOM surfaces for feature extraction and  object retrieval


Class Distributions (cont.)

Measures of the denseness or locality of feature vectors on a SOM:

SDH (smoothed data histogram; Pampalk, Rauber, & Merkl, 2002): each data point is mapped not only to its nearest SOM unit but to its s nearest units, with reciprocally decreasing fractions.

Quantitative locality measures: map usage, average pair distance, fragmentation, and purity (Pullwitt, 2002). They fail to take into account the topological structure of the class.

Page 10: Class distributions on SOM surfaces for feature extraction and  object retrieval


BMU Probabilities

Calculating the a priori probability of each SOM unit being the best-matching unit (BMU) for any vector x of the feature space, starting from the probability density function (pdf) of the data.
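The slide's equation image did not survive extraction; given the Voronoi-region definition on the next slide, a plausible form for the a priori BMU probability of unit i is:

```latex
p_i = P(\mathbf{x} \in V_i) = \int_{V_i} p(\mathbf{x})\, d\mathbf{x}
```

where V_i is the Voronoi region of unit i and p(x) the pdf of the feature vectors.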

Page 11: Class distributions on SOM surfaces for feature extraction and  object retrieval


BMU Probabilities (cont.)

Voronoi region: the set of vectors in the original feature space that are closer to the weight vector of unit i than to any other weight vector.

We are actually replacing the continuous pdf with a discrete probability histogram by counting the number of times that any given map unit is the BMU.

This gives the probability histogram of class C on the SOM surface.
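In our notation (not reproduced on the slide), with n_i(C) the number of class-C objects whose BMU is unit i and N_C the size of the class, this histogram is simply:

```latex
\hat{P}_i(C) = \frac{n_i(C)}{N_C}, \qquad \sum_{i=0}^{K-1} \hat{P}_i(C) = 1
```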

Page 12: Class distributions on SOM surfaces for feature extraction and  object retrieval


BMU Entropy

The entropy H of a distribution P = (P0, P1, ..., PK-1) is calculated as

H(P) = −Σ_{i=0}^{K−1} P_i log₂ P_i

with the convention 0 · log₂ 0 = 0.
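The entropy above can be checked with a short script (a sketch; the function name is ours):

```python
import math

def bmu_entropy(p):
    """H(P) = -sum_i P_i * log2(P_i), with the convention 0 * log2(0) = 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

uniform = [1.0 / 16] * 16      # class spread evenly over 16 SOM units
peaked = [1.0] + [0.0] * 15    # class concentrated on a single unit
```

A compact class (peaked) gives H = 0 bits, while the evenly spread class gives the maximum H = log₂ 16 = 4 bits, matching the claim that a good feature yields a low entropy.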

Page 13: Class distributions on SOM surfaces for feature extraction and  object retrieval


BMU Entropy (cont.)

Page 14: Class distributions on SOM surfaces for feature extraction and  object retrieval


SOM Surface Convolutions

Entropy drawback: the calculation of entropies does not yet take into account the spatial topology of the SOM units in any way. Yet it is the topological order of the units that separates the SOM from other vector quantization methods.

The convolution method bears similarity to the smoothed data histogram approach (Pampalk, 2002): data points are not mapped one-to-one to their BMUs but spread into the s closest map units in the feature space.
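A minimal sketch of such a surface convolution (our helper names, zero padding and a uniform 3x3 mask assumed; the paper leaves the mask choice open):

```python
import numpy as np

def convolve_histogram(hist, mask):
    """Convolve a 2-D hit histogram over the SOM surface with a
    smoothing mask, then renormalize to unit sum."""
    kh, kw = mask.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(hist, ((ph, ph), (pw, pw)))   # zero-pad the borders
    out = np.zeros_like(hist, dtype=float)
    H, W = hist.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * mask)
    return out / out.sum()

def entropy(p):
    """Entropy in bits, ignoring zero cells (0 * log2(0) = 0)."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# A class whose hits all fall on one unit of an 8x8 SOM.
hist = np.zeros((8, 8))
hist[3, 3] = 1.0
mask = np.ones((3, 3)) / 9.0          # simple uniform 3x3 window
smoothed = convolve_histogram(hist, mask)
```

Smoothing spreads the mass over topological neighbors, so the entropy of the convolved histogram reflects spatial compactness on the map surface rather than raw unit counts.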

Page 15: Class distributions on SOM surfaces for feature extraction and  object retrieval


SOM Surface Convolutions (cont.)

Page 16: Class distributions on SOM surfaces for feature extraction and  object retrieval


SOM Surface Convolutions (cont.)

The larger the convolution window is, the smoother the overall shape of the distribution becomes, as details vanish.

The selection of a proper size for the convolution mask can be identified as a form of the general scale-space problem.

Page 17: Class distributions on SOM surfaces for feature extraction and  object retrieval


SOM Surface Convolutions (cont.)

Page 18: Class distributions on SOM surfaces for feature extraction and  object retrieval


Multiple Feature Extractions

It is possible to use more than one feature extraction method in parallel. In CBIR, three different feature categories are generally recognized: color, texture, and shape features.

Let us denote the two class distributions by P = (P0, P1, ..., PK-1) and Q = (Q0, Q1, ..., QK-1). While H(P) and H(Q) measure the distributions of the single feature vectors, the mutual information I(P, Q) can be used for studying the interplay between them.
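The mutual information referred to above has the standard form (Cover & Thomas, 1991); the joint histogram P_{ij}, counting objects whose BMUs on the two SOMs are units i and j, is our notation:

```latex
I(P, Q) = \sum_{i=0}^{K-1} \sum_{j=0}^{K-1} P_{ij} \log_2 \frac{P_{ij}}{P_i\, Q_j}
```

A value near zero indicates nearly independent features; a large value indicates that the two features capture much the same structure.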

Page 19: Class distributions on SOM surfaces for feature extraction and  object retrieval


Multiple Feature Extractions (cont.)

HT and nHT have by far the largest values of mutual information.

CS and SC have the largest value on both SOMs.

The mutual information of EH and HT is high on the smaller SOM, but not so high on the larger SOM with more resolution.

Page 20: Class distributions on SOM surfaces for feature extraction and  object retrieval


Bayesian Decision Estimation

Using the Bayesian decision rule to make an optimal classification.

The posterior probability is used to decide on the jth object's membership in class C.
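The posterior follows the standard Bayes rule; written in our notation for the BMU-histogram setting:

```latex
P(C \mid x_j) = \frac{P(x_j \mid C)\, P(C)}{P(x_j)}
```

where the class-conditional likelihood P(x_j | C) can be estimated from class C's (convolved) histogram value at the jth object's best-matching unit.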

Page 21: Class distributions on SOM surfaces for feature extraction and  object retrieval


Bayesian Decision Estimation (cont.)

Query by example: a number of images is presented to the user at each query round, and the user is expected to evaluate their relevance to her current task.

Relevance feedback: incrementally fine-tune the selection so that more and more relevant images will be shown at consecutive query rounds.

Choosing the next image for the user: maximal probability of relevance, or minimal probability of nonrelevance.
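A toy sketch of that selection step (function and variable names are ours; scoring by the difference of the relevant and nonrelevant histograms is one plausible reading of the criterion, not the paper's exact formula):

```python
import numpy as np

def next_images(p_rel, p_nonrel, candidate_bmus, n=3):
    """Return indices of the n candidates whose BMUs have the highest
    relevant-minus-nonrelevant histogram value."""
    score = p_rel - p_nonrel                       # per-unit relevance score
    ranked = sorted(range(len(candidate_bmus)),
                    key=lambda j: score[candidate_bmus[j]],
                    reverse=True)
    return ranked[:n]

p_rel = np.array([0.1, 0.5, 0.3, 0.1])     # toy 4-unit SOM, relevant hits
p_nonrel = np.array([0.4, 0.1, 0.1, 0.4])  # nonrelevant hits
bmus = [0, 1, 2, 3]                        # each candidate image's BMU
```

With these toy histograms, the candidate mapped to unit 1 (score 0.4) would be shown first.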

Page 22: Class distributions on SOM surfaces for feature extraction and  object retrieval


Bayesian Decision Estimation (cont.)

Page 23: Class distributions on SOM surfaces for feature extraction and  object retrieval


Bayesian Decision Estimation (cont.)

The relevance feedback problem is handled by adding the hits caused by the new relevant and nonrelevant samples to the map units, convolving them with the mask used, and renormalizing the distributions to unit sums.

Let us denote the history of the query up to the (t−1)th round by

H_{t−1} = (D_0, R_0, D_1, R_1, ..., D_{t−1}, R_{t−1})

and maximize the current probability of relevance

P(x ∈ x_rel | H_{t−1}).
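The three-step update on this slide (add hits, convolve, renormalize) can be sketched as follows (helper and variable names are ours, a uniform 3x3 mask with zero padding assumed):

```python
import numpy as np

def relevance_update(hits, new_bmus, mask):
    """hits: 2-D hit counts over the SOM grid; new_bmus: (row, col)
    BMUs of newly marked images; mask: 2-D smoothing kernel."""
    hits = hits.astype(float)
    for r, c in new_bmus:
        hits[r, c] += 1.0               # step 1: add the new hits
    kh, kw = mask.shape
    padded = np.pad(hits, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(hits)
    for i in range(hits.shape[0]):      # step 2: convolve with the mask
        for j in range(hits.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * mask)
    return out / out.sum()              # step 3: renormalize to unit sum

# One feedback round: a single relevant image whose BMU is unit (2, 2).
p_rel = relevance_update(np.zeros((5, 5)), [(2, 2)], np.ones((3, 3)) / 9)
```

The same routine would be applied separately to the nonrelevant distribution each round.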

Page 24: Class distributions on SOM surfaces for feature extraction and  object retrieval


Bayesian Decision Estimation (cont.)

Page 25: Class distributions on SOM surfaces for feature extraction and  object retrieval


Conclusions

The entropy of the distribution quantitatively characterizes the compactness of an object class.

The proposed method can be used as an efficient way of comparing these features and the SOMs produced with them.

We showed that the mutual information of the distributions can be used to identify both the most similar and the most uncorrelated features, and can also be used to select the subset of feature extraction methods with the most independent features.

Bayesian decision rules are used for choosing either the most probable class for a data item, or the most likely data item belonging to a given class.

Page 26: Class distributions on SOM surfaces for feature extraction and  object retrieval


Personal Opinions

Advantage: combines entropy, mutual information, and smoothing to find the important features and the independence between features.

Application: feature extraction.

Drawback: the structure of the paper is not good, and some diagrams are unclear, making them difficult to understand.