PROBABILISTIC DISTANCE MEASURES FOR PROTOTYPE-BASED RULES


Włodzisław Duch

Department of Informatics, Nicolaus Copernicus University, Poland,

School of Computer Engineering, Nanyang Technological University, Singapore.

Marcin Blachnik, Tadeusz Wieczorek

Department of Electrotechnology

Faculty of Materials Engineering & Metallurgy, The Silesian University of Technology, Poland

ICONIP 2005 Taiwan

Outline

Types of rules
What are prototype rules?
Heterogeneous distance function
Probability density function (PDF) estimation
Results
Conclusions


Types of rules

Crisp logical rules.

Rough sets and logic.

Fuzzy rules (F-rules).

Prototype rules (P-rules) – most general?

P-rules with additive similarity functions may be converted into neurofuzzy rules with “natural” membership functions, including nominal features.

P-rules do not need the feature space.

There are many neurofuzzy programs, but no P-rules so far.


Motivation

When understanding data or situations, recognizing objects, or making diagnoses, people frequently use similarity to known cases and rarely use logical reasoning, but soft computing experts use logic instead of similarity ...

Relations between similarity and logic are not clear.

Q1: How to obtain the same decision borders in Fuzzy Logic systems and Prototype Rule Based systems?

Q2: What type of similarity measure corresponds to typical fuzzy functions, and vice versa?

Q3: How to transform one type of system into another type while preserving their decision borders?

Q4: Are there any advantages of such transformations?

Q5: Can we understand data better using prototypes instead of logical rules?


Example

[Figure: four panels with both axes spanning roughly -5 to 5.]


Prototype rules - advantages

Inspired by cognitive psychology: when understanding data or situations, recognizing objects, or making diagnoses, people frequently use similarity to known cases and rarely use logical reasoning.

With heterogeneous distance functions, P-rules support all types of attributes: continuous, discrete, symbolic and nominal, while F-rules require numerical inputs.

Locally linear decision borders help to avoid overfitting.

Many algorithms for prototype selection and optimization exist, but they have not been applied to understanding data.

Applications of P-rules to real datasets give excellent results while generating a small number of prototypes.


Prototype rules - learning

The learning process involves:
selecting similarity or dissimilarity (distance) functions
model optimization: the number and positions of prototypes

The decision making task consists of:
calculating the distance (similarity) to each prototype
applying a P-rule to calculate the output class

Nearest Neighbour rule:

If $P = \arg\min_{P'} D(X, P')$ then Class(X) = Class(P)

Threshold rule:

If $D(X, P) \le d_P$ then Class(X) = Class(P)

Taking D(X,P) to be the Chebyshev distance yields crisp logic rules.
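The two P-rules above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the prototypes and the threshold value are hypothetical, and the Chebyshev distance is used only because the slide singles it out.

```python
import numpy as np

def chebyshev(x, p):
    """Chebyshev (L-infinity) distance: the largest per-feature difference."""
    return float(np.max(np.abs(np.asarray(x, float) - np.asarray(p, float))))

def nearest_prototype_rule(x, prototypes, labels, dist=chebyshev):
    """Nearest Neighbour rule: return the class of the closest prototype."""
    distances = [dist(x, p) for p in prototypes]
    return labels[int(np.argmin(distances))]

def threshold_rule(x, prototype, label, d_p, dist=chebyshev):
    """Threshold rule: return the prototype's class if D(X,P) <= d_P, else None."""
    return label if dist(x, prototype) <= d_p else None

# Hypothetical prototypes, one per class.
prototypes = [[0.0, 0.0], [3.0, 3.0]]
labels = ["A", "B"]

print(nearest_prototype_rule([0.5, 1.0], prototypes, labels))
print(threshold_rule([0.5, 1.0], prototypes[0], "A", d_p=1.0))
print(threshold_rule([0.5, 1.5], prototypes[0], "A", d_p=1.0))
```

With the Chebyshev distance, the threshold rule D(X,P) ≤ d_P holds exactly when every feature satisfies |x_i - p_i| ≤ d_P, which is a conjunction of interval conditions, i.e. a crisp logical rule.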


Applications to real data (ICONIP’2004)

Gene expression data for 2 types of leukaemia (Golub et al., Science 286 (1999) 531-537)

Description: 2 classes, 1100 features, 3 most relevant selected.

Used methods: 1 prototype/class LVQ, DVDM similarity measure.

Results (number of misclassified vectors):

Data set   Golub et al.   P-rules
Train      3              0
Test       5              3

Searching for Promoters in DNA strings

Description: 2 classes, 57 features, all symbolic features.

Used methods: 9 prototypes for promoters, 12 for nonpromoters, generated using C-means + LVQ, with VDM similarity measure.

Results: 5 misclassified vectors in leave-one-out test.


Distance (similarity) functions

Continuous attributes

$$d_{\min}(x, y) = |x - y|$$

Probabilistic Metrics

$$d_{VDM}(x, y)^q = \sum_{i=1}^{K} \left| P(C_i|x) - P(C_i|y) \right|^q$$

$$d_{MRM}(x, y)^q = \sum_{i=1}^{K} \left| P(C_i|x) \left( 1 - P(C_i|y) \right) \right|^q$$

d_SFM(x, y)^q: [equation not recoverable from the transcript; a sum over i = 1, ..., K of terms in P(C_i|x) and P(C_i|y)]

N – number of attributes

K – number of classes

Input vectors

X = [x_1, x_2, …, x_N]^T

Y = [y_1, y_2, …, y_N]^T

q – exponent value

P(C_i|x) – posterior probability; for symbolic features estimated as P(C_i|x) = n_i / n
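As a rough illustration of the probabilistic metrics on this slide, the sketch below estimates P(C_i|x) = n_i / n for one symbolic feature and evaluates the VDM and MRM terms. It assumes the usual reading of the counts (n is the number of training vectors with value x, n_i the number of those in class C_i), and it assumes that the MRM term has the form P(C_i|x)(1 - P(C_i|y)). The function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def posteriors(values, classes):
    """Estimate P(C_i | x) = n_i / n for every symbolic value x of one feature."""
    class_list = sorted(set(classes))
    counts = defaultdict(Counter)
    for v, c in zip(values, classes):
        counts[v][c] += 1
    return {
        v: [cnt[c] / sum(cnt.values()) for c in class_list]
        for v, cnt in counts.items()
    }

def vdm_q(px, py, q=2):
    """d_VDM(x, y)^q = sum_i |P(C_i|x) - P(C_i|y)|^q (q-th power of the distance)."""
    return sum(abs(a - b) ** q for a, b in zip(px, py))

def mrm_q(px, py, q=1):
    """d_MRM(x, y)^q = sum_i (P(C_i|x) * (1 - P(C_i|y)))^q (assumed form)."""
    return sum((a * (1.0 - b)) ** q for a, b in zip(px, py))

# Hypothetical training data for a single symbolic feature.
values = ["red", "red", "red", "blue", "blue", "green"]
classes = ["A", "A", "B", "B", "B", "A"]
P = posteriors(values, classes)

print(P["red"], P["blue"])
print(vdm_q(P["red"], P["blue"], q=2))
print(mrm_q(P["red"], P["blue"], q=1))
```

Both functions return the q-th power of the distance, matching the way the formulas are written. Note that under this form the MRM term for two identical values is generally not zero, so it behaves as a risk-like dissimilarity and not as a metric in the strict sense.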


Heterogeneous distance function

Combine contributions from symbolic and real-valued features to get the distance.

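$$D(X, Y)^q = \sum_{i=1}^{N} \begin{cases} d_{\min}(x_i, y_i)^q, & \text{if } i \text{ is continuous} \\ d_{\mathrm{Prob}}(x_i, y_i)^q, & \text{if } i \text{ is nominal} \end{cases}$$

or use only probabilistic measures:

$$D(X, Y)^q = \sum_{i=1}^{N} d_{\mathrm{Prob}}(x_i, y_i)^q$$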
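A minimal sketch of a heterogeneous distance of this kind is shown below: continuous features contribute |x_i - y_i|^q and nominal features contribute a VDM term computed from class posteriors. The function name, the argument layout and the posterior table are assumptions made for the example; in practice the continuous features would also need to be scaled to a range comparable with the probabilistic terms.

```python
def heterogeneous_distance(x, y, nominal, posteriors, q=2):
    """Return D(X, Y), where D^q is the sum of per-feature contributions.

    nominal    : set of feature indices treated as nominal
    posteriors : dict mapping feature index -> {value: [P(C_1|value), ...]}
    """
    total = 0.0
    for i, (xi, yi) in enumerate(zip(x, y)):
        if i in nominal:
            px, py = posteriors[i][xi], posteriors[i][yi]
            # VDM contribution: sum over classes of |P(C_k|x_i) - P(C_k|y_i)|^q
            total += sum(abs(a - b) ** q for a, b in zip(px, py))
        else:
            # Continuous contribution: |x_i - y_i|^q
            total += abs(xi - yi) ** q
    return total ** (1.0 / q)

# Hypothetical posterior table for the nominal feature at index 1.
post = {1: {"red": [2 / 3, 1 / 3], "blue": [0.0, 1.0]}}

x = [0.2, "red"]
y = [0.5, "blue"]
d = heterogeneous_distance(x, y, nominal={1}, posteriors=post, q=2)
print(d)
```

Passing every index in `nominal` (with posteriors estimated for the continuous features as well) gives the second variant, which uses only probabilistic measures.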


Probability density function estimation

Problem: how to combine the influence of continuous and nominal/symbolic attributes?
1. Normalization – continuous and symbolic
2. Estimation – continuous attributes => probabilities

If estimation is used, there are several options to get the probabilities:
Discretization (DVDM)
Discretization + Interpolation (IVDM)
Gaussian kernel estimation (GVDM)
Rectangular Parzen window (LVDM)
Rectangular moving Parzen window (PVDM)
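Two of the options listed above, discretization and Gaussian kernel estimation, can be sketched as follows for a single continuous attribute. This is an illustrative reading of the method names, not the exact DVDM or GVDM procedures: the equal-width bins, the uniform fallback for empty bins, the function names and the toy data are all assumptions.

```python
import numpy as np

def posterior_discretized(x, samples, classes, n_classes, bins, lo, hi):
    """Discretization: P(C_i|x) = n_i / n among samples falling in the bin of x."""
    inner_edges = np.linspace(lo, hi, bins + 1)[1:-1]
    b = np.digitize(x, inner_edges)          # bin index of x, in 0..bins-1
    sample_bins = np.digitize(samples, inner_edges)
    in_bin = classes[sample_bins == b]
    if in_bin.size == 0:
        # Assumption: fall back to a uniform posterior when the bin is empty.
        return np.full(n_classes, 1.0 / n_classes)
    return np.bincount(in_bin, minlength=n_classes) / in_bin.size

def posterior_gaussian(x, samples, classes, n_classes, sigma):
    """Gaussian kernel: weight each sample by exp(-(x - s)^2 / (2 sigma^2))."""
    w = np.exp(-0.5 * ((x - samples) / sigma) ** 2)
    per_class = np.bincount(classes, weights=w, minlength=n_classes)
    total = per_class.sum()
    if total == 0.0:
        return np.full(n_classes, 1.0 / n_classes)
    return per_class / total

# Hypothetical one-dimensional training data with integer class labels.
samples = np.array([0.1, 0.2, 0.3, 0.7, 0.8, 0.9])
classes = np.array([0, 0, 0, 1, 1, 1])

print(posterior_discretized(0.15, samples, classes, 2, bins=2, lo=0.0, hi=1.0))
print(posterior_gaussian(0.5, samples, classes, 2, sigma=0.2))
```

Either estimate can then be plugged into a probabilistic metric such as VDM, so that continuous and symbolic attributes are compared on the same probabilistic footing.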


[Figure: five panels, one per estimation method: Discretization; Discretization & Interpolation; Gaussian kernel; Rectangular Parzen window; Moving Parzen window. Horizontal axis from 0.6 to 1.6, vertical axis from 0 to 1.]

3 overlapping Gaussians in 4D, good parameters for estimation.


[Figure: five panels, one per estimation method: Discretization; Discretization & Interpolation; Gaussian kernel; Rectangular Parzen window; Moving Parzen window. Horizontal axis from 0.6 to 1.6, vertical axis from 0 to 1.]

3 overlapping Gaussians in 4D, bad parameters for estimation.



Testing and comparison procedure

6 real datasets with mixed symbolic/real features:
Flags (UCI repository)
Glass (UCI repository)
Promoters (UCI repository)
Wisconsin Breast Cancer, WBC (UCI repository)
Pima Indians diabetes (UCI repository)
Lancet (from A.J. Walker, S.S. Cross, R.F. Harrison, Visualization of biomedical datasets by use of growing cell structure networks: a novel diagnostic classification technique. Lancet Vol. 354, pp. 1518-1522, 1999.)

For all tasks a 10-fold CV test procedure is used.

Two artificial 2D datasets for testing:
200 vectors/class, uniform distribution
200 vectors/class, normal distribution


Classification results

Results on artificial datasets.

Left: Gaussian distributed.
Right: uniform distributed.

Similar results, except for convergence problems.

Datasets with all symbolic or discrete values.

Leave-one-out results.


Real datasets


Results & discussion

Selection of appropriate parameters is very important. Incorrect values arise in two ways.

If one uses too small a sigma (Gaussian estimation), too narrow a window (rectangular Parzen window estimation), or too many bins in discretization: increased sensitivity of the estimation methods => overfitting.

If one uses too high a sigma (Gaussian estimation), too wide a window (rectangular Parzen window estimation), or too few bins in discretization: decreased sensitivity of the estimation methods, leading to over-generalization.

Middle values of the parameters are the best starting points and lead to good results (0.5, Parzen width 0.5, Parzen step 0.01).
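The effect of the window width can be seen in a small sketch of a rectangular Parzen window estimate of P(C_i|x). The data and the function name are hypothetical, and the estimator is a simplified reading of the rectangular window method: it counts class members among the samples that fall within half a window width of x.

```python
import numpy as np

def posterior_parzen(x, samples, classes, n_classes, width):
    """Rectangular Parzen window: P(C_i|x) = n_i / n among samples within width/2 of x."""
    inside = np.abs(samples - x) <= width / 2.0
    n = int(inside.sum())
    if n == 0:
        # Assumption: fall back to a uniform posterior when the window is empty.
        return np.full(n_classes, 1.0 / n_classes)
    return np.bincount(classes[inside], minlength=n_classes) / n

# Hypothetical data: one class-1 sample (0.35) lies among the class-0 samples.
samples = np.array([0.05, 0.20, 0.30, 0.35, 0.70, 0.80, 0.90])
classes = np.array([0, 0, 0, 1, 1, 1, 1])

for width in (0.02, 0.5, 5.0):
    print(width, posterior_parzen(0.35, samples, classes, 2, width))
```

On this toy data the narrowest window sees only the single class-1 sample at 0.35 and follows it completely, the widest window covers every sample and returns the overall class proportions regardless of x, and the middle width reflects the local majority of class 0.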


Some conclusions

First step in understanding relations between fuzzy and similarity-based systems.

Prototype rules can be expressed using fuzzy rules and vice versa, leading to new possibilities in both fields: new types of membership functions & new types of distance functions.

Expert knowledge can be captured in any kind of rules, but sometimes it may be more natural to express knowledge as P-rules (similarity) or as F-rules (logical conditions).

The VDM measure used in P-rules leads to a natural shape of membership functions in fuzzy logic for symbolic data.

There is no best choice of heterogeneous distance function type, PDF estimation method, or probability metric.

The simplest methods may lead to good results.

Selection of appropriate parameters is very important.

P-systems should be as popular as neurofuzzy systems, although many open problems still remain, both theoretical and practical.

Thank you for lending your ears ...
