
7/2/07 R.P.W. Duin 1

Non-Euclidean Problems in Human Centered Information Processing

Robert P.W. Duin, Delft University of Technology

(in cooperation with Elżbieta Pękalska)

London, 7 February 2007

r.p.w.duin@tudelft.nl, http://ict.ewi.tudelft.nl/~duin/


Contents

The Problem

Possible (initial) solutions

Further analysis

Conclusion

The Problem

Measuring human relevant information


The nearest neighbors sorted:

Euclidean distances used


The Connectivity Problem in the Pixel Representation

Spatial connectivity is lost: the pixels x1, x2, x3 of an image become independent axes of the feature space.

Dependent (connected) measurements are represented independently; the dependency has to be recovered from the data.


Feature space

Training set

Test object

Spatial connectivity is lost

The Connectivity Problem in the Pixel Representation


Reshuffle Pixels

Feature space

Reshuffling pixels will not change the classification

Training set

Test object

Spatial connectivity is lost

The Connectivity Problem in the Pixel Representation


Interpolation does not yield valid objects

Feature Space

[Figure: images image_1, image_2, image_3 and the class subspace they span in feature space; points interpolated between them do not correspond to valid images.]

The Connectivity Problem in the Pixel Representation

Another Metric is Needed

Learn from Human Observed Similarities


Examples

Image database retrieval

Man-machine interaction

Consumer preferences

Computer vision

Pattern recognition

Machine learning

[Diagram: the World is observed both by a sensor and by a human expert; the sensor yields a representation (feature 1, feature 2) that lives in a non-Euclidean (non-metric) space.]

Deformable Templates

A.K. Jain, D. Zongker, Representation and recognition of handwritten digits using deformable templates, IEEE Trans. PAMI, vol. 19, no. 12, 1997, pp. 1386-1391.

Matching new objects x to various templates y:

class(x) = class(argmin_y D(x, y))

Examples of deformed templates

Dissimilarity measure appears to be non-metric

The Problem

Non-Metric Data

Possible Solution

A New Representation


The Pattern Recognition System

[Diagram: Objects → Representation → Feature space (x1: perimeter, x2: area) → Classifier → Generalization. The training set contains Class A and Class B; the test object is classified as 'B'.]

Dissimilarity Based Representation

[Figure: two objects chosen as prototypes; each object is represented by its dissimilarities d1 and d2 to them, which become the axes of the dissimilarity space.]

Metric Dissimilarity - Assumptions

1. Positivity: dij ≥ 0
2. Reflexivity: dii = 0
3. Definiteness: dij = 0 ⇔ objects i and j are identical
4. Symmetry: dij = dji
5. Triangle inequality: dij ≤ dik + dkj
6. Compactness: if the objects i and j are very similar, then dij < δ
7. True representation: if dij < δ, then the objects i and j are very similar
8. Continuity of d

A B

X

Given labeled training set T

Unlabeled object x to be classified

The traditional Nearest Neighbour rule (template matching) just finds label(argmin_{i ∈ trainset}(di)), without using DT. Can we do any better?

dx = (d1 d2 d3 d4 d5 d6 d7)

DT =

  d11 d12 d13 d14 d15 d16 d17
  d21 d22 d23 d24 d25 d26 d27
  d31 d32 d33 d34 d35 d36 d37
  d41 d42 d43 d44 d45 d46 d47
  d51 d52 d53 d54 d55 d56 d57
  d61 d62 d63 d64 d65 d66 d67
  d71 d72 d73 d74 d75 d76 d77

Define dissimilarity measure dij between raw data of objects i and j

Dissimilarity Representation



Consider dissimilarities as ’features’

⇒ n objects given by n features ⇒ overtrained

⇒ select ’features’, i.e. representation objects,

by

- regularisation

- systematic selection

- random selection

Approach 1: Dissimilarity Space


dx = ( d1 d2 d3 d4 d5 d6 d7)

DT =

  d11 d12 d13 d14 d15 d16 d17
  d21 d22 d23 d24 d25 d26 d27
  d31 d32 d33 d34 d35 d36 d37
  d41 d42 d43 d44 d45 d46 d47
  d51 d52 d53 d54 d55 d56 d57
  d61 d62 d63 d64 d65 d66 d67
  d71 d72 d73 d74 d75 d76 d77

Given labeled training set T; unlabeled object x to be classified.

Selection of 3 objects (numbers 2, 4 and 7) for representation: the axes r1, r2, r3 of the 3D dissimilarity space R3 carry the dissimilarities d2, d4 and d7. Classification is performed in this 3D dissimilarity space.

Approach 1: Dissimilarity Space
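The pipeline on this slide — pick a few representation objects, describe every object by its distances to them, train a classifier there — can be sketched in a few lines of NumPy. The Gaussian data and the nearest-mean classifier are hypothetical stand-ins, not the data or classifiers used in the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data standing in for raw object measurements
X_a = rng.normal(0.0, 1.0, size=(50, 10))
X_b = rng.normal(1.5, 1.0, size=(50, 10))
X = np.vstack([X_a, X_b])
y = np.array([0] * 50 + [1] * 50)

def dissimilarity_space(X, R):
    """Represent each object by its distances to the representation set R."""
    return np.linalg.norm(X[:, None, :] - R[None, :, :], axis=2)

# Random selection of 3 representation objects (prototypes)
idx = rng.choice(len(X), size=3, replace=False)
R = X[idx]

D = dissimilarity_space(X, R)   # n x 3: each object is now a 3D vector

# Nearest-mean classifier in the 3D dissimilarity space
m0, m1 = D[y == 0].mean(axis=0), D[y == 1].mean(axis=0)
pred = (np.linalg.norm(D - m1, axis=1) < np.linalg.norm(D - m0, axis=1)).astype(int)
print("training accuracy in dissimilarity space:", (pred == y).mean())
```

Any classifier can replace the nearest-mean rule; the point is that the dissimilarities are treated as ordinary features.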


Examples of the raw data

Example Dissimilarity Space: NIST Digits 3 and 8

NIST digits: Hamming distances of 2 x 200 digits.

[Figure: scatter of the digits '3' and '8' in the dissimilarity space spanned by two of the dissimilarities, d10 and d300.]

Example of a Dissimilarity Space

[Figure: classification error (0.1-0.3) vs. training size per class (0-100) for the Modified-Hausdorff distance on digit contours; Fisher LD in dissimilarity spaces built on representation sets of size 10-90 (CNN selection); nearest neighbour results marked with *.]

Dissimilarity-based classification outperforms the Nearest Neighbour rule.

Dissimilarity Space Classification ⇔ Nearest Neighbour rule

Given a training set (classes A, B) with dissimilarity matrix D: is there a feature space X ⊂ Rk, with axes x1, x2, ..., for which the Euclidean distances satisfy Dist(X, X) = D?

Approach 2: Embedding the Dissimilarity Representation

[Figure: 2D map obtained by multidimensional scaling of a 13 x 13 distance matrix between New Zealand towns: Alexandra, Balclutha, Blenheim, Christchurch, Dunedin, Franz Josef, Greymouth, Invercargill, Milford, Nelson, Queenstown, Te Anau, Timaru.]

Euclidean Embedding: Multidimensional Scaling
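Classical metric MDS, as used for the town map, can be sketched with NumPy; the 13 random planar points below are hypothetical stand-ins for the actual road-distance data:

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical MDS: embed a Euclidean distance matrix D into k dimensions."""
    m = D.shape[0]
    J = np.eye(m) - np.ones((m, m)) / m        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # inner-product (Gram) matrix
    w, V = np.linalg.eigh(B)                   # eigenvalues in ascending order
    order = np.argsort(w)[::-1][:k]            # keep the k largest
    return V[:, order] * np.sqrt(np.maximum(w[order], 0.0))

# Synthetic "towns": 13 random points in the plane
rng = np.random.default_rng(1)
P = rng.uniform(-300, 300, size=(13, 2))
D = np.linalg.norm(P[:, None] - P[None, :], axis=2)

Xhat = classical_mds(D, k=2)
Dhat = np.linalg.norm(Xhat[:, None] - Xhat[None, :], axis=2)
print("max distance error:", np.abs(D - Dhat).max())
```

Because the input distances here are exactly Euclidean, the embedding reproduces them up to rotation and reflection; real road distances would only be approximated.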

Given a training set (classes A, B) with dissimilarity matrix D, entries dij, dik, dkj for objects i, j, k. If the dissimilarity matrix cannot be explained from a vector space, e.g. for the Hausdorff and Hamming distances, or if sometimes dij > dik + dkj (triangle inequality not satisfied), embedding in a Euclidean space is not possible → pseudo-Euclidean embedding.

(Linear) Euclidean Embedding

Metric non-Euclidean Example

Four points A, B, C, D with d(A,B) = d(A,C) = d(B,C) = 10 and d(D,A) = d(D,B) = d(D,C) = 5.1.

The triangle inequality is satisfied (10 ≤ 5.1 + 5.1), but a Euclidean embedding is impossible: a point equidistant from the vertices of an equilateral triangle with side 10 lies at least 10/√3 ≈ 5.77 from them, so 5.1 is too close.

Examples: Hausdorff, L1-norm
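Both claims about the four-point example can be checked numerically: the triangle inequality holds for every triple, yet the centered Gram matrix has a negative eigenvalue, which (by the classical embedding criterion) rules out any Euclidean embedding:

```python
import numpy as np
from itertools import permutations

# Distance matrix of the four points on the slide: A, B, C mutually at 10,
# D at 5.1 from each of A, B, C.
D = np.array([
    [0.0, 10.0, 10.0, 5.1],
    [10.0, 0.0, 10.0, 5.1],
    [10.0, 10.0, 0.0, 5.1],
    [5.1, 5.1, 5.1, 0.0],
])

# Triangle inequality holds for every ordered triple...
metric = all(D[i, j] <= D[i, k] + D[k, j]
             for i, j, k in permutations(range(4), 3))
print("metric:", metric)

# ...but the centered Gram matrix has a negative eigenvalue,
# so no Euclidean embedding exists in any dimension.
m = len(D)
J = np.eye(m) - np.ones((m, m)) / m
B = -0.5 * J @ (D ** 2) @ J
print("eigenvalues of B:", np.linalg.eigvalsh(B))
```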

Non-Metric Distances

Weighted edit-distance for strings (Bunke's Chicken dataset): the three objects 78, 419 and 425 have pairwise distances 14.9, 7.8 and 4.1, so the triangle inequality is violated (14.9 > 7.8 + 4.1).

Single Linkage Clustering: for clusters A, B, C on a line it can happen that D(A,C) > D(A,B) + D(B,C).

The Fisher Criterion as a class dissimilarity:

J(A,B) = (µA − µB)² / (σA² + σB²)

For the one-dimensional classes A, B, C shown: J(A,C) = 0, J(A,B) is large, and J(C,B) is small ≠ J(A,B).

(Pseudo-)Euclidean Embedding

Let D be the given, imperfect m × m dissimilarity matrix of the training objects (D^(2): element-wise squares).

Construct the inner-product matrix B = −(1/2) J D^(2) J, with centering matrix J = I − (1/m) 1 1^T.

Eigenvalue decomposition: B = Q Λ Q^T.

Select k eigenvectors: X = Qk |Λk|^(1/2) (problem: some Λk(i,i) may be < 0).

Let ℑk be the k × k diagonal matrix with ℑk(i,i) = sign(Λk(i,i)). Any Λk(i,i) < 0 → pseudo-Euclidean space, with inner products computed through ℑk.

Let Dz be the dissimilarity matrix between new objects and the training set. The corresponding inner-product matrix is

Bz = −(1/2) (Dz^(2) J − (1/m) 1 1^T D^(2) J)

and the embedded objects are

Z = Bz Qk |Λk|^(−1/2) ℑk
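The embedding steps above, in a minimal NumPy sketch (the out-of-sample projection is omitted; the distances reuse the metric-but-non-Euclidean four-point example from the earlier slide):

```python
import numpy as np

def pseudo_euclidean_embedding(D, k=None):
    """B = -1/2 J D^(2) J, B = Q Lambda Q^T, X = Q_k |Lambda_k|^(1/2)."""
    m = D.shape[0]
    J = np.eye(m) - np.ones((m, m)) / m
    B = -0.5 * J @ (D ** 2) @ J
    w, Q = np.linalg.eigh(B)
    order = np.argsort(np.abs(w))[::-1]           # largest |eigenvalue| first
    if k is not None:
        order = order[:k]
    w, Q = w[order], Q[:, order]
    X = Q * np.sqrt(np.abs(w))                    # embedded coordinates
    signature = (int((w > 0).sum()), int((w < 0).sum()))   # (p, q)
    return X, np.sign(w), signature

D = np.array([[0, 10, 10, 5.1],
              [10, 0, 10, 5.1],
              [10, 10, 0, 5.1],
              [5.1, 5.1, 5.1, 0.0]])
X, signs, (p, q) = pseudo_euclidean_embedding(D, k=3)
print("signature (p, q):", (p, q))

# The indefinite inner product recovers the original distances exactly:
# d_eps^2 = d_p^2 - d_q^2
diff = X[0] - X[1]
d2 = np.sum(signs * diff ** 2)
print("recovered d(A,B):", np.sqrt(d2))
```

With all non-zero eigenvalues retained, the pseudo-Euclidean distances reproduce D exactly; the negative directions absorb the part no Euclidean space can express.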

Pseudo-Euclidean Embedding

If D is non-Euclidean, B has p positive and q negative eigenvalues.

Solutions:
• Remove all eigenvectors with small and negative eigenvalues
• Take absolute values of the eigenvalues and proceed
• Construct a pseudo-Euclidean space

[Figure: eigenvalue spectrum of B for a non-Euclidean D; most of the ~1000 eigenvalues are small, and some are negative.]

Pseudo-Euclidean Space (Krein Space)

If D is non-Euclidean, B has p positive and q negative eigenvalues. A pseudo-Euclidean space ε with signature (p, q), k = p + q, is a non-degenerate inner product space ℜk = ℜp ⊕ ℜq such that:

⟨x, y⟩ε = x^T ℑpq y = Σ_{i=1..p} xi yi − Σ_{i=p+1..p+q} xi yi,   with ℑpq = diag(Ip×p, −Iq×q)

dε²(x, y) = ⟨x − y, x − y⟩ε = dp²(x, y) − dq²(x, y)

[Figure: a 2D pseudo-Euclidean space with one positive axis (ℜp) and one negative axis (ℜq); the lines ⟨x, x⟩ε = 0 separate the regions where ⟨x, x⟩ε > 0 from those where ⟨x, x⟩ε < 0.]

Dissimilarity Representation ⇔ Kernels

Let R = {x1, x2, ..., xn}.

Kernel classifier:

f(x) = w^T K(x, R) + w0 = Σ_{i ∈ SV} wi K(x, xi) + w0

Linear classifier in a dissimilarity space:

f(D(x, R)) = w^T D(x, R) + w0

If D behaves Euclidean, then

K = −(1/2) J D*² J   or   K = exp(−D*² / σ²)

are Mercer kernels and the SVM can be applied directly. Otherwise:

• Transform D appropriately or regularize K
• Treat K as a kernel in a pseudo-Euclidean space: indefinite SVM
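The Mercer condition can be checked directly on the spectrum of K = −½ J D² J: for a Euclidean D it is positive semi-definite, for a non-Euclidean D it is indefinite. Clipping the negative spectrum, shown last, is one common regularisation (not necessarily the one used in the talk):

```python
import numpy as np

def kernel_from_dissimilarity(D):
    """K = -1/2 J D^2 J, with J the centering matrix."""
    m = D.shape[0]
    J = np.eye(m) - np.ones((m, m)) / m
    return -0.5 * J @ (D ** 2) @ J

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 5))
D_euc = np.linalg.norm(X[:, None] - X[None, :], axis=2)

K = kernel_from_dissimilarity(D_euc)
print("min eigenvalue (Euclidean D):", np.linalg.eigvalsh(K).min())

# A non-Euclidean D (the four-point example) gives an indefinite K:
D_ne = np.array([[0, 10, 10, 5.1], [10, 0, 10, 5.1],
                 [10, 10, 0, 5.1], [5.1, 5.1, 5.1, 0.0]])
K_ne = kernel_from_dissimilarity(D_ne)
print("min eigenvalue (non-Euclidean D):", np.linalg.eigvalsh(K_ne).min())

# One common regularisation: clip the negative spectrum before training an SVM.
w, V = np.linalg.eigh(K_ne)
K_psd = V @ np.diag(np.maximum(w, 0.0)) @ V.T
print("min eigenvalue after clipping:", np.linalg.eigvalsh(K_psd).min())
```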

Dissimilarity Based Classification

Training set (classes A, B) → dissimilarity matrix D; test object x → dissimilarities dx with the training set.

1. Nearest neighbour rule
2. Reduce the training set to a representation set ⇒ dissimilarity space
3. Embedding: select large Λii > 0 ⇒ Euclidean space, or select large |Λii| ⇒ pseudo-Euclidean space; train a discriminant function there

Three Approaches Compared for the Zongker Data

Dissimilarity space classification is equivalent to embedding; both are better than the Nearest Neighbour rule.

[Figure: averaged generalization error (0-0.2) vs. size of the representation set R (0-1500) for the digit data; curves: RLDC (Rep. Set) and LP (Rep. Set) for the dissimilarity space, RLDC (Embed.) for the embedding, and the 1-NN and 3-NN rules.]

Example 2: Polygons

Convex heptagons vs. pentagons; minimum edge length 0.1 of the maximum edge length; polygons are scaled and centered. No class overlap ⇒ zero error is possible.

Distance measures, with dij the distance between vertex i of polygon 1 and vertex j of polygon 2:

Hausdorff: D = max{ maxi(minj(dij)), maxj(mini(dij)) } — find the largest of the smallest vertex distances.

Modified Hausdorff: D = max{ meani(minj(dij)), meanj(mini(dij)) } (not a metric!)
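The two vertex-set distances defined above can be implemented directly; the regular unit polygons below are hypothetical stand-ins for the dataset's convex pentagons and heptagons:

```python
import numpy as np

def hausdorff(P, Q):
    """Hausdorff distance between vertex sets:
    max{ max_i min_j d_ij, max_j min_i d_ij }."""
    d = np.linalg.norm(P[:, None] - Q[None, :], axis=2)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def modified_hausdorff(P, Q):
    """Modified Hausdorff: max{ mean_i min_j d_ij, mean_j min_i d_ij }.
    Not a metric."""
    d = np.linalg.norm(P[:, None] - Q[None, :], axis=2)
    return max(d.min(axis=1).mean(), d.min(axis=0).mean())

def regular_polygon(n, phase=0.0):
    """Unit-radius regular n-gon, centred at the origin (scaled and centred)."""
    t = 2 * np.pi * np.arange(n) / n + phase
    return np.stack([np.cos(t), np.sin(t)], axis=1)

pent = regular_polygon(5)
hept = regular_polygon(7)
print("Hausdorff(pentagon, heptagon):", hausdorff(pent, hept))
print("Modified Hausdorff:", modified_hausdorff(pent, hept))
```

Replacing the outer max over points by a mean makes the measure more robust to single outlying vertices, at the price of losing the triangle inequality.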

Dissimilarity Based Classification of Polygons

[Figure: averaged generalization error (0-0.2) vs. size of the representation set R (0-1500) for the polygon data; curves: RLDC (Rep. Set) and LP (Rep. Set) for the dissimilarity space, LDC (Embed.) for the embedding, and the 1-NN and 3-NN rules.]

Zero error is difficult to reach.

Prototype Selection: Polygon Dataset

[Figure: averaged classification error (in %) vs. number of prototypes (4-200); dataset Polydistm, 1000 objects, classifier BayesNQ; curves: 1-NN-final, k-NN-final, k-NN, k-NN-DS, EdiCon-1-NN, and prototype selection by Random, RandomC, ModeSeek, KCentres, FeatSel, KCentres-LP, LinProg, EdiCon.]

The classification performance of the quadratic Bayes Normal classifier and the k-NN rule in dissimilarity spaces, and of the direct k-NN rule, as a function of the number of selected prototypes. Note that 10-20 prototypes already give better results than using all 1000 objects in the NN rules.
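One plausible reading of the KCentres strategy named in the legend is a greedy farthest-first selection, sketched below on illustrative random data (this is an assumption about the method, not the talk's exact algorithm):

```python
import numpy as np

def k_centres(D, k, rng):
    """Greedy farthest-first prototype selection on a square dissimilarity
    matrix D: repeatedly add the object farthest from the chosen set."""
    chosen = [int(rng.integers(D.shape[0]))]
    for _ in range(k - 1):
        # distance of every object to its nearest chosen prototype
        d_near = D[:, chosen].min(axis=1)
        chosen.append(int(d_near.argmax()))
    return chosen

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 2))
D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
protos = k_centres(D, 5, rng)
print("selected prototypes:", protos)
```

Farthest-first selection tends to spread prototypes over the data, which is what a small representation set needs.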

Possible Solution

The Dissimilarity Representation

E. Pękalska and R.P.W. Duin, The Dissimilarity Representation for Pattern Recognition: Foundations and Applications, World Scientific, Singapore, 2005, ISBN 981-256-530-2.

An Analysis

Causes of Non-Metric Data


Non-Metric Measurements

Weighted edit-distance for strings (Bunke's Chicken dataset): objects 78, 419 and 425 with pairwise distances 14.9, 7.8 and 4.1 — the triangle inequality is violated.

Representation by Ordered Sets (Strings); Edit Distance

X = (x1, x2, ..., xk), Y = (y1, y2, ..., yn)

DE(X, Y): minimal number of edit operations (insertions, deletions, substitutions) transforming X into Y; possibly weighted.

DE(snert, meer) = 3: snert ⇒ seert ⇒ seer ⇒ meer
DE(ner, meer) = 2: ner ⇒ mer ⇒ meer

Triangle inequality ⇒ computationally feasible.

Length normalisation problem: D(aa, bb) < D(abcdef, bcdd)

See Marzal & Vidal, IEEE PAMI-15, 1993, 926-932
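The unweighted edit distance used in the examples is the standard dynamic-programming (Levenshtein) recursion; a minimal sketch that reproduces the slide's values:

```python
def edit_distance(x, y):
    """Levenshtein edit distance: minimal number of insertions,
    deletions and substitutions turning x into y (unit weights)."""
    m, n = len(x), len(y)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                               # delete all of x[:i]
    for j in range(n + 1):
        d[0][j] = j                               # insert all of y[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if x[i - 1] == y[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[m][n]

print(edit_distance("snert", "meer"))  # 3, as on the slide
print(edit_distance("ner", "meer"))    # 2, as on the slide
```

The same routine also exhibits the length normalisation problem mentioned above: edit_distance("aa", "bb") is smaller than edit_distance("abcdef", "bcdd") although the short strings share no symbol.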

Better Measurements are Sometimes more Non-Euclidean

[Figure: classification error vs. non-Euclideanness of the dissimilarity measure.]

E. Pękalska et al., Non-Euclidean or non-metric measures can be informative, SSSPR 2006, Springer, LNCS 4109.

[Map: points V and J on opposite sides of the Jostedalsbreen glacier, and a shared point X.]

1800:

Crossing the Jostedalsbreen was impossible.

Travelling around (200 km) lasted 5 days.

Until the shared point X was found.

People could visit each other in 8 hours.

D(V,J) = 5 days
D(V,X) = 4 hours
D(X,J) = 4 hours } Non-Metric

Cause of non-metric data - 1

Weighted edit-distance for strings (Bunke's Chicken dataset): objects 78, 419 and 425 with pairwise distances 14.9, 7.8 and 4.1.

Large distances are overestimated, due to computational problems.

Cause of non-metric data - 2

Non-metric data due to partially observed projections.

Small distances are underestimated.

Cause of non-metric data - 2

Example: consumer preferences for recommendation systems. Rows of S are consumers, columns are their preference ranks for the items:

S =

  5 3 1 2 4
  2 1 3 4 5
  5 3 1 4 2
  5 2 4 1 3
  4 2 3 5 1
  1 3 2 5 4
  1 4 3 5 2
  3 5 1 4 2

Cause of non-metric data - 3

Distance(Table, Book) = 0
Distance(Table, Cup) = 0
Distance(Book, Cup) = 0.75

Single Linkage Clustering: for clusters A, B, C it may happen that D(A,C) > D(A,B) + D(B,C).

Non-Euclidean Human Relations

Causes of non-metric data

1. Overestimated large distances (too difficult to compute)
2. Underestimated small distances (one-sided view of objects)

both caused by the construction of the complicated measures needed to correspond with human observations.

3. Essentially non-metric distance definitions, as the human concept of distance differs from the mathematical one.

Conclusion

The Human World is Non-Metric

Solutions

Neglect it: Use Euclidean Distances only

Solutions

Live with it: Heuristic Corrections

Solutions

Adapt to it: Dissimilarity Space

[Figure: dissimilarity space with axes d1 and d2.]

Solutions

Adapt to it: Krein Space

[Figure: the pseudo-Euclidean (Krein) space diagram shown earlier; dε²(x, y) = dp²(x, y) − dq²(x, y).]

Solutions

Adapt to it: Non-Point Representations

Blob Representation, String Representation

Thank You!