WAGOS Conformal Changes of Divergence and Information Geometry Shun-ichi Amari RIKEN Brain Science...

Post on 19-Dec-2015

227 views 2 download

Tags:

Transcript of WAGOS Conformal Changes of Divergence and Information Geometry Shun-ichi Amari RIKEN Brain Science...

WAGOS

Conformal Changes of Divergence and Information Geometry

Shun-ichi AmariRIKEN Brain Science Institute

Information GeometryInformation Geometry

Systems Theory Information Theory

Statistics Neural Networks

Combinatorics PhysicsInformation Sciences

Riemannian ManifoldDual Affine Connections

Manifold of Probability Distributions

Math. AI

Vision, Shape

optimization

22

1; , ; , exp

22

xS p x p x

Information Geometry ?Information Geometry ?

p x

;S p x θ

Riemannian metricDual affine connections

( , ) θ

Manifold of Probability DistributionsManifold of Probability Distributions

1 2 3 1 2 3

1,2,3 { ( )}

, , 1

x p x

p p p p p p

1p

p

3p

2p

;S p x

0 1

1 0 1

1

{ | ( , ,..., ); 1; 0}

{ | ( , ,..., ); 0}

n n i i

n n i

n n

S p p p p p

S p p p p

S S

p

p

Riemannian Structure

2 ( )

( )

( ) ( )

Euclidean

i jij

T

ij

ds g d d

d G d

G g

G E

Fisher information

Affine Connectioncovariant derivative

,

0, X=X(t)

( )

X c

X

i jij

Y X Y

X

s g d d

minimal

geodesic

distance

straight line

DualityDuality

, , , i jijX Y X Y X Y g X Y

Riemannian geometry:

X

Y

X

Y

**, , ,X XX Y Z Y Z Y Z

Dual Affine Connections

e-geodesic

m-geodesic

log , log 1 logr x t t p x t q x c t

, 1r x t tp x t q x

,

q x

p x

*( , )

Divergence :D Z Y

: 0

: 0, iff

: ij i j

D

D

D d g dz dz

z y

z y z y

z z z

positive-definite

Z

Y

M

Metric and Connections Induced by Divergence(Eguchi)

'

' '

:

:

:

ij i j

ijk i j k

ijk i j k

g D

D

D

y z

y z

y z

z z y

z z y

z z y

', i ii iz y

Riemannian metric

affine connections

Duality:

, ,

k ij kij kji

ijk ijk ijk

g

T

M g T

*, , ,X XX Y Z Y Z Y Z

Two Types of Divergence Invariant divergence (Chentsov, Csiszar)

f-divergence: Fisher- structure

Flat divergence (Bregman)

KL-divergence belongs to both classes

q(x)

D[p : q] = p(x)f{ }dxp(x)

Invariant divergence (manifold of probability distributions; )

: one-to-one

sufficient statistics

y k x

: :X X Y YD p x q x D p y q y

{ ( , )}S p x

ChentsovAmari -Nagaoka

Csiszar f-divergence

: ,if i

i

qD p f

p

p q

: convex, 1 0,f u f

: :cf fD cDp q p q

1f u f u c u

' ''1 1 0 ; 1 1f f f 1

( )f u

u

Ali-SilveyMorimoto

Invariant geometrical structurealpha-geometry (derived from invariant divergence)

,S p x

ij i j

ijk i j k

g E l l

T E l l l

log , ; i i

l p x

-connection

, ;ijk ijki j k T

: dually coupled

, , ,X XX Y Z Y Z Y Z

α

Fisher information

Levi-civita:

 : Dually Flat Structure

1 2 1 2 1 2

affine coordinates

dual affine coordinates

potential ,

,

:D

nS

Dually flat manifold:Manifold with Convex Function

S coordinates 1 2, , , n L

: convex function

negative entropy log ,p p x p x dx energy

Euclidean 21

2i

mathematical programming, control systems, physics, engineering, economics

Riemannian metric and flatness

Bregman divergence

, grad D

1,

2i j

ijD d g d d

, ij i j i ig

: geodesic (notLevi-Civita)Flatness (affine)

{ , ( ), }S

Legendre Transformation

i i

one-to-one

0ii

,i i i

i

,D

( ) max { ( )}ii

Two flat coordinate systems ,

: geodesic (e-geodesic)

: dual geodesic (m-geodesic)

“dually orthogonal”

,

,

j ji i

ii i

i

*, , ,X XX Y Z Y Z Y Z

Geometry

ijG GRiemannian metric

1: ,

2TD d G p p p p p

1

G

G G

p p

p p

Straightness (affine connection)

: -geodesic

: -geodesic

t t

t t

p a b

p a b

Pythagorean Theorem (dually flat manifold)

: : :D P Q D Q R D P R

Euclidean space: self-dual

21

2 i

Projection Theorem min :

Q MD P Q

Q = m-geodesic projection of P to M

min :Q M

D Q P

Q’ = e-geodesic projection of P to M

dually flat space

convex functionsBregman

divergence

invariance

invariant divergence Flat divergence

KL-divergenceF-divergenceFisher inf metricAlpha connection

: space of probability distributions }{pS

logp(x)

D[p : q] = p(x) { }dxq(x)

, 0 : ( 1 not n holds)i iS p p p

Space of positive measures : vectors, matrices, arrays

f-divergence

α-divergence

Bregman divergence

divergence

1 1

2 21 1

[ : ] { }2 2i i i iD p q p q p q

[ : ] { log }ii i i

i

pD p q p p q

q

KL-divergence

α-representation (Amari-Nagaoka, Zhang)

1

2 , 1

log , 1i

i

i

prp

iU r r

typical case: u-representation,

2

1 , 1

, 1z

zU ze

i iU r p

Divergence over α-representation

: i i i iD U r V r rr p q

1

2

1

2

: -geodesic

2: -geodesic1

i i

i i

r p

r p

log 1i i

i i

r p

r p

β-divergence (Eguchi)

11

1 , 01

U z z

0 expU z z

0 : :D KLp q p q

-div -div

-divKL

( 1) ( 1)

1[ : ] { ( ) ( )

( 1

-structure

dually flat: -divergenc

)

( 1) ( ) ( ( ) ( )}

e

D p q p x q x

q x p x q x dx

( , ) divergence

, [ : ] { }i i i iD p q p q p q

: divergence

1: -divergence

Tsallis q-Entropy--

1

111

11 , 1

1ln

log , 1

exp ln 1 (1 ) } , ex

1ln 1

1

p

1 q

T

q

q

q

qq q

u qqu

u q

u u q u

H E p x dxp q

u

x

Shannon entropy1

[log ] ( ) log ( )( )

H E p x p x dxp x

Generalized log

α structure

2 1

1

2

q

q

1 11 : log

1 1T R q

q

q

q

H h H hq

h

q

p p x dx

: convex: ( ) { ( )}q qh p f h p p

1 11 ln

1q q qq

H E p xq h

p

1

: ln

11

1

p q

q q

r xD p x r x E

p x

p x r x dxq

: -divergence ; -structure

q -exponential family cf Pistone exponential

conve

escort distrib

, exp

:

( ) ( )ˆ ( ) :

1 11 ln

1

ution

x

q i i

q q

p p

i q i

q qq

p x x

p x p xdx p x

h h

E x

E p xq h

x

x x

p

κ

0

0

1 10

1

1

1

1

n

n ii

n

i ii

i q qi

qi i

q

S p

p x p x

p pq

ph

p

q-Geometry derived from : dually flat

( ) and ( )

0

0 01

log ( ) log ( )

(log log ) ( ) log

n

q q i ii

n

q i q i qi

p x p x

p p x p

Dually flat structure of q-escort

esc

ˆˆ: log

ˆ

ˆ log log q

q

pD p r p dx

rh rp x

q p x dxr x h p

esc ˆ ˆlog logij q i jg E p p

geodesic: exponential family

dual geodesic: q-family

1 2log , log 1 logp x t t p x t p x c t

1 2ˆ ˆ ˆ, 1p x t tp x t p t

q-escort probability distribution

1ˆ ( ) ( )

( )q

q

p x p xh p

Escort geometry

ˆ ˆ ˆ( ) [ log ( , ) log ( , )]ij q i jg E p x p x

q -escort geometry

1ˆ q

q

p x p xh

Dually flat structure of q-escort

esc

ˆˆ: log

ˆ

ˆ log log q

q

pD p r p dx

rh rp x

q p x dxr x h p

esc ˆ ˆlog logij q i jg E p p

geodesic: exponential family

dual geodesic: q-family

1 2log , log 1 logp x t t p x t p x c t

1 2ˆ ˆ ˆ, 1p x t tp x t p t

Pythagorean theoremq

: : :q qD p q D q r D p r

q - geodesic :

1 1 1

1 2, 1q q q

p x t tp x t p x c t

dual- -geodesic:q

1 2, 1q q q

p x t tp x t p x c t

Projection theorem

arg min :qr M

D p r p

Max-entropy theorem

constraint

max

s: q k k

q

E a x c

h

p

q -Cramer Rao theorem : ,p x

1

ˆ-unbiased estima

ˆ

tor

ˆ ˆ

q

q i i j j ij

E

q

E g

q -maximum likelihood estimator

1ˆ ˆarg max , , ;

1arg max ;

q N

q

iN

q

p x x

p xh

/ Bayes MAP with

N q

qh

q -super-robust estimator (Eguchi)

1

1

1 1

0 1

,ˆmax , max

bias-corrected -estimating function

ˆ, , log

1log

1

1ˆ, 0 max ,

q

q i q

q q

N

q i ii q

p xp x

h

q

s x p x p c

c hq

s x p xh

Conformal change of divergence

: :D p q p D p q

ij ijg p g

( )

logijk ijk k ij j ik i jk

i i

T T s g s g s g

s

q -Fisher information

( ) ( )( )

q Fij ij

q

qg p g p

h p

conformal transformation

11[ ( ) : ( )] (1 ( ) ( ) )

(1 ) ( )q q

qq

q divergence

D p x r x p x r x dxq h p

Total Bregman divergence (Vemuri)

2

( ) ( ) ( ) ( )TBD( : )

1 | ( ) |

p q q p qp q

q

Total Bregman Divergence and its Applications to Shape

Retrieval

•Baba C. Vemuri, Meizhu Liu, Shun-ichi Amari, Frank Nielsen

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010

Total Bregman Divergence

2

::

1

DTD

f

x yx y

•rotational invariance

•conformal geometry

TBD examples

Clustering : t-center

1, , mE x x

arg min , ii

TD x x x

E

y

T-center of E x

t-center

2

1

1

i i

i

i

i

w ff

w

wf

xx

x

x

t-center is robust

1, , ;

1; ,

nE

n

x x y

x x z x y

influence fun ;ction z x y

robust as : c z y

1,

1i i

f fG

w

G w fn

y xz x y

y

x x

21

1

f f

w f

w

y y

y y

y

Robust: is boundedz

21Euclidean case

2f x

1

2,

1

,

G

yz x y

y

z x y y

How good is Total Bregman Divergence

•vision

•signal processing

•geometry (conformal)

TBD application-shape retrieval

• Using MPEG7 database;• 70 classes, with 20 shapes each class (Meizhu Liu)

First clustering then retrieval

Advantages

• Accurate;• Easy to access (shape representation);• Space and time efficient (only need to store the

closed form t-centers, clustering can be done offline, hierarchical tree storage).

Shape retrieval framework

• Shape-->• Extract boundary points & align them-->• Represent using mixture of Gaussians-->• Clustering & use k-tree to store the clustering

results;• Query on the tree.

MPEG7 database• Great intraclass variability, and small

interclass dissimilarity.

Shape representation

Experimental results

Other TBD applications

Diffusion tensor imaging (DTI) analysis [Vemuri]

• Interpolation• Segmentation

Baba C. Vemuri, Meizhu Liu, Shun-ichi Amari and Frank Nielsen, Total Bregman Divergence and its Applications to DTI Analysis, IEEE TMI, to appear