Transcript of machine learn IG.ppt [Compatibility Mode] (yosinski.com/mlss12/...Amari-Information-Geometry.pdf)
![Page 1](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/1.jpg)
Machine Learning Summer School (MLSS), Kyoto U.
Information Geometry and Its Applications to Machine Learning
Shun-ichi Amari, RIKEN Brain Science Institute
![Page 2](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/2.jpg)
Information Geometry
-- Manifolds of Probability Distributions
$$M = \{p(\mathbf x)\}$$
![Page 3](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/3.jpg)
Information Geometry
[Figure: information geometry (manifold of probability distributions; Riemannian manifold; dual affine connections) at the center, linked to systems theory, information theory, statistics, neural networks, combinatorics, physics, information sciences, mathematics, AI, vision, and optimization.]
![Page 4](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/4.jpg)
Information Geometry?
Gaussian distributions:
$$S = \{p(x;\mu,\sigma)\},\qquad p(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$$
$$S = \{p(x;\boldsymbol\theta)\},\qquad \boldsymbol\theta = (\mu,\sigma)$$
[Figure: the family drawn as the half-plane with coordinates $\mu$ and $\sigma$; each point is one density $p(x)$.]
![Page 5](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/5.jpg)
Manifold of Probability Distributions
$x \in \{1, 2, 3\}$: $S = \{p(x)\}$, $\mathbf p = (p_1, p_2, p_3)$, $p_1 + p_2 + p_3 = 1$
[Figure: the probability simplex in $(p_1, p_2, p_3)$ space, with a point $\mathbf p$ on it.]
$$M = \{p(x;\boldsymbol\theta)\}$$
![Page 6](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/6.jpg)
Invariance: $S = \{p(x,\boldsymbol\theta)\}$
The geometry should be invariant under a different representation $y = y(x)$, $p(y,\boldsymbol\theta)$ (more generally, under sufficient statistics). The squared $L^2$ distance between densities is not invariant:
$$\int |p(x,\boldsymbol\theta_1) - p(x,\boldsymbol\theta_2)|^2\,dx \neq \int |p(y,\boldsymbol\theta_1) - p(y,\boldsymbol\theta_2)|^2\,dy$$
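The claim above can be checked numerically. The sketch below is a minimal illustration, assuming NumPy and two example distributions of our choosing, N(0, 1) and N(1, 1), together with the change of representation y = 2x (none of these come from the slide). Integrals are approximated by Riemann sums on a fine grid: the squared L² distance halves when the same pair is written in y, while the KL divergence stays at 0.5.

```python
import numpy as np

def gauss(t, mu, sigma):
    """Density of N(mu, sigma^2) evaluated on the grid t."""
    return np.exp(-(t - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def l2_sq(p, q, dt):
    """Riemann-sum approximation of the squared L2 distance between densities."""
    return np.sum((p - q) ** 2) * dt

def kl(p, q, dt):
    """Riemann-sum approximation of KL[p : q]."""
    return np.sum(p * np.log(p / q)) * dt

# Representation x: theta_1 -> N(0, 1), theta_2 -> N(1, 1)
x = np.linspace(-20.0, 20.0, 400_001)
dx = x[1] - x[0]
px, qx = gauss(x, 0.0, 1.0), gauss(x, 1.0, 1.0)

# The same two distributions re-expressed in y = 2x: N(0, 4) and N(2, 4)
y = 2.0 * x
dy = y[1] - y[0]
py, qy = gauss(y, 0.0, 2.0), gauss(y, 2.0, 2.0)

print("L2^2 in x:", l2_sq(px, qx, dx), "in y:", l2_sq(py, qy, dy))  # changes (halves)
print("KL   in x:", kl(px, qx, dx), "in y:", kl(py, qy, dy))        # unchanged
```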
![Page 7](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/7.jpg)
Two Geometrical Structures: Riemannian metric; affine connection (geodesic)
$$ds^2 = \sum g_{ij}(\boldsymbol\theta)\,d\theta^i d\theta^j$$
Fisher information:
$$g_{ij} = E\left[\frac{\partial\log p}{\partial\theta^i}\,\frac{\partial\log p}{\partial\theta^j}\right]$$
Orthogonality: inner product
$$\langle d\boldsymbol\theta_1, d\boldsymbol\theta_2\rangle = d\boldsymbol\theta_1^T G\,d\boldsymbol\theta_2$$
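As a sketch of the Fisher metric for the Gaussian family $\boldsymbol\theta = (\mu, \sigma)$, the code below evaluates $g_{ij} = E[\partial_i \log p\,\partial_j \log p]$ by numerical quadrature and compares it with the known closed form $\mathrm{diag}(1/\sigma^2, 2/\sigma^2)$. It assumes NumPy; the grid width and the test point $(\mu, \sigma) = (0.5, 2)$ are arbitrary choices. The inner product shows that the $\mu$ and $\sigma$ directions are orthogonal.

```python
import numpy as np

def fisher_gaussian(mu, sigma, half_width=12.0, n=200_001):
    """Fisher matrix g_ij = E[d_i log p * d_j log p] for theta = (mu, sigma),
    with the expectation approximated by a Riemann sum on a fine grid."""
    x = np.linspace(mu - half_width * sigma, mu + half_width * sigma, n)
    dx = x[1] - x[0]
    p = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
    score = np.stack([
        (x - mu) / sigma ** 2,                       # d log p / d mu
        ((x - mu) ** 2 - sigma ** 2) / sigma ** 3,   # d log p / d sigma
    ])                                               # shape (2, n)
    return (score * p) @ score.T * dx

def inner(G, d1, d2):
    """Inner product <d1, d2> = d1^T G d2 of two small displacements."""
    return d1 @ G @ d2

G = fisher_gaussian(mu=0.5, sigma=2.0)
print(np.round(G, 6))  # close to diag(1/sigma^2, 2/sigma^2) = diag(0.25, 0.5)
print(inner(G, np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # ~0
```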
![Page 8](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/8.jpg)
Affine Connection: covariant derivative $\nabla_X Y$; parallel transport $\Pi_c X$ along a curve $c$
Geodesic $X = X(t)$, two notions:
straight line: $\nabla_{\dot X}\dot X = 0$
minimal distance: $s = \int \sqrt{\sum g_{ij}\,\dot\theta^i\dot\theta^j}\,dt$
(the two notions coincide only for the Levi-Civita connection)
![Page 9](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/9.jpg)
Duality: two affine connections
$$\langle X, Y\rangle = \langle \Pi X, \Pi^* Y\rangle,\qquad \langle X, Y\rangle = \sum g_{ij}X^iY^j$$
Riemannian geometry: $\Pi^* = \Pi$
[Figure: vectors $X$ and $Y$ transported to $\Pi X$ and $\Pi^* Y$.]
Structure: $\{S, g, \nabla, \nabla^*\}$
![Page 10](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/10.jpg)
Dual Affine Connections $(\nabla, \nabla^*)$, with parallel transports $(\Pi, \Pi^*)$
e-geodesic: $\log r(x,t) = t\log p(x) + (1-t)\log q(x) + c(t)$ ($c(t)$ normalizes $r$)
m-geodesic: $r(x,t) = t\,p(x) + (1-t)\,q(x)$
Geodesic equations: $\nabla_{\dot x}\dot x(t) = 0$, $\nabla^*_{\dot x}\dot x(t) = 0$
[Figure: the e-geodesic and the m-geodesic connecting $q(x)$ and $p(x)$.]
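A small sketch of the two geodesics between distributions on three points, assuming NumPy and made-up endpoints p and q. The normalizing term enters with the sign used on the slide, so here $c(t) = -\log\sum_x p(x)^t q(x)^{1-t}$, which is non-negative by Hölder's inequality.

```python
import numpy as np

def m_geodesic(p, q, t):
    """Mixture geodesic r(t) = t p + (1 - t) q."""
    return t * p + (1 - t) * q

def e_geodesic(p, q, t):
    """Exponential geodesic log r(t) = t log p + (1 - t) log q + c(t)."""
    log_r = t * np.log(p) + (1 - t) * np.log(q)
    c = -np.log(np.sum(np.exp(log_r)))   # c(t) >= 0 makes r(t) sum to one
    return np.exp(log_r + c)

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.3, 0.6])
for t in (0.0, 0.5, 1.0):
    print(t, m_geodesic(p, q, t), e_geodesic(p, q, t))
```

The m-geodesic is a straight line in the probabilities themselves, the e-geodesic a straight line in log-probabilities; both stay inside the simplex.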
![Page 11](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/11.jpg)
Mathematical structure of $S = \{p(x,\boldsymbol\xi)\}$
$$g_{ij}(\boldsymbol\xi) = E[\partial_i l\,\partial_j l],\qquad T_{ijk}(\boldsymbol\xi) = E[\partial_i l\,\partial_j l\,\partial_k l]$$
$$l = \log p(x,\boldsymbol\xi),\qquad \partial_i = \frac{\partial}{\partial\xi^i}$$
α-connection ($\{i,j;k\}$: the Levi-Civita connection of $g$):
$$\Gamma^{(\alpha)}_{ijk} = \{i,j;k\} - \frac{\alpha}{2}T_{ijk}$$
$\nabla^{(\alpha)} \leftrightarrow \nabla^{(-\alpha)}$: dually coupled,
$$X\langle Y,Z\rangle = \langle\nabla_X Y, Z\rangle + \langle Y, \nabla^*_X Z\rangle$$
Structure: $\{M, g, T\}$
![Page 12](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/12.jpg)
Divergence: $D[\mathbf z:\mathbf y]$
$D[\mathbf z:\mathbf y] \ge 0$
$D[\mathbf z:\mathbf y] = 0$ iff $\mathbf z = \mathbf y$
$D[\mathbf z:\mathbf z + d\mathbf z] = \frac12\sum g_{ij}\,dz^i dz^j$, with $(g_{ij})$ positive-definite
[Figure: points $Z$ and $Y$ on a manifold $M$.]
![Page 13](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/13.jpg)
Kullback-Leibler Divergence (quasi-distance)
$$D[p(x):q(x)] = \sum_x p(x)\log\frac{p(x)}{q(x)}$$
$D[p(x):q(x)] \ge 0$, $= 0$ iff $p(x) = q(x)$
$D[p:q] \ne D[q:p]$ (in general)
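A minimal sketch of the discrete KL divergence, assuming NumPy and strictly positive example vectors of our choosing. It vanishes for identical arguments but is not symmetric, which is why it is only a quasi-distance.

```python
import numpy as np

def kl(p, q):
    """D[p : q] = sum_x p(x) log(p(x) / q(x)) for strictly positive discrete p, q."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

p = [0.5, 0.4, 0.1]
q = [0.2, 0.3, 0.5]
print(kl(p, q), kl(q, p))  # both positive, but different
print(kl(p, p))            # zero
```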
![Page 14](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/14.jpg)
f-divergence of $\tilde S$:
$$D_f[\tilde{\mathbf p}:\tilde{\mathbf q}] = \sum_i \tilde p_i\, f\!\left(\frac{\tilde q_i}{\tilde p_i}\right) \ge 0,\qquad D_f[\tilde{\mathbf p}:\tilde{\mathbf q}] = 0 \Leftrightarrow \tilde{\mathbf p} = \tilde{\mathbf q}$$
Not invariant under $\tilde f(u) = f(u) - c(u-1)$: the change adds $-c\left(\sum_i\tilde q_i - \sum_i\tilde p_i\right)$, which vanishes only when both measures have the same total mass.
$$\tilde S = \{\tilde{\mathbf p}: \tilde p_i > 0\}\quad(\text{the normalization } \textstyle\sum_i \tilde p_i = 1 \text{ need not hold})$$
![Page 15](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/15.jpg)
α-divergence
$$D_\alpha[\tilde{\mathbf p}:\tilde{\mathbf q}] = \frac{4}{1-\alpha^2}\sum_i\left\{\frac{1-\alpha}{2}\tilde p_i + \frac{1+\alpha}{2}\tilde q_i - \tilde p_i^{\frac{1-\alpha}{2}}\,\tilde q_i^{\frac{1+\alpha}{2}}\right\}$$
KL-divergence (the limit $\alpha \to -1$):
$$D[\tilde{\mathbf p}:\tilde{\mathbf q}] = \sum_i\left\{\tilde p_i\log\frac{\tilde p_i}{\tilde q_i} + \tilde q_i - \tilde p_i\right\}$$
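The following sketch implements the α-divergence above for positive measures (no normalization required), assuming NumPy. The cases α = ±1 are handled by their KL limits; at α = 0 the formula reduces to 2Σ(√p̃ᵢ − √q̃ᵢ)², which the checks use. The example vectors are arbitrary.

```python
import numpy as np

def alpha_div(p, q, alpha):
    """alpha-divergence between positive measures p, q (not necessarily normalized).
    alpha = -1 and alpha = +1 use their KL limits."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    if abs(alpha + 1.0) < 1e-12:                      # KL[p : q] for positive measures
        return float(np.sum(p * np.log(p / q) + q - p))
    if abs(alpha - 1.0) < 1e-12:                      # KL[q : p] for positive measures
        return float(np.sum(q * np.log(q / p) + p - q))
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    return float(4 / (1 - alpha ** 2) * np.sum(a * p + b * q - p ** a * q ** b))

p = np.array([1.0, 2.0, 0.5])   # total mass 3.5
q = np.array([0.8, 2.5, 0.4])   # total mass 3.7
for alpha in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(alpha, alpha_div(p, q, alpha))
```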
![Page 16](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/16.jpg)
(α, β)-divergence
$$D_{\alpha,\beta}[\tilde{\mathbf p}:\tilde{\mathbf q}] = \frac{1}{\alpha\beta}\sum_i\left\{\frac{\alpha}{\alpha+\beta}\tilde p_i^{\alpha+\beta} + \frac{\beta}{\alpha+\beta}\tilde q_i^{\alpha+\beta} - \tilde p_i^{\alpha}\tilde q_i^{\beta}\right\}$$
$\beta = 1 - \alpha$: α-divergence; $\alpha = 1$: β-divergence
![Page 17](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/17.jpg)
Metric and Connections Induced by Divergence (Eguchi)
Riemannian metric:
$$g_{ij}(\mathbf z) = \partial_i\partial_j D[\mathbf z:\mathbf y]\big|_{\mathbf y=\mathbf z},\qquad D[\mathbf z:\mathbf z+d\mathbf z] = \frac12\sum g_{ij}\,dz^i dz^j$$
Affine connections $\{\nabla, \nabla^*\}$:
$$\Gamma_{ijk}(\mathbf z) = -\partial_i\partial_j\partial'_k D[\mathbf z:\mathbf y]\big|_{\mathbf y=\mathbf z},\qquad \Gamma^*_{ijk}(\mathbf z) = -\partial'_i\partial'_j\partial_k D[\mathbf z:\mathbf y]\big|_{\mathbf y=\mathbf z}$$
$$\partial_i = \frac{\partial}{\partial z^i},\qquad \partial'_i = \frac{\partial}{\partial y^i}$$
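Eguchi's relation can be checked on the Bernoulli family, whose Fisher information is $1/(z(1-z))$. A minimal sketch, assuming NumPy and a central finite difference with step h = 1e-4 (an arbitrary choice): differentiate the KL divergence $D[z:y]$ twice in $z$, set $y = z$, and also check $D[z:z+dz] \approx \frac12 g\,dz^2$.

```python
import numpy as np

def kl_bern(z, y):
    """D[z : y]: KL divergence from Bernoulli(z) to Bernoulli(y)."""
    return z * np.log(z / y) + (1 - z) * np.log((1 - z) / (1 - y))

def eguchi_metric(D, z, h=1e-4):
    """g(z) = d^2/dz^2 D[z : y] at y = z, by a central finite difference
    in the first argument with the second argument held at z."""
    return (D(z + h, z) - 2 * D(z, z) + D(z - h, z)) / h ** 2

z = 0.3
g = eguchi_metric(kl_bern, z)
print(g, 1 / (z * (1 - z)))          # finite difference vs Fisher information

dz = 1e-3
print(kl_bern(z, z + dz), 0.5 * g * dz ** 2)   # D[z : z + dz] ~ (1/2) g dz^2
```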
![Page 18](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/18.jpg)
Duality: $\{M, g, T\}$
$$\partial_k g_{ij} = \Gamma_{kij} + \Gamma^*_{kji},\qquad \Gamma^*_{ijk} = \Gamma_{ijk} - T_{ijk}$$
$$X\langle Y,Z\rangle = \langle\nabla_X Y, Z\rangle + \langle Y, \nabla^*_X Z\rangle$$
![Page 19](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/19.jpg)
Dually flat manifold: exponential family; mixture family; $\{p(x);\ x \text{ discrete}\}$
θ-coordinates: affine coordinates; flat; geodesics are straight lines in θ
η-coordinates: (dual) affine coordinates; flat; dual geodesics are straight lines in η
Canonical divergence $D(P:P')$: the KL divergence
Not Riemannian-flat (the Levi-Civita connection is in general curved)
$$p(x,\boldsymbol\theta) = \exp\left\{\sum\theta^i x_i - \psi(\boldsymbol\theta)\right\}:\ \text{exponential family}$$
![Page 20](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/20.jpg)
Dually flat manifold
Potential functions $\psi(\boldsymbol\theta)$, $\varphi(\boldsymbol\eta)$, related by the Legendre transformation:
$$\eta_i = \frac{\partial\psi(\boldsymbol\theta)}{\partial\theta^i},\qquad \theta^i = \frac{\partial\varphi(\boldsymbol\eta)}{\partial\eta_i}$$
$$\psi(\boldsymbol\theta) + \varphi(\boldsymbol\eta) - \sum\theta^i\eta_i = 0$$
$$g_{ij} = \frac{\partial^2\psi}{\partial\theta^i\partial\theta^j},\qquad g^{ij} = \frac{\partial^2\varphi}{\partial\eta_i\partial\eta_j}$$
Exponential family: $p(x,\boldsymbol\theta) = \exp\left\{\sum\theta^i x_i - \psi(\boldsymbol\theta)\right\}$
$\psi$: cumulant generating function; $\varphi$: negative entropy
Canonical divergence: $D(P:P') = \psi(\boldsymbol\theta) + \varphi(\boldsymbol\eta') - \sum\theta^i\eta'_i$
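The Bernoulli family $p(x,\theta) = \exp\{\theta x - \psi(\theta)\}$, $x \in \{0, 1\}$, is the simplest instance: $\psi(\theta) = \log(1 + e^\theta)$, $\eta = \psi'(\theta)$ is the success probability, and $\varphi$ is the negative entropy. A sketch, assuming NumPy and arbitrary parameter values, checking the Legendre relation, the metric $g = \psi''(\theta) = \eta(1-\eta)$, and the canonical divergence. Note the argument order: with $D(P:P') = \psi(\theta) + \varphi(\eta') - \theta\eta'$ one obtains $KL[P':P]$.

```python
import numpy as np

def psi(theta):        # cumulant generating function log(1 + e^theta)
    return np.log1p(np.exp(theta))

def phi(eta):          # negative entropy of Bernoulli(eta)
    return eta * np.log(eta) + (1 - eta) * np.log(1 - eta)

def eta_of(theta):     # eta = d psi / d theta (expectation parameter)
    return 1 / (1 + np.exp(-theta))

def theta_of(eta):     # theta = d phi / d eta (natural parameter)
    return np.log(eta / (1 - eta))

def canonical_div(theta, eta_prime):
    return psi(theta) + phi(eta_prime) - theta * eta_prime

def kl_bern(a, b):     # KL[Bernoulli(a) : Bernoulli(b)]
    return a * np.log(a / b) + (1 - a) * np.log((1 - a) / (1 - b))

theta, theta_p = 0.4, -1.1
eta, eta_p = eta_of(theta), eta_of(theta_p)
print(psi(theta) + phi(eta) - theta * eta)                 # ~0: Legendre relation
print(canonical_div(theta, eta_p), kl_bern(eta_p, eta))    # equal: KL[P' : P]
```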
![Page 21](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/21.jpg)
Manifold with Convex Function
$S$: coordinates $\boldsymbol\theta = (\theta^1, \theta^2, \ldots, \theta^n)$
$\psi(\boldsymbol\theta)$: convex function
Negative entropy: $\varphi(p) = \int p(x)\log p(x)\,dx$
Euclidean case: $\psi(\boldsymbol\theta) = \frac12\sum(\theta^i)^2$
![Page 22](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/22.jpg)
Riemannian metric and flatness (affine structure); Bregman divergence
$$D(\boldsymbol\theta,\boldsymbol\theta') = \psi(\boldsymbol\theta) - \psi(\boldsymbol\theta') - \operatorname{grad}\psi(\boldsymbol\theta')\cdot(\boldsymbol\theta - \boldsymbol\theta')$$
$$D(\boldsymbol\theta,\boldsymbol\theta + d\boldsymbol\theta) = \frac12\sum g_{ij}\,d\theta^i d\theta^j,\qquad g_{ij}(\boldsymbol\theta) = \partial_i\partial_j\psi(\boldsymbol\theta),\quad \partial_i = \frac{\partial}{\partial\theta^i}$$
Flatness (affine): geodesics are straight lines in $\boldsymbol\theta$ (not Levi-Civita geodesics)
Structure: $\{S, \psi(\boldsymbol\theta), \boldsymbol\theta\}$
![Page 23](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/23.jpg)
Legendre Transformation (summation over the repeated index $i$)
$\eta_i = \partial_i\psi(\boldsymbol\theta)$; $\boldsymbol\theta \leftrightarrow \boldsymbol\eta$ is one-to-one
$$\varphi(\boldsymbol\eta) + \psi(\boldsymbol\theta) - \theta^i\eta_i = 0,\qquad \theta^i = \partial^i\varphi(\boldsymbol\eta),\quad \partial^i = \frac{\partial}{\partial\eta_i}$$
$$\varphi(\boldsymbol\eta) = \max_{\boldsymbol\theta}\{\theta^i\eta_i - \psi(\boldsymbol\theta)\}$$
$$D(\boldsymbol\theta,\boldsymbol\theta') = \psi(\boldsymbol\theta) + \varphi(\boldsymbol\eta') - \boldsymbol\theta\cdot\boldsymbol\eta'$$
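A sketch of the Bregman divergence, assuming NumPy. It uses two convex potentials: $\psi = \frac12|\boldsymbol\theta|^2$, which gives half the squared Euclidean distance, and the log-partition function of a three-outcome categorical family (an illustrative choice, with $\theta^0$ fixed to 0), for which the Bregman form agrees with the Legendre form $\psi(\boldsymbol\theta) + \varphi(\boldsymbol\eta') - \boldsymbol\theta\cdot\boldsymbol\eta'$.

```python
import numpy as np

def bregman(psi, grad_psi, theta, theta_p):
    """D(theta, theta') = psi(theta) - psi(theta') - grad psi(theta') . (theta - theta')."""
    return psi(theta) - psi(theta_p) - grad_psi(theta_p) @ (theta - theta_p)

# Euclidean potential psi = 1/2 |theta|^2 (self-dual: eta = theta)
sq = lambda t: 0.5 * t @ t
sq_grad = lambda t: t

# Categorical family on {0, 1, 2}: psi(theta) = log(1 + e^theta1 + e^theta2)
def lse(t):
    return np.log1p(np.sum(np.exp(t)))

def lse_grad(t):                       # eta_i = p_i for i = 1, 2
    e = np.exp(t)
    return e / (1 + e.sum())

def neg_entropy(eta):                  # phi(eta) = sum_i p_i log p_i, p_0 = 1 - sum(eta)
    p = np.append(1 - eta.sum(), eta)
    return np.sum(p * np.log(p))

a, b = np.array([0.3, -0.5]), np.array([1.2, 0.4])
print(bregman(sq, sq_grad, a, b), 0.5 * np.sum((a - b) ** 2))
eta_b = lse_grad(b)
print(bregman(lse, lse_grad, a, b), lse(a) + neg_entropy(eta_b) - a @ eta_b)
```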
![Page 24](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/24.jpg)
Two affine coordinate systems $(\boldsymbol\theta, \boldsymbol\eta)$
$\boldsymbol\theta$: geodesic (e-geodesic)
$\boldsymbol\eta$: dual geodesic (m-geodesic)
"Dually orthogonal": $\langle\partial_i, \partial^j\rangle = \delta_i^j$, with $\partial_i = \frac{\partial}{\partial\theta^i}$, $\partial^j = \frac{\partial}{\partial\eta_j}$
$$X\langle Y,Z\rangle = \langle\nabla_X Y, Z\rangle + \langle Y, \nabla^*_X Z\rangle$$
![Page 25](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/25.jpg)
Pythagorean Theorem (dually flat manifold)
$$D[P:Q] + D[Q:R] = D[P:R]$$
This holds when the m-geodesic connecting $P$ and $Q$ is orthogonal at $Q$ to the e-geodesic connecting $Q$ and $R$ (with $D$ the KL divergence $D[p:q] = \sum p\log(p/q)$).
Euclidean space: self-dual, $\boldsymbol\theta = \boldsymbol\eta$, $\psi(\boldsymbol\theta) = \frac12\sum(\theta^i)^2$
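A numerical check of the Pythagorean relation, assuming NumPy and an arbitrary joint distribution P of two discrete variables. The independent distributions form an e-flat submanifold M; the product of P's marginals is the m-projection Q of P onto M, so the m-geodesic PQ is orthogonal to every e-geodesic QR inside M. A non-orthogonal choice Q2 breaks the identity.

```python
import numpy as np

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

# P: a joint distribution of (x1, x2) that is NOT independent
P = np.array([[0.30, 0.10, 0.05],
              [0.05, 0.20, 0.30]])
# Q: m-projection of P onto M = {independent distributions} (product of marginals)
Q = np.outer(P.sum(axis=1), P.sum(axis=0))
# R: any other member of M
R = np.outer([0.6, 0.4], [0.2, 0.3, 0.5])

lhs = kl(P, Q) + kl(Q, R)
rhs = kl(P, R)
print(lhs, rhs)   # equal up to rounding

# A point of M that is not the projection: the identity fails
Q2 = np.outer([0.5, 0.5], [1 / 3, 1 / 3, 1 / 3])
print(kl(P, Q2) + kl(Q2, R), rhs)
```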
![Page 26](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/26.jpg)
Projection Theorem
$\min_{Q\in M} D[P:Q]$: $Q$ = m-geodesic projection of $P$ to $M$
$\min_{Q\in M} D[Q:P]$: $Q'$ = e-geodesic projection of $P$ to $M$
![Page 27](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/27.jpg)
Two Types of Divergence
Invariant divergence (Chentsov, Csiszár): f-divergence, inducing the Fisher-α structure
$$D[p:q] = \int p(x)\,f\left\{\frac{q(x)}{p(x)}\right\}dx$$
Flat divergence (Bregman): derived from a convex function
The KL-divergence belongs to both classes: it is flat and invariant.
![Page 28](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/28.jpg)
$S = \{p\}$: space of probability distributions
Invariance: invariant divergence (f-divergence), Fisher information metric, alpha-connection
Dually flat space: convex functions, flat divergence (Bregman divergence)
In both classes: the KL-divergence
$$D[p:q] = \int p(x)\log\frac{p(x)}{q(x)}\,dx$$
![Page 29](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/29.jpg)
Space of positive measures (vectors, matrices, arrays):
$$\tilde S = \{\tilde{\mathbf p}: \tilde p_i > 0\}\quad(\textstyle\sum_i\tilde p_i = 1 \text{ need not hold})$$
f-divergence; α-divergence; Bregman divergence (on positive measures, the α-divergence belongs to both the f-divergence and the Bregman classes)
![Page 30](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/30.jpg)
Applications of Information Geometry
• Statistical Inference
• Machine Learning and AI
• Computer Vision
• Convex Programming
• Signal Processing (ICA; Sparse)
• Information Theory, Systems Theory
• Quantum Information Geometry
![Page 31](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/31.jpg)
Applications to Statistics: curved exponential family
$$p(x,u) = \exp\{\boldsymbol\theta(u)\cdot\mathbf x - \psi(\boldsymbol\theta(u))\}\quad\text{inside}\quad p(x,\boldsymbol\theta) = \exp\{\boldsymbol\theta\cdot\mathbf x - \psi(\boldsymbol\theta)\}$$
Observations $x_1, x_2, \ldots, x_n$ from $p(x,u)$:
$$\bar{\mathbf x} = \frac1n\sum_{k=1}^n \mathbf x_k,\qquad \hat{\boldsymbol\eta} = \bar{\mathbf x}$$
Estimator: $\hat u(x_1, \ldots, x_n)$
![Page 32](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/32.jpg)
$x$ discrete, $X = \{0, 1, \ldots, n\}$
$S_n = \{p(x) \mid x \in X\}$: exponential family
$$p(x) = \sum_{i=0}^n p_i\,\delta_i(x) = \exp\left[\sum_{i=1}^n\theta^i\delta_i(x) - \psi(\boldsymbol\theta)\right]$$
$$\theta^i = \log p_i - \log p_0,\qquad x_i = \delta_i(x),\qquad \psi(\boldsymbol\theta) = -\log p_0$$
($\delta_i(x) = 1$ if $x = i$, and $0$ otherwise)
Statistical model: $p(x,u)$
![Page 33](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/33.jpg)
High-Order Asymptotics
$p(x,u)$; observations $x_1, \ldots, x_n$; estimator $\hat u = \hat u(x_1, \ldots, x_n)$ (a function of $\hat{\boldsymbol\eta} = \bar{\mathbf x}$)
$$e = E\left[(\hat u - u)(\hat u - u)^T\right] = \frac1n G_1 + \frac1{n^2}G_2$$
$G_1 \ge G^{-1}$: Cramér-Rao (linear theory)
$$G_2 = \left(\Gamma^{(m)}\right)^2 + \left(H^{(e)}_M\right)^2 + \left(H^{(m)}_A\right)^2\quad\text{(quadratic approximation)}$$
![Page 34](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/34.jpg)
Information Geometry of Belief Propagation
• Shun-ichi Amari (RIKEN BSI)
• Shiro Ikeda (Inst. Statist. Math.)
• Toshiyuki Tanaka (Kyoto U.)
![Page 35](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/35.jpg)
Stochastic Reasoning
$p(x, y, z, r, s)$
$p(x, y, z \mid r, s)$, with $x, y, z, \ldots \in \{1, -1\}$
![Page 36](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/36.jpg)
Stochastic Reasoning: $q(x_1, x_2, x_3, \ldots \mid \text{observation})$
$\mathbf x = (x_1, x_2, x_3, \ldots)$, $x_i \in \{1, -1\}$
$\hat{\mathbf x} = \arg\max q(x_1, x_2, x_3, \ldots)$: maximum likelihood
$\hat x_i = \operatorname{sgn} E[x_i]$: least bit-error-rate estimator
![Page 37](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/37.jpg)
Mean Value Marginalization: projection to independent distributions
$$q_0(\mathbf x) = \Pi q(\mathbf x) = q_1(x_1)\,q_2(x_2)\cdots q_n(x_n)$$
$$q_i(x_i) = \int q(x_1, \ldots, x_n)\,dx_1\cdots dx_{i-1}\,dx_{i+1}\cdots dx_n$$
$$\boldsymbol\eta = E_q[\mathbf x] = E_{q_0}[\mathbf x]$$
![Page 38](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/38.jpg)
$$q(\mathbf x) = \exp\left\{\mathbf k\cdot\mathbf x + \sum_{r=1}^L c_r(\mathbf x) - \psi_q\right\},\qquad x_i \in \{1, -1\}$$
$$c_r(\mathbf x) = c_{i_1\cdots i_s}\,x_{i_1}\cdots x_{i_s}$$
Pairwise example, $r = (i,j)$: $q(\mathbf x) = \exp\left\{\sum w_{ij}x_ix_j + \sum h_ix_i - \psi_q\right\}$
Boltzmann machine, spin glass, neural networks; Turbo codes, LDPC codes
![Page 39](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/39.jpg)
Computationally Difficult
$$q(\mathbf x) = \exp\left\{\sum_r c_r(\mathbf x) - \psi_q\right\}\ \longrightarrow\ \boldsymbol\eta = E_q[\mathbf x]$$
Approximations: mean-field approximation; belief propagation; tree propagation, CCCP (convex-concave)
![Page 40](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/40.jpg)
Information Geometry of Mean Field Approximation
$$D[q:p] = \sum_{\mathbf x} q(\mathbf x)\log\frac{q(\mathbf x)}{p(\mathbf x)}$$
$M_0 = \{\prod_i p_i(x_i)\}$, $p(\mathbf x) \in M_0$
• m-projection: $\Pi^m_0 q = \arg\min_{p\in M_0} D[q:p]$ (keeps the exact means $E_q[\mathbf x]$)
• e-projection: $\Pi^e_0 q = \arg\min_{p\in M_0} D[p:q]$ (the mean-field approximation)
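A sketch contrasting the two projections on a small pairwise model, assuming NumPy and hypothetical couplings W and fields h, with the model written as $q(\mathbf x) \propto \exp(\frac12\mathbf x^TW\mathbf x + \mathbf h^T\mathbf x)$ (the ½ with a symmetric, zero-diagonal W counts each pair once). The exact means $E_q[\mathbf x]$, which the m-projection (exact marginalization) preserves, are computed by enumerating all $2^n$ states, feasible only for small n. The naive mean-field equations $m_i = \tanh(h_i + \sum_j W_{ij}m_j)$ are the stationarity conditions of the e-projection for this model.

```python
import itertools
import numpy as np

# Hypothetical model: q(x) ∝ exp(1/2 x^T W x + h^T x), x_i in {-1, +1}
W = np.array([[0.0, 0.10, -0.05],
              [0.10, 0.0, 0.08],
              [-0.05, 0.08, 0.0]])   # symmetric, zero diagonal, weak couplings
h = np.array([0.3, -0.2, 0.1])

def exact_means(W, h):
    """eta = E_q[x] by brute-force enumeration of all 2^n states."""
    states = np.array(list(itertools.product([-1.0, 1.0], repeat=len(h))))
    logw = 0.5 * np.einsum("si,ij,sj->s", states, W, states) + states @ h
    w = np.exp(logw - logw.max())
    return (w / w.sum()) @ states

def mean_field(W, h, iters=200):
    """Naive mean-field fixed point m_i = tanh(h_i + sum_j W_ij m_j),
    found by sequential (coordinate-wise) updates."""
    m = np.zeros_like(h)
    for _ in range(iters):
        for i in range(len(h)):
            m[i] = np.tanh(h[i] + W[i] @ m)
    return m

eta = exact_means(W, h)
m = mean_field(W, h)
print(eta, m)   # close for weak couplings, but not identical
```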
![Page 41](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/41.jpg)
Information Geometry
$$M_0 = \left\{p_0(\mathbf x,\boldsymbol\theta) = \exp\{\boldsymbol\theta\cdot\mathbf x - \psi_0(\boldsymbol\theta)\}\right\}$$
$$M_r = \left\{p_r(\mathbf x,\boldsymbol\xi_r) = \exp\{c_r(\mathbf x) + \boldsymbol\xi_r\cdot\mathbf x - \psi_r(\boldsymbol\xi_r)\}\right\},\qquad r = 1, \ldots, L$$
$$q(\mathbf x) = \exp\left\{\sum_r c_r(\mathbf x) - \psi_q\right\}$$
[Figure: $q(\mathbf x)$, the submanifolds $M_r$ and $M_{r'}$, and $M_0$ with coordinate $\boldsymbol\theta$.]
![Page 42](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/42.jpg)
Belief Propagation
$$M_r:\ p_r(\mathbf x,\boldsymbol\xi_r) = \exp\{c_r(\mathbf x) + \boldsymbol\xi_r\cdot\mathbf x - \psi_r\}$$
$\boldsymbol\theta_r^{t+1} = \Pi_0\,p_r(\mathbf x,\boldsymbol\xi_r^t) - \boldsymbol\xi_r^t$: belief for $c_r(\mathbf x)$
$\boldsymbol\xi_r^{t+1} = \sum_{r'\ne r}\boldsymbol\theta_{r'}^{t+1}$
$\boldsymbol\theta^{t+1} = \sum_r\boldsymbol\theta_r^{t+1}$
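The slides state BP in geometric form. For comparison, the sketch below runs conventional sum-product message passing (a swapped-in formulation, not the θ/ξ updates above) on a hypothetical three-node chain with spins in {−1, 1}, assuming NumPy. On a tree such as this chain BP is exact, so its means agree with brute-force enumeration; on graphs with loops it is only approximate.

```python
import itertools
import numpy as np

vals = np.array([-1.0, 1.0])
h = np.array([0.2, -0.4, 0.7])   # hypothetical fields
J = np.array([0.5, -0.8])        # hypothetical couplings on edges (0,1) and (1,2)

def brute_force_means(h, J):
    """E_q[x] for q(x) ∝ exp(h.x + J0 x0 x1 + J1 x1 x2), by enumeration."""
    states = np.array(list(itertools.product(vals, repeat=3)))
    logw = (states @ h + J[0] * states[:, 0] * states[:, 1]
            + J[1] * states[:, 1] * states[:, 2])
    w = np.exp(logw)
    return (w / w.sum()) @ states

def chain_bp_means(h, J):
    """Sum-product messages on the chain 0 - 1 - 2."""
    unary = [np.exp(h[i] * vals) for i in range(3)]
    pair = [np.exp(J[k] * np.outer(vals, vals)) for k in range(2)]  # pair[k][a, b]
    m01 = pair[0].T @ unary[0]                 # message 0 -> 1
    m12 = pair[1].T @ (unary[1] * m01)         # message 1 -> 2
    m21 = pair[1] @ unary[2]                   # message 2 -> 1
    m10 = pair[0] @ (unary[1] * m21)           # message 1 -> 0
    beliefs = [unary[0] * m10, unary[1] * m01 * m21, unary[2] * m12]
    return np.array([b @ vals / b.sum() for b in beliefs])

print(chain_bp_means(h, J), brute_force_means(h, J))
```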
![Page 43](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/43.jpg)
Belief Prop Algorithm
[Figure: points $\zeta_r$ on $M_r$ and $\zeta_{r'}$ on $M_{r'}$, each projected ($\Pi\zeta_r$) onto $M_0$.]
![Page 44](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/44.jpg)
Equilibrium of BP: $(\boldsymbol\theta^*, \boldsymbol\xi_r^*)$
1) m-condition: $\boldsymbol\theta^* = \Pi_0\,p_r(\mathbf x,\boldsymbol\xi_r^*)$ for every $r$; the $p_r(\mathbf x,\boldsymbol\xi_r^*)$ lie in the m-flat submanifold $M(\boldsymbol\theta^*)$
2) e-condition: $\boldsymbol\theta^* = \frac{1}{L-1}\sum_r\boldsymbol\xi_r^*$; $q(\mathbf x)$ lies in an e-flat submanifold
[Figure: $M_0$, $M_r$, $M_{r'}$, and the m-flat and e-flat submanifolds through the equilibrium.]
![Page 45](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/45.jpg)
Free energy:
$$F(\boldsymbol\theta, \boldsymbol\zeta_1, \ldots, \boldsymbol\zeta_L) = D[p_0:q] - \sum_r D[p_0:p_r]$$
Critical point: $\frac{\partial F}{\partial\boldsymbol\theta} = 0$: e-condition; $\frac{\partial F}{\partial\boldsymbol\zeta_r} = 0$: m-condition
$F$ is not convex.
![Page 46](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/46.jpg)
Belief Propagation: e-condition OK
$$(\boldsymbol\xi_1, \ldots, \boldsymbol\xi_L) \to (\boldsymbol\xi'_1, \ldots, \boldsymbol\xi'_L),\qquad \boldsymbol\theta' = \frac{1}{L-1}\sum_r\boldsymbol\xi'_r$$
CCCP: m-condition OK
$$\boldsymbol\theta' \to \left(\boldsymbol\xi_1(\boldsymbol\theta'), \ldots, \boldsymbol\xi_L(\boldsymbol\theta')\right),\qquad \boldsymbol\theta' = \Pi_0\,p_r\left(\mathbf x,\boldsymbol\xi_r(\boldsymbol\theta')\right)$$
![Page 47](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/47.jpg)
$$\Pi_0\,p_r(\mathbf x,\boldsymbol\xi_r^{t+1}) = \boldsymbol\theta^{t+1},\qquad \boldsymbol\theta^{t+1} = L\boldsymbol\theta^t - \sum_r\boldsymbol\xi_r^{t+1}$$
![Page 48](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/48.jpg)
Convex-Concave Computational Procedure (CCCP), Yuille
$$F(\boldsymbol\theta) = F_1(\boldsymbol\theta) - F_2(\boldsymbol\theta)\quad(F_1, F_2 \text{ convex})$$
$$\nabla F_1(\boldsymbol\theta^{t+1}) = \nabla F_2(\boldsymbol\theta^t)$$
Elimination of double loops
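A one-dimensional toy illustration of the CCCP update, assuming NumPy and the hypothetical split $F_1(\theta) = \theta^4/4$, $F_2(\theta) = \theta^2$ (both convex). The step $\nabla F_1(\theta^{t+1}) = \nabla F_2(\theta^t)$ becomes $(\theta^{t+1})^3 = 2\theta^t$, solvable in closed form, and $F = F_1 - F_2$ never increases along the iterates.

```python
import numpy as np

def F(theta):
    """F = F1 - F2 with F1 = theta^4 / 4 and F2 = theta^2."""
    return theta ** 4 / 4 - theta ** 2

theta = 0.5
history = [F(theta)]
for _ in range(60):
    theta = np.cbrt(2 * theta)   # solve grad F1(theta_next) = grad F2(theta)
    history.append(F(theta))

print(theta, np.sqrt(2))         # converges to the local minimum sqrt(2)
```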
![Page 49](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/49.jpg)
Boltzmann Machine
$$p(x_i = 1) = \varphi\left(\sum_j w_{ij}x_j - h_i\right)$$
$$p(\mathbf x) = \exp\left\{\sum w_{ij}x_ix_j - \sum h_ix_i - \psi(w)\right\}$$
[Figure: units $x_1, \ldots, x_4$; target distribution $q(\mathbf x)$ and Boltzmann machine distribution $p(\mathbf x)$.]
![Page 50](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/50.jpg)
Boltzmann machine with hidden units
• EM algorithm
• e-projection
• m-projection
[Figure: data manifold $D$ and model manifold $M$.]
![Page 51](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/51.jpg)
EM algorithm
Hidden variables: $p(\mathbf x, \mathbf y; u)$; data $D = \{\mathbf x_1, \ldots, \mathbf x_N\}$
$M = \{p(\mathbf x, \mathbf y; u)\}$
$D = \{\hat p(\mathbf x, \mathbf y) : \hat p(\mathbf x) = p_D(\mathbf x)\}$ (observed marginal fixed to the empirical distribution)
m-projection to $M$: $\min_{p\in M} KL[\hat p(\mathbf x,\mathbf y) : p]$
e-projection to $D$: $\min_{\hat p\in D} KL[\hat p(\mathbf x,\mathbf y) : p(\mathbf x,\mathbf y;u)]$
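A sketch of EM as alternating projections, assuming NumPy and a hypothetical two-component Gaussian mixture with unit variances, where x is observed and the component label y is hidden. The E-step keeps the empirical x-marginal and sets the conditional of y given x to the model posterior (the e-projection to D); the M-step fits the weights and means by maximum likelihood on the completed data (the m-projection to M). The log-likelihood never decreases.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: two unit-variance clusters; the cluster label y is hidden
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 200)])

def log_lik(x, w, mu):
    comp = w * np.exp(-0.5 * (x[:, None] - mu) ** 2) / np.sqrt(2 * np.pi)
    return float(np.sum(np.log(comp.sum(axis=1))))

w, mu = np.array([0.5, 0.5]), np.array([-0.5, 0.5])   # initial u = (w, mu)
trace = [log_lik(x, w, mu)]
for _ in range(50):
    # E-step (e-projection to D): posterior of y given each observed x
    comp = w * np.exp(-0.5 * (x[:, None] - mu) ** 2)
    resp = comp / comp.sum(axis=1, keepdims=True)
    # M-step (m-projection to M): maximum likelihood on the completed data
    w = resp.mean(axis=0)
    mu = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)
    trace.append(log_lik(x, w, mu))

print(w, mu)
```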
![Page 52](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/52.jpg)
SVM: support vector machine
Embedding: $z_i = \phi_i(x)$
$$f(x) = \sum_i w_i\,\phi_i(x) = \sum_i \alpha_i y_i K(x_i, x)$$
Kernel: $K(x, x') = \sum_i \phi_i(x)\,\phi_i(x')$
Conformal change of kernel:
$$K(x, x') \longrightarrow \rho(x)\,\rho(x')\,K(x, x'),\qquad \rho(x) = \exp\{-\kappa|f(x)|^2\}$$
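A sketch of the conformal change of a Gaussian kernel, assuming NumPy. The stand-in decision function f(x) below is hypothetical; in practice it would come from a first SVM run. Since $\rho(x) = \exp\{-\kappa|f(x)|^2\}$ is largest where $f(x) \approx 0$, the kernel is magnified near the separating boundary relative to elsewhere, and the new Gram matrix stays positive semi-definite.

```python
import numpy as np

def rbf(X, Y, gamma=0.5):
    """Gaussian kernel K(x, x') = exp(-gamma |x - x'|^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def conformal(K, f_vals, kappa=1.0):
    """K~(x, x') = rho(x) rho(x') K(x, x') with rho(x) = exp(-kappa |f(x)|^2)."""
    rho = np.exp(-kappa * f_vals ** 2)
    return rho[:, None] * K * rho[None, :]

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 2))
f_vals = X[:, 0] - 0.5 * X[:, 1]        # hypothetical stand-in for f(x)
K_new = conformal(rbf(X, X), f_vals)
print(np.linalg.eigvalsh(K_new).min())  # non-negative up to rounding
```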
![Page 53](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/53.jpg)
Signal Processing
ICA (Independent Component Analysis): $\mathbf x_t = A\mathbf s_t$; recover $\mathbf s_t$ from $\mathbf x_t$
sparse component analysis
positive matrix factorization
![Page 54](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/54.jpg)
Mixture and unmixture of independent signals
$$x_i = \sum_{j=1}^n A_{ij}s_j,\qquad \mathbf x = A\mathbf s$$
[Figure: sources $s_1, \ldots, s_n$ mixed into observations $x_1, \ldots, x_m$.]
![Page 55](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/55.jpg)
Independent Component Analysis
$$\mathbf x = A\mathbf s,\quad x_i = \sum_j A_{ij}s_j;\qquad \mathbf y = W\mathbf x,\quad W = A^{-1}\ \Rightarrow\ \mathbf y = \mathbf s$$
observations: $\mathbf x(1), \mathbf x(2), \ldots, \mathbf x(t)$
recover: $\mathbf s(1), \mathbf s(2), \ldots, \mathbf s(t)$
![Page 56](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/56.jpg)
![Page 57](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/57.jpg)
Space of Matrices: Lie group
$$dX = dW\,W^{-1}\quad\text{(non-holonomic basis: } I \to I + dX \text{ corresponds to } W \to W + dW\text{)}$$
$$ds^2 = \operatorname{tr}(dX\,dX^T) = \operatorname{tr}(dW\,W^{-1}W^{-T}dW^T)$$
$$\tilde\nabla l = \frac{\partial l}{\partial W}W^TW$$
![Page 58](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/58.jpg)
Information Geometry of ICA
natural gradient; estimating function; stability, efficiency
$S = \{p(\mathbf y)\}$; $I = \{q_1(y_1)\,q_2(y_2)\cdots q_n(y_n)\}$; $\{p(W\mathbf x)\}$
$$l(W) = KL[p(\mathbf y;W) : r(\mathbf y)],\qquad r \in I$$
![Page 59](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/59.jpg)
Semiparametric Statistical Model
$$p(\mathbf x; W, r) = |W|\,r(W\mathbf x),\qquad W = A^{-1}$$
$r = \prod_i r_i$: unknown source density $r(\mathbf s)$
data: $\mathbf x(1), \mathbf x(2), \ldots, \mathbf x(t)$
![Page 60](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/60.jpg)
Natural Gradient
$$\Delta W = -\eta\,\frac{\partial l(\mathbf y, W)}{\partial W}\,W^TW$$
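A sketch of the natural-gradient rule for ICA, assuming NumPy, hypothetical Laplacian (super-Gaussian) sources, and a made-up 2×2 mixing matrix. Taking $l = -\log|\det W| - \sum_i\log q_i(y_i)$ with $\varphi_i = -(\log q_i)'$ gives $\frac{\partial l}{\partial W}W^TW = (\varphi(\mathbf y)\mathbf y^T - I)W$, so the update is $\Delta W = \eta(I - \varphi(\mathbf y)\mathbf y^T)W$; here $\varphi = \tanh$ is an assumption, and a batch average replaces the expectation. Sources are recovered only up to permutation and scaling, so the check looks at whether $WA$ is close to a scaled permutation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 20_000
S = rng.laplace(size=(2, n_samples))    # hypothetical super-Gaussian sources s(t)
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])              # hypothetical mixing matrix
X = A @ S                               # observations x(t) = A s(t)

W = np.eye(2)
eta = 0.1                               # learning rate
for _ in range(500):
    Y = W @ X
    # batch natural-gradient step: Delta W = eta (I - E[phi(y) y^T]) W, phi = tanh
    W += eta * (np.eye(2) - np.tanh(Y) @ Y.T / n_samples) @ W

P = W @ A                               # approaches a scaled permutation matrix
print(np.round(P, 3))
```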
![Page 61](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/61.jpg)
Basis Given: overcomplete case; Sparse Solution
$$\mathbf x = A\mathbf s = \sum_i s_i\mathbf a_i$$
Many solutions; a sparse one has many $s_i = 0$.
$\mathbf x_t = A\mathbf s_t \to \mathbf s_t$; estimate $\hat A$ and a sparse $\hat{\mathbf s}$ with $\mathbf x = \hat A\hat{\mathbf s}$
![Page 62](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/62.jpg)
Subject to $\mathbf x = \hat A\hat{\mathbf s}$:
generalized inverse: $\min\sum_i\hat s_i^2$ ($L_2$-norm)
sparse solution: $\min\sum_i|\hat s_i|$ ($L_1$-norm)
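A small sketch, assuming NumPy and a hypothetical 2×3 overcomplete basis: the generalized inverse (Moore-Penrose pseudoinverse) gives the minimum-L₂-norm solution, which is dense, while another exact solution using a single atom has a smaller L₁ norm.

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])         # hypothetical basis: 3 atoms a_i in R^2
x = np.array([1.0, 1.0])

s_l2 = np.linalg.pinv(A) @ x            # generalized inverse: minimum L2 norm
s_sparse = np.array([0.0, 0.0, 1.0])    # another exact solution, one atom only

print(s_l2, np.abs(s_l2).sum())         # dense: every coefficient non-zero
print(s_sparse, np.abs(s_sparse).sum()) # sparse: smaller L1 norm
```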
![Page 63](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/63.jpg)
![Page 64](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/64.jpg)
Overcomplete Basis and Sparse Solution
$$\mathbf x = A\mathbf s = \sum_i s_i\mathbf a_i$$
$$\min\|\mathbf s\|_1\quad\text{subject to } A\mathbf s = \mathbf x$$
$$\min_{\mathbf s}\ \|\mathbf x' - A\mathbf s\|^2 + \alpha\|\mathbf s\|_p^p\quad\text{(noisy observation } \mathbf x'\text{): non-linear denoising}$$
![Page 65](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/65.jpg)
Sparse Solution: overcomplete case
$$\min\varphi(\boldsymbol\beta),\qquad \text{penalty } F_p(\boldsymbol\beta) = \sum_i|\beta_i|^p\ \text{(Bayes prior)}$$
$F_0(\boldsymbol\beta) = \#[\beta_i \ne 0]$: sparsest solution
$F_1(\boldsymbol\beta) = \sum_i|\beta_i|$: $L_1$ solution
$F_p(\boldsymbol\beta)$, $0 \le p \le 1$: sparse solution
$F_2(\boldsymbol\beta) = \sum_i|\beta_i|^2$: generalized inverse solution
![Page 66](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/66.jpg)
Optimization under Sparsity Condition
$\min\varphi(\boldsymbol\beta)$ ($\varphi$: convex function), subject to the constraint $F(\boldsymbol\beta) \le c$
Typical case (with $G = X^TX$ and $\boldsymbol\beta^*$ the least-squares solution):
$$\varphi(\boldsymbol\beta) = \frac12\|\mathbf y - X\boldsymbol\beta\|^2 = \frac12(\boldsymbol\beta - \boldsymbol\beta^*)^TG(\boldsymbol\beta - \boldsymbol\beta^*) + \text{const},\qquad F_p(\boldsymbol\beta) = \frac1p\sum_i|\beta_i|^p$$
$p = 2,\ 1,\ 1/2$
![Page 67](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/67.jpg)
L1-constrained optimization: LASSO, LARS
Problem $P_c$: $\min\varphi(\boldsymbol\beta)$ under $F(\boldsymbol\beta) \le c$; solution $\boldsymbol\beta^*_c$: $\boldsymbol\beta^*_c = 0$ at $c = 0$, $\boldsymbol\beta^*_c \to \boldsymbol\beta^*$ as $c \to \infty$
Problem $P_\lambda$: $\min\varphi(\boldsymbol\beta) + \lambda F(\boldsymbol\beta)$; solution $\boldsymbol\beta^*_\lambda$: $\boldsymbol\beta^*_\lambda = \boldsymbol\beta^*$ at $\lambda = 0$, $\boldsymbol\beta^*_\lambda \to 0$ as $\lambda \to \infty$
$p \ge 1$: the solutions $\boldsymbol\beta^*_c$ and $\boldsymbol\beta^*_\lambda$ coincide under a correspondence $\lambda = \lambda(c)$
$p < 1$: $\lambda = \lambda(c)$ is multiple-valued and non-continuous; the stability of the two problems differs
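A sketch of problem $P_\lambda$ with $p = 1$ (LASSO), assuming NumPy. It uses proximal gradient descent (ISTA, i.e. soft-thresholding), a swapped-in solver rather than LARS. With orthonormal columns in X (a hypothetical design chosen so the answer is known in closed form), the solution is the soft-thresholded least-squares estimate: λ = 0 returns β*, and a large λ returns 0.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, iters=1000):
    """Proximal gradient (ISTA) for min_beta 1/2 |y - X beta|^2 + lam * sum_i |beta_i|."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ beta - y)
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

rng = np.random.default_rng(0)
X, _ = np.linalg.qr(rng.normal(size=(8, 4)))  # orthonormal columns
y = rng.normal(size=8)
for lam in (0.0, 0.3, 10.0):
    print(lam, np.round(lasso_ista(X, y, lam), 4))
```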
![Page 68](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/68.jpg)
[Figure: projection from $\boldsymbol\beta^*$ to the surface $F = c$ (information geometry).]
![Page 69](https://reader033.fdocuments.in/reader033/viewer/2022053014/5f110e1de5a4ab5b9255a764/html5/thumbnails/69.jpg)
Convex Cone Programming
$P$: positive semi-definite matrix
convex potential function
dual geodesic approach
$$A\mathbf x = \mathbf b,\qquad \min\ \mathbf c\cdot\mathbf x$$
Support vector machine
Fig. 1: constraint region $R_c$ for $n = 2$: (a) $p > 1$; (b) $p = 1$; (c) $p < 1$ (non-convex)
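Fig. 1 claims that $R_c$ is non-convex for $p < 1$. A quick midpoint test in Python makes this concrete, assuming $n = 2$ and $c = 1$. The $1/p$ factor in $F_p$ only rescales $c$, so it is dropped here. The points $(1, 0)$ and $(0, 1)$ lie on the boundary for every $p$, but their midpoint leaves the region when $p = 1/2$. This only exhibits non-convexity for $p < 1$. It does not prove convexity for $p \ge 1$.

```python
import numpy as np

def in_region(beta, p, c=1.0):
    """Membership test for R_c = {beta : sum_i |beta_i|^p <= c} (small tolerance)."""
    return np.sum(np.abs(beta) ** p) <= c + 1e-12

a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])   # on the boundary for every p
mid = 0.5 * (a + b)
results = {p: bool(in_region(mid, p)) for p in (2.0, 1.0, 0.5)}
```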
$\varphi(\boldsymbol\beta) = D[\boldsymbol\beta^* : \boldsymbol\beta]$: $\min \varphi(\boldsymbol\beta)$ under $F(\boldsymbol\beta) = c$ is the orthogonal projection of $\boldsymbol\beta^*$ to $F(\boldsymbol\beta) = c$ along a dual geodesic (dual projection)
$\boldsymbol\eta^* - \boldsymbol\eta^*_c \propto \nabla F(\boldsymbol\beta^*_c)$
normal vector $\mathbf n = \nabla F$
Fig. 5: subgradient
$\boldsymbol\eta^* - \boldsymbol\eta^*_c \propto \nabla F(\boldsymbol\beta^*_c)$, where $\nabla F$ is a subgradient at non-smooth points
LASSO path and LARS path (stagewise solution)
$\min \varphi(\boldsymbol\beta)$ : $F(\boldsymbol\beta) = c$
$\min \varphi(\boldsymbol\beta) + \lambda F(\boldsymbol\beta)$
$\boldsymbol\beta^*_c \Leftrightarrow \boldsymbol\beta^*_\lambda$ : correspondence $c \leftrightarrow \lambda$
Active set and gradient
$A(\boldsymbol\beta) = \{i \mid \beta_i \neq 0\}$
$\nabla_i F_p(\boldsymbol\beta) = \begin{cases} \operatorname{sgn}(\beta_i)\,|\beta_i|^{p-1}, & i \in A \\ (-\infty, \infty), & i \notin A,\ p < 1 \\ [-1, 1], & i \notin A,\ p = 1 \end{cases}$
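The case split above can be written as a small Python function that returns the (sub)gradient of $F_p = \frac{1}{p}\sum_i |\beta_i|^p$ in one coordinate as an interval `(lo, hi)`. The `p > 1` branch is an addition for completeness and does not appear on the slide: there $F_p$ is differentiable at 0 with derivative 0.

```python
import numpy as np

def subgradient_component(beta_i, p):
    """Interval (lo, hi) of the i-th (sub)gradient of F_p = (1/p) sum_i |beta_i|^p."""
    if beta_i != 0:
        g = np.sign(beta_i) * abs(beta_i) ** (p - 1)
        return (g, g)                    # i in the active set A: ordinary derivative
    if p == 1:
        return (-1.0, 1.0)               # i not in A, p = 1: subdifferential [-1, 1]
    if p < 1:
        return (-np.inf, np.inf)         # i not in A, p < 1: unbounded
    return (0.0, 0.0)                    # p > 1 (not on the slide): derivative 0 at 0
```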
Solution path
$\nabla\varphi(\boldsymbol\beta^*_c) + \lambda_c \nabla F(\boldsymbol\beta^*_c) = 0$
$\nabla_A\nabla_A\varphi(\boldsymbol\beta^*_c)\,\dot{\boldsymbol\beta}^*_c + \lambda_c \nabla_A\nabla_A F(\boldsymbol\beta^*_c)\,\dot{\boldsymbol\beta}^*_c = -\dot\lambda_c \nabla_A F(\boldsymbol\beta^*_c)$
$\dot{\boldsymbol\beta}^*_c = -\dot\lambda_c K_c^{-1}\nabla_A F(\boldsymbol\beta^*_c)$ (the dot denotes $d/dc$)
$K_c = G + \lambda_c \nabla\nabla F(\boldsymbol\beta^*_c)$
$L_1$: $\nabla\nabla F_1 = 0$; $\nabla F_1 = (\operatorname{sgn}\beta_i)$
Solution path in the subspace of the active set
$\nabla_A\varphi(\boldsymbol\beta^*_\lambda) + \lambda\nabla_A F(\boldsymbol\beta^*_\lambda) = 0$ : active direction
$\dot{\boldsymbol\beta}^*_\lambda = -K_\lambda^{-1}\nabla_A F(\boldsymbol\beta^*_\lambda)$ (the dot denotes $d/d\lambda$)
turning point: $A \to A'$
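For $p = 1$ with the quadratic $\varphi$, $\nabla\nabla F_1 = 0$, so $K_\lambda = G$. Along a segment with a fixed active set and fixed signs $\mathbf s$, the path is therefore linear: $\boldsymbol\beta^*_\lambda = \boldsymbol\beta^* - \lambda G^{-1}\mathbf s$.

The Python sketch below uses a hypothetical 2-D `G` and `beta_star` in which every coordinate starts out active. It checks the stationarity condition and locates the turning point where one coordinate reaches zero ($A \to A'$). It is one linear piece of the path, not a full LASSO/LARS solver.

```python
import numpy as np

G = np.array([[2.0, 0.5], [0.5, 1.0]])   # hypothetical positive-definite G
beta_star = np.array([1.0, 0.3])         # hypothetical unconstrained minimiser
s = np.sign(beta_star)                   # sign pattern on the active set A = {1, 2}
direction = -np.linalg.solve(G, s)       # d beta*_lambda / d lambda = -K^{-1} grad F_1, K = G

def beta_on_segment(lam):
    # valid only while the signs of beta*_lambda stay equal to s
    return beta_star + lam * direction

# Turning point: the first lambda at which an active coordinate reaches zero.
# (direction has no zero entries in this example, so the division is safe.)
ratios = np.where(direction * s < 0, -beta_star / direction, np.inf)
lam_turn = ratios.min()
```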
Gradient Descent Method
$\nabla L = \left\{\dfrac{\partial L(\mathbf x)}{\partial x_i}\right\}$ : covariant
$\tilde\nabla L = \left\{\sum_j g^{ij}\dfrac{\partial L(\mathbf x)}{\partial x_j}\right\}$ : contravariant
$\min L(\mathbf x + \mathbf a)$ under $\sum_{i,j} g_{ij}\, a^i a^j = \varepsilon^2$
$\mathbf x_{t+1} = \mathbf x_t - c\,\tilde\nabla L(\mathbf x_t)$
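A minimal Python sketch contrasting one contravariant (natural) gradient step with an ordinary gradient step on a hypothetical, badly conditioned quadratic loss. The metric $g_{ij}$ is assumed equal to the Hessian `H`. With $c = 1$ the natural step therefore lands on the minimiser in one move, while the ordinary step with a stable step size barely moves along the flat direction.

```python
import numpy as np

def natural_gradient_step(x, grad, metric, c):
    """x_{t+1} = x_t - c * G(x_t)^{-1} grad L(x_t)  (contravariant gradient)."""
    return x - c * np.linalg.solve(metric(x), grad(x))

# Hypothetical loss L(x) = 1/2 (x - m)^T H (x - m)
H = np.array([[100.0, 0.0], [0.0, 1.0]])
m = np.array([1.0, -2.0])
grad = lambda x: H @ (x - m)             # covariant gradient
metric = lambda x: H                     # assumption: g_ij taken equal to the Hessian

x0 = np.zeros(2)
x_nat = natural_gradient_step(x0, grad, metric, c=1.0)
x_plain = x0 - 0.01 * grad(x0)           # ordinary gradient step
```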
Extended LARS ($p = 1$) and Minkowskian gradient
$L_p$ norm: $\|\mathbf a\|_p = \left(\sum_i |a_i|^p\right)^{1/p}$
$\max_{\mathbf a} \psi(\boldsymbol\beta + \varepsilon\mathbf a)$ under $\|\mathbf a\|_p = 1$ (Lagrangian form: $\psi(\boldsymbol\beta + \varepsilon\mathbf a) - \lambda\|\mathbf a\|_p$)
$L_1$ ($p = 1$): $\tilde\nabla\psi = \mathbf a$ with $a_i = \operatorname{sgn}(\eta_i)$ if $|\eta_i| = \max\{|\eta_1|, \dots, |\eta_N|\}$ and $a_i = 0$ otherwise, where $\boldsymbol\eta = \nabla\psi(\boldsymbol\beta)$
$i^* = \arg\max_i |f_i|$
$|f_{i^*}| = |f_{j^*}| = \max_i |f_i|$
$(\tilde\nabla F)_i = 1$ for $i = i^*$ and $i = j^*$, and $0$ otherwise
$\boldsymbol\beta_{t+1} = \boldsymbol\beta_t - \eta\,\tilde\nabla F$ : LARS
$\mathbf f = \nabla F$ (ordinary gradient), $\tilde\nabla F$ (Minkowskian gradient)
$(\tilde\nabla F)_i = c\,\operatorname{sgn}(f_i)\,|f_i|^{1/(p-1)}$
$p \to 1$: $\tilde\nabla F = c\,(0, \dots, 0, \operatorname{sgn}(f_{i^*}), 0, \dots, 0)^\top$
Euclidean case: $p = 2$, where $\tilde\nabla F = c\,\mathbf f$
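A Python sketch of the Minkowskian steepest direction. It returns the unit-$L_p$ vector $\mathbf a$ that maximises $\mathbf f \cdot \mathbf a$, so $\mathbf a_i \propto \operatorname{sgn}(f_i)|f_i|^{1/(p-1)}$. A descent step would then move along $-\mathbf a$. For $p = 1$ it puts all weight on $i^* = \arg\max_i |f_i|$. Ties are broken by taking the first maximiser, whereas the slide lets tied coordinates $i^*, j^*$ move together. For $p = 2$ it reduces to the normalised Euclidean gradient. The function name is hypothetical.

```python
import numpy as np

def minkowski_direction(f, p):
    """Unit-L_p direction a maximising f.a (p > 1), plus the p = 1 limit."""
    f = np.asarray(f, dtype=float)
    if p == 1:
        a = np.zeros_like(f)
        i_star = np.argmax(np.abs(f))          # i* = arg max_i |f_i| (first one on ties)
        a[i_star] = np.sign(f[i_star])
        return a
    a = np.sign(f) * np.abs(f) ** (1.0 / (p - 1.0))   # a_i ~ sgn(f_i)|f_i|^(1/(p-1))
    return a / np.linalg.norm(a, ord=p)               # normalise so that ||a||_p = 1
```

As $p$ approaches 1 the exponent $1/(p-1)$ blows up, so the direction concentrates on the largest $|f_i|$. This is the LARS-like behaviour.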
c-trajectory and λ-trajectory
Ex. 1-dim: $\varphi(\beta) = \frac{1}{2}(\beta - \beta^*)^2$
$f_\lambda(\beta) = \varphi(\beta) + \lambda F(\beta) = \frac{1}{2}(\beta - \beta^*)^2 + 2\lambda|\beta|^{1/2}$
$L_{1/2}$ constraint: non-convex optimization
$P_c$: $\min (\beta - \beta^*)^2$ under $|\beta| \le c$; solution $\beta_c = c$ (for $0 \le c \le \beta^*$)
$P_\lambda$: $\nabla f_\lambda = 0$: $\beta - \beta^* + \dfrac{\lambda}{\sqrt\beta} = 0$; $\hat\beta_\lambda = R_\lambda(\beta^*)$ : Xu Zongben's operator
$\lambda_c = \sqrt{c}\,(\beta^* - c)$
(figure: $\beta_c$ against $c$ and $R_\lambda(\beta^*)$ against $\lambda$, for $0 \le \beta \le \beta^*$)
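A brute-force Python illustration of the 1-D $L_{1/2}$ example with $\beta^* = 1$. It minimises $f_\lambda$ on a dense grid, which is adequate only for a 1-D toy and is not Xu's thresholding operator. The global minimiser sits near $0.70$ at $\lambda = 0.25$ but has jumped to $0$ by $\lambda = 0.30$.

The code also checks that $\lambda_c = \sqrt{c}(\beta^* - c)$ makes $\beta = c$ stationary. $\lambda_c$ peaks at $c = \beta^*/3$, so it is not monotone in $c$. That is one way to see why the $c$ to $\lambda$ correspondence breaks down for $p < 1$. Function names are illustrative.

```python
import numpy as np

def f_lam(beta, beta_star, lam):
    # f_lambda(beta) = 1/2 (beta - beta*)^2 + 2 lambda |beta|^(1/2)
    return 0.5 * (beta - beta_star) ** 2 + 2.0 * lam * np.sqrt(np.abs(beta))

def global_minimiser(beta_star, lam, lo=-0.5, hi=2.0, num=250001):
    # grid search: fine for a 1-D illustration, not for real problems
    grid = np.linspace(lo, hi, num)
    return grid[np.argmin(f_lam(grid, beta_star, lam))]

def lam_of_c(c, beta_star):
    # lambda_c = sqrt(c) (beta* - c): the lambda at which beta = c is stationary
    return np.sqrt(c) * (beta_star - c)
```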
ICCN-Huangshan (黄山)
Sparse Signal Analysis
Shun-ichi Amari (甘利俊一)
RIKEN Brain Science Institute
(Collaborator: Masahiro Yukawa, Niigata University)
Solution Path: not continuous, not monotone (jump)
$c \leftrightarrow \lambda$, $\boldsymbol\beta_c \Leftrightarrow \boldsymbol\beta_\lambda$
An Example of the greedy path
(figure: greedy path in the $(\beta_1, \beta_2)$ plane)
Linear Programming
constraints $\sum_j A_{ij} x_j \ge b_i$; $\max \sum_i c_i x_i$
inner method: $\psi(\mathbf x) = -\sum_i \log\left(\sum_j A_{ij} x_j - b_i\right)$
Convex Programming: Inner Method
LP: $A\mathbf x \ge \mathbf b$, $\mathbf x \ge 0$; $\min \mathbf c \cdot \mathbf x$
$\psi(\mathbf x) = -\sum_i \log\left(\sum_j A_{ij} x_j - b_i\right) - \sum_i \log x_i$
$\eta_i = \partial_i \psi(\mathbf x)$
Simplex method; inner method
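A Python sketch of the inner (interior-point) idea using the barrier $\psi$ above. It is a textbook log-barrier method: it minimises $t\,\mathbf c \cdot \mathbf x + \psi(\mathbf x)$ by damped Newton steps and increases $t$ along the central path. It is not Amari's geodesic step-size algorithm. The example LP is hypothetical: $\min x_1 + x_2$ subject to $x_1 + 2x_2 \ge 2$, $2x_1 + x_2 \ge 2$, $\mathbf x \ge 0$, whose optimum is $(2/3, 2/3)$.

```python
import numpy as np

def barrier_lp(A, b, c, x0, t0=1.0, mu=10.0, tol=1e-8):
    """Log-barrier sketch for min c.x s.t. A x >= b, x >= 0 (x0 strictly feasible)."""
    x = np.asarray(x0, dtype=float)
    m = A.shape[0] + x.size                   # number of inequality constraints

    def barrier_obj(x, t):
        s = A @ x - b
        if np.any(s <= 0) or np.any(x <= 0):
            return np.inf                     # outside the interior
        # t c.x + psi(x), psi(x) = -sum log(Ax - b) - sum log x
        return t * (c @ x) - np.sum(np.log(s)) - np.sum(np.log(x))

    t = t0
    while m / t > tol:                        # m / t bounds the duality gap on the central path
        for _ in range(50):                   # Newton centring at fixed t
            s = A @ x - b
            grad = t * c - A.T @ (1.0 / s) - 1.0 / x
            hess = A.T @ np.diag(1.0 / s**2) @ A + np.diag(1.0 / x**2)
            dx = -np.linalg.solve(hess, grad)
            decrement = -grad @ dx            # squared Newton decrement
            if decrement < 1e-12:
                break
            step, f0 = 1.0, barrier_obj(x, t)
            for _ in range(60):               # backtracking keeps x strictly feasible
                if barrier_obj(x + step * dx, t) <= f0 - 0.25 * step * decrement:
                    break
                step *= 0.5
            x = x + step * dx
        t *= mu
    return x
```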
Polynomial-Time Algorithm
curvature $H_m^2$ : step-size
$\min\ t\,\mathbf c \cdot \mathbf x + \psi(\mathbf x)$ : geodesic $\nabla\psi(\mathbf x^*_t) = -t\,\mathbf c$
Neural Networks
Higher-order correlations
Synchronous firing
Multilayer Perceptron
Multilayer Perceptrons
$y = \sum_i v_i\, \varphi(\mathbf w_i \cdot \mathbf x) + n$
$p(y \mid \mathbf x; \boldsymbol\theta) = c\exp\left\{-\frac{1}{2}\left(y - f(\mathbf x, \boldsymbol\theta)\right)^2\right\}$
$f(\mathbf x, \boldsymbol\theta) = \sum_i v_i\, \varphi(\mathbf w_i \cdot \mathbf x)$
input $\mathbf x = (x_1, x_2, \dots, x_n)$, output $y$
$\boldsymbol\theta = (\mathbf w_1, \dots, \mathbf w_m; v_1, \dots, v_m)$
Multilayer Perceptron
$y = f(\mathbf x, \boldsymbol\theta) = \sum_i v_i\, \varphi(\mathbf w_i \cdot \mathbf x)$
$\boldsymbol\theta = (\mathbf w_1, \dots, \mathbf w_m; v_1, \dots, v_m)$
neuromanifold inside the space of functions $\psi(\mathbf x)$
singularities
Geometry of singular model
$y = v\, \varphi(\mathbf w \cdot \mathbf x) + n$
(figure: parameter space $W$ with the singular region $v|\mathbf w| = 0$)
Backpropagation --- gradient learning
examples: $(y_1, \mathbf x_1), \dots, (y_t, \mathbf x_t)$ (training set)
$E(y, \mathbf x; \boldsymbol\theta) = \frac{1}{2}\left(y - f(\mathbf x, \boldsymbol\theta)\right)^2 = -\log p(y, \mathbf x; \boldsymbol\theta)$ (up to an additive constant)
$f(\mathbf x, \boldsymbol\theta) = \sum_i v_i\, \varphi(\mathbf w_i \cdot \mathbf x)$
$\Delta\boldsymbol\theta_t = -\eta\, \dfrac{\partial E}{\partial \boldsymbol\theta}$
natural gradient (Riemannian), steepest descent: $\tilde\nabla E = G^{-1}\nabla E$
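A minimal Python sketch of backpropagation for the model $f(\mathbf x, \boldsymbol\theta) = \sum_i v_i \varphi(\mathbf w_i \cdot \mathbf x)$ with $E = \frac{1}{2}(y - f)^2$ on a single example. It assumes $\varphi = \tanh$ and uses random hypothetical weights. One weight derivative is checked against a central finite difference, and then one plain gradient step $\Delta\boldsymbol\theta = -\eta\,\partial E/\partial\boldsymbol\theta$ is taken. The natural-gradient version would premultiply these gradients by $G^{-1}$, which is not computed here.

```python
import numpy as np

def forward(theta, x):
    W, v = theta                       # W: (m, n) with rows w_i; v: (m,)
    h = np.tanh(W @ x)                 # phi(w_i . x), phi = tanh (assumption)
    return v @ h, h

def loss_and_grad(theta, x, y):
    """E = 1/2 (y - f(x, theta))^2 and its gradient by backpropagation."""
    W, v = theta
    f, h = forward(theta, x)
    err = f - y                                    # dE/df
    grad_v = err * h                               # dE/dv_i = err * phi(w_i . x)
    grad_W = np.outer(err * v * (1.0 - h**2), x)   # tanh'(u) = 1 - tanh(u)^2
    return 0.5 * err**2, (grad_W, grad_v)

rng = np.random.default_rng(1)
W, v = rng.standard_normal((3, 4)), rng.standard_normal(3)
x, y = rng.standard_normal(4), 0.5
E, (gW, gv) = loss_and_grad((W, v), x, y)

# Central finite-difference check of dE/dW[0, 1]
eps = 1e-6
Wp, Wm = W.copy(), W.copy()
Wp[0, 1] += eps
Wm[0, 1] -= eps
fd = (loss_and_grad((Wp, v), x, y)[0] - loss_and_grad((Wm, v), x, y)[0]) / (2 * eps)

eta = 1e-3
W_new, v_new = W - eta * gW, v - eta * gv          # Delta theta = -eta dE/dtheta
```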
q-Fisher information
$g^{q}_{ij}(p) = \dfrac{q}{h_q(p)}\, g^{F}_{ij}(p)$ : conformal transformation
q-divergence
$D_q[p(x) : r(x)] = \dfrac{1}{(1-q)\, h_q(p)}\left(1 - \int p(x)^q\, r(x)^{1-q}\, dx\right)$
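A discrete-distribution sketch of the q-divergence in Python. The slide does not define $h_q(p)$. The code assumes $h_q(p) = \sum_x p(x)^q$, the escort normaliser used in Amari and Ohara's work on q-exponential families, and that assumption is also labelled in the code. Under it, $D_q \ge 0$ for $0 < q < 1$ by Hölder's inequality, $D_q[p : p] = 0$, and $D_q \to \mathrm{KL}(p\,\|\,r)$ as $q \to 1$.

```python
import numpy as np

def q_divergence(p, r, q):
    """D_q[p : r] = (1 - sum p^q r^(1-q)) / ((1 - q) h_q(p)) for discrete p, r.
    Assumption (not stated on the slide): h_q(p) = sum_x p(x)^q."""
    p, r = np.asarray(p, float), np.asarray(r, float)
    h_q = np.sum(p ** q)
    return (1.0 - np.sum(p ** q * r ** (1.0 - q))) / ((1.0 - q) * h_q)

def kl(p, r):
    # Kullback-Leibler divergence, the q -> 1 limit
    p, r = np.asarray(p, float), np.asarray(r, float)
    return float(np.sum(p * np.log(p / r)))
```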
Total Bregman Divergence and its Applications to Shape Retrieval
• Baba C. Vemuri, Meizhu Liu, Shun-ichi Amari, Frank Nielsen
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010
Total Bregman Divergence
$\mathrm{TD}[\mathbf x : \mathbf y] = \dfrac{D[\mathbf x : \mathbf y]}{\sqrt{1 + \|\nabla f(\mathbf y)\|^2}}$
• rotational invariance
• conformal geometry
Total Bregman divergence (Vemuri)
$\mathrm{TBD}(p : q) = \dfrac{\varphi(p) - \varphi(q) - \nabla\varphi(q) \cdot (p - q)}{\sqrt{1 + \|\nabla\varphi(q)\|^2}}$
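A direct Python transcription of the TBD formula, with the generator $\varphi(\mathbf x) = \frac{1}{2}\|\mathbf x\|^2$ chosen only for illustration. Geometrically, the ordinary Bregman divergence is the vertical gap between the graph of $\varphi$ at $p$ and the tangent plane at $q$. Dividing by $\sqrt{1 + \|\nabla\varphi(q)\|^2}$ turns that vertical gap into the orthogonal distance to the tangent plane, which is where the rotational invariance comes from.

```python
import numpy as np

def total_bregman(p, q, phi, grad_phi):
    """TBD(p : q) = [phi(p) - phi(q) - grad phi(q).(p - q)] / sqrt(1 + |grad phi(q)|^2)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    g = grad_phi(q)
    bregman = phi(p) - phi(q) - g @ (p - q)
    return bregman / np.sqrt(1.0 + g @ g)

phi = lambda x: 0.5 * x @ x          # illustrative generator: half squared norm
grad_phi = lambda x: x
```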
Clustering: t-center
$E = \{\mathbf x_1, \dots, \mathbf x_m\}$
$\mathbf x^* = \arg\min_{\mathbf x} \sum_i \mathrm{TD}[\mathbf x : \mathbf x_i]$ : t-center of $E$
t-center
$\nabla f(\mathbf x^*) = \dfrac{\sum_i w_i\, \nabla f(\mathbf x_i)}{\sum_i w_i}$, $w_i = \dfrac{1}{\sqrt{1 + \|\nabla f(\mathbf x_i)\|^2}}$
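For the Euclidean generator $f(\mathbf x) = \frac{1}{2}\|\mathbf x\|^2$, $\nabla f(\mathbf x) = \mathbf x$, so the t-center is a weighted mean with $w_i = 1/\sqrt{1 + \|\mathbf x_i\|^2}$. The Python sketch below uses hypothetical points: four near the origin and one outlier at $(1000, 0)$. The arithmetic mean is dragged to about $x = 200$, while the t-center moves by only about $1/\sum_i w_i$. This bounded shift is what the robustness argument for the t-center predicts.

```python
import numpy as np

def t_center_euclidean(points):
    """t-center for f(x) = 1/2 |x|^2:
    x* = sum_i w_i x_i / sum_i w_i,  w_i = 1 / sqrt(1 + |x_i|^2)."""
    X = np.asarray(points, float)
    w = 1.0 / np.sqrt(1.0 + np.sum(X**2, axis=1))
    return (w[:, None] * X).sum(axis=0) / w.sum()

pts = [[0.1, 0.0], [-0.1, 0.0], [0.0, 0.1], [0.0, -0.1], [1000.0, 0.0]]
```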
q-super-robust estimator (Eguchi)
$\hat\xi = \arg\max_\xi \dfrac{\sum_i p(x_i, \xi)^q}{h_{q+1}(\xi)^{q/(1+q)}}$, where $h_{q+1}(\xi) = \int p(x, \xi)^{q+1}\, dx$
bias-corrected q-estimating function:
$s_q(x, \xi) = p(x, \xi)^q \left\{\partial_\xi \log p(x, \xi) - c_q(\xi)\right\}$, $c_q(\xi) = \partial_\xi \dfrac{1}{1+q} \log h_{q+1}(\xi)$
$\sum_{i=1}^N s_q(x_i, \hat\xi) = 0$ is the stationarity condition of the maximization above; $q \to 0$ gives the maximum likelihood estimator
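A Python sketch of the q-estimating equation for the simplest case: a unit-variance Gaussian location model, which is an assumption made for illustration. For a location family $h_{q+1}$ does not depend on the location $\mu$, so $c_q = 0$ and $s_q(x, \mu) \propto e^{-q(x-\mu)^2/2}(x - \mu)$. Solving $\sum_i s_q(x_i, \mu) = 0$ is then a weighted-mean fixed point in which far-away points get almost no weight. The data below are hypothetical. With $q = 0$ the weights are uniform and the estimate is the sample mean, i.e. the MLE.

```python
import numpy as np

def q_robust_location(x, q, iters=100):
    """Solve sum_i s_q(x_i, mu) = 0 for a unit-variance Gaussian location model.
    Here c_q = 0 and s_q(x, mu) is proportional to exp(-q (x - mu)^2 / 2) (x - mu)."""
    x = np.asarray(x, float)
    mu = np.median(x)                          # robust starting point
    for _ in range(iters):
        w = np.exp(-0.5 * q * (x - mu) ** 2)   # p(x_i, mu)^q up to a constant
        mu = np.sum(w * x) / np.sum(w)         # fixed point of the estimating equation
    return mu

data = [-0.5, -0.2, 0.0, 0.2, 0.5, 50.0]       # hypothetical sample with one outlier
```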
Conformal change of divergence
$\tilde D[p : q] = \sigma(p)\, D[p : q]$
$\tilde g_{ij} = \sigma(p)\, g_{ij}$
$\tilde T_{ijk} = \sigma\left(T_{ijk} + s_k g_{ij} + s_j g_{ik} + s_i g_{jk}\right)$, $s_i = \partial_i \log\sigma$
t-center is robust
$E = \{\mathbf x_1, \dots, \mathbf x_n\}$, $\tilde E = \{\mathbf x_1, \dots, \mathbf x_n; \mathbf y\}$ (outlier $\mathbf y$ added)
$\tilde{\mathbf x}^* = \mathbf x^* + \varepsilon\, \mathbf z(\mathbf x^*; \mathbf y)$, $\varepsilon = \dfrac{1}{n}$
influence function: $\mathbf z(\mathbf x^*; \mathbf y)$
robust: $|\mathbf z| < c$ as $|\mathbf y| \to \infty$
$\mathbf z(\mathbf x^*; \mathbf y) = w(\mathbf y)\, G^{-1}\left(\nabla f(\mathbf y) - \nabla f(\mathbf x^*)\right)$, $G = \dfrac{1}{n}\sum_i w(\mathbf x_i)\, \nabla\nabla f(\mathbf x^*)$
$w(\mathbf y)\, |\nabla f(\mathbf y)| = \dfrac{|\nabla f(\mathbf y)|}{\sqrt{1 + |\nabla f(\mathbf y)|^2}} < 1$
Robust: $\mathbf z$ is bounded
Euclidean case, $f = \frac{1}{2}|\mathbf x|^2$: $\mathbf z(\mathbf x^*; \mathbf y) = G^{-1}\dfrac{\mathbf y - \mathbf x^*}{\sqrt{1 + |\mathbf y|^2}}$, bounded as $|\mathbf y| \to \infty$ (for the ordinary mean, $\mathbf z = \mathbf y - \mathbf x^*$ is unbounded)
MPEG7 database
• Large intraclass variability and small interclass dissimilarity.
Other TBD applications: diffusion tensor imaging (DTI) analysis [Vemuri]
• Interpolation
• Segmentation
Baba C. Vemuri, Meizhu Liu, Shun-ichi Amari and Frank Nielsen, "Total Bregman Divergence and its Applications to DTI Analysis," IEEE TMI, to appear.
TBD application: shape retrieval
• Using the MPEG7 database
• 70 classes, with 20 shapes in each class
(Meizhu Liu)
Multiterminal Information & Statistical Inference
$X$: $x^n$, rate $R_X$; $Y$: $y^n$, rate $R_Y$
$p(x, y; \theta)$
$M_X = 2^{nR_X}$, $M_Y = 2^{nR_Y}$
0-rate Slepian-Wolf: $G = G_M + G_C$ (marginal part + correlation part)
$p(x, y)$, $p(x, y, z)$
Linear Systems
ARMA
$x_t = \dfrac{1 + b_1 z^{-1} + \dots + b_q z^{-q}}{1 + a_1 z^{-1} + \dots + a_p z^{-p}}\, u_t = f(z, \boldsymbol\theta)\, u_t$
$\boldsymbol\theta = (a_1, \dots, a_p;\ b_1, \dots, b_q)$
input $u_t$, output $x_t$
AR --- e-flat
MA --- m-flat
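The transfer function above corresponds to the difference equation $x_t + \sum_k a_k x_{t-k} = u_t + \sum_k b_k u_{t-k}$, where $z^{-1}$ is the unit delay. A plain Python loop implementing it, with zero initial conditions assumed, is enough to see the impulse responses. AR(1) with $a_1 = -0.5$ gives a geometric decay, and MA(1) gives a finite response.

```python
import numpy as np

def arma_filter(a, b, u):
    """x_t = [(1 + b_1 z^-1 + ... + b_q z^-q) / (1 + a_1 z^-1 + ... + a_p z^-p)] u_t,
    i.e. x_t = -sum_k a_k x_{t-k} + u_t + sum_k b_k u_{t-k}, zero initial conditions."""
    x = np.zeros(len(u))
    for t in range(len(u)):
        acc = u[t]
        acc += sum(bk * u[t - k] for k, bk in enumerate(b, start=1) if t - k >= 0)  # MA part
        acc -= sum(ak * x[t - k] for k, ak in enumerate(a, start=1) if t - k >= 0)  # AR part
        x[t] = acc
    return x
```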
Machine Learning
Boosting: combination of weak learners
$D = \{(\mathbf x_1, y_1), (\mathbf x_2, y_2), \dots, (\mathbf x_N, y_N)\}$, $y_i = \pm 1$
$f(\mathbf x, \mathbf u)$ : $y = h(\mathbf x, \mathbf u) = \operatorname{sgn} f(\mathbf x, \mathbf u)$
Weak Learners
$H(\mathbf x) = \operatorname{sgn}\left(\sum_t \alpha_t h_t(\mathbf x)\right)$
$\varepsilon_t = \mathrm{Prob}\{h_t(\mathbf x_i) \neq y_i\}$ under $W_t$
$W_{t+1}(i) = c\, W_t(i) \exp\{-\alpha_t\, y_i\, h_t(\mathbf x_i)\}$ : weight distribution
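A compact Python sketch of the weight update above, using decision stumps on 1-D data as the weak learners. The slide leaves $\alpha_t$ unspecified. The code assumes the standard AdaBoost choice $\alpha_t = \frac{1}{2}\log\frac{1 - \varepsilon_t}{\varepsilon_t}$, and the constant $c$ is just the renormalisation of $W_{t+1}$. The data are hypothetical and no single stump can separate them. For this set a combination of three stumps classifies every point with margin, which guarantees that 50 rounds of this exhaustive-stump AdaBoost reach zero training error.

```python
import numpy as np

def stump_predict(x, thr, pol):
    # decision stump h(x) = pol if x > thr else -pol
    return np.where(x > thr, pol, -pol)

def fit_stump(x, y, w):
    """Exhaustively pick the stump with the smallest W-weighted error eps_t."""
    xs = np.unique(x)
    thresholds = np.concatenate(([xs[0] - 1.0], (xs[:-1] + xs[1:]) / 2.0, [xs[-1] + 1.0]))
    best = (None, None, np.inf)
    for thr in thresholds:
        for pol in (1, -1):
            err = np.sum(w[stump_predict(x, thr, pol) != y])
            if err < best[2]:
                best = (thr, pol, err)
    return best

def adaboost(x, y, rounds=50):
    w = np.full(len(x), 1.0 / len(x))             # W_1: uniform weight distribution
    ensemble = []
    for _ in range(rounds):
        thr, pol, eps = fit_stump(x, y, w)
        eps = np.clip(eps, 1e-12, 1 - 1e-12)
        alpha = 0.5 * np.log((1 - eps) / eps)     # assumed standard AdaBoost alpha_t
        w = w * np.exp(-alpha * y * stump_predict(x, thr, pol))
        w = w / w.sum()                           # the constant c renormalises W_{t+1}
        ensemble.append((alpha, thr, pol))
    return ensemble

def predict(ensemble, x):
    # H(x) = sgn(sum_t alpha_t h_t(x))
    return np.sign(sum(a * stump_predict(x, thr, pol) for a, thr, pol in ensemble))
```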
Boosting: generalization
$Q_t = \{Q(y \mid \mathbf x) = c\, Q_{t-1}(y \mid \mathbf x) \exp\{\alpha\, y\, h_t(\mathbf x)\}\}$
$F_t = \{P(y \mid \mathbf x) : E[y\, h_t(\mathbf x)] = \text{const}\}$
$\alpha_t$ : $\min_\alpha D[\tilde P : Q_t]$
$D[\tilde P : Q_{t+1}] < D[\tilde P : Q_t]$
Integration of evidences: $x_1, x_2, \dots, x_m$
arithmetic mean, geometric mean, harmonic mean
α-mean
Various Means
arithmetic: $\dfrac{a + b}{2}$
geometric: $\sqrt{ab}$
harmonic: $\dfrac{2}{\frac{1}{a} + \frac{1}{b}}$
Any other mean?
Generalized mean: f-mean
$f(u)$: monotone; f-representation of $u$
$m_f(a, b) = f^{-1}\left\{\dfrac{f(a) + f(b)}{2}\right\}$
scale free: $m_f(ca, cb) = c\, m_f(a, b)$
α-representation: $f_\alpha(u) = u^{(1-\alpha)/2}$ ($\alpha \neq 1$), $f_\alpha(u) = \log u$ ($\alpha = 1$)
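A small Python implementation of the α-mean built from the α-representation $f_\alpha$, for positive $a, b$. The checks use $a = 1$, $b = 4$. $\alpha = -1$ gives the arithmetic mean, $\alpha = 1$ the geometric mean, $\alpha = 3$ (where $f_\alpha(u) = 1/u$) the harmonic mean, and $\alpha = 0$ gives $\frac{1}{4}(\sqrt a + \sqrt b)^2$. Scale freeness can be checked directly, and large $\alpha$ approaches $\min(a, b)$.

```python
import numpy as np

def alpha_mean(a, b, alpha):
    """m_alpha(a, b) = f_alpha^{-1}((f_alpha(a) + f_alpha(b)) / 2) for a, b > 0,
    with f_alpha(u) = u^((1 - alpha)/2) (alpha != 1) and log u (alpha = 1)."""
    if alpha == 1:
        return float(np.exp((np.log(a) + np.log(b)) / 2.0))   # geometric mean
    r = (1.0 - alpha) / 2.0
    return float(((a ** r + b ** r) / 2.0) ** (1.0 / r))
```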
α-mean
$\alpha = -1$: $m_\alpha(a, b) = \dfrac{a + b}{2}$
$\alpha = 0$: $m_\alpha(a, b) = \dfrac{1}{4}\left(\sqrt a + \sqrt b\right)^2 = \dfrac{1}{2}\left(\dfrac{a + b}{2} + \sqrt{ab}\right)$
$\alpha = 1$: $m_\alpha(a, b) = \sqrt{ab}$
$\alpha = 3$: $m_\alpha(a, b) = \dfrac{2ab}{a + b}$
$\alpha = \infty$: $m_\alpha(a, b) = \min(a, b)$; $\alpha = -\infty$: $m_\alpha(a, b) = \max(a, b)$
$m_\alpha\left(p_1(s), p_2(s)\right)$
Family of Distributions $\{p_1(s), \dots, p_k(s)\}$
mixture family: $p_{\mathrm{mix}}(s) = \sum_{i=1}^k t_i\, p_i(s)$, $\sum_i t_i = 1$
exponential family: $\log p_{\exp}(s) = \sum_i t_i \log p_i(s) - \psi$
α-family: $p(x; \boldsymbol\theta) = f_\alpha^{-1}\left\{\sum_i \theta_i\, f_\alpha\left(p_i(x)\right)\right\}$
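A Python sketch of the α-family for discrete distributions. It shows that $\alpha = -1$ reproduces the mixture family, which is already normalised when $\sum_i \theta_i = 1$. It also shows that $\alpha = 1$ reproduces the exponential family, where normalising plays the role of $\psi$. The slide's formula has no normalisation, so the optional `normalise` step is an addition made for illustration. The distributions and weights are hypothetical.

```python
import numpy as np

def f_alpha(u, alpha):
    # alpha-representation: u^((1 - alpha)/2), or log u when alpha = 1
    return np.log(u) if alpha == 1 else u ** ((1.0 - alpha) / 2.0)

def f_alpha_inv(v, alpha):
    return np.exp(v) if alpha == 1 else v ** (2.0 / (1.0 - alpha))

def alpha_family(ps, theta, alpha, normalise=True):
    """p(x; theta) = f_alpha^{-1}(sum_i theta_i f_alpha(p_i(x))), optionally renormalised."""
    ps, theta = np.asarray(ps, float), np.asarray(theta, float)   # ps: (k, n), theta: (k,)
    p = f_alpha_inv(theta @ f_alpha(ps, alpha), alpha)
    return p / p.sum() if normalise else p

p1, p2 = [0.7, 0.2, 0.1], [0.1, 0.3, 0.6]
theta = [0.25, 0.75]
```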