Kernels for Dynamic Textures - Purdue University vishy/talks/Dynamic.pdf · 2009. 8. 22. · Dynamic...

S.V.N. Vishwanathan: Kernels for Dynamic Textures,

Kernels for Dynamic Textures

S.V.N. Vishwanathan

[email protected]

National ICT Australia and

Australian National University

Joint work with Alex Smola and René Vidal

http://web.anu.edu.au/~vishy

Roadmap


Introduction to Kernel Methods

Why kernels?

Kernels on Dynamical Systems

Trajectories, Noise Models, Computation

Dynamical Textures

ARMA Models, Approximate Solutions, Kernel Computation, Experiments

Outlook and Conclusion

Classification


Data:

Pairs of observations $(x_i, y_i)$

Underlying distribution $P(x, y)$

Examples: (blood status, cancer), (transactions, fraud)

Task:

Find a function $f(x)$ which predicts $y$ given $x$

The function $f(x)$ must generalize well

Optimal Separating Hyperplane


Minimize $\frac{1}{2}\|w\|^2$ subject to $y_i(\langle w, x_i\rangle + b) \geq 1$ for all $i$
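As a concrete check of the constraints above, the following NumPy sketch evaluates a hand-picked candidate $(w, b)$ on a hypothetical four-point toy set: it tests $y_i(\langle w, x_i\rangle + b) \geq 1$ and reports the objective $\frac{1}{2}\|w\|^2$ together with the margin width $2/\|w\|$. The data and the helper `margin_report` are illustrative assumptions; the code only checks a candidate and does not solve the quadratic program.

```python
import numpy as np

def margin_report(w, b, X, y):
    """Check y_i(<w, x_i> + b) >= 1 for all i; return (feasible, 0.5*||w||^2, 2/||w||)."""
    w = np.asarray(w, dtype=float)
    scores = y * (X @ w + b)                      # y_i(<w, x_i> + b) for every i
    feasible = bool(np.all(scores >= 1 - 1e-12))
    objective = 0.5 * float(w @ w)
    width = 2.0 / float(np.linalg.norm(w))
    return feasible, objective, width

# Hypothetical toy data: two points per class on either side of the x2-axis.
X = np.array([[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
print(margin_report([0.5, 0.0], 0.0, X, y))      # (True, 0.125, 4.0)
```

The choice $w = (1, 0)$ is feasible as well but has objective 0.5; minimizing $\frac{1}{2}\|w\|^2$ prefers the smaller $\|w\|$, i.e. the wider margin.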

Kernels and Nonlinearity


Problem: Linear functions are often too simple to provide good estimators

Idea 1: Map to a higher-dimensional feature space via $\Phi: x \mapsto \Phi(x)$ and solve the problem there. Replace every $\langle x, x'\rangle$ by $\langle\Phi(x), \Phi(x')\rangle$

Idea 2: Instead of computing $\Phi(x)$ explicitly, use a kernel function $k(x, x') := \langle\Phi(x), \Phi(x')\rangle$. A large class of functions is admissible as kernels

Non-vectorial data can be handled if we can compute a meaningful $k(x, x')$
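A minimal sketch of Idea 2, assuming the homogeneous polynomial kernel $k(x, x') = \langle x, x'\rangle^2$ on $\mathbb{R}^2$ (chosen here only for illustration; the slides do not fix a particular kernel): the kernel value equals the dot product of the explicit feature map $\Phi(x) = (x_1^2, \sqrt{2}x_1x_2, x_2^2)$, yet computing it never forms $\Phi$.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map on R^2 (assumed for illustration)."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

def k(x, xp):
    """Homogeneous polynomial kernel <x, x'>^2, computed without phi."""
    return float(np.dot(x, xp)) ** 2

x, xp = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(np.isclose(k(x, xp), phi(x) @ phi(xp)))   # True
```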

The Basic Idea


Key Observation:

Trajectories are easily observable
Similar trajectories ⇒ similar systems
Restrict attention to interesting cases
Average over noise models

Kernels Using Dynamical Systems:

Simulate the system for both inputs
Similar time evolution ⇒ similar inputs

Kernels on Dynamical Systems:

Restrict to interesting initial conditions
Simulate both systems
Similar time evolution ⇒ similar systems

Notation


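$\mathcal{X}$ - state space (a Hilbert space)

$\mathcal{A}$ - set of time-evolution operators

$\mathcal{T}$ - set of measurement times

$\mu$ - a nice probability measure on $\mathcal{T}$

Discounting Factors: For some $\lambda > 0$

$\mu(t) = \lambda^{-1}e^{-\lambda t}$ for $\mathcal{T} = \mathbb{R}^+_0$

$\mu(t) = \dfrac{e^{-\lambda t}}{1 - e^{-\lambda}}$ for $\mathcal{T} = \mathbb{N}_0$

Time Evolution: We study

$x_A(t) := A(t)x$ for $A \in \mathcal{A}$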

Trajectories and Kernels


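Comparing Trajectories: Using the dot product on $\mathcal{X}$ we define a dot product on the space of trajectories $\mathcal{X}^{\mathcal{T}}$

$$\langle\theta, \theta'\rangle := \mathbf{E}_{\mu}\left[\langle\theta(t), \theta'(t)\rangle\right] \quad \text{for } \theta, \theta' \in \mathcal{X}^{\mathcal{T}}$$

Extending to Dynamical Systems: Identify a dynamical system with its trajectory and define

$$k((x, A), (\bar{x}, \bar{A})) := \mathbf{E}_{\mu}\left[\langle A(t)x, \bar{A}(t)\bar{x}\rangle\right]$$

Other Ideas:

A nicely decaying measure is required for convergence
Modify the dot product in $\mathcal{X}$

Covariance matrices?
Rational kernels and transducers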
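For discrete time, the trajectory dot product can be approximated by truncating the sum over $t$. The sketch below works under stated assumptions: noise-free linear systems $x(t+1) = A x(t)$, the weight $e^{-\lambda t}$ without its normalizing constant, and the hypothetical helper names `linear_trajectory` and `trajectory_kernel`. The decaying weight is what makes the truncation harmless, which is the point of requiring a nicely decaying measure.

```python
import numpy as np

def linear_trajectory(A, x0, T):
    """Noise-free discrete trajectory x(0), ..., x(T-1) with x(t+1) = A x(t)."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(T - 1):
        xs.append(A @ xs[-1])
    return np.array(xs)                                   # shape (T, d)

def trajectory_kernel(traj, traj_bar, lam, W=None):
    """Truncated sum_t e^{-lam t} <theta(t), W theta_bar(t)>.

    The normalizing constant of mu is dropped; it only rescales the kernel.
    """
    T, d = traj.shape
    W = np.eye(d) if W is None else W
    weights = np.exp(-lam * np.arange(T))
    inner = np.einsum("ti,ij,tj->t", traj, W, traj_bar)   # <theta(t), W theta_bar(t)>
    return float(weights @ inner)

# Identical systems A = A_bar = 0.5*1 and x = x_bar = (1, 0): a geometric series.
A = 0.5 * np.eye(2)
traj = linear_trajectory(A, [1.0, 0.0], 200)
k = trajectory_kernel(traj, traj, lam=1.0)
print(np.isclose(k, 1.0 / (1.0 - 0.25 * np.exp(-1.0))))   # True
```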

Special Cases


Kernels on Dynamical Systems:

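Restrict attention to $x = \bar{x}$

Compare trajectories for identical initial conditions
Take the expectation if interested in a range of $x$

$$k(A, \bar{A}) := \mathbf{E}_x\left[k((x, A), (x, \bar{A}))\right]$$

More generally

$$k(A, \bar{A}) := \mathbf{E}_{A}\,\mathbf{E}_{\bar{A}}\,\mathbf{E}_x\left[k((x, A), (x, \bar{A}))\right]$$

Kernels Using Dynamical Systems:

Restrict attention to a particular dynamical system
As before, we can take expectations over $A$

$$k(x, \bar{x}) := \mathbf{E}_x\,\mathbf{E}_{\bar{x}}\,\mathbf{E}_A\left[k((x, A), (\bar{x}, A))\right]$$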

Discrete Linear Systems


Linear Systems:

We assume time propagation occurs as

$$x_A(t+1) = A x_A(t) + a_t + \xi_t$$

In closed form

$$x_A(t) = A^t x_0 + \sum_{i=0}^{t} A^{t-i}\left(\xi_i + a_i\right)$$

To avoid messy math, assume $a_t = 0$ and hence

$$x_A(t) = A^t x_0 + \sum_{i=0}^{t} A^{t-i}\xi_i$$

The kernel receives contributions from $A$ as well as from the noise

Continuous Linear Systems


Linear Systems:

System dynamics here are described by

$$\frac{d}{dt}x_A(t) = A x_A(t) + a(t) + \xi(t)$$

Here $\xi(t)$, with $\mathbf{E}[\xi(t)] = 0$, is a stochastic process and

$$x_A(t) = \exp(At)\,x_0 + \int_0^t \exp(A(t - \tau))\left(a(\tau) + \xi(\tau)\right)d\tau$$

As before we assume $a(t) = 0$

We even assume $\xi(\tau) = 0$ (avoids messy math again!)

$$x_A(t) = \exp(At)\,x_0$$

The kernel contribution is due only to $A$

Convergence Criterion


Discrete Case:

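Let $A$, $B$ and $W$ be linear operators whose matrix norms obey $0 \leq \|A\|, \|B\| \leq \Lambda$

For suitable $\lambda$ with $e^{\lambda} > \Lambda^2$ and $W \succeq 0$, the series

$$M := \sum_{t=0}^{\infty} e^{-\lambda t}(A^t)^\top W B^t$$

converges, since its $t$-th term is bounded in norm by $(e^{-\lambda}\Lambda^2)^t\|W\|$. It satisfies the Sylvester equation $e^{-\lambda}A^\top M B + W = M$

Continuous Case: We define

$$M := \int_0^{\infty} e^{-\lambda t}\exp(At)^\top W \exp(Bt)\,dt$$

Differentiating the integrand and integrating from $0$ to $\infty$ gives the Sylvester equation $\left(A^\top - \frac{\lambda}{2}\mathbf{1}\right)M + M\left(B - \frac{\lambda}{2}\mathbf{1}\right) = -W$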
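The discrete Sylvester (Stein) equation can be checked numerically against the truncated series. The sketch below assumes NumPy and uses a dense Kronecker-product solve, which is simple but costs $O((nm)^3)$ rather than the cubic cost of a Schur-based (Bartels-Stewart-type) solver; `solve_stein` and `stein_series` are hypothetical names. The random $A$ and $B$ are rescaled so that $\|A\|, \|B\| \leq \Lambda = 0.9$, and $\lambda = 0.5$ satisfies $e^{\lambda} > \Lambda^2$.

```python
import numpy as np

def solve_stein(A, B, W, lam):
    """Solve M = W + e^{-lam} A^T M B by row-major vectorization.

    For row-major vec, vec(A^T M B) = kron(A^T, B^T) vec(M).  The dense
    Kronecker solve costs O((n m)^3), so it is meant for small examples only.
    """
    n, m = W.shape
    K = np.eye(n * m) - np.exp(-lam) * np.kron(A.T, B.T)
    return np.linalg.solve(K, W.reshape(-1)).reshape(n, m)

def stein_series(A, B, W, lam, T=400):
    """Truncated series sum_{t<T} e^{-lam t} (A^t)^T W B^t, for comparison."""
    M = np.zeros(W.shape)
    At, Bt = np.eye(A.shape[0]), np.eye(B.shape[0])
    for t in range(T):
        M += np.exp(-lam * t) * (At.T @ W @ Bt)
        At, Bt = At @ A, Bt @ B
    return M

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)); A *= 0.9 / np.linalg.norm(A, 2)   # ||A|| = 0.9
B = rng.standard_normal((3, 3)); B *= 0.9 / np.linalg.norm(B, 2)   # ||B|| = 0.9
G = rng.standard_normal((3, 3)); W = G @ G.T                        # W is PSD
lam = 0.5                                                           # e^lam > 0.81
print(np.allclose(solve_stein(A, B, W, lam), stein_series(A, B, W, lam)))   # True
```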

Gory Details


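Contribution due to $A$ (with $\langle u, v\rangle := u^\top W v$):

$$p\sum_{t=0}^{\infty} e^{-\lambda t}\langle A^t x, \bar{A}^t\bar{x}\rangle = p\cdot x^\top\left[\sum_{t=0}^{\infty} e^{-\lambda t}(A^t)^\top W\bar{A}^t\right]\bar{x} = p\cdot x^\top M\bar{x}$$

Contribution due to noise:

$$\mathbf{E}_{\xi}\left[p\sum_{t=0}^{\infty}\sum_{j,j'=0}^{t} e^{-\lambda t}\langle A^{t-j}\xi_j, \bar{A}^{t-j'}\xi_{j'}\rangle\right] = p\,\mathrm{tr}\left(C_{\xi}\left[\sum_{t=0}^{\infty} e^{-\lambda t}(A^t)^\top M\bar{A}^t\right]\right) := p\,\mathrm{tr}(C_{\xi}\tilde{M})$$

In the above equations $p$ is a normalizing term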

Delving Deeper


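More on $M$ and $\tilde{M}$:

The matrices $M$ and $\tilde{M}$ are

$$M := \sum_{t=0}^{\infty} e^{-\lambda t}(A^t)^\top W\bar{A}^t \quad \text{and} \quad \tilde{M} := \sum_{t=0}^{\infty} e^{-\lambda t}(A^t)^\top M\bar{A}^t$$

Sylvester Equation:

Both $M$ and $\tilde{M}$ satisfy a Sylvester equation

$$e^{-\lambda}A^\top M\bar{A} + W = M \quad \text{and} \quad e^{-\lambda}A^\top\tilde{M}\bar{A} + M = \tilde{M}$$

Each can be solved in cubic time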

Discrete Kernel


Discrete Case:

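Putting it all together

$$k((A, x), (\bar{A}, \bar{x})) = p\left[x^\top M\bar{x} + \mathrm{tr}(C_{\xi}\tilde{M})\right]$$

Note that $C_{\xi}$ is the covariance matrix of $\xi_t$

We can assume different noise models per time step

Initial Conditions:

Let $C$ be the covariance matrix of the (zero-mean) initial conditions. If we set $x = \bar{x}$ and average over $x$, then

$$\mathbf{E}_x\left[k((A, x), (\bar{A}, x))\right] = p\left[\mathrm{tr}(CM) + \mathrm{tr}(C_{\xi}\tilde{M})\right]$$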
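The initial-condition average rests on the identity $\mathbf{E}_x[x^\top M x] = \mathrm{tr}(CM)$ for zero-mean $x$ with covariance $C$. The sketch below assumes NumPy, $W = \mathbf{1}$, $p = 1$ and a hypothetical sample of 500 centered initial conditions (whose empirical covariance plays the role of $C$); it checks that identity for the deterministic term only and does not model the noise term $\mathrm{tr}(C_{\xi}\tilde{M})$.

```python
import numpy as np

def solve_stein(A, B, W, lam):
    """Solve M = W + e^{-lam} A^T M B (dense row-major Kronecker solve)."""
    n, m = W.shape
    K = np.eye(n * m) - np.exp(-lam) * np.kron(A.T, B.T)
    return np.linalg.solve(K, W.reshape(-1)).reshape(n, m)

rng = np.random.default_rng(1)
n, lam, p = 3, 1.0, 1.0                                      # p: normalizing term, set to 1
A = 0.8 * np.linalg.qr(rng.standard_normal((n, n)))[0]       # ||A|| = 0.8
A_bar = 0.7 * np.linalg.qr(rng.standard_normal((n, n)))[0]   # ||A_bar|| = 0.7
M = solve_stein(A, A_bar, np.eye(n), lam)                    # W = identity

# Hypothetical sample of zero-mean initial conditions and its empirical covariance C.
X0 = rng.standard_normal((500, n))
X0 -= X0.mean(axis=0)
C = X0.T @ X0 / len(X0)

avg = np.mean([p * x @ M @ x for x in X0])   # sample average of p x^T M x
print(np.isclose(avg, p * np.trace(C @ M)))  # True: E[x^T M x] = tr(C M)
```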

Continuous Kernel


Contribution due to A:

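Since we assumed $a(t) = \xi(t) = 0$ we get

$$k((x, A), (\bar{x}, \bar{A})) = \lambda^{-1}\int_0^{\infty} e^{-\lambda t}\langle\exp(At)x, \exp(\bar{A}t)\bar{x}\rangle\,dt$$

The Final Form:

The kernel can be expressed as

$$k((x, A), (\bar{x}, \bar{A})) = \lambda^{-1}x^\top M\bar{x}$$

where

$$\left(A^\top - \frac{\lambda}{2}\mathbf{1}\right)M + M\left(\bar{A} - \frac{\lambda}{2}\mathbf{1}\right) = -W$$

The solution takes cubic time by solving the Sylvester equation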
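A sketch of the continuous-time computation, assuming NumPy: `solve_continuous` (a hypothetical name) solves the Sylvester equation above by dense Kronecker vectorization, and `continuous_kernel` applies $\lambda^{-1}x^\top M\bar{x}$. For diagonal $A = \mathrm{diag}(a)$ and $\bar{A} = \mathrm{diag}(b)$ the integral has the closed form $M_{ij} = W_{ij}/(\lambda - a_i - b_j)$, valid when $\lambda > a_i + b_j$, which gives an easy check.

```python
import numpy as np

def solve_continuous(A, A_bar, W, lam):
    """Solve (A^T - lam/2 1) M + M (A_bar - lam/2 1) = -W.

    Row-major vectorization: vec(P M) = kron(P, 1) vec(M), vec(M Q) = kron(1, Q^T) vec(M).
    """
    n, m = W.shape
    P = A.T - 0.5 * lam * np.eye(n)
    Q = A_bar - 0.5 * lam * np.eye(m)
    K = np.kron(P, np.eye(m)) + np.kron(np.eye(n), Q.T)
    return np.linalg.solve(K, -W.reshape(-1)).reshape(n, m)

def continuous_kernel(x, A, x_bar, A_bar, W, lam):
    """k((x, A), (x_bar, A_bar)) = lam^{-1} x^T M x_bar."""
    return float(x @ solve_continuous(A, A_bar, W, lam) @ x_bar) / lam

# Diagonal systems: M_ij = W_ij / (lam - a_i - b_j), here with lam = 1.
a, b = np.array([-0.3, 0.2, 0.1]), np.array([0.1, -1.0, 0.4])
W = np.array([[2.0, 1.0, 0.0], [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]])   # PSD weight
M = solve_continuous(np.diag(a), np.diag(b), W, lam=1.0)
print(np.allclose(M, W / (1.0 - a[:, None] - b[None, :])))   # True
```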

Special Cases


Snapshot:

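If we consider only the snapshot at time $T$

$$k((x, A), (\bar{x}, \bar{A})) = x^\top\exp(AT)^\top W\exp(\bar{A}T)\,\bar{x}$$

Initial Conditions:

Fix $A = \bar{A}$

If $A$ is symmetric and commutes with $W$ (e.g. $W = \mathbf{1}$), we just solve

$$M = \frac{1}{2}\left(\frac{\lambda}{2}\mathbf{1} - A\right)^{-1}W$$

Dynamical Systems:

Fix $x = \bar{x}$ to get $k(A, \bar{A}) = \lambda^{-1}\,\mathrm{tr}(MC)$

Here $C$ is the covariance matrix of the (zero-mean) initial conditions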
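The closed form for fixed $A = \bar{A}$ can be checked directly: with $A$ symmetric and $W = \mathbf{1}$, $M = \frac{1}{2}(\frac{\lambda}{2}\mathbf{1} - A)^{-1}W$ should satisfy the continuous Sylvester equation with $\bar{A} = A$. The sketch assumes NumPy; the random negative semidefinite $A$ and the diagonal covariance $C$ are illustrative assumptions.

```python
import numpy as np

lam, n = 1.0, 4
rng = np.random.default_rng(2)
G = rng.standard_normal((n, n))
A = -(G @ G.T)                     # symmetric, negative semidefinite
W = np.eye(n)                      # commutes with A

# Closed form for the "Initial Conditions" case (A_bar = A symmetric, W = 1).
M = 0.5 * np.linalg.solve(0.5 * lam * np.eye(n) - A, W)

# It should satisfy (A^T - lam/2 1) M + M (A - lam/2 1) = -W.
S = A - 0.5 * lam * np.eye(n)
print(np.allclose(S.T @ M + M @ S, -W))     # True

# "Dynamical Systems" case: lam^{-1} tr(M C) for an assumed covariance C.
C = np.diag([1.0, 2.0, 0.5, 0.1])
k_sys = np.trace(M @ C) / lam
```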

Graph Kernels


Graph Laplacian:

Let $E$ be the adjacency matrix and $D := \mathrm{diag}(E\mathbf{1})$

$L := E - D$ and $\tilde{L} := D^{-\frac{1}{2}}LD^{-\frac{1}{2}}$

Diffusion Process:

We can define a diffusion process by

$$\frac{d}{dt}x(t) = Lx(t)$$

Diffusion Kernel (Kondor and Lafferty, 2002):

If we measure the overlap at time $T$ we get

$$K = \exp(LT)^\top\exp(LT)$$

$K_{ij}$ is the probability that the same state $l$ is reached from $i$ and from $j$

Graph Kernels


Undirected Graphs (Kondor and Lafferty, 2002):

Here $L$ is symmetric and hence yields

$$K = \exp(2LT)$$
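A small illustration of the diffusion kernel, assuming NumPy and a hypothetical three-node path graph. Because $L$ is symmetric, its matrix exponential can be taken from an eigendecomposition (`sym_expm` is a helper introduced here, not a library function). The code checks that $\exp(LT)^\top\exp(LT) = \exp(2LT)$ and that each row of $\exp(LT)$ is a probability vector, which underlies the probabilistic reading of $K_{ij}$.

```python
import numpy as np

def sym_expm(S):
    """exp(S) for a symmetric matrix S via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * np.exp(vals)) @ vecs.T

# Hypothetical undirected path graph 0 - 1 - 2.
E = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
L = E - np.diag(E.sum(axis=1))     # L := E - D with D = diag(E 1)
T = 0.7

H = sym_expm(L * T)                # exp(L T); row i is the distribution reached from node i
K = H.T @ H                        # diffusion kernel exp(L T)^T exp(L T)
print(np.allclose(K, sym_expm(2 * L * T)))   # True for symmetric L
print(np.allclose(H.sum(axis=1), 1.0))       # True: rows are probability vectors
```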

Labeled Graphs (Gärtner, 2002):

If $W$ acts as an indicator for node labels, say $W_{ij} = 1$ if two nodes have the same label
For other fancy weights see (Kashima et al., 2003)

Averaged Graph Laplacian:

If we average over a range of T values

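$$K = \frac{1}{2}\left(\frac{\lambda}{2}\mathbf{1} - L\right)^{-1}$$

Since $L = E - D$ is negative semidefinite, $\frac{\lambda}{2}\mathbf{1} - L$ is positive definite and the inverse exists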
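To see where the averaged form comes from, integrate the undirected diffusion kernel against the weight $e^{-\lambda T}$: $\int_0^\infty e^{-\lambda T}\exp(2LT)\,dT = (\lambda\mathbf{1} - 2L)^{-1} = \frac{1}{2}(\frac{\lambda}{2}\mathbf{1} - L)^{-1}$. The sketch below, assuming NumPy, the same hypothetical path graph and $\lambda = 1$, compares the closed form with a trapezoidal approximation of that integral truncated at $T = 40$.

```python
import numpy as np

def sym_expm(S):
    """exp(S) for a symmetric matrix S via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * np.exp(vals)) @ vecs.T

E = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
L = E - np.diag(E.sum(axis=1))
lam = 1.0

# Closed form of the averaged kernel.
K = 0.5 * np.linalg.inv(0.5 * lam * np.eye(3) - L)

# Trapezoidal rule for int_0^40 e^{-lam T} exp(2 L T) dT (the tail beyond 40 is negligible).
Ts = np.linspace(0.0, 40.0, 8001)
samples = np.array([np.exp(-lam * t) * sym_expm(2 * L * t) for t in Ts])
h = Ts[1] - Ts[0]
K_quad = h * (samples.sum(axis=0) - 0.5 * (samples[0] + samples[-1]))
print(np.allclose(K, K_quad, atol=1e-4))   # True
```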

ARMA Models


ARMA Model:

An auto-regressive moving average model is

$$x(t+1) = Ax(t) + Bv(t)$$

$$y(t) = \phi(x(t)) + w(t)$$

$x(t)$ is a hidden variable; $v(t)$ and $w(t)$ are IID random noise

Linear Gaussian Model:

If $\phi$ is linear and the noise is white Gaussian:

$$x(t+1) = Ax(t) + v(t), \quad v(t) \sim \mathcal{N}(0, Q)$$

$$y(t) = Cx(t) + w(t), \quad w(t) \sim \mathcal{N}(0, R)$$

Fix the scaling by demanding that $C^\top C = \mathbf{1}$
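A sketch of sampling from the linear Gaussian model, assuming NumPy's `Generator.multivariate_normal` for the noise; `simulate_lds` and the particular $A$, $Q$, $R$ are illustrative assumptions. A matrix with orthonormal columns, taken from a reduced QR factorization, satisfies the scaling constraint $C^\top C = \mathbf{1}$.

```python
import numpy as np

def simulate_lds(A, C, Q, R, x1, tau, rng):
    """Sample x(1..tau), y(1..tau) from x(t+1) = A x(t) + v(t), v ~ N(0, Q),
    y(t) = C x(t) + w(t), w ~ N(0, R).  Returns X (n x tau) and Y (m x tau)."""
    n, m = A.shape[0], C.shape[0]
    x = np.asarray(x1, dtype=float)
    X, Y = [], []
    for _ in range(tau):
        X.append(x)
        Y.append(C @ x + rng.multivariate_normal(np.zeros(m), R))
        x = A @ x + rng.multivariate_normal(np.zeros(n), Q)
    return np.array(X).T, np.array(Y).T          # one column per time step

rng = np.random.default_rng(3)
n, m, tau = 2, 6, 50
C, _ = np.linalg.qr(rng.standard_normal((m, n)))  # orthonormal columns: C^T C = 1
A = np.array([[0.9, -0.2], [0.2, 0.9]])
X, Y = simulate_lds(A, C, 0.01 * np.eye(n), 0.01 * np.eye(m), [1.0, 0.0], tau, rng)
```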

Dynamic Textures


Image Model:

$y(t) \in \mathbb{R}^m$ are the observed noisy images
$x(t) \in \mathbb{R}^n$ ($n < m$) are hidden variables

Modeling:

A sequence of images $\{y(1), \ldots, y(\tau)\}$ is observed
Ideally we want to solve

$$A(\tau), C(\tau), Q(\tau), R(\tau) = \arg\max_{A, C, Q, R}\; p(y(1), \ldots, y(\tau))$$

Exact Solution:

n4sid in MATLAB solves the above problem
It does not scale well if $m$ is large
Impractical for images, where $m \sim 10^5$

Approximate Solution


Problem To Solve:

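For any variable $z(t)$ define $Z_i^{\tau} := [z(i), \ldots, z(\tau)]$

We are solving

$$Y_1^{\tau} = CX_1^{\tau} + W_1^{\tau} \quad \text{with } C^\top C = \mathbf{1}$$

Here $W_1^{\tau}$ collects the observation noise $w(t)$ (not the kernel weight matrix $W$)

Solving By SVD:

Solving for $\arg\min_{C, X_1^{\tau}}\|W_1^{\tau}\|$ yields (keeping the $n$ leading singular values)

$$C(\tau) = U \quad \text{and} \quad X(\tau) = \Sigma V^\top \quad \text{where } Y_1^{\tau} = U\Sigma V^\top$$

Solving for $\arg\min_A\|X_2^{\tau} - AX_1^{\tau-1}\|$ yields

$$A(\tau) = \Sigma V^\top D_1 V\left(V^\top D_2 V\right)^{-1}\Sigma^{-1}$$

Here

$$D_1 = \begin{bmatrix} 0 & 0 \\ \mathbf{1}_{\tau-1} & 0 \end{bmatrix} \quad \text{and} \quad D_2 = \begin{bmatrix} \mathbf{1}_{\tau-1} & 0 \\ 0 & 0 \end{bmatrix}$$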
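The SVD-based estimate translates almost line by line into NumPy. In this sketch (`fit_dynamic_texture` is a hypothetical name) the SVD is truncated to the $n$ leading components, $D_1$ is the subdiagonal shift matrix and $D_2$ keeps the first $\tau - 1$ columns, so $A(\tau)$ coincides with the least-squares solution $X_2^{\tau}(X_1^{\tau-1})^\top\left(X_1^{\tau-1}(X_1^{\tau-1})^\top\right)^{-1}$.

```python
import numpy as np

def fit_dynamic_texture(Y, n):
    """Rank-n SVD estimate of C, X and A from Y = [y(1), ..., y(tau)] (m x tau)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    U, s, V = U[:, :n], s[:n], Vt[:n].T          # keep the n leading components
    S = np.diag(s)
    C = U                                         # C(tau) = U, hence C^T C = 1
    X = S @ V.T                                   # X(tau) = Sigma V^T  (n x tau)
    tau = Y.shape[1]
    D1 = np.eye(tau, k=-1)                        # [[0, 0], [1_(tau-1), 0]]
    D2 = np.diag(np.r_[np.ones(tau - 1), 0.0])    # [[1_(tau-1), 0], [0, 0]]
    A = S @ V.T @ D1 @ V @ np.linalg.inv(V.T @ D2 @ V) @ np.diag(1.0 / s)
    return C, X, A

# Hypothetical usage: Y holds vectorized frames as columns (m pixels x tau frames).
# C, X, A = fit_dynamic_texture(Y, n=10)
```

On noise-free data generated by a known transition matrix, the estimate has the same eigenvalues as the true one: the state is recovered only up to an invertible change of basis.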

Dynamic Texture Kernel


Kernel Definition:

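Estimate the models, then compute kernels between them. If we average out the noise, then for some $W \succeq 0$

$$k((x_0, A, C), (x_0', A', C')) := \mathbf{E}_{v,w}\left[\sum_{t=1}^{\infty} e^{-\lambda t}y_t^\top W y_t'\right]$$

Kernel Computation:

The kernel can be computed as

$$k = x_0^\top M x_0' + \left(e^{\lambda} - 1\right)^{-1}\mathrm{tr}\left[Q\tilde{M} + WR\right]$$

The matrices $M$ and $\tilde{M}$ satisfy

$$M = e^{-\lambda}A^\top C^\top WC'A' + e^{-\lambda}A^\top MA'$$

$$\tilde{M} = C^\top WC' + e^{-\lambda}A^\top\tilde{M}A'$$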

Experimental Setup


Typical Textures:

Some sample textures

A long clip was cut into shorter clips of 120 frames each

Freak Textures:

We also collected some freak textures

Results


Kernel Induced Metric:

Clips closer on an axis are from the same master clip
We plot the kernel-induced metric for λ = 0.9 and 0.1

Results are fairly independent of the choice of λ

Notice the block-diagonal structure of the metric matrix
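The kernel-induced metric is $d(a, b) = \sqrt{k(a, a) + k(b, b) - 2k(a, b)}$. A minimal NumPy sketch that turns a Gram matrix of clip kernels into the metric matrix follows; the function name is hypothetical, and small negative values caused by rounding are clipped before the square root.

```python
import numpy as np

def kernel_metric(K):
    """Pairwise d(i, j) = sqrt(K_ii + K_jj - 2 K_ij) from a Gram matrix K."""
    diag = np.diag(K)
    D2 = diag[:, None] + diag[None, :] - 2.0 * K
    return np.sqrt(np.maximum(D2, 0.0))   # clip tiny negative rounding errors

# With the linear kernel K = X X^T the metric is the ordinary Euclidean distance.
X = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]])
print(kernel_metric(X @ X.T)[0, 1])   # 5.0
```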

Conclusion


A new method to embed dynamical systems

Analytical solutions for linear systems

Many graph kernels are special cases

Analytical solutions require cubic time

Are better solutions possible for special cases?

Extensions to nonlinear systems?

Application to dynamical textures

Works with approximate model parameters

Picks out clips from the same master clip

Close relations to rational kernels of Cortes et al.

More information at http://mlg.anu.edu.au/~vishy



Questions?
