Kernels for Dynamic Textures (Purdue University, vishy/talks/Dynamic.pdf, 2009-08-22)
S.V.N. Vishwanathan: Kernels for Dynamic Textures, Page 1
Kernels for Dynamic Textures
S.V.N. Vishwanathan
http://web.anu.edu.au/~vishy
National ICT Australia and Australian National University
Joint work with Alex Smola and René Vidal
Roadmap
Introduction to Kernel Methods
Why kernels?
Kernels on Dynamical Systems
Trajectories, Noise Models
Computation
Dynamical Textures
ARMA Models
Approximate Solutions
Kernel Computation
Experiments
Outlook and Conclusion
Classification
Data:
Pairs of observations $(x_i, y_i)$
Underlying distribution P(x, y)
Examples (blood status, cancer), (transactions, fraud)
Task:
Find a function $f(x)$ that predicts $y$ given $x$
The function $f(x)$ must generalize well
Optimal Separating Hyperplane
Minimize $\frac{1}{2}\|w\|^2$ subject to $y_i(\langle w, x_i \rangle + b) \ge 1$ for all $i$
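A minimal sketch, assuming NumPy and a hypothetical toy data set, that checks the constraints above for a candidate $(w, b)$ and reports the margin width $2/\|w\|$. Among feasible hyperplanes a smaller $\|w\|$ means a wider margin, which is why the objective minimizes $\frac{1}{2}\|w\|^2$. It does not solve the quadratic program itself; `check_hyperplane` is a hypothetical helper.

```python
import numpy as np

def check_hyperplane(w, b, X, y):
    """Return (feasible, margin_width) for a candidate hyperplane (w, b).

    feasible: whether y_i (<w, x_i> + b) >= 1 holds for every pair.
    margin_width: the distance 2 / ||w|| between the two supporting hyperplanes.
    """
    w = np.asarray(w, dtype=float)
    functional = y * (X @ w + b)  # y_i (<w, x_i> + b), one entry per pair
    feasible = bool(np.all(functional >= 1.0))
    margin_width = float(2.0 / np.linalg.norm(w))
    return feasible, margin_width

# Hypothetical toy data: two points per class, labels +1 / -1
X = np.array([[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

print(check_hyperplane([1.0, 0.0], 0.0, X, y))   # (True, 2.0)
print(check_hyperplane([0.5, 0.0], 0.0, X, y))   # (True, 4.0)
print(check_hyperplane([0.25, 0.0], 0.0, X, y))  # (False, 8.0)
```

Shrinking $w$ widens the margin until the constraints for the closest points become active; shrinking further violates them.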
Kernels and Nonlinearity
Problem: Linear functions are often too simple to provide good estimators
Idea 1: Map to a higher-dimensional feature space via $\Phi : x \mapsto \Phi(x)$ and solve the problem there. Replace every $\langle x, x' \rangle$ by $\langle \Phi(x), \Phi(x') \rangle$
Idea 2: Instead of computing $\Phi(x)$ explicitly, use a kernel function $k(x, x') := \langle \Phi(x), \Phi(x') \rangle$. A large class of functions is admissible as kernels
Non-vectorial data can be handled if we can compute a meaningful $k(x, x')$
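To make Idea 2 concrete, here is a small sketch (NumPy assumed) using the homogeneous polynomial kernel $k(x, x') = \langle x, x' \rangle^2$ on $\mathbb{R}^2$, a standard textbook example rather than one taken from these slides. Its explicit feature map is $\Phi(x) = (x_1^2, x_2^2, \sqrt{2}\,x_1 x_2)$, and the kernel yields the same dot product without ever forming $\Phi(x)$.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map on R^2: (x1^2, x2^2, sqrt(2) x1 x2)."""
    x1, x2 = x
    return np.array([x1 * x1, x2 * x2, np.sqrt(2.0) * x1 * x2])

def k_poly2(x, xp):
    """Homogeneous polynomial kernel <x, x'>^2, computed in the input space."""
    return float(np.dot(x, xp)) ** 2

x = np.array([1.0, 2.0])
xp = np.array([3.0, -1.0])
lhs = float(np.dot(phi(x), phi(xp)))  # dot product in feature space
rhs = k_poly2(x, xp)                  # kernel evaluation, no feature map
print(np.isclose(lhs, rhs))           # True
```

For higher degrees or input dimensions the feature space grows combinatorially, while the kernel evaluation remains a single dot product followed by a power.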
Roadmap
Introduction to Kernel Methods
Why kernels?
Kernels on Dynamical Systems
Trajectories, Noise Models
Computation
Dynamical Textures
ARMA Models
Approximate Solutions
Kernel Computation
Experiments
Outlook and Conclusion
The Basic Idea
Key Observation:
Trajectories are easily observable
Similar trajectories ⇒ similar systems
Restrict attention to interesting cases
Average over noise models
Kernels Using Dynamical Systems:
Simulate the system for both inputs
Similar time evolution ⇒ similar inputs
Kernels on Dynamical Systems:
Restrict to interesting initial conditions
Simulate both systems
Similar time evolution ⇒ similar systems
Notation
$\mathcal{X}$ - state space (Hilbert space)
$\mathcal{A}$ - set of time evolution operators
$\mathcal{T}$ - domain of measurement times
$\mu$ - nice probability measure on $\mathcal{T}$
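Discounting Factors: for some $\lambda > 0$
$\mu(t) = \lambda^{-1} e^{-\lambda t}$ for $\mathcal{T} = \mathbb{R}_0^+$
$\mu(t) = \frac{e^{-\lambda t}}{1 - e^{-\lambda}}$ for $\mathcal{T} = \mathbb{N}_0$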
Time Evolution: we study
$x_A(t) := A(t)x$ for $A \in \mathcal{A}$
Trajectories and Kernels
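Comparing Trajectories: using the dot product on $\mathcal{X}$, we define a dot product on $\mathcal{X}^{\mathcal{T}}$
$\langle \theta, \theta' \rangle := \mathbb{E}_{\mu}\left[\langle \theta(t), \theta'(t) \rangle\right]$ for $\theta, \theta' \in \mathcal{X}^{\mathcal{T}}$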
Extending to Dynamical Systems: identify a dynamical system with its trajectory and define
$k((x, A), (x', A')) := \mathbb{E}_{\mu}\left[\langle A(t)x, A'(t)x' \rangle\right]$
Other Ideas:
A nicely decaying measure is required for convergence
Modify the dot product in $\mathcal{X}$
Covariance matrices?
Rational kernels and transducers
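The trajectory kernel can also be approximated by brute force for any system that can be simulated, which is one way to read "similar time evolution ⇒ similar systems". This is a rough sketch under several assumptions: discrete time $\mathcal{T} = \mathbb{N}_0$, a weighted dot product $\langle a, b \rangle = a^\top W b$, plain discount weights $e^{-\lambda t}$ (the normalization of $\mu$ is dropped, which only rescales the kernel), and a hypothetical truncation `horizon`. `trajectory_kernel`, `step` and `step_p` are hypothetical names; the one-step maps may be nonlinear.

```python
import numpy as np

def trajectory_kernel(step, x0, step_p, x0_p, lam, W=None, horizon=500):
    """Truncated sum_{t < horizon} exp(-lam t) <x(t), W x'(t)> for two
    trajectories x(t+1) = step(x(t)) and x'(t+1) = step_p(x'(t))."""
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    xp = np.atleast_1d(np.asarray(x0_p, dtype=float))
    if W is None:
        W = np.eye(x.size)  # plain Euclidean dot product
    total = 0.0
    for t in range(horizon):
        total += np.exp(-lam * t) * float(x @ W @ xp)
        x, xp = step(x), step_p(xp)
    return total

# Scalar linear systems x(t+1) = 0.9 x(t) and x'(t+1) = 0.8 x'(t):
# the series is geometric, sum_t exp(-lam t) (0.72)^t x0 x0'.
k = trajectory_kernel(lambda v: 0.9 * v, [1.0], lambda v: 0.8 * v, [2.0], lam=0.5)
print(np.isclose(k, 2.0 / (1.0 - 0.72 * np.exp(-0.5))))  # True

# A nonlinear system is handled the same way
k_nl = trajectory_kernel(np.tanh, [0.5, -1.0], lambda v: 0.7 * v, [1.0, 1.0], lam=0.5)
```

For linear systems the closed forms on the following slides avoid both the simulation and the truncation error.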
Special Cases
Kernels on Dynamical Systems:
Restrict attention to $x = x'$
Compare trajectories for identical initial conditions
Take the expectation if interested in a range of $x$
$k(A, A') := \mathbb{E}_x\left[k((x, A), (x, A'))\right]$
More generally
$k(A, A') := \mathbb{E}_A \mathbb{E}_{A'} \mathbb{E}_x\left[k((x, A), (x, A'))\right]$
Kernels Using Dynamical Systems:
Restrict attention to a particular dynamical system
As before, we can take expectations over $A$
$k(x, x') := \mathbb{E}_x \mathbb{E}_{x'} \mathbb{E}_A\left[k((x, A), (x', A))\right]$
Discrete Linear Systems
Linear Systems:
We assume time propagation occurs as
$x_A(t+1) = A x_A(t) + a_t + \xi_t$
In closed form
$x_A(t) = A^t x_0 + \sum_{i=0}^{t} \left(A^{t-i}\xi_i + A^{t-i}a_i\right)$
To avoid messy math, assume $a_t = 0$ and hence
$x_A(t) = A^t x_0 + \sum_{i=0}^{t} A^{t-i}\xi_i$
The kernel receives contributions from $A$ as well as from the noise
Continuous Linear Systems
Linear Systems:
System dynamics here are described by
$\frac{d}{dt}x_A(t) = A x_A(t) + a(t) + \xi(t)$
Here $\xi(t)$ with $\mathbb{E}[\xi(t)] = 0$ is a stochastic process and
$x_A(t) = \exp(At)\,x_0 + \int_0^t \exp(A(t - \tau))\left(a(\tau) + \xi(\tau)\right)d\tau$
As before, we assume $a(t) = 0$
We even assume $\xi(\tau) = 0$ (avoids messy math again!)
$x_A(t) = \exp(At)\,x_0$
The kernel contribution is due only to $A$
Convergence Criterion
Discrete Case:
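Let $A$, $B$ and $W$ be linear operators
The matrix norms obey $0 \le \|A\|, \|B\| \le \Lambda$
For suitable $\lambda$ with $e^{\lambda} > \Lambda^2$ and $W \succeq 0$
$M := \sum_{t=0}^{\infty} e^{-\lambda t} A^t W B^t$
Sylvester equation: $e^{-\lambda} A M B + W = M$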
Continuous Case: we define
$M := \int_0^{\infty} e^{-\lambda t} \exp(At)^\top W \exp(Bt)\,dt$
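Sylvester equation (differentiate $e^{-\lambda t}\exp(A^\top t)\,W \exp(Bt)$ and integrate over $[0, \infty)$): $(A^\top - \frac{\lambda}{2}\mathbf{1})M + M(B - \frac{\lambda}{2}\mathbf{1}) = -W$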
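The discrete-time $M$ can be computed without summing the series term by term. The sketch below, assuming NumPy, uses Smith's doubling iteration, a standard technique swapped in here for illustration (the slides do not say which solver they use). With $\hat{A} = e^{-\lambda/2}A$ and $\hat{B} = e^{-\lambda/2}B$ the series becomes $\sum_t \hat{A}^t W \hat{B}^t$, and each step doubles the number of summed terms, so roughly $\log_2$ of the needed number of terms many $O(n^3)$ matrix products suffice. `discounted_stein` is a hypothetical name.

```python
import numpy as np

def discounted_stein(A, B, W, lam, tol=1e-14, max_iter=64):
    """Return M = sum_{t>=0} exp(-lam t) A^t W B^t, i.e. the solution of
    exp(-lam) A M B + W = M, by Smith's doubling iteration.

    Assumes ||A||, ||B|| <= Lambda with exp(lam) > Lambda**2, so the series converges.
    """
    s = np.exp(-lam / 2.0)
    Ak = s * np.asarray(A, dtype=float)  # holds (e^{-lam/2} A)^(2^k)
    Bk = s * np.asarray(B, dtype=float)  # holds (e^{-lam/2} B)^(2^k)
    M = np.array(W, dtype=float)         # sum of the first 2^k terms
    for _ in range(max_iter):
        update = Ak @ M @ Bk             # terms 2^k ... 2^(k+1) - 1
        M = M + update
        if np.max(np.abs(update)) <= tol * max(1.0, np.max(np.abs(M))):
            break
        Ak, Bk = Ak @ Ak, Bk @ Bk
    return M

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
A /= np.linalg.norm(A, 2)  # spectral norm 1, so exp(0.5) > 1 = Lambda^2
B = rng.standard_normal((4, 4))
B /= np.linalg.norm(B, 2)
W = np.eye(4)
M = discounted_stein(A, B, W, lam=0.5)
print(np.allclose(np.exp(-0.5) * A @ M @ B + W, M))  # True
```

Library routines such as SciPy's discrete Lyapunov solver cover the symmetric case $B = A^\top$; the doubling loop above handles $A \neq B$ with nothing beyond matrix products.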
Gory Details
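Contribution due to $A$:
$p \sum_{t=0}^{\infty} e^{-\lambda t} \langle A^t x, A'^t x' \rangle := p\, x^\top \left[\sum_{t=0}^{\infty} e^{-\lambda t} (A^t)^\top W A'^t\right] x' = p\, x^\top M x'$
Contribution due to noise:
$\mathbb{E}_{\xi}\left[p \sum_{t=0}^{\infty} \sum_{j,j'=0}^{t} e^{-\lambda t} \langle A^{t-j}\xi_j, A'^{t-j'}\xi_{j'} \rangle\right] = p\, \mathrm{tr}\left(C_\xi \left[\sum_{t=0}^{\infty} e^{-\lambda t} (A^t)^\top M A'^t\right]\right) := p\, \mathrm{tr}(C_\xi \bar{M})$
In the above equations, $p$ is a normalizing term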
Delving Deeper
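More on $M$ and $\bar{M}$:
The matrices $M$ and $\bar{M}$ look like
$M := \sum_{t=0}^{\infty} e^{-\lambda t} (A^t)^\top W A'^t$ and $\bar{M} := \sum_{t=0}^{\infty} e^{-\lambda t} (A^t)^\top M A'^t$
Sylvester Equation:
Both $M$ and $\bar{M}$ satisfy a Sylvester equation
$e^{-\lambda} A^\top M A' + W = M$ and $e^{-\lambda} A^\top \bar{M} A' + M = \bar{M}$
Both can be solved in cubic time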
Discrete Kernel
Discrete Case:
Putting it all together
$k((A, x), (A', x')) = p\left[x^\top M x' + \mathrm{tr}(C_\xi \bar{M})\right]$
Note that $C_\xi$ is the covariance matrix of $\xi_t$
Different noise models can be assumed for each time step
Initial Conditions:
Let $C$ be the covariance matrix of the initial conditions. If we set $x = x'$ and average over $x$, then
$k(A, A') = p\left[\mathrm{tr}(CM) + \mathrm{tr}(C_\xi \bar{M})\right]$
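Averaging over initial conditions turns the quadratic form $x^\top M x$ into a trace. The sketch below (NumPy assumed; function names hypothetical) checks $\mathbb{E}_x[x^\top M x] = \mathrm{tr}(CM)$ on a finite sample of initial conditions, taking $C$ as the uncentred second-moment matrix. This coincides with the covariance matrix on the slide only when the initial conditions have zero mean, which is an assumption here. The noise term $\mathrm{tr}(C_\xi \bar{M})$ is left out.

```python
import numpy as np

def averaged_quadratic(M, samples):
    """Empirical mean of x^T M x over initial conditions given as rows."""
    X = np.asarray(samples, dtype=float)
    return float(np.mean(np.einsum("ti,ij,tj->t", X, M, X)))

def trace_form(M, samples):
    """The same average written as tr(C M), with C = (1/N) sum_t x_t x_t^T."""
    X = np.asarray(samples, dtype=float)
    C = X.T @ X / X.shape[0]
    return float(np.trace(C @ M))

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))           # any square matrix, e.g. from a Sylvester solve
samples = rng.standard_normal((1000, 3))  # hypothetical zero-mean initial conditions
print(np.isclose(averaged_quadratic(M, samples), trace_form(M, samples)))  # True
```

The identity is exact for any sample, since $x^\top M x = \mathrm{tr}(M x x^\top)$ and the trace is linear.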
Continuous Kernel
Contribution due to $A$:
Since we assumed $a(t) = \xi(t) = 0$, we get
$k((x, A), (x', A')) = \lambda^{-1} \int_0^{\infty} e^{-\lambda t} \langle \exp(At)x, \exp(A't)x' \rangle\,dt$
The Final Form:
The kernel can be expressed as
$k((x, A), (x', A')) = \lambda^{-1} x^\top M x'$
where
$(A^\top - \frac{\lambda}{2}\mathbf{1})M + M(A' - \frac{\lambda}{2}\mathbf{1}) = -W$
$M$ is obtained in cubic time by solving this Sylvester equation
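For small state dimensions the continuous-time $M$ can be obtained by vectorizing the Sylvester equation. This sketch (NumPy assumed; function names hypothetical) uses the equation that follows from differentiating $e^{-\lambda t}\exp(A^\top t)\,W \exp(A' t)$ and integrating over $[0, \infty)$, namely $(A^\top - \frac{\lambda}{2}\mathbf{1})M + M(A' - \frac{\lambda}{2}\mathbf{1}) = -W$. The integral converges only if $\lambda$ exceeds the sum of the largest real parts of the eigenvalues of $A$ and $A'$. The Kronecker formulation costs $O(n^6)$; a Bartels-Stewart solver such as SciPy's `solve_sylvester` is what achieves cubic cost.

```python
import numpy as np

def continuous_M(A, Ap, W, lam):
    """M = int_0^inf exp(-lam t) expm(A t)^T W expm(Ap t) dt, found by solving
    (A^T - lam/2 I) M + M (Ap - lam/2 I) = -W via Kronecker vectorisation."""
    n = A.shape[0]
    I = np.eye(n)
    P = A.T - 0.5 * lam * I
    Q = Ap - 0.5 * lam * I
    # Column-major vec: vec(P M) = (I kron P) vec(M), vec(M Q) = (Q^T kron I) vec(M)
    K = np.kron(I, P) + np.kron(Q.T, I)
    vec_M = np.linalg.solve(K, -W.flatten(order="F"))
    return vec_M.reshape((n, n), order="F")

def continuous_kernel(x, A, xp, Ap, W, lam):
    """k((x, A), (x', A')) = lam^{-1} x^T M x'."""
    return float(x @ continuous_M(A, Ap, W, lam) @ xp) / lam

# Diagonal check: for A = diag(a), A' = diag(b), M_ij = W_ij / (lam - a_i - b_j)
a, b = np.array([-1.0, 0.5]), np.array([0.2, -0.3])
W = np.array([[1.0, 2.0], [3.0, 4.0]])
M = continuous_M(np.diag(a), np.diag(b), W, lam=2.0)
print(np.allclose(M, W / (2.0 - a[:, None] - b[None, :])))  # True
```

The diagonal case is a convenient sanity check because each entry of $M$ is then a scalar integral $\int_0^\infty e^{-(\lambda - a_i - b_j)t}\,dt$.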
Special Cases
Snapshot:
If we consider only the snapshot at time instance $T$
$k((x, A), (x', A')) = x^\top \exp(AT)^\top W \exp(A'T)\, x'$
Initial Conditions:
Fix $A = A'$
If $A$ is symmetric and commutes with $W$, we just solve
$M = -\frac{1}{2}\left(A - \frac{\lambda}{2}\mathbf{1}\right)^{-1} W$
Dynamical Systems:
Fix $x = x'$ and average over $x$ to get $k(A, A') = \lambda^{-1}\,\mathrm{tr}(MC)$
Here $C$ is the covariance matrix of the initial conditions
Graph Kernels
Graph Laplacian:
Let $E$ be the adjacency matrix and $D := \mathrm{diag}(E\mathbf{1})$
$L := E - D$ and $\tilde{L} := D^{-1/2} L D^{-1/2}$
Diffusion Process:
We can define a diffusion process by
$\frac{d}{dt}x(t) = L\,x(t)$
Diffusion Kernel (Kondor and Lafferty, 2002):
If we measure the overlap at time instance $T$, we get
$K = \exp(LT)^\top \exp(LT)$
$K_{ij}$ is the probability that diffusions started at $i$ and at $j$ reach the same state $l$
Graph Kernels
Undirected Graphs (Kondor and Lafferty, 2002):
Here $L$ is symmetric, which yields
$K = \exp(2LT)$
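For an undirected graph the diffusion kernel needs only one symmetric eigendecomposition. A minimal sketch assuming NumPy and a hypothetical three-node path graph; the helper names are hypothetical. Since $L\mathbf{1} = 0$ and the off-diagonal entries of $L$ are non-negative, each row of $\exp(LT)$ is a probability distribution over states, which matches the probabilistic reading of $K_{ij}$.

```python
import numpy as np

def sym_expm(S, t):
    """exp(S t) for a symmetric matrix S via S = V diag(w) V^T."""
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w * t)) @ V.T

def diffusion_kernel(E, T):
    """K = exp(L T)^T exp(L T) with L = E - D, D = diag(E 1), for a
    symmetric adjacency matrix E."""
    E = np.asarray(E, dtype=float)
    L = E - np.diag(E.sum(axis=1))
    H = sym_expm(L, T)
    return H.T @ H

# Hypothetical path graph 0 - 1 - 2
E = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
L = E - np.diag(E.sum(axis=1))
K = diffusion_kernel(E, T=0.7)
print(np.allclose(K, sym_expm(L, 2 * 0.7)))  # True: K = exp(2 L T)
print(np.allclose(K.sum(axis=1), 1.0))       # True: rows are distributions
```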
Labeled Graphs (Gärtner, 2002):
$W$ acts as an indicator for node labels, e.g. $W_{ij} = 1$ if two nodes have the same label
For other fancy weights see (Kashima et al., 2003)
Averaged Graph Laplacian:
If we average over a range of $T$ values
$K = \frac{1}{2}\left(\frac{\lambda}{2}\mathbf{1} - L\right)^{-1}$
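Averaging the undirected diffusion kernel $\exp(2LT)$ over $T$ with weight $e^{-\lambda T}$ gives, eigenvalue by eigenvalue, $\int_0^\infty e^{-\lambda T} e^{2\mu_k T}\,dT = 1/(\lambda - 2\mu_k)$, that is $(\lambda\mathbf{1} - 2L)^{-1} = \frac{1}{2}(\frac{\lambda}{2}\mathbf{1} - L)^{-1}$ (the $\lambda^{-1}$ factor of $\mu$ is dropped). Because $L = E - D$ is negative semidefinite, this matrix is positive definite for every $\lambda > 0$. A short sketch assuming NumPy, with hypothetical function names, compares the two forms:

```python
import numpy as np

def laplacian(E):
    """L = E - D with D = diag(E 1)."""
    E = np.asarray(E, dtype=float)
    return E - np.diag(E.sum(axis=1))

def averaged_diffusion_kernel(E, lam):
    """K = 1/2 (lam/2 I - L)^{-1} = int_0^inf exp(-lam T) exp(2 L T) dT."""
    L = laplacian(E)
    return 0.5 * np.linalg.inv(0.5 * lam * np.eye(L.shape[0]) - L)

def averaged_via_spectrum(E, lam):
    """Same integral evaluated per eigenvalue mu_k of L: 1 / (lam - 2 mu_k)."""
    mu, V = np.linalg.eigh(laplacian(E))
    return (V / (lam - 2.0 * mu)) @ V.T

E = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
K = averaged_diffusion_kernel(E, lam=0.5)
print(np.allclose(K, averaged_via_spectrum(E, lam=0.5)))  # True
print(np.all(np.linalg.eigvalsh(K) > 0))                  # True: positive definite
```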
Roadmap
Introduction to Kernel Methods
Why kernels?
Kernels on Dynamical Systems
Trajectories, Noise Models
Computation
Dynamical Textures
ARMA Models
Approximate Solutions
Kernel Computation
Experiments
Outlook and Conclusion
ARMA Models
ARMA Model:
An auto-regressive moving average model is
x(t + 1) = Ax(t) + B v(t)
y(t) = φ(x(t)) + w(t)
$x(t)$ is a hidden variable
$v(t)$ and $w(t)$ are IID random noise
Linear Gaussian Model:
If φ is linear and the noise is white Gaussian:
$x(t+1) = A x(t) + v(t)$, $v(t) \sim \mathcal{N}(0, Q)$
$y(t) = C x(t) + w(t)$, $w(t) \sim \mathcal{N}(0, R)$
Fix the scaling by demanding that $C^\top C = \mathbf{1}$
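To experiment with the estimation and kernel steps, one can draw synthetic sequences from the linear Gaussian model. A minimal sketch assuming NumPy's `Generator` API; `simulate_lds` is a hypothetical helper, the first column of `X` is taken to be $x(1) = x_0$, the dimensions are small hypothetical values (real images have $m \sim 10^5$), and $C$ with $C^\top C = \mathbf{1}$ is obtained from a reduced QR factorization.

```python
import numpy as np

def simulate_lds(A, C, Q, R, x0, tau, rng):
    """Sample x(t+1) = A x(t) + v(t), y(t) = C x(t) + w(t) for t = 1..tau,
    with v(t) ~ N(0, Q) and w(t) ~ N(0, R). Returns (X, Y) column-wise."""
    n, m = A.shape[0], C.shape[0]
    X = np.empty((n, tau))
    Y = np.empty((m, tau))
    x = np.asarray(x0, dtype=float)
    for t in range(tau):
        X[:, t] = x
        Y[:, t] = C @ x + rng.multivariate_normal(np.zeros(m), R)
        x = A @ x + rng.multivariate_normal(np.zeros(n), Q)
    return X, Y

rng = np.random.default_rng(0)
n, m, tau = 3, 20, 120
C, _ = np.linalg.qr(rng.standard_normal((m, n)))        # m x n, C^T C = I
A = 0.9 * np.linalg.qr(rng.standard_normal((n, n)))[0]  # spectral norm 0.9
X, Y = simulate_lds(A, C, 0.01 * np.eye(n), 0.01 * np.eye(m), rng.standard_normal(n), tau, rng)
```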
Dynamic Textures
Image Model:
$y(t) \in \mathbb{R}^m$ are the observed noisy images
$x(t) \in \mathbb{R}^n$ ($n < m$) are hidden variables
Modeling:
A sequence of images $\{y(1), \ldots, y(\tau)\}$ is observed
Ideally we want to solve
$A(\tau), C(\tau), Q(\tau), R(\tau) = \arg\max_{A, C, Q, R}\, p(y(1), \ldots, y(\tau))$
Exact Solution:
n4sid in MATLAB solves the above problem
It does not scale well if $m$ is large
Impractical for images, where $m \sim 10^5$
Approximate Solution
Problem To Solve:
For any variable $z(t)$, define $Z_i^{\tau} := [z(i), \ldots, z(\tau)]$
We are solving
$Y_1^{\tau} = C X_1^{\tau} + W_1^{\tau}$ with $C^\top C = \mathbf{1}$
Solving By SVD:
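Solving for $\arg\min_{C, X_1^{\tau}} \|W_1^{\tau}\|$ yields
$C(\tau) = U$ and $X(\tau) = \Sigma V^\top$, where $Y_1^{\tau} = U \Sigma V^\top$ is the SVD truncated to the $n$ largest singular values
Solving for $\arg\min_A \|X_2^{\tau} - A X_1^{\tau-1}\|$ yields
$A(\tau) = \Sigma V^\top D_1 V (V^\top D_2 V)^{-1} \Sigma^{-1}$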
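Here $D_1 = \begin{bmatrix} 0 & 0 \\ \mathbf{1}_{\tau-1} & 0 \end{bmatrix}$ and $D_2 = \begin{bmatrix} \mathbf{1}_{\tau-1} & 0 \\ 0 & 0 \end{bmatrix}$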
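The closed-form estimate above translates almost line by line into NumPy. This is a sketch under stated assumptions, not the authors' code: `fit_dynamic_texture` is a hypothetical name, the SVD is truncated to the first $n$ singular values, frames are stored as the columns of $Y_1^\tau$, and $D_1$, $D_2$ are built as a shift matrix and a selection matrix. As a cross-check, the formula for $A(\tau)$ agrees with ordinary least squares of $X_2^\tau$ on $X_1^{\tau-1}$.

```python
import numpy as np

def fit_dynamic_texture(Y, n):
    """Estimate (A, C, X) from frames Y (m x tau, one frame per column)
    using a rank-n SVD Y ~ U Sigma V^T."""
    tau = Y.shape[1]
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    U, s, V = U[:, :n], s[:n], Vt[:n, :].T       # keep the n leading components
    C = U                                        # C^T C = I by construction
    X = np.diag(s) @ V.T                         # X = Sigma V^T, one state per column
    D1 = np.eye(tau, k=-1)                       # [[0, 0], [1_{tau-1}, 0]]: shift
    D2 = np.diag(np.r_[np.ones(tau - 1), 0.0])   # [[1_{tau-1}, 0], [0, 0]]: select
    A = np.diag(s) @ V.T @ D1 @ V @ np.linalg.inv(V.T @ D2 @ V) @ np.diag(1.0 / s)
    return A, C, X

rng = np.random.default_rng(0)
Y = rng.standard_normal((10, 40))  # hypothetical frames: m = 10 pixels, tau = 40
A_hat, C_hat, X_hat = fit_dynamic_texture(Y, n=3)
A_ls = X_hat[:, 1:] @ np.linalg.pinv(X_hat[:, :-1])  # least-squares cross-check
print(np.allclose(A_hat, A_ls))  # True
```

Only a thin SVD of the $m \times \tau$ frame matrix is needed, which is what makes this approach usable when $m$ is large.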
Dynamic Texture Kernel
Kernel Definition:
Estimate the models and compute kernels between them
If we average out the noise, then for some $W \succeq 0$
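$k((x_0, A, C), (x_0', A', C')) := \mathbb{E}_{v,w}\left[\sum_{t=1}^{\infty} e^{-\lambda t}\, y_t^\top W y_t'\right]$
Kernel Computation:
The kernel can be computed as
$k = x_0^\top M x_0' + (e^{\lambda} - 1)^{-1}\,\mathrm{tr}\left[Q\bar{M} + WR\right]$
The matrices $M$ and $\bar{M}$ satisfy
$M = e^{-\lambda} A^\top C^\top W C' A' + e^{-\lambda} A^\top M A'$
$\bar{M} = C^\top W C' + e^{-\lambda} A^\top \bar{M} A'$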
Experimental Setup
Typical Textures:
Some sample textures
A long clip was cut to shorter clips of 120 frames each
Freak Textures:
We also collected some freak textures
Results
Kernel Induced Metric:
Clips that are close on an axis come from the same master clip
We plot the kernel-induced metric for $\lambda = 0.9$ and $\lambda = 0.1$
Results are fairly independent of the choice of $\lambda$
Notice the block diagonal structure of the metric matrix
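The kernel-induced metric is the feature-space distance $d(x, x') = \sqrt{k(x, x) + k(x', x') - 2k(x, x')}$. A minimal sketch, assuming NumPy and a precomputed Gram matrix (for instance of dynamic texture kernels between clips); `kernel_metric` is a hypothetical name. For a plain linear kernel it reduces to the Euclidean distance, which the usage lines check.

```python
import numpy as np

def kernel_metric(K):
    """Pairwise d(i, j) = sqrt(K_ii + K_jj - 2 K_ij) from a Gram matrix K.
    Tiny negative values caused by rounding are clipped to zero."""
    K = np.asarray(K, dtype=float)
    diag = np.diag(K)
    D2 = diag[:, None] + diag[None, :] - 2.0 * K
    return np.sqrt(np.clip(D2, 0.0, None))

rng = np.random.default_rng(0)
P = rng.standard_normal((5, 4))  # hypothetical feature vectors, one per row
D = kernel_metric(P @ P.T)       # linear-kernel Gram matrix
print(np.allclose(D, np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)))  # True
```

If clips are ordered by master clip, a block-diagonal pattern in `D` shows up as small distances within each block.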
Roadmap
Introduction to Kernel Methods
Why kernels?
Kernels on Dynamical Systems
Trajectories, Noise Models
Computation
Dynamical Textures
ARMA Models
Approximate Solutions
Kernel Computation
Experiments
Outlook and Conclusion
Conclusion
A new method to embed dynamical systems
Analytical solutions for linear systems
Many graph kernels are special cases
Analytical solutions require cubic time
Are better solutions possible for special cases?
Extensions to nonlinear systems?
Application to dynamical textures
Works with approximate model parameters
Picks out clips from the same master clip
Close relations to rational kernels of Cortes et al.
More information at http://mlg.anu.edu.au/~vishy
Questions?