References
[1] Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. Texture synthesis using convolutional neural networks. NIPS 2015.
[2] Dmitry Ulyanov, Vadim Lebedev, Andrea Vedaldi, and Victor S. Lempitsky. Texture networks: Feed-forward synthesis of textures and stylized images. ICML 2016.
[3] Konstantinos G. Derpanis and Richard P. Wildes. Spacetime texture representation and recognition based on a spatiotemporal orientation analysis. PAMI 2012.
[4] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556, 2014.
[Figure: qualitative results. Dynamic Texture Synthesis - fireplace_1 (original) vs. fireplace_1 (synthesized); fish (original) vs. syntheses using the appearance stream only and using both streams. Dynamics Style Transfer - a synthesized output combining a dynamics target with an appearance target.]
Dynamics Stream: a network trained for optical flow prediction in an appearance-invariant manner. Models the dynamics of the input dynamic texture.
[Diagram: Dynamics Stream architecture. Input frame pairs (N x 2 x H x W x 1) are contrast-normalized and encoded at multiple scales (x2, x4, x1/2). Each encoder applies conv (2 x 11 x 11, 32 filters, stride 1), rectification and max, pool (5 x 5), conv (1 x 1, 64 filters), and L1 normalization; the multi-scale responses are channel-concatenated. A decoder with conv (3 x 3, 64 filters), conv (1 x 1, 2 filters), and ReLU predicts the flow, trained with an average endpoint error (aEPE) loss against the target flow. The appearance stream uses VGG-19 [4].]
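The L1-normalization stage at the end of the encoder is what makes the dynamics stream's responses insensitive to appearance. As an illustrative sketch only (squared finite differences stand in for the paper's learned 2 x 11 x 11 oriented filters), the following shows how divisive L1 normalization across orientation channels cancels a global brightness rescaling of the input clip:

```python
import numpy as np

def oriented_energies(frame_pair):
    """Toy stand-ins for oriented spacetime energies: squared spatial and
    temporal finite differences of a 2-frame clip (channels, H-1, W-1)."""
    a, b = frame_pair
    e_x = np.diff(a, axis=1)[:-1, :] ** 2   # horizontal spatial energy
    e_y = np.diff(a, axis=0)[:, :-1] ** 2   # vertical spatial energy
    e_t = (b - a)[:-1, :-1] ** 2            # temporal energy
    return np.stack([e_x, e_y, e_t])

def l1_normalize(energies, eps=1e-12):
    """Divisive L1 normalization across channels, as in the dynamics
    stream's final encoder stage; this buys appearance invariance."""
    return energies / (energies.sum(axis=0, keepdims=True) + eps)

rng = np.random.default_rng(0)
clip = rng.random((2, 16, 16))
resp = l1_normalize(oriented_energies(clip))
resp_scaled = l1_normalize(oriented_energies(3.0 * clip))  # brighter clip
print(np.allclose(resp, resp_scaled))  # → True
```

Because every channel's energy scales by the same factor under a brightness change, the normalized ratios are unchanged, so the downstream flow prediction depends on motion structure rather than image appearance.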
Overall Architecture
Iteratively coerce an initial Gaussian noise sequence such that its spatiotemporal statistics from each stream match those of an input dynamic texture. This is done by optimizing (3) w.r.t. the spacetime volume (initially noise).
$$\mathcal{L}_{\text{app}} = \frac{1}{L_{\text{app}}}\sum_{l=1}^{L_{\text{app}}}\frac{1}{T}\sum_{t=1}^{T}\left\| G^{l}_{t} - \bar{G}^{l} \right\|_F^2 \qquad (1)$$

where $L_{\text{app}}$ is the number of ConvNet layers being used in the appearance stream, $T$ is the number of generated frames, $\|\cdot\|_F$ is the Frobenius norm, $G^{l}_{t}$ is the Gram matrix that models the synthesized texture appearance, and $\bar{G}^{l}$ models the target texture appearance averaged across time.

$$\mathcal{L}_{\text{dyn}} = \frac{1}{L_{\text{dyn}}}\sum_{l=1}^{L_{\text{dyn}}}\frac{1}{T}\sum_{t=1}^{T}\left\| D^{l}_{t} - \bar{D}^{l} \right\|_F^2 \qquad (2)$$

where $L_{\text{dyn}}$ is the number of ConvNet layers being used in the dynamics stream, $D^{l}_{t}$ is the Gram matrix that models the synthesized texture dynamics, and $\bar{D}^{l}$ models the target texture dynamics averaged across time.

$$\mathcal{L}_{\text{DT}} = \mathcal{L}_{\text{app}} + \mathcal{L}_{\text{dyn}} \qquad (3)$$
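Concretely, the losses match Gram matrices of stream activations between the synthesized and target textures, and synthesis is gradient-based optimization of the input itself. The sketch below is a deliberately tiny stand-in (a random linear map replaces the ConvNet layers, a single frame replaces the spacetime volume, and plain gradient descent replaces the paper's optimizer; `A`, `X_target`, and the sizes are all illustrative assumptions):

```python
import numpy as np

def gram(F):
    """Gram matrix of a (channels, positions) feature map, cf. (1)-(2)."""
    C, N = F.shape
    return F @ F.T / N

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))           # hypothetical linear "layer"
X_target = rng.standard_normal((4, 100))  # stand-in target texture
G_bar = gram(A @ X_target)                # target statistic, cf. G-bar

# Synthesis: start from noise, descend the Gram-matching loss w.r.t. X.
X = rng.standard_normal((4, 100))
losses = []
for _ in range(500):
    F = A @ X
    G = gram(F)
    losses.append(np.sum((G - G_bar) ** 2))   # Frobenius loss, eq. (1)
    grad_F = 4.0 / F.shape[1] * (G - G_bar) @ F  # analytic dL/dF
    X -= 0.01 * A.T @ grad_F                     # chain rule: dL/dX

print(losses[0], losses[-1])  # the loss decreases as X matches G_bar
```

In the full model the same loss is summed over layers of both streams and averaged over frames, and the optimization runs over the whole noise volume with backpropagation through the pre-trained networks. For dynamics style transfer, the appearance and dynamics targets in (1) and (2) simply come from different source textures.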
1. Motivated by the recent successes in texture synthesis using ConvNets [1, 2], we present a novel, two-stream model of dynamic texture synthesis to capture both appearance and dynamics.
2. A novel network architecture (motivated by the spacetime oriented energy model of [3]) designed to compute optical flow in an appearance-invariant manner, serving as the dynamics stream of our dynamic texture synthesis model.
3. A two-stream model that enables dynamics style transfer, where the appearance and dynamics from different sources can be combined to generate a novel texture.
We introduce a two-stream model for dynamic texture synthesis based on pre-trained convolutional networks (ConvNets) that target two independent tasks: object recognition and optical flow prediction. Given an input dynamic texture, the object recognition ConvNet models the per-frame appearance of the input texture, while the optical flow ConvNet models its dynamics. To generate a novel texture, a noise sequence is optimized to match the feature statistics from each stream of the input texture.
TWO-STREAM CONVOLUTIONAL NETWORKS FOR DYNAMIC TEXTURE SYNTHESIS
Matthew Tesfaldet, Marcus A. Brubaker - York University; Konstantinos G. Derpanis - Ryerson University
{mtesfald, mab}@eecs.yorku.ca, [email protected]