Transcript of Video Prediction via Example Guidance

Video Prediction via Example Guidance
ICML 2020 Poster
Presented by Yueyu Hu
STRUCT Paper Reading
2020/12/14
Video Prediction
• Pioneer work: moving MNIST
  • Unsupervised Learning of Video Representations using LSTMs, ICML'15
VAE-Powered Methods
• Stochastic Video Generation with a Learned Prior, ICML'18
Problems
• Prior Gaussian distribution?
  • Insufficient to cover future possibilities
• Multi-modal motion patterns
  • E.g. Moving MNIST: up or down?
• Sampling efficiency
  • How many samples are required to achieve accurate prediction?
Existing solutions
• External information
  • Bounding boxes / skeleton (pose)
  • Compositional Video Prediction, ICCV'19 (CMU/FAIR)
Insight & Claims
• “In contrast to these works above, we are motivated by one insight that prediction is based on similarity between the current situation and the past experiences.”
• Optimization: explicit distribution modeling
• GAN: plausible predicted samples
• Human skeleton topology: preserved
• Real-world model
Step 1: Disentangle
• Adopts existing methods:
  • Stochastic Video Generation with a Learned Prior, ICML'18
• Disentangle model used in this work:
  • Unsupervised Keypoint Learning for Guiding Class-Conditional Video Prediction, NeurIPS'19
  • Pretrained models are used as the pose extractor in this work
Step 2: Retrieval
• Sequence X, motion feature F
• The whole training set as the search space
• Nearest-neighbor search: top-K features
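The retrieval step can be sketched as a plain nearest-neighbor search over motion features. The feature dimension, Euclidean distance, and function names below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def retrieve_top_k(query_feat, train_feats, k=5):
    # Euclidean distance from the query motion feature to every
    # training feature; return the indices of the k closest ones.
    dists = np.linalg.norm(train_feats - query_feat, axis=1)
    return np.argsort(dists)[:k]

# Toy usage: 100 training motion features of dimension 8.
rng = np.random.default_rng(0)
train_feats = rng.normal(size=(100, 8))
query = train_feats[42] + 0.01 * rng.normal(size=8)  # near example 42
top_k = retrieve_top_k(query, train_feats, k=5)
```

In practice this search would run over deep motion features extracted by the pretrained pose/keypoint model, and an approximate index could replace the brute-force scan for large training sets.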
What does it find?
• They have common pasts
• They are non-Gaussian
Next: Prediction
• Existing approaches
• z: latent representation
• How to get z? Usually with a neural network
z ∼ q = N(0, 1)
z_t = f_φ(e_g(z_{t−1})) (the latent z is produced by a learned network)
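As a minimal sketch of the "z from a neural network" idea in VAE-style models: during training a network predicts (μ, log σ²) and z is drawn with the reparameterization trick; at test time μ = 0, log σ² = 0 recovers the fixed prior z ∼ N(0, 1). The names below are illustrative assumptions:

```python
import numpy as np

def sample_latent(mu, log_var, rng):
    # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1),
    # so gradients can flow through mu and log_var during training.
    eps = rng.normal(size=np.shape(mu))
    return mu + np.exp(0.5 * log_var) * eps

# With mu = 0 and log_var = 0 this is exactly the fixed prior z ~ N(0, 1).
rng = np.random.default_rng(0)
z = sample_latent(np.zeros(4), np.zeros(4), rng)
```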
The problem
Approach
• Replace the fixed prior with a new one
• Get the prior from retrieved samples
• Make the predicted distribution close to this prior
• Issues: lack of diversity; the distribution of z is infeasible to represent with samples
z ∼ q = N(0, 1)
D_KL(p ‖ q)
(μ_t, σ_t): parameters estimated from the retrieved samples
(μ̂_t, σ̂_t): predicted parameters
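The "make the predicted distribution close to the prior" term has a closed form when both distributions are diagonal Gaussians, e.g. p = N(μ̂_t, σ̂_t²) predicted and q = N(μ_t, σ_t²) estimated from the retrieved examples. This sketch only illustrates the standard formula, not the paper's exact loss:

```python
import numpy as np

def kl_diag_gaussians(mu_p, sigma_p, mu_q, sigma_q):
    # Closed-form D_KL(p || q) for diagonal Gaussians, summed over dims:
    # log(sigma_q / sigma_p) + (sigma_p^2 + (mu_p - mu_q)^2) / (2 sigma_q^2) - 1/2
    return float(np.sum(
        np.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
        - 0.5
    ))

mu = np.zeros(3)
sigma = np.ones(3)
```

The divergence is zero when the two distributions coincide and grows as the predicted mean or variance drifts from the example-derived prior.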
Methods
• Calculate mean and variance
• Sample z from this distribution
• Predict multiple instances
• Best prediction: the sample j with the smallest error against the ground truth
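The best-of-j selection can be sketched as: sample j latents from the estimated distribution, decode each, keep the prediction closest to the ground truth. The identity "decoder" and plain L2 error here are toy stand-ins, not the paper's model:

```python
import numpy as np

def best_of_j(mu, sigma, decode, target, j=20, seed=0):
    # Sample j latents z ~ N(mu, sigma^2), decode each into a prediction,
    # and return the one with the smallest L2 error w.r.t. the target.
    rng = np.random.default_rng(seed)
    best_pred, best_err = None, np.inf
    for _ in range(j):
        z = mu + sigma * rng.normal(size=np.shape(mu))
        pred = decode(z)
        err = float(np.linalg.norm(pred - target))
        if err < best_err:
            best_pred, best_err = pred, err
    return best_pred, best_err

# Toy check: with an identity decoder, drawing more samples can only help.
mu, sigma = np.zeros(4), np.ones(4)
target = np.zeros(4)
_, err_1 = best_of_j(mu, sigma, lambda z: z, target, j=1)
_, err_20 = best_of_j(mu, sigma, lambda z: z, target, j=20)
```

This is also how the stochastic evaluation settings later in the talk work: report the best of 20 (or more) sampled futures.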
Experiment
• Datasets:
  • Moving MNIST
  • BAIR Robot Push
  • Penn Action (SVG / keypoint settings)
Moving MNIST
• Deterministic and stochastic settings
  • D: feed in motion information
  • S: select the best from 20 samples
Robot Arm
• Better trajectories, saturating at 100 samples
• K saturates at 5
Penn Action
• Class label and first frame are fed as inputs
• Metrics: action recognition and Fréchet Video Distance
Cross-Class Action Prediction

Facilitated by the guidance of examples, our model produces a visually natural tennis-serve sequence, which clearly demonstrates the generalization capability of the proposed model. We argue that the majority of previous works are (implicitly) forced to memorize the motion categories in the training set. In contrast to that paradigm, our work is relieved of such a burden because the retrieved examples already carry the category information that assists prediction; we thus focus only on intra-class diversity. Given examples with unseen motion categories, our model is still able to produce reasonable predictions, thanks to the example guidance.
Conclusion
• Sampling methods might be a good idea
• Video prediction techniques are still too far away from being utilized in practical video coding
Thanks