Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William...
Transcript of Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William...
![Page 1: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/1.jpg)
Article overview by Ilya Kuzovkin
William Lotter, Gabriel Kreiman & David Cox
Computational Neuroscience Seminar University of Tartu
2015
Harvard University, Cambridge, USA
Unsupervised Learning of Visual Structure Using Predictive Generative Networks
![Page 2: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/2.jpg)
The idea of predictive coding in neuroscience
![Page 3: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/3.jpg)
“state-of-the-art deep learning models rely on
millions of labeled training examples to learn”
![Page 4: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/4.jpg)
“state-of-the-art deep learning models rely on
millions of labeled training examples to learn”
“in contrast to biological systems, where learning is
largely unsupervised”
![Page 5: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/5.jpg)
“state-of-the-art deep learning models rely on
millions of labeled training examples to learn”
“in contrast to biological systems, where learning is
largely unsupervised”
“we explore the idea that prediction is not only a
useful end-goal, but may also serve as a powerful unsupervised learning
signal”
![Page 6: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/6.jpg)
PART I THE IDEA OF PREDICTIVE ENCODER
"prediction may also serve as a powerful unsupervised learning signal"
![Page 7: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/7.jpg)
PREDICTIVE GENERATIVE NETWORK (a.k.a “Predictive Encoder” Palm 2012)
vs.
![Page 8: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/8.jpg)
input output
“bottleneck”
AUTOENCODER
![Page 9: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/9.jpg)
input output
“bottleneck”
AUTOENCODER
![Page 10: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/10.jpg)
input output
“bottleneck”
AUTOENCODER
![Page 11: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/11.jpg)
input output
“bottleneck”
Reconstruction
AUTOENCODER
![Page 12: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/12.jpg)
input output
“bottleneck”
Reconstruction
AUTOENCODER
![Page 13: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/13.jpg)
input output
“bottleneck”
Can we do prediction?
Reconstruction
AUTOENCODER
![Page 14: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/14.jpg)
PREDICTIVE GENERATIVE NETWORK (a.k.a “Predictive Encoder” Palm 2012)
vs.
![Page 15: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/15.jpg)
RECURRENT NEURAL NETWORK
![Page 16: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/16.jpg)
RECURRENT NEURAL NETWORK
![Page 17: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/17.jpg)
RECURRENT NEURAL NETWORK
![Page 18: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/18.jpg)
RECURRENT NEURAL NETWORK
![Page 19: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/19.jpg)
PREDICTIVE GENERATIVE NETWORK (a.k.a “Predictive Encoder” Palm 2012)
vs.
![Page 20: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/20.jpg)
PREDICTIVE GENERATIVE NETWORK (a.k.a “Predictive Encoder” Palm 2012)
vs.
Convolution
ReLu
Max-pooling
2x {
![Page 21: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/21.jpg)
PREDICTIVE GENERATIVE NETWORK (a.k.a “Predictive Encoder” Palm 2012)
vs.
Long Short-Term Memory (LSTM)
5 - 15 steps
1024 units
Convolution
ReLu
Max-pooling
2x {
![Page 22: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/22.jpg)
PREDICTIVE GENERATIVE NETWORK (a.k.a “Predictive Encoder” Palm 2012)
vs.
Long Short-Term Memory (LSTM)
5 - 15 steps
1024 units
Convolution
ReLu
Max-pooling
2x { 2 layers NN upsampling
ReLuConvolution
![Page 23: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/23.jpg)
PREDICTIVE GENERATIVE NETWORK (a.k.a “Predictive Encoder” Palm 2012)
vs.
Long Short-Term Memory (LSTM)
5 - 15 steps
1024 units
Convolution
ReLu
Max-pooling
2x {MSE loss RMSProp optimizer LR 0.001
2 layers NN upsampling
ReLuConvolution
![Page 24: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/24.jpg)
PREDICTIVE GENERATIVE NETWORK (a.k.a “Predictive Encoder” Palm 2012)
vs.
Long Short-Term Memory (LSTM)
5 - 15 steps
1024 unitshttp://keras.io
2 layers NN upsampling
Convolution ReLuReLu Convolution
Max-pooling
2x {MSE loss RMSProp optimizer LR 0.001
![Page 25: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/25.jpg)
![Page 26: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/26.jpg)
PART II ADVERSARIAL LOSS
"the generator is trained to maximally confuse the adversarial discriminator"
![Page 27: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/27.jpg)
![Page 28: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/28.jpg)
vs.
Long Short-Term Memory (LSTM)
5 - 15 steps
1568 units
Fully connected layer2 layers NN upsampling
Convolution ReLu
ReLu Convolution
Max-pooling
2x {MSE loss RMSProp optimizer LR 0.001
![Page 29: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/29.jpg)
vs.
Long Short-Term Memory (LSTM)
5 - 15 steps
1568 units
Fully connected layer2 layers NN upsampling
Convolution ReLu
ReLu Convolution
Max-pooling
2x {MSE loss RMSProp optimizer LR 0.001
![Page 30: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/30.jpg)
MSE loss
![Page 31: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/31.jpg)
MSE loss
![Page 32: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/32.jpg)
MSE loss
![Page 33: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/33.jpg)
3 FC layers (relu, relu, softmax)
MSE loss
![Page 34: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/34.jpg)
3 FC layers (relu, relu, softmax)
"trained to maximize the probability that a proposed frame came from the ground truth data and minimize it when it is produced by the generator"
MSE loss
![Page 35: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/35.jpg)
3 FC layers (relu, relu, softmax)
"trained to maximize the probability that a proposed frame came from the ground truth data and minimize it when it is produced by the generator"
AL loss to train PGN
MSE lossAL loss
![Page 36: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/36.jpg)
3 FC layers (relu, relu, softmax)
"trained to maximize the probability that a proposed frame came from the ground truth data and minimize it when it is produced by the generator"
AL loss to train PGN
MSE lossAL loss
![Page 37: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/37.jpg)
“with adversarial loss alone the generator easily found solutions that fooled the discriminator, but did not look anything like the correct samples”
MSE model is fairly faithful to the identities of the faces, but produces blurred versions
combined AL/MSE model tends to underfit the identity towards a more average face
![Page 38: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/38.jpg)
PART III INTERNAL REPRESENTATIONS AND LATENT VARIABLES
"we are interested in understanding the representations learned by the models"
![Page 39: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/39.jpg)
PGN model LSTM activities L2 regression Value of a latent variable
![Page 40: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/40.jpg)
PGN model LSTM activities L2 regression Value of a latent variable
![Page 41: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/41.jpg)
“An MDS algorithm aims to place each object in N-dimensional space such that the between-object distances are preserved as well as possible.”
MULTIDIMENSIONAL SCALING
![Page 42: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/42.jpg)
PART IV USEFULNESS OF PREDICTIVE LEARNING
"representations trained with a predictive loss outperform other models of comparable complexity in a supervised
classification problem"
![Page 43: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/43.jpg)
THE TASK: 50 randomly generated faces (12 angles per each)
Generative models:
Internal representation SVM Identify
class
![Page 44: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/44.jpg)
THE TASK: 50 randomly generated faces (12 angles per each)
Generative models:
Internal representation SVM Identify
class
• Encoder-LSTM-Decoder to predict next frame (PGN) • Encoder-LSTM-Decoder to predict last frame (AE LSTM dynamic) • Encoder-LSTM-Decoder on frames made into static movies (AE LSTM static) • Encoder-FC-Decoder with #weights as in LSTM (AE FC #weights) • Encoder-FC-Decoder with #units as in LSTM (AE FC #units)
![Page 45: Visual Structure with Predictive Generative Networks · Article overview by Ilya Kuzovkin William Lotter, Gabriel Kreiman & David Cox Computational Neuroscience Seminar University](https://reader034.fdocuments.in/reader034/viewer/2022043007/5f94dfbf3ccbc60f026c49fd/html5/thumbnails/45.jpg)