Article overview: Unsupervised Learning of Visual Structure Using Predictive Generative Networks

Article overview by Ilya Kuzovkin, Computational Neuroscience Seminar, University of Tartu, 2015
Original article: William Lotter, Gabriel Kreiman & David Cox, Harvard University, Cambridge, USA, “Unsupervised Learning of Visual Structure Using Predictive Generative Networks”


The idea of predictive coding in neuroscience


“state-of-the-art deep learning models rely on millions of labeled training examples to learn”

“in contrast to biological systems, where learning is largely unsupervised”

“we explore the idea that prediction is not only a useful end-goal, but may also serve as a powerful unsupervised learning signal”


PART I THE IDEA OF THE PREDICTIVE ENCODER

"prediction may also serve as a powerful unsupervised learning signal"


PREDICTIVE GENERATIVE NETWORK (a.k.a. “Predictive Encoder”, Palm 2012) vs. its two building blocks: the autoencoder and the recurrent neural network


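AUTOENCODER: input → “bottleneck” → output (a reconstruction of the input)

Can we do prediction? Keep the same encoder-“bottleneck”-decoder structure, but train the output to be the next frame rather than a reconstruction of the current one.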
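A minimal numpy sketch of that switch, under deliberately simplified assumptions: a toy 1-D "movie" (a sine pattern shifting one pixel per frame), a purely linear encoder and decoder with a 2-unit bottleneck, and plain gradient descent on MSE. None of this is the paper's setup; the only point is that the architecture stays the same and only the training target changes from the current frame (reconstruction) to the next frame (prediction).

```python
import numpy as np

def make_movie(width=16, n_frames=17):
    """Toy 1-D 'movie' (hypothetical): a sine pattern that shifts one pixel right per frame."""
    t = np.arange(n_frames)[:, None]
    i = np.arange(width)[None, :]
    return np.sin(2 * np.pi * (i - t) / width)

def train_bottleneck(inputs, targets, hidden=2, lr=0.5, steps=3000, seed=0):
    """Linear encoder -> 'bottleneck' -> linear decoder, full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    W_enc = rng.normal(scale=0.1, size=(inputs.shape[1], hidden))
    W_dec = rng.normal(scale=0.1, size=(hidden, targets.shape[1]))
    for _ in range(steps):
        code = inputs @ W_enc                   # bottleneck representation
        err = code @ W_dec - targets            # decoded output minus target
        grad_out = 2.0 * err / err.size         # d(MSE)/d(output)
        grad_dec = code.T @ grad_out
        grad_enc = inputs.T @ (grad_out @ W_dec.T)
        W_dec -= lr * grad_dec                  # both gradients were computed before updating
        W_enc -= lr * grad_enc
    final_mse = np.mean((inputs @ W_enc @ W_dec - targets) ** 2)
    return W_enc, W_dec, final_mse

frames = make_movie()
current, following = frames[:-1], frames[1:]

# Autoencoder: the target is the input frame itself (reconstruction).
_, _, ae_mse = train_bottleneck(current, current)
# Predictive variant: same architecture, but the target is the NEXT frame.
_, _, pred_mse = train_bottleneck(current, following)
print(f"reconstruction MSE: {ae_mse:.2e}, next-frame MSE: {pred_mse:.2e}")
```

In this toy case a 2-unit bottleneck suffices for both tasks, because every frame lies in the 2-D span of one sine and one cosine, and shifting the pattern is a rotation within that span.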


RECURRENT NEURAL NETWORK
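What distinguishes a recurrent network from the autoencoder is a hidden state fed back at every time step, so the state at step t can depend on all earlier inputs. A minimal numpy sketch of a vanilla (Elman-style) RNN forward pass; the sizes and random weights are hypothetical and only make the recurrence concrete:

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Vanilla RNN: h_t = tanh(x_t W_xh + h_{t-1} W_hh + b_h). Returns all hidden states."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x_t in inputs:                    # iterate over time steps
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
n_steps, n_in, n_hidden = 5, 3, 4         # hypothetical sizes
xs = rng.normal(size=(n_steps, n_in))
W_xh = rng.normal(scale=0.5, size=(n_in, n_hidden))
W_hh = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
b_h = np.zeros(n_hidden)

hs = rnn_forward(xs, W_xh, W_hh, b_h)
print(hs.shape)  # (5, 4): one hidden state per time step
```

Because each h_t depends on h_{t-1}, perturbing only the first input also changes the final hidden state; a feed-forward autoencoder has no such memory across frames.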



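PREDICTIVE GENERATIVE NETWORK (a.k.a. “Predictive Encoder”, Palm 2012): ARCHITECTURE

• Encoder: 2x {Convolution → ReLU → Max-pooling}
• Recurrent core: Long Short-Term Memory (LSTM) with 1024 units, run for 5-15 steps
• Decoder: 2 layers of nearest-neighbor (NN) upsampling, convolution and ReLU
• Training: MSE loss, RMSProp optimizer, learning rate 0.001
• Framework: Keras (http://keras.io)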
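The recurrent core of the PGN is an LSTM. As a rough numpy sketch of what one LSTM step computes, assuming the standard formulation with input, forget and output gates (no peephole connections), toy sizes instead of the 1024 units on the slide, and random placeholder weights rather than trained ones:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W: (n_in, 4n), U: (n, 4n), b: (4n,). Gate order: input, forget, cell, output."""
    n = h.shape[0]
    z = x @ W + h @ U + b
    i = sigmoid(z[0 * n:1 * n])       # input gate
    f = sigmoid(z[1 * n:2 * n])       # forget gate
    g = np.tanh(z[2 * n:3 * n])       # candidate cell update
    o = sigmoid(z[3 * n:4 * n])       # output gate
    c = f * c + i * g                 # new cell state
    h = o * np.tanh(c)                # new hidden state
    return h, c

def unroll(frames_encoded, W, U, b):
    """Run the LSTM over a sequence of encoded frames; return the hidden state after each step."""
    n = U.shape[0]
    h, c = np.zeros(n), np.zeros(n)
    outputs = []
    for x in frames_encoded:
        h, c = lstm_step(x, h, c, W, U, b)
        outputs.append(h)             # a decoder would map these states to predicted frames
    return np.stack(outputs)

rng = np.random.default_rng(0)
n_steps, n_in, n_units = 10, 32, 16   # hypothetical; 10 steps lies within the 5-15 range
W = rng.normal(scale=0.1, size=(n_in, 4 * n_units))
U = rng.normal(scale=0.1, size=(n_units, 4 * n_units))
b = np.zeros(4 * n_units)
codes = rng.normal(size=(n_steps, n_in))   # stand-in for the encoder output of each frame
hs = unroll(codes, W, U, b)
print(hs.shape)  # (10, 16)
```

The cell state c is updated additively (f * c + i * g), which is what lets an LSTM carry information over the 5-15 steps more easily than the vanilla RNN update.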


PART II ADVERSARIAL LOSS

"the generator is trained to maximally confuse the adversarial discriminator"


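PGN USED IN THE ADVERSARIAL EXPERIMENTS

• Encoder: 2x {Convolution → ReLU → Max-pooling}
• Recurrent core: LSTM with 1568 units, run for 5-15 steps
• Fully connected layer
• Decoder: 2 layers of nearest-neighbor (NN) upsampling, convolution and ReLU
• Training: MSE loss, RMSProp optimizer, learning rate 0.001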
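The training setup listed above is an MSE loss minimized with RMSProp at learning rate 0.001. A small numpy sketch of that combination on a toy least-squares problem; a linear model stands in for the network, the data are synthetic, and the decay rate rho=0.9 and epsilon=1e-7 are assumed typical defaults, not values taken from the paper:

```python
import numpy as np

def mse(pred, target):
    return np.mean((pred - target) ** 2)

def rmsprop_step(w, grad, cache, lr=0.001, rho=0.9, eps=1e-7):
    """One RMSProp update: divide the step by a running RMS of recent gradients."""
    cache = rho * cache + (1 - rho) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w                                # toy regression targets (hypothetical data)

w = np.zeros(5)
cache = np.zeros(5)
initial_loss = mse(X @ w, y)
for _ in range(10000):
    grad = 2 * X.T @ (X @ w - y) / len(y)     # gradient of the MSE w.r.t. w
    w, cache = rmsprop_step(w, grad, cache)
final_loss = mse(X @ w, y)
print(initial_loss, final_loss)
```

Because each step is normalized by the recent gradient magnitude, every weight moves by roughly the learning rate per step regardless of the raw gradient scale, which is why a small value such as 0.001 still makes steady progress.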


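DISCRIMINATOR: 3 FC layers (ReLU, ReLU, softmax)

The discriminator is "trained to maximize the probability that a proposed frame came from the ground truth data and minimize it when it is produced by the generator".

The resulting adversarial loss (AL) is used to train the PGN (the generator), either on its own or together with the MSE loss.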
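A forward-pass sketch of this adversarial setup in numpy: a discriminator of 3 fully connected layers (ReLU, ReLU, softmax) outputs a two-way probability [generated, real] for a flattened frame. The layer sizes, random weights, the weighting factor `lam` between the AL and MSE terms, and the use of the standard GAN log-loss (non-saturating form for the generator) are all assumptions for illustration; backpropagation and the alternating training loop are omitted.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)     # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def discriminator(frames, params):
    """3 FC layers: ReLU, ReLU, softmax. Column 1 = P(frame came from the ground-truth data)."""
    W1, b1, W2, b2, W3, b3 = params
    h = relu(frames @ W1 + b1)
    h = relu(h @ W2 + b2)
    return softmax(h @ W3 + b3)

def discriminator_loss(p_on_real, p_on_fake, eps=1e-12):
    """Push P(real) up on ground-truth frames and down on generated ones."""
    return -np.mean(np.log(p_on_real[:, 1] + eps)) - np.mean(np.log(p_on_fake[:, 0] + eps))

def generator_loss(pred, target, p_on_fake, lam=0.5, eps=1e-12):
    """Hypothetical combined objective: AL term (fool the discriminator) plus MSE to the true frame."""
    adversarial = -np.mean(np.log(p_on_fake[:, 1] + eps))
    mse = np.mean((pred - target) ** 2)
    return lam * adversarial + (1 - lam) * mse

rng = np.random.default_rng(0)
n_pixels, n_hidden = 64, 32                   # hypothetical sizes
params = (rng.normal(scale=0.1, size=(n_pixels, n_hidden)), np.zeros(n_hidden),
          rng.normal(scale=0.1, size=(n_hidden, n_hidden)), np.zeros(n_hidden),
          rng.normal(scale=0.1, size=(n_hidden, 2)), np.zeros(2))

real_frames = rng.random((8, n_pixels))       # stand-in for ground-truth next frames
generated = rng.random((8, n_pixels))         # stand-in for PGN predictions
p_real = discriminator(real_frames, params)
p_fake = discriminator(generated, params)
print(discriminator_loss(p_real, p_fake), generator_loss(generated, real_frames, p_fake))
```

Setting `lam=1.0` corresponds to training with the adversarial loss alone and `lam=0.0` to the plain MSE model, the two extremes compared on the next slide.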


“with adversarial loss alone the generator easily found solutions that fooled the discriminator, but did not look anything like the correct samples”

The MSE model is fairly faithful to the identities of the faces, but produces blurred versions.

The combined AL/MSE model tends to underfit the identity, producing a face closer to the average.
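One way to see why an MSE-trained predictor blurs: when the next frame is genuinely uncertain, the prediction that minimizes the expected MSE is the average of the possible outcomes, not any single sharp outcome. A toy numpy check, assuming two equally likely hypothetical 1-D "next frames" with an edge in two different places:

```python
import numpy as np

# Two equally likely ground-truth next frames: a sharp edge starting at pixel 3 or at pixel 5.
outcome_a = np.array([0, 0, 0, 1, 1, 1, 1, 1], dtype=float)
outcome_b = np.array([0, 0, 0, 0, 0, 1, 1, 1], dtype=float)

def expected_mse(pred):
    """Average MSE over the two equally likely outcomes."""
    return 0.5 * np.mean((pred - outcome_a) ** 2) + 0.5 * np.mean((pred - outcome_b) ** 2)

sharp_guess = outcome_a.copy()                 # commit to one sharp answer
blurry_guess = 0.5 * (outcome_a + outcome_b)   # the mean: pixels 3 and 4 become 0.5

print(expected_mse(sharp_guess), expected_mse(blurry_guess))  # 0.125 0.0625
```

The blurred average halves the expected MSE here, so a pure MSE objective favours it; an adversarial term penalizes frames that do not look like real samples, which is one plausible reading of why the two losses behave so differently.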


PART III INTERNAL REPRESENTATIONS AND LATENT VARIABLES

"we are interested in understanding the representations learned by the models"


PGN model → LSTM activities → L2 regression → value of a latent variable
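The decoding step can be sketched as ridge (L2-regularized linear) regression from LSTM activities to the latent variable, scored on held-out samples. The sketch uses synthetic "activities" with a planted linear dependence on a hypothetical latent variable (for example a rotation angle), a hand-picked regularization strength `alpha`, and a simple train/test split; none of these come from the paper.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

def r_squared(y_true, y_pred):
    return 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

rng = np.random.default_rng(0)
n_samples, n_units = 400, 50                       # hypothetical sizes
latent = rng.uniform(-1, 1, size=n_samples)        # e.g. a rotation angle (synthetic)
mixing = rng.normal(size=n_units)
activities = np.outer(latent, mixing) + 0.3 * rng.normal(size=(n_samples, n_units))

train, test = slice(0, 300), slice(300, None)
w, b = ridge_fit(activities[train], latent[train], alpha=1.0)
r2 = r_squared(latent[test], activities[test] @ w + b)
print(f"held-out R^2: {r2:.3f}")
```

A high held-out R² means the latent variable is linearly readable from the representation; the L2 penalty keeps the fit stable when there are many units relative to samples.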


MULTIDIMENSIONAL SCALING

“An MDS algorithm aims to place each object in N-dimensional space such that the between-object distances are preserved as well as possible.”
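The quoted definition covers MDS in general; one concrete, simple variant is classical (Torgerson) MDS, which double-centres the squared distance matrix and keeps its top eigenvectors. A numpy sketch, checked on hypothetical points that truly lie in 2-D so the distances can be preserved exactly; applied to this overview's setting, the input would be distances between model representations (an assumption about usage, not the paper's code):

```python
import numpy as np

def pairwise_distances(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(D, n_dims=2):
    """Classical MDS: embed points so that Euclidean distances approximate the matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)       # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_dims]
    scale = np.sqrt(np.maximum(eigvals[top], 0.0))
    return eigvecs[:, top] * scale

rng = np.random.default_rng(0)
points = rng.normal(size=(10, 2))              # hypothetical objects living in 2-D
D = pairwise_distances(points)
embedding = classical_mds(D, n_dims=2)
print(np.abs(pairwise_distances(embedding) - D).max())  # close to 0: distances preserved
```

The embedding is only defined up to rotation, reflection and translation, which is why the check compares distances rather than coordinates.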


PART IV USEFULNESS OF PREDICTIVE LEARNING

"representations trained with a predictive loss outperform other models of comparable complexity in a supervised classification problem"


THE TASK: 50 randomly generated faces (12 angles each)

Pipeline: internal representation of a generative model → SVM → identify the class (face identity)

Generative models compared:

• Encoder-LSTM-Decoder to predict the next frame (PGN)
• Encoder-LSTM-Decoder to predict the last frame (AE LSTM dynamic)
• Encoder-LSTM-Decoder on frames made into static movies (AE LSTM static)
• Encoder-FC-Decoder with the same number of weights as the LSTM (AE FC #weights)
• Encoder-FC-Decoder with the same number of units as the LSTM (AE FC #units)
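The evaluation pipeline (freeze a generative model, read out its internal representation, train an SVM to identify the face) can be sketched as follows. To stay self-contained, this swaps in a linear one-vs-rest SVM trained by subgradient descent on the L2-regularized hinge loss instead of a library SVM, and uses synthetic "representations" for a handful of hypothetical identities with 12 views each; the paper's actual features and SVM settings are not reproduced.

```python
import numpy as np

def train_linear_svm(X, y, n_classes, lam=1e-3, lr=0.1, epochs=500):
    """One-vs-rest linear SVM: full-batch subgradient descent on L2-regularized hinge loss."""
    n, d = X.shape
    W = np.zeros((n_classes, d))
    b = np.zeros(n_classes)
    for k in range(n_classes):
        t = np.where(y == k, 1.0, -1.0)              # +1 for class k, -1 for the rest
        for _ in range(epochs):
            margins = t * (X @ W[k] + b[k])
            active = margins < 1                      # samples violating the margin
            grad_w = lam * W[k] - (t[active, None] * X[active]).sum(axis=0) / n
            grad_b = -t[active].sum() / n
            W[k] -= lr * grad_w
            b[k] -= lr * grad_b
    return W, b

def predict(X, W, b):
    return np.argmax(X @ W.T + b, axis=1)            # class with the highest one-vs-rest score

rng = np.random.default_rng(0)
n_ids, n_views, dim = 5, 12, 20                       # hypothetical: 5 identities x 12 views
prototypes = rng.normal(size=(n_ids, dim))
X = np.repeat(prototypes, n_views, axis=0) + 0.3 * rng.normal(size=(n_ids * n_views, dim))
y = np.repeat(np.arange(n_ids), n_views)
views = np.tile(np.arange(n_views), n_ids)
train, test = views < 8, views >= 8                   # hold out 4 of the 12 views per identity
W, b = train_linear_svm(X[train], y[train], n_ids)
accuracy = np.mean(predict(X[test], W, b) == y[test])
print(f"test accuracy: {accuracy:.2f}")
```

Holding out whole views per identity mirrors the question the task asks: whether a representation lets a simple linear classifier recognize a face from angles it was not trained on.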
