The Success of Deep Generative Models
Jakub Tomczak, AMLAB, University of Amsterdam
CERN, 2018


What is AI about?

Decision making: a high probability of the red label on new data = a highly probable decision!

Understanding: a high probability of the red label × a low probability of the object itself = an uncertain decision!

What is generative modeling about?

Understanding:

finding underlying factors (discovery)

predicting and anticipating future events (planning)

finding analogies (transfer learning)

detecting rare events (anomaly detection)

decision making


Why generative modeling?

Less labeled data

Compression

Uncertainty

Hidden structure

Data simulation

Exploration


Generative modeling: How?

Fully-observed models (e.g., PixelCNN)

Latent variable models:
→ implicit models (e.g., GANs)
→ prescribed models (e.g., VAE)


Recent successes: Style transfer

Zhu, J. Y., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. ICCV 2017.


Recent successes: Image generation

(figure: generated vs. real samples)

Karras, T., Aila, T., Laine, S., & Lehtinen, J. (2018). Progressive growing of GANs for improved quality, stability, and variation. ICLR 2018.


Recent successes: Text generation

Yang, Z., Hu, Z., Salakhutdinov, R., & Berg-Kirkpatrick, T. (2017). Improved variational autoencoders for text modeling using dilated convolutions. ICML 2017.


Recent successes: Audio generation

van den Oord, A., Vinyals, O., & Kavukcuoglu, K. (2017). Neural discrete representation learning. NIPS 2017.

(audio samples: reconstruction vs. generation)


Recent successes: Reinforcement learning

Ha, D., & Schmidhuber, J. (2018). World models. arXiv preprint arXiv:1803.10122.


Recent successes: Drug discovery

Gómez-Bombarelli, R., et al. (2018). Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules. ACS Central Science.

Kusner, M. J., Paige, B., & Hernández-Lobato, J. M. (2017). Grammar variational autoencoder. arXiv preprint arXiv:1703.01925. (ICML 2017)


Recent successes: Physics (interacting systems)

Kipf, T., Fetaya, E., Wang, K. C., Welling, M., & Zemel, R. (2018). Neural relational inference for interacting systems. ICML 2018.


Generative modeling: Auto-regressive models

The general idea is to factorize the joint distribution:

p(x) = ∏_{i=1}^{D} p(x_i | x_{<i}),

and use neural networks (e.g., convolutional NNs) to model the conditionals efficiently.

van den Oord, A., et al. (2016). WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499.
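To make the factorization concrete, here is a minimal sketch (assuming PyTorch; the module and sizes are illustrative, not the WaveNet architecture) of a causal convolution, the building block that guarantees the network modeling p(x_i | x_{<i}) never looks at future samples:

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Conv1d):
    """1-D convolution that only sees past samples, so a stack of these
    defines valid conditionals p(x_i | x_{<i})."""
    def __init__(self, in_ch, out_ch, kernel_size, dilation=1):
        pad = (kernel_size - 1) * dilation
        super().__init__(in_ch, out_ch, kernel_size, padding=pad, dilation=dilation)
        self.causal_pad = pad

    def forward(self, x):
        out = super().forward(x)
        # Drop the rightmost outputs so step t depends on x[<= t] only.
        return out[:, :, :-self.causal_pad] if self.causal_pad > 0 else out

# Toy stack of dilated causal convolutions over a raw 1-D signal.
model = nn.Sequential(
    CausalConv1d(1, 32, kernel_size=2, dilation=1), nn.ReLU(),
    CausalConv1d(32, 32, kernel_size=2, dilation=2), nn.ReLU(),
    CausalConv1d(32, 256, kernel_size=1),  # 256-way logits per time step
)
logits = model(torch.randn(8, 1, 1024))    # (8, 256, 1024)
# Training uses next-step targets: logits[..., t] predicts x[t + 1].
```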


Generative modeling: Latent Variable Models

We assume the data lies on a low-dimensional manifold, so the generator is:

x = f(z),

where:

z ∈ ℝ^M with M ≪ D, and z ∼ p(z).

Two main approaches:

→ Generative Adversarial Networks (GANs)

→ Variational Auto-Encoders (VAEs)


Generative modeling: GANs

We assume a deterministic generator:

x = G(z),

and a prior over the latent space:

z ∼ p(z).

How to train it? By using a game!

For this purpose, we assume a discriminator:

D(x) ∈ [0, 1], the probability that x comes from the real data.

Generative modeling: GANs

The learning process is as follows:

→ the generator tries to fool the discriminator;

→ the discriminator tries to distinguish between the real and fake images.

We define the learning problem as a min-max problem:

min_G max_D E_{x∼p_data}[log D(x)] + E_{z∼p(z)}[log(1 − D(G(z)))].

In fact, we have a learnable loss function! → It learns high-order statistics.

Goodfellow, I., et al. (2014). Generative adversarial nets. NIPS 2014.
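The min-max game translates almost line by line into code. Below is a minimal sketch of one training step (assuming PyTorch; the two-layer MLPs and hyperparameters are illustrative), using the common non-saturating variant for the generator update:

```python
import torch
import torch.nn as nn

D_z, D_x = 64, 784
G = nn.Sequential(nn.Linear(D_z, 256), nn.ReLU(), nn.Linear(256, D_x), nn.Tanh())
D = nn.Sequential(nn.Linear(D_x, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def gan_step(x_real):
    b = x_real.size(0)
    ones, zeros = torch.ones(b, 1), torch.zeros(b, 1)
    # Discriminator step: tell real (label 1) from fake (label 0).
    x_fake = G(torch.randn(b, D_z)).detach()
    loss_d = bce(D(x_real), ones) + bce(D(x_fake), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator step: fool the discriminator (non-saturating variant).
    loss_g = bce(D(G(torch.randn(b, D_z))), ones)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```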


Generative modeling: GANs

Pros:
→ we don’t need to specify a likelihood function;
→ very flexible;
→ the loss function is trainable;
→ perfect for data simulation.

Cons:
→ we don’t know the distribution;
→ training is highly unstable (min-max objective);
→ missing-mode problem.

Generative modeling: VAEs

We assume a stochastic generator (decoder):

p(x|z),

and a prior over the latent space:

p(z).

Additionally, we use a variational posterior (encoder):

q(z|x).

How to train it? Using the log-likelihood function!

Variational inference for Latent Variable Models

Introduce a variational posterior q(z|x) and apply Jensen’s inequality:

log p(x) = log ∫ p(x|z) p(z) dz = log E_{q(z|x)}[ p(x|z) p(z) / q(z|x) ] ≥ E_{q(z|x)}[ log p(x|z) ] − KL( q(z|x) || p(z) ).

The first term is the reconstruction error; the second is a regularization term.

Parameterizing the decoder p(x|z) and the encoder q(z|x) with neural nets, together with the prior p(z):

= Variational Auto-Encoder + reparameterization trick.

Kingma, D. P., & Welling, M. (2013). Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114. (ICLR 2014)
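Put together, the decoder, encoder, prior and reparameterization trick fit in a few lines. A minimal sketch (assuming PyTorch, a Gaussian encoder, a Bernoulli decoder over binarized data; all sizes are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, D=784, M=40):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(D, 300), nn.ReLU(), nn.Linear(300, 2 * M))
        self.dec = nn.Sequential(nn.Linear(M, 300), nn.ReLU(), nn.Linear(300, D))

    def neg_elbo(self, x):                       # x in [0, 1], e.g. binarized MNIST
        mu, log_var = self.enc(x).chunk(2, dim=1)
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I).
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
        logits = self.dec(z)
        # ELBO = reconstruction error - KL(q(z|x) || p(z)), with p(z) = N(0, I).
        rec = -F.binary_cross_entropy_with_logits(logits, x, reduction='none').sum(1)
        kl = 0.5 * (torch.exp(log_var) + mu ** 2 - 1.0 - log_var).sum(1)
        return (kl - rec).mean()                 # minimize the negative ELBO

vae = VAE()
loss = vae.neg_elbo(torch.rand(16, 784).round())
```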

Variational Auto-Encoder (Encoding-Decoding)

Variational Auto-Encoder (Generating)

Variational Auto-Encoder: Extensions

Posterior: normalizing flows, volume-preserving flows, non-Gaussian distributions

Prior: autoregressive prior, objective prior, stick-breaking prior, VampPrior

Objective: Importance Weighted AE, Rényi divergence, Stein divergence

Decoder: fully-connected, ConvNets, PixelCNN, other

Tomczak, J. M., & Welling, M. (2016). Improving Variational Auto-Encoders using Householder Flow. NIPS Workshop 2016.
van den Berg, R., Hasenclever, L., Tomczak, J. M., & Welling, M. (2018). Sylvester Normalizing Flows for Variational Inference. UAI 2018.
Tomczak, J. M., & Welling, M. (2018). VAE with a VampPrior. arXiv preprint arXiv:1705.07120. (AISTATS 2018)
Davidson, T. R., Falorsi, L., De Cao, N., Kipf, T., & Tomczak, J. M. (2018). Hyperspherical Variational Auto-Encoders. UAI 2018.


Generative modeling: VAEs

Pros:
→ we know the distribution and can calculate the likelihood function;
→ we can encode an object in a low-dimensional manifold (compression);
→ training is stable;
→ no missing modes.

Cons:
→ we need to know the distribution;
→ we need a flexible encoder and prior;
→ blurry images (so far…).


Generative modeling: VAEs (extensions)

● Normalizing flows

○ Intro

○ Householder flow

○ Sylvester flow

● VampPrior

Conclusion

Generative modeling: the way to go to achieve AI.

Deep generative modeling: very successful in recent years in many domains.

Two main approaches: GANs and VAEs.

Next steps: video processing, better priors and decoders, geometric methods, …

Code on GitHub: https://github.com/jmtomczak

Webpage: http://jmtomczak.github.io/

Contact: [email protected]

The research conducted by Jakub M. Tomczak was funded by the European Commission within the Marie Skłodowska-Curie Individual Fellowship (Grant No. 702666, ”Deep learning and Bayesian inference for medical imaging”).


APPENDIX

Variational Auto-Encoder

Normalizing flows, volume-preserving flows, non-Gaussian distributions

Improving the posterior using Normalizing Flows

● Diagonal posterior: insufficient and inflexible.

● How to get a more flexible posterior?

➢ Apply a series of T invertible transformations z_T = f_T ∘ … ∘ f_1(z_0), with z_0 ∼ q(z_0|x).

Change of variables:

log q_T(z_T) = log q_0(z_0) − Σ_{t=1}^{T} log |det ∂f_t/∂z_{t−1}|.

● New objective:

E_{q_0(z_0|x)}[ log p(x|z_T) + log p(z_T) − log q_0(z_0|x) + Σ_{t=1}^{T} log |det ∂f_t/∂z_{t−1}| ].

Jacobian determinant: (i) general normalizing flow (|det J| is easy to calculate);

(ii) volume-preserving flow, i.e., |det J| = 1.

Rezende, D. J., & Mohamed, S. (2015). Variational inference with normalizing flows. ICML 2015.
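A minimal sketch of one such transformation (assuming PyTorch): Rezende & Mohamed’s planar flow z' = z + u·tanh(wᵀz + b), whose log-Jacobian-determinant is a one-line formula; the constraint that keeps the map invertible is omitted here for brevity:

```python
import torch
import torch.nn as nn

class PlanarFlow(nn.Module):
    """One step z' = z + u * tanh(w^T z + b); log|det J| = log|1 + u^T h'(a) w|.
    (The invertibility constraint on u is omitted in this sketch.)"""
    def __init__(self, M):
        super().__init__()
        self.u = nn.Parameter(0.01 * torch.randn(M))
        self.w = nn.Parameter(0.01 * torch.randn(M))
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, z, log_det):
        a = z @ self.w + self.b                                   # (batch,)
        z_new = z + self.u * torch.tanh(a).unsqueeze(1)
        psi = (1.0 - torch.tanh(a) ** 2).unsqueeze(1) * self.w    # h'(a) w
        log_det = log_det + torch.log(torch.abs(1.0 + psi @ self.u) + 1e-8)
        return z_new, log_det

# A series of T invertible transformations applied to z_0 ~ q(z_0|x):
T, M = 10, 40
flows = nn.ModuleList([PlanarFlow(M) for _ in range(T)])
z, log_det = torch.randn(16, M), torch.zeros(16)
for flow in flows:
    z, log_det = flow(z, log_det)
# log q_T(z_T) = log q_0(z_0) - log_det enters the new objective above.
```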

Volume-preserving flows: Householder Flow

● How to obtain a more flexible posterior and preserve |det J| = 1?

● Model a full-covariance posterior using orthogonal matrices.

● Proposition: apply a linear transformation:

z^(t) = U z^(t−1),

and since U is orthogonal, the Jacobian determinant is 1.

● Question: is it possible to model an orthogonal matrix efficiently? YES

Theorem. Any orthogonal matrix with the basis acting on the K-dimensional subspace can be expressed as a product of exactly K Householder transformations.

Sun, X., & Bischof, C. (1995). A basis-kernel representation of orthogonal matrices. SIAM Journal on Matrix Analysis and Applications, 16(4), 1184-1196.

In the Householder transformation we reflect a vector around a hyperplane defined by a Householder vector v:

z^(t) = ( I − 2 v vᵀ / ||v||² ) z^(t−1).

Very efficient: a small number of parameters, |det J| = 1, easy amortization (!).

Tomczak, J. M., & Welling, M. (2016). Improving Variational Inference with Householder Flow. arXiv preprint arXiv:1611.09630. NIPS Workshop on Bayesian Deep Learning 2016.
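The reflection itself is one line of code. A minimal sketch (assuming PyTorch; in the actual model the Householder vectors would be produced by the encoder network rather than sampled at random):

```python
import torch

def householder_step(z, v):
    """Reflect each row of z around the hyperplane defined by v:
    z' = (I - 2 v v^T / ||v||^2) z; the map is orthogonal, so |det J| = 1."""
    v = v / v.norm()
    return z - 2.0 * (z @ v).unsqueeze(1) * v

# K reflections turn a sample from the diagonal posterior into one with a
# full-covariance shape; in the model the v_k come from the encoder network.
z = torch.randn(16, 40)
for v in torch.randn(8, 40):        # K = 8 Householder vectors
    z = householder_step(z, v)
```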

Householder Flow (MNIST)

Method                  ELBO
VAE                     -93.9
VAE+HF (T=1)            -87.8
VAE+HF (T=10)           -87.7
VAE+NICE (T=10)         -88.6
VAE+NICE (T=80)         -87.2
VAE+HVI (T=1)           -91.7
VAE+HVI (T=8)           -88.3
VAE+PlanarFlow (T=10)   -87.5
VAE+PlanarFlow (T=80)   -85.1

(HF, NICE and HVI are volume-preserving flows; PlanarFlow is a non-linear flow.)

Tomczak, J. M., & Welling, M. (2016). Improving Variational Inference with Householder Flow. arXiv preprint arXiv:1611.09630. NIPS Workshop on Bayesian Deep Learning 2016.

General normalizing flow

van den Berg, R., Hasenclever, L., Tomczak, J. M., & Welling, M. (2018). Sylvester Normalizing Flows for Variational Inference. UAI 2018 (oral presentation).

Sylvester Flow

● Can we have a non-linear flow with a simple Jacobian determinant?

● Let us consider the following normalizing flow:

z' = z + A h(B z + b),

where A is D×M and B is M×D.

● How to calculate the Jacobian determinant efficiently?

➢ Sylvester’s determinant identity:

Theorem. For all A ∈ ℝ^{D×M} and B ∈ ℝ^{M×D}: det(I_D + A B) = det(I_M + B A).

van den Berg, R., Hasenclever, L., Tomczak, J. M., & Welling, M. (2018). Sylvester Normalizing Flows for Variational Inference. UAI 2018 (oral presentation).
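The identity is easy to check numerically, and it is exactly what makes the flow cheap: a D×D determinant collapses to an M×M one (a small sketch assuming PyTorch):

```python
import torch

D, M = 50, 8
A = torch.randn(D, M)
B = torch.randn(M, D)
# det(I_D + A B) = det(I_M + B A): a D x D determinant collapses to M x M.
sign_l, logdet_l = torch.slogdet(torch.eye(D) + A @ B)
sign_r, logdet_r = torch.slogdet(torch.eye(M) + B @ A)
print(bool(sign_l == sign_r), torch.allclose(logdet_l, logdet_r, atol=1e-4))
```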

Sylvester Flow

● How to use Sylvester’s determinant identity?

● How to parameterize the matrices A and B? Set A = Q R and B = R̃ Qᵀ, where R and R̃ are upper-triangular M×M matrices and Q is orthogonal, built from Householder matrices, a permutation matrix, or an orthogonalization procedure:

z' = z + Q R h( R̃ Qᵀ z + b ).

van den Berg, R., Hasenclever, L., Tomczak, J. M., & Welling, M. (2018). Sylvester Normalizing Flows for Variational Inference. UAI 2018 (oral presentation).

Sylvester Flow

● The Jacobian determinant:

det(∂z'/∂z) = det( I_M + diag( h'(R̃ Qᵀ z + b) ) R̃ R ).

● As a result, for a properly chosen h, the matrix inside the determinant is upper triangular and, thus, the determinant is easy to calculate.

van den Berg, R., Hasenclever, L., Tomczak, J. M., & Welling, M. (2018). Sylvester Normalizing Flows for Variational Inference. UAI 2018 (oral presentation).
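A minimal sketch of one Sylvester step with its log-Jacobian-determinant (assuming PyTorch and the orthogonal parameterization A = QR, B = R̃Qᵀ above; the function name and interface are illustrative, not the authors’ code):

```python
import torch

def sylvester_step(z, Q, R, R_tilde, b):
    """One Sylvester flow step z' = z + Q R h(R~ Q^T z + b), h = tanh.
    Q: (D, M) with orthonormal columns; R, R~: (M, M) upper triangular."""
    a = z @ Q @ R_tilde.t() + b                   # (batch, M)
    z_new = z + torch.tanh(a) @ R.t() @ Q.t()     # (batch, D)
    # det(I_M + diag(h'(a)) R~ R): R~ R is upper triangular, so the
    # determinant is a product over its diagonal.
    h_prime = 1.0 - torch.tanh(a) ** 2
    diag = torch.diagonal(R) * torch.diagonal(R_tilde)
    log_det = torch.log(torch.abs(1.0 + h_prime * diag) + 1e-8).sum(dim=1)
    return z_new, log_det

D, M = 40, 8
Q, _ = torch.linalg.qr(torch.randn(D, M))         # orthonormal columns
R, R_tilde = torch.triu(torch.randn(M, M)), torch.triu(torch.randn(M, M))
z, log_det = sylvester_step(torch.randn(16, D), Q, R, R_tilde, torch.zeros(M))
```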

Sylvester Flow (MNIST)

(figures: quantitative and qualitative results on MNIST)

van den Berg, R., Hasenclever, L., Tomczak, J. M., & Welling, M. (2018). Sylvester Normalizing Flows for Variational Inference. UAI 2018 (oral presentation).

Variational Auto-Encoder

Autoregressive prior, objective prior, stick-breaking prior, VampPrior

New Prior

● Let’s re-write the ELBO, averaging over the empirical distribution p_data(x) = (1/N) Σ_{n=1}^{N} δ(x − x_n).

● The prior p(z) is then matched against the aggregated posterior:

q(z) = (1/N) Σ_{n=1}^{N} q(z | x_n).

Tomczak, J. M., & Welling, M. (2018). VAE with a VampPrior. AISTATS 2018 (oral presentation, 14% of accepted papers).

New Prior (Variational Mixture of Posteriors Prior)

● We look for the optimal prior using the Lagrange function.

● The solution is simply the aggregated posterior:

p*(z) = (1/N) Σ_{n=1}^{N} q(z | x_n), which is infeasible for large N.

● We approximate it using K pseudo-inputs instead of N observations:

p_λ(z) = (1/K) Σ_{k=1}^{K} q(z | u_k),

where the pseudo-inputs u_k are trained from scratch by SGD.

Tomczak, J. M., & Welling, M. (2018). VAE with a VampPrior. AISTATS 2018.
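A minimal sketch of evaluating the VampPrior density (assuming PyTorch and a Gaussian encoder that returns (mu, log_var); the encoder interface is an assumption, not the paper’s code):

```python
import math
import torch
import torch.nn as nn

class VampPrior(nn.Module):
    """p(z) = (1/K) sum_k q(z | u_k): a mixture of the variational posterior
    evaluated at K learned pseudo-inputs u_k."""
    def __init__(self, encoder, K, x_dim):
        super().__init__()
        self.encoder = encoder                                # x -> (mu, log_var)
        self.u = nn.Parameter(0.01 * torch.randn(K, x_dim))   # pseudo-inputs, trained by SGD

    def log_prob(self, z):
        mu, log_var = self.encoder(self.u)                    # (K, M) each
        z = z.unsqueeze(1)                                    # (batch, 1, M)
        log_q = -0.5 * (math.log(2 * math.pi) + log_var
                        + (z - mu) ** 2 / log_var.exp()).sum(dim=-1)  # (batch, K)
        # log p(z) = logsumexp_k log q(z | u_k) - log K
        return torch.logsumexp(log_q, dim=1) - math.log(self.u.size(0))
```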

New Prior (Variational Mixture of Posteriors Prior)

(figures: learned pseudo-inputs)

Toy problem (MNIST): VAE with dim(z) = 2. Latent space representation + pseudo-inputs (black dots), for K = 10 and K = 100 vs. the standard prior.

Experiments

Tomczak, J. M., & Welling, M. (2018). VAE with a VampPrior. AISTATS 2018.