Meta-Amortized Variational Inference and...

28
Meta-Amoized Variational Inference and Learning Kristy Choi CS236: December 4th, 2019

Transcript of Meta-Amortized Variational Inference and...

Page 1: Meta-Amortized Variational Inference and Learningcs236.stanford.edu/assets/slides/meta_amortized.pdf · Meta-Amortized Variational Inference and Learning Kristy Choi CS236: December

Meta-Amortized Variational Inference and Learning

Kristy Choi

CS236: December 4th, 2019

Page 2: Meta-Amortized Variational Inference and Learningcs236.stanford.edu/assets/slides/meta_amortized.pdf · Meta-Amortized Variational Inference and Learning Kristy Choi CS236: December

Probabilistic Inference

Probabilistic inference is a particular way of viewing the world:

+

Observations

=

Prior belief Updated (posterior) beliefTypically the beliefs are “hidden” (unobserved), and we want to

model them using latent variables.

Page 3: Meta-Amortized Variational Inference and Learningcs236.stanford.edu/assets/slides/meta_amortized.pdf · Meta-Amortized Variational Inference and Learning Kristy Choi CS236: December

Probabilistic Inference

Many machine learning applications can be cast as probabilistic inference queries:

BioinformaticsMedical diagnosis Human cognition Computer vision

Page 4: Meta-Amortized Variational Inference and Learningcs236.stanford.edu/assets/slides/meta_amortized.pdf · Meta-Amortized Variational Inference and Learning Kristy Choi CS236: December

N

Medical Diagnosis Example

observed symptoms

x

identity of disease

z

Goal: Infer identity of disease given a set of observed symptoms from a patient population.

Page 5: Meta-Amortized Variational Inference and Learningcs236.stanford.edu/assets/slides/meta_amortized.pdf · Meta-Amortized Variational Inference and Learning Kristy Choi CS236: December

N

Exact Inference

x

z

symptoms

disease

Marginal is intractable, we can’t compute this even if we want to

family of tractable distributions

intractable integral

Page 6: Meta-Amortized Variational Inference and Learningcs236.stanford.edu/assets/slides/meta_amortized.pdf · Meta-Amortized Variational Inference and Learning Kristy Choi CS236: December

N

Approximate Variational Inference

x

z

symptoms

disease

→ turned an intractable inference problem into an optimization problem

dependence on x: learn new q per data point

Page 7: Meta-Amortized Variational Inference and Learningcs236.stanford.edu/assets/slides/meta_amortized.pdf · Meta-Amortized Variational Inference and Learning Kristy Choi CS236: December

N

Amortized Variational Inference

x

z

symptoms

disease

→ scalability: VAE formulation

deterministic mapping predicts z as a function of x

Page 8: Meta-Amortized Variational Inference and Learningcs236.stanford.edu/assets/slides/meta_amortized.pdf · Meta-Amortized Variational Inference and Learning Kristy Choi CS236: December

Multiple Patient Populations

Nx

z

symptoms

disease

Nx

z

symptoms

disease

Nx

z

symptoms

disease

Nx

z

symptoms

disease

Nx

z

symptoms

disease

Doctor is equivalently re-learning how to diagnose an illness :/

Page 9: Meta-Amortized Variational Inference and Learningcs236.stanford.edu/assets/slides/meta_amortized.pdf · Meta-Amortized Variational Inference and Learning Kristy Choi CS236: December

Multiple Patient Populations

Share statistical strength across different populations to infer latent representations that transfer to similar, but previously unseen populations (distributions)

Page 10: Meta-Amortized Variational Inference and Learningcs236.stanford.edu/assets/slides/meta_amortized.pdf · Meta-Amortized Variational Inference and Learning Kristy Choi CS236: December

(Naive) Meta-Amortized Variational Inference

meta-distribution

Page 11: Meta-Amortized Variational Inference and Learningcs236.stanford.edu/assets/slides/meta_amortized.pdf · Meta-Amortized Variational Inference and Learning Kristy Choi CS236: December

Meta-Amortized Variational Inference

meta-distribution

shared meta-inference network

Page 12: Meta-Amortized Variational Inference and Learningcs236.stanford.edu/assets/slides/meta_amortized.pdf · Meta-Amortized Variational Inference and Learning Kristy Choi CS236: December

Meta-Inference Network

● Meta-inference model takes in 2 inputs:○ Marginal ○ Query point

● Mapping● Parameterize encoder with neural network● Dataset : represent each marginal distribution as a

set of samples

Page 13: Meta-Amortized Variational Inference and Learningcs236.stanford.edu/assets/slides/meta_amortized.pdf · Meta-Amortized Variational Inference and Learning Kristy Choi CS236: December

In Practice: MetaVAE

Summary network ingests samples from each dataset

Aggregation network performs inference

summary network

samples

query

aggregation network

decoder_i

Page 14: Meta-Amortized Variational Inference and Learningcs236.stanford.edu/assets/slides/meta_amortized.pdf · Meta-Amortized Variational Inference and Learning Kristy Choi CS236: December

Related Work

x

z

N

VAE MetaVAE

Tx

z

TT

xD

D

x != xD T

D

T

c

xT

zxD

Neural Statistician

xD = xT

D

T

c

xT

zxD

Variational Homoencoder (VHE)

xD != xT

Avoid restrictive assumption on global prior over datasets p(c)

Page 15: Meta-Amortized Variational Inference and Learningcs236.stanford.edu/assets/slides/meta_amortized.pdf · Meta-Amortized Variational Inference and Learning Kristy Choi CS236: December

Intuition: Clustering Mixtures of Gaussians

Learns how to cluster: for 50 datasets, MetaVAE achieves 9.9% clustering error, while VAE gets 27.9%

z = 0 z = 1

Page 16: Meta-Amortized Variational Inference and Learningcs236.stanford.edu/assets/slides/meta_amortized.pdf · Meta-Amortized Variational Inference and Learning Kristy Choi CS236: December

Learning Invariant Representations● Apply various

transformations● Amortize over subsets

of transformations, learn representations

● Test representations on held-out transformations (classification)

Page 17: Meta-Amortized Variational Inference and Learningcs236.stanford.edu/assets/slides/meta_amortized.pdf · Meta-Amortized Variational Inference and Learning Kristy Choi CS236: December

Invariance Experiment Results

MetaVAE representations consistently outperform NS/VHE on MNIST + NORB

datasets

Page 18: Meta-Amortized Variational Inference and Learningcs236.stanford.edu/assets/slides/meta_amortized.pdf · Meta-Amortized Variational Inference and Learning Kristy Choi CS236: December

Analysis

MetaVAE representations tend not to change very much within a family of transformations

that it was amortized over, as desired.

Page 19: Meta-Amortized Variational Inference and Learningcs236.stanford.edu/assets/slides/meta_amortized.pdf · Meta-Amortized Variational Inference and Learning Kristy Choi CS236: December

Conclusion

● Limitations○ No sampling○ Semi-parametric○ Arbitrary dataset construction

● Developed an algorithm for a family of probabilistic models: meta-amortized inference paradigm

● MetaVAE learns transferrable representations that generalize well across similar data distributions in downstream tasks

● Paper: https://arxiv.org/pdf/1902.01950.pdf

Page 20: Meta-Amortized Variational Inference and Learningcs236.stanford.edu/assets/slides/meta_amortized.pdf · Meta-Amortized Variational Inference and Learning Kristy Choi CS236: December

Encoding Musical Style with Transformer Autoencoders

Page 21: Meta-Amortized Variational Inference and Learningcs236.stanford.edu/assets/slides/meta_amortized.pdf · Meta-Amortized Variational Inference and Learning Kristy Choi CS236: December

Generative Models for Music

● Generating music is a challenging problem, as music contains structure at multiple timescales. ○ Periodicity, repetition

● Coherence in style and rhythm across (long) time periods!

raw audio: WaveNet, GANs, etc.symbolic: RNNs,

LSTMs, etc.

Page 22: Meta-Amortized Variational Inference and Learningcs236.stanford.edu/assets/slides/meta_amortized.pdf · Meta-Amortized Variational Inference and Learning Kristy Choi CS236: December

Music Transformer

● Symbolic: event-based representation that allows for generation of expressive performances (without generating a score)

● Current SOTA in music generation○ Can generate music over 60 seconds in length

● Attention-based○ Replaces self-attention with relative attention

Page 23: Meta-Amortized Variational Inference and Learningcs236.stanford.edu/assets/slides/meta_amortized.pdf · Meta-Amortized Variational Inference and Learning Kristy Choi CS236: December

What We Want

● Control music generation using either (1) performance or (2) melody + perf as conditioning

● Generate pieces that sound similar in style to input pieces!

Page 24: Meta-Amortized Variational Inference and Learningcs236.stanford.edu/assets/slides/meta_amortized.pdf · Meta-Amortized Variational Inference and Learning Kristy Choi CS236: December

Transformer Autoencoder 1. Sum

2. Concatenation

3. Tile (temporal dimension)

Page 25: Meta-Amortized Variational Inference and Learningcs236.stanford.edu/assets/slides/meta_amortized.pdf · Meta-Amortized Variational Inference and Learning Kristy Choi CS236: December

Quantitative Metrics

Transformer autoencoder (both performance-only and melody & perf) outperform baselines in generating similar pieces!

Page 26: Meta-Amortized Variational Inference and Learningcs236.stanford.edu/assets/slides/meta_amortized.pdf · Meta-Amortized Variational Inference and Learning Kristy Choi CS236: December

Samples

Conditioning Performance

Generated Performance:“Twinkle, Twinkle” in the style of the above performance

Twinkle, Twinkle melody

Conditioning Performance

Generated Performance:“Claire de Lune” in the style of the above performance

Claire de Lune

Page 27: Meta-Amortized Variational Inference and Learningcs236.stanford.edu/assets/slides/meta_amortized.pdf · Meta-Amortized Variational Inference and Learning Kristy Choi CS236: December

Conclusion

● Developed a method for controllable generation with high-level controls for music○ Demonstrated efficacy both quantitatively and

through qualitative listening tests● Thanks!

○ Stanford: Mike Wu, Noah Goodman, Stefano Ermon

○ Magenta @ Google Brain: Jesse Engel, Ian Simon, Curtis “Fjord” Hawthorne, Monica Dinculescu

Page 28: Meta-Amortized Variational Inference and Learningcs236.stanford.edu/assets/slides/meta_amortized.pdf · Meta-Amortized Variational Inference and Learning Kristy Choi CS236: December

References1. Edwards, H., and Storkey, A. Towards a neural statistician. 20162. Hewitt, L. B., Nye, M. I.; Gane, A.; Jaakkola, T., and Tenenbaum, J.B. Variational Homoencoder. 20183. Kingma, D.P., and Welling, M. Auto-encoding variational bayes. 20134. Gershman, S., and Goodman, N. Amortized inference in probabilistic reasoning. 20145. Jordan, M. I.; Ghahramani, Z.; Jaakkola, T.S.; and Saul, L.K. An introduction to variational methods for graphical models. 19996. Blei, D. M.; Kuckelbir, A.; and McAuliffe, J.D. Variational inference: a review for statisticians. 20177. Huang, C.Z.; Vaswani, A., Uskoreit, J., Shazeer, N., Simon, I., Hawthorne, C., Dai, A. M., Hoffman, M. D., Dinculescu, M., Eck, D.

Music Transformer. 20198. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., Polosukhin, I. Attention is all you need. 20179. Shaw, P., Uszkoreit, J., Vaswani, A. Self-Attention with relative position representations. 2018

10. https://magenta.tensorflow.org/music-transformer11. Engel, J., Agrawal, K. K., Chen, S., Gulrajani, I., Donahue, C., Roberts, A. Adversarial Neural Audio Synthesis. 201912. Van den Oord, A., Dieleman, S., Zen, H., Simonyan, K., Vinyals, O., Graves, A., Kalchbrenner, N., Senior, A., Kavukcuoglu, K.

WaveNet: A Generative Model for Raw Audio. 201613. Kalchbrenner, N., Elsen, E., Simonyan, K., Noury, S., Casagrande, N., Lockhart, E., Stimberg, F., van den Oord, A., Dieleman, S.,

Kavukcuoglu, K. Efficient Neural Audio Synthesis. 2018