Generative Adversarial Networks - Machine Learningmachinelearning.math.rs/Jordanski-GAN.pdf•Text...
Transcript of Generative Adversarial Networks - Machine Learningmachinelearning.math.rs/Jordanski-GAN.pdf•Text...
Generative AdversarialNetworks
Miloš Jordanski
Generative models
• Training data 𝐷 = 𝑥1, … , 𝑥𝑁 ~𝑝𝑑𝑎𝑡𝑎(𝑥)
• Result: distribution 𝑝𝑚𝑜𝑑𝑒𝑙(𝑥) ≅ 𝑝𝑑𝑎𝑡𝑎(𝑥)
Generative models
• 𝑝𝑚𝑜𝑑𝑒𝑙(𝑥) directly
• Generate samples from 𝑝𝑚𝑜𝑑𝑒𝑙(𝑥)
Applications
• Additional training data• Missing values• Semi-supervised learning• Reinforcement learning• Multiple correct answers• Text-to-Image Synthesis• Learn useful embeddings• ...
Generative Adversarial Networks
Models with latent space
• x - observed variables• z – latent variables
DCGAN
GAN - architecture
GAN - discriminator
𝐽(𝐷) 𝜃 𝐷 , 𝜃 𝐺 = −12𝐸𝑥~𝑝𝑑𝑎𝑡𝑎𝑙𝑜𝑔𝐷(𝑥) −
12𝐸𝑧log(1 − 𝐷 𝐺(𝑧) )
• Optimal discriminator:
𝐷∗ 𝑥 =𝑝𝑑𝑎𝑡𝑎(𝑥)
𝑝𝑚𝑜𝑑𝑒𝑙(𝑥) + 𝑝𝑑𝑎𝑡𝑎(𝑥)
GAN – Choice of divergence
• By varying 𝜃 we can make 𝑃𝜃 distribution arbitrarily “close” to 𝑃𝑑𝑎𝑡𝑎• How to define “close”?
GAN – Choice of divergence
• Kullback-Leibler divergence: 𝐾𝐿(ℙ| ℚ = 𝔼𝑥~ℙ[𝑙𝑜𝑔
ℙ(𝑥)ℚ(𝑥)
]
• Jensen Shannon divergence:
𝐽𝑆(ℙ| ℚ =12𝐾𝐿(ℙ ||
12ℙ + ℚ ) +
12𝐾𝐿(ℚ||
12(ℙ + ℚ))
GAN – Choice of divergence
𝐾𝐿(ℙ| ℚ = ℙ(𝑥)𝑙𝑜𝑔 ℙ(𝑥)ℚ(𝑥)
𝐾𝐿(ℚ| ℙ = ℚ(𝑥)𝑙𝑜𝑔 ℚ(𝑥)ℙ(𝑥)
Zero-sum game
• N players• Payoff matrix
• Nash equilibrium of a game
Generator cost function
𝐽(𝐺) 𝜃 𝐷 , 𝜃 𝐺 = −𝐽(𝐷) 𝜃 𝐷 , 𝜃 𝐺
• Minimax game:𝑉(𝜃 𝐷 , 𝜃 𝐺 ) = −𝐽(𝐷) 𝜃 𝐷 , 𝜃 𝐺
• Solution:𝜃 𝐷 ∗ = 𝑎𝑟𝑔𝑚𝑖𝑛max
𝜃 𝐺 𝜃 𝐷𝑉(𝜃 𝐷 , 𝜃 𝐺 )
In practice:
𝐽(𝐺) 𝜃 𝐷 , 𝜃 𝐺 = − 12𝐸𝑧log(𝐷 𝐺(𝑧) )
Generator Cost function
• Minimax: 𝐽(𝐺)= 12𝐸𝑥~𝑝𝑑𝑎𝑡𝑎𝑙𝑜𝑔𝐷 𝑥 + 1
2𝐸𝑧log(1 − 𝐷 𝐺(𝑧) )
• Heuristic: 𝐽(𝐺) = − 12𝐸𝑧log(𝐷 𝐺(𝑧) )
• MLE: 𝐽(𝐺) = − 12𝐸𝑧𝑒(𝜎
−1(𝐷(𝐺(𝑧))))
GAN - Training
• Simultaneous stochastic gradient descent
• Discriminator updates 𝜃(𝐷) to reduce 𝐽(𝐷)
• Generator updates 𝜃(𝐺) to reduce 𝐽(𝐺)
• 𝐽(𝐺) does not make reference to the training data directly
GAN - Tricks
• 𝑥 ∈ [−1, 1]
• Using labels
• One-side label smoothing
• Batch normalization
• Running more steps of one player
Generative Adversarial Networks
• Advantages:
• Sample generation in a single step (does not depend on dimensionality of X)
• Does not use Markov chain
• Does not use variational inference
• Generation function has very few restrictions
• Disadvantages:
• Hard to train
GAN - results
Semi-supervised learning with GAN
GAN – Latent Space Arithmetic
GAN – another perspective
• Cooperative instead of adversarial
• Learn cost function
• Reinforcement Learning
• Actor-Critic
• …
THANKS!