Generative Adversarial Nets: Applications and Extensions
Wangmeng Zuo
Vision Perception and Cognition Centre, Harbin Institute of Technology
LeCun, NIPS 2016
• Reinforcement learning (cherry)
• Supervised learning (chocolate)
• Unsupervised/predictive learning (cake)
• Generative adversarial nets (GAN)
For Most Application Tasks
• For most applications, GANs serve only as accessories to existing solutions.
• How to make latte art (i.e., improve the trainability of the generator)
• How to make a perfect latte coffee (i.e., incorporate GANs with other models to solve real problems)
GAN
Other Learning Models
Content
• Improve the trainability of GANs: an application perspective
• Theoretical solution
• Incorporating with other learning models
• Designing generator based on signal/image characteristics
• Applications
• Adversarial learning
• Low level vision
• Domain adaptation
• Image translation
Improve the trainability of GANs
Generative Adversarial Networks (Goodfellow et al., NIPS 2014)
• Update the generator to generate more realistic images
• Update the discriminator to distinguish synthetic images from real ones
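These alternating updates implement the minimax game of Goodfellow et al. (NIPS 2014):

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)]
+ \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]
```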
Mode Collapse
• D in inner loop: convergence to correct distribution
• G in inner loop: place all mass on most likely point
Let's first turn to supervised deep learning
• Unprecedented successes in:
• Image classification
• Image denoising, image super-resolution
• ...
• Can we exploit these achievements to improve GAN training?
• How to train a good generator (the latter half of image restoration?)
• How to train a good discriminator (classification?)
Auto-encoder
• Auto-encoder
• Denoising auto-encoder
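As a toy illustration of the denoising auto-encoder idea, a one-hidden-layer NumPy sketch: corrupt the input with Gaussian noise and reconstruct the clean signal. Dimensions, noise level, and learning rate are illustrative choices, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8)) @ rng.normal(size=(8, 16))  # data on an 8-D subspace of R^16

d_in, d_hid, lr = 16, 8, 1e-2
W1 = rng.normal(scale=0.1, size=(d_in, d_hid))   # encoder weights
W2 = rng.normal(scale=0.1, size=(d_hid, d_in))   # decoder weights

def forward(Xn):
    H = np.tanh(Xn @ W1)          # encoder: corrupted input -> code
    return H, H @ W2              # decoder: code -> reconstruction

losses = []
for _ in range(200):
    Xn = X + 0.1 * rng.normal(size=X.shape)   # corrupt the input (denoising AE)
    H, Xhat = forward(Xn)
    err = Xhat - X                            # reconstruct the *clean* target
    losses.append(float(np.mean(err ** 2)))
    # backprop through the two layers
    gW2 = H.T @ err / len(X)
    gH = err @ W2.T * (1 - H ** 2)
    gW1 = Xn.T @ gH / len(X)
    W1 -= lr * gW1
    W2 -= lr * gW2
```

Training on corrupted inputs while scoring against clean targets is what distinguishes the denoising variant from a plain auto-encoder.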
Variational AutoEncoder
• Variational AutoEncoder
• Relaxation of discrete variables
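For reference, the VAE (Kingma & Welling) is trained by maximizing the evidence lower bound (ELBO):

```latex
\log p_\theta(x) \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x)}[\log p_\theta(x \mid z)]
\;-\; D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big)
```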
VAE/GAN (Larsen et al., ICML 2016)
• VAE
• GAN
• VAE/GAN
Classifier Discriminator
• Na Lei, Kehua Su, Li Cui, Shing-Tung Yau, David Xianfeng Gu, A Geometric View of Optimal Transportation and Generative Model, arXiv 2017.
Nguyen et al., NIPS 2016
• Optimize the hidden code input (red bar) of a deep image generator network (DGN) to produce an image that highly activates h
InfoGAN (Chen et al., NIPS 2016)
• GAN
• InfoGAN (Chen et al., NIPS 2016)
• Input: z, c
• Interpretable and disentangled representations
• Easy to train
AC-GAN (Odena et al., ICML 2017)
• Class-conditional image synthesis with Auxiliary Classifier GANs
• The log-likelihood of the correct source:
• The log-likelihood of the correct class:
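The two AC-GAN losses from Odena et al. (ICML 2017), where S is the source (real/fake) and C the class label:

```latex
L_S = \mathbb{E}[\log P(S = \mathrm{real} \mid X_{\mathrm{real}})]
    + \mathbb{E}[\log P(S = \mathrm{fake} \mid X_{\mathrm{fake}})]
\qquad
L_C = \mathbb{E}[\log P(C = c \mid X_{\mathrm{real}})]
    + \mathbb{E}[\log P(C = c \mid X_{\mathrm{fake}})]
```

The discriminator is trained to maximize L_S + L_C, while the generator is trained to maximize L_C − L_S.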
Arbitrary Facial Attribute Editing
• One model for all tasks (He et al., arXiv 2018)
A Favorable Framework
• Auto-encoder
AttGAN
Extension for attribute style manipulation
Single task
Multi-task
Continuous attribute
Attribute Style Manipulation
Take home message
• Incorporating auto-encoder to improve the trainability of generator;
• Incorporating deep classification model to improve the trainability of discriminator
Let's then turn to the objective of GANs
• Image generation
• What are the characteristics of an image?
• Multi-scale property
• Manifold property
• What makes a high-quality image?
• Deep image prior
• Deep image quality assessment
LAPGANs (Denton et al., NIPS 2015)
Stack-GAN (Zhang et al., ICCV 2017)
• Stage-I GAN
• Stage-II GAN
Cascaded Refinement Networks (Chen & Koltun, ICCV 2017)
• CRN: does not rely on adversarial training
Manifold property (Benaim & Wolf, NIPS 2017)
• Distance Constraints
• Self-distance Constraints
Total Variation
• Deep feature visualization
• Total variation (TV) regularization
• Better (deep) image prior?
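A minimal sketch of the anisotropic TV regularizer often used to smooth feature visualizations (the `tv_loss` helper and the test images are illustrative):

```python
import numpy as np

def tv_loss(img: np.ndarray) -> float:
    """Anisotropic total variation: sum of absolute differences
    between vertically and horizontally adjacent pixels."""
    dh = np.abs(np.diff(img, axis=0)).sum()  # vertical neighbours
    dw = np.abs(np.diff(img, axis=1)).sum()  # horizontal neighbours
    return float(dh + dw)

flat = np.ones((4, 4))                        # constant image: zero TV
step = np.zeros((4, 4)); step[:, 2:] = 1.0    # one vertical edge: TV = 4
```

Adding this term to a visualization objective penalizes high-frequency noise while leaving piecewise-constant regions untouched.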
Insight from deep image denoising
• DnCNN for image denoising (Zhang et al., TIP 2017)
• For a noisy image y = x + v: CNN(y; Θ) ≈ y − x, so x̂ = y − CNN(y; Θ) and ‖CNN(y; Θ)‖² ≈ mnσ²
• For a clean image: ‖CNN(y; Θ)‖² ≈ 0
• Perceptual regularization (Li et al., arXiv 2016): penalize ‖CNN(y; Θ)‖²
Deep image prior (Ulyanov et al., CVPR 2018)
• Energy
• Image restoration
• A randomly-initialized neural network can be used as a handcrafted prior
• The structure of a generator network is sufficient to capture a great deal of low-level image statistics prior to any learning
Deep Features as a Perceptual Metric (Zhang et al., CVPR 2018)
• Perceptual loss
• Deep features outperform all previous metrics by huge margins.
• This result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised).
• Deep Non-reference Image Quality Assessment?
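The perceptual metric of Zhang et al. (CVPR 2018) computes a weighted distance between unit-normalized deep features ŷ at each layer l, with learned channel weights w_l:

```latex
d(x, x_0) = \sum_l \frac{1}{H_l W_l} \sum_{h,w}
  \big\| w_l \odot (\hat{y}^{\,l}_{hw} - \hat{y}^{\,l}_{0hw}) \big\|_2^2
```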
Take home message
• Exploiting image property to improve GANs
• Developing deep models/GANs for better revealing image priors/quality
• Object-oriented design
Applications
Adversarial learning (Szegedy et al., ICLR 2014)
• Deep neural networks learn input-output mappings that are significantly discontinuous.
• We can cause the network to misclassify an image by applying a hardly perceptible perturbation, found by maximizing the network's prediction error.
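Such a perturbation can be sketched with the fast gradient sign method (FGSM, a standard attack, not the exact procedure of this paper) on a toy logistic "network"; the classifier and eps are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
w, b = rng.normal(size=16), 0.0          # toy linear classifier

def predict(x):
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))   # P(y = 1 | x)

def fgsm(x, y, eps):
    # cross-entropy loss L = -[y log p + (1 - y) log(1 - p)];
    # its gradient w.r.t. the input is (p - y) * w
    grad = (predict(x) - y) * w
    return x + eps * np.sign(grad)       # step that maximizes the loss

x = rng.normal(size=16)
y = 1.0 if predict(x) > 0.5 else 0.0     # use the clean prediction as "label"
x_adv = fgsm(x, y, eps=0.5)              # perturbation of bounded max-norm
```

Because each coordinate moves only by eps, the perturbation has a small max-norm yet pushes the prediction away from the original label.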
2018-4-22
Intriguing properties of neural networks (Szegedy et al., ICLR 2014)
Deep Neural Networks are Easily Fooled (Nguyen et al., CVPR 2015)
> 99.6% confidence
Adversarial Attacks and Defences Competition (Kurakin et al., arXiv 2018)
• 1st place in defense track: team TsAIL
• Team members: Yinpeng Dong, Fangzhou Liao, Ming Liang, Tianyu Pang, Jun Zhu and Xiaolin Hu.
• Solution: Denoising U-net
Adversarially-augmented training (Simon-Gabriel et al., Arxiv 2018)
• Adversarially-augmented training
• Replacing strided by average-pooling layers
• Increase generalization performance
Object detection: A-Fast-RCNN (Wang et al., CVPR 2017)
Visual tracking
• CVPR 2018
• VITAL: VIsual Tracking via Adversarial Learning
• SINT++: Robust Visual Tracking via Adversarial Hard Positive Generation
Low level vision
• SRGAN for super-resolution
DSLR-Quality Photos on Mobile Devices (Ignatov et al., ICCV 2017)
• Color loss
• Texture loss
• Content loss
• TV regularizer
• Discriminator
WESPE: Weakly Supervised Photo Enhancer (NTIRE 2018)
• Only require two distinct datasets
Image inpainting: more freedom and non-uniqueness
Context-encoders (Pathak et al., 2016)
• The first key: Auto-encoder
Problem with auto-encoder
• Information bottleneck
Adversarial loss is helpful
• But remains limited ...
Analyzing U-Net (Ronneberger et al., 2015)
• Fine-details
• Unfortunately, it also does not work for inpainting
Return to traditional patch-based inpainting
• Patch processing order
• PatchMatch
CNN and Patch-based Solutions are Complementary
• CNN-based solution
• Poor texture
• Better structure
• Patch-based solution
• Better details
• Poor structure
• Can we combine them in an end-to-end learning framework?
Context-encoders
CNN architecture
Objective and learning
• Objective
• Learning
Results
• Speed
• MNPS: 40 min -> 40 s
• Ours: 82 ms
• PSNR
Random mask
Real images
Guided face enhancement (Li et al., arXiv 2018)
Film Restoration, Smartphones
Challenges
• 1. Blind enhancement: the degradation model is sophisticated and unknown
• blur, downsampling, noise, compression
• 2. The guided and degraded images are of different pose, expression and illumination
Challenge 1
• Train on realistic synthetic degraded images, test on real degraded image
• The degradation model:
Challenge 2: GFRNet
Model and losses for WarpNet
• Landmark loss
• TV regularization
Model objective
• Reconstruction loss
• Adversarial loss
• Objective
Appearance Flow
Results
DnCNN | ARCNN | DeblurGAN | Ours
More images
Video
Domain Adaptation
• Domain adaptation: learning, from a (labeled) source data distribution, a well-performing model for a different (but related) (labeled or unlabeled) target data distribution (Wikipedia)
• Three categories:
• Supervised domain adaptation
• Semi-supervised domain adaptation
• Unsupervised domain adaptation
The Future of Real-Time SLAM (ICCV 2015 Workshop)
• Panel discussion: Deep Learning vs SLAM
• Newcombe's Proposal: Use SLAM to fuel Deep Learning
• Today's SLAM systems are large-scale "correspondence engines" which can be used to generate large-scale datasets
• Graphics for CNN
The need of domain adaptation
Synthetic:
Real:
Domain Transfer
Unsupervised domain adaptation
• Only the class labels of the source samples are known; all class labels of the target samples are unknown.
• Goal: a feature extractor f and a classifier c
• P(f(xs)) = P(f(xt))
• Better classification performance on xs
• Key issue: discrepancy metric between two complex distributions
• D(P(f(xs)), P(f(xt)))
Weighted MMD
• Let
• Define
• Weighted MMD
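For reference, a sketch of the standard (unweighted) MMD² estimator with an RBF kernel; the weighted MMD of the talk additionally reweights source samples by estimated target class priors. Bandwidth and sample sizes here are illustrative:

```python
import numpy as np

def mmd2_rbf(X, Y, sigma=1.0):
    """Biased MMD^2 estimator between samples X and Y with an RBF kernel."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
same = mmd2_rbf(rng.normal(size=(100, 2)), rng.normal(size=(100, 2)))         # ~0
diff = mmd2_rbf(rng.normal(size=(100, 2)), rng.normal(size=(100, 2)) + 3.0)   # large
```

MMD is near zero when source and target feature distributions match, which is exactly the alignment objective D(P(f(xs)), P(f(xt))) above.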
Office-10+Caltech-10
Unsupervised Domain Adaptation by Backpropagation (Ganin & Lempitsky, ICML 2015)
Simultaneous Deep Transfer Across Domains and Tasks (Tzeng et al., ICCV 2015)
• “maximally confuse” the two domains
• uniform distribution over domain labels
Domain cocktail network (Xu et al., CVPR 2018)
SimGAN (CVPR 2017)
• Learning from Simulated and Unsupervised Images through Adversarial Training (Shrivastava et al., arXiv 2016)
• Realism loss
• Self-regularization
• SimGAN is also a pixel-level domain adaptation method
Unsupervised Pixel–Level Domain Adaptation (CVPR 2017)
Image translation (Zhu et al., CVPR 2017)
Pix2pix: supervised image translation (Isola et al., CVPR 2017)
• Positive pair: (input, ground truth)
• Negative pair: (input, synthesis)
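The pix2pix objective from Isola et al. (CVPR 2017) combines a conditional GAN loss, where the discriminator sees the input paired with either the ground truth or the synthesis, with an L1 reconstruction term:

```latex
\mathcal{L}_{\mathrm{cGAN}}(G, D) =
  \mathbb{E}_{x,y}[\log D(x, y)]
+ \mathbb{E}_{x,z}[\log(1 - D(x, G(x, z)))],
\qquad
\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\big[\| y - G(x, z) \|_1\big]
```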
Learning Residual Images (Shen & Liu, CVPR 2017)
Cycle-Consistent Supervision (Zhu et al., ICCV 2017)
• Cycle consistency loss
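The CycleGAN cycle consistency loss, for translators G: X -> Y and F: Y -> X, as given in the cited paper:

```latex
\mathcal{L}_{\mathrm{cyc}}(G, F) =
  \mathbb{E}_{x}\big[\| F(G(x)) - x \|_1\big]
+ \mathbb{E}_{y}\big[\| G(F(y)) - y \|_1\big]
```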
BicycleGAN: Multimodal Image-to-Image Translation (Zhu et al., NIPS 2017)
Suggestion
• Problem-oriented
• Generator
• Discriminator+
References
• I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, NIPS 2014.
• D.P. Kingma, M. Welling, Auto-encoding variational bayes, arXiv:1312.6114, 2013.
• N. Lei, K. Su, L. Cui, S.-T. Yau, D. X. Gu, A Geometric View of Optimal Transportation and Generative Model, arXiv 2017.
• A.B.L. Larsen, S. K. Sønderby, H. Larochelle, O. Winther, Autoencoding beyond pixels using a learned similarity metric, ICML 2016.
• A. Nguyen, A. Dosovitskiy, J. Yosinski, T. Brox, and J. Clune. Synthesizing the preferred inputs for neurons in neural networks via deep generator networks, NIPS 2016.
• X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, P. Abbeel, InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets, NIPS 2016.
• A. Odena, C. Olah, J. Shlens, Conditional image synthesis with auxiliary classifier GANs, ICML 2017.
• Z. He, W. Zuo, M. Kan, S. Shan, X. Chen, Arbitrary Facial Attribute Editing: Only Change What You Want, arXiv:1711.10678, 2017.
• E.L. Denton, S. Chintala, R. Fergus, Deep generative image models using a laplacian pyramid of adversarial networks, NIPS 2015.
• H. Zhang, T. Xu, H. Li, S. Zhang, X. Wang, X. Huang, D Metaxas, StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks, ICCV 2017.
• Q. Chen, V. Koltun, Photographic image synthesis with cascaded refinement networks, ICCV 2017.
• S. Benaim, L. Wolf, One-Sided Unsupervised Domain Mapping, NIPS 2017.
• K. Zhang, W. Zuo, Y. Chen, D. Meng, L. Zhang, Beyond a Gaussian denoiser: Residual learning of deep cnn for image denoising, IEEE T-IP 2017.
• M. Li, W. Zuo, D. Zhang, Deep Identity-aware Transfer of Facial Attributes, arXiv:1610.05586.
• D. Ulyanov, A. Vedaldi, V. Lempitsky, Deep Image Prior, CVPR 2018.
• R. Zhang, P. Isola, A. A. Efros, E. Shechtman, O. Wang, The Unreasonable Effectiveness of Deep Features as a Perceptual Metric, CVPR 2018.
• C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, R. Fergus, Intriguing properties of neural networks, ICLR 2014.
• A. Nguyen, J. Yosinski, J. Clune, Deep neural networks are easily fooled: High confidence predictions for unrecognizable images, CVPR 2015.
• C.-J. Simon-Gabriel, Y. Ollivier, B. Schölkopf, L. Bottou, D. Lopez-Paz, Adversarial Vulnerability of Neural Networks Increases With Input Dimension, arXiv:1802.01421.
• X. Wang, A. Shrivastava, A. Gupta, A-Fast-RCNN: Hard positive generation via adversary for object detection, CVPR 2017.
• C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, W. Shi, Photo-realistic single image super-resolution using a generative adversarial network, CVPR 2017.
• A. Ignatov, N. Kobyshev, R. Timofte, K. Vanhoey, L. Van Gool, DSLR-quality photos on mobile devices with deep convolutional networks, ICCV 2017.
• A. Ignatov, N. Kobyshev, R. Timofte, K. Vanhoey, L. Van Gool, WESPE: Weakly supervised photo enhancer for digital cameras, NTIRE 2018.
• D. Pathak,P. Krahenbuhl, J. Donahue, T. Darrell, A.A. Efros, Context encoders: Feature learning by inpainting, CVPR 2016.
• Z. Yan, X. Li, M. Li, W. Zuo, S. Shan, Shift-Net: Image Inpainting via Deep Feature Rearrangement, arXiv:1801.09392.
• X. Li, M. Liu, Y. Ye, W. Zuo, L. Lin, R. Yang, Learning Warped Guidance for Blind Face Restoration, arXiv:1804.04829.
• H. Yan, Y. Ding, P. Li, Q. Wang, Y. Xu, W. Zuo, Mind the class weight bias: Weighted maximum mean discrepancy for unsupervised domain adaptation, CVPR 2017.
• Y. Ganin, V. Lempitsky, Unsupervised Domain Adaptation by Backpropagation, ICML 2015.
• E. Tzeng, J. Hoffman, T. Darrell, Simultaneous Deep Transfer Across Domains and Tasks, ICCV 2015.
• R. Xu, Z. Chen, W. Zuo, J. Yan, L. Lin, Deep Cocktail Network: Multi-source Unsupervised Domain Adaptation with Category Shift, CVPR 2018.
• A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang, R. Webb, Learning from simulated and unsupervised images through adversarial training, CVPR 2017.
• K. Bousmalis, N. Silberman, D. Dohan, D. Erhan, D. Krishnan, Unsupervised pixel-level domain adaptation with generative adversarial networks, CVPR 2017.
• P. Isola, J.Y. Zhu, T. Zhou, A.A. Efros, Image-to-image translation with conditional adversarial networks, CVPR 2017.
• W. Shen, R. Liu, Learning residual images for face attribute manipulation, CVPR 2017.
• J.Y. Zhu, T. Park, P. Isola, A.A. Efros, Unpaired image-to-image translation using cycle-consistent adversarial networks, ICCV 2017.
• J.Y. Zhu, R. Zhang, D. Pathak, T. Darrell, A.A. Efros, O Wang, E Shechtman, Toward Multimodal Image-to-Image Translation, NIPS 2017.