Generative Adversarial Nets: Applications and Extensions
Wangmeng Zuo
Vision Perception and Cognition Centre, Harbin Institute of Technology
LeCun, NIPS 2016
• Reinforcement learning (cherry)
• Supervised learning (chocolate)
• Unsupervised/predictive learning (cake)
• Generative adversarial nets (GAN)
For Most Application Tasks
• For most applications, GANs serve only as accessories to existing solutions.
• How to make latte art (i.e., improve the trainability of the generator)
• How to make a perfect latte coffee (i.e., incorporate GANs with other models to solve real problems)
GAN
Other Learning Models
Content
• Improve the trainability of GANs: an application perspective
• Theoretical solution
• Incorporating with other learning models
• Designing generator based on signal/image characteristics
• Applications
• Adversarial learning
• Low level vision
• Domain adaptation
• Image translation
Improve the trainability of GANs
Generative Adversarial Networks (Goodfellow et al., NIPS 2014)
• Update the generator to generate more realistic images
• Update the discriminator to distinguish synthetic images from real ones
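These alternating updates implement the minimax game of Goodfellow et al. (NIPS 2014):

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)]
+ \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]
```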
Mode Collapse
• D in inner loop: convergence to correct distribution
• G in inner loop: place all mass on most likely point
Let's first turn to supervised deep learning
• Unprecedented successes in:
• Image classification
• Image denoising, image super-resolution
• ...
• Can we exploit these achievements to improve GAN training?
• How to train a good generator (the latter half of image restoration?)
• How to train a good discriminator (classification?)
Auto-encoder
• Auto-encoder
• Denoising auto-encoder
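As a toy illustration of the denoising auto-encoder idea, a one-hidden-layer NumPy sketch: corrupt the input with Gaussian noise and reconstruct the clean signal. Dimensions, noise level, and learning rate are illustrative choices, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8)) @ rng.normal(size=(8, 16))  # data on an 8-D subspace of R^16

d_in, d_hid, lr = 16, 8, 1e-2
W1 = rng.normal(scale=0.1, size=(d_in, d_hid))   # encoder weights
W2 = rng.normal(scale=0.1, size=(d_hid, d_in))   # decoder weights

def forward(Xn):
    H = np.tanh(Xn @ W1)          # encoder: corrupted input -> code
    return H, H @ W2              # decoder: code -> reconstruction

losses = []
for _ in range(200):
    Xn = X + 0.1 * rng.normal(size=X.shape)   # corrupt the input (denoising AE)
    H, Xhat = forward(Xn)
    err = Xhat - X                            # reconstruct the *clean* target
    losses.append(float(np.mean(err ** 2)))
    # backprop through the two layers
    gW2 = H.T @ err / len(X)
    gH = err @ W2.T * (1 - H ** 2)
    gW1 = Xn.T @ gH / len(X)
    W1 -= lr * gW1
    W2 -= lr * gW2
```

Training on corrupted inputs while scoring against clean targets is what distinguishes the denoising variant from a plain auto-encoder.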
Variational AutoEncoder
• Variational AutoEncoder
• Relaxation of discrete variables
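For reference, the VAE (Kingma & Welling) is trained by maximizing the evidence lower bound (ELBO):

```latex
\log p_\theta(x) \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x)}[\log p_\theta(x \mid z)]
\;-\; D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big)
```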
VAE/GAN (Larsen et al., ICML 2016)
• VAE
• GAN
• VAE/GAN
Classifier Discriminator
• Na Lei, Kehua Su, Li Cui, Shing-Tung Yau, David Xianfeng Gu, A Geometric View of Optimal Transportation and Generative Model, arXiv 2017.
Nguyen et al., NIPS 2016
• Optimize the hidden code input (red bar) of a deep image generator network (DGN) to produce an image that highly activates h
InfoGAN (Chen et al., NIPS 2016)
• GAN
• InfoGAN (Chen et al., NIPS 2016)
• Input: z, c
• Interpretable and disentangled representations
• Easy to train
AC-GAN (Odena et al., ICML 2017)
• Class-conditional image synthesis with Auxiliary Classifier GANs
• The log-likelihood of the correct source:
• The log-likelihood of the correct class:
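The two AC-GAN losses from Odena et al. (ICML 2017), where S is the source (real/fake) and C the class label:

```latex
L_S = \mathbb{E}[\log P(S = \mathrm{real} \mid X_{\mathrm{real}})]
    + \mathbb{E}[\log P(S = \mathrm{fake} \mid X_{\mathrm{fake}})]
\qquad
L_C = \mathbb{E}[\log P(C = c \mid X_{\mathrm{real}})]
    + \mathbb{E}[\log P(C = c \mid X_{\mathrm{fake}})]
```

The discriminator is trained to maximize L_S + L_C, while the generator is trained to maximize L_C − L_S.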
Arbitrary Facial Attribute Editing
• One model for all tasks (He et al., arXiv 2018)
A Favorable Framework
• Auto-encoder
AttGAN
Extension for attribute style manipulation
Single task
Multi-task
Continuous attribute
Attribute Style Manipulation
Take home message
• Incorporating auto-encoder to improve the trainability of generator;
• Incorporating deep classification model to improve the trainability of discriminator
Let's then turn to the objective of GANs
• Image generation
• What are the characteristics of an image?
• Multi-scale property
• Manifold property
• What makes a high-quality image?
• Deep image prior
• Deep image quality assessment
LAPGANs (Denton et al., NIPS 2015)
Stack-GAN (Zhang et al., ICCV 2017)
• Stage-I GAN
• Stage-II GAN
Cascaded Refinement Networks (Chen & Koltun, ICCV 2017)
• CRN: does not rely on adversarial training
Manifold property (Benaim & Wolf, NIPS 2017)
• Distance Constraints
• Self-distance Constraints
Total Variation
• Deep feature visualization
• Total variation (TV) regularization
• Better (deep) image prior?
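A minimal sketch of the anisotropic TV regularizer often used to smooth feature visualizations (the `tv_loss` helper and the test images are illustrative):

```python
import numpy as np

def tv_loss(img: np.ndarray) -> float:
    """Anisotropic total variation: sum of absolute differences
    between vertically and horizontally adjacent pixels."""
    dh = np.abs(np.diff(img, axis=0)).sum()  # vertical neighbours
    dw = np.abs(np.diff(img, axis=1)).sum()  # horizontal neighbours
    return float(dh + dw)

flat = np.ones((4, 4))                        # constant image: zero TV
step = np.zeros((4, 4)); step[:, 2:] = 1.0    # one vertical edge: TV = 4
```

Adding this term to a visualization objective penalizes high-frequency noise while leaving piecewise-constant regions untouched.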
Insight from deep image denoising
• DnCNN for image denoising (Zhang et al., TIP 2017)
• For a noisy image y = x + v: CNN(y; Θ) ≈ y − x, so x̂ = y − CNN(y; Θ) and ‖CNN(y; Θ)‖² ≈ mnσ²
• For a clean image: ‖CNN(y; Θ)‖² ≈ 0
• Perceptual regularization (Li et al., arXiv 2016): penalize ‖CNN(y; Θ)‖²
Deep image prior (Ulyanov et al., CVPR 2018)
• Energy
• Image restoration
• A randomly-initialized neural network can be used as a handcrafted prior
• The structure of a generator network is sufficient to capture a great deal of low-level image statistics prior to any learning
Deep Features as a Perceptual Metric (Zhang et al., CVPR 2018)
• Perceptual loss
• Deep features outperform all previous metrics by huge margins.
• This result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised).
• Deep Non-reference Image Quality Assessment?
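The perceptual metric of Zhang et al. (CVPR 2018) computes a weighted distance between unit-normalized deep features ŷ at each layer l, with learned channel weights w_l:

```latex
d(x, x_0) = \sum_l \frac{1}{H_l W_l} \sum_{h,w}
  \big\| w_l \odot (\hat{y}^{\,l}_{hw} - \hat{y}^{\,l}_{0hw}) \big\|_2^2
```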
Take home message
• Exploiting image property to improve GANs
• Developing deep models/GANs for better revealing image priors/quality
• Object-oriented design
Applications
Adversarial learning (Szegedy et al., ICLR 2014)
• Deep neural networks learn input-output mappings that are significantly discontinuous.
• We can cause the network to misclassify an image by applying a hardly perceptible perturbation, found by maximizing the network's prediction error.
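Such a perturbation can be sketched with the fast gradient sign method (FGSM, a standard attack, not the exact procedure of this paper) on a toy logistic "network"; the classifier and eps are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
w, b = rng.normal(size=16), 0.0          # toy linear classifier

def predict(x):
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))   # P(y = 1 | x)

def fgsm(x, y, eps):
    # cross-entropy loss L = -[y log p + (1 - y) log(1 - p)];
    # its gradient w.r.t. the input is (p - y) * w
    grad = (predict(x) - y) * w
    return x + eps * np.sign(grad)       # step that maximizes the loss

x = rng.normal(size=16)
y = 1.0 if predict(x) > 0.5 else 0.0     # use the clean prediction as "label"
x_adv = fgsm(x, y, eps=0.5)              # perturbation of bounded max-norm
```

Because each coordinate moves only by eps, the perturbation has a small max-norm yet pushes the prediction away from the original label.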
2018-4-22
Intriguing properties of neural networks (Szegedy et al., ICLR 2014)
Deep Neural Networks are Easily Fooled (Nguyen et al., CVPR 2015)
> 99.6% confidence
Adversarial Attacks and Defences Competition (Kurakin et al., arXiv 2018)
• 1st place in defense track: team TsAIL
• Team members: Yinpeng Dong, Fangzhou Liao, Ming Liang, Tianyu Pang, Jun Zhu and Xiaolin Hu.
• Solution: Denoising U-net
Adversarially-augmented training (Simon-Gabriel et al., Arxiv 2018)
• Adversarially-augmented training
• Replacing strided by average-pooling layers
• Increase generalization performance
Object detection: A-Fast-RCNN (Wang et al., CVPR 2017)
Visual tracking
• CVPR 2018
• VITAL: VIsual Tracking via Adversarial Learning
• SINT++: Robust Visual Tracking via Adversarial Hard Positive Generation
Low level vision
• SRGAN for super-resolution
DSLR-Quality Photos on Mobile Devices (Ignatov et al., ICCV 2017)
• Color loss
• Texture loss
• Content loss
• TV regularizer
• Discriminator
WESPE: Weakly Supervised Photo Enhancer (NTIRE 2018)
• Only require two distinct datasets
Image inpainting: more freedom and non-uniqueness
Context-encoders (Pathak et al., 2016)
• The first key: Auto-encoder
Problem with auto-encoder
• Information bottleneck
Adversarial loss is helpful
• But remains limited ...
Analyzing U-Net (Ronneberger et al., 2015)
• Fine-details
• Unfortunately, it also does not work for inpainting
Return to traditional patch-based inpainting
• Patch processing order
• PatchMatch
CNN and Patch-based Solutions are Complementary
• CNN-based solution
• Poor texture
• Better structure
• Patch-based solution
• Better details
• Poor structure
• Can we combine them in an end-to-end learning framework?
Context-encoders
CNN architecture
Objective and learning
• Objective
• Learning
Results
• Speed
• MNPS: 40 min -> 40 s
• Ours: 82 ms
• PSNR
Random mask
Real images
Guided face enhancement (Li et al., arXiv 2018)
Film Restoration, Smartphones
Challenges
• 1. Blind enhancement: the degradation model is sophisticated and unknown
• blur, downsampling, noise, compression
• 2. The guided and degraded images are of different pose, expression and illumination
Challenge 1
• Train on realistic synthetic degraded images, test on real degraded image
• The degradation model:
Challenge 2: GFRNet
Model and losses for WarpNet
• Landmark loss
• TV regularization
Model objective
• Reconstruction loss
• Adversarial loss
• Objective
Appearance Flow
Results
DnCNN | ARCNN | DeblurGAN | Ours
More images
Video
Domain Adaptation
• Domain adaptation: learning, from a (labeled) source data distribution, a well-performing model for a different (but related) (labeled or unlabeled) target data distribution (Wikipedia)
• Three categories:
• Supervised domain adaptation
• Semi-supervised domain adaptation
• Unsupervised domain adaptation
The Future of Real-Time SLAM (ICCV 2015 Workshop)
• Panel discussion: Deep Learning vs SLAM
• Newcombe's Proposal: Use SLAM to fuel Deep Learning
• Today's SLAM systems are large-scale "correspondence engines" which can be used to generate large-scale datasets
• Graphics for CNN
The need of domain adaptation
Synthetic:
Real:
Domain Transfer
Unsupervised domain adaptation
• Only the class labels of the source samples are known; all class labels of the target samples are unknown.
• Goal: a feature extractor f and a classifier c
• P(f(xs)) = P(f(xt))
• Better classification performance on xs
• Key issue: discrepancy metric between two complex distributions
• D(P(f(xs)), P(f(xt)))
Weighted MMD
• Let
• Define
• Weighted MMD
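For reference, a sketch of the standard (unweighted) MMD² estimator with an RBF kernel; the weighted MMD of the talk additionally reweights source samples by estimated target class priors. Bandwidth and sample sizes here are illustrative:

```python
import numpy as np

def mmd2_rbf(X, Y, sigma=1.0):
    """Biased MMD^2 estimator between samples X and Y with an RBF kernel."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
same = mmd2_rbf(rng.normal(size=(100, 2)), rng.normal(size=(100, 2)))         # ~0
diff = mmd2_rbf(rng.normal(size=(100, 2)), rng.normal(size=(100, 2)) + 3.0)   # large
```

MMD is near zero when source and target feature distributions match, which is exactly the alignment objective D(P(f(xs)), P(f(xt))) above.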
Office-10+Caltech-10
Unsupervised Domain Adaptation by Backpropagation (Ganin & Lempitsky, ICML 2015)
Simultaneous Deep Transfer Across Domains and Tasks (Tzeng et al., ICCV 2015)
• “maximally confuse” the two domains
• uniform distribution over domain labels
Domain cocktail network (Xu et al., CVPR 2018)
SimGAN (CVPR 2017)
• Learning from Simulated and Unsupervised Images through Adversarial Training (Shrivastava et al., arXiv 2016)
• Realism loss
• Self-regularization
• SimGAN is also a pixel-level domain adaptation method
Unsupervised Pixel–Level Domain Adaptation (CVPR 2017)
Image translation (Zhu et al., CVPR 2017)
Pix2pix: supervised image translation (Isola et al., CVPR 2017)
• Positive pair: (input, ground truth)
• Negative pair: (input, synthesis)
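The pix2pix objective from Isola et al. (CVPR 2017) combines a conditional GAN loss, where the discriminator sees the input paired with either the ground truth or the synthesis, with an L1 reconstruction term:

```latex
\mathcal{L}_{\mathrm{cGAN}}(G, D) =
  \mathbb{E}_{x,y}[\log D(x, y)]
+ \mathbb{E}_{x,z}[\log(1 - D(x, G(x, z)))],
\qquad
\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\big[\| y - G(x, z) \|_1\big]
```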
Learning Residual Images (Shen & Liu, CVPR 2017)
Cycle-Consistent Supervision (Zhu et al., ICCV 2017)
• Cycle consistency loss
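The CycleGAN cycle consistency loss, for translators G: X -> Y and F: Y -> X, as given in the cited paper:

```latex
\mathcal{L}_{\mathrm{cyc}}(G, F) =
  \mathbb{E}_{x}\big[\| F(G(x)) - x \|_1\big]
+ \mathbb{E}_{y}\big[\| G(F(y)) - y \|_1\big]
```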
BicycleGAN: Multimodal Image-to-Image Translation (Zhu et al., NIPS 2017)
Suggestion
• Problem-oriented
• Generator
• Discriminator+
References
• I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, NIPS 2014.
• D.P. Kingma, M. Welling, Auto-encoding variational bayes, arXiv:1312.6114, 2013.
• N. Lei, K. Su, L. Cui, S.-T. Yau, D. X. Gu, A Geometric View of Optimal Transportation and Generative Model, arXiv 2017.
• A.B.L. Larsen, S. K. Sønderby, H. Larochelle, O. Winther, Autoencoding beyond pixels using a learned similarity metric, ICML 2016.
• A. Nguyen, A. Dosovitskiy, J. Yosinski, T. Brox, and J. Clune. Synthesizing the preferred inputs for neurons in neural networks via deep generator networks, NIPS 2016.
• X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, P. Abbeel, InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets, NIPS 2016.
• A. Odena, C. Olah, J. Shlens, Conditional image synthesis with auxiliary classifier GANs, ICML 2017.
• Z. He, W. Zuo, M. Kan, S. Shan, X. Chen, Arbitrary Facial Attribute Editing: Only Change What You Want, arXiv:1711.10678, 2017.
• E.L. Denton, S. Chintala, R. Fergus, Deep generative image models using a laplacian pyramid of adversarial networks, NIPS 2015.
• H. Zhang, T. Xu, H. Li, S. Zhang, X. Wang, X. Huang, D Metaxas, StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks, ICCV 2017.
• Q. Chen, V. Koltun, Photographic image synthesis with cascaded refinement networks, ICCV 2017.
• S. Benaim, L. Wolf, One-Sided Unsupervised Domain Mapping, NIPS 2017.
• K. Zhang, W. Zuo, Y. Chen, D. Meng, L. Zhang, Beyond a Gaussian denoiser: Residual learning of deep cnn for image denoising, IEEE T-IP 2017.
• M. Li, W. Zuo, D. Zhang, Deep Identity-aware Transfer of Facial Attributes, arXiv:1610.05586.
• D. Ulyanov, A. Vedaldi, V. Lempitsky, Deep Image Prior, CVPR 2018.
• R. Zhang, P. Isola, A. A. Efros, E. Shechtman, O. Wang, The Unreasonable Effectiveness of Deep Features as a Perceptual Metric, CVPR 2018.
• C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, R. Fergus, Intriguing properties of neural networks, ICLR 2014.
• A. Nguyen, J. Yosinski, J. Clune, Deep neural networks are easily fooled: High confidence predictions for unrecognizable images, CVPR 2015.
• C.-J. Simon-Gabriel, Y. Ollivier, B. Schölkopf, L. Bottou, D. Lopez-Paz, Adversarial Vulnerability of Neural Networks Increases With Input Dimension, arXiv:1802.01421.
• X. Wang, A. Shrivastava, A. Gupta, A-Fast-RCNN: Hard positive generation via adversary for object detection, CVPR 2017.
• C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, W. Shi, Photo-realistic single image super-resolution using a generative adversarial network, CVPR 2017.
• A. Ignatov, N. Kobyshev, R. Timofte, K. Vanhoey, L. Van Gool, DSLR-quality photos on mobile devices with deep convolutional networks, ICCV 2017.
• A. Ignatov, N. Kobyshev, R. Timofte, K. Vanhoey, L. Van Gool, WESPE: Weakly supervised photo enhancer for digital cameras, NTIRE 2018.
• D. Pathak,P. Krahenbuhl, J. Donahue, T. Darrell, A.A. Efros, Context encoders: Feature learning by inpainting, CVPR 2016.
• Z. Yan, X. Li, M. Li, W. Zuo, S. Shan, Shift-Net: Image Inpainting via Deep Feature Rearrangement, arXiv:1801.09392.
• X. Li, M. Liu, Y. Ye, W. Zuo, L. Lin, R. Yang, Learning Warped Guidance for Blind Face Restoration, arXiv:1804.04829.
• H. Yan, Y. Ding, P. Li, Q. Wang, Y. Xu, W. Zuo, Mind the class weight bias: Weighted maximum mean discrepancy for unsupervised domain adaptation, CVPR 2017.
• Y. Ganin, V. Lempitsky, Unsupervised Domain Adaptation by Backpropagation, ICML 2015.
• E. Tzeng, J. Hoffman, T. Darrell, Simultaneous Deep Transfer Across Domains and Tasks, ICCV 2015.
• R. Xu, Z. Chen, W. Zuo, J. Yan, L. Lin, Deep Cocktail Network: Multi-source Unsupervised Domain Adaptation with Category Shift, CVPR 2018.
• A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang, R. Webb, Learning from simulated and unsupervised images through adversarial training, CVPR 2017.
• K. Bousmalis, N. Silberman, D. Dohan, D. Erhan, D. Krishnan, Unsupervised pixel-level domain adaptation with generative adversarial networks, CVPR 2017.
• P. Isola, J.Y. Zhu, T. Zhou, A.A. Efros, Image-to-image translation with conditional adversarial networks, CVPR 2017.
• W. Shen, R. Liu, Learning residual images for face attribute manipulation, CVPR 2017.
• J.Y. Zhu, T. Park, P. Isola, A.A. Efros, Unpaired image-to-image translation using cycle-consistent adversarial networks, ICCV 2017.
• J.Y. Zhu, R. Zhang, D. Pathak, T. Darrell, A.A. Efros, O Wang, E Shechtman, Toward Multimodal Image-to-Image Translation, NIPS 2017.