Multimodal Unsupervised Image-to-Image Translation · 2019-09-22 · •Multimodal UNIT (MUNIT)...
Multimodal Unsupervised Image-to-Image Translation
Ming-Yu Liu
NVIDIA
Supervised / Paired / Aligned / Registered vs. Unsupervised / Unpaired / Unaligned / Unregistered
Supervised vs. Unsupervised · Paired vs. Unpaired · Aligned vs. Unaligned
Image Domain Transfer
Example Applications
Generative Adversarial Networks (GANs) [Goodfellow et al. 2014]
[Diagram: Generator → Sample → Discriminator (real/fake) ← Real data]
Plain GAN for Unsupervised Image-to-Image Translation
[Diagram: Domain 1 → Translation Network 𝑭𝟏→𝟐 → Discriminator (real/fake) ← Domain 2]
CycleGAN and UNIT
• CycleGAN (cycle consistency) [Zhu et al. 2017]
• UNIT (shared latent space) [Liu et al. 2017]
[Diagram labels: shared latent space; cycle consistency]
shared latent space ⟹ cycle consistency
Unimodality
Towards Multimodality
Cycle consistency does not allow multimodality
Domain 𝒳1, Domain 𝒳2
[Diagram: 𝑥1 in 𝒳1; 𝑥2′ and 𝑥2′′ in 𝒳2]
Cycle consistency: 𝒳1 → 𝒳2 → 𝒳1
Cycle consistency: 𝒳2 → 𝒳1 → 𝒳2
𝑥2′ = 𝑥2′′
𝑝(𝑥2|𝑥1) = 𝛿(𝑥2 − 𝑥2′)
𝑝(𝑥2|𝑥1) = 𝛿(𝑥2 − 𝑥2′′)
Shared latent space does not allow multimodality
[Diagram labels: shared latent space; cycle consistency]
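The step from the two cycle constraints to 𝑥2′ = 𝑥2′′ can be spelled out as follows. This is a sketch of the argument, assuming both cycles hold exactly; F_{1→2} is the translation network from the plain-GAN slide, and F_{2→1} is an assumed name for its reverse counterpart.

```latex
\begin{align*}
&\text{Suppose } x_1 \in \mathcal{X}_1 \text{ can be translated to either }
  x_2' \text{ or } x_2'' \in \mathcal{X}_2. \\
&\text{Cycle } \mathcal{X}_1 \to \mathcal{X}_2 \to \mathcal{X}_1: \quad
  F_{2\to1}(x_2') = x_1, \qquad F_{2\to1}(x_2'') = x_1 \\
&\text{Cycle } \mathcal{X}_2 \to \mathcal{X}_1 \to \mathcal{X}_2: \quad
  F_{1\to2}\bigl(F_{2\to1}(x_2')\bigr) = x_2', \qquad
  F_{1\to2}\bigl(F_{2\to1}(x_2'')\bigr) = x_2'' \\
&\Rightarrow\; F_{1\to2}(x_1) = x_2' \ \text{and}\ F_{1\to2}(x_1) = x_2''
  \;\Rightarrow\; x_2' = x_2'' \\
&\Rightarrow\; p(x_2 \mid x_1) = \delta(x_2 - x_2')
\end{align*}
```

In words: every translation of 𝑥1 must map back to 𝑥1, and translating 𝑥1 again must return that same translation, so only one translation can exist.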
Disentangling the Latent Space
• UNIT
  • A single shared, domain-invariant latent space 𝒵
Disentangling the Latent Space
• Multimodal UNIT (MUNIT)
  • A content space 𝒞 that is shared, domain-invariant
  • Two style spaces 𝒮1, 𝒮2 that are unshared, domain-specific
Training
• Notations:
  • 𝑥: images
  • 𝑐: content
  • 𝑠: style
• Loss:
  • Bidirectional reconstruction loss
    • Image reconstruction loss
    • Latent reconstruction loss
  • GAN loss
[Diagram panels: Within-domain reconstruction; Cross-domain translation]
Bidirectional Reconstruction Loss: Image Reconstruction
Notations:
• 𝑥: images
• 𝑐: content
• 𝑠: style
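The equation on this slide did not survive extraction. As a hedged reconstruction in the slide's notation, the image reconstruction loss in the paper linked in the conclusion (arXiv:1804.04732) takes the form below. The symbols E_1^c and E_1^s (content and style encoders for domain 1) and G_1 (the domain 1 decoder) are taken from the paper, not from this transcript.

```latex
\mathcal{L}_{\text{recon}}^{x_1}
  = \mathbb{E}_{x_1 \sim p(x_1)}
    \left[ \left\lVert G_1\!\left(E_1^c(x_1),\, E_1^s(x_1)\right) - x_1 \right\rVert_1 \right]
```

The loss for domain 2 is defined symmetrically: encode an image into content and style, decode within the same domain, and penalize the L1 difference from the input.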
Bidirectional Reconstruction Loss: Latent Reconstruction
Notations:
• 𝑥: images
• 𝑐: content
• 𝑠: style
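The idea behind latent reconstruction is: take a content code from a source image, pair it with a sampled style, translate, re-encode the result, and compare the recovered codes with the ones that went in. The sketch below illustrates only that data flow. The encoders and decoder are hypothetical toy functions (style is the image mean, content is the remainder), not the learned networks, and a single toy encoder/decoder stands in for both domains.

```python
import numpy as np

# Hypothetical stand-ins for the learned networks, chosen so the
# round trip is exact: style = per-image mean, content = what is left.
def encode_content(x):
    return x - x.mean()

def encode_style(x):
    return x.mean()

def decode(c, s):
    return c + s

def latent_reconstruction_loss(x1, s2):
    """Mean absolute error on content and style after translating x1."""
    c1 = encode_content(x1)        # content code of the source image
    x12 = decode(c1, s2)           # cross-domain translation with a sampled style
    c_rec = encode_content(x12)    # re-encode the translated image
    s_rec = encode_style(x12)
    loss_c = np.abs(c_rec - c1).mean()
    loss_s = np.abs(s_rec - s2)
    return loss_c, loss_s

rng = np.random.default_rng(0)
x1 = rng.normal(size=(8, 8))       # toy "image"
s2 = rng.normal()                  # style sampled from a standard normal prior
loss_c, loss_s = latent_reconstruction_loss(x1, s2)
```

With these toy functions both losses are zero up to floating-point error; with learned networks they are training signals that push the encoders and decoders toward this invertible behavior.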
GAN Loss
Notations:
• 𝑥: images
• 𝑐: content
• 𝑠: style
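The slide's equation did not survive extraction. As a minimal sketch, the function below evaluates the log-form GAN objective of Goodfellow et al. 2014 on discriminator outputs, where the "fake" scores would come from translated images. The function name and the clipping constant are assumptions for illustration, and implementations often substitute variants such as a least-squares loss.

```python
import numpy as np

def gan_objective(d_real, d_fake, eps=1e-12):
    """Log-form GAN objective: mean log D(x) + mean log(1 - D(G(c, s))).

    d_real: discriminator outputs in (0, 1) on real target-domain images.
    d_fake: discriminator outputs in (0, 1) on translated images.
    The discriminator tries to increase this value; the generator tries
    to decrease it.
    """
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1.0 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    return np.log(d_real).mean() + np.log(1.0 - d_fake).mean()

confused = gan_objective([0.5, 0.5], [0.5, 0.5])     # discriminator at chance
confident = gan_objective([0.9, 0.95], [0.1, 0.05])  # discriminator separates well
```

A discriminator at chance gives 2·log(0.5); a discriminator that separates real from translated images gives a value closer to zero.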
Background: Instance Normalization (IN)
[Diagram: content feature 𝑐 → normalization (remove style) → affine transformation (apply style)]
Feedforward transfer of a single style: Content → Output
“Instance Normalization: The Missing Ingredient for Fast Stylization”, Ulyanov et al. 2017
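The two stages in the diagram, normalization followed by an affine transformation, can be sketched in NumPy as below. This assumes features laid out as (N, C, H, W) with one learned scale and shift per channel; the function name, shapes, and example values are illustrative, not taken from the slides.

```python
import numpy as np

def instance_norm(x, gamma, beta, eps=1e-5):
    """Instance normalization for features of shape (N, C, H, W).

    Statistics are computed per sample and per channel over the spatial axes.
    gamma and beta are per-channel affine parameters of shape (C,).
    """
    mu = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    x_hat = (x - mu) / np.sqrt(var + eps)          # normalization: "remove style"
    # affine transformation: "apply style"
    return gamma[None, :, None, None] * x_hat + beta[None, :, None, None]

rng = np.random.default_rng(0)
c = rng.normal(loc=3.0, scale=2.0, size=(2, 4, 16, 16))  # toy content features
gamma = np.array([1.0, 2.0, 0.5, 1.5])
beta = np.array([0.0, -1.0, 1.0, 2.0])
out = instance_norm(c, gamma, beta)
```

After the call, each channel of each sample has mean close to its beta and standard deviation close to its gamma, regardless of the statistics of the input.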
Adaptive Instance Normalization (AdaIN)
Style feature: 𝑠
Content feature: 𝑐
Feedforward transfer of arbitrary styles
[Diagram: Content and Style → VGG → AdaIN (remove style, apply style) → Decoder → Output]
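AdaIN replaces the learned affine parameters of instance normalization with the channel-wise statistics of the style feature: AdaIN(𝑐, 𝑠) = σ(𝑠) · (𝑐 − μ(𝑐)) / σ(𝑐) + μ(𝑠). A minimal NumPy sketch, assuming (N, C, H, W) feature maps and using random arrays in place of VGG features:

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Adaptive instance normalization for feature maps of shape (N, C, H, W).

    The content features are normalized per sample and channel, then rescaled
    and shifted with the style features' per-channel std and mean.
    Spatial sizes of content and style may differ.
    """
    c_mu = content.mean(axis=(2, 3), keepdims=True)
    c_std = np.sqrt(content.var(axis=(2, 3), keepdims=True) + eps)
    s_mu = style.mean(axis=(2, 3), keepdims=True)
    s_std = np.sqrt(style.var(axis=(2, 3), keepdims=True) + eps)
    return s_std * (content - c_mu) / c_std + s_mu

rng = np.random.default_rng(0)
content = rng.normal(loc=0.0, scale=1.0, size=(1, 3, 8, 8))    # stand-in for content features
style = rng.normal(loc=5.0, scale=3.0, size=(1, 3, 12, 12))    # stand-in for style features
out = adain(content, style)
```

The output keeps the spatial layout of the content features while its per-channel mean and standard deviation match those of the style features.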
AdaIN in a Generative Network
[Diagram, two panels. AdaIN in style transfer: Encoder, Encoder, AdaIN, Decoder. AdaIN in a generative network: MLP, Encoder, AdaIN, Decoder.]
AdaIN in a Generative Network
[Diagram, two panels. AdaIN in style transfer: Encoder, Encoder, AdaIN, Decoder. AdaIN in a generative network: MLP, Encoder, AdaIN, Decoder; further block labels: AdaIN, AdaIN, Conv, Conv.]
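In the generative-network variant, the AdaIN scale and shift are not computed from a style image's feature statistics; an MLP produces them from a style code. The sketch below illustrates that wiring with a two-layer MLP and random weights. All sizes, names, and weights are hypothetical, chosen only to make the example run.

```python
import numpy as np

def mlp_adain_params(s, w1, b1, w2, b2):
    """Two-layer MLP mapping a style code (N, S) to per-channel (gamma, beta)."""
    h = np.maximum(s @ w1 + b1, 0.0)         # ReLU hidden layer
    out = h @ w2 + b2                        # shape (N, 2C)
    gamma, beta = np.split(out, 2, axis=1)   # each of shape (N, C)
    return gamma, beta

def adain_with_params(z, gamma, beta, eps=1e-5):
    """Normalize features z (N, C, H, W), then apply the supplied scale and shift."""
    mu = z.mean(axis=(2, 3), keepdims=True)
    std = np.sqrt(z.var(axis=(2, 3), keepdims=True) + eps)
    return gamma[:, :, None, None] * (z - mu) / std + beta[:, :, None, None]

rng = np.random.default_rng(0)
n, style_dim, hidden, channels = 2, 8, 16, 4      # illustrative sizes
w1 = rng.normal(size=(style_dim, hidden))
b1 = np.zeros(hidden)
w2 = rng.normal(size=(hidden, 2 * channels))
b2 = np.zeros(2 * channels)

s = rng.normal(size=(n, style_dim))               # style codes
z = rng.normal(size=(n, channels, 8, 8))          # decoder features from the content code
gamma, beta = mlp_adain_params(s, w1, b1, w2, b2)
out = adain_with_params(z, gamma, beta)
```

Because the style enters only through gamma and beta, the same content features can be decoded with different style codes to give different outputs.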
Architectural Implementation
Sketches ↔ Photos
[Figure columns: Input; Outputs]
Cats ↔ Dogs
[Figure columns: Input; Outputs]
Synthetic ↔ Real
[Figure columns: Input; Outputs]
Summer ↔ Winter
[Figure columns: Input; Outputs]
Example-guided Translation
Conclusion
• Translate one input image to multiple corresponding images in the target domain.
• Content and style decomposition via the AdaIN design
• ECCV 2018
• MUNIT code: https://github.com/nvlabs/munit/
• Paper: https://arxiv.org/abs/1804.04732
Xun Huang (NVIDIA, Cornell)
Serge Belongie (Cornell)
Jan Kautz (NVIDIA)
Ming-Yu Liu (NVIDIA)