3D Volumetric Data Generation with
Generative Adversarial Networks
Hiroyuki Vincent Yamazaki Keio University [email protected]
Preferred Networks Summer Internship, 2016
Background
Generative Adversarial Networks (GANs) [1] have achieved state-of-the-art performance in unsupervised learning, generating synthetic images after training on datasets such as MNIST or, for multi-channel images, ImageNet. However, these networks have not yet been extended to higher dimensions such as volumetric 3D data. Generated 3D models have various applications in entertainment and could be used as an alternative to existing procedural methods for creating graphics. This study demonstrates the capabilities of GAN-based architectures for generating practical 3D models by applying 3-dimensional convolutions and deconvolutions* on voxel data.
Goal
• Extend GANs to 3D volumetric data, training on a single class
• Control the shapes of the generated models, e.g. by interpolation
1. Introduction
*Transposed Convolutions
2. Training Data
3D CAD models from ShapeNet [2]
• Class: Chair
• Instances: 4846
Preprocessing
• Voxelization
  • 3D CAD models are converted into binary (0 or 1) voxels with dimensions (32, 32, 32) [3]
• Normalization
  • No normalization is applied. Data is in the range [0, 1]
• Other
  • Remove bad samples and centre the models in the space
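The centring step of the preprocessing can be sketched as follows. This is a minimal NumPy illustration rather than the preprocessing code used in the study; the function name `centre_voxels` and the rule of centring the bounding box of the occupied voxels are assumptions.

```python
import numpy as np

def centre_voxels(vox):
    """Shift the occupied voxels so their bounding box sits mid-grid (assumed rule)."""
    vox = np.asarray(vox)
    occupied = np.argwhere(vox > 0)
    if occupied.size == 0:
        return vox.copy()  # empty grid: nothing to centre
    lo = occupied.min(axis=0)           # first occupied index per axis
    hi = occupied.max(axis=0) + 1       # one past the last occupied index
    extent = hi - lo
    target_lo = (np.array(vox.shape) - extent) // 2
    out = np.zeros_like(vox)
    src = tuple(slice(l, h) for l, h in zip(lo, hi))
    dst = tuple(slice(t, t + e) for t, e in zip(target_lo, extent))
    out[dst] = vox[src]                 # copy the bounding box to the centre
    return out

# Toy example: a 4x6x8 block in one corner of a 32x32x32 grid
grid = np.zeros((32, 32, 32), dtype=np.uint8)
grid[0:4, 0:6, 0:8] = 1
centred = centre_voxels(grid)
```

The number of occupied voxels is unchanged; only their position within the grid moves.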
[Figures: Training Data Volume Distribution; Mean 3D Model]
A GAN consists of a generator G and a discriminator D. In this case, both are feed-forward neural networks that are trained simultaneously.
• Input to G: random noise vectors z sampled from a uniform or Gaussian distribution
Loss
• Softmax cross-entropies based on the predictions of D
• Separate losses for G and D defined by the minimax game
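The two losses can be sketched in NumPy, assuming D emits two logits per sample (index 0 = generated, index 1 = real). The label convention and the generator loss shown here (cross-entropy of the generated samples against the "real" label) are assumptions; the poster only states that the losses are softmax cross-entropies derived from the minimax game.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy for integer class labels."""
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def gan_losses(d_real_logits, d_fake_logits):
    """Return (loss_d, loss_g) from D's logits on real and generated batches."""
    real = np.ones(len(d_real_logits), dtype=int)    # assumed label: 1 = real
    fake = np.zeros(len(d_fake_logits), dtype=int)   # assumed label: 0 = generated
    # D tries to classify both batches correctly
    loss_d = (softmax_cross_entropy(d_real_logits, real)
              + softmax_cross_entropy(d_fake_logits, fake))
    # G tries to make D label its samples as real
    loss_g = softmax_cross_entropy(d_fake_logits, np.ones(len(d_fake_logits), dtype=int))
    return loss_d, loss_g

# An undecided D (all-zero logits) gives log(2) per cross-entropy term
loss_d, loss_g = gan_losses(np.zeros((3, 2)), np.zeros((5, 2)))
```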
Optimal Discriminator Strategy
Optimization
• Adam for both G and D
• The learning rate of G is larger than that of D
3. Generative Adversarial Network
[Figure: GAN pipeline. Random Noise → Generator (Linear, Deconvolution, Batch Normalization, ReLU, Sigmoid) → Generated 3D Model. Training Data, selected by Random Index → Real 3D Model. Generated and Real 3D Models → Discriminator (Convolution, Linear, Leaky ReLU) → Generated/Real Prediction]
See Appendix for the network architecture and Adam parameters
$$\min_G \max_D V(G, D) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
$$D(x) = \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_G(x)}$$
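A short derivation of the optimal discriminator, following the standard argument in [1], may help here. It assumes G is held fixed and writes p_G for the distribution of G(z).

```latex
\begin{align}
V(G, D) &= \int_x \Big[ p_{\mathrm{data}}(x) \log D(x) + p_G(x) \log\big(1 - D(x)\big) \Big] \, dx \\
\intertext{For constants $a, b > 0$, the function $f(y) = a \log y + b \log(1 - y)$ on $(0, 1)$ satisfies}
f'(y) &= \frac{a}{y} - \frac{b}{1 - y} = 0
\quad\Longrightarrow\quad y = \frac{a}{a + b},
\qquad f''(y) = -\frac{a}{y^2} - \frac{b}{(1 - y)^2} < 0 \\
\intertext{so maximizing the integrand pointwise with $a = p_{\mathrm{data}}(x)$ and $b = p_G(x)$ gives}
D(x) &= \frac{p_{\mathrm{data}}(x)}{p_{\mathrm{data}}(x) + p_G(x)}
\end{align}
```

When G matches the data distribution exactly, this expression evaluates to 1/2 everywhere, i.e. D can no longer tell generated from real samples.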
Issues with GAN
• Collapsing generator
  • G outputs similar 3D models for different inputs
• Non-semantic input z
  • Interpolating z reveals sharp edges in the latent space, so there is no way to control the shape of the output
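The interpolation of z mentioned above can be sketched as a straight line between two noise vectors. The 128-dimensional noise matches the Appendix; the helper name `interpolate_z`, the sampling range, and the number of steps are assumptions for illustration.

```python
import numpy as np

def interpolate_z(z_start, z_end, steps):
    """Return `steps` evenly spaced points on the line from z_start to z_end."""
    t = np.linspace(0.0, 1.0, steps)[:, None]   # column of blend weights
    return (1.0 - t) * z_start + t * z_end

rng = np.random.default_rng(0)
z_a = rng.uniform(-1.0, 1.0, size=128)  # assumed range for the uniform noise
z_b = rng.uniform(-1.0, 1.0, size=128)
path = interpolate_z(z_a, z_b, steps=8)  # shape (8, 128), one z per row
```

Feeding each row of `path` to G and comparing neighbouring outputs is one way to see whether the generated shape changes smoothly or jumps abruptly.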
Improving the GAN
• Prevent the generator from collapsing
  • Minibatch Discrimination [4] layer in D
• Embed semantic meaning into the input [5]
  • Concatenate additional latent codes to z before feeding it to G
  • Additional loss based on mutual information reconstruction by D
[Figure: Improved GAN pipeline. Random Noise + Latent Codes → Generator (Linear, Deconvolution, Batch Normalization, ReLU, Sigmoid) → Generated 3D Model. Training Data, selected by Random Index → Real 3D Model. Generated and Real 3D Models → Discriminator (Convolution, Linear, Leaky ReLU, Minibatch Discrimination) → Generated/Real Prediction and Mutual Information Reconstruction]
Minibatch Discrimination
Motivation
Prevent the generator from collapsing to a single point
Idea
Reproduce the diversity in the training data
Add a Minibatch Discrimination layer to D, before the generated/real prediction
For each minibatch fed to this layer, compute the L1 distances between all input vectors
Append this information to the given minibatch
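A minimal NumPy sketch of such a layer, following the formulation in [4], is shown below. The learned projection `tensor` is replaced by random values, and the function name and toy sizes are assumptions; the kernel count (64) and kernel dimension (16) are taken from the Appendix.

```python
import numpy as np

def minibatch_discrimination(features, tensor, n_kernels, kernel_dim):
    """features: (batch, n_in); tensor: (n_in, n_kernels * kernel_dim)."""
    batch = features.shape[0]
    # Project every sample to n_kernels vectors of length kernel_dim
    m = (features @ tensor).reshape(batch, n_kernels, kernel_dim)
    # L1 distance between every pair of samples, per kernel: (batch, batch, n_kernels)
    l1 = np.abs(m[:, None, :, :] - m[None, :, :, :]).sum(axis=3)
    # Similarity of each sample to all samples in the minibatch: (batch, n_kernels)
    closeness = np.exp(-l1).sum(axis=1)
    # Append the minibatch statistics to the original features
    return np.concatenate([features, closeness], axis=1)

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))                   # toy batch of 4 feature vectors
proj = rng.normal(size=(8, 64 * 16))              # stand-in for the learned tensor
out = minibatch_discrimination(feats, proj, n_kernels=64, kernel_dim=16)
```

If G collapses and all samples in a minibatch are identical, every appended value equals the batch size, which gives D an easy signal that the batch lacks diversity.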
Mutual Information Reconstruction
Motivation
Embed semantic meaning in z
Idea
Maximize the mutual information preserved for the latent codes C that are passed through the networks
Latent codes, input to G
• C = [C1, C2, C3] (concatenation)
  • Categorical one-hot vector C1 ~ Cat(K=2, p=0.5)
  • Continuous C2 ~ Unif(-1, 1)
  • Continuous C3 ~ Unif(-1, 1)
Reconstruction, output from D
• Categorical
  • Softmax cross-entropy
• Continuous
  • Assume a fixed variance and compute the Gaussian negative log-likelihood based on the mean
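The latent codes and the reconstruction loss can be sketched as follows. This is a NumPy illustration under stated assumptions: the function names, the fixed standard deviation `sigma=1.0`, and the equal weighting of the categorical and continuous terms are not given on the poster.

```python
import numpy as np

def sample_latent_codes(batch, rng):
    """Sample C = [C1, C2, C3]; returns the category indices and the (batch, 4) codes."""
    cat = rng.integers(0, 2, size=batch)            # C1 ~ Cat(K=2, p=0.5)
    c1 = np.eye(2)[cat]                             # one-hot encoding of C1
    c2 = rng.uniform(-1.0, 1.0, size=(batch, 1))    # C2 ~ Unif(-1, 1)
    c3 = rng.uniform(-1.0, 1.0, size=(batch, 1))    # C3 ~ Unif(-1, 1)
    return cat, np.concatenate([c1, c2, c3], axis=1)

def mutual_information_loss(recon, cat, codes, sigma=1.0):
    """recon: (batch, 4) = two categorical logits followed by the means mu2, mu3."""
    logits = recon[:, :2] - recon[:, :2].max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    categorical = -log_probs[np.arange(len(cat)), cat].mean()   # softmax cross-entropy
    mu, target = recon[:, 2:], codes[:, 2:]
    # Gaussian negative log-likelihood with fixed variance sigma^2 (assumed value)
    nll = 0.5 * np.log(2 * np.pi * sigma ** 2) + (target - mu) ** 2 / (2 * sigma ** 2)
    return categorical + nll.sum(axis=1).mean()

rng = np.random.default_rng(0)
cat, codes = sample_latent_codes(16, rng)
z = rng.uniform(-1.0, 1.0, size=(16, 128))          # noise, assumed range
g_input = np.concatenate([z, codes], axis=1)        # shape (16, 128 + 2 + 2)
```

With a fixed variance, the continuous term reduces to a squared error between the sampled code and the reconstructed mean, plus a constant.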
[Figure: Input to G is the concatenation of z, c1 (e.g. [0, 1]), c2 and c3. Reconstruction outputs: Softmax_1, μ_2, μ_3. Minibatch Discrimination Layer with kernels …]
• Minibatch size: 128
• Epochs: 100
4. Results
Generated 3D Models
*The blue models are their nearest models in the training dataset
[Figures: 3D Volume Distributions (Chair-likeness, Learned Distribution, True Distribution); Losses]
5. Conclusions
• GANs can be extended to 3D volumetric data using 3-dimensional convolutions and deconvolutions
• Smaller datasets (sparse data) lead to worse-looking, noisy models
  • Partially mitigated by mutual information reconstruction and minibatch discrimination
• In many cases, D improves faster than G
  • Gradients back-propagated through G saturate and training stops
• Training does not converge
Future Work
• Larger dataset with potentially multiple classes
• Balance training between G and D
  • Heuristic: stop updating D while it is too strong
  • Larger G, i.e. more parameters
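One possible form of the "stop updating D while it is too strong" heuristic is sketched below. The rule, the function name, and the `margin` value are hypothetical; the poster does not specify how "too strong" would be measured.

```python
def should_update_discriminator(loss_d, loss_g, margin=0.3):
    """Hypothetical rule: skip D's update while its loss is far below G's."""
    return loss_d > margin * loss_g

# Balanced training: D keeps learning
balanced = should_update_discriminator(loss_d=0.7, loss_g=0.7)
# D dominates (tiny D loss, large G loss): freeze D for this step
dominant = should_update_discriminator(loss_d=0.05, loss_g=3.0)
```

In a training loop, G would be updated every step while D's optimizer step is applied only when this check passes.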
References
[1] Ian Goodfellow et al. (2014). Generative Adversarial Networks. CoRR, abs/1406.2661.
[2] Angel X. Chang et al. (2015). ShapeNet: An Information-Rich 3D Model Repository. CoRR, abs/1512.03012.
[3] Patrick Min. Binvox, 3D Mesh Voxelizer. http://www.patrickmin.com/binvox/
[4] Tim Salimans et al. (2016). Improved Techniques for Training GANs. CoRR, abs/1606.03498.
[5] Xi Chen et al. (2016). InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. CoRR, abs/1606.03657.
Appendix

Generator
• Input ∈ R^(128+2+2)
• FC 1024, BN, ReLU
• FC 16384, BN, ReLU
• DC 256 → 128, Kernel 4, Stride 2, BN, ReLU
• DC 128 → 64, Kernel 4, Stride 2, BN, ReLU
• Output: DC 64 → 1, Kernel 4, Stride 2, BN, ReLU

Discriminator
• Input: 32x32x32 3D voxel data
• Conv 1 → 64, Kernel 4, Stride 2, lReLU (leaky ReLU)
• Conv 64 → 128, Kernel 4, Stride 2, BN, lReLU
• Conv 128 → 256, Kernel 4, Stride 2, BN, lReLU
• FC 1024, BN, lReLU
• Minibatch Discrimination, Kernels 64, Kernel Dimension 16
• Output: FC 2 (Generated/Real prediction)
• FC 256, BN, lReLU
• Output: FC 2+2 (Mutual Information Reconstruction)

FC: fully connected layer, DC: deconvolution, Conv: convolution, BN: batch normalization
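The layer sizes listed in the architecture can be checked with the standard output-size formulas for strided convolutions and transposed convolutions. The padding of 1 is an assumption (it is not listed above), as is the reading that the FC 16384 output is reshaped to 256 channels of 4x4x4 before the first deconvolution.

```python
def conv_out(size, kernel=4, stride=2, pad=1):
    """Output size of a strided convolution along one axis."""
    return (size + 2 * pad - kernel) // stride + 1

def deconv_out(size, kernel=4, stride=2, pad=1):
    """Output size of a transposed convolution along one axis."""
    return (size - 1) * stride - 2 * pad + kernel

# Discriminator: three stride-2 convolutions shrink 32 down to 4
d_sizes = [32]
for _ in range(3):
    d_sizes.append(conv_out(d_sizes[-1]))      # [32, 16, 8, 4]

# Generator: three stride-2 deconvolutions grow 4 up to 32
g_sizes = [4]
for _ in range(3):
    g_sizes.append(deconv_out(g_sizes[-1]))    # [4, 8, 16, 32]

# 256 channels of 4x4x4 voxels match the FC 16384 layer
fc_units = 256 * g_sizes[0] ** 3               # 16384
```

Under these assumptions, each kernel-4, stride-2 layer halves or doubles the spatial size exactly, so G's output matches the (32, 32, 32) voxel grid of the training data.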
Adam Optimizer Parameters

      Generator   Discriminator
α     0.001       0.00005
β1    0.5         0.5
β2    0.999       0.999
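For reference, a single Adam update with these parameters can be written out in NumPy. The row with value 0.999 is read as β2; the `eps` value and the names `ADAM` and `adam_step` are assumptions, and this is a sketch of the standard update rule rather than the training code used in the study.

```python
import numpy as np

ADAM = {
    "generator": {"alpha": 0.001, "beta1": 0.5, "beta2": 0.999},
    "discriminator": {"alpha": 0.00005, "beta1": 0.5, "beta2": 0.999},
}

def adam_step(param, grad, m, v, t, alpha, beta1, beta2, eps=1e-8):
    """One Adam update at step t (t starts at 1); returns (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    return param - alpha * m_hat / (np.sqrt(v_hat) + eps), m, v

# First step on a scalar parameter with gradient 2.0
p_g, m_g, v_g = adam_step(0.0, 2.0, 0.0, 0.0, 1, **ADAM["generator"])
p_d, m_d, v_d = adam_step(0.0, 2.0, 0.0, 0.0, 1, **ADAM["discriminator"])
```

On the first step the update size is approximately α regardless of the gradient magnitude, so G initially moves about 20 times further than D per step.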
GAN Architecture