3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D +...
![Page 1: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/1.jpg)
3D + Graphics
QI ZHU & JUHO KIM
![Page 2: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/2.jpg)
Outline
• Pose estimation (3D recovery from 2D images)
• Novel Image / View synthesis
• Reconstruction and generation of 3D
![Page 3: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/3.jpg)
Part 1
POSE ESTIMATION (3D RECOVERY FROM 2D IMAGES)
![Page 4: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/4.jpg)
Viewpoint Estimation
INPUT: RGB image
OUTPUT: Camera pose = Rotation (yaw, pitch, roll) and Translation Matrix
Xiang, Yu, et al. "Beyond PASCAL: A benchmark for 3D object detection in the wild." WACV 2014
![Page 5: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/5.jpg)
Render for CNN [ICCV15]
Su, Hao, et al. "Render for cnn: Viewpoint estimation in images using cnns trained with rendered 3d model views." ICCV2015
![Page 6: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/6.jpg)
Render for CNN [ICCV15]
Su, Hao, et al. "Render for cnn: Viewpoint estimation in images using cnns trained with rendered 3d model views." ICCV2015
![Page 7: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/7.jpg)
Render for CNN [ICCV15]
Su, Hao, et al. "Render for cnn: Viewpoint estimation in images using cnns trained with rendered 3d model views." ICCV2015
CNN
![Page 8: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/8.jpg)
Render for CNN [ICCV15]
Su, Hao, et al. "Render for cnn: Viewpoint estimation in images using cnns trained with rendered 3d model views." ICCV2015
![Page 9: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/9.jpg)
Generating synthesized images
Structure-preserving deformation
◦ Symmetry-preserving free-form deformation
◦ Embed the object in a uniform grid of control points
◦ Represent every point in space as a weighted combination of the control points
◦ Draw i.i.d. translation vectors, one per control point
◦ Translate each control point by its vector
◦ Set the translations of symmetric control points to be equal
Scott Schaefer Free-Form Deformation of Solid Geometric Models TAMU
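The deformation steps above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the smallest possible lattice (the 8 corners of the unit cube), trilinear weights, Gaussian translations, and a symmetry plane at x = 0.5. Reading "equal" translations as "equal up to reflection across the symmetry plane" is an assumption made here so that the deformed shape stays mirror-symmetric. All function names are hypothetical.

```python
import random

def trilinear_weights(u, v, w):
    """Weights of the 8 corner control points for a point (u, v, w) in the unit cube."""
    weights = {}
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                wu = u if i else 1.0 - u
                wv = v if j else 1.0 - v
                ww = w if k else 1.0 - w
                weights[(i, j, k)] = wu * wv * ww
    return weights

def symmetric_translations(sigma, rng):
    """Draw i.i.d. Gaussian translations for the control points at x = 0 and
    copy them, mirrored in x, onto their symmetric partners at x = 1."""
    t = {}
    for j in (0, 1):
        for k in (0, 1):
            tx, ty, tz = (rng.gauss(0.0, sigma) for _ in range(3))
            t[(0, j, k)] = (tx, ty, tz)
            t[(1, j, k)] = (-tx, ty, tz)  # assumed: reflection across x = 0.5
    return t

def deform(point, translations):
    """New position = weighted combination of the translated control points."""
    out = [0.0, 0.0, 0.0]
    for (i, j, k), wgt in trilinear_weights(*point).items():
        tx, ty, tz = translations[(i, j, k)]
        out[0] += wgt * (i + tx)
        out[1] += wgt * (j + ty)
        out[2] += wgt * (k + tz)
    return tuple(out)
```

With all translations set to zero, `deform` returns the input point unchanged, which is a quick sanity check on the weights.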
![Page 10: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/10.jpg)
Cont’d
Overfit-Resistant Image Synthesis
◦ Variation in rendering
  ◦ Vary lighting conditions and camera pose
◦ Background synthesis
  ◦ Alpha-composition blending
◦ Cropping
◦ Adding real annotated images
Image Compositing and Blending CMU15-463 2007 Fall
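Alpha-composition blending of a rendered object over a background can be sketched as below. The image layout (nested lists of RGB tuples with values in [0, 1]) and the function names are illustrative assumptions, not taken from the slides.

```python
def alpha_composite(foreground, background, alpha):
    """Blend two RGB pixels: alpha = 1 keeps the foreground, alpha = 0 the background."""
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(foreground, background))

def composite_image(fg, bg, mask):
    """Apply the per-pixel blend to whole images stored as nested lists of RGB tuples.
    `mask` holds one alpha value per pixel (the rendered object's coverage)."""
    return [
        [alpha_composite(f, b, a) for f, b, a in zip(fg_row, bg_row, mask_row)]
        for fg_row, bg_row, mask_row in zip(fg, bg, mask)
    ]
```

A soft mask (alpha strictly between 0 and 1 near the object boundary) avoids hard cut-out edges in the composite.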
![Page 11: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/11.jpg)
Final Training Dataset
Image Compositing and Blending CMU15-463 2007 Fall
![Page 12: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/12.jpg)
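Problem formulation
Input: single RGB image
Viewpoint as a tuple (θ, φ, ψ) of camera rotation parameters
◦ Discretized and divided into 360, 180, and 360 bins, respectively
Rotation reference: a predefined initial pose facing the camera
Output: probability of each viewpoint bin
Loss function: $L_{vp}(\{s\}) = -\sum_{\{s\}} \sum_{v \in V} e^{-d(v, v_s)/\sigma} \log P_v(s; c_s)$
where $P_v(s; c_s)$ is the probability of view $v$ for sample $s$ from the soft-max viewpoint classifier of class $c_s$, $v_s$ is the ground-truth view of $s$, and $d(\cdot, \cdot)$ is a distance between views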
Su, Hao, et al. "Render for cnn: Viewpoint estimation in images using cnns trained with rendered 3d model views." ICCV2015
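To make the classification view concrete, here is a small sketch of a soft-max classifier over angle bins with a loss that penalizes nearby bins less than distant ones. It follows the spirit of the geometry-aware loss in Su et al., but the circular bin distance, the bandwidth `sigma`, and all function names are assumptions chosen for illustration, and only a single circular angle (such as azimuth) is handled.

```python
import math

def softmax(logits):
    """Numerically stable soft-max over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def circular_distance(a, b, n_bins):
    """Shortest distance between two bins on a circle of n_bins bins."""
    d = abs(a - b) % n_bins
    return min(d, n_bins - d)

def viewpoint_loss(logits, true_bin, sigma=3.0):
    """Cross-entropy against soft labels exp(-d(v, true_bin) / sigma)."""
    n_bins = len(logits)
    probs = softmax(logits)
    loss = 0.0
    for v, p in enumerate(probs):
        weight = math.exp(-circular_distance(v, true_bin, n_bins) / sigma)
        loss -= weight * math.log(max(p, 1e-12))  # guard against log(0)
    return loss
```

With this weighting, a prediction that is off by one bin is penalized far less than one on the opposite side of the circle, which plain one-hot cross-entropy would not distinguish.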
![Page 13: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/13.jpg)
Class-Dependent Network Architecture
• Based on AlexNet [NIPS12]
• Shared lower-layer weights, but a separate FC output layer per class
• Large number of outputs! (360+180+360) x N, for N object classes
• The claim is that class-specific output layers handle the large variance among different object categories
Su, Hao, et al. "Render for cnn: Viewpoint estimation in images using cnns trained with rendered 3d model views." ICCV2015
![Page 14: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/14.jpg)
Does synthetic training data help?
Su, Hao, et al. "Render for cnn: Viewpoint estimation in images using cnns trained with rendered 3d model views." ICCV2015
![Page 15: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/15.jpg)
Several results
Su, Hao, et al. "Render for cnn: Viewpoint estimation in images using cnns trained with rendered 3d model views." ICCV2015
![Page 16: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/16.jpg)
Keypoint Estimation: 2D to 3D
Wei, Shih-En, et al. “Convolutional pose machines.” CVPR 2016; IKEA dataset [Lim et al., 2013]
![Page 17: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/17.jpg)
Single Image 3D Interpreter Network [ECCV16]
Simultaneously infer 2D keypoint heatmaps and 3D structure from a single image!
Wu, Jiajun, et al. “Single image 3d interpreter network.” ECCV 2016
![Page 18: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/18.jpg)
Challenges
Annotations?
◦ 2D keypoint labels – easy to get, e.g. by crowdsourcing
◦ 3D object annotations in real 2D images – hard to acquire
Synthetic training data?
◦ Statistics of real and synthesized images are different
Wu, Jiajun, et al. “Single image 3d interpreter network.” ECCV 2016
![Page 19: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/19.jpg)
Using both real and synthetic images
Wu, Jiajun, et al. “Single image 3d interpreter network.” ECCV 2016
![Page 20: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/20.jpg)
Using both real and synthetic images
Wu, Jiajun, et al. “Single image 3d interpreter network.” ECCV 2016
![Page 21: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/21.jpg)
3D Object Representation
Wu, Jiajun, et al. “Single image 3d interpreter network.” ECCV 2016
![Page 22: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/22.jpg)
3D-Skeleton Representation
Assumption:
◦ Objects can only have constrained deformations
◦ The first base shape is the mean shape of all objects within the category
◦ 3D keypoint locations are a weighted sum of base shapes
Wu, Jiajun, et al. “Single image 3d interpreter network.” ECCV 2016
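The weighted-sum assumption can be written out directly. In this sketch each base shape is a list of (x, y, z) keypoints and the skeleton is their linear combination; the data layout and the function name are hypothetical.

```python
def combine_base_shapes(base_shapes, weights):
    """Return keypoints Y = sum_k weights[k] * base_shapes[k].

    Each base shape is a list of (x, y, z) keypoints with the same length and
    ordering; by the slide's convention base_shapes[0] is the mean shape."""
    n_keypoints = len(base_shapes[0])
    skeleton = []
    for i in range(n_keypoints):
        skeleton.append(tuple(
            sum(w * shape[i][axis] for w, shape in zip(weights, base_shapes))
            for axis in range(3)
        ))
    return skeleton
```

For example, with a two-keypoint mean shape and one extra base shape that stretches along x, `combine_base_shapes([mean_shape, stretch], [1.0, 0.5])` yields the mean shape stretched by half of that mode. A network then only has to predict the few weights rather than every keypoint coordinate.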
![Page 23: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/23.jpg)
Network Overview
Step 1: Estimate 2D keypoint heatmaps (a and b)
Step 2: Train the 3D interpreter on synthetic 3D data (c)
Step 3: Jointly train with the projection layer on 2D annotations (d)
Wu, Jiajun, et al. “Single image 3d interpreter network.” ECCV 2016
![Page 24: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/24.jpg)
Zoom in: Keypoint Estimator
Wu, Jiajun, et al. “Single image 3d interpreter network.” ECCV 2016
![Page 25: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/25.jpg)
Zoom in: 3D Interpreter
Wu, Jiajun, et al. “Single image 3d interpreter network.” ECCV 2016
![Page 26: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/26.jpg)
Zoom In: End to End
Wu, Jiajun, et al. “Single image 3d interpreter network.” ECCV 2016
![Page 27: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/27.jpg)
Results
Wu, Jiajun, et al. “Single image 3d interpreter network.” ECCV 2016
![Page 28: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/28.jpg)
Part 2
NOVEL IMAGE / VIEW SYNTHESIS
![Page 29: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/29.jpg)
New scene synthesis?
Flynn, John, et al. “DeepStereo: Learning to predict new views from the world‘s imagery.” CVPR 2016
![Page 30: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/30.jpg)
Flynn, John, et al. “DeepStereo: Learning to predict new views from the world‘s imagery.” CVPR 2016
Given Images
Newly Generated Image
Note the change in viewpoint
![Page 31: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/31.jpg)
DeepStereo: Learning to Predict New Views from the World’s Imagery
Issues:
• Interpret rotation and image reprojection
• Long-distance pixel correlation
Flynn, John, et al. “DeepStereo: Learning to predict new views from the world‘s imagery.” CVPR 2016
Given V1, V2, generate C
![Page 32: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/32.jpg)
DeepStereo: plane–sweep volumes
Solve two other problems before generating a new view!
But why?
Flynn, John, et al. “DeepStereo: Learning to predict new views from the world‘s imagery.” CVPR 2016
![Page 33: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/33.jpg)
Plane–sweep volumes (Cont’d)
• Sweep a family of planes at different depths w.r.t. a reference camera
• For each depth, project each input image onto that plane
• This is equivalent to a homography warping each input image into the reference view
R. Collins. A space-sweep approach to true multi-image matching. CVPR 1996.
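The homography induced by one sweep plane can be sketched as follows. This uses the standard plane-induced homography for a fronto-parallel plane z = depth in the reference camera frame; the pose convention (X_src = R X_ref + t), the simple intrinsics with a single focal length, and all function names are assumptions for illustration. A full plane-sweep volume would evaluate this for every depth and resample each input image with it.

```python
def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def intrinsics(f, cx, cy):
    """Pinhole intrinsics with focal length f and principal point (cx, cy)."""
    return [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]

def intrinsics_inverse(f, cx, cy):
    """Closed-form inverse of the matrix returned by intrinsics()."""
    return [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]

def plane_homography(k_src, k_ref_inv, rotation, translation, depth):
    """Homography taking reference pixels to source pixels for the plane z = depth.

    Assumed convention: X_src = rotation * X_ref + translation, plane normal (0, 0, 1)."""
    normal = (0.0, 0.0, 1.0)
    m = [[rotation[i][j] + translation[i] * normal[j] / depth for j in range(3)]
         for i in range(3)]
    return matmul(matmul(k_src, m), k_ref_inv)

def apply_homography(h, u, v):
    """Map pixel (u, v) through h and dehomogenize."""
    x, y, w = (row[0] * u + row[1] * v + row[2] for row in h)
    return x / w, y / w
```

Warping a source image into the reference view then means: for each reference pixel, look up the source image at the mapped location. Pixels whose true depth matches the plane line up across the warped inputs, which is the cue the network can exploit.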
![Page 34: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/34.jpg)
DeepStereo: overview
{S} Depth Selection tower output
{C} Color tower output
Flynn, John, et al. “DeepStereo: Learning to predict new views from the world‘s imagery.” CVPR 2016
Output is a masked sum over images selected at different depths
![Page 35: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/35.jpg)
DeepStereo: Selection Tower
INPUT: Plane-sweep volumes
OUTPUT: A probability map (or selection map) for each depth indicating the likelihood of each pixel having that depth.
Weighted-sum image synthesis: can be interpreted as an expectation over depth
Flynn, John, et al. “DeepStereo: Learning to predict new views from the world‘s imagery.” CVPR 2016
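The weighted sum for a single output pixel can be sketched as below: a soft-max over depth planes gives the selection probabilities, and the output color is the probability-weighted average of the per-plane colors. Treating the selection scores as raw logits and the function names are assumptions made for this illustration.

```python
import math

def softmax(values):
    """Numerically stable soft-max over a list of scores."""
    m = max(values)
    exps = [math.exp(x - m) for x in values]
    total = sum(exps)
    return [e / total for e in exps]

def synthesize_pixel(selection_logits, plane_colors):
    """Expected color over depth planes for one pixel.

    selection_logits: one score per depth plane (selection tower, assumed raw).
    plane_colors: one (r, g, b) tuple per depth plane (color tower)."""
    probs = softmax(selection_logits)
    return tuple(
        sum(p * color[c] for p, color in zip(probs, plane_colors))
        for c in range(3)
    )
```

Because the output is an expectation rather than a hard arg-max over depth, the whole pipeline stays differentiable and can be trained end to end from the pixel reconstruction error.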
![Page 36: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/36.jpg)
DeepStereo: Color Tower
Bonus: D input planes reduce the effect of occlusion
OUTPUT: 3D volume of nodes -> R, G, B channels for each pixel
Flynn, John, et al. “DeepStereo: Learning to predict new views from the world‘s imagery.” CVPR 2016
![Page 37: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/37.jpg)
DeepStereo: Training
Apply multi-resolution patches in the Color Tower
Feed 26 x 26 patches; produce 8 x 8 patches
Use 96 depth planes
Trained via Adagrad
Flynn, John, et al. “DeepStereo: Learning to predict new views from the world‘s imagery.” CVPR 2016
![Page 38: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/38.jpg)
DeepStereo: Results
Flynn, John, et al. “DeepStereo: Learning to predict new views from the world‘s imagery.” CVPR 2016
![Page 39: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/39.jpg)
Flynn, John, et al. “DeepStereo: Learning to predict new views from the world‘s imagery.” CVPR 2016
![Page 40: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/40.jpg)
Part 3
RECONSTRUCTION AND GENERATION OF 3D
![Page 41: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/41.jpg)
Part 3
3D structure reconstruction and generation
• DC-IGN model (Kulkarni et al. NIPS 2015)
• Perspective transformer networks (Yan et al. NIPS 2016)
• 3D-GAN (Wu et al. NIPS 2016)
![Page 42: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/42.jpg)
Deep Convolutional Inverse Graphics Network (DC-IGN)
• Motivation: Can a deep network learn to disentangle factors of image generation such as lighting, rotation, etc.?
• Can we learn a renderer?
• Recall: Conditional VAEs and constraints on latent space
Kulkarni et al., Deep Convolutional Inverse Graphics Network, NIPS 2015
![Page 43: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/43.jpg)
DC-IGN
• Generate transformed images w.r.t. rotations, light variations, etc.
• Input: 2D image (150 x 150 pixels)
• Output: 2D image that has one different 3D property
[Figure panels: rotation; light variation]
• Learn latent variables that represent complex transformations
• Use an encoder-decoder structure based on VAE
Kulkarni et al., Deep Convolutional Inverse Graphics Network, NIPS 2015
Link: http://willwhitney.github.io/dc-ign/www/
![Page 44: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/44.jpg)
DC-IGN
• Graphics codes z
Extrinsic properties:
◦ rotation of the object
◦ elevation of the object with respect to the camera
◦ variations of the light source
Kulkarni et al., Deep Convolutional Inverse Graphics Network, NIPS 2015
![Page 45: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/45.jpg)
DC-IGN
• Model architecture
- Deep convolution and de-convolution within a VAE formulation
Kulkarni et al., Deep Convolutional Inverse Graphics Network, NIPS 2015
![Page 46: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/46.jpg)
DC-IGN
• Encoder output:
• Model parameter:
• Distribution parameters:
• Variational objective function:
Kulkarni et al., Deep Convolutional Inverse Graphics Network, NIPS 2015
![Page 47: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/47.jpg)
DC-IGN
• Train on a minibatch in which only one extrinsic property changes, i.e. only a specific latent variable changes
• Key idea: force all other latent variables to be the same across examples, by forcing each of them to be close to its minibatch mean
Kulkarni et al., Deep Convolutional Inverse Graphics Network, NIPS 2015
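The clamping idea can be sketched for the forward pass as below: within a minibatch where only one property varies, every latent except the active one is replaced by its mean over the batch. This is a simplified illustration with hypothetical names; it does not cover how gradients for the inactive latents are handled during training.

```python
def clamp_to_batch_mean(latents, active_index):
    """Replace every latent except `active_index` by its mean over the minibatch.

    latents: list of latent vectors (one list of floats per example)."""
    n = len(latents)
    dim = len(latents[0])
    means = [sum(z[d] for z in latents) / n for d in range(dim)]
    return [
        [z[d] if d == active_index else means[d] for d in range(dim)]
        for z in latents
    ]
```

After clamping, the decoder can only explain the differences between images in the batch through the single active latent, which is what encourages that latent to encode the varying property.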
![Page 48: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/48.jpg)
DC-IGN
• Generation w.r.t. manipulation of pose variables
Kulkarni et al., Deep Convolutional Inverse Graphics Network, NIPS 2015
![Page 49: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/49.jpg)
DC-IGN
• Generation w.r.t. light directions
Kulkarni et al., Deep Convolutional Inverse Graphics Network, NIPS 2015
![Page 50: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/50.jpg)
DC-IGN
• Manipulate rotation for a different dataset (chair dataset)
Kulkarni et al., Deep Convolutional Inverse Graphics Network, NIPS 2015
![Page 51: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/51.jpg)
Pixel to Voxel
• DC-IGN: 2D image (pixel) → transformed 2D image (pixel)
• Next models: 2D image (pixel) → 3D image (voxel)
![Page 52: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/52.jpg)
Perspective Transformer Nets
• Predict the underlying true 3D shape of an object given a single 2D image
• Learn 3D object reconstruction without 3D ground-truth data
• Use different 2D images from multiple viewpoints
• Define two loss functions to generate 3D structures
Yan et al., Perspective Transformer Nets, NIPS 2016
![Page 53: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/53.jpg)
Perspective Transformer Nets
• $I^{(k)}$: 2D image from the k-th viewpoint $\alpha^{(k)}$ by projection
→ $I^{(k)} = P(X; \alpha^{(k)})$
Yan et al., Perspective Transformer Nets, NIPS 2016
![Page 54: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/54.jpg)
Perspective Transformer Nets
Case 1: we know the ground truth 3D volume $V$
• Generate 3D volume $\hat{V} = f(I^{(k)}) = g(h(I^{(k)}))$
where $h(\cdot)$ learns a viewpoint-invariant latent representation
$g(\cdot)$ is a volume generator
• Loss function: $\mathcal{L}_{vol}(I^{(k)}) = \| f(I^{(k)}) - V \|_2^2$
• However, the ground truth volume may not be available in practice
Yan et al., Perspective Transformer Nets, NIPS 2016
![Page 55: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/55.jpg)
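Perspective Transformer Nets
Case 2: we do NOT know the ground truth 3D volume $V$
• Use 2D silhouette images
• $S^{(j)}$: ground truth 2D silhouette image for the j-th viewpoint $\alpha^{(j)}$
• $\hat{S}^{(j)} = P(f(I^{(k)}); \alpha^{(j)})$: generated silhouettes
• Loss function: $\mathcal{L}_{proj}(I^{(k)}) = \frac{1}{n} \sum_{j=1}^{n} \| \hat{S}^{(j)} - S^{(j)} \|_2^2$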
![Page 56: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/56.jpg)
Perspective Transformer Nets
![Page 57: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/57.jpg)
Perspective Transformer Nets
• Consider a combination of $\mathcal{L}_{vol}$ and $\mathcal{L}_{proj}$
Yan et al., Perspective Transformer Nets, NIPS 2016
$f(I^{(k)}) = g(h(I^{(k)}))$
Reference for encoder: Yang et al., NIPS 2015
![Page 58: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/58.jpg)
Perspective Transformer Nets
• How to obtain the 2D silhouette $\hat{S}^{(j)}$: perspective projection
• Transformation matrix
where K: camera calibration matrix & (R, t): extrinsic parameters
• Perspective transformation:
where 3D coordinates:
screen coordinates:
Yan et al., Perspective Transformer Nets, NIPS 2016
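As a reminder of how K and (R, t) act together, here is a sketch of the standard pinhole projection of one 3D point. It illustrates the general camera model only, under the assumed convention X_cam = R X_world + t; it is not the paper's exact parameterization, and the function name is hypothetical.

```python
def project_point(k, rotation, translation, point):
    """Project a 3D world point to pixel coordinates with a pinhole camera.

    Returns (u, v, depth), where depth is the point's z in the camera frame."""
    # Extrinsics: world coordinates -> camera coordinates.
    cam = [sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
           for i in range(3)]
    # Intrinsics: camera coordinates -> homogeneous pixel coordinates.
    x, y, w = (sum(k[i][j] * cam[j] for j in range(3)) for i in range(3))
    return x / w, y / w, cam[2]
```

Keeping the depth alongside (u, v) matters for a volumetric projection, because every voxel needs a screen position and a position along the viewing ray.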
![Page 59: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/59.jpg)
Perspective Transformer Nets
• Use a spatial transformer network (Jaderberg et al. NIPS 2015)
(1) Perform dense sampling from the input volume (in 3D coordinates) to the output volume (in screen coordinates)
(2) Flatten the 3D spatial output across the disparity dimension
Yan et al., Perspective Transformer Nets, NIPS 2016
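Step (2) and the resulting silhouette loss can be sketched as follows. The sketch assumes step (1) has already resampled the occupancy volume into screen coordinates, indexed as [depth][row][col], and it assumes the flattening is a max over the depth axis; both the max operator and the function names are illustrative assumptions.

```python
def flatten_to_silhouette(volume):
    """Collapse a volume indexed [depth][row][col] to a 2D silhouette.

    A pixel is occupied if any voxel along its viewing ray is occupied,
    which the max over depth expresses for soft occupancies in [0, 1]."""
    depth = len(volume)
    rows, cols = len(volume[0]), len(volume[0][0])
    return [[max(volume[d][r][c] for d in range(depth)) for c in range(cols)]
            for r in range(rows)]

def silhouette_loss(predicted, target):
    """Sum of squared differences between two silhouettes of equal size."""
    return sum((p - t) ** 2
               for p_row, t_row in zip(predicted, target)
               for p, t in zip(p_row, t_row))
```

Since both the resampling and the max are (sub)differentiable, the silhouette error can be backpropagated into the predicted volume without any 3D ground truth.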
![Page 60: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/60.jpg)
Perspective Transformer Nets
• Training on a single category
Yan et al., Perspective Transformer Nets, NIPS 2016
![Page 61: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/61.jpg)
Perspective Transformer Nets
![Page 62: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/62.jpg)
Perspective Transformer Nets
• Training on multiple categories
![Page 63: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/63.jpg)
3D-GAN
• Generate an object in 3D voxel space from a randomly sampled vector
• Use the Generative Adversarial Network (GAN)
• Map randomly sampled vector in a latent space to an object in 3D voxel space
Wu et al., Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling, NIPS 2016
Random vector
Link: http://3dgan.csail.mit.edu/
![Page 64: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/64.jpg)
3D-GAN
• GAN – Generator + Discriminator
• Generator $G$: $z \to G(z)$
where $z$: latent vector (200 dimensions),
$G(z)$: 3D object in 3D voxel space (64 x 64 x 64 cube)
• Discriminator $D$: outputs a confidence value of whether an input is real or synthetic
• Overall adversarial loss function: $L_{3D\text{-}GAN} = \log D(x) + \log(1 - D(G(z)))$
where $x$: a real object in a 64 x 64 x 64 space,
$z$: randomly sampled noise from $p(z)$
Wu et al., Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling, NIPS 2016
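The adversarial objective can be evaluated from the discriminator's two outputs alone. The sketch below uses the standard GAN form, log D(x) + log(1 - D(G(z))), for a single real and a single generated sample; the function name and the scalar interface are assumptions for illustration.

```python
import math

def gan_loss(d_real, d_fake):
    """Standard GAN objective for one real and one generated sample.

    d_real: D(x), the discriminator's confidence that a real object is real.
    d_fake: D(G(z)), its confidence that a generated object is real.
    Both must lie strictly between 0 and 1."""
    return math.log(d_real) + math.log(1.0 - d_fake)
```

The discriminator tries to increase this value (confident on real, skeptical on generated), while the generator tries to decrease it by pushing D(G(z)) toward 1.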
![Page 65: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/65.jpg)
3D-GAN
• Network structure of generator in 3D-GAN
Wu et al., Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling, NIPS 2016
![Page 66: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/66.jpg)
3D-VAE-GAN
• Extension of 3D-GAN
• Inspired by the VAE-GAN of Larsen et al. (ICML 2016)
• Takes a 2D image as input and generates a 3D object
Wu et al., Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling, NIPS 2016
![Page 67: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/67.jpg)
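3D-VAE-GAN
• New loss function:
$$L = L_{\text{3D-GAN}} + \alpha_1 L_{\text{KL}} + \alpha_2 L_{\text{recon}}$$
$$L_{\text{3D-GAN}} = \log D(x) + \log\big(1 - D(G(z))\big), \quad L_{\text{KL}} = D_{\text{KL}}\big(q(z|y) \,\|\, p(z)\big), \quad L_{\text{recon}} = \|G(E(y)) - x\|_2$$
where 𝑥: a 3D shape from the training set,
𝑦: the 2D image corresponding to 𝑥,
𝐸: the image encoder,
𝑞(𝑧|𝑦): variational distribution of the latent representation 𝑧,
𝑝(𝑧): multivariate Gaussian prior,
α₁, α₂: weights of the KL and reconstruction terms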
Wu et al., Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling, NIPS 2016
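A minimal sketch of how the extra terms of such a loss could be computed, assuming the total is an adversarial term plus a weighted KL divergence between 𝑞(𝑧|𝑦) and 𝑝(𝑧) plus a weighted L2 reconstruction error. It further assumes a diagonal Gaussian 𝑞(𝑧|𝑦) and a standard normal prior so that the KL term has a closed form. The function names and the weight values in the example are illustrative assumptions.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def l2_distance(a, b):
    """Euclidean distance between two flattened voxel grids."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def vae_gan_loss(l_gan, mu, log_var, recon, target, alpha1, alpha2):
    """Adversarial term + alpha1 * KL term + alpha2 * reconstruction term."""
    kl = kl_to_standard_normal(mu, log_var)
    rec = l2_distance(recon, target)
    return l_gan + alpha1 * kl + alpha2 * rec


# Toy values: a 2-d latent code and a 2-voxel "shape", for illustration only.
total = vae_gan_loss(
    l_gan=-1.0,
    mu=[1.0, 0.0],
    log_var=[0.0, 0.0],
    recon=[0.0, 0.0],
    target=[3.0, 4.0],
    alpha1=5.0,   # illustrative weight
    alpha2=1e-4,  # illustrative weight
)
print(total)
```

The KL term is zero exactly when the encoder outputs the prior itself, so it acts as a regularizer that keeps encoded images close to the region of latent space the generator was trained on.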
![Page 68: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/68.jpg)
3D-GAN
• 3D object generation
Wu et al., Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling, NIPS 2016
![Page 69: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/69.jpg)
3D-GAN
• 3D object classification
Wu et al., Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling, NIPS 2016
![Page 70: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/70.jpg)
3D-VAE-GAN
• Single image 3D reconstruction
Wu et al., Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling, NIPS 2016
![Page 71: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/71.jpg)
3D-VAE-GAN
• Single image 3D reconstruction
Wu et al., Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling, NIPS 2016
![Page 72: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/72.jpg)
3D-GAN
• Shape arithmetic for chairs
Wu et al., Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling, NIPS 2016
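Shape arithmetic is usually understood as adding and subtracting latent vectors and then decoding the result with the generator. The sketch below shows only the vector step on toy 3-dimensional codes; the vectors and the function name are hypothetical, and the decoding step 𝐺(𝑧) is omitted.

```python
def shape_arithmetic(z_a, z_b, z_c):
    """Return z_a - z_b + z_c, element by element."""
    return [a - b + c for a, b, c in zip(z_a, z_b, z_c)]


# Hypothetical latent codes (real codes would be 200-dimensional).
z_chair_with_arms = [1.0, 2.0, 0.5]
z_chair_plain = [0.5, 0.5, 0.5]
z_other_chair = [0.25, 0.0, 1.0]

z_result = shape_arithmetic(z_chair_with_arms, z_chair_plain, z_other_chair)
print(z_result)
```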
![Page 73: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/73.jpg)
Summary
• 3D vision and graphics based on deep learning
- Pose estimation (3D recovery from 2D images)
- Novel Image / View synthesis
- Reconstruction and generation of 3D
![Page 74: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/74.jpg)
References
• John Flynn, Ivan Neulander, James Philbin, Noah Snavely, DeepStereo: Learning to Predict New Views from the World’s Imagery, CVPR 2016.
• Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu, Spatial Transformer Networks, NIPS 2015.
• Alex Kendall, Matthew Grimes, Roberto Cipolla, PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization, ICCV 2015.
• Tejas D. Kulkarni, William F. Whitney, Pushmeet Kohli, Josh Tenenbaum, Deep Convolutional Inverse Graphics Network, NIPS 2015.
• Danilo Jimenez Rezende, S. M. Ali Eslami, Shakir Mohamed, Peter Battaglia, Max Jaderberg, Nicolas Heess, Unsupervised Learning of 3D Structure from Images, NIPS 2016.
• Hao Su, Charles R. Qi, Yangyan Li, Leonidas J. Guibas, Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views, ICCV 2015.
• Jiajun Wu, Tianfan Xue, Joseph J. Lim, Yuandong Tian, Joshua B. Tenenbaum, Antonio Torralba, William T. Freeman, Single Image 3D Interpreter Network, ECCV 2016.
• Jiajun Wu, Chengkai Zhang, Tianfan Xue, William T. Freeman, Joshua B. Tenenbaum, Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling, NIPS 2016.
• Xinchen Yan, Jimei Yang, Ersin Yumer, Yijie Guo, Honglak Lee, Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision, NIPS 2016.
• Jimei Yang, Scott Reed, Ming-Hsuan Yang, Honglak Lee, Weakly-supervised Disentangling with Recurrent Transformations for 3D View Synthesis, NIPS 2015.
![Page 75: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/75.jpg)
Backup slides
REZENDE ET AL., NIPS 2016.
![Page 76: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/76.jpg)
Rezende et al. (NIPS 2016)
• Construct the underlying 3D structures from 2D observations
• Learn a generative model of 3D structures
• Recover the structure from 2D images via inference
Rezende et al., Unsupervised Learning of 3D structure from Images, NIPS 2016
![Page 77: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/77.jpg)
Rezende et al. (NIPS 2016)
• Consider a conditional latent variable model
𝑥: observed image or volume
𝑐: observed contextual information (nothing, an object class label, or one or more views of the scene from different cameras)
𝑧: low-dimensional codes of the latent manifold of object shapes
ℎ: 3D representations (volume or mesh)
Rezende et al., Unsupervised Learning of 3D structure from Images, NIPS 2016
![Page 78: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/78.jpg)
Rezende et al. (NIPS 2016)
• Proposed framework
Rezende et al., Unsupervised Learning of 3D structure from Images, NIPS 2016
![Page 79: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/79.jpg)
Rezende et al. (NIPS 2016)
• Sequential generative process: refinement of hidden representations
• Each step generates an independent set of latent variables 𝑧_t
• 𝑓_read: task-dependent context encoder
• 𝑓_state: transition function (fully connected LSTM)
• 𝑓_write: volumetric spatial transformer (Jaderberg et al., NIPS 2015)
• Proj: projection operator from the latent 3D representation ℎ_T to the training data’s domain
Rezende et al., Unsupervised Learning of 3D structure from Images, NIPS 2016
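One way to write the process described by the bullets above as a recurrence over steps t = 1, …, T is sketched below. The intermediate symbols e_t (encoded context) and s_t (recurrent state), the standard normal distribution for z_t, and the exact argument lists are assumptions made to connect the four named components; the slide itself does not give these equations.

```latex
\begin{aligned}
z_t &\sim \mathcal{N}(0, I) \\
e_t &= f_{\text{read}}(c,\, s_{t-1}) \\
s_t &= f_{\text{state}}(s_{t-1},\, z_t,\, e_t) \\
h_t &= f_{\text{write}}(s_t,\, h_{t-1}) \\
\hat{x} &= \mathrm{Proj}(h_T)
\end{aligned}
```

Under this reading, each step reads the context, updates the state with fresh latent variables, and writes a refinement into the 3D representation; only the final ℎ_T is projected into the domain of the observations.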
![Page 80: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/80.jpg)
Rezende et al. (NIPS 2016)
• Volumetric spatial transformer (VST)
where : simple affine transformation of a 3D
grid of points that uniformly covers the input image
Rezende et al., Unsupervised Learning of 3D structure from Images, NIPS 2016
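To make "affine transformation of a 3D grid of points" concrete, the sketch below builds a uniform grid over [-1, 1]³ and applies a 3 x 4 affine matrix to every point. The coordinate range, the matrix layout [A | t], and the function names are assumptions; the resampling of the input volume at the transformed points, which a full spatial transformer also performs, is left out.

```python
def uniform_grid(n):
    """Return n*n*n points evenly spaced over [-1, 1]^3 as (x, y, z) tuples (n >= 2)."""
    axis = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    return [(x, y, z) for z in axis for y in axis for x in axis]


def affine(points, matrix):
    """Apply a 3x4 affine matrix [A | t] to each (x, y, z) point."""
    out = []
    for x, y, z in points:
        out.append(tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in matrix))
    return out


IDENTITY = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

# Example parameters: shrink the grid by half and shift it along x.
ZOOM_AND_SHIFT = [
    [0.5, 0.0, 0.0, 0.25],
    [0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.0],
]

grid = uniform_grid(3)
moved = affine(grid, ZOOM_AND_SHIFT)
print(grid[0], moved[0])
```

In a learned model the twelve matrix entries would be predicted by the network, which lets it choose where in the volume to read or write.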
![Page 81: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/81.jpg)
Rezende et al. (NIPS 2016)
• Projection operator (3 types)
- 3D → 3D (identity)
- 3D → 2D neural network projection (learned)
- 3D → 2D OpenGL projection (fixed)
Rezende et al., Unsupervised Learning of 3D structure from Images, NIPS 2016
![Page 82: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/82.jpg)
Rezende et al. (NIPS 2016)
• Probabilistic volume completion (Necker cube, Primitives and MNIST3D)
Rezende et al., Unsupervised Learning of 3D structure from Images, NIPS 2016
![Page 83: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/83.jpg)
Rezende et al. (NIPS 2016)
• Class-conditional samples (one-hot encoding of class as context)
Rezende et al., Unsupervised Learning of 3D structure from Images, NIPS 2016
![Page 84: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/84.jpg)
Rezende et al. (NIPS 2016)
• Recover 3D structure from 2D images
Rezende et al., Unsupervised Learning of 3D structure from Images, NIPS 2016
![Page 85: 3D + Graphics - University Of Illinois › spring17 › lec14_3d.pdf · 2017-03-07 · 3D + Graphics QI ZHU & JUHO KIM. Outline •Pose estimation (3D recovery from 2D images) •Novel](https://reader030.fdocuments.in/reader030/viewer/2022041112/5f19c19637118115b7239dc1/html5/thumbnails/85.jpg)
Rezende et al. (NIPS 2016)
• Unsupervised learning of 3D structure (mesh representations)
Rezende et al., Unsupervised Learning of 3D structure from Images, NIPS 2016
Link: https://www.youtube.com/watch?v=stvDAGQwL5c