![Page 1: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/1.jpg)
And now for something completely different!
![Page 2: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/2.jpg)
Learning-Based 3D
![Page 3: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/3.jpg)
Goal
• I’d like to answer: what is computer vision and where is it headed?
• In the process, I’d like to give you a sense of what computer vision is like.
![Page 4: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/4.jpg)
What is CV?
Get a computer to understand
![Page 5: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/5.jpg)
What is CV?
This could incorporate:
• Naming things (recognition)
• Reconstruction (geometry)
• Understanding opportunities for action (didn’t cover – call me in 10 years)
• In the process, requires building up tools for processing images and fitting models
![Page 6: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/6.jpg)
Reality of “What is CV?”
Right now: most people would say a buffet of techniques and accumulated knowledge about geometry, pixels, data, and learning.
𝑨𝒙 + 𝒃
![Page 7: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/7.jpg)
Don’t Be Disappointed With The Buffet!
Get a computer to understand
Understanding an image is incredibly difficult, involves much of your incredible brain, and is perhaps “AI-Complete”.
Plus, email me in 15 years and see if we have non-buffet answers.
![Page 8: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/8.jpg)
Where is it Headed?
3 topics that I am betting on or which other people are betting on:
• Learning and geometry (Today)
• Embodied agents (Thursday)
• Vision and language (Next Tuesday)
Some slides won’t be posted since I’m borrowing heavily from others’ current research slides, which they have generously shared.
![Page 9: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/9.jpg)
In the Process
• I hope to give you a sense of how:
• Research in vision is conducted
• We think we know we’ve succeeded
• We think we know we’re not fooling ourselves!
![Page 10: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/10.jpg)
Cues For 3D
[Figure labels: pixel p; several candidate 3D points X?]
• Given a calibrated camera and an image, we only know the ray corresponding to each pixel.
• Nowhere near enough constraints for X
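The ambiguity can be sketched numerically. The snippet below is a minimal illustration, assuming a pinhole camera with made-up intrinsics `K` (the focal length and principal point are hypothetical, not from the lecture): every point along the back-projected ray lands on the same pixel.

```python
import numpy as np

def pixel_to_ray(K, u, v):
    """Unit direction (camera frame) of the ray through pixel (u, v)."""
    d = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return d / np.linalg.norm(d)

# Hypothetical intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

ray = pixel_to_ray(K, 420.0, 240.0)

# Every X = depth * ray (depth > 0) projects to the same pixel, so one image
# alone cannot tell us which X is the right one.
for depth in (1.0, 2.0, 10.0):
    X = depth * ray
    p = K @ X
    print(depth, p[:2] / p[2])
```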
![Page 11: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/11.jpg)
Cues For 3D
[Figure labels: 3D point X; its projection p in each of the two views]
• Stereo: given 2 calibrated cameras in different views and correspondences, can solve for X
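One standard way to solve for X is linear (DLT) triangulation. The sketch below assumes two hypothetical projection matrices `P1` and `P2` and a noise-free correspondence; it is an illustration, not necessarily the solver used in the course.

```python
import numpy as np

def project(P, X):
    """Project a 3D point X with a 3x4 camera matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate(P1, P2, p1, p2):
    """Linear (DLT) triangulation of one 3D point from two views."""
    A = np.stack([
        p1[0] * P1[2] - P1[0],
        p1[1] * P1[2] - P1[1],
        p2[0] * P2[2] - P2[0],
        p2[1] * P2[2] - P2[1],
    ])
    # The homogeneous solution is the right singular vector with the
    # smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    Xh = Vt[-1]
    return Xh[:3] / Xh[3]

# Hypothetical calibrated cameras: same intrinsics, second camera centered
# 0.5 units along +x from the first.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])

X_true = np.array([0.2, -0.1, 4.0])
p1, p2 = project(P1, X_true), project(P2, X_true)
X_hat = triangulate(P1, P2, p1, p2)
print(X_hat)
```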
![Page 12: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/12.jpg)
Cues For 3D
Yet when you look at this…
![Page 13: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/13.jpg)
Pictorial Cues for 3D
f( · ) → “3D”
Learned from Data
![Page 14: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/14.jpg)
Pictorial Cues for 3D
f( · ) → “3D”
Learned from Data
![Page 15: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/15.jpg)
3D Representations
Depthmap
[Figure: depthmap colored from NEAR to FAR; one point is labeled 2.3m]
![Page 16: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/16.jpg)
3D Representations
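[Figure: depthmap colored from NEAR to FAR]
$z(u, v)$
$\left( \frac{\partial z}{\partial u}, \frac{\partial z}{\partial v}, -1 \right)$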
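As a rough sketch of how the two representations relate, the vector $(\partial z/\partial u, \partial z/\partial v, -1)$ can be approximated from a depthmap with finite differences. This assumes (u, v) can be treated as metric coordinates and ignores camera intrinsics and perspective, which is a simplification.

```python
import numpy as np

def normals_from_depth(z):
    """Per-pixel unit vectors along (dz/du, dz/dv, -1), via finite differences."""
    dz_dv, dz_du = np.gradient(z)  # axis 0 is v (rows), axis 1 is u (columns)
    n = np.stack([dz_du, dz_dv, -np.ones_like(z)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

# Synthetic depthmap of a tilted plane: z = 2 + 0.1 * u.
v, u = np.mgrid[0:4, 0:5].astype(float)
z = 2.0 + 0.1 * u
n = normals_from_depth(z)
print(n[0, 0])
```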
![Page 17: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/17.jpg)
3D Representations
Surface Normals
Room
Legend
[0.06,0.99,0.12]
![Page 18: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/18.jpg)
3D Representations
f( · )
Surface Normals
D.F. Fouhey, A. Gupta, M. Hebert. Data-Driven 3D Primitives for Single Image Understanding. ICCV 2013.
![Page 19: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/19.jpg)
Direct Cues For Normals
“Depth-difference judgments and attitude settings [surface normals] appear to be independent tasks.” - Koenderink, van Doorn, Kappers ‘96
Norman and Todd. The Discriminability of Local Surface Structure. Perception, 1996.
Koenderink, van Doorn, Kappers. Pictorial Surface Attitude and Local Depth Comparisons. Perception & Psychophysics, 1996.
Johnston and Passmore. Independent Encoding of Surface Orientation and Surface Curvature. Vision Research, 1994.
etc.
![Page 20: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/20.jpg)
Direct Cues For Normals
![Page 21: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/21.jpg)
Direct Cues to Normals
Vanishing Point
Vanishing Point
![Page 22: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/22.jpg)
Comment – Representations
• For something as simple as whether to predict depth ($z(u, v)$) or the orientation of the plane ($\left( \frac{\partial z}{\partial u}, \frac{\partial z}{\partial v}, -1 \right)$), there are different:
• Metrics (duh)
• Methods (hmm) – these typically aim to take advantage of special structure of the problem
• Applications of techniques
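To make the "different metrics" point concrete, here is a minimal sketch of two commonly used choices: root-mean-square error for depth and mean angular error for normals. These are assumed examples; the slides do not say which metrics are meant.

```python
import numpy as np

def depth_rmse(z_pred, z_true):
    """Root-mean-square depth error, in the units of the depthmap."""
    return float(np.sqrt(np.mean((z_pred - z_true) ** 2)))

def mean_angular_error_deg(n_pred, n_true):
    """Mean angle in degrees between predicted and true normals."""
    n_pred = n_pred / np.linalg.norm(n_pred, axis=-1, keepdims=True)
    n_true = n_true / np.linalg.norm(n_true, axis=-1, keepdims=True)
    cos = np.clip(np.sum(n_pred * n_true, axis=-1), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)).mean())

# Tiny made-up examples.
z_true = np.array([[2.0, 2.0], [3.0, 3.0]])
z_pred = z_true + 0.1
n_true = np.array([[0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
n_pred = np.array([[0.0, 0.0, -1.0], [1.0, 0.0, 0.0]])

rmse = depth_rmse(z_pred, z_true)
angle = mean_angular_error_deg(n_pred, n_true)
print(rmse, angle)
```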
![Page 23: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/23.jpg)
Surface Normals
Normals, Color Image
D. F. Fouhey, A. Gupta, M. Hebert. Data-Driven 3D Primitives for Single Image Understanding. ICCV 2013.
D. F. Fouhey, A. Gupta, M. Hebert. Unfolding an Indoor Origami World. ECCV 2014.
X. Wang, D.F. Fouhey, A. Gupta. Designing Deep Networks for Surface Normal Estimation. CVPR 2015.
![Page 24: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/24.jpg)
Surface Normals
D. F. Fouhey, A. Gupta, M. Hebert. Data-Driven 3D Primitives for Single Image Understanding. ICCV 2013.
D. F. Fouhey, A. Gupta, M. Hebert. Unfolding an Indoor Origami World. ECCV 2014.
X. Wang, D.F. Fouhey, A. Gupta. Designing Deep Networks for Surface Normal Estimation. CVPR 2015.
![Page 25: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/25.jpg)
Applying Deep Learning
CNN
Input Output
How do we represent the output?
How do we incorporate constraints?
X. Wang, D.F. Fouhey, A. Gupta. Designing Deep Networks for Surface Normal Estimation. CVPR 2015
![Page 26: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/26.jpg)
Representation and Objective
Normal quantization scheme from Ladicky et al. 2014
Ground Truth, Input
Class 1: Class 2: Class K:…
Quantized Normals
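Quantizing normals turns regression into K-way classification. The sketch below assumes a hypothetical codebook of K random unit vectors; it does not reproduce the actual quantization scheme of Ladicky et al.

```python
import numpy as np

rng = np.random.default_rng(0)

def unit(v):
    """Normalize vectors along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

K = 20
codebook = unit(rng.normal(size=(K, 3)))  # hypothetical codebook of K directions

def quantize(normals, codebook):
    """Class index of the nearest codeword (largest cosine similarity)."""
    return np.argmax(normals @ codebook.T, axis=-1)

def dequantize(labels, codebook):
    """Map class indices back to representative normals."""
    return codebook[labels]

normals = unit(rng.normal(size=(4, 5, 3)))  # stand-in for a ground-truth normal map
labels = quantize(normals, codebook)        # (4, 5) class map a CNN could be trained on
recon = dequantize(labels, codebook)
print(labels)
```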
![Page 27: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/27.jpg)
+VP
CNN
Results
![Page 28: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/28.jpg)
Results
Input Output Input Output
![Page 29: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/29.jpg)
Results
Input Output Input Output
![Page 30: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/30.jpg)
Comment – Picking Problems
• I’d show results, and the response from many people would be “sure, sure, neat but I’ll just buy a Kinect”
![Page 31: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/31.jpg)
3D Representations
![Page 32: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/32.jpg)
3D Representations
Adams et al., Scientific Reports, 2016
~$50K, 6.5 minutes an image
![Page 33: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/33.jpg)
3D Representations
How thick is the table? What’s behind it?
Is the chair attached to the table?
![Page 34: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/34.jpg)
3D Representations
RGB Image → f( · ) → Voxels
R. Girdhar, D. F. Fouhey, M. Rodriguez, A. Gupta. Learning a predictable and generative vector representation for objects. ECCV 2016.
Contemporary work also proposing to predict voxels: C. Choy et al. ECCV 2016.
![Page 35: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/35.jpg)
3D Representations
RGB Image → f( · ) → Voxels
R. Girdhar, D. F. Fouhey, M. Rodriguez, A. Gupta. Learning a predictable and generative vector representation for objects. ECCV 2016.
Contemporary work also proposing to predict voxels: C. Choy et al. ECCV 2016.
![Page 36: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/36.jpg)
Approach
20x20x20 voxel output
For each voxel i: P(voxel i empty) = 0.1, P(voxel i filled) = 0.9 (example values)
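A per-voxel empty/filled distribution can be obtained, for example, by applying a softmax over two scores per voxel. This is a sketch under that assumption; the random `logits` array stands in for a network output and is hypothetical.

```python
import numpy as np

def voxel_probabilities(logits):
    """Softmax over the last axis: index 0 = empty, index 1 = filled."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
logits = rng.normal(size=(20, 20, 20, 2))  # hypothetical network output
probs = voxel_probabilities(logits)
occupied = probs[..., 1] > 0.5             # hard occupancy decision per voxel

# A score pair of (0, ln 9) gives the 0.1 / 0.9 split used as the example.
example = voxel_probabilities(np.array([0.0, np.log(9.0)]))
print(example)
```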
![Page 37: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/37.jpg)
Approach
[Figure: network diagram. Labels: 224x224x3 image input; layer widths 96, 256, 384, 384, 256; 4096; 64 (fully connected); layer widths 96, 256, 384, 256; 20x20x20 voxel output; 20x20x20 voxel truth]
![Page 38: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/38.jpg)
Main Idea
Representation should satisfy
• Generative in 3D: should be able to construct objects
• Predictable from 2D: should be able to infer from an ordinary image
![Page 39: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/39.jpg)
Output Representation – Voxels
• Binary 3D Pixels
• Size fixed in advance
• Spatially organized
![Page 40: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/40.jpg)
Motivation
How many couches are there really?
17376620319380945659998244594943562706193978610011725054717328650326237602245800846509433363012085433800319436216300759798722547248359864084333568544171019396627413133855719258639900678929271455476750019479612796459690660597660587366585958060016199855651136853096040090719925345060416862277035022852712462672853862680541883347010765109164191990072541599468992011221917090702356135448404702571373465160877754457984611100105948213218095668944410831578540164218804417878862985359222846733173051981076355957794488201628649390863150310112116610957168229576947037951453110523996520924531408266551857933551129152523037331648669778653233520627414924081348920182877385435304185559870939067543096038107227043238391354270213020243018663732186233106886177678021108285698450605002489539432013943586848464384336800249608995604641996401987758684553020774899439450150558814697908262987136608812176379055536451324398424400414763604021913644341037779801160872271713132362170015933578644560194760169402510788829301705817856264717546102638434343887486140651676715837327903232109626212655162025566660518578946320794439190575688682966752055301472437224530087878609170056344407910709900900338023035646198926037727398602328144407608278340682447170349984464291558779014638475805166354777533602182917103341104379697704219051965786176280422614748075555508527806286626867784243285142179054440700658114863197914857129941796395057921071996142240576807133521332484270931620503207838416875009101796458406028524010716156101993050568795023319605196226197093200883827976083431810104431171076945704867210395865501638889477089206526745122893895137023742284136605273617416043159302347321706676417294976882184360647907386625286437706439808510122321655834428195676716387657988975912495603567231757812214107093305855531027459888408998287964797402026449592170306443953289820794313437457625484027204707563385674951404429813592761132843332364065753355051237690077327370327532992465146575914511457917435677059343998713575588940361336452902960404
9868233807295134382284730745937309910703657676103447124097631074153287120040247837143656624045055614076111832245239612708339272798262887437416818440064925049838443370805645609424314780108030016683461562597569371539974003402697903023830108053034645133078208043917492087248958344081026378788915528519967248989338
592027124423914083391771884524464968645052058218151010508471258285907685355807229880747677634789376
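The number above (broken across three lines by extraction) appears to be 2^8000, the count of distinct binary 20x20x20 voxel grids. That reading is an inference from its leading digits, trailing digits, and length rather than something the slide states. A quick check:

```python
n_voxels = 20 * 20 * 20   # 8000 cells in a 20x20x20 grid
n_grids = 2 ** n_voxels   # each cell is either empty or filled

digits = str(n_grids)
print(len(digits))                       # number of decimal digits
print(digits[:6], "...", digits[-3:])    # leading and trailing digits
```

Only a vanishing fraction of these grids look like couches, which motivates learning a much lower-dimensional representation.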
![Page 41: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/41.jpg)
Motivation
How many couches are there really?
[The same number as on the previous slide]
![Page 42: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/42.jpg)
Approach
[Figure: network diagram. Voxel branch: 20x20x20 voxel input, 3D conv layers (widths 96, 256, 384, 256), 64-d fully connected layer, 3D conv layers, 20x20x20 voxel output. Image branch: image input, 2D conv layers (widths 96, 256, 384, 384, 256), 4096 FC, 4096 FC]
![Page 43: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/43.jpg)
Approach
[Figure: network diagram with a 20x20x20 voxel input, an image input, a 64-d fully connected embedding, and a 20x20x20 voxel output]
Turning off the image branch yields an autoencoder over voxels.
![Page 44: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/44.jpg)
Approach
[Figure: network diagram with a 20x20x20 voxel input, an image input, a 64-d fully connected embedding, and a 20x20x20 voxel output]
Turning off the voxel input yields an image-to-voxel predictor.
![Page 45: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/45.jpg)
Approach
[Figure: network diagram with a 20x20x20 voxel input, an image input, a 64-d fully connected embedding, and a 20x20x20 voxel output]
Learned embedding parameterizes shape in a way that is:
(a) generative and predictable
(b) accessible from voxels and images
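The data flow through the two branches can be sketched as follows. Random linear maps stand in for the learned convolutional stacks, so this toy only illustrates how one 64-d embedding is reached from either input and decoded the same way; it is not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
D_VOX, D_IMG, D_EMB = 20 * 20 * 20, 4096, 64

# Hypothetical stand-ins for the learned 3D-conv and 2D-conv stacks.
W_enc = rng.normal(scale=0.01, size=(D_EMB, D_VOX))  # voxels -> embedding
W_img = rng.normal(scale=0.01, size=(D_EMB, D_IMG))  # image features -> embedding
W_dec = rng.normal(scale=0.01, size=(D_VOX, D_EMB))  # embedding -> voxel scores

def encode_voxels(voxels):
    return W_enc @ voxels.reshape(-1)

def encode_image(features):
    return W_img @ features

def decode(embedding):
    """Map an embedding to per-voxel occupancy probabilities."""
    scores = W_dec @ embedding
    return (1.0 / (1.0 + np.exp(-scores))).reshape(20, 20, 20)

voxels = (rng.random((20, 20, 20)) > 0.5).astype(float)
image_features = rng.normal(size=D_IMG)  # hypothetical 4096-d image feature

autoencoded = decode(encode_voxels(voxels))       # image branch off: autoencoder
predicted = decode(encode_image(image_features))  # voxel input off: image-to-voxel
print(autoencoded.shape, predicted.shape)
```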
![Page 46: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/46.jpg)
TL-Network
[Figure: TL-network diagram with a 20x20x20 voxel input, an image input, a 64-d fully connected embedding, and a 20x20x20 voxel output. Labels: Voxels, Image, Voxels]
![Page 47: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/47.jpg)
TL-Network
[Figure: TL-network diagram with a 20x20x20 voxel input, an image input, a 64-d fully connected embedding, and a 20x20x20 voxel output. Labels: Voxels, Image]
![Page 48: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/48.jpg)
TL-Network
[Figure: TL-network diagram with a 20x20x20 voxel input, an image input, a 64-d fully connected embedding, and a 20x20x20 voxel output. Labels: Voxels, Image]
![Page 49: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/49.jpg)
Training
[Figure: network diagram with a 20x20x20 voxel input, an image input, a 64-d fully connected embedding, and a 20x20x20 voxel output]
![Page 50: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/50.jpg)
Training – Stage 1
[Figure: network diagram with a 20x20x20 voxel input, an image input, a 64-d fully connected embedding, and a 20x20x20 voxel output]
Learned by cross-entropy loss
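A per-voxel cross-entropy loss can be written as below. This is a minimal sketch assuming binary occupancy targets and predicted fill probabilities; the cube-shaped target is made up for illustration.

```python
import numpy as np

def voxel_cross_entropy(p_filled, target, eps=1e-7):
    """Mean binary cross-entropy between predicted P(filled) and a 0/1 target grid."""
    p = np.clip(p_filled, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))

# Made-up target: a solid cube inside a 20x20x20 grid.
target = np.zeros((20, 20, 20))
target[5:15, 5:15, 5:15] = 1.0

confident = np.where(target == 1.0, 0.9, 0.1)  # right answer with probability 0.9
uninformed = np.full_like(target, 0.5)         # says 0.5 everywhere

loss_confident = voxel_cross_entropy(confident, target)
loss_uninformed = voxel_cross_entropy(uninformed, target)
print(loss_confident, loss_uninformed)
```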
![Page 51: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/51.jpg)
Training – Stage 2
[Figure: network diagram with a 20x20x20 voxel input, an image input, a 64-d fully connected embedding, and a 20x20x20 voxel output]
Frozen
Learned by L2 Loss
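One reading of this stage is that the image branch is trained to regress the embedding produced by the frozen voxel branch. The toy below follows that assumption, with a single linear layer and a small made-up feature size, and runs plain gradient descent on the squared error.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IMG, D_EMB = 32, 64  # small hypothetical image-feature size; 64-d embedding

z_target = rng.normal(size=D_EMB)  # embedding from the frozen branch (held fixed)
feat = rng.normal(size=D_IMG)      # hypothetical image feature
W_img = np.zeros((D_EMB, D_IMG))   # the only parameters being updated

def l2_loss(W):
    """Squared L2 distance between the predicted and target embeddings."""
    r = W @ feat - z_target
    return float(r @ r)

losses = [l2_loss(W_img)]
lr = 0.25 / (feat @ feat)  # step size chosen so the residual halves each step
for _ in range(20):
    residual = W_img @ feat - z_target
    grad = 2.0 * np.outer(residual, feat)  # gradient of ||W f - z||^2 w.r.t. W
    W_img -= lr * grad
    losses.append(l2_loss(W_img))

print(losses[0], losses[-1])
```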
![Page 52: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/52.jpg)
Training Data
• 5 Categories from ShapeNet (no category labels used in learning)
• Standard rendering techniques
Rendering techniques from Su et al. ICCV15
![Page 53: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/53.jpg)
Training – Stage 3
[Figure: network diagram with a 20x20x20 voxel input, an image input, a 64-d fully connected embedding, and a 20x20x20 voxel output]
Learned by cross-entropy loss
![Page 54: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/54.jpg)
Experiments
[Figure: network diagram with a 20x20x20 voxel input, an image input, a 64-d fully connected embedding, and a 20x20x20 voxel output. Labels: Voxels, Image, Voxels]
![Page 55: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/55.jpg)
Commentary – Experiments
• Main goals of any experiments: Empirically verify that we achieved what we said we would achieve.
• “In the computer field, the moment of truth is a running program; all else is prophecy” –Herbert Simon
![Page 56: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/56.jpg)
Experiments
[Figure: network diagram with a 20x20x20 voxel input, an image input, a 64-d fully connected embedding, and a 20x20x20 voxel output. Labels: Voxels, Image, Voxels]
Does it represent voxels well?
![Page 57: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/57.jpg)
Visualization
![Page 58: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/58.jpg)
Quantifying Performance
Ground-Truth Voxels
Predicted P(Occupied)
![Page 59: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/59.jpg)
Voxel Representation
Test Shape, PCA, TL-Network
![Page 60: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/60.jpg)
Commentary – Baselines
• Is 95% good or bad?
• It depends! You might want to know: how well does something simple do? How well does a known method do?
• Typical comparison points: past methods, linear models, nearest neighbors.
• Considered embarrassing if someone later finds something simple that beats your complex method!
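As an example of a simple linear baseline, PCA can compress and reconstruct flattened voxel grids. The sketch below uses random binary shapes on a small 6x6x6 grid (made-up data, much smaller than the 20x20x20 grids in the slides).

```python
import numpy as np

def fit_pca(X, k):
    """Return the data mean and the top-k principal directions (rows)."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def pca_reconstruct(x, mean, components):
    """Project x onto the components and map back to the original space."""
    code = components @ (x - mean)
    return mean + components.T @ code

rng = np.random.default_rng(0)
# Hypothetical data: 50 random binary shapes on a 6x6x6 grid, flattened to 216-d.
train = (rng.random((50, 216)) > 0.5).astype(float)

mean, comps = fit_pca(train, k=8)
recon = pca_reconstruct(train[0], mean, comps)
print(np.abs(recon - train[0]).mean())
```

Comparing a learned model against a baseline like this shows how much of the score comes from the model rather than from the data being easy.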
![Page 61: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/61.jpg)
Reconstruction Accuracy
Average precision: PCA 96.8, TL 97.6
Qualitatively there is a pretty big gap, but quantitatively not so much, because the metric isn’t quite right.
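For reference, one common definition of average precision ranks voxels by predicted P(occupied) and averages the precision at each rank where a truly occupied voxel appears. The exact variant behind the numbers above is not specified, so treat this as an assumed definition.

```python
import numpy as np

def average_precision(scores, labels):
    """Mean of precision@k over the ranks k at which a positive is retrieved."""
    order = np.argsort(-scores, kind="stable")  # highest score first
    hits = labels[order].astype(float)
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float((precision_at_k * hits).sum() / hits.sum())

# Made-up example: 6 voxels, 3 of them truly occupied.
labels = np.array([1, 0, 1, 1, 0, 0])
scores = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.1])

ap = average_precision(scores, labels)
print(ap)
```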
![Page 62: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/62.jpg)
Voxel Representation
[Diagram: 20x20x20 voxel input → layers labeled 96, 256, 384, 256 → 64-d fully connected layer, giving a feature x = f(voxels), x ∈ R^64, for each shape]
Does this feature contain useful information for distinguishing these categories?
![Page 63: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/63.jpg)
Commentary – Alternate Tasks
• Convnets can have hundreds of millions of degrees of freedom
• Biggest fear of many researchers: are we actually learning the thing we set out to learn or something entirely different?
• This can cause serious problems, especially if you deploy the model
• One solution: test the features on an entirely different task
![Page 64: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/64.jpg)
Voxel Representation
• Classification of 3D shape categories (e.g., toilet) on ModelNet40
• TL was not trained for this task; a support vector machine is fit on its features
PCA (no class info.): 68.4
TL (no class info.): 74.4
3D ShapeNets (class info. used): 77.3
Wu et al., 3D ShapeNets: A Deep Representation for Volumetric Shape Modeling, CVPR ’15
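The "test the features on a different task" idea can be sketched as a linear probe: freeze the features, fit only a simple classifier on top, and measure held-out accuracy. Note that this sketch swaps in a ridge-regularized least-squares classifier in place of the support vector machine mentioned on the slide, to keep it dependency-free; the 64-d features here are synthetic stand-ins, not TL features.

```python
import numpy as np

def fit_linear_probe(features, labels, num_classes, ridge=1e-3):
    """Fit a ridge-regularized linear classifier on frozen features."""
    x = np.hstack([features, np.ones((len(features), 1))])  # append bias column
    y = np.eye(num_classes)[labels]                          # one-hot targets
    gram = x.T @ x + ridge * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ y)

def predict(weights, features):
    """Predict the class with the largest linear score."""
    x = np.hstack([features, np.ones((len(features), 1))])
    return np.argmax(x @ weights, axis=1)

rng = np.random.default_rng(0)
# Hypothetical 64-d features: three classes drawn around different centers.
centers = rng.normal(size=(3, 64)) * 3.0
labels = rng.integers(0, 3, size=300)
features = centers[labels] + rng.normal(size=(300, 64))

# The features stay fixed; only the probe's weights are fit.
weights = fit_linear_probe(features[:200], labels[:200], num_classes=3)
accuracy = np.mean(predict(weights, features[200:]) == labels[200:])
print(f"held-out accuracy: {accuracy:.3f}")
```

If a simple classifier on frozen features does well on a task the network never saw, that is some evidence the features capture shape information rather than an artifact of the training objective.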
![Page 65: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/65.jpg)
Experiments
[Diagram: an image and a 20x20x20 voxel input are each mapped to a 64-d fully connected embedding, which is decoded to a 20x20x20 voxel output. Layer labels include 96, 256, 384, 256 and two 4096 FC layers.]
Can you predict voxels from images?
![Page 66: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/66.jpg)
Reconstructing Test Models
![Page 67: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/67.jpg)
Comments
• Train on synthetic images, test on synthetic images. Any issues?
• Is the network actually learning something, or is it cheating by using cues left in by the renderer?
• One solution: test on new, non-rendered data
![Page 68: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/68.jpg)
Reconstructing IKEA
Data from Lim et al., Parsing IKEA Objects: Fine Pose Estimation, ICCV 2013
![Page 69: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/69.jpg)
Baseline
[Diagram: the network from the Experiments slide (image and 20x20x20 voxel input → 64-d fully connected embedding → 20x20x20 voxel output), labeled "Current Setup"]
![Page 70: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/70.jpg)
Baseline
[Diagram: the network from the Experiments slide, annotated "Go directly for voxels"]
![Page 71: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/71.jpg)
Comments – Baselines
• Not quite the right baseline: just tests whether the 3D convolutional structure is necessary.
• But you don’t always get things right the first time around!
• Research is a process, and the real knowledge comes from multiple papers in a whole series, typically from different authors, not from just one paper
![Page 72: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/72.jpg)
Quantitatively
Test data: Direct to Voxels (Conv4), Direct to Voxels (FC8), TL Networks
IKEA: 31.1, 19.8, 38.3
CAD: 38.0, 24.8, 65.4
![Page 73: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/73.jpg)
Applying to Scenes
Input: RGB Image. Output: Voxels
1. Doesn’t separate objects
S. Tulsiani, S. Gupta, D.F. Fouhey, A.A. Efros, J. Malik. Factoring Shape, Pose, and Layout from the 2D Image of a 3D Scene. To appear at CVPR 2018.
![Page 74: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/74.jpg)
Applying to Scenes
Input: RGB Image. Output: Voxels
2. Conflates shape and pose
![Page 75: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/75.jpg)
Applying to Scenes
Input: RGB Image. Output: Factored
![Page 76: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/76.jpg)
Applying to Scenes
Output: Factored Part 1: Layout
![Page 77: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/77.jpg)
Applying to Scenes
Output: Factored Part 2: Per-Object
Voxels (32³), Translation, Scale, Rotation
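To make the factored per-object output concrete, here is a minimal sketch of how a canonical voxel grid plus scale, rotation, and translation could be composed to place an object in the scene. The canonical cube [-0.5, 0.5]³, the scale-then-rotate-then-translate order, and the function names are assumptions for illustration; the slide does not specify the exact parameterization.

```python
import numpy as np

def rotation_about_y(angle_rad):
    """Rotation matrix for a turn about the vertical (y) axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def place_object(voxels, scale, rotation, translation):
    """Map occupied cells of a canonical voxel grid to scene coordinates."""
    n = voxels.shape[0]
    idx = np.argwhere(voxels > 0.5)
    # Cell centers in a canonical cube [-0.5, 0.5]^3 (assumed convention).
    canonical = (idx + 0.5) / n - 0.5
    # Shape lives in the canonical frame; pose is applied afterwards.
    return (canonical * scale) @ rotation.T + translation

# Hypothetical object: a centered block inside a 32^3 grid.
voxels = np.zeros((32, 32, 32))
voxels[8:24, 8:24, 8:24] = 1.0

scale = np.array([1.0, 2.0, 1.0])         # per-axis size
rotation = rotation_about_y(np.pi / 2.0)  # 90-degree turn
translation = np.array([3.0, 0.0, 5.0])   # position in the scene

points = place_object(voxels, scale, rotation, translation)
print(points.shape, points.mean(axis=0))
```

Keeping shape in a canonical frame means the same voxel grid can describe a chair regardless of where it sits or which way it faces, which is the point of not conflating shape and pose.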
![Page 78: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/78.jpg)
Approach
Layout: Image → standard encoder/decoder with skip connections
![Page 79: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/79.jpg)
Approach
Per-Object: Image + Bounding Boxes → Voxels (32³), Trans., Scale, Rot.
![Page 80: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/80.jpg)
Approach
Per-Object: Bounding Boxes → Voxels (32³), Trans., Scale, Rot.
![Page 81: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/81.jpg)
Quantitative Results
Method: % Rot. < 30°, % Trans. < 1m
Regress Rotation: 48.1, --
No Context: 69.3, 85.4
Base: 75.2, 90.7
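The two columns above are threshold metrics: the fraction of objects whose rotation error is under 30° and whose translation error is under 1 m. A minimal sketch of how such numbers could be computed follows, assuming the rotation error is the geodesic angle between rotation matrices and the translation error is Euclidean distance in meters; the predictions are made up.

```python
import numpy as np

def rotation_error_deg(r_pred, r_true):
    """Geodesic angle between two rotation matrices, in degrees."""
    cos_angle = (np.trace(r_pred.T @ r_true) - 1.0) / 2.0
    # Clip to guard against values slightly outside [-1, 1] from round-off.
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

def rot_z(deg):
    """Rotation matrix for a turn of `deg` degrees about the z axis."""
    a = np.radians(deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a), np.cos(a), 0.0],
                     [0.0, 0.0, 1.0]])

def fraction_within(errors, threshold):
    """Fraction of errors strictly below the threshold."""
    return float(np.mean(np.asarray(errors) < threshold))

# Hypothetical predictions for four objects.
true_rots = [rot_z(0), rot_z(45), rot_z(90), rot_z(180)]
pred_rots = [rot_z(10), rot_z(40), rot_z(150), rot_z(175)]
rot_errors = [rotation_error_deg(p, t) for p, t in zip(pred_rots, true_rots)]

true_trans = np.array([[0, 0, 2], [1, 0, 3], [0, 1, 4], [2, 0, 5]], dtype=float)
pred_trans = np.array([[0.2, 0, 2.1], [1, 0.5, 3], [0, 1, 6], [2, 0.3, 5.2]])
trans_errors = np.linalg.norm(pred_trans - true_trans, axis=1)

print(f"% rot. < 30 deg: {100 * fraction_within(rot_errors, 30.0):.1f}")
print(f"% trans. < 1 m:  {100 * fraction_within(trans_errors, 1.0):.1f}")
```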
![Page 82: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/82.jpg)
Results (SUNCG)
Input, Ground-Truth, Prediction
SUNCG: Song et al. CVPR 2017, rendered by Zhang et al. CVPR 2017
![Page 83: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/83.jpg)
Results (SUNCG)
Input, Ground-Truth, Prediction
SUNCG: Song et al. CVPR 2017, rendered by Zhang et al. CVPR 2017
![Page 84: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/84.jpg)
Results (SUNCG)
Input, Ground-Truth, Prediction
SUNCG: Song et al. CVPR 2017, rendered by Zhang et al. CVPR 2017
![Page 85: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/85.jpg)
Results (NYUv2)
NYUv2: Silberman et al. ECCV 2012
Input Prediction Other View
![Page 86: And now for something completely different!web.eecs.umich.edu/~fouhey/teaching/EECS442_W19/slides/learning… · And now for something completely different! Learning-Based 3D. Goal](https://reader034.fdocuments.in/reader034/viewer/2022043004/5f8776614787415ec449f7e8/html5/thumbnails/86.jpg)
Representational Benefits
Input RGB Image Output