3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and...
Transcript of 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and...
![Page 1: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/1.jpg)
3D Data for
Data-Driven
Scene Understanding
Thomas Funkhouser
Princeton University*
* On Sabbatical at Stanford and Google
![Page 2: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/2.jpg)
Disclaimer: I am talking about the work of these people …
Shuran Song
Andy Zeng Angela Dai Matthias Niessner
Manolis Savva Angel Chang
Maciej Halber
Yinda Zhang
![Page 3: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/3.jpg)
Scene Understanding
Help a computer with cameras to understand indoor environments
• Robotics
• Augmented reality
• Virtual tourism
• Surveillance
• Home remodeling
• Real estate
• Telepresence
• Forensics
• Games
• etc.
![Page 4: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/4.jpg)
Scene Understanding
Help a computer with cameras to understand indoor environments
Input RGB-D Image(s)
Semantic Segmentation
![Page 5: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/5.jpg)
Scene Understanding
Help a computer with cameras to understand indoor environments in 3D
3D Scene Understanding
Input RGB-D Image(s)
Semantic Segmentation
![Page 6: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/6.jpg)
3D Scene Understanding Research
3D scene understanding research problems:
• Surface reconstruction
• Object detection
• Semantic segmentation
• Scene classification
• Scene completion
• etc.Semantic Segmentation
chair
![Page 7: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/7.jpg)
3D Scene Understanding Research
3D scene understanding research problems:
• Surface reconstruction
• Object detection
• Semantic segmentation
• Scene classification
• Scene completion
• Materials, lights, etc.
• Physical properties
• Affordances
• Anomalies
• Changes
• Possibilities
• etc.
Semantic Segmentation
woodmovable
chair
light
![Page 8: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/8.jpg)
What is the main roadblock for
3D scene understanding research?
![Page 9: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/9.jpg)
What is the main roadblock for
3D scene understanding research?
Data!!!
![Page 10: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/10.jpg)
Outline of This Talk
“New” 3D datasets for indoor scene understanding research:
Synthetic RGB-D Image RGB-D Video
Room
Multiroom
Disclaimer: focus on datasets curated by my students and postdocs ☺
![Page 11: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/11.jpg)
Outline of This Talk
“New” 3D datasets for indoor scene understanding research:
Synthetic RGB-D Image RGB-D Video
Room SUN RGB-D
Multiroom
![Page 12: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/12.jpg)
SUN RGB-D
A RGB-D Scene Understanding Benchmark Suite
• 10,000 RGB-D images
• 147K 2D polygons
• 59K 3D bounding boxes
• Object categories
• Object orientations
• Room categories
• Room layouts
S. Song, S. Lichtenberg, J. Xiao, “SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite,” CVPR 2015
3D Scene Understanding
![Page 13: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/13.jpg)
Outline of This Talk
“New” 3D datasets for indoor scene understanding research:
Synthetic RGB-D Image RGB-D Video
Room SUN RGB-D
Multiroom
![Page 14: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/14.jpg)
Outline of This Talk
“New” 3D datasets for indoor scene understanding research:
Synthetic RGB-D Image RGB-D Video
Room SUN RGB-D
Multiroom SUN3D
![Page 15: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/15.jpg)
SUN3D
A place-centric 3D database
• 245 spaces
• 415 scans
• Multiple rooms
• Originally
camera poses
distributed for
only 8 spaces
J. Xiao, A. Owens, and A. Torralba, “SUN3D: A Database of Big Spaces Reconstructed using SfM and Object Labels,” ICCV 2013
![Page 16: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/16.jpg)
SUN3D
New: camera poses for all 245 spaces (algorithmically reconstructed)
M. Halber and T. Funkhouser, “Fine-to-Coarse Registration of RGB-D Scans,” CVPR 2017
![Page 17: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/17.jpg)
SUN3D
New: “ground truth” point correspondences and camera poses for 25 spaces
• 10,401 manually-specified
point correspondences
• Surface reconstructions
without visible errors
M. Halber and T. Funkhouser, “Fine-to-Coarse Registration of RGB-D Scans,” CVPR 2017
Point correspondence interface
![Page 18: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/18.jpg)
What Can Be Done with SUN3D?
![Page 19: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/19.jpg)
What Can Be Done with SUN3D?
1) Benchmark SLAM algorithms
M. Halber and T. Funkhouser, “Fine-to-Coarse Registration of RGB-D Scans,” CVPR 2017
FtC
RMSE of ground truth correspondences (in meters)
![Page 20: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/20.jpg)
What Can Be Done with SUN3D?
2) Learn 3D shape descriptors
A. Zeng, S. Song, M. Niessner, M. Fisher, J. Xiao, and T. Funkhouser, “3DMatch: Learning Local Geometric Descriptors from RGB-D Reconstructions,” CVPR 2017
![Page 21: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/21.jpg)
What Can Be Done with SUN3D?
2) Learn 3D shape descriptors – descriptors learned from scenes of SUN3D
and three other datasets outperform other descriptors
Match classification error at 95% recall
Fragment Alignment Success Rate
![Page 22: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/22.jpg)
What Can Be Done with SUN3D?
2) Learn 3D shape descriptors – descriptors learned from scenes of SUN3D
and three other datasets outperform other descriptors
Match classification error at 95% recall
Fragment Alignment Success RateUseful for detecting object poses of small objects
![Page 23: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/23.jpg)
What Can Be Done with SUN3D?
2) Learn 3D shape descriptors – descriptors learned from scenes of SUN3D
and three other datasets outperform other descriptors
Match classification error at 95% recall
Fragment Alignment Success RateUseful for detecting surface matches in CG models
![Page 24: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/24.jpg)
Outline of This Talk
“New” 3D datasets for indoor scene understanding research:
Synthetic RGB-D Image RGB-D Video
Room SUN RGB-D
Multiroom SUN3D
![Page 25: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/25.jpg)
Outline of This Talk
New 3D datasets for indoor scene understanding research:
Synthetic RGB-D Image RGB-D Video
Room SUN RGB-D ScanNet
Multiroom SUN3D
![Page 26: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/26.jpg)
ScanNet
3D reconstructions and annotations of rooms scanned with RGB-D video
A. Dai, A. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Niessner, “ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes,” CVPR 2017.
![Page 27: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/27.jpg)
ScanNet
3D reconstructions and annotations of rooms scanned with RGB-D video
• 1500 scans
A. Dai, A. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Niessner, “ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes,” CVPR 2017.
![Page 28: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/28.jpg)
ScanNet
3D reconstructions and annotations of rooms scanned with RGB-D video
• Raw RGB-D video
• Surface reconstructions
• Labeled objects
• CAD model placements
A. Dai, A. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Niessner, “ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes,” CVPR 2017.
Surface reconstructions
Labeled objects
CAD model placements
![Page 29: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/29.jpg)
ScanNet
3D reconstructions and annotations of rooms scanned with RGB-D video
• 1500 scans
• 700 rooms
• 2.5M frames
• 78K sq meters
A. Dai, A. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Niessner, “ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes,” CVPR 2017.
![Page 30: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/30.jpg)
ScanNet
3D reconstructions and annotations of rooms scanned with RGB-D video
• 36K object annotations
A. Dai, A. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Niessner, “ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes,” CVPR 2017.
ScanNetSceneNNPiGraphs
![Page 31: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/31.jpg)
What Can Be Done with ScanNet?
![Page 32: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/32.jpg)
What Can Be Done With ScanNet?
3D Semantic Voxel Labeling
• Task: predict the semantic category of every visible voxel
A. Dai, A. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Niessner, “ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes,” CVPR 2017.
![Page 33: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/33.jpg)
What Can Be Done With ScanNet?
3D Semantic Voxel Labeling
• Method: 3D ConvNet for Sliding Windows
A. Dai, A. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Niessner, “ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes,” CVPR 2017.
![Page 34: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/34.jpg)
What Can Be Done With ScanNet?
3D Semantic Voxel Labeling
• Results: 73% accuracy overall on 20 classes
A. Dai, A. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Niessner, “ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes,” CVPR 2017.
Voxel classification accuracy
on ScanNet test set
Dense pixel classification accuracy on NYUv2
Ground Truth:
Prediction:
![Page 35: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/35.jpg)
What Can Be Done With ScanNet?
3D Semantic Voxel Labeling
• Results: pretraining on ScanNet helps prediction for NYUv2
A. Dai, A. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Niessner, “ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes,” CVPR 2017.
Dense pixel classification accuracy on NYUv2
![Page 36: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/36.jpg)
Outline of This Talk
New 3D datasets for indoor scene understanding research:
Synthetic RGB-D Image RGB-D Video
Room SUNCG SUN RGB-D ScanNet
Multiroom SUNCG SUN3D
![Page 37: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/37.jpg)
SUNCG
Computer graphics models of houses
• 46K houses
• 50K floors
• 400K rooms
S. Song, F. Yu, A. Zeng, A.X. Chang, M. Savva, T. Funkhouser, “Semantic Scene Completion from a Single Image,” CVPR 2017.
![Page 38: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/38.jpg)
SUNCG
Computer graphics models of houses
• Furnished (5.6M object instances)
S. Song, F. Yu, A. Zeng, A.X. Chang, M. Savva, T. Funkhouser, “Semantic Scene Completion from a Single Image,” CVPR 2017.
![Page 39: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/39.jpg)
SUNCG
Computer graphics models of houses
• Furnished (5.6M object instances)
• Labeled room categories
Living room Bed room
Bathroom
Kitchen
S. Song, F. Yu, A. Zeng, A.X. Chang, M. Savva, T. Funkhouser, “Semantic Scene Completion from a Single Image,” CVPR 2017.
![Page 40: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/40.jpg)
SUNCG
Computer graphics models of houses
• Furnished (5.6M object instances)
• Labeled room categories
• Multiple floors
S. Song, F. Yu, A. Zeng, A.X. Chang, M. Savva, T. Funkhouser, “Semantic Scene Completion from a Single Image,” CVPR 2017.
![Page 41: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/41.jpg)
Texture
Material
Light Source
SUNCG
Computer graphics models of houses
• Furnished (5.6M object instances)
• Labeled room categories
• Multiple floors
• Materials
• Textures
• Lights
S. Song, F. Yu, A. Zeng, A.X. Chang, M. Savva, T. Funkhouser, “Semantic Scene Completion from a Single Image,” CVPR 2017.
![Page 42: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/42.jpg)
What Can Be Done with SUNCG?
![Page 43: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/43.jpg)
Input: Single view depth map Output: Semantic scene completion
What Can Be Done With SUNCG?
1) Semantic Scene Completion (label ALL voxels, not just visible ones)
S. Song, F. Yu, A. Zeng, A.X. Chang, M. Savva, and T. Funkhouser, “Semantic Scene Completion from a Single Image,” CVPR 2017.
![Page 44: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/44.jpg)
visible surface
free space
occluded space
outside view
outside room
3D Scene
What Can Be Done With SUNCG?
1) Semantic Scene Completion (label ALL voxels, not just visible ones)
table chair wall
![Page 45: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/45.jpg)
3D Scene
visible surface
free space
occluded space
outside view
outside room
Semantic Scene Completion
1) Semantic Scene Completion (label ALL voxels, not just visible ones)
![Page 46: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/46.jpg)
Multi-scale aggregation
Receptive field: 0.98 m Receptive field:1.62 m Receptive field: 2.26 m
Semantic Scene Completion : “SSCNet”
![Page 47: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/47.jpg)
Semantic Scene Completion : “SSCNet”
1) Semantic Scene Completion results
Comparison to previous algorithms for volumetric completion
Comparison to previous algorithms for 3D model fitting
SSCNetGround Truth
![Page 48: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/48.jpg)
What Else Can Be Done with SUN3D?
![Page 49: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/49.jpg)
What Can Be Done With SUNCG?
2) Learn from synthetic images
• Rendered 400K synthetic
images with Metropolis
Light Transport in Mitsuba
Y. Zhang, S. Song, E. Yumer, M. Savva, J. Lee, H. Jin, T. Funkhouser “Physically-Based Rendering for Indoor Scene Understanding Using CNNs,” CVPR 2017.
![Page 50: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/50.jpg)
What Can Be Done With SUNCG?
2) Learn from synthetic images
• Rendered 400K synthetic
images with Metropolis
Light Transport in Mitsuba
• All images annotated with
depths, normals, boundaries,
segmentations, labels, etc.Depth
Normal Segmentation
Color
Y. Zhang, S. Song, E. Yumer, M. Savva, J. Lee, H. Jin, T. Funkhouser “Physically-Based Rendering for Indoor Scene Understanding Using CNNs,” CVPR 2017.
![Page 51: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/51.jpg)
What Can Be Done With SUNCG?
2) Learn from synthetic images
• Rendered 400K synthetic
images with Metropolis
Light Transport in Mitsuba
• All images annotated with
depths, normals, boundaries,
segmentations, labels, etc.
• Experiments show that pre-training on
these images improves performance
on 3 scene understanding tasks
… and better rendering helps more
Boundary Estimation Accuracy
Normal Estimation Errors (degrees)
Semantic Segmentation Accuracy (%)
Y. Zhang, S. Song, E. Yumer, M. Savva, J. Lee, H. Jin, T. Funkhouser “Physically-Based Rendering for Indoor Scene Understanding Using CNNs,” CVPR 2017.
![Page 52: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/52.jpg)
Outline of This Talk
New 3D datasets for indoor scene understanding research:
Synthetic RGB-D Image RGB-D Video
Room SUNCG SUN RGB-D ScanNet
Multiroom SUNCG SUN3D
![Page 53: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/53.jpg)
Outline of This Talk
New 3D datasets for indoor scene understanding research:
Synthetic RGB-D Image RGB-D Video
Room SUNCG SUN RGB-D ScanNet
Multiroom SUNCG Matterport3D SUN3D
![Page 54: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/54.jpg)
Matterport3D
Annotated 3D reconstructions of large spaces scanned with RGB-D panoramas
Matterport Camera
RGB-D Panorama 3D Textured Mesh Reconstruction
A. Chang, A. Dai, T. Funkhouser, M. Halber, M. Niessner, M. Savva, S. Song, A. Zeng, and Y. Zhang, “Matteport3D: Learning from RGB-D Data in Indoor Environments,” 3DV 2017
![Page 55: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/55.jpg)
Matterport3D
Annotated 3D reconstructions of large spaces scanned with RGB-D panoramas
• 90 Buildings
• 231 Floors
• 1K Rooms
• 11K Panoramas
• 194K Images
• 46K m2
A. Chang, A. Dai, T. Funkhouser, M. Halber, M. Niessner, M. Savva, S. Song, A. Zeng, and Y. Zhang, “Matteport3D: Learning from RGB-D Data in Indoor Environments,” 3DV 2017
![Page 56: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/56.jpg)
Matterport3D
Annotated 3D reconstructions of large spaces scanned with RGB-D panoramas
• Entire buildings
• Mostly personal living spaces
• Comprehensive coverage
![Page 57: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/57.jpg)
Matterport3D
Annotated 3D reconstructions of large spaces scanned with RGB-D panoramas
• Calibrated panoramas
• Stationary cameras
• 1280x1024 images
• HDR color
SUN3D ScanNet Matterport3D
![Page 58: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/58.jpg)
Matterport3D
Annotated 3D reconstructions of large spaces scanned with RGB-D panoramas
• Calibrated panoramas
• Stationary cameras
• 1280x1024 images
• HDR color
SUN3D ScanNet Matterport3D
![Page 59: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/59.jpg)
Matterport3D
Annotated 3D reconstructions of large spaces scanned with RGB-D panoramas
• Calibrated panoramas
• Stationary cameras
• 1280x1024 images
• HDR color
SUN3D ScanNet Matterport3D
![Page 60: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/60.jpg)
Matterport3D
Annotated 3D reconstructions of large spaces scanned with RGB-D panoramas
• Evenly-spaced view sampling (panorama are ~2.25m apart)
• Precise global alignment
• Textured mesh reconstruction
![Page 61: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/61.jpg)
Matterport3D
Annotated 3D reconstructions of large spaces scanned with RGB-D panoramas
• 50K object segmentations and labels
Textured Mesh Segmentation Labels
Labels
![Page 62: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/62.jpg)
Matterport3D
Annotated 3D reconstructions of large spaces scanned with RGB-D panoramas
• 2K region segmentations and labels
![Page 63: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/63.jpg)
What Can Be Done with Matterport3D?
![Page 64: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/64.jpg)
What Can Be Done With Matterport3D?
View classification
• Given an arbitrary RGB image, predict what type of room contains the camera
A. Chang, A. Dai, T. Funkhouser, M. Halber, M. Niessner, M. Savva, S. Song, A. Zeng, and Y. Zhang, “Matteport3D: Learning from RGB-D Data in Indoor Environments,” 3DV 2017
Living Room
![Page 65: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/65.jpg)
What Can Be Done With Matterport3D?
View classification
• Given an arbitrary RGB image, predict what type of room contains the camera
A. Chang, A. Dai, T. Funkhouser, M. Halber, M. Niessner, M. Savva, S. Song, A. Zeng, and Y. Zhang, “Matteport3D: Learning from RGB-D Data in Indoor Environments,” 3DV 2017
?
![Page 66: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/66.jpg)
What Can Be Done With Matterport3D?
View classification
• Given an arbitrary RGB image, predict what type of room contains the camera
A. Chang, A. Dai, T. Funkhouser, M. Halber, M. Niessner, M. Savva, S. Song, A. Zeng, and Y. Zhang, “Matteport3D: Learning from RGB-D Data in Indoor Environments,” 3DV 2017
Hallway
![Page 67: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/67.jpg)
What Can Be Done With Matterport3D?
View classification
• Given an arbitrary RGB image, predict what type of room contains the camera
A. Chang, A. Dai, T. Funkhouser, M. Halber, M. Niessner, M. Savva, S. Song, A. Zeng, and Y. Zhang, “Matteport3D: Learning from RGB-D Data in Indoor Environments,” 3DV 2017
?
![Page 68: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/68.jpg)
What Can Be Done With Matterport3D?
View classification
• Given an arbitrary RGB image, predict what type of room contains the camera
A. Chang, A. Dai, T. Funkhouser, M. Halber, M. Niessner, M. Savva, S. Song, A. Zeng, and Y. Zhang, “Matteport3D: Learning from RGB-D Data in Indoor Environments,” 3DV 2017
Bedroom
![Page 69: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/69.jpg)
What Can Be Done With Matterport3D?
View classification
• Given an arbitrary RGB image, predict what type of room contains the camera
A. Chang, A. Dai, T. Funkhouser, M. Halber, M. Niessner, M. Savva, S. Song, A. Zeng, and Y. Zhang, “Matteport3D: Learning from RGB-D Data in Indoor Environments,” 3DV 2017
?
![Page 70: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/70.jpg)
What Can Be Done With Matterport3D?
View classification
• Can use the region annotations to classify views for training and testing
A. Chang, A. Dai, T. Funkhouser, M. Halber, M. Niessner, M. Savva, S. Song, A. Zeng, and Y. Zhang, “Matteport3D: Learning from RGB-D Data in Indoor Environments,” 3DV 2017
![Page 71: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/71.jpg)
What Can Be Done With Matterport3D?
View classification
• Can use the region annotations to classify views for training and testing
A. Chang, A. Dai, T. Funkhouser, M. Halber, M. Niessner, M. Savva, S. Song, A. Zeng, and Y. Zhang, “Matteport3D: Learning from RGB-D Data in Indoor Environments,” 3DV 2017
![Page 72: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/72.jpg)
What Can Be Done With Matterport3D?
View classification
• Results for ResNet-50
A. Chang, A. Dai, T. Funkhouser, M. Halber, M. Niessner, M. Savva, S. Song, A. Zeng, and Y. Zhang, “Matteport3D: Learning from RGB-D Data in Indoor Environments,” 3DV 2017
Classification accuracies (%)
![Page 73: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/73.jpg)
What Can Be Done With Matterport3D?
View classification
• Results for ResNet-50
A. Chang, A. Dai, T. Funkhouser, M. Halber, M. Niessner, M. Savva, S. Song, A. Zeng, and Y. Zhang, “Matteport3D: Learning from RGB-D Data in Indoor Environments,” 3DV 2017
Single image Panoramic image
Classification accuracies (%)
![Page 74: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/74.jpg)
Summary and Conclusion
3D datasets are just now becoming available – they provide new
opportunities for research in 3D scene understanding
Synthetic RGB-D Image RGB-D Video
Room SUNCG SUN RGB-D ScanNet
Multiroom SUNCG Matterport3D SUN3D
I think each of these datasets is the largest and most richly-annotated of its kind
![Page 75: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/75.jpg)
Future Work
More data:
• Internet-scale 3D scanning?
Richer annotations:
• Lighting, materials, physical properties, etc.
Multimedia data associations:
• Images, CAD models, floorplans, etc.
Real-time scene understanding tasks:
• Real-time scene parsing
• Robot navigation
![Page 76: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/76.jpg)
Acknowledgments
Students and postdocs:
• Angel Chang, Maciej Halber, Jerry Liu, Manolis Savva, Shuran Song,
Fisher Yu, Yinda Zhang, Andy Zeng
Collaborators:
• Angela Dai, Matthias Niessner, Ersin Yumer, Matt Fisher, Jianxiong Xiao,
Kyle Simek, Craig Reynolds, Matt Bell
Data:
• Matterport, Planner5D
Funding:
• NSF, Facebook, Intel, Google, Adobe, Pixar
Thank
You!
![Page 77: 3D Data for Data-Driven Scene Understandingfunk/VRWorkshop.pdf · 3D reconstructions and annotations of rooms scanned with RGB-D video •1500 scans •700 rooms •2.5M frames •78K](https://reader033.fdocuments.in/reader033/viewer/2022053000/5f0478517e708231d40e209a/html5/thumbnails/77.jpg)
Dataset Webpages
• SUN3D – http://sun3d.cs.princeton.edu
• SUN RGB-D – http://rgbd.cs.princeton.edu
• SUNCG – http://suncg.cs.princeton.edu
• ScanNet – http://www.scan-net.org
• Matterport3D – http://github.com/niessner/Matterport
• ShapeNet – http://shapenet.org
• ModelNet – http://modelnet.cs.princeton.edu
• LSUN – http://lsun.cs.princeton.edu