Transcript of: Burnaev and Notchenko (Skoltech), "Bridging the gap between 2D and 3D with Deep Learning"

Page 1:

Bridging the gap between 2D and 3D with Deep Learning

Evgeny Burnaev (PhD), assoc. prof., Skoltech

Alexandr Notchenko, PhD student, Skoltech

Page 2:

[1]

Page 3:

ImageNet top-5 error over the years

[Figure legend: deep-learning-based methods, feature-based methods, human performance]

Page 4:

Supervised Deep Learning

Data type                                        | Supervision
2D image classification, detection, segmentation | class label, object detection box, segmentation contours
Pose estimation                                  | structure of a "skeleton" on the image

Page 5:

But the world is in 3D

Page 6:

3D deep learning is gaining popularity

Workshops:
● Deep Learning for Robotic Vision Workshop @ CVPR 2017
● Geometry Meets Deep Learning @ ECCV 2016
● 3D Deep Learning Workshop @ NIPS 2016
● Large Scale 3D Data: Acquisition, Modelling and Analysis @ CVPR 2016
● 3D from a Single Image @ CVPR 2015

A Google Scholar search for "3D" + "Deep Learning" returns:

Year | # articles
2012 |   410
2013 |   627
2014 |  1210
2015 |  2570
2016 |  5440

Page 7:

Representation of 3D data for Deep Learning

Method              | Pros (+)                                          | Cons (-)
Many 2D projections | retains surface texture; many 2D DL methods exist | redundant representation; vulnerable to optical illusions
Voxels              | simple; can be sparse; has volumetric properties  | loses surface properties
Point cloud         | can be sparse                                     | loses surface and volumetric properties
2.5D images         | cheap measurement devices; senses depth           | self-occlusion of bodies in a scene; a lot of noise in measurements

Page 8:

[6]

Page 9:

[2]

Page 10:

3D shape as dense Point Cloud

Page 11:

Learning Rich Features from RGB-D Images for Object Detection and Segmentation

[10]

Page 12:

Latest developments in the SLAM family of methods

Page 13:

LSD-SLAM (Large-Scale Direct Monocular Simultaneous Localization and Mapping)

LSD-SLAM: direct (feature-less) monocular SLAM [5]

Page 14:

ElasticFusion

ElasticFusion: dense SLAM without a pose graph [7]

Page 15:

DynamicFusion

The technique won the prestigious CVPR 2015 best paper award. [9]

Page 16:

Problems of SLAM algorithms
● Don't represent objects (only know surfaces)
● Mostly dense representations (require a lot of data)
● The whole scene is one big surface, e.g. objects that are close to each other cannot be separated

Page 17:

3D Shape Retrieval

Page 18:

3D Design Phase
● There exist massive repositories of 3D CAD models, e.g. GrabCAD (chairs, mechanical parts)

Page 19:

3D Design Phase
● Designers spend about 60% of their time searching for the right information
● Massive and complex CAD models are usually archived in a disorderly way in enterprises, which makes design reuse a difficult task

3D model retrieval can significantly shorten product lifecycles

Page 20:

3D Shape-based Model Retrieval
● 3D models are complex, so there are no clear search rules
● Text-based search has its limitations: 3D models are often poorly annotated
● There is some commercial software for 3D CAD model search, e.g.
  ➢ Exalead OnePart by Dassault Systèmes
  ➢ Geolus Search by Siemens PLM, and others
● However, the methods used
  ➢ are time-consuming,
  ➢ are often based on hand-crafted descriptors,
  ➢ can be limited to a specific class of shapes,
  ➢ are not robust to scaling, rotations, etc.

Page 21:

Sparse 3D Convolutional Neural Networks for Large-Scale Shape Retrieval

Alexandr Notchenko, Ermek Kapushev, Evgeny Burnaev

Presented at 3D Deep Learning Workshop at NIPS 2016

Page 22:

Sparsity of voxel representation

● 30^3 voxels are already enough to recognize a simple shape
● With texture information it would be even easier
● Sparsity over all classes of the ModelNet40 training set at voxel resolution 40 is only 5.5% (occupied voxels)
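The sparsity (occupancy) figure above can be computed directly from a voxel grid; here is a minimal sketch on a toy grid (the numbers below are illustrative, not the ModelNet40 5.5% figure):

```python
import numpy as np

def voxel_sparsity(grid):
    """Fraction of occupied cells in a boolean voxel grid."""
    return grid.sum() / grid.size

# Toy example: a 40x40x40 grid containing a solid 10x10x10 cube.
grid = np.zeros((40, 40, 40), dtype=bool)
grid[15:25, 15:25, 15:25] = True

print(voxel_sparsity(grid))  # 1000 / 64000 = 0.015625
```

Storing only the occupied cells (e.g. as coordinate lists) is what makes sparse 3D convolutions tractable at high resolutions.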

Page 23:

Shape Retrieval

[Pipeline diagram: a query shape is passed through a Sparse3DCNN to produce its feature vector (e.g. Vplane for a plane); items are retrieved by cosine distance to the precomputed feature vectors of the dataset (Vcar, Vperson, ...).]
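The retrieval step can be sketched as follows; the vectors and names below are toy stand-ins for the Sparse3DCNN embeddings (Vcar, Vperson, ...) in the pipeline:

```python
import numpy as np

def cosine_distance(u, v):
    """1 - cosine similarity between two feature vectors."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def retrieve(query_vec, database, k=3):
    """Return the k database keys closest to the query by cosine distance.

    `database` maps item names to precomputed feature vectors.
    """
    ranked = sorted(database, key=lambda name: cosine_distance(query_vec, database[name]))
    return ranked[:k]

# Toy 3-D "embeddings" standing in for network outputs.
db = {
    "car":    np.array([1.0, 0.1, 0.0]),
    "person": np.array([0.0, 1.0, 0.2]),
    "plane":  np.array([0.9, 0.0, 0.5]),
}
query = np.array([1.0, 0.0, 0.4])  # hypothetical embedding of a query shape
print(retrieve(query, db, k=2))    # → ['plane', 'car']
```

In practice the database vectors are computed once offline, so a query costs only one forward pass plus nearest-neighbour search.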

Page 24:

Triplet loss

The representation can be learned efficiently by minimizing a triplet loss.

A triplet is a set (a, p, n), where
● a - anchor object
● p - positive object, similar to the anchor
● n - negative object, not similar to the anchor

L(a, p, n) = max(d(a, p) - d(a, n) + α, 0),

where α is a margin parameter, and d(a, p) and d(a, n) are the distances between p and a and between n and a.
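A minimal NumPy sketch of this loss, using Euclidean distances (the margin value and embeddings below are illustrative):

```python
import numpy as np

def triplet_loss(a, p, n, margin=0.2):
    """max(d(a, p) - d(a, n) + margin, 0) with Euclidean distances."""
    d_ap = np.linalg.norm(a - p)
    d_an = np.linalg.norm(a - n)
    return max(d_ap - d_an + margin, 0.0)

# Toy embeddings: the positive is close to the anchor, the negative is far,
# so this triplet is already "satisfied" and contributes zero loss.
anchor   = np.array([0.0, 0.0])
positive = np.array([0.1, 0.0])
negative = np.array([2.0, 0.0])
print(triplet_loss(anchor, positive, negative))  # 0.1 - 2.0 + 0.2 < 0 → 0.0

# Swapping roles makes a hard triplet with a large positive loss.
print(triplet_loss(anchor, negative, positive))  # ≈ 2.0 - 0.1 + 0.2 = 2.1
```

Minimizing this over many triplets pulls similar shapes together and pushes dissimilar ones at least a margin apart in the embedding space.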

Page 25:

Our approach

● Use very large resolutions and sparse representations.
● Use triplet learning for 3D shapes.
● Use the large-scale shape datasets ModelNet and ShapeNet.

Page 26:

Representing a voxel shape as a vector

Page 27:

Obligatory t-SNE

Page 28:

Conclusions

● For small datasets of shapes, sparse 3D voxel tensors can work.
● Voxels don't scale to hundreds of "classes" and lose texture information.
● They cannot encode complicated object domains.

Page 29:

Problems for next 5 years

Page 30:

Autonomous Vehicles

Page 31:
Page 32:
Page 33:

Augmented (Mixed) Reality

Page 34:
Page 35:
Page 36:

Robotics in human environments

Page 37:

Robotic Control in Human Environments

Page 38:

Commodity sensors to create 2.5D images

Intel RealSense Series

Asus Xtion Pro

Microsoft Kinect v2

Structure Sensor

Page 39:

What do they have in common?

Page 40:

What do they have in common?

They require understanding the whole scene

Page 41:

The problem of "holistic" scene understanding

Page 42:

Lin, D., Fidler, S., & Urtasun, R. "Holistic scene understanding for 3D object detection with RGBD cameras." Proceedings of the IEEE International Conference on Computer Vision. 2013, pp. 1417-1424.

● Human environments are designed by humans
● Most of the objects are created by humans
● Context provides information via joint probability functions
● Textures are caused by materials, and can therefore explain the function and structure of an object

The problem of "holistic" scene understanding

Page 43:

Connecting 3 families of CV algorithms is inevitable

● Learnable Computer Vision Systems (Deep Learning)
● Geometric Computer Vision (SLAM)
● Probabilistic Computer Vision (Bayesian methods)

Page 44:

Connecting 3 families of CV algorithms is inevitable

● Learnable Computer Vision Systems (Deep Learning)
● Geometric Computer Vision (SLAM)
● Probabilistic Computer Vision (Bayesian methods)

At their intersection: Probabilistic Inverse Graphics

Page 45:

Probabilistic Inverse Graphics enables:
● Taking setting information into account (shop: shelves and products | street: buildings, cars, pedestrians)
● Making maximum-likelihood estimates from data and a model (or giving directions on how best to reduce uncertainty)
● Learning the structure of objects (materials and textures / 3D shape / intrinsic dynamics)

Page 46:

Thank you.

Alexandr Notchenko Ermek Kapushev Evgeny Burnaev

Page 47:

Citations and Links
1. Deep Learning NIPS 2015 Tutorial by Geoff Hinton, Yoshua Bengio & Yann LeCun.
2. Wu, Z., Song, S., Khosla, A., Yu, F., Zhang, L., Tang, X., & Xiao, J. "3D ShapeNets: A deep representation for volumetric shapes." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015, pp. 1912-1920.
3. Nash, C., & Williams, C. "Generative models of part-structured 3D objects."
4. Qin, Fei-wei, et al. "A deep learning approach to the classification of 3D CAD models." Journal of Zhejiang University SCIENCE C 15.2 (2014): 91-106.
5. Engel, Jakob, Thomas Schöps, and Daniel Cremers. "LSD-SLAM: Large-scale direct monocular SLAM." European Conference on Computer Vision. Springer International Publishing, 2014.
6. Su, Hang, et al. "Multi-view convolutional neural networks for 3D shape recognition." Proceedings of the IEEE International Conference on Computer Vision. 2015.
7. Whelan, Thomas, et al. "ElasticFusion: Dense SLAM without a pose graph." Robotics: Science and Systems. Vol. 11. 2015.
8. Notchenko, Alexandr, Ermek Kapushev, and Evgeny Burnaev. "Sparse 3D convolutional neural networks for large-scale shape retrieval." arXiv preprint arXiv:1611.09159 (2016).
9. Newcombe, Richard A., Dieter Fox, and Steven M. Seitz. "DynamicFusion: Reconstruction and tracking of non-rigid scenes in real-time." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
10. Gupta, Saurabh, et al. "Learning rich features from RGB-D images for object detection and segmentation." European Conference on Computer Vision. Springer International Publishing, 2014.