1 / 57
Innfarn Yoo, 3/29/2018
POINT CLOUD DEEP LEARNING
2 / 57
AGENDA
• Introduction
• Previous Work
• Method
• Result
• Conclusion
3 / 57
INTRODUCTION
4 / 57
2D OBJECT CLASSIFICATION
• Convolutional Neural Network (CNN) for 2D images works really well
• AlexNet, ResNet, & GoogLeNet
• R-CNN → Fast R-CNN → Faster R-CNN → Mask R-CNN
• Recent 2D image classification can even extract precise boundaries of objects (FCN → Mask R-CNN)
Deep Learning for 2D Object Classification
[1] He et al., Mask R-CNN (2017)
5 / 57
3D OBJECT CLASSIFICATION
• 3D object classification approaches are attracting more attention
• Collecting 3D point data is easier and cheaper than before (LiDAR & other sensors)
• Data size is larger than for 2D images
• Open datasets are increasing
• Recent research approaches human-level detection accuracy
• MVCNN, ShapeNet, PointNet, VoxNet, VoxelNet, & VRN Ensemble
Deep Learning for 3D Object Classification
[2] Zhou and Tuzel, VoxelNet (2017)
6 / 57
GOALS
• Evaluating & comparing different types of Neural Network models for 3D object classification
• Providing a generic framework to test multiple 3D neural network models
• Simple & easy to implement neural network models
• Fast preprocessing (remove bottleneck of loading, sampling, & jittering 3D data)
The goals of our method
7 / 57
PREVIOUS WORK
8 / 57
3D POINT-BASED APPROACHES
• PointNet
• First 3D point-based classification
• Unordered dataset
• Transform → Multi-Layer Perceptron (MLP) → Max Pool (MP) → Classification
3D Points → Neural Nets
[5] Qi et al., PointNet (2017)
9 / 57
PIXEL-BASED APPROACHES
• Multi-Layer Perceptron (MLP)
• Convolutional Neural Network (CNN)
• Multi-View Convolutional Neural Network (MVCNN)
3D → 2D Projections → Neural Nets
[3] Su et al., MVCNN (2015)
[4] Krizhevsky et al., AlexNet (2012)
10 / 57
VOXEL-BASED APPROACHES
• VoxNet
• VRN Ensemble
• VoxelNet
3D Points → Voxels → Neural Nets
[6] Maturana and Scherer, VoxNet (2015)
[7] Brock et al., VRN Ensemble (2016)
[2] Zhou and Tuzel, VoxelNet (2017)
11 / 57
METHOD
12 / 57
PREPROCESSING
• Loading 3D polygonal objects
• Required Operations on 3D objects
• Sampling, Shuffling, Jittering, Scaling, & Rotating
• Projection, & Voxelization
• Python is poorly suited to multi-core processing (or multi-threading)
• The number of objects is notoriously large for single-core processing
Requirement
13 / 57
FRAMEWORK
Basic Pipeline
• nn3d_trainer (C++)
• Trainer: C++ program
• Load 3D Objects → Sample Points
• Sampler Threads (Thread 1, Thread 2, Thread 3, …, Thread N): each runs a 3D Point Sample step followed by a Converter (Pixel, Point, Voxel)
• Epoch #i: call Python NN model functions (Train, Test, Eval, Report, & Save), then increase epoch
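The pipeline on this slide is a C++ program, so the following is only a Python sketch of its shape: sampler threads produce converted samples, and each epoch hands one batch to a model function. The names `sample_points`, `convert`, `run_epochs`, and `train_step` are hypothetical stand-ins, and sampling from a vertex array replaces the real mesh sampler.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def sample_points(obj, n_points, seed):
    """Hypothetical sampler: draw n_points rows (with replacement) from a vertex array."""
    rng = np.random.default_rng(seed)
    return obj[rng.integers(0, len(obj), size=n_points)]


def convert(points):
    """Hypothetical converter: keeps the point representation, cast to float32."""
    return points.astype(np.float32)


def run_epochs(objects, n_points, n_threads, max_epochs, train_step):
    """Per epoch: sample and convert every object on a thread pool, then call the model."""
    def work(task):
        obj, seed = task
        return convert(sample_points(obj, n_points, seed))

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        for epoch in range(max_epochs):
            # A distinct seed per (epoch, object) gives fresh samples every epoch.
            tasks = [(obj, epoch * len(objects) + i) for i, obj in enumerate(objects)]
            batch = np.stack(list(pool.map(work, tasks)))
            train_step(epoch, batch)  # stands in for the Train/Test/Eval calls
```

`pool.map` preserves input order, so row `i` of the batch always belongs to object `i`.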
14 / 57
3D DATASETS
MODELNET10
• Princeton ModelNet Data
• http://modelnet.cs.princeton.edu/
• 10 Categories
• 4,930 Objects (2 GB)
• OFF (CAD) File Format
SHAPENET CORE V2
• ShapeNet
• https://www.shapenet.org/
• 55 Categories
• 51,191 Objects (90 GB)
• OBJ File Format
MODELNET40
• Princeton ModelNet Data
• http://modelnet.cs.princeton.edu/
• 40 Categories
• 12,431 Objects (10 GB)
• OFF (CAD) File Format
15 / 57
NEURAL NETWORK MODELS
Point-Based Models
Pixel-Based Models
Voxel-Based Models
16 / 57
POINT-BASED NEURAL NETWORK MODELS
• Preprocessing:
• Rotate randomly
• Scale randomly
• Uniform sampling on 3D object surfaces
• Sample 2048 points
• Shuffle points
Types of Models
• Tested Models
• Multi-Layer Perceptron (MLP)
• Multi Rotational MLPs
• Single Orientation CNN
• Multi Rotational CNNs
• Multi Rotational Resample & Max Pool Layers
• ResNet-like
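The preprocessing steps listed on this slide can be sketched in NumPy as follows. This is a minimal sketch under assumptions, not the deck's C++ sampler: the function names are hypothetical, the rotation is taken about the z axis (the slides do not name an axis), and the default ranges follow the hyperparameter slide.

```python
import numpy as np


def sample_surface(vertices, faces, n_points, rng):
    """Uniform sampling on a triangle mesh: pick faces by area, then a random point inside."""
    tri = vertices[faces]                                   # (F, 3, 3)
    a, b, c = tri[:, 0], tri[:, 1], tri[:, 2]
    areas = 0.5 * np.linalg.norm(np.cross(b - a, c - a), axis=1)
    face_idx = rng.choice(len(faces), size=n_points, p=areas / areas.sum())
    # The square root makes the barycentric coordinates uniform over the triangle.
    r1 = np.sqrt(rng.random(n_points))[:, None]
    r2 = rng.random(n_points)[:, None]
    return (1 - r1) * a[face_idx] + r1 * (1 - r2) * b[face_idx] + r1 * r2 * c[face_idx]


def perturb(points, rng, max_deg=25.0, scale_range=(0.7, 1.0)):
    """Random rotation (about z, an assumption), random uniform scale, then shuffle."""
    theta = np.deg2rad(rng.uniform(-max_deg, max_deg))
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    rot = np.array([[cos_t, -sin_t, 0.0], [sin_t, cos_t, 0.0], [0.0, 0.0, 1.0]])
    out = (points @ rot.T) * rng.uniform(*scale_range)
    rng.shuffle(out)                                        # reorders rows in place
    return out
```

Area weighting matters because sampling faces uniformly would over-represent regions built from many small triangles.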
17 / 57
18 / 57
POINT-BASED NEURAL NETWORK MODELS
MLP
• Diagram blocks: 3D points, Flatten Vector, Fully Connected Layer (ReLU + Dropout), …, Class Onehot Vector, Softmax Cross Entropy
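The MLP blocks named on this slide (flatten, fully connected layers with ReLU and dropout, softmax cross entropy against a one-hot class vector) can be written as a plain NumPy forward pass. This is a sketch under assumptions: the layer sizes are left to the caller, the function names are hypothetical, and the deck's own models were built in TensorFlow.

```python
import numpy as np


def mlp_forward(points, weights, biases, keep_prob=0.7, rng=None):
    """Flatten, then FC + ReLU + Dropout for hidden layers, then a final FC giving logits."""
    x = points.reshape(points.shape[0], -1)       # (batch, n_points * 3)
    last = len(weights) - 1
    for i, (w, b) in enumerate(zip(weights, biases)):
        x = x @ w + b
        if i < last:
            x = np.maximum(x, 0.0)                # ReLU
            if rng is not None:                   # training mode: inverted dropout
                mask = rng.random(x.shape) < keep_prob
                x = x * mask / keep_prob
    return x


def softmax_cross_entropy(logits, onehot):
    """Mean cross entropy between softmax(logits) and one-hot labels (numerically stable)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(onehot * log_probs).sum(axis=1).mean())
```

Passing `rng=None` disables dropout, which is the usual setting at test time.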
19 / 57
POINT-BASED NEURAL NETWORK MODELS
Multi Rotational MLPs
• Diagram blocks: 3D points, Random 3x3 Rotation, 3D Conv Layer, Max Pooling Layer, Flatten Vector, Fully Connected Layer (ReLU + Dropout), …, Class Onehot Vector, Softmax Cross Entropy
20 / 57
POINT-BASED NEURAL NETWORK MODELS
Single Orientation CNN
• Diagram blocks: 3D points, Random 3x3 Rotation, 3D Conv Layer, Max Pooling Layer, Flatten Vector, Fully Connected Layer, ReLU + Dropout (x2), …, Class Onehot Vector, Softmax Cross Entropy
21 / 57
POINT-BASED NEURAL NETWORK MODELS
Multi Rotational CNNs
• Diagram blocks: 3D points, Random 3x3 Rotation, 3D Conv Layer, Max Pooling Layer, Flatten Vector, Fully Connected Layer, ReLU + Dropout (x2), …, Class Onehot Vector, Softmax Cross Entropy
22 / 57
POINT-BASED NEURAL NETWORK MODELS
Multi Rotational Resample & Max Pool Layers
• Diagram blocks: 3D points, Random 3x3 Rotation, Resample Layer, Max Pooling Layer, Flatten Vector, Fully Connected Layer, ReLU + Dropout (x2), …, Class Onehot Vector, Softmax Cross Entropy
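One way to read the "Random 3x3 Rotation" and "Max Pooling Layer" blocks is sketched below: the same cloud is viewed under several fixed random rotations, a shared per-point layer is applied, and a max over points makes the result independent of point order. This is an interpretation, not the deck's architecture: the slides do not define the Resample Layer, so it is omitted, and the function names and the per-point linear layer are assumptions.

```python
import numpy as np


def random_rotation(rng):
    """Random 3x3 rotation matrix from the QR factorization of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))       # fix column signs so the factorization is unique
    if np.linalg.det(q) < 0:          # flip one axis so the determinant is +1
        q[:, 0] = -q[:, 0]
    return q


def multi_rotation_features(points, rotations, w, b):
    """points (N, 3), rotations (K, 3, 3), w (3, C), b (C,): returns a (K * C,) vector."""
    rotated = np.einsum('kij,nj->kni', rotations, points)   # (K, N, 3)
    feats = np.maximum(rotated @ w + b, 0.0)                # shared per-point layer + ReLU
    return feats.max(axis=1).reshape(-1)                    # max pool over the N points
```

Because the pooling is a maximum over points, shuffling the input cloud leaves the feature vector unchanged.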
23 / 57
POINT-BASED NEURAL NETWORK MODELS
ResNet-like
• Diagram blocks: 3D points, Random 3x3 Rotation, Resample Layer, Max Pooling Layer, Flatten Vector, Fully Connected Layer, ReLU + Dropout (x3), …, Class Onehot Vector, Softmax Cross Entropy
24 / 57
PIXEL-BASED NEURAL NETWORK MODELS
• Preprocessing:
• Sample 8192 points
• Same as point-based models
• Depth-only orthogonal projection
• 32x32 or 64x64
• Generating multiple rotations
• 64x64x5 & 64x64x10
Types of Models
• Tested Models:
• MLP
• Depth-Only Orthogonal MVCNN
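A depth-only orthogonal projection with multiple rotations can be sketched as below. This is a minimal sketch under assumptions: the view direction is +z, each pixel keeps the largest normalized depth that lands in it, and the views are evenly spaced rotations about the y axis. The slides do not state any of these choices, and the function names are hypothetical.

```python
import numpy as np


def depth_image(points, res=32):
    """Orthogonal projection along z onto a res x res grid of normalized depths in [0, 1]."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    extent = max(float((hi - lo).max()), 1e-9)     # one scale for all axes keeps the aspect
    unit = (points - lo) / extent
    ij = np.minimum((unit[:, :2] * res).astype(int), res - 1)
    img = np.zeros((res, res))
    np.maximum.at(img, (ij[:, 1], ij[:, 0]), unit[:, 2])   # keep the max depth per pixel
    return img


def multi_view(points, n_views=5, res=32):
    """Stack depth images from n_views rotations about y into a (res, res, n_views) array."""
    views = []
    for k in range(n_views):
        t = 2.0 * np.pi * k / n_views
        rot = np.array([[np.cos(t), 0.0, np.sin(t)],
                        [0.0, 1.0, 0.0],
                        [-np.sin(t), 0.0, np.cos(t)]])
        views.append(depth_image(points @ rot.T, res))
    return np.stack(views, axis=-1)
```

With the defaults this yields the 32x32x5 input shape shown on the pixel-based model slides.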
25 / 57
26 / 57
PIXEL-BASED NEURAL NETWORK MODELS
MLP
• Diagram blocks: Images (32x32x5), Flatten Vector, Fully Connected Layer, …, Class Onehot Vector, Softmax Cross Entropy
27 / 57
PIXEL-BASED NEURAL NETWORK MODELS
Depth-Only Orthogonal MVCNN
• Diagram blocks: Images (32x32x5), Image Separation, 3D Conv Layer, Max Pooling Layer, Concat, Flatten Vector, Fully Connected Layer, …, Class Onehot Vector, Softmax Cross Entropy
28 / 57
VOXEL-BASED NEURAL NETWORK MODELS
• Preprocessing:
• Sample 8192 points
• Same as point-based models
• Voxelization
• 3D points → Voxels
• Each voxel has intensity 0.0 ~ 1.0
• Intensity depends on how many points hit the same voxel
• 32x32x32 & 64x64x64
Types of Models
• Tested Models:
• MLP
• CNN
• ResNet-like
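The voxelization step above (point counts per voxel mapped to an intensity between 0.0 and 1.0) can be sketched in NumPy. The normalization is an assumption: here each count is divided by the largest count in the grid, which the slides do not specify, and the function name is hypothetical.

```python
import numpy as np


def voxelize(points, dim=32):
    """Count points per voxel in a dim^3 grid and scale counts into [0, 1]."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    extent = max(float((hi - lo).max()), 1e-9)     # one scale for all axes keeps the aspect
    idx = np.minimum(((points - lo) / extent * dim).astype(int), dim - 1)
    grid = np.zeros((dim, dim, dim))
    # np.add.at accumulates correctly when several points share a voxel.
    np.add.at(grid, (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)
    return grid / grid.max()                       # assumed normalization: max count -> 1.0
```

Plain fancy-index assignment (`grid[idx] += 1`) would count each voxel at most once, which is why the unbuffered `np.add.at` is used.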
29 / 57
30 / 57
VOXEL-BASED NEURAL NETWORK MODELS
MLP
• Diagram blocks: Images (32x32x5), Flatten Vector, Fully Connected Layer, …, Class Onehot Vector, Softmax Cross Entropy
31 / 57
VOXEL-BASED NEURAL NETWORK MODELS
CNN
• Diagram blocks: Voxels (32x32x32), 3D Conv Layer, Max Pooling Layer, …, Flatten Vector, Fully Connected Layer, …, Class Onehot Vector, Softmax Cross Entropy
32 / 57
VOXEL-BASED NEURAL NETWORK MODELS
ResNet-like
• Diagram blocks: Voxels (32x32x32), 3D Conv Layer, Avg Pooling Layer, Resample Layer, Max Pooling Layer, Concat, …, Flatten Vector, Fully Connected Layer, …, Class Onehot Vector, Softmax Cross Entropy
33 / 57
IMPLEMENTATION
• System: Ubuntu 16.04, RAM 32 GB & 64 GB, & SSD 512 GB
• NVIDIA Quadro P6000, Quadro M6000, & GeForce Titan X
• GCC 5.2.0 for C++11
• Python 3.5
• TensorFlow-GPU v1.5.0
• NumPy 1.0
System Setup
34 / 57
HYPER PARAMETERS
• Object Perturbation
• Random Rotations: -25 ~ 25 degrees
• Random Scaling: 0.7 ~ 1.0
• Learning Rate: 0.0001
• Keep Probability (Dropout layer): 0.7
• Max Epochs: 1000
• Batch Size: 32
• Number of Random Rotations: 20
• Voxel Dim: 32x32x32
• MVCNN Number of Views: 5
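For reference, the hyperparameters on this slide can be collected in a single Python dictionary. The key names are hypothetical and chosen for illustration; only the values come from the slide.

```python
# Values are taken from the slide; every key name is an assumption.
HYPER_PARAMS = {
    "rotation_deg_range": (-25.0, 25.0),   # random rotations
    "scale_range": (0.7, 1.0),             # random scaling
    "learning_rate": 1e-4,
    "keep_prob": 0.7,                      # dropout keep probability
    "max_epochs": 1000,
    "batch_size": 32,
    "num_random_rotations": 20,
    "voxel_dim": (32, 32, 32),
    "mvcnn_num_views": 5,
}


def dropout_rate(params):
    """Convert a keep probability into the drop rate that some APIs expect instead."""
    return 1.0 - params["keep_prob"]
```

The conversion helper is there because a keep probability of 0.7 corresponds to a drop rate of 0.3, and the two conventions are easy to confuse.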
35 / 57
RESULT
36 / 57
MODELNET10
# OF TEST & TRAIN OBJECTS
[Bar chart: # of Test Models and # of Train Models (y-axis 0 to 1000) for table, toilet, monitor, bathtub, sofa, chair, desk, dresser, night_stand, bed]
37 / 57
MODELNET10 ACCURACY
Iter: 1000
[Bar chart: Train Accu, Test Accu, and mAP (%, y-axis 0 to 100) for PC MLP1, PC CNN1, PC MLPs, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]
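The accuracy charts report Train Accu, Test Accu, and mAP. The slides do not define mAP, so the sketch below assumes the common reading: mean average precision, computed one-vs-rest from per-class scores. The function names are hypothetical.

```python
import numpy as np


def accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    return float((scores.argmax(axis=1) == labels).mean())


def average_precision(class_scores, is_positive):
    """Mean of precision@k over the ranks k at which a positive sample appears."""
    order = np.argsort(-class_scores, kind="stable")        # highest score first
    hits = is_positive[order].astype(float)
    if hits.sum() == 0:
        return float("nan")                                 # class absent from the labels
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float((precision_at_k * hits).sum() / hits.sum())


def mean_average_precision(scores, labels):
    """Average the one-vs-rest AP over all classes that occur in the labels."""
    aps = [average_precision(scores[:, c], labels == c) for c in range(scores.shape[1])]
    return float(np.nanmean(aps))
```

Unlike accuracy, this metric uses the full ranking of scores, so two models with equal accuracy can differ in mAP.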
38 / 57
MODELNET40
# OF TEST & TRAIN OBJECTS
[Bar chart: # of Test Models and # of Train Models per category (y-axis 0 to 900)]
39 / 57
MODELNET40 ACCURACY
Iter: 1000
[Bar chart: Train Accu, Test Accu, and mAP (%, y-axis 0 to 100) for PC MLP1, PC CNN1, PC MLPs, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]
40 / 57
MODELNET40 ACCURACY
7 CATEGORIES (# OF TRAIN OBJECTS > 400)
[Bar chart: # of Test Models and # of Train Models per category (y-axis 0 to 900)]
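The category subsets used on this and the following slides keep only categories whose training-object count exceeds a threshold (400, 300, or 200). A minimal sketch of that selection follows; the function name is hypothetical.

```python
def categories_above(train_counts, threshold):
    """Return the category names with strictly more than `threshold` training objects."""
    return sorted(name for name, count in train_counts.items() if count > threshold)
```

For example, with placeholder counts (not real ModelNet40 numbers) `{"a": 500, "b": 400, "c": 120}` and a threshold of 400, only `"a"` is kept, because the comparison is strict.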
41 / 57
MODELNET40, 7 CATEGORIES ACCURACY
Iter: 1000
[Bar chart: Train Accu, Test Accu, and mAP (%, y-axis 0 to 100) for PC MLP1, PC CNN1, PC MLPs, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]
42 / 57
MODELNET40 ACCURACY
10 CATEGORIES (# OF TRAIN OBJECTS > 300)
[Bar chart: # of Test Models and # of Train Models per category (y-axis 0 to 900)]
43 / 57
MODELNET40, 10 CATEGORIES ACCURACY
[Bar chart: Train Accu, Test Accu, and mAP (y-axis 0 to 100) for PC MLP1, PC CNN1, PC MLPs, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]
44 / 57
MODELNET40 ACCURACY
17 CATEGORIES (# OF TRAIN OBJECTS > 200)
[Bar chart: # of Test Models and # of Train Models per category (y-axis 0 to 900)]
45 / 57
MODELNET40, 17 CATEGORIES ACCURACY
Iter: 1000
[Bar chart: Train Accu, Test Accu, and mAP (%, y-axis 0 to 100) for PC MLP1, PC CNN1, PC MLPs, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]
46 / 57
MODELNET10 PERFORMANCE
[Bar chart: Total Training Time (hours, y-axis 0.00 to 3.00) for PC MLP1, PC CNN1, PC MLPs, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]
[Bar chart: Inference Time Per Batch (seconds, y-axis 0 to 0.07) for the same models]
47 / 57
MODELNET40 PERFORMANCE
[Bar chart: Total Training Time (hours, y-axis 0.00 to 8.00) for PC MLP1, PC CNN1, PC MLPs, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]
[Bar chart: Inference Time Per Batch (seconds, y-axis 0 to 0.07) for the same models]
48 / 57
SHAPENET CORE V2 ACCURACY
• ShapeNet is a large dataset
• 90 GB of data
• Only 3 NN models were tested
• Point MP
• Depth-Only MVCNN
• Voxel CNN
Iter: 600
[Bar chart: Train Accu, Test Accu, and mAP (%, y-axis 0 to 100) for PC MP, PX MVCNN, VX CNN]
49 / 57
TRAIN ACCURACY GRAPH
[11 line-chart panels, one per model (y-axis 0 to 1, x-axis 50 to 15410): PC MLP1, PC MLPs, PC CNN1, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]
50 / 57
TEST ACCURACY GRAPH
[11 line-chart panels, one per model (y-axis 0 to 1, x-axis 0 to 4000): PC MLP1, PC MLPs, PC CNN1, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]
51 / 57
CROSS ENTROPY CONVERGENCE GRAPH
[11 line-chart panels, one per model (y-axis 0 to 2.5, x-axis 50 to 15410): PC MLP1, PC MLPs, PC CNN1, PC CNNs, PC MP, PC ResNet, PX MLP, PX MVCNN, VX MLP, VX CNN, VX ResNet]
52 / 57
CONCLUSION
53 / 57
CONCLUSION
• Voxel ResNet and Voxel CNN provide the best results on 3D object classification
• Unordered vs. Ordered: Unordered data (point clouds) can be learned, but with lower accuracy and higher computation cost than ordered data
• Projection vs. Voxelization: Voxelization provides better results at a similar computation cost (converges faster & is more precise)
• Strict comparisons with previous methods have not been done yet
• Sampling and jittering methods are different (cannot directly compare yet)
Models
54 / 57
CONCLUSION
• ModelNet10 & ModelNet40
• Some categories have too few training and testing objects
• This lowers classification accuracy
• ModelNet40, 7 categories (# of training objects > 400)
• Achieved 98% accuracy
• Partial dataset from ModelNet40 (categories that have more than 400 3D objects)
• The number of training 3D objects in a category matters
• ShapeNetCore.v2
• The number of 3D objects is large enough, but many objects are duplicated
Dataset Evaluation
55 / 57
FUTURE WORK
• Running the entire procedure in CUDA
• Using NVIDIA GVDB would bring the advantages of sparse voxels
• Strict comparisons with previous methods
• PointNet, VRN Ensemble, etc
• Extending methods to captured datasets (e.g. KITTI)
• Train set is too small
• Need to investigate whether Generative Adversarial Networks (GAN) can alleviate this problem or not
56 / 57
REFERENCE
• [1] He et al., 2017, Mask R-CNN
• [2] Zhou and Tuzel, 2017, VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection
• [3] Su et al., 2015, Multi-view Convolutional Neural Networks for 3D Shape Recognition
• [4] Krizhevsky et al., 2012, ImageNet Classification with Deep Convolutional Neural Networks
• [5] Qi et al., 2017, PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
• [6] Maturana and Scherer, 2015, VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition
• [7] Brock et al., 2016, Generative and Discriminative Voxel Modeling with Convolutional Neural Networks
• [8] He et al., 2016, Deep Residual Learning for Image Recognition
• [9] Goodfellow et al., 2014, Generative Adversarial Nets
• [10] Girshick, 2015, Fast R-CNN
• [11] Ren et al., 2015, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
• [12] Chang et al., 2015, ShapeNet: An Information-Rich 3D Model Repository
• [13] Wu et al., 2015, 3D ShapeNets: A Deep Representation for Volumetric Shapes
• [14] Wu et al., 2014, 3D ShapeNets for 2.5D Object Recognition and Next-Best-View Prediction
57 / 57