POINT CLOUD DEEP LEARNING - NVIDIA...• Preprocessing: • Sample 8192 points • Same as point...

1 / 57

Innfarn Yoo, 3/29/2018

POINT CLOUD DEEP LEARNING

2 / 572 / 57

AGENDA

• Introduction

• Previous Work

• Method

• Result

• Conclusion

3 / 57

INTRODUCTION

4 / 57

2D OBJECT CLASSIFICATION

• Convolutional Neural Network (CNN) for 2D images works really well

• AlexNet, ResNet, & GoogLeNet

• R-CNN Fast R-CNN Faster R-CNN Mask R-CNN

• Recent 2D image classification can even extract precise boundaries of objects (FCN Mask R-CNN)

Deep Learning for 2D Object Classification

[1] He et al., Mask R-CNN (2017)

5 / 57

3D OBJECT CLASSIFICATION

• 3D object classification approaches are getting more attentions

• Collecting 3D point data is easier and cheaper than before (LiDAR & other sensors)

• Size of data is bigger than 2D images

• Open datasets are increasing

• Recent researches approaches human level detection accuracy

• MVCNN, ShapeNet, PointNet, VoxNet, VoxelNet, & VRN Ensemble

Deep Learning for 3D Object Classification

[2] Zhou and Tuzel, VoxelNet (2017)

6 / 57

GOALS

• Evaluating & comparing different types of Neural Network models for 3D object classification

• Providing the generic framework to test multiple 3D neural network models

• Simple & easy to implement neural network models

• Fast preprocessing (remove bottleneck of loading, sampling, & jittering 3D data)

The goals of our method

7 / 57

PREVIOUS WORK

8 / 57

3D POINT-BASED APPROACHES

• PointNet

• First 3D point-based classification

• Unordered dataset

• Transform Multi-Layer Perceptron (MLP) Max Pool (MP) Classification

3D Points Neural Nets

[5] Qi et al., PointNet (2017)

9 / 57

PIXEL-BASED APPROACHES

• Multi-Layer Perceptron (MLP)

• Convolutional Neural Network (CNN)

• Multi-View Convolutional Neural Network (MVCNN)

3D 2D Projections Neural Nets

[3] Su et al., MVCNN (2015)

[4] Krizhevsky et al., AlexNet (2012)

10 / 57

VOXEL-BASED APPROACHES

• VoxNet

• VRN Ensemble

• VoxelNet

3D Points Voxels Neural Nets

[6] Maturana and Scherer, VoxNet (2015)

[7] Broke et al., VRN Ensemble (2016)

[2] Zhou and Tuzel, VoxelNet (2017)

11 / 57

METHOD

12 / 5712 / 57

PREPROCESSING

• Loading 3D polygonal objects

• Required Operations on 3D objects

• Sampling, Shuffling, Jittering, Scaling, & Rotating

• Projection, & Voxelization

• Python interface is not that good for multi-core processing (or multi-threading)

• # of objects is notoriously for single-core processing

Requirement

13 / 5713 / 57

FRAMEWORKBasic Pipeline

Trainer:C++ program

Load 3D ObjectsSample Points

Loading 3D Object

Sampler Threads

Thread N3D Point Sample

Thread 23D Point Sample

…Thread 33D Point Sample

Call Python NN Model Functions:Train, Test, Eval, Report, & Save

Epoch #i

Increase epoch

Thread 13D Point Sample

ConverterPixel, Point, Voxel




…

nn3d_trainer (C++)

14 / 57

SHAPENET CORE V2MODELNET40MODELNET10

3D DATASETS

Princeton ModelNet Data

http://modelnet.cs.princeton.edu/

10 Categories

4,930 Objects (2 GB)

OFF (CAD) File Format

ShapeNet

https://www.shapenet.org/

55 Categories


OBJ File Format

Princeton ModelNet Data


40 Categories


OFF (CAD) File Format


https://www.shapenet.org/


15 / 5715 / 57

NEURAL NETWORK MODELS

Point-Based Models

Pixel-Based Models

Voxel-Based Models

16 / 5716 / 57

POINT-BASED NEURAL NETWORK MODELS

• Preprocessing:

• Rotate randomly

• Scale randomly

• Uniform sampling on 3D object surfaces

• Sample 2048 points

• Shuffle points

Types of Models

• Tested Models

• Multi-Layer Perceptron (MLP)

• Multi Rotational MLPs

• Single Orientation CNN

• Multi Rotational CNNs

• Multi Rotational Resample & Max Pool Layers

• ResNet-like

17 / 5717 / 57

18 / 5718 / 57


Flatten Vector

Fully Connected Layer

… Class Onehot Vector

MLP

3D points …

Softmax Cross Entropy

ReLU + Dropout

19 / 5719 / 57


3D points …


Random 3x3 Rotation

3D Conv Layer

Max Pooling Layer

Flatten Vector



Multi Rotational MLPs

ReLU + Dropout

20 / 5720 / 57


Random 3x3 Rotation

3D Conv Layer

Max Pooling Layer

Flatten Vector



Single Orientation CNN

3D points …

Softmax Cross EntropyReLU + Dropout ReLU + Dropout

21 / 5721 / 57


3D points…


Random 3x3 Rotation

3D Conv Layer

Max Pooling Layer

Flatten Vector



Multi Rotational CNNs

ReLU + Dropout ReLU + Dropout

22 / 5722 / 57


3D points…


Random 3x3 Rotation

Resample Layer

Max Pooling Layer

Flatten Vector



Multi Rotational Resample & Max Pool Layers

ReLU + Dropout ReLU + Dropout

23 / 5723 / 57


…


Random 3x3 Rotation

Resample Layer

Max Pooling Layer

Flatten Vector



ResNet-like

3D points

ReLU + Dropout ReLU + Dropout ReLU + Dropout

24 / 5724 / 57

PIXEL-BASED NEURAL NETWORK MODELS

• Preprocessing:


• Same as point-based models

• Depth-only orthogonal projection

• 32x32 or 64x64

• Generating multiple rotations

• 64x64x5 & 64x64x10

Types of Models

• Tested Models:

• MLP

• Depth-Only Orthogonal MVCNN

25 / 5725 / 57

26 / 5726 / 57



MLP

Flatten Vector



Images

(32x32x5)…

27 / 5727 / 57


Images

(32x32x5)…


Depth-Only Orthogonal MVCNN

Image Separation

3D Conv Layer

Max Pooling Layer

Flatten Vector



Concat

28 / 5728 / 57

VOXEL-BASED NEURAL NETWORK MODELS

• Preprocessing:


• Same as point based models

• Voxelization

• 3D points Voxels

• Each voxel has intensity 0.0 ~ 1.0

• how many points hit same voxel

• 32x32x32 & 64x64x64

Types of Models

• Tested Models:

• MLP

• CNN

• ResNet-like

29 / 5729 / 57

30 / 5730 / 57

VOXEL-BASED NEURAL NETWORK MODELSMLP

Flatten Vector



Softmax Cross EntropyImages

(32x32x5)…

31 / 5731 / 57

VOXEL-BASED NEURAL NETWORK MODELSCNN

Max Pooling Layer

Flatten Vector



3D Conv Layer

…

Softmax Cross EntropyVoxels

32x32x32

32 / 5732 / 57

VOXEL-BASED NEURAL NETWORK MODELSResNet-like

…


Voxels

32x32x32

Avg Pooling Layer

Resample Layer

Max Pooling Layer

Flatten Vector



3D Conv Layer

Concat

33 / 5733 / 57

IMPLEMENTATION

• System: Ubuntu 16.04, RAM 32 GB & 64 GB, & SSD 512 GB

• NVIDIA Quadro P6000, Quadro M6000, & GeForce Titan X

• GCC 5.2.0 for C++ 11x

• Python 3.5

• TensorFlow-GPU v1.5.0

• NumPy 1.0

System Setup

34 / 5734 / 57

HYPER PARAMETERS

• Object Perturbation

• Random Rotations: -25 ~ 25 degree

• Random Scaling: 0.7 ~ 1.0

• Learning Rate: 0.0001

• Keep Probability (Dropout layer): 0.7

• Max Epochs: 1000

• Batch Size: 32

• Number of Random Rotations: 20

• Voxel Dim: 32x32x32

• MVCNN Number of Views: 5

35 / 57

RESULT

36 / 57

MODELNET10# OF TEST & TRAIN OBJECTS

0

100

200

300

400

500

600

700

800

900

1000

table toilet monitor bathtub sofa chair desk dresser night_stand bed

# of Test Models # of Train Models

37 / 570

10

20

30

40

50

60

70

80

90

100

PC MLP1 PC CNN1 PC MLPs PC CNNs PC MP PC ResNet PX MLP PX MVCNN VX MLP VX CNN VX ResNet

Train Accu Test Accu mAP

MODELNET10 ACCURACYIter: 1000%

38 / 57

MODELNET40# OF TEST & TRAIN OBJECTS

0

100

200

300

400

500

600

700

800

900


39 / 570

10

20

30

40

50

60

70

80

90

100



MODELNET40 ACCURACYIter: 1000%

40 / 57

MODELNET40 ACCURACY7 CATEGORIES (# OF TRAIN OBJECTS > 400)

0

100

200

300

400

500

600

700

800

900


41 / 570

10

20

30

40

50

60

70

80

90

100



MODELNET40, 7 CATEGORIES ACCURACYIter: 1000%

42 / 57


0

100

200

300

400

500

600

700

800

900


43 / 57

MODELNET40, 10 CATEGORIES ACCURACY

0

10

20

30

40

50

60

70

80

90

100



44 / 57


0

100

200

300

400

500

600

700

800

900


45 / 57

MODELNET40,17 CATEGORIES ACCURACY

0

10

20

30

40

50

60

70

80

90

100


Train Accu Test Accu mAPIter: 1000%

46 / 57

MODELNET10 PERFORMANCE

0.00

0.50

1.00

1.50

2.00

2.50

3.00

PC MLP1 PC CNN1 PC MLPs PC CNNs PC MP PCResNet

PX MLP PXMVCNN

VX MLP VX CNN VXResNet

hours

Total Training Time

0

0.01

0.02

0.03

0.04

0.05

0.06

0.07


PX MLP PXMVCNN


Inference Time Per Batch

seconds

47 / 57

MODELNET40 PERFORMANCE

0.00

1.00

2.00

3.00

4.00

5.00

6.00

7.00

8.00


PX MLP PXMVCNN


Total Training Time

hours

0

0.01

0.02

0.03

0.04

0.05

0.06

0.07


PX MLP PXMVCNN


Inference Time Per Batch

seconds

48 / 5748 / 57

SHAPENET CORE V2 ACCURACY

• ShapeNet has pretty big dataset

• 90 GB of dataset

• Only tested 3 NN models

• Point MP

• Depth-Only MVCNN

• Voxel CNN

Iter: 600

0

10

20

30

40

50

60

70

80

90

100

PC MP PX MVCNN VX CNN

Train Accu Test Accu mAP%

49 / 5749 / 57

0

0.2

0.4

0.6

0.8

1

50 4050 7850 11600 15410

0

0.2

0.4

0.6

0.8

1

50 4050 7850 11600 15410

TRAIN ACCURACY GRAPH

0

0.2

0.4

0.6

0.8

1

50 4050 7850 11600 15410

0

0.2

0.4

0.6

0.8

1

50 4050 7850 11600 15410

0

0.2

0.4

0.6

0.8

1

50 4050 7850 11600 15410

0

0.2

0.4

0.6

0.8

1

50 4050 7850 11600 15410

0

0.2

0.4

0.6

0.8

1

50 4050 7850 11600 15410

0

0.2

0.4

0.6

0.8

1

50 4050 7850 11600 15410

0

0.2

0.4

0.6

0.8

1

50 4050 7850 11600 15410

0

0.2

0.4

0.6

0.8

1

50 4050 7850 11600 15410

0

0.2

0.4

0.6

0.8

1

50 4050 7850 11600 15410

PC MLP1 PC MLPs PC CNN1 PC CNNs

PC MP PC ResNet PX MLP PX MVCNN

VX MLP VX CNN VX ResNet

50 / 5750 / 57

0

0.2

0.4

0.6

0.8

1

0 1000 2000 3000 4000

0

0.2

0.4

0.6

0.8

1

0 1000 2000 3000 4000

0

0.2

0.4

0.6

0.8

1

0 1000 2000 3000 4000

0

0.2

0.4

0.6

0.8

1

0 1000 2000 3000 4000

0

0.2

0.4

0.6

0.8

1

0 1000 2000 3000 4000

0

0.2

0.4

0.6

0.8

1

0 1000 2000 3000 4000

0

0.2

0.4

0.6

0.8

1

0 1000 2000 3000 4000

0

0.2

0.4

0.6

0.8

1

0 1000 2000 3000 4000

0

0.2

0.4

0.6

0.8

1

0 1000 2000 3000 4000

0

0.2

0.4

0.6

0.8

1

0 1000 2000 3000 4000

0

0.2

0.4

0.6

0.8

1

0 1000 2000 3000 4000

TEST ACCURACY GRAPH

PC MLP1 PC MLPs PC CNN1 PC CNNs



51 / 5751 / 570

0.5

1

1.5

2

2.5

50 4050 7850 11600 15410

0

0.5

1

1.5

2

2.5

50 4050 7850 11600 15410

0

0.5

1

1.5

2

2.5

50 4050 7850 11600 15410

0

0.5

1

1.5

2

2.5

50 4050 7850 11600 15410

0

0.5

1

1.5

2

2.5

50 4050 7850 11600 15410

0

0.5

1

1.5

2

2.5

50 4050 7850 11600 15410

0

0.5

1

1.5

2

2.5

50 4050 7850 11600 15410

0

0.5

1

1.5

2

2.5

50 4050 7850 11600 15410

0

0.5

1

1.5

2

2.5

50 4050 7850 11600 15410

0

0.5

1

1.5

2

2.5

50 4050 7850 11600 15410

0

0.5

1

1.5

2

2.5

50 4050 7850 11600 15410

CROSS ENTROPY CONVERGENCE GRAPHPC MLP1 PC MLPs PC CNN1 PC CNNs



52 / 57

CONCLUSION

53 / 57

CONCLUSION

• Voxel ResNet and Voxel CNN provide best result on 3D object classification

• Unordered vs. Ordered: Unordered data (point cloud) could be learned, but lower accuracy and higher computation cost than ordered data

• Projection vs. Voxelization: Voxelization provides better result with similar computation cost (converge faster, & more precise)

• Strict comparisons with previous methods are not done yet

• Sampling and jittering methods are different (cannot directly compare yet)

Models

54 / 57

CONCLUSION

• ModelNet10 & ModelNet40

• Some categories have too small training and testing objects

• Lowering classification accuracy

• ModelNet40, 7 categories (# of training objects > 400)

• Achieved 98% accuracy

• Partial dataset from ModelNet40 (Categories that have more than 400 3D objects)

• # of train 3D objects in a category matters

• ShapeNetCore.v2

• # of 3D objects are big enough, but many object but many duplicated objects

Dataset Evaluation

55 / 57

FUTURE WORK

• Running entire procedures in CUDA

• Using NVIDIA GVDB Will have advantages of Sparse Voxels

• Strict comparisons with previous methods

• PointNet, VRN Ensemble, etc

• Extending methods to captured dataset (e.g. KITTI)

• Train set is too small

• Need to investigate whether Generative Adversarial Networks (GAN) can alleviate this problem or not

56 / 57

REFERENCE

• [1] He et al., 2017, Mask R-CNN

• [2] Zhou and Tuzel, 2017, VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection

• [3] Su et al., 2015, Multi-view convolutional neural networks for 3d shape recognition

• [4] Krizhevsky et al., 2012, ImageNet Classification with Deep Convolutional Neural Networks

• [5] Qi et al., 2017, Pointnet: Deep learning on point sets for 3d classification and segmentation

• [6] Maturana and Scherer, 2015, VoxNet: A 3D Convolutional Neural Network for Real-Time Object Recognition

• [7] Broke et al., 2016, Generative and Discriminative Voxel Modeling with Convolutional Neural Networks

• [8] He et al., 2016, Deep residual learning for image recognition

• [9] Goodfellow et al., 2014, Generative adversarial nets

• [10] Girshick, 2015, Fast R-CNN

• [11] Ren et al., 2015, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

• [12] Chang et al., 2015, ShapeNet: An Information-Rich 3D Model Repository

• [13] Wu et al., 2015, 3D ShapeNets: A Deep Representation for Volumetric Shapes

• [14] Wu et al., 2014, 3D ShapeNets for 2.5D Object Recognition and Next-Best-View Prediction

57 / 57

POINT CLOUD DEEP LEARNING - NVIDIA...• Preprocessing: • Sample 8192 points • Same as point...

Documents

Transcript of POINT CLOUD DEEP LEARNING - NVIDIA...• Preprocessing: • Sample 8192 points • Same as point...