PointNet: Deep Learning on Point Sets for 3D Classification and...
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
Charles R. Qi*, Hao Su*, Kaichun Mo, Leonidas J. Guibas
Motivation & Background
[Figure: PointNet takes an input point cloud (point set representation) and outputs a classification (mug? table? car?), a part segmentation, or a semantic segmentation.]
[Figure: part segmentation on partial inputs and complete inputs. Categories shown: airplane, car, chair, lamp, guitar, motorbike, mug, table, bag, rocket, earphone, laptop, cap, knife, pistol, skateboard.]
[Figure labels: Original Shape, Critical Point Sets, Upper-bound Shapes.]
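[Figure: PointNet architecture.]
Classification Network: input points (n x 3) -> input transform (T-Net predicts a 3x3 transform, applied by matrix multiply) -> n x 3 -> shared mlp (64,64) -> n x 64 -> feature transform (T-Net predicts a 64x64 transform, applied by matrix multiply) -> n x 64 -> shared mlp (64,128,1024) -> n x 1024 -> max pool -> global feature (1024) -> mlp (512,256,k) -> output scores (k)
Segmentation Network: point features (n x 64) joined with the global feature (1024) -> n x 1088 -> shared mlp (512,256) -> n x 128 -> shared mlp (128,m) -> output scores (n x m)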
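The tensor sizes in the architecture diagram can be traced with a small shape-only sketch. This is a minimal illustration in plain Python with random weights, not the authors' implementation. All helper names (`random_mlp`, `mlp`, `shared_mlp`, `apply_transform`, `pointnet_shapes`) are hypothetical, the T-Net predictions are replaced by identity matrices, ReLU is an assumed nonlinearity, the final 128-wide layer of the first segmentation MLP is assumed from the n x 128 label, and `k=40`, `m=6` are arbitrary example sizes.

```python
import random

def random_mlp(sizes, rng):
    """Random weights for consecutive layers; sizes = [in, hidden..., out]."""
    return [[[rng.uniform(-1, 1) / n_in for _ in range(n_in)] for _ in range(n_out)]
            for n_in, n_out in zip(sizes, sizes[1:])]

def mlp(vec, layers):
    """Fully connected layers with ReLU between them (ReLU is an assumption)."""
    for i, weights in enumerate(layers):
        vec = [sum(v * w for v, w in zip(vec, row)) for row in weights]
        if i < len(layers) - 1:
            vec = [max(0.0, v) for v in vec]
    return vec

def shared_mlp(points, layers):
    """'Shared': the same weights are applied to every point independently."""
    return [mlp(p, layers) for p in points]

def apply_transform(points, matrix):
    """Matrix multiply: each point (row vector) times a d x d transform."""
    d = len(matrix)
    return [[sum(p[i] * matrix[i][j] for i in range(d)) for j in range(d)]
            for p in points]

def identity(d):
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]

def shape(rows):
    return (len(rows), len(rows[0]))

def pointnet_shapes(n=4, k=40, m=6, seed=0):
    rng = random.Random(seed)
    points = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n)]       # n x 3
    # A real T-Net would predict these transforms; identity is a placeholder.
    aligned = apply_transform(points, identity(3))                            # n x 3
    local = shared_mlp(aligned, random_mlp([3, 64, 64], rng))                 # n x 64
    local = apply_transform(local, identity(64))                              # n x 64
    deep = shared_mlp(local, random_mlp([64, 64, 128, 1024], rng))            # n x 1024
    global_feature = [max(col) for col in zip(*deep)]                         # max pool -> 1024
    class_scores = mlp(global_feature, random_mlp([1024, 512, 256, k], rng))  # k
    # Segmentation branch: append the global feature to every point feature.
    concat = [f + global_feature for f in local]                              # n x 1088
    seg_hidden = shared_mlp(concat, random_mlp([1088, 512, 256, 128], rng))   # n x 128
    seg_scores = shared_mlp(seg_hidden, random_mlp([128, 128, m], rng))       # n x m
    return {
        "input": shape(points),
        "point_features": shape(local),
        "per_point_1024": shape(deep),
        "global_feature": len(global_feature),
        "class_scores": len(class_scores),
        "concat": shape(concat),
        "seg_hidden": shape(seg_hidden),
        "seg_scores": shape(seg_scores),
    }
```

Calling `pointnet_shapes(n=4)` returns the shape of each stage, which makes it easy to check that 64 per-point features plus the 1024-dim global feature give the n x 1088 input of the segmentation branch.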
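Accuracy (%) by missing data ratio, furthest vs. random sampling:
Missing data ratio   Furthest   Random
0                    87.1       87.1
0.5                  85.7       83.3
0.75                 81.3       74
0.875                69.2       59.2
0.9375               49.1       33.2

Accuracy (%) by missing data ratio, PointNet vs. VoxNet:
Missing data ratio   PointNet   VoxNet
0                    87.1       86.3
0.5                  83.3       46
0.75                 74         18.5
0.875                59.2       13.3
0.9375               33.2       10.2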
[Plot: Accuracy (%) vs. Missing Data Ratio; series: PointNet, VoxNet.]
[Plot: Accuracy (%) vs. Perturbation noise std.]
[Plot: Accuracy (%) vs. Missing data ratio; series: Furthest, Random.]
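The two series in the missing-data plot can be illustrated with a short sketch of how points might be dropped. It assumes that "Furthest" refers to greedy farthest point sampling and "Random" to uniform sampling without replacement. The function names and the toy point cloud are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
import random

def farthest_point_sampling(points, k, start=0):
    """Greedily pick k indices, each time taking the point farthest from those already picked."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    chosen = [start]
    # Distance from every point to its nearest already-chosen point.
    nearest = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(nxt)
        nearest = [min(d, math.dist(p, points[nxt])) for d, p in zip(nearest, points)]
    return chosen

def random_sampling(points, k, rng):
    """Pick k distinct indices uniformly at random."""
    return rng.sample(range(len(points)), k)

def points_to_keep(n_points, missing_ratio):
    """Number of points that remain after dropping a fraction of the input."""
    return max(1, round(n_points * (1.0 - missing_ratio)))

cloud = [(0, 0, 0), (1, 0, 0), (0.1, 0, 0), (0, 2, 0), (5, 5, 5)]
k = points_to_keep(len(cloud), missing_ratio=0.4)   # keep 3 of 5 points
fps_idx = farthest_point_sampling(cloud, k)          # [0, 4, 3]
rand_idx = random_sampling(cloud, k, random.Random(0))
```

Farthest point sampling spreads the kept points over the shape, while random sampling can leave gaps or keep near-duplicates such as the point at (0.1, 0, 0).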
[Plot: Accuracy (%) vs. Outlier ratio; series: XYZ, XYZ+density.]
• Point Cloud Features
  • Mostly hand-crafted features -> feature learning
• Deep Nets on Point Cloud/Shape
  • Point cloud is usually converted to volume, image or feature vector -> We work directly on point sets
• Deep Nets on Unordered Sets
  • Not much work on deep nets for point sets, barely any for 3D -> We invent, experiment and explain novel architectures
Deep Learning on Point Sets
Properties of Point Sets
PointNet Architecture
Theorem: PointNet as a Universal Approximation to Set Functions
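For reference, the approximation result named in this heading can be written roughly as follows. The statement is reconstructed from the accompanying PointNet paper rather than from this transcript, so the notation ($f$, $h$, $\gamma$, $\mathcal{X}$, $d_H$) should be read as an assumption about the original symbols.

```latex
\textbf{Theorem.} Suppose $f:\mathcal{X}\to\mathbb{R}$ is a continuous set function
with respect to the Hausdorff distance $d_H(\cdot,\cdot)$. Then for every
$\epsilon>0$ there exist a continuous function $h$ and a symmetric function
$g(x_1,\dots,x_n)=\gamma\circ\mathrm{MAX}$ such that, for any $S\in\mathcal{X}$,
\[
  \left|\, f(S) - \gamma\!\left(\underset{x_i\in S}{\mathrm{MAX}}\,\{h(x_i)\}\right) \right| < \epsilon ,
\]
where $x_1,\dots,x_n$ are the elements of $S$ in arbitrary order, $\gamma$ is a
continuous function, and $\mathrm{MAX}$ is the element-wise maximum over $n$ vectors.
```

In the architecture, $h$ corresponds to the shared per-point MLP, $\mathrm{MAX}$ to the max pool, and $\gamma$ to the MLP applied to the global feature.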
Contributions
‣ We design a novel deep net architecture suitable for consuming unordered point sets in 3D;
‣ We show how such a net can be trained to perform 3D shape classification, shape part segmentation and scene semantic parsing tasks;
‣ We provide thorough empirical and theoretical analysis on the stability and efficiency of our method;
‣ We illustrate the 3D features computed by the selected neurons in the net and develop intuitive explanations for its performance.
Application Results
Analysis Experiments
Object Part Segmentation
Table 1: Segmentation results on ShapeNet part dataset. Metric is mean IoU (%) across shapes.
Shape Classification
Table 2: Shape classification results on ModelNet40.
Semantic Segmentation
Table 3: Semantic segmentation results on Stanford 3D Parsing dataset.
Table 4: Object detection results based on semantic segmentation.
PointNet Robustness Test
PointNet is robust to various types of data corruption, such as incomplete data, outliers and perturbations.
Time and Space Complexity
Table 6: Time and space complexity of PointNet (classification network) compared with volumetric CNNs (subvolume and VRN) and multi-view CNNs (MVCNN).
PointNet is highly time efficient (229x better than VRN, 141x better than MVCNN) and highly space efficient (17x fewer parameters than MVCNN).
Visualization of what PointNet has Learned
Each cube visualizes the region of space that activates a point function.
Critical points (those that affect the 1024-dim bottleneck layer) and shape upper-bound. Left: test data. Right: unseen category.
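One way to read "critical points" is as the points that win the max pool in at least one feature dimension. The sketch below illustrates that idea on a toy feature matrix with 3 dimensions instead of 1024. The function names and the data are hypothetical, and ties are resolved by taking the first maximal point, which is a simplification.

```python
def max_pool(point_features):
    """Element-wise maximum over all points."""
    return [max(column) for column in zip(*point_features)]

def critical_point_indices(point_features):
    """Indices of points that attain the maximum in at least one feature dimension."""
    critical = set()
    for column in zip(*point_features):            # one column per feature dimension
        winner = max(range(len(column)), key=column.__getitem__)
        critical.add(winner)
    return sorted(critical)

# Toy per-point features: 4 points, 3 feature dimensions.
features = [
    [0.9, 0.1, 0.3],
    [0.2, 0.8, 0.1],
    [0.5, 0.5, 0.2],   # never the maximum in any dimension
    [0.1, 0.2, 0.7],
]
critical = critical_point_indices(features)        # [0, 1, 3]
kept = [features[i] for i in critical]
# Dropping the non-critical point leaves the pooled (global) feature unchanged.
same_global_feature = max_pool(kept) == max_pool(features)
```

Because the global feature depends only on these winners, removing any other point does not change the network output, which is consistent with the robustness to missing data reported on the poster.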
Unordered. Unlike pixel arrays in images or voxel arrays in volumetric grids, a point cloud is a set of points without a specific order.
Interaction among points. The points come from a space with a distance metric, so they are not isolated.
Invariance under transformations. As a geometric object, a point set should have a learned representation that is invariant to certain transformations.
Unordered: Symmetry Function for Unordered Input
Invariance under transformations: Joint Alignment Network
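The symmetry-function idea can be checked with a few lines of code: apply the same function to every point, then aggregate with an element-wise max, and the result does not depend on point order. In this sketch `h` is a hypothetical toy stand-in for the shared per-point network, not the learned function.

```python
import random

def h(point):
    """Toy per-point feature function (stand-in for the shared MLP)."""
    x, y, z = point
    return [x + y, y * z, max(x, z), x - z]

def set_feature(points):
    """g(h(x1), ..., h(xn)) with g = element-wise max, a symmetric function."""
    return [max(column) for column in zip(*(h(p) for p in points))]

rng = random.Random(0)
cloud = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(32)]
shuffled = cloud[:]
rng.shuffle(shuffled)

# Reordering the input points leaves the aggregated feature unchanged.
order_invariant = set_feature(cloud) == set_feature(shuffled)
```

Any other symmetric aggregation, such as a sum or mean, would also be order-invariant; max is used here because it matches the max pool in the architecture.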
Robot perception AR/VR
Big Data + Deep 3D Representation Learning
However, 3D has multiple representations
                 Point Cloud   Mesh           Volumetric   Image
Data Structure   Set           Graph          Array        Array
Previous Work    (this work)   Spectral CNN   3D CNN       Image CNN
Other comparison criteria listed: Rawness, Geometry, Compactness
This work: Deep Learning on Point Sets for 3D Vision