PointNet: Deep Learning on Point Sets for 3D Classification and...


Source: forum.stanford.edu/events/posterslides/PointNetDeep... · 2017. 4. 6.

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

Charles R. Qi*, Hao Su*, Kaichun Mo, Leonidas J. Guibas

Motivation & Background

Figure: PointNet takes an input point cloud (point set representation) and produces outputs for three tasks: classification (mug? table? car?), part segmentation, and semantic segmentation.

Figure: Partial Inputs and Complete Inputs, shown for the categories airplane, car, chair, lamp, guitar, motorbike, mug, table, bag, rocket, earphone, laptop, cap, knife, pistol, skateboard.

Figure: Original Shape, Critical Point Sets, Upper-bound Shapes.

Figure: PointNet architecture.

Classification Network: input points (n x 3) -> input transform (T-Net, 3x3 transform, matrix multiply) -> n x 3 -> shared mlp (64,64) -> n x 64 -> feature transform (T-Net, 64x64 transform, matrix multiply) -> n x 64 point features -> shared mlp (64,128,1024) -> n x 1024 -> max pool -> 1024 global feature -> mlp (512,256,k) -> output scores (k)

Segmentation Network: point features (n x 64) concatenated with the global feature (n x 1088) -> shared mlp (512,256) -> n x 128 -> shared mlp (128,m) -> output scores (n x m)
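The data flow in the architecture diagram can be sketched in plain Python. This is a minimal, untrained illustration rather than the authors' implementation. It omits both T-Nets and any normalization. The function names, the random initialization, and the extra 128-wide layer in the segmentation branch are assumptions made for this example. Only the layer widths come from the diagram labels.

```python
import random

def make_mlp(sizes, rng):
    """Random, untrained weights for fully connected layers of the given widths."""
    layers = []
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        scale = (2.0 / fan_in) ** 0.5
        weights = [[rng.gauss(0.0, scale) for _ in range(fan_in)] for _ in range(fan_out)]
        layers.append((weights, [0.0] * fan_out))
    return layers

def apply_mlp(vec, layers, final_relu=True):
    """Run one vector through the layers, with ReLU between layers."""
    for index, (weights, biases) in enumerate(layers):
        vec = [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, biases)]
        if final_relu or index < len(layers) - 1:
            vec = [max(0.0, x) for x in vec]
    return vec

def shared_mlp(points, layers, final_relu=True):
    """'shared' in the diagram: the same MLP is applied to every point."""
    return [apply_mlp(p, layers, final_relu) for p in points]

def init_params(k, m, rng, local=(64, 64), deep=(64, 128, 1024), head=(512, 256)):
    """Hypothetical parameter layout; default widths follow the diagram labels."""
    seg_in = local[-1] + deep[-1]  # 64 + 1024 = 1088 with the defaults
    return {
        "local": make_mlp((3,) + tuple(local), rng),
        "deep": make_mlp((local[-1],) + tuple(deep), rng),
        "cls": make_mlp((deep[-1],) + tuple(head) + (k,), rng),
        # Assumption: a 128-wide layer sits between mlp (512,256) and the m scores.
        "seg": make_mlp((seg_in,) + tuple(head) + (128, m), rng),
    }

def pointnet_forward(points, params):
    """points: n x 3 list. Returns (k class scores, n x m point scores, global feature)."""
    local = shared_mlp(points, params["local"])            # n x 64
    deep = shared_mlp(local, params["deep"])               # n x 1024
    global_feature = [max(col) for col in zip(*deep)]      # max pool over points -> 1024
    class_scores = apply_mlp(global_feature, params["cls"], final_relu=False)  # k
    concat = [feat + global_feature for feat in local]     # n x 1088
    point_scores = shared_mlp(concat, params["seg"], final_relu=False)         # n x m
    return class_scores, point_scores, global_feature
```

Because the only operation that mixes points is the column-wise max, reordering the input points leaves the global feature and the class scores unchanged. Pure Python is slow at the default widths, so smaller `local`, `deep`, and `head` widths are convenient for experimenting.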

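Accuracy (%) by missing data ratio (Furthest vs. Random):

Missing data ratio   Furthest   Random
0                    87.1       87.1
0.5                  85.7       83.3
0.75                 81.3       74
0.875                69.2       59.2
0.9375               49.1       33.2

Accuracy (%) by missing data ratio (PointNet vs. VoxNet):

Missing data ratio   PointNet   VoxNet
0                    87.1       86.3
0.5                  83.3       46
0.75                 74         18.5
0.875                59.2       13.3
0.9375               33.2       10.2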
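The "Furthest" and "Random" columns presumably refer to how the retained points are chosen when part of the input is dropped. Under that assumption, the two strategies could look roughly like the sketch below. The function names are hypothetical, and the greedy routine is the textbook farthest point sampling procedure, not necessarily the exact one used for the poster.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_point_sampling(points, num_samples, start_index=0):
    """Greedy farthest point sampling; returns indices into `points`.

    Assumes the points are distinct, otherwise an index may repeat.
    """
    if not 0 < num_samples <= len(points):
        raise ValueError("num_samples must be between 1 and len(points)")
    chosen = [start_index]
    # min_dist[i] is the squared distance from point i to its nearest chosen point.
    min_dist = [squared_distance(p, points[start_index]) for p in points]
    while len(chosen) < num_samples:
        next_index = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(next_index)
        for i, p in enumerate(points):
            d = squared_distance(p, points[next_index])
            if d < min_dist[i]:
                min_dist[i] = d
    return chosen

def random_sampling(points, num_samples, rng):
    """Uniform sampling without replacement; returns indices into `points`."""
    return rng.sample(range(len(points)), num_samples)
```

For a missing data ratio of 0.75, `num_samples` would be a quarter of the original point count. Farthest point sampling spreads the kept points over the shape, which is one plausible reading of why that column degrades more slowly.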

Figure: Accuracy (%) vs. missing data ratio for PointNet and VoxNet.


Figure: Accuracy (%) vs. perturbation noise std.

Figure: Accuracy (%) vs. missing data ratio (Furthest, Random).

Figure: Accuracy (%) vs. outlier ratio (XYZ, XYZ+density).

• Point Cloud Features
  • Mostly hand-crafted features -> feature learning

• Deep Nets on Point Cloud/Shape
  • Point cloud is usually converted to volume, image or feature vector -> We work directly on point sets

• Deep Nets on Unordered Sets
  • Not much work on deep nets for point sets, barely any for 3D -> We invent, experiment and explain novel architectures

Deep Learning on Point Sets

Properties of Point Sets

PointNet Architecture

Theorem: PointNet as a Universal Approximation to Set Functions
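The poster gives only the theorem's title. As a sketch of what it asserts, paraphrased from the accompanying paper rather than quoted from the poster, let $f$ be a set function on $\mathcal{X} = \{S : S \subseteq [0,1]^m,\ |S| = n\}$ that is continuous with respect to the Hausdorff distance. The symbols $h$, $\gamma$, and $\mathrm{MAX}$ below stand for the per-point network, the network applied after pooling, and the element-wise vector maximum.

```latex
\[
\forall \epsilon > 0,\ \exists\ \text{continuous } h \text{ and } \gamma
\ \text{such that for any } S \in \mathcal{X}:
\qquad
\left| \, f(S) - \gamma\!\left( \operatorname*{MAX}_{x_i \in S} \{ h(x_i) \} \right) \right| < \epsilon
\]
```

In words: given enough units in the max pooling layer, a shared per-point function followed by max pooling and a further function can approximate any such continuous set function arbitrarily well.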

Contributions

‣ We design a novel deep net architecture suitable for consuming unordered point sets in 3D;

‣ We show how such a net can be trained to perform 3D shape classification, shape part segmentation and scene semantic parsing tasks;

‣ We provide thorough empirical and theoretical analysis on the stability and efficiency of our method;

‣ We illustrate the 3D features computed by the selected neurons in the net and develop intuitive explanations for its performance.

Application Results Analysis Experiments

Table 1: Segmentation results on ShapeNet part dataset. Metric is mean IoU(%) across shapes.
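The metric named in the caption, mean IoU across shapes, can be illustrated with a short sketch. The function names and the input layout are hypothetical. Counting a part that is absent from both prediction and ground truth as IoU 1 is an assumption that should be checked against the evaluation protocol actually used.

```python
def shape_miou(pred_labels, true_labels, part_ids):
    """Mean IoU over the part types of one shape.

    pred_labels, true_labels: one part label per point, in the same point order.
    part_ids: all part types defined for the shape's category.
    """
    ious = []
    for part in part_ids:
        pred = {i for i, label in enumerate(pred_labels) if label == part}
        true = {i for i, label in enumerate(true_labels) if label == part}
        union = pred | true
        # Assumption: a part missing from both prediction and ground truth scores 1.
        ious.append(1.0 if not union else len(pred & true) / len(union))
    return sum(ious) / len(ious)

def mean_iou_percent(shapes):
    """shapes: list of (pred_labels, true_labels, part_ids) tuples, one per shape."""
    scores = [shape_miou(pred, true, parts) for pred, true, parts in shapes]
    return 100.0 * sum(scores) / len(scores)
```

For instance, a four-point shape with prediction `[0, 0, 1, 1]` and ground truth `[0, 1, 1, 1]` has part IoUs of 1/2 and 2/3, so its shape mIoU is 7/12.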

Object Part Segmentation

Shape Classification

Table 2: Shape classification results on ModelNet40.

Semantic Segmentation

Table 3: Semantic segmentation results on Stanford 3D Parsing dataset.

Table 4: Object detection results based on semantic segmentation.

PointNet Robustness Test

Time and Space Complexity

PointNet is robust to various types of data corruption, such as missing points, outliers and perturbations.
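The three kinds of corruption can be simulated in a few lines. This is a sketch under stated assumptions, not the protocol behind the robustness plots. The function names are hypothetical, perturbation is modeled as independent Gaussian noise per coordinate, and outliers are drawn uniformly from a cube whose bounds are placeholders.

```python
import random

def drop_points(points, missing_ratio, rng):
    """Keep a random subset; missing_ratio is the fraction of points removed."""
    keep = max(1, round(len(points) * (1.0 - missing_ratio)))
    return rng.sample(points, keep)

def jitter_points(points, noise_std, rng):
    """Add zero-mean Gaussian noise with the given std to every coordinate."""
    return [[c + rng.gauss(0.0, noise_std) for c in p] for p in points]

def add_outliers(points, outlier_ratio, rng, low=-1.0, high=1.0):
    """Replace a fraction of the points with uniformly random points.

    Assumption: outliers lie in the cube [low, high]^d.
    """
    result = [list(p) for p in points]
    count = round(len(points) * outlier_ratio)
    for i in rng.sample(range(len(points)), count):
        result[i] = [rng.uniform(low, high) for _ in result[i]]
    return result
```

Evaluating a trained classifier on the outputs of these functions over a range of ratios and noise levels would yield curves of the kind shown in the robustness plots.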

Table 6: Time and space complexity of PointNet (classification network) compared with volumetric CNNs (subvolume and VRN) and multi-view CNNs (MVCNN).

PointNet is highly time efficient (229x better than VRN, 141x better than MVCNN) and highly space efficient (17x fewer parameters than MVCNN).
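To get a feel for the parameter budget, the weights and biases of the MLP layers labeled in the classification network can be counted directly. This back-of-the-envelope sketch ignores both T-Nets and any normalization parameters, and it assumes k = 40 output classes to match ModelNet40. It is therefore not the figure reported in Table 6.

```python
def dense_params(sizes):
    """Weights plus biases of fully connected layers with the given widths."""
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in zip(sizes, sizes[1:]))

def classification_params(k):
    """Parameter count of the labeled MLP layers only (no T-Nets, no normalization)."""
    point_mlps = dense_params([3, 64, 64, 64, 128, 1024])  # shared per-point layers
    head = dense_params([1024, 512, 256, k])               # layers after max pool
    return point_mlps + head

print(classification_params(40))  # 815400
```

Most of this count sits in the 1024 -> 512 layer after max pooling. The shared per-point layers are comparatively small because their weights are reused for every point.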

Visualization of what PointNet has Learned

Each cube visualizes the region of space that activates a point function.

Critical points (those that affect the 1024-dim bottleneck layer) and shape upper-bound. Left: test data. Right: unseen category.
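Given the per-point features that feed the max pooling layer, the critical points are the ones that supply the maximum in at least one feature dimension. The sketch below assumes `point_features` is that n x 1024 matrix stored as a list of rows. The function names are hypothetical, and ties are broken in favor of the first point.

```python
def global_feature(point_features):
    """Column-wise max over points: n x d -> d."""
    return [max(column) for column in zip(*point_features)]

def critical_point_indices(point_features):
    """Indices of points that supply the max in at least one feature dimension."""
    critical = set()
    for column in zip(*point_features):
        critical.add(column.index(max(column)))  # first point attaining the max
    return sorted(critical)
```

Dropping every non-critical point leaves the pooled feature unchanged, which is why the critical point set can be read as a skeleton of the shape as far as the network is concerned.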

Unordered. Unlike pixel arrays in images or voxel arrays in volumetric grids, a point cloud is a set of points without a specific order.

Interaction among points. The points come from a space with a distance metric, so they are not isolated: neighboring points form meaningful local structures.

Invariance under transformations. As a geometric object, the learned representation of the point set should be invariant to certain transformations, for example rotating or translating all points together.

Unordered: Symmetry Function for Unordered Input

Invariance under transformations: Joint Alignment Network
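The alignment step multiplies the points (or the 64-dim point features) by a matrix predicted by a T-Net. The sketch below shows only that multiplication together with an orthogonality penalty of the form ||I - A A^T||_F^2. The penalty is taken from the accompanying paper, where it regularizes the 64x64 feature transform, and is an assumption relative to the poster. The T-Net that would predict the matrix is not modeled, and the function names are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def apply_transform(points, transform):
    """Multiply an n x d point (or feature) matrix by a d x d transform."""
    return matmul(points, transform)

def orthogonality_penalty(transform):
    """Squared Frobenius norm of I - A A^T; zero when A is orthogonal."""
    product = matmul(transform, transpose(transform))
    return sum(((1.0 if i == j else 0.0) - value) ** 2
               for i, row in enumerate(product)
               for j, value in enumerate(row))
```

A rotation matrix incurs zero penalty, while a uniform scaling by 2 in three dimensions gives 27. The penalty therefore pushes the predicted transform toward rigid, information-preserving maps.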

Robot perception AR/VR

Big Data + Deep 3D Representation Learning

However, 3D has multiple representations:

                 Point Cloud   Mesh           Volumetric   Image
Rawness
Geometry
Compactness
Data Structure   Set           Graph          Array        Array
Previous Work                  Spectral CNN   3D CNN       Image CNN

This work: Deep Learning on Point Sets for 3D Vision