
SPLATNet: Sparse Lattice Networks for Point Cloud Processing

Hang Su 1,2, Varun Jampani 2, Deqing Sun 2, Subhransu Maji 1, Evangelos Kalogerakis 1, Ming-Hsuan Yang 2,3, Jan Kautz 2
[email protected]; {vjampani, deqings}@nvidia.com
Code: https://github.com/nvlabs/splatnet

SPLATNet operates on point clouds directly and allows joint 2D-3D processing.

1. Challenges in Point Cloud Processing

Traditional CNNs are not a good fit for point clouds:
• Point clouds are sets of sparse, unordered points;
• traditional CNNs operate on dense, regular grids.

Pre-processing into voxels (e.g., OctNet [1]) or 2D projections (e.g., MVCNN [2]) brings:
• loss of detail,
• artifacts,
• computational overhead.

2. Bilateral Convolution Layer (BCL) on Point Clouds

Originally introduced in [3], BCL comprises three steps:
1. Splat: BCL first interpolates point features onto a permutohedral lattice.
2. Convolve: convolution is performed over the sparsely populated lattice vertices.
3. Slice: the filtered signal is interpolated back onto the original point locations.

Each point carries two kinds of features: point features, such as (r, g, b) or learned features f(…), say "what" is filtered, while lattice features, (x, y, z) for point clouds or (x, y) for images, say "where" the filtering happens. A hash table over the occupied vertices of the permutohedral lattice makes this sparse high-dimensional filtering efficient. A minimal code sketch of the three steps follows.
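The sketch below is our simplification, not the released implementation: it uses nearest-vertex splatting onto an integer grid stored in a hash map, whereas the actual BCL performs barycentric interpolation on a permutohedral lattice; all function and variable names here are hypothetical.

```python
# Minimal sketch of BCL's splat -> convolve -> slice pipeline.
# Simplifications: nearest-vertex splatting on an integer grid (the real
# BCL interpolates barycentrically on a permutohedral lattice).
import numpy as np
from collections import defaultdict

def splat(points, features, scale=1.0):
    """Accumulate point features onto sparse lattice vertices (hash map)."""
    lattice = defaultdict(lambda: np.zeros(features.shape[1]))
    counts = defaultdict(int)
    for p, f in zip(points, features):
        key = tuple(np.round(p * scale).astype(int))  # nearest lattice vertex
        lattice[key] += f
        counts[key] += 1
    return {k: v / counts[k] for k, v in lattice.items()}  # averaged features

def convolve(lattice, weights):
    """1-hop convolution over occupied lattice vertices only."""
    out = {}
    for key in lattice:
        acc = np.zeros_like(lattice[key])
        for offset, w in weights.items():     # e.g. {(0, 0, 0): 0.5, ...}
            nbr = tuple(np.add(key, offset))
            if nbr in lattice:                # sparse: skip empty vertices
                acc += w * lattice[nbr]
        out[key] = acc
    return out

def slice_back(lattice, points, scale=1.0):
    """Read the filtered signal back at the original point locations."""
    dim = len(next(iter(lattice.values())))
    return np.stack([lattice.get(tuple(np.round(p * scale).astype(int)),
                                 np.zeros(dim)) for p in points])
```

Because the hash map stores only occupied vertices, the cost of filtering scales with the number of points rather than with the volume of the high-dimensional space.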

BCL also gives direct control over filter receptive fields. This is easily achieved by simply scaling the lattice features before splatting, e.g., (x, y, z) → (8x, 8y, 8z) or, in general, (nx·x, ny·y, nz·z) in 3D, and (x, y) → 2·(x, y) or 3·(x, y) in 2D. A smaller scaling produces a coarser lattice, so each vertex pools over more points and the receptive field grows; varying receptive field sizes helps capture information from multiple scales. The sketch below shows the effect.
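A small demonstration of this scaling behavior, reusing the hypothetical splat/convolve/slice_back helpers from the sketch above (and sharing their simplifying assumptions):

```python
# Sketch: controlling the receptive field by scaling lattice features.
import numpy as np

points = np.random.rand(1024, 3)           # toy point cloud in [0, 1]^3
feats = np.random.rand(1024, 8)            # toy per-point features
box = {(dx, dy, dz): 1.0 / 27              # fixed 3x3x3 averaging filter
       for dx in (-1, 0, 1) for dy in (-1, 0, 1) for dz in (-1, 0, 1)}

for scale in (16.0, 8.0, 2.0):             # finer -> coarser lattice
    lat = splat(points, feats, scale=scale)
    out = slice_back(convolve(lat, box), points, scale=scale)
    # Smaller scale => fewer occupied vertices => larger receptive field.
    print(f"scale={scale:5.1f}  occupied vertices={len(lat)}")
```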

3. Architectures: SPLATNet3D and SPLATNet2D-3D

SPLATNet3D takes a 3D point cloud as input and applies a 1×1 Conv followed by a cascade of BCLs with progressively coarser lattice scales (BCL λ0, BCL λ0/2, …, BCL λ0/16), so deeper layers see larger receptive fields. The outputs of all BCLs are concatenated (the "+" in the poster diagram denotes concatenation) and passed through further 1×1 Conv layers to produce the per-point 3D prediction. A sketch of this data flow follows.
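Below is a minimal, non-learned sketch of the SPLATNet3D forward pass, built on the hypothetical helpers and the `box` filter above. Real BCLs use learned filter weights, and the network ends with a final 1×1 Conv and softmax, which we omit.

```python
# Non-learned sketch of the SPLATNet_3D data flow (names are ours).
import numpy as np

def bcl(points, feats, scale, weights):
    """One bilateral convolution: splat -> convolve -> slice at a scale."""
    return slice_back(convolve(splat(points, feats, scale), weights),
                      points, scale)

def splatnet3d_forward(points, feats, scales=(16.0, 8.0, 4.0, 2.0, 1.0)):
    x = feats                               # stands in for the first 1x1 Conv
    taps = []
    for s in scales:                        # lambda0, lambda0/2, ..., lambda0/16
        x = bcl(points, x, s, box)          # progressively coarser lattices
        taps.append(x)
    fused = np.concatenate(taps, axis=1)    # "+" in the diagram: concatenation
    return fused                            # final 1x1 Conv + softmax omitted
```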

SPLATNet2D-3D extends this to joint 2D-3D processing. Many sensors output 3D points and 2D images simultaneously, and images can provide complementary, often less noisy, information. A 2D CNN (CNN1) first processes the input images; a BCL 2D→3D then splats the resulting pixel features into 3D, given (l, x, y, z), i.e., each pixel feature together with its 3D location. These splatted features are concatenated with intermediate SPLATNet3D features and fused by 1×1 Conv layers into the 3D prediction (SPLATNet2D-3D). A BCL 3D→2D slices the fused 3D features back onto the pixels, and a second CNN (CNN2) produces the 2D predictions. The design maintains two separate sets of features while providing a smooth projection between 2D and 3D. Both projections are sketched below.
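A sketch of the two projection layers under the same simplifying assumptions as before (names are ours; the real layers interpolate barycentrically and include a learned convolution between splatting and slicing):

```python
# Hypothetical sketches of the 2D<->3D projection BCLs, reusing the
# splat/slice_back helpers above. We show the pure projection; the real
# layers also convolve on the lattice in between.
import numpy as np

def bcl_2d_to_3d(pixel_feats, pixel_xyz, cloud_points, scale=4.0):
    """Splat per-pixel features at their known 3D locations, then read
    the lattice signal off at the point-cloud locations."""
    lat = splat(pixel_xyz, pixel_feats, scale=scale)
    return slice_back(lat, cloud_points, scale=scale)

def bcl_3d_to_2d(point_feats, cloud_points, pixel_xyz, scale=4.0):
    """The reverse: splat point features, slice at the pixels' 3D locations."""
    lat = splat(cloud_points, point_feats, scale=scale)
    return slice_back(lat, pixel_xyz, scale=scale)
```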

4. Facade Semantic Segmentation (Ruemonge2014 [4])

With only 3D data:

Method                 IoU    runtime (min)
OctNet [1]             59.2   -
Autocontext3D [5]      54.4   16
SPLATNet3D             65.4   0.06

With both 2D & 3D data:

Method                 IoU    runtime (min)
Autocontext2D-3D [5]   62.9   87
SPLATNet2D-3D          69.8   1.2

2D image labeling:

Method                 IoU    runtime (min)
Autocontext2D [5]      60.5   117
Autocontext2D-3D [5]   62.7   146
2D CNN [6] only        69.3   0.84
SPLATNet2D-3D          70.6   4.34

[Qualitative results in the poster compare the input point cloud and input image against ground truth and our SPLATNet2D-3D predictions.]

5. ShapeNet Part Segmentation

Method             class avg. IoU   instance avg. IoU
Yi et al. [7]      79.0             81.4
3DCNN [8]          74.9             79.4
Kd-network [9]     77.4             82.3
PointNet [8]       80.4             83.7
PointNet++ [10]    81.9             85.1
SyncSpecCNN [11]   82.0             84.7
SPLATNet3D         82.0             84.6
SPLATNet2D-3D      83.7             85.4

[Qualitative results in the poster compare our predictions ("Ours") against ground truth ("GT").]

The poster also shows examples of ground-truth issues: incorrect labels, incomplete labels, inconsistent labels, and confusing labels. These are signs of performance saturation on the benchmark.

6. Conclusion

We propose SPLATNet for efficient and flexible processing directly on point clouds. It can also incorporate 2D images for seamless joint 2D-3D learning.


References:
[1] G. Riegler et al. OctNet: Learning deep 3D representations at high resolutions. CVPR'17.
[2] H. Su et al. Multi-view convolutional neural networks for 3D shape recognition. ICCV'15.
[3] V. Jampani et al. Learning sparse high dimensional filters: Image filtering, dense CRFs and bilateral neural networks. CVPR'16.
[4] H. Riemenschneider et al. Learning where to classify in multi-view semantic segmentation. ECCV'14.
[5] R. Gadde et al. Efficient 2D and 3D facade segmentation using auto-context. PAMI'17.
[6] L.-C. Chen et al. Semantic image segmentation with deep convolutional nets and fully connected CRFs. ICLR'15.
[7] L. Yi et al. A scalable active framework for region annotation in 3D shape collections. SIGGRAPH Asia'16.
[8] C. R. Qi et al. PointNet: Deep learning on point sets for 3D classification and segmentation. CVPR'17.
[9] R. Klokov and V. Lempitsky. Escape from cells: Deep Kd-Networks for the recognition of 3D point cloud models. ICCV'17.
[10] C. R. Qi et al. PointNet++: Deep hierarchical feature learning on point sets in a metric space. NIPS'17.
[11] L. Yi et al. SyncSpecCNN: Synchronized spectral CNN for 3D shape segmentation. CVPR'17.
