Download - SGM-Nets: Semi-global matching with neural networksopenaccess.thecvf.com/content_cvpr_2017/poster/0087_POSTER.pdfSGM-Nets: Semi-global matching with neural networks Akihito Seki1 and

Transcript
Page 1: SGM-Nets: Semi-global matching with neural networksopenaccess.thecvf.com/content_cvpr_2017/poster/0087_POSTER.pdfSGM-Nets: Semi-global matching with neural networks Akihito Seki1 and

SGM-Nets: Semi-global matching with neural networksAkihito Seki1 and Marc Pollefeys2,3 1Toshiba Corporation 2ETH Zurich 3Microsoft

1. Introduction

Our contributions:(1) A first learning based penalties estimation for SGM (Sec.3)

(2) New SGM parameterization that separates Pos/Neg disparity changes (Sec.4)

(3) State-of-the-art accuracy (Sec.6)

2. Related workI. Hand tuned penalties. Ex. Smaller penalties on edges = obj. boundaryII. Learning based penalties for MRF.

How to minimize Eq.(1) by SGM ?

C. All costs

(a) Input image

[1] H.Hirschmuller, “Stereo Processing by Semiglobal Matching and Mutual Information”, PAMI 2008.[2] J.Zbontar et al., “Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches”, JMLR2016.

Stereo correspondence (ex. ZNCC, MC-CNN[2])

Stereo images

Matching cost volume

width

heig

ht

SGM(Penalties P1/P2 are needed)

Disparity map

Disparity estimation pipeline

Proposed method

( ) ( ) [ ] [ ]∑ ∑∑

>−+=−+=

∈∈x y

yx

y

yxx

xx

xNND

ddTPddTPdcDE 11,minˆ21

Disparity map Matching cost Penalty for slant surface Penalty for discontinuity

T [] : Kronecker deltaUnary Pairwise

Semi-global matching (SGM)[1] is the most popular regularization method for stereo because of its high accuracy in real-time. However SGM penalties are difficult to be tuned. This work deals with deep neural networks for predicting the penalties.

SGM: P1/P2 control a smoothness / discontinuity of a disparity map.Energy function E for SGM:

(1)

Position x = (u, v)

Dis

parit

y d

(b) Minimum cost path ( )dL ,xr

Accumulated costs(a) 4 paths from all directions r

Position u

Pos

ition

v

0x

Image( ) ( )∑=r

xx dLD rd,minarg 00

Winner-takes-all (WTA) over accumulated costs Lr

0x1x2x3x 1−x 2−x

( ) ( ) ( ){ ,, min, , 100 dLdcdL rr xxx ′+=′ ( ) ,1, 11 PdLr +±′ x ( ) } , min 211PiLrdi

+′±≠

x

Matching cost

(2)

( ) ( )( )∑≠

+−=00

00 ,,,0max 00xx

xx xxgti dd

igtg mdLdLEHinge loss

Flat Slant Border

(3).

0x1x

Position

Dis

parit

y

Accumulation direction

2x3x( )11 xP

( )21 xP

( )22 xP

0

0

0xgtd

14xd

11xd

23xd3

3xd

( )32 xP0

Backward direction

1d

2d

3d

4d

5d 05xd

( ) ( ) ( ) ( ) ( ) ( )223332110032100 ,,,,, xxxxxx xxxxx PdcdcdcdcdL gtgt ++++=

( ) ( ) ( ) ( ) ( ) ( ) ( )2111333241505032100 ,,,,, xxxxxxx xxxxx PPdcdcdcdcdL +++++=

Accumulated cost at pixel of disparity 0xgtd0x

Accumulated cost at pixel of disparity 05xd0x

( ) ( ) ( ) 1,0,1221211

=∂

∂=

∂−=

xxx PE

PE

PE ggg ( ) ( ) 0,, if 00

00 >+− mdLdL igtxx xx

Derivatives of Eq.(3) wrt. Pare needed to train the networks

Derivatives wrt. P

Correctly estimated pixel

<

( )111,xx dL

( ) ( )11211, xx x PdL +

( ) ( )12311, xx x PdL +

)(⋅NBorder case

( ) ( )1211, xx x PdL gt +

0x1x

2d

3d

4d

=1d

1xgtd

0xgtd

Minimum

)(⋅N

4. Signed parameterization

5. SGM-Net architecture

6. Results

Ex.)

( ) ( ) ( )( )∑≠

+⋅−+=00

1121, ,0max

xx

xx x

gti ddgtnb mNPdLE

Margin

A. Path cost B. Neighbor costSparse GT Acceptable UnacceptableInfluenced pixels Beyond neighbor pixels Only neighborpixelsDisparity ambiguity Yes NoInitial parameters Influenced Uninfluenced

∑ ∑∑∑∑

+++=

r flatnf

slantns

bordernbg EEEEE α

( ) ,0or 112

=∂∂

xPEb

( ) 0or 111

−=∂∂

xPEb 0 if >nbEDerivatives wrt. P

nxEB. Neighbor cost

A. Path cost gE

A. Path B. Neighbor

E

+1P

+2P

−1P

−2P

Position

Dis

parit

y

positive

negative

+P

−P

Posit

ion

Disp

arity

Position

−+ > 11 PP

VerticalHorizontal

−+ < 11 PP

Disparity

( ) ( ) [ ] [ ]∑ ∑∑

−=+=+=

+

x yy

x

xx

xNND

dTPdTPdcDE 11,minˆ11 δδ

P1/P2 have different penaltiesdepending on either positive ornegative disparity change.Eq.(1) becomes to

[ ] [ ]

−<+>+ ∑∑

+

xx yy NNdTPdTP 11 22 δδ

yx ddd −=δSGM-Net is also applicable to the parameterization

Synthetic dataset

Real dataset

(a) GT disparity map

(b) Initial state of SGM-Net (c) SGM-Net (Path cost)

(d) SGM-Net(Neighbor cost)

AB

(e) SGM-Net with all costs

Original image

BA

+1P −

1P

Original imageDisparity map

LargeSmall

• ZNCC / MC-CNN as a matcher• Various settings of SGM.

• MC-CNN as a matcher (67 [sec.] for the matching)• Got 1st rank at the CVPR deadline#

• Standard and Signed SGM-Nets output 8 and 16 values, respectively.

• 3x3 16 filters and 128 dim FCs• Computation takes 0.02 seconds on GPU

#18/10/2016 except anonymous submission

(b) GT disparity map

(c) Winner-tales-all (d) SGM with hand tuned penalties (e) SGM with our SGM-Net

Appropriate SGM penalties are important for accurate disparity maps

Original image Hand tuned SGM

Standard SGM-Net Signed SGM-Net

Percentage of erroneous pixels on non-occluded areas with an error threshold of 3 pixels

KITTI: http://www.cvlibs.net/datasets/kitti/

SceneFlow: https://lmb.informatik.uni-freiburg.de/

KITTI 2012

For semantic segmentation and difficult to apply to SGM.

positive positivenegative negative

(c) Flat/Slant/Border

Penalty map

8.29

3.95 3.60

Set correct path by checking adjacent pixels

Minimize loss over the accumulated path.

From Eq.(2), we formulate

3. SGM-Nets

FC +

ReL

U

Conv

olut

ion

+ Re

LU

Cons

tant

Patch(5x5)

Direction1

Direction2

Direction3

Direction4Normalized patch position

Conc

aten

ate

Conv

olut

ion

+ Re

LU

FC +

ELU

P1+, P1-P2+, P2-

P1+, P1-P2+, P2-

P1+, P1-P2+, P2-

P1+, P1-P2+, P2-

Predict 16 values for each pixel.

SGM-Net : Predict P1/P2 at each pixel with a image

Signed SGM-Net

Path-ambiguity