Intrinsic3D: High-Quality 3D Reconstruction by Joint Appearance and Geometry Optimization with Spatially-Varying Lighting
R. Maier1,2, K. Kim1, D. Cremers2, J. Kautz1, M. Nießner2,3
International Conference on Computer Vision 2017
1 NVIDIA   2 TU Munich   3 Stanford University
[Teaser figure: Fusion vs. Ours]
Overview
• Motivation & State-of-the-art
• Approach
• Results
• Conclusion
Motivation
• Recent progress in Augmented Reality (AR) / Virtual Reality (VR)
• Requirement of high-quality 3D content for AR, VR, gaming, …
• Usually: manual modelling (e.g. Maya)
• Wide availability of commodity RGB-D sensors: efficient methods for 3D reconstruction
• Challenge: how to reconstruct high-quality 3D models with the best possible geometry and color from low-cost depth sensors?
[Images: HTC Vive, Microsoft HoloLens, NVIDIA VR Funhouse, Asus Xtion]
State-of-the-art: RGB-D based 3D Reconstruction
• Goal: stream of RGB-D frames of a scene → 3D shape that maximizes the geometric consistency
• Real-time, robust, fairly accurate geometric reconstructions
[Images: KinectFusion (2011), DynamicFusion (2015), BundleFusion (2017)]
“KinectFusion: Real-time Dense Surface Mapping and Tracking”, Newcombe et al., ISMAR 2011.
“DynamicFusion: Reconstruction and Tracking of Non-rigid Scenes in Real-time”, Newcombe et al., CVPR 2015.
“BundleFusion: Real-time Globally Consistent 3D Reconstruction using On-the-fly Surface Re-integration”, Dai et al., ToG 2017.
State-of-the-art: Voxel Hashing
• Baseline RGB-D based 3D reconstruction framework
  • initial camera poses
  • sparse SDF reconstruction
• Challenges:
  • (slightly) inaccurate and over-smoothed geometry
  • bad colors
  • inaccurate camera pose estimation
  • input data quality (e.g. motion blur, sensor noise)
• Goal: high-quality reconstruction of geometry and color
“Real-time 3D Reconstruction at Scale using Voxel Hashing”, Nießner et al., ToG 2013.
State-of-the-art: High-Quality Colors [Zhou2014]
“Color Map Optimization for 3D Reconstruction with Consumer Depth Cameras”, Zhou and Koltun, ToG 2014.
Optimize camera poses and image deformations to optimally fit the initial (possibly wrong) reconstruction.
But: HQ images required, no geometry refinement involved.

State-of-the-art: High-Quality Geometry [Zollhöfer2015]
“Shading-based Refinement on Volumetric Signed Distance Functions”, Zollhöfer et al., ToG 2015.
Adjust camera poses in advance (bundle adjustment) to improve color; use shading cues (RGB) to refine geometry (shading-based refinement of surface & albedo).
But: RGB is fixed (no color refinement based on refined geometry).

Idea: jointly optimize for geometry, albedo and image formation model to simultaneously obtain high-quality geometry and appearance!
Our Method
• Temporal view sampling & filtering techniques (input frames)
• Joint optimization of
  • surface & albedo (Signed Distance Field)
  • image formation model
• Lighting estimation using Spatially-Varying Spherical Harmonics (SVSH)
• Optimized colors (by-product)
Approach: Overview
Pipeline: RGB-D → SDF Fusion → Temporal view sampling / filtering → Shading-based Refinement (Shape-from-Shading), consisting of Spatially-Varying Lighting Estimation and Joint Appearance and Geometry Optimization (surface, albedo, image formation model) → High-Quality 3D Reconstruction
RGB-D Data
• Example: Fountain dataset, 1086 RGB-D frames
• Sensor: PrimeSense
  • Depth: 640×480 px
  • Color: 1280×1024 px
  • ~10 Hz
• Poses estimated using Voxel Hashing
Source: http://qianyi.info/scenedata.html
Signed Distance Fields: Volumetric 3D model representation
• Voxel grid: dense (e.g. KinectFusion) or sparse (e.g. Voxel Hashing)
• Each voxel stores:
  • Signed Distance Function (SDF) value: signed distance to the closest surface
  • Color values
  • Weights
“A volumetric method for building complex models from range images”, Curless and Levoy, SIGGRAPH 1996.
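The per-voxel payload described above can be sketched as a small record type; the field names are illustrative, not the Voxel Hashing data layout:

```python
# Illustrative sketch of the data stored per voxel in an SDF volume.
from dataclasses import dataclass

@dataclass
class Voxel:
    sdf: float = 0.0              # signed distance to the closest surface
    color: tuple = (0.0, 0.0, 0.0)  # fused RGB color
    weight: float = 0.0           # integration weight (confidence)
```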
Signed Distance Fields: Fusion of depth maps
• Integrate depth maps into the SDF with their estimated camera poses
• Voxel updates using a weighted average
• Extract iso-surface with Marching Cubes (triangle mesh)
“Marching cubes: A high resolution 3D surface construction algorithm”, Lorensen and Cline, SIGGRAPH 1987.
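The weighted-average voxel update can be sketched as a Curless–Levoy style running average; this is a minimal illustration, not the authors' implementation:

```python
# Running weighted average for fusing one new truncated signed-distance
# observation (and its color) into a voxel.

def update_voxel(d_old, w_old, c_old, d_obs, w_obs, c_obs):
    """d_*: signed distance, w_*: integration weight, c_*: (r, g, b)."""
    w_new = w_old + w_obs
    d_new = (w_old * d_old + w_obs * d_obs) / w_new
    c_new = tuple((w_old * co + w_obs * cn) / w_new
                  for co, cn in zip(c_old, c_obs))
    return d_new, w_new, c_new
```

Each new depth map contributes one observation per voxel it covers, so noisy measurements average out over time.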
Keyframe Selection
• Compute per-frame blur score (for the color image)
• Select the frame with the best score within a fixed-size window as keyframe
[Images: Frame 81 vs. Frame 84]
“The blur effect: perception and estimation with a new no-reference perceptual blur metric”, Crete et al., SPIE 2007.
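The windowed selection can be sketched as follows, assuming the blur metric (Crete et al. 2007) is given and that lower scores mean sharper images:

```python
# Pick one keyframe per fixed-size window: the frame with the best
# (here: lowest) blur score.

def select_keyframes(blur_scores, window_size=10):
    """Return the index of the sharpest frame in each window."""
    keyframes = []
    for start in range(0, len(blur_scores), window_size):
        window = blur_scores[start:start + window_size]
        best = min(range(len(window)), key=lambda i: window[i])
        keyframes.append(start + best)
    return keyframes
```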
Sampling / Filtering
• Sample from selected keyframes only
• Collect observations for each voxel in the input views: voxel center transformed and projected into each input view
• Observation weights: view-dependent on normal and depth
• Filter observations: keep only the best 5 observations by weight
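The weighting-and-filtering step can be sketched as below; the concrete weighting formula is an illustrative assumption (frontal views and close-range observations score higher), not the paper's exact one:

```python
# Keep only the best few observations per voxel, ranked by a
# view-dependent weight on normal orientation and depth.

def filter_observations(observations, keep=5):
    """observations: dicts with 'cos_angle' (normal · view direction)
    and 'depth' (distance to the camera, in meters)."""
    def weight(obs):
        return max(obs["cos_angle"], 0.0) / (1.0 + obs["depth"] ** 2)
    return sorted(observations, key=weight, reverse=True)[:keep]
```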
Approach: Shading-based Refinement (Shape-from-Shading)
• Spatially-Varying Lighting Estimation + Joint Appearance and Geometry Optimization (surface, albedo, image formation model)
• Double-hierarchical, coarse-to-fine optimization (SDF volume / RGB-D)
Shape-from-Shading
• Shading equation: shading = albedo × lighting, evaluated at the surface normal
• Shading-based refinement:
  • Intuition: high-frequency changes in surface geometry → shading cues in input images
  • Estimate lighting given surface and albedo (intrinsic material properties)
  • Estimate surface and albedo given the lighting: minimize the difference between estimated shading and input luminance
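The rendered formula on this slide did not survive extraction; from its labels (shading, albedo, lighting, surface normal) and the 2nd-order SH lighting model used later, it can be reconstructed as the common Lambertian form below (the notation is an assumption, chosen to match the slide's labels):

```latex
% B: shading, a: albedo, l_m: lighting coefficients,
% H_m: SH basis functions, n: surface normal at voxel v
B(v) = a(v) \cdot \sum_{m=1}^{9} l_m \, H_m\bigl(n(v)\bigr)
```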
Lighting Estimation: Spherical Harmonics (SH)
• Encode incident lighting for a given surface point
• Smooth for Lambertian surfaces
• SH basis functions Hm parametrized by the unit normal n
• Good approximation using only 9 SH basis functions (2nd order)
• Estimate SH coefficients
• Shortcoming: purely directional → cannot represent scene lighting for all surface points simultaneously
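Evaluating the 9 basis functions and the resulting shading can be sketched as below; for brevity the standard SH normalization constants are folded into the lighting coefficients (an assumption; real implementations keep them explicit):

```python
# 2nd-order SH shading: 9 basis functions evaluated at a unit normal,
# dotted with 9 lighting coefficients and scaled by albedo.

def sh_basis(n):
    """Unnormalized real SH basis up to order 2 for unit normal n."""
    x, y, z = n
    return [
        1.0,                 # order 0
        y, z, x,             # order 1
        x * y, y * z,        # order 2
        3.0 * z * z - 1.0,
        x * z,
        x * x - y * y,
    ]

def shade(albedo, light_coeffs, normal):
    """B = a * sum_m l_m H_m(n): shading value at one surface point."""
    return albedo * sum(l * h for l, h in zip(light_coeffs, sh_basis(normal)))
```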
Spatially-Varying Lighting: Subvolume Partitioning
• Partition the SDF volume into subvolumes
• Estimate independent SH coefficients for each subvolume
• Obtain per-voxel SH coefficients through tri-linear interpolation
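The tri-linear interpolation of subvolume coefficients can be sketched as follows; the cell layout and names are illustrative assumptions:

```python
# Per-voxel SH coefficients as the tri-linear blend of the 8
# surrounding subvolumes' coefficient vectors.

def trilerp_sh(corner_coeffs, fx, fy, fz):
    """corner_coeffs: dict mapping (i, j, k) in {0,1}^3 to a list of
    9 SH coefficients; (fx, fy, fz): fractional position in the cell."""
    out = [0.0] * 9
    for (i, j, k), coeffs in corner_coeffs.items():
        w = ((fx if i else 1.0 - fx) *
             (fy if j else 1.0 - fy) *
             (fz if k else 1.0 - fz))
        for m in range(9):
            out[m] += w * coeffs[m]
    return out
```

Because the blend weights vary smoothly with the voxel position, the interpolated lighting varies smoothly across subvolume boundaries.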
Spatially-Varying Lighting: Joint Optimization
• Estimate SVSH coefficients for all subvolumes jointly:
  • Data term: similarity between estimated shading and input luminance
  • Laplacian regularizer: smooth illumination changes between neighboring subvolumes
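The two terms above can be written as one energy over the per-subvolume coefficient vectors; the symbols and the weight λ are illustrative, not the paper's exact formulation:

```latex
% ell_s: SH coefficients of subvolume s, B: estimated shading,
% I: input luminance, N(s): neighboring subvolumes of s
E(\ell_1, \dots, \ell_K) =
  \underbrace{\sum_{v} \bigl( B(v) - I(v) \bigr)^2}_{\text{data term}}
  + \lambda \underbrace{\sum_{s} \sum_{r \in \mathcal{N}(s)}
      \lVert \ell_s - \ell_r \rVert_2^2}_{\text{Laplacian regularizer}}
```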
Joint Optimization: Shading-based SDF optimization
• Joint optimization of geometry, albedo and image formation model (camera poses and camera intrinsics), with:
  • Gradient-based shading constraint (data term)
  • Volumetric regularizer: smoothness in distance values (Laplacian)
  • Surface stabilization constraint: stay close to the initial distance values
  • Albedo regularizer: constrain albedo changes based on chromaticity (Laplacian)
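The four terms can be summarized as a weighted sum over the signed distances D, albedo a, camera poses and intrinsics; the λ weights and term names below are illustrative:

```latex
E_{\text{scene}}(D, a, \mathcal{T}, \pi) =
    \lambda_g E_{\text{shading}}        % gradient-based data term
  + \lambda_v E_{\text{volumetric}}     % Laplacian on distance values
  + \lambda_s E_{\text{stabilization}}  % stay near initial distances
  + \lambda_a E_{\text{albedo}}         % chromaticity-based Laplacian
```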
Joint Optimization: Shading Constraint (data term)
• Idea: maximize consistency between estimated voxel shading and sampled intensities in the input luminance images (gradients for robustness)
• Uses the best views per voxel and their view-dependent weights
• Shading term: allows for optimization of surface (through the normal) and albedo
• Sampling term (voxel center transformed and projected into the input view): allows for optimization of camera poses and camera intrinsics
Recolorization
• Recompute voxel colors after optimization at each level to obtain optimal colors
• Sampling:
  • sample from keyframes only
  • collect, weight and filter observations
• Weighted average of the observations
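The final color computation can be sketched as a plain weighted average of the filtered keyframe observations; a minimal illustration, not the authors' implementation:

```python
# Final voxel color: weighted average of the kept observations.

def recolorize(observations):
    """observations: list of (weight, (r, g, b)) pairs for one voxel."""
    total_w = sum(w for w, _ in observations)
    if total_w == 0.0:
        return (0.0, 0.0, 0.0)
    return tuple(
        sum(w * c[i] for w, c in observations) / total_w
        for i in range(3)
    )
```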
Ground Truth: Geometry — Frog (synthetic)
[Figures: Fusion, Zollhöfer et al. 15, Ours, Ground truth]
Ground Truth: Quantitative Results — Frog (synthetic)
• Generated synthetic RGB-D dataset (noise on depth and camera poses)
• Quantitative surface accuracy evaluation; color coding: absolute distances to ground truth
[Figures: Zollhöfer et al. 15 vs. Ours]
Mean absolute deviation:
• Ours: 0.222 mm (std. dev. 0.269 mm)
• Zollhöfer et al.: 0.278 mm (std. dev. 0.299 mm)
→ 20.14% more accurate
Qualitative Results: Relief (geometry)
[Figures: Input Color; Fusion, Zollhöfer et al. 15, Ours]
Qualitative Results: Fountain (appearance)
[Figures: Input Color; Fusion, Zollhöfer et al. 15, Ours]
Qualitative Results: Lion
[Figures: Input Color; Geometry: Fusion vs. Ours; Appearance: Fusion vs. Ours]
Qualitative Results: Tomb Statuary
[Figures: Input Color; Geometry: Fusion vs. Ours; Appearance: Fusion vs. Ours]
Qualitative Results: Gate
[Figures: Input Color; Geometry: Fusion vs. Ours; Appearance: Fusion vs. Ours]
Qualitative Results: Hieroglyphics
[Figures: Input Color; Geometry: Fusion vs. Ours; Appearance: Fusion vs. Ours]
Qualitative Results: Bricks
[Figures: Input Color; Geometry: Fusion vs. Ours; Appearance: Fusion vs. Ours]
Shading: Global SH vs. SVSH — Fountain
[Figures: Luminance and Albedo inputs; Global SH shading with difference to luminance; SVSH shading with difference to luminance]
Conclusion
• High-quality 3D reconstruction of geometry and appearance
  • Temporal view sampling & filtering techniques
  • Spatially-varying lighting estimation
  • Joint optimization of surface & albedo (SDF) and image formation model
  • Optimized texture as by-product

Thank you! Questions?

Robert Maier
Technical University of Munich, Computer Vision Group
[email protected]
http://vision.in.tum.de/members/maierr