Intrinsic3D: High-Quality 3D Reconstruction by Joint Appearance and Geometry Optimization with Spatially-Varying Lighting
R. Maier1,2, K. Kim1, D. Cremers2, J. Kautz1, M. Nießner2,3
International Conference on Computer Vision 2017
1 NVIDIA   2 TU Munich   3 Stanford University
[Teaser figure: Fusion vs. Ours]
Overview
• Motivation & State-of-the-art
• Approach
• Results
• Conclusion
Motivation
• Recent progress in Augmented Reality (AR) / Virtual Reality (VR)
• Requirement of high-quality 3D content for AR, VR, gaming, …
• Usually: manual modelling (e.g. Maya)
• Wide availability of commodity RGB-D sensors: efficient methods for 3D reconstruction
• Challenge: how to reconstruct high-quality 3D models with the best possible geometry and color from low-cost depth sensors?
[Images: HTC Vive, Microsoft HoloLens, NVIDIA VR Funhouse, Asus Xtion]
State-of-the-art: RGB-D based 3D Reconstruction
• Goal: stream of RGB-D frames of a scene → 3D shape that maximizes the geometric consistency
• Real-time, robust, fairly accurate geometric reconstructions
[Images: KinectFusion (2011), DynamicFusion (2015), BundleFusion (2017)]
“KinectFusion: Real-time Dense Surface Mapping and Tracking”, Newcombe et al., ISMAR 2011.
“DynamicFusion: Reconstruction and Tracking of Non-rigid Scenes in Real-time”, Newcombe et al., CVPR 2015.
“BundleFusion: Real-time Globally Consistent 3D Reconstruction using On-the-fly Surface Re-integration”, Dai et al., ToG 2017.
State-of-the-art: Voxel Hashing
• Baseline RGB-D based 3D reconstruction framework
  • initial camera poses
  • sparse SDF reconstruction
• Challenges:
  • (slightly) inaccurate and over-smoothed geometry
  • bad colors
  • inaccurate camera pose estimation
  • input data quality (e.g. motion blur, sensor noise)
• Goal: high-quality reconstruction of geometry and color
“Real-time 3D Reconstruction at Scale using Voxel Hashing”, Nießner et al., ToG 2013.
State-of-the-art: High-Quality Colors [Zhou2014]
“Color Map Optimization for 3D Reconstruction with Consumer Depth Cameras”, Zhou and Koltun, ToG 2014.
Optimize camera poses and image deformations to optimally fit the initial (possibly wrong) reconstruction.
But: HQ images required, no geometry refinement involved.

State-of-the-art: High-Quality Geometry [Zollhöfer2015]
“Shading-based Refinement on Volumetric Signed Distance Functions”, Zollhöfer et al., ToG 2015.
Adjust camera poses in advance (bundle adjustment) to improve color; use shading cues (RGB) to refine geometry (shading-based refinement of surface & albedo).
But: RGB is fixed (no color refinement based on refined geometry).

Idea: jointly optimize for geometry, albedo and image formation model to simultaneously obtain high-quality geometry and appearance!
Our Method
• Temporal view sampling & filtering techniques (input frames)
• Joint optimization of
  • surface & albedo (Signed Distance Field)
  • image formation model
• Lighting estimation using Spatially-Varying Spherical Harmonics (SVSH)
• Optimized colors (by-product)
Approach: Overview
Pipeline: RGB-D → SDF Fusion → Temporal view sampling / filtering → Shading-based Refinement (Shape-from-Shading), consisting of Spatially-Varying Lighting Estimation and Joint Appearance and Geometry Optimization (surface, albedo, image formation model) → High-Quality 3D Reconstruction
RGB-D Data
• Example: Fountain dataset, 1086 RGB-D frames
• Sensor: PrimeSense
  • Depth: 640×480 px
  • Color: 1280×1024 px
  • ~10 Hz
• Poses estimated using Voxel Hashing
Source: http://qianyi.info/scenedata.html
Signed Distance Fields: Volumetric 3D model representation
• Voxel grid: dense (e.g. KinectFusion) or sparse (e.g. Voxel Hashing)
• Each voxel stores:
  • Signed Distance Function (SDF) value: signed distance to the closest surface
  • Color values
  • Weights
“A volumetric method for building complex models from range images”, Curless and Levoy, SIGGRAPH 1996.
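The per-voxel payload described above can be sketched as a small record type; the field names are illustrative, not the Voxel Hashing data layout:

```python
# Illustrative sketch of the data stored per voxel in an SDF volume.
from dataclasses import dataclass

@dataclass
class Voxel:
    sdf: float = 0.0              # signed distance to the closest surface
    color: tuple = (0.0, 0.0, 0.0)  # fused RGB color
    weight: float = 0.0           # integration weight (confidence)
```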
Signed Distance Fields: Fusion of depth maps
• Integrate depth maps into the SDF with their estimated camera poses
• Voxel updates using a weighted average
• Extract iso-surface with Marching Cubes (triangle mesh)
“Marching cubes: A high resolution 3D surface construction algorithm”, Lorensen and Cline, SIGGRAPH 1987.
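The weighted-average voxel update can be sketched as a Curless–Levoy style running average; this is a minimal illustration, not the authors' implementation:

```python
# Running weighted average for fusing one new truncated signed-distance
# observation (and its color) into a voxel.

def update_voxel(d_old, w_old, c_old, d_obs, w_obs, c_obs):
    """d_*: signed distance, w_*: integration weight, c_*: (r, g, b)."""
    w_new = w_old + w_obs
    d_new = (w_old * d_old + w_obs * d_obs) / w_new
    c_new = tuple((w_old * co + w_obs * cn) / w_new
                  for co, cn in zip(c_old, c_obs))
    return d_new, w_new, c_new
```

Each new depth map contributes one observation per voxel it covers, so noisy measurements average out over time.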
Keyframe Selection
• Compute per-frame blur score (for the color image)
• Select the frame with the best score within a fixed-size window as keyframe
[Images: Frame 81 vs. Frame 84]
“The blur effect: perception and estimation with a new no-reference perceptual blur metric”, Crete et al., SPIE 2007.
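The windowed selection can be sketched as follows, assuming the blur metric (Crete et al. 2007) is given and that lower scores mean sharper images:

```python
# Pick one keyframe per fixed-size window: the frame with the best
# (here: lowest) blur score.

def select_keyframes(blur_scores, window_size=10):
    """Return the index of the sharpest frame in each window."""
    keyframes = []
    for start in range(0, len(blur_scores), window_size):
        window = blur_scores[start:start + window_size]
        best = min(range(len(window)), key=lambda i: window[i])
        keyframes.append(start + best)
    return keyframes
```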
Sampling / Filtering
• Sample from selected keyframes only
• Collect observations for each voxel in the input views: voxel center transformed and projected into each input view
• Observation weights: view-dependent on normal and depth
• Filter observations: keep only the best 5 observations by weight
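The weighting-and-filtering step can be sketched as below; the concrete weighting formula is an illustrative assumption (frontal views and close-range observations score higher), not the paper's exact one:

```python
# Keep only the best few observations per voxel, ranked by a
# view-dependent weight on normal orientation and depth.

def filter_observations(observations, keep=5):
    """observations: dicts with 'cos_angle' (normal · view direction)
    and 'depth' (distance to the camera, in meters)."""
    def weight(obs):
        return max(obs["cos_angle"], 0.0) / (1.0 + obs["depth"] ** 2)
    return sorted(observations, key=weight, reverse=True)[:keep]
```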
Approach: Shading-based Refinement (Shape-from-Shading)
• Spatially-Varying Lighting Estimation + Joint Appearance and Geometry Optimization (surface, albedo, image formation model)
• Double-hierarchical, coarse-to-fine optimization (SDF volume / RGB-D)
Shape-from-Shading
• Shading equation: shading = albedo × lighting, evaluated at the surface normal
• Shading-based refinement:
  • Intuition: high-frequency changes in surface geometry → shading cues in input images
  • Estimate lighting given surface and albedo (intrinsic material properties)
  • Estimate surface and albedo given the lighting: minimize the difference between estimated shading and input luminance
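The rendered formula on this slide did not survive extraction; from its labels (shading, albedo, lighting, surface normal) and the 2nd-order SH lighting model used later, it can be reconstructed as the common Lambertian form below (the notation is an assumption, chosen to match the slide's labels):

```latex
% B: shading, a: albedo, l_m: lighting coefficients,
% H_m: SH basis functions, n: surface normal at voxel v
B(v) = a(v) \cdot \sum_{m=1}^{9} l_m \, H_m\bigl(n(v)\bigr)
```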
Lighting Estimation: Spherical Harmonics (SH)
• Encode incident lighting for a given surface point
• Smooth for Lambertian surfaces
• SH basis functions Hm parametrized by the unit normal n
• Good approximation using only 9 SH basis functions (2nd order)
• Estimate SH coefficients
• Shortcoming: purely directional → cannot represent scene lighting for all surface points simultaneously
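Evaluating the 9 basis functions and the resulting shading can be sketched as below; for brevity the standard SH normalization constants are folded into the lighting coefficients (an assumption; real implementations keep them explicit):

```python
# 2nd-order SH shading: 9 basis functions evaluated at a unit normal,
# dotted with 9 lighting coefficients and scaled by albedo.

def sh_basis(n):
    """Unnormalized real SH basis up to order 2 for unit normal n."""
    x, y, z = n
    return [
        1.0,                 # order 0
        y, z, x,             # order 1
        x * y, y * z,        # order 2
        3.0 * z * z - 1.0,
        x * z,
        x * x - y * y,
    ]

def shade(albedo, light_coeffs, normal):
    """B = a * sum_m l_m H_m(n): shading value at one surface point."""
    return albedo * sum(l * h for l, h in zip(light_coeffs, sh_basis(normal)))
```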
Spatially-Varying Lighting: Subvolume Partitioning
• Partition the SDF volume into subvolumes
• Estimate independent SH coefficients for each subvolume
• Obtain per-voxel SH coefficients through tri-linear interpolation
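The tri-linear interpolation of subvolume coefficients can be sketched as follows; the cell layout and names are illustrative assumptions:

```python
# Per-voxel SH coefficients as the tri-linear blend of the 8
# surrounding subvolumes' coefficient vectors.

def trilerp_sh(corner_coeffs, fx, fy, fz):
    """corner_coeffs: dict mapping (i, j, k) in {0,1}^3 to a list of
    9 SH coefficients; (fx, fy, fz): fractional position in the cell."""
    out = [0.0] * 9
    for (i, j, k), coeffs in corner_coeffs.items():
        w = ((fx if i else 1.0 - fx) *
             (fy if j else 1.0 - fy) *
             (fz if k else 1.0 - fz))
        for m in range(9):
            out[m] += w * coeffs[m]
    return out
```

Because the blend weights vary smoothly with the voxel position, the interpolated lighting varies smoothly across subvolume boundaries.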
Spatially-Varying Lighting: Joint Optimization
• Estimate SVSH coefficients for all subvolumes jointly:
  • Data term: similarity between estimated shading and input luminance
  • Laplacian regularizer: smooth illumination changes between neighboring subvolumes
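The two terms above can be written as one energy over the per-subvolume coefficient vectors; the symbols and the weight λ are illustrative, not the paper's exact formulation:

```latex
% ell_s: SH coefficients of subvolume s, B: estimated shading,
% I: input luminance, N(s): neighboring subvolumes of s
E(\ell_1, \dots, \ell_K) =
  \underbrace{\sum_{v} \bigl( B(v) - I(v) \bigr)^2}_{\text{data term}}
  + \lambda \underbrace{\sum_{s} \sum_{r \in \mathcal{N}(s)}
      \lVert \ell_s - \ell_r \rVert_2^2}_{\text{Laplacian regularizer}}
```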
Joint Optimization: Shading-based SDF optimization
• Joint optimization of geometry, albedo and image formation model (camera poses and camera intrinsics), with:
  • Gradient-based shading constraint (data term)
  • Volumetric regularizer: smoothness in distance values (Laplacian)
  • Surface stabilization constraint: stay close to the initial distance values
  • Albedo regularizer: constrain albedo changes based on chromaticity (Laplacian)
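The four terms can be summarized as a weighted sum over the signed distances D, albedo a, camera poses and intrinsics; the λ weights and term names below are illustrative:

```latex
E_{\text{scene}}(D, a, \mathcal{T}, \pi) =
    \lambda_g E_{\text{shading}}        % gradient-based data term
  + \lambda_v E_{\text{volumetric}}     % Laplacian on distance values
  + \lambda_s E_{\text{stabilization}}  % stay near initial distances
  + \lambda_a E_{\text{albedo}}         % chromaticity-based Laplacian
```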
Joint Optimization: Shading Constraint (data term)
• Idea: maximize consistency between estimated voxel shading and sampled intensities in the input luminance images (gradients for robustness)
• Uses the best views per voxel and their view-dependent weights
• Shading term: allows for optimization of surface (through the normal) and albedo
• Sampling term (voxel center transformed and projected into the input view): allows for optimization of camera poses and camera intrinsics
Recolorization
• Recompute voxel colors after optimization at each level to obtain optimal colors
• Sampling:
  • sample from keyframes only
  • collect, weight and filter observations
• Weighted average of the observations
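The final color computation can be sketched as a plain weighted average of the filtered keyframe observations; a minimal illustration, not the authors' implementation:

```python
# Final voxel color: weighted average of the kept observations.

def recolorize(observations):
    """observations: list of (weight, (r, g, b)) pairs for one voxel."""
    total_w = sum(w for w, _ in observations)
    if total_w == 0.0:
        return (0.0, 0.0, 0.0)
    return tuple(
        sum(w * c[i] for w, c in observations) / total_w
        for i in range(3)
    )
```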
Ground Truth: Geometry — Frog (synthetic)
[Figures: Fusion, Zollhöfer et al. 15, Ours, Ground truth]
Ground Truth: Quantitative Results — Frog (synthetic)
• Generated synthetic RGB-D dataset (noise on depth and camera poses)
• Quantitative surface accuracy evaluation; color coding: absolute distances to ground truth
[Figures: Zollhöfer et al. 15 vs. Ours]
Mean absolute deviation:
• Ours: 0.222 mm (std. dev. 0.269 mm)
• Zollhöfer et al.: 0.278 mm (std. dev. 0.299 mm)
→ 20.14% more accurate
Qualitative Results: Relief (geometry)
[Figures: Input Color; Fusion, Zollhöfer et al. 15, Ours]
Qualitative Results: Fountain (appearance)
[Figures: Input Color; Fusion, Zollhöfer et al. 15, Ours]
Qualitative Results: Lion
[Figures: Input Color; Geometry: Fusion vs. Ours; Appearance: Fusion vs. Ours]
Qualitative Results: Tomb Statuary
[Figures: Input Color; Geometry: Fusion vs. Ours; Appearance: Fusion vs. Ours]
Qualitative Results: Gate
[Figures: Input Color; Geometry: Fusion vs. Ours; Appearance: Fusion vs. Ours]
Qualitative Results: Hieroglyphics
[Figures: Input Color; Geometry: Fusion vs. Ours; Appearance: Fusion vs. Ours]
Qualitative Results: Bricks
[Figures: Input Color; Geometry: Fusion vs. Ours; Appearance: Fusion vs. Ours]
Shading: Global SH vs. SVSH — Fountain
[Figures: Luminance and Albedo inputs; Global SH shading with difference to luminance; SVSH shading with difference to luminance]
Conclusion
• High-quality 3D reconstruction of geometry and appearance
  • Temporal view sampling & filtering techniques
  • Spatially-varying lighting estimation
  • Joint optimization of surface & albedo (SDF) and image formation model
  • Optimized texture as by-product

Thank you! Questions?

Robert Maier
Technical University of Munich, Computer Vision Group
[email protected]
http://vision.in.tum.de/members/maierr