MONOCULAR SLAM SUPPORTED OBJECT RECOGNITIONpeople.csail.mit.edu/spillai/projects/vslam-object... ·...
Transcript of MONOCULAR SLAM SUPPORTED OBJECT RECOGNITIONpeople.csail.mit.edu/spillai/projects/vslam-object... ·...
MONOCULAR SLAM SUPPORTED OBJECT RECOGNITION
John J. [email protected]
Sudeep [email protected]
Marine Robotics GroupCSAIL, MIT
• Object recognition for a robot is still a challenge
Cap
Co eeMug
SodaCan
Bowl
Bowl
Cap
Co eeMug
SodaCan
Bowl
Bowl
Reconstruction with Object Labels
Multi-view Object DetectionObjects easily tease apart to
enable better proposals(Proposals from Semi-Dense Maps)
RobustReduced false positives via
view correspondence from SLAM
(Multi-view prediction)
Scalable ~1.6-1.7 s for 5-50 object categories(FLAIR encoding)
Single RGB CameraMonocular SLAM supports
improved recognition(Semi-Dense Mapping Backend)
UW-RGBD Dataset (v2)Lai et al. (ICRA 2014)
Input RGB Video Scale-ambiguous Reconstruction Filtered ReconstructionInput RGB Video
(a) (b)
ViewView
View
Density-based Spatial Clustering
(c) (d)
Multi-view object proposals
ORB-SLAMMur-Artal et al.
Semi-Dense Depth FilteringSVO: Forster et al. (ICRA 2014)
+
Scale-ambiguous ReconstructionScale-ambiguous Reconstruction Filtered ReconstructionInput RGB Video
(a) (b)
ViewView
View
Density-based Spatial Clustering
(c) (d)
Multi-view object proposalsLow-density region pruning
Filtered Reconstruction
Candidate proposals projection
View-Consistent Proposals
Scale-ambiguous Reconstruction Filtered ReconstructionInput RGB Video
(a) (b)
ViewView
View
Density-based Spatial Clustering
(c) (d)
Multi-view object proposals
Scale-ambiguous Reconstruction Filtered ReconstructionInput RGB Video
(a) (b)
ViewView
View
Density-based Spatial Clustering
(c) (d)
Multi-view object proposals
Fusion of object proposal classification
Unified Object Prediction
Scale-ambiguous Reconstruction Filtered ReconstructionInput RGB Video
(a) (b)
ViewView
View
Density-based Spatial Clustering
(c) (d)
Multi-view object proposals
(#) - Number of categories identifiable
Runtime (s)
>4 s5 s
1.8 s1.7 s1.6 s
Ours Ours DetOnlyDetOnly HMP2D+3D
(5) (51) (5)
(51)(9)
Low
er is
bet
ter
IoU - Intersection Over Union (#) - Number of object proposals
MULTI-VIEW OBJECT PROPOSALS
• Consistent object hypothesis supported by newer semi-dense map representations
•Achieve similar Recall-vs-IoU with fewer object proposals
RUNTIME SCALABILITY
• Category-agnostic VLAD + FLAIR encoding• Run-time nearly independent of object
categories identifiable
SLAM-SUPPORTED VS. FRAME-BASED RECOGNITION
• VLAD + FLAIR encoding: Run-time nearly independent of object categories identifiable
For more details and videos, please visit our websitehttp://people.csail.mit.edu/spillai/projects/vslam-object-recognition
mAP
61.781.5
Ours DetOnly
High
er is
bet
ter
(RGB) (RGB)
• RGB: Superior to frame-based methods
High
er is
bet
ter
SLAM-SUPPORTED
mAP
90.991.089.8
Ours Det3DMRF HMP2D+3D(RGB-D)(RGB-D)(RGB)
• RGB-D: Comparable to state-of-the-art SLAM-based recognition methods
RESULTS
METHODOLOGYMulti-scale Over-segmentationScale-ambiguous Reconstruction Filtered ReconstructionInput RGB Video
(a) (b)
ViewView
View
Density-based Spatial Clustering
(c) (d)
Multi-view object proposals
Density-based segmentation over 4 spatial scales
CONTRIBUTIONS
Objectness
Selective Search
Randomized Prim
BING
Object Proposal Methods
State-of-the-art object detection schemes utilize object proposal
methods to localize objects
SHIFT IN OBJECT DETECTION
Fisher and VLAD with FLAIRvan de Sande et al. (CVPR 2014) Efficient box encoding provide
object localization capabilities to existing BoVW methods
EFFICIENT FEATURE ENCODING
1
LSD-SLAM: Large-Scale Direct Monocular SLAM
Jakob Engel, Thomas Schöps, Daniel CremersTechnical University Munich
Monocular Video Camera Motion and Scene Geometry
Computer Vision GroupTechnical University of Munich
Jakob Engel
LSD-SLAM: Large-Scale Direct Monocular SLAMEngel, Schöps, Cremers
LSD-SLAMEngel et al. (ECCV 2014)
SLAM++Salas-Moreno et al. (CVPR 2013)
Detection-based Object Labeling in 3D scenes Lai et al. (ICRA 2012)
• Semi-dense reconstructions could potentially propose objects• Object Detection / Recognition can be better informed with SLAM
NEWER SLAM CAPABILITIES
PRIOR WORKMOTIVATION
• Provide a fast and scalable solution for training and running object detectors
• Leverage inherent SLAM capabilities to better support the object detection and recognition task
SLAM
ObjectRecognition
(Classification)
ObjectDetection(Proposals)
Robust View Correspondence
Proposals from Semi-dense
maps
Efficient and ScalableClassification