Deformable Convolutional Networks -...
![Page 1: Deformable Convolutional Networks - cocodataset.orgpresentations.cocodataset.org/COCO17-Detect-MSRA.pdfJifeng Dai With Haozhi Qi*, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng*, Yichen](https://reader036.fdocuments.in/reader036/viewer/2022081802/5ae673387f8b9aee078cf646/html5/thumbnails/1.jpg)
Jifeng Dai
With Haozhi Qi*, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng*, Yichen Wei
Visual Computing Group
Microsoft Research Asia
(*interns at MSRA)
Deformable Convolutional Networks: MSRA COCO Detection & Segmentation Challenge 2017 Entry
Outline
• Deformable ConvNets idea
• Deformable ConvNets for COCO challenge
Highlights
• Enabling effective modeling of spatial transformation in ConvNets
• No additional supervision for learning spatial transformation
• Significant accuracy improvements on sophisticated vision tasks
Code is available at https://github.com/msracver/Deformable-ConvNets
Modeling Spatial Transformations
• A long-standing problem in computer vision
(Figure panels: Deformation, Scale, Viewpoint variation, Intra-class variation)
(Some examples are taken from Li Fei-fei’s course CS223B, 2009-2010)
Traditional Approaches
• 1) To build training datasets with sufficient desired variations
• 2) To use transformation-invariant features and algorithms
• Drawbacks: geometric transformations are assumed fixed and known; invariant features and algorithms are hand-crafted
(Figure examples: Scale Invariant Feature Transform (SIFT), Deformable Part-based Model (DPM))
Spatial Transformations in CNNs
• Regular CNNs are inherently limited in modeling large, unknown transformations
• The limitation originates from the fixed geometric structures of CNN modules
(Figure panels: regular convolution, 2 layers of regular convolution, regular RoI Pooling)
Spatial Transformer Networks
• Learning a global, parametric transformation on feature maps
• Transformation family is fixed in advance; infeasible for complex vision tasks
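To make "global, parametric" concrete, here is a minimal sketch, assuming a 2x3 affine matrix as the transformation family and plain pixel coordinates instead of normalized ones. The function name `affine_source_grid` is hypothetical. The point is that six numbers decide the sampling location of every pixel on the feature map.

```python
def affine_source_grid(theta, height, width):
    """Return, for each output pixel (y, x), the source (y, x) it samples.

    theta is one shared 2x3 matrix ((a, b, tx), (c, d, ty)) with
    src_x = a*x + b*y + tx and src_y = c*x + d*y + ty.
    """
    (a, b, tx), (c, d, ty) = theta
    return [
        [(c * x + d * y + ty, a * x + b * y + tx) for x in range(width)]
        for y in range(height)
    ]

identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
zoom_out = ((2.0, 0.0, 0.0), (0.0, 2.0, 0.0))

grid_id = affine_source_grid(identity, 3, 3)
grid_zoom = affine_source_grid(zoom_out, 3, 3)
print(grid_id[1][2], grid_zoom[1][2])
```

Whatever values the six parameters take, the result is always an affine warp of the whole map, which is the restriction the second bullet points at.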
Deformable Convolution
• Local, dense, non-parametric transformation
• Learning to deform the sampling locations in the convolution/RoI Pooling modules
(Figure panels: regular, deformed, scale & aspect ratio, rotation)
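As a hedged illustration of the figure labels above, scaling, aspect-ratio change, and rotation of a 3x3 sampling grid can each be written as per-point offsets added to the regular grid. The helper names are hypothetical and the offsets are set by hand here; in the method they are learned.

```python
import math

# Regular 3x3 sampling grid as (dy, dx) pairs relative to the center.
REGULAR_GRID = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def offsets_for_scale(sy, sx):
    """Offsets that stretch the grid by sy vertically and sx horizontally."""
    return [((sy - 1) * dy, (sx - 1) * dx) for dy, dx in REGULAR_GRID]

def offsets_for_rotation(theta):
    """Offsets that rotate the grid by theta radians about its center."""
    c, s = math.cos(theta), math.sin(theta)
    return [(s * dx + c * dy - dy, c * dx - s * dy - dx) for dy, dx in REGULAR_GRID]

def deform(offsets):
    """Add one offset to each regular grid point."""
    return [(dy + oy, dx + ox) for (dy, dx), (oy, ox) in zip(REGULAR_GRID, offsets)]

scaled = deform(offsets_for_scale(2.0, 2.0))          # same points as a dilation-2 kernel
stretched = deform(offsets_for_scale(1.0, 3.0))       # aspect-ratio change only
rotated = deform(offsets_for_rotation(math.pi / 2))   # quarter turn
```

Because every grid point carries its own offset, arbitrary deformations that fit none of these parametric families are also expressible.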
Deformable Convolution
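Regular convolution
$$y(\mathbf{p}_0) = \sum_{\mathbf{p}_n \in \mathcal{R}} w(\mathbf{p}_n) \cdot x(\mathbf{p}_0 + \mathbf{p}_n)$$
Deformable convolution
$$y(\mathbf{p}_0) = \sum_{\mathbf{p}_n \in \mathcal{R}} w(\mathbf{p}_n) \cdot x(\mathbf{p}_0 + \mathbf{p}_n + \Delta\mathbf{p}_n)$$
where $\Delta\mathbf{p}_n$ is generated by a sibling branch of regular convolution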
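A minimal pure-Python sketch of the deformable convolution sum at one output location: each kernel tap reads the input at its regular grid position plus a per-tap offset, and fractional positions are read with bilinear interpolation. The function names are hypothetical. The offsets are supplied by hand as an assumption of this sketch, whereas the slide has them produced by a sibling convolution branch.

```python
import math

def bilinear(x, py, px):
    """Sample the 2-D list x at fractional (py, px); out-of-range reads count as 0."""
    h, w = len(x), len(x[0])
    y0, x0 = math.floor(py), math.floor(px)
    value = 0.0
    for qy in (y0, y0 + 1):
        for qx in (x0, x0 + 1):
            if 0 <= qy < h and 0 <= qx < w:
                weight = (1 - abs(py - qy)) * (1 - abs(px - qx))
                value += weight * x[qy][qx]
    return value

# Regular 3x3 grid, in the same row-major order as the weights and offsets.
GRID = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def deformable_conv_at(x, w, p0, offsets):
    """Sum over taps n of w[n] * x(p0 + grid[n] + offsets[n])."""
    total = 0.0
    for (dy, dx), wn, (oy, ox) in zip(GRID, w, offsets):
        total += wn * bilinear(x, p0[0] + dy + oy, p0[1] + dx + ox)
    return total

x = [[float(5 * r + c) for c in range(5)] for r in range(5)]  # a linear ramp
w = [1.0 / 9] * 9                                             # averaging kernel

regular = deformable_conv_at(x, w, (2, 2), [(0.0, 0.0)] * 9)
shifted = deformable_conv_at(x, w, (2, 2), [(0.0, 0.5)] * 9)
print(regular, shifted)
```

With all offsets at zero the function reduces to a regular convolution, which is why the module can replace its plain counterpart without changing input or output shapes.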
Deformable RoI Pooling
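Regular RoI pooling
$$y(i,j) = \sum_{\mathbf{p} \in \mathrm{bin}(i,j)} x(\mathbf{p}_0 + \mathbf{p}) / n_{ij}$$
Deformable RoI pooling
$$y(i,j) = \sum_{\mathbf{p} \in \mathrm{bin}(i,j)} x(\mathbf{p}_0 + \mathbf{p} + \Delta\mathbf{p}_{ij}) / n_{ij}$$
where $\Delta\mathbf{p}_{ij}$ is generated by a sibling fc branch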
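A minimal sketch of average RoI pooling in which each bin is shifted by its own offset before sampling. It assumes an RoI whose size is divisible by the number of bins and raw pixel offsets supplied by hand. The function names are hypothetical, and the fc branch that would predict the offsets is not modeled.

```python
import math

def bilinear(x, py, px):
    """Sample the 2-D list x at fractional (py, px); out-of-range reads count as 0."""
    h, w = len(x), len(x[0])
    y0, x0 = math.floor(py), math.floor(px)
    value = 0.0
    for qy in (y0, y0 + 1):
        for qx in (x0, x0 + 1):
            if 0 <= qy < h and 0 <= qx < w:
                value += (1 - abs(py - qy)) * (1 - abs(px - qx)) * x[qy][qx]
    return value

def deformable_roi_pool(x, top_left, roi_h, roi_w, k, offsets):
    """Average-pool an RoI into k x k bins; bin (i, j) is moved by offsets[i][j]."""
    p0y, p0x = top_left
    bin_h, bin_w = roi_h // k, roi_w // k  # assumes exact divisibility
    out = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            oy, ox = offsets[i][j]
            total = 0.0
            for py in range(i * bin_h, (i + 1) * bin_h):
                for px in range(j * bin_w, (j + 1) * bin_w):
                    total += bilinear(x, p0y + py + oy, p0x + px + ox)
            out[i][j] = total / (bin_h * bin_w)
    return out

x = [[float(6 * r + c) for c in range(6)] for r in range(6)]
zero = [[(0.0, 0.0)] * 2 for _ in range(2)]
moved = [[(0.0, 0.0), (0.0, 0.5)], [(-0.5, 0.0), (0.0, 0.0)]]

regular = deformable_roi_pool(x, (1, 1), 4, 4, 2, zero)
deformed = deformable_roi_pool(x, (1, 1), 4, 4, 2, moved)
print(regular, deformed)
```

Only the two bins with non-zero offsets change their outputs, so each part of the RoI can move independently of the others.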
Deformable ConvNets
• Same input & output as the plain versions
• Regular convolution -> deformable convolution
• Regular RoI pooling -> deformable RoI pooling
• End-to-end trainable without additional supervision
Sampling Locations of Deformable Convolution
(a) standard convolution (b) deformable convolution
Part Offsets in Deformable RoI Pooling
Deformable ConvNets for Object Detection
• Regular object detectors
(Diagram panels: Fast(er) RCNN with RoI Pooling; R-FCN with Position Sensitive RoI Pooling on a Position Sensitive Score Map; FPN with RoI Pooling on levels P3, P4, P5 over Conv3, Conv4, Conv5. Example outputs: Car, Person.)
Deformable ConvNets for Object Detection
• Deformable object detectors
(Diagram panels: Fast(er) RCNN with Deformable RoI Pooling; R-FCN with Deformable PS RoI Pooling on a Position Sensitive Score Map; FPN with Deformable RoI Pooling on levels P3, P4, P5 over Conv3, Conv4, Conv5. Legend: marked modules use Deformable Convolution / RoI Pooling. Example outputs: Car, Person.)
XCeption -> Aligned XCeption
(Architecture diagram in three stages. Entry flow: 224x224x3 images to 14x14x728 feature maps. Middle flow: 14x14x728 feature maps, repeated 16 times. Exit flow: 14x14x728 feature maps to 7x7x2048 feature maps. Layer labels: conv 32, 3x3, stride 2, pad 1; conv 64, 3x3, pad 1; separable conv 3x3, pad 1, with 128, 256, 728, 1024, 1536 and 2048 channels; max pool 3x3, stride 2, pad 1; 1x1 conv with 128, 256, 728 and 1024 channels, with stride 2, pad 0 where marked.)
• Proper feature alignment in XCeption
• Efficient: 9.5 GFLOPs on a 224x224 image (ResNet-101: 7.6 GFLOPs)
• Accurate: mAP 2.8% better than ResNet-101 using FPN on COCO (det, test-dev)
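The feature-map sizes on this slide can be checked with the standard output-size formula for convolution and pooling. This is a sketch under one assumption taken from the layer labels: the entry flow downsamples four times (one stride-2 3x3 convolution and three stride-2 3x3 max pools, all with pad 1) and the exit flow once more. The helper name `conv_out` is hypothetical.

```python
def conv_out(size, kernel, stride, pad):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1

size = 224
trace = [size]
# Entry flow: four stride-2 layers with 3x3 kernels and pad 1.
for _ in range(4):
    size = conv_out(size, kernel=3, stride=2, pad=1)
    trace.append(size)
entry_out = size

# Exit flow: one more stride-2 max pool on the main path ...
exit_main = conv_out(entry_out, kernel=3, stride=2, pad=1)
# ... and a 1x1, stride-2, pad-0 convolution on the shortcut path.
exit_shortcut = conv_out(entry_out, kernel=1, stride=2, pad=0)

# Compute cost relative to ResNet-101, from the figures quoted on the slide.
flops_ratio = 9.5 / 7.6

print(trace, exit_main, exit_shortcut, flops_ratio)
```

The 3x3 pad-1 path and the 1x1 pad-0 shortcut produce the same spatial size at every stage, so the two branches can be added element-wise. The quoted costs correspond to roughly 1.25 times the FLOPs of ResNet-101.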
Object Detection on COCO (Test-dev)
• MSRA 2017 Entry
• ~3% mAP improvement from Deformable ConvNets
• Best single model performance: 48.5%
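(Bar chart: mAP (%) on COCO test-dev, axis from 30 to 50, series Deformable and Regular)
| Configuration | Deformable mAP (%) |
| --- | --- |
| FPN+OHEM (ResNet-101) | 40.5 |
| FPN+OHEM (Aligned Xception) | 43.3 |
| + Mask | 44.2 |
| + Soft NMS | 45.3 |
| + Multi-scale testing | 46.9 |
| + Iterative testing | 47.9 |
| + Horizontal flip | 48.5 |
| + Ensemble (6 models) | 50.7 |
Regular-series values as extracted (their assignment to bars is not recoverable from the transcript): 37.4, 40.7, 41.6, 45.2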
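A short sketch of how much each step adds. It reads the eight-value series in the chart above as the Deformable bars, in the order the configurations are listed. That pairing is an assumption about the extracted chart, although the 48.5 and 50.7 endpoints agree with the best single model and the ensemble.

```python
# (configuration, mAP %) for the Deformable series; pairing with labels is assumed.
steps = [
    ("FPN+OHEM (ResNet-101)", 40.5),
    ("FPN+OHEM (Aligned Xception)", 43.3),
    ("+ Mask", 44.2),
    ("+ Soft NMS", 45.3),
    ("+ Multi-scale testing", 46.9),
    ("+ Iterative testing", 47.9),
    ("+ Horizontal flip", 48.5),
    ("+ Ensemble (6 models)", 50.7),
]

# Gain of each configuration over the one before it.
gains = [
    (name, round(after - before, 1))
    for (_, before), (name, after) in zip(steps, steps[1:])
]
total_gain = round(steps[-1][1] - steps[0][1], 1)

for name, gain in gains:
    print(f"{name}: +{gain}")
print("total:", total_gain)
```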
Object Detection on COCO (Test-dev)
• Deformable ConvNets vs. regular ConvNets
• Noticeable improvements for various baselines
• Marginal parameter & computation overhead
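(Bar chart: mAP (%) on COCO test-dev, axis from 20 to 50, series Deformable and Regular)
| Baseline | Regular mAP (%) | Deformable mAP (%) |
| --- | --- | --- |
| Class-aware RPN (ResNet-101) | 23.2 | 25.8 |
| Faster R-CNN, 2fc (ResNet-101) | 30.3 | 35.0 |
| R-FCN (ResNet-101) | 32.1 | 35.7 |
| R-FCN (Aligned-Inception-ResNet) | 34.5 | 37.5 |
| FPN+OHEM (ResNet-101) | 37.4 | 40.5 |
| FPN+OHEM (Aligned-Xception) | 40.2 | 43.3 |
| FPN++ (Aligned-Xception) | 45.2 | 48.5 |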
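The per-baseline improvement can be tallied from the chart values above. This is a sketch that assumes the first seven numbers are the Regular bars and the next seven the Deformable bars, each in the order the baselines are listed.

```python
baselines = [
    "Class-aware RPN (ResNet-101)",
    "Faster R-CNN, 2fc (ResNet-101)",
    "R-FCN (ResNet-101)",
    "R-FCN (Aligned-Inception-ResNet)",
    "FPN+OHEM (ResNet-101)",
    "FPN+OHEM (Aligned-Xception)",
    "FPN++ (Aligned-Xception)",
]
regular = [23.2, 30.3, 32.1, 34.5, 37.4, 40.2, 45.2]
deformable = [25.8, 35.0, 35.7, 37.5, 40.5, 43.3, 48.5]

# mAP gain (percentage points) from swapping in deformable modules.
gains = {
    name: round(d - r, 1)
    for name, r, d in zip(baselines, regular, deformable)
}
mean_gain = sum(gains.values()) / len(gains)

for name, gain in gains.items():
    print(f"{name}: +{gain}")
print(f"mean gain: {mean_gain:.2f}")
```

Under that pairing, every baseline improves by between 2.6 and 4.7 points.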
Conclusion
• Deformable ConvNets for dense spatial modeling
• Simple, efficient, deep, and end-to-end
• No additional supervision
• Feasible and effective on sophisticated vision tasks for the first time
• Our team
Jifeng Dai, Haozhi Qi*, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng*, Yichen Wei
(* interns at MSRA)