UAV Navigation above Roads with CNN’s Layer 1 Layer...

1
UAV Navigation above Roads with CNN’s Thomas AYOUL, Toby BUCKLEY, Felix CREVIER Stanford University UAV navigation over roads without relying on a preloaded map o Train a deep segmentation neural net for road detection o Develop a tool to create/update a road graph from segmented image o Implement the feed-forward aircraft controller to follow edges of graph o Embed system onboard a drone and test in the field Main challenges o Obtain good per-class accuracy on landscapes different from training set o Simplify architectures to enable real-time inference onboard aircraft Objective Filters Best Net o Filters highlight color patterns and some forms of edges o At higher learning rates, filters are smooth or dead Loss Best Net Color vs edge detection o The layer 1 activation maps illustrate a gray/not gray filter (color detection) o The layer 4 heat map labels creases in the foliage as roads (edge detection) o It would seem edges require more abstraction, yet CNN classifiers usually have edge filters in the first layer. Possibly a training issue. Analysis o CNN’s with powerful Encoding-Decoding Kendall architectures were trained o Good performance was obtained the test set, but less so on images from other landscapes o To improve accuracy, techniques from Mnih, such as the TABN loss and structured prediction post-processing should be investigated Conclusions Simulation environment for software testing o Pygame – OpenGL environment o Run-time inference using best trained CNN o Realistic aircraft dynamics and navigation o Global map is over Yosemite National Park Simulation University of Toronto Massachusetts Roads & Buildings dataset o Data collected and annotated by Volodymyr Mnih o 1000 RGB satellite images split into 16 375x375 input images. o Networks trained on 4500/16 000 sub-images o Annotations are pixel-wise road/no-road booleans Dataset CNN based segmentation Encoder-Decoder o Architecture based on Alex Kendall’s Segnet o Implementation is done in a modified Caffe framework Our Best network o 4 Conv + Batch + ReLU layers and 4 Upsampling + UpConv layers o 7x7 Kernel, Stride 1, 8 Filters, 2x2 Max Pooling o 23,338 learned parameters o Class priors: Road 0.2 | No Road 0.8 o Performance: 72% Road accuracy Network Architecture Input RGB Road/No Road Sweep on Kernel Size and Number of Filters Observations o Road accuracy is a better performance metric as it’s prior is low o Medium sized kernels seem to perform best : larger scale features o More filters ≠ better accuracy: low geometric complexity Hyperparameter Tuning 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 8F 16F 32F Road accuracy – 4 Layer Nets 3x3 5x5 7x7 9x9 Layer 1 Layer 4 Layer 1 Layer 4 Layer 1 Layer 4 Layer 1 Layer 4 = 0.01 = 0.0005 o Loss is based on overall pixel accuracy o Loss exhibits strange oscillatory behavior, even at low learning rates o A loss based on road accuracy could lead to better results Original image Activation Map 1 Heat Map 4 Class inference 1. V. Badrinarayanan, A. Kendall, and R. Cipolla. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. arXiv:1511.00561v2 [cs.CV], 2015 2. Volodymyr Mnih. Machine Learning for Aerial Image Labeling. PhD thesis, University of Toronto, 2013. 3. F. Lei, et al. “Convolutional Neural Networks for Visual Recognition”. http://cs231n.github.io/ Simulation interface snapshot CNN UAV with GoPro CONTROL AIRCRAFT CAMERA Outline

Transcript of UAV Navigation above Roads with CNN’s Layer 1 Layer...

Page 1: UAV Navigation above Roads with CNN’s Layer 1 Layer 4cs231n.stanford.edu/reports/2017/posters/553.pdf · o Simplify architectures to enable real-time inference onboard aircraft

UAV Navigation above Roads with CNN’sThomas AYOUL, Toby BUCKLEY, Felix CREVIER

Stanford University

UAV navigation over roads without relying on a preloaded map

o Train a deep segmentation neural net for road detection

o Develop a tool to create/update a road graph from segmented image

o Implement the feed-forward aircraft controller to follow edges of graph

o Embed system onboard a drone and test in the field

Main challenges

o Obtain good per-class accuracy on landscapes different from training set

o Simplify architectures to enable real-time inference onboard aircraft

Objective

Filters – Best Net

o Filters highlight color patterns and some forms of edges

o At higher learning rates, filters are smooth or dead

Loss – Best Net

Color vs edge detection

o The layer 1 activation maps illustrate a gray/not gray filter (color detection)

o The layer 4 heat map labels creases in the foliage as roads (edge detection)

o It would seem edges require more abstraction, yet CNN classifiers usually

have edge filters in the first layer. Possibly a training issue.

Analysis

o CNN’s with powerful Encoding-Decoding Kendall architectures were trained

o Good performance was obtained the test set, but less so on images from other landscapes

o To improve accuracy, techniques from Mnih, such as the TABN loss and structured prediction post-processing should be investigated

Conclusions

Simulation environment for software testing

o Pygame – OpenGL environment

o Run-time inference using best trained CNN

o Realistic aircraft dynamics and navigation

o Global map is over Yosemite National Park

Simulation

University of Toronto Massachusetts Roads & Buildings dataset

o Data collected and annotated by Volodymyr Mnih

o 1000 RGB satellite images split into 16 375x375 input images.

o Networks trained on 4500/16 000 sub-images

o Annotations are pixel-wise road/no-road booleans

Dataset

CNN based segmentation Encoder-Decoder

o Architecture based on Alex Kendall’s Segnet

o Implementation is done in a modified Caffe framework

Our Best network

o 4 Conv + Batch + ReLU layers and 4 Upsampling + UpConv layers

o 7x7 Kernel, Stride 1, 8 Filters, 2x2 Max Pooling

o 23,338 learned parameters

o Class priors: Road 0.2 | No Road 0.8

o Performance: 72% Road accuracy

Network Architecture

Input RGB Road/No Road

Sweep on Kernel Size and Number of Filters

Observations

o Road accuracy is a better performance metric as it’s prior is low

o Medium sized kernels seem to perform best : larger scale features

o More filters ≠ better accuracy: low geometric complexity

Hyperparameter Tuning

00.10.20.30.40.50.60.70.8

8F 16F 32F

Road accuracy – 4 Layer Nets

3x3 5x5 7x7 9x9

Layer 1 Layer 4 Layer 1 Layer 4

Layer 1 Layer 4 Layer 1 Layer 4

𝜂 = 0.01 𝜂 = 0.0005

o Loss is based on overall pixel accuracy

o Loss exhibits strange oscillatory behavior, even at low learning rates

o A loss based on road accuracy could lead to better results

Original image Activation Map 1 Heat Map 4 Class inference

1. V. Badrinarayanan, A. Kendall, and R. Cipolla. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. arXiv:1511.00561v2 [cs.CV], 2015

2. Volodymyr Mnih. Machine Learning for Aerial Image Labeling. PhD thesis, University of Toronto, 2013.

3. F. Lei, et al. “Convolutional Neural Networks for Visual Recognition”. http://cs231n.github.io/

Simulation interface snapshot

CNN

UAV with GoPro

CONTROL

AIRCRAFT

CAMERA

Outline