www.imgtec.com
Bryce Johnstone & Paul Brasnett
9 Nov 2016
Leveraging PowerVR GPU Compute for Automotive Convolutional Neural Networks
© Imagination Technologies CNNs in Automotive Webinar Nov 2016 2
Agenda
About Imagination Technologies
ADAS/Autonomous Driving: why?
Why PowerVR GPUs for Vision Processing?
Implementing Convolutional Neural Networks (CNNs)
Performance Analysis
Conclusions
Leveraging GPU Compute for Automotive CNNs
Core IP for low power, high performance SoCs
Ultra-low power; class-leading efficiency; designed for IP-based SoCs
Our technologies address what really matters to help our customers create innovations for success
PowerVR Graphics & GPU Compute Processors
Ensigma Communications Processors
PowerVR Vision Processors
MIPS Processors
Fabric
PowerVR Video Processors
Enabling customers to fully leverage their own IP
Domain Solutions
Customer technologies & know-how
Customizable IP platforms
Scalable IP
Markets: AR/VR, Networking, IoT, Consumer, Automotive, Mobile
Ecosystems: software, tools, apps, middleware, hardware
Autonomous Driving and ADAS
Reduce road deaths and their GDP cost worldwide: 1.2m deaths in 2015
Increase road utilisation: 2x with 80% autonomous cars
Reduce congestion & parking time
US/EU already driving legislation change to support the nascent market
Issues of liability, safety and security will have to be resolved before wide adoption
The need for complex vision processing (deep learning/AI) is increasing rapidly
Platooning -> autotaxi/lift -> semi-autonomous -> fully autonomous
ADAS is the backbone for Autonomous Driving
Today: ASSIST • Driver active • Fail safe
2020: AUTOMATE • Sensor fusion • Co-pilot
2030: AUTONOMOUS • 3D maps • Driverless • Hands off / mind off
ADAS: Levels of Processing, from Sensor to Actuator
Sensor Data (Visual) -> Low Level Processing -> Intermediate Level Processing -> High Level Processing -> Control Logic -> Action (Actuator)
Low Level: Pixel Processing • Hundreds of millions of pixels per second • Similar processing per pixel
Intermediate Level: Object Processing • Thousands of objects per second • Similar processing per object
High Level: Object Recognition • Dozens of objects per second
Control Logic: Sensor Fusion • Decision Making • Application Control
Hardware: MIPS • Prop HWA • GPU Compute
As complexity increases, specifically designed hardware acceleration allows for the best performance and power efficiency
ADAS -> Fully autonomous: orders of magnitude increase in processing
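As a rough illustration of where "hundreds of millions of pixels per second" can come from, the sketch below works through the arithmetic for a hypothetical surround-view setup. The camera count, resolution and frame rate are assumptions chosen for illustration; they are not figures from the webinar.

```python
def pixels_per_second(width, height, fps, cameras=1):
    """Raw pixel rate that the low-level (per-pixel) stage has to keep up with."""
    return width * height * fps * cameras

# Hypothetical configuration: four 1920x1080 cameras at 30 frames per second.
rate = pixels_per_second(1920, 1080, 30, cameras=4)
print(f"{rate / 1e6:.1f} Mpixels/s")  # 248.8 Mpixels/s
```

Every later stage sees far fewer items (objects rather than pixels), which is why the per-item work can become more complex as the data moves up the levels.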
Automotive GPU Requirements: Wide Range of Possibilities
Entry Level: Single Screen • Low Resolution • Single Task • Digital/Mechanical
Mid Range: Single/Multi-Screen • High Resolution • Multi-Task (HMI/Entertainment) • Full Digital • Basic ADAS Functions
High End: Multi-Screen/HUD • High Retina Resolution • Virtualised Multi-Tasking • Full Digital • Full ADAS Functionality
PowerVR GPU Series 6XE/7XE
• Low res 2D/3D UIs • Small silicon area • low power & memory • High end functionality
PowerVR GPU Series 6/7XT
• 4-16 Cluster • Virtualization • Ray Tracing-photorealism • TFLOPs GPU Compute • FP16 & FP32
PowerVR GPU Series 7XE/8XE
• Entry Level 3D UI • Advanced 3D graphics • Simple GPU Compute • Basic ADAS • Secure multi-tasking
Evolution of Compute GPU APIs
OpenCL 1.2
OpenCV
OpenVX
Vulkan
OpenCL 2.0
Full Profile
New APIs:
OpenGL ES SC
Why PowerVR GPUs for Vision Processing?
CPUs can generate large amounts of heat
• CPUs can deliver high peak/burst performance
• But generate large amounts of heat
• PowerVR GPUs provide:
  • Lowest power FP16/FP32 & int pipelines
  • Local memory for highly efficient data access for compute operations
  • Power-saving features such as gating of non-compute parts of the GPU for efficient compute operation
[Figure: CPU and GPU panels]
Why GPUs for Vision Processing?
Performance relative to CPU (CPU = 100%)
Benchmark                    CPU     PowerVR Series6
Provence (raytracing)        100%    265%
Particle Simulation – 32k    100%    407%
Particle Simulation – 4k     100%    517%
Julia Set                    100%    963%
Ambient Occlusion            100%    1126%
Denoise                      100%    482%
Gaussian Blur                100%    383%
Why CNNs?
State-of-the-art performance
Rapid development cycles
Range of vision tasks
Classification
Detection
Segmentation
Recognition
Tracking
Feature detection
Feature description
Other tasks…
Camera Localisation
PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization, Kendall, A., Grimes, M., Cipolla, R., ICCV 2015
CNN uses in Autonomous Driving
Pedestrian/cyclist/motorcyclist detection
Sign detection & classification
Road user detection
Driver monitoring
Vehicle occupancy classification
Drivable path analysis
Road scene understanding
What is a CNN?
CNN Architecture: Basic Building Blocks
Convolution, Activation, Normalization, Pooling, Fully Connected, Soft Max
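To make some of the building blocks concrete, here is a minimal pure-Python sketch of an activation, a pooling step, a fully connected layer and a soft max. The choice of ReLU as the activation and of 2x2 max pooling is an assumption; the slides name the block types but not the specific variants. Normalization is left out to keep the sketch short.

```python
import math

def relu(values):
    """Activation (assumed ReLU): clamp negative values to zero."""
    return [max(0.0, v) for v in values]

def max_pool_2x2(image):
    """Pooling (assumed 2x2 max, stride 2) on a list-of-lists with even dimensions."""
    return [
        [max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
         for c in range(0, len(image[0]), 2)]
        for r in range(0, len(image), 2)
    ]

def fully_connected(inputs, weights, biases):
    """Fully connected layer: one dot product plus bias per output."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def softmax(scores):
    """Soft max: turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

A network is then a chain of such functions, with the output of one block feeding the next.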
What is a CNN? Convolution Layer
[Figure: Input Image, Convolution coefficients, Output]
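The convolution layer can be sketched as a small window of coefficients sliding over the input image. The version below is a minimal single-channel example with no padding and a stride of 1, which are assumptions made for brevity. As in most CNN libraries, the kernel is applied without flipping (strictly a cross-correlation).

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` and return the map of weighted sums."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0.0
            for i in range(k_h):
                for j in range(k_w):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        output.append(row)
    return output

image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 0],
          [0, -1]]  # difference between a pixel and its lower-right neighbour
print(conv2d_valid(image, kernel))  # 3x3 output, every value is -5.0
```

A real layer repeats this over many input channels and many sets of coefficients, which is where the operation count comes from.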
What is a CNN?
CNN Example Network
[Figure: example network chaining Image, Convolution, Activation, Pooling, Normalization, Fully Connected and Soft Max blocks, with repeated Convolution, Activation and Pooling stages]
CNN Object Classification
Training (Offline): Architecture + Data -> CNN Library (Compute + Time) -> Model Coefficients
Inference (Online): Architecture + Model Coefficients + Image -> CNN Library (Compute on PowerVR GPU) -> Classification
Where is the Cost in CNN Inference?
Number of operations and coefficients required by layer type for AlexNet
[Figure: two charts, "Operations by layer type" and "Coefficients by layer type"; legend: Convolutions, Pooling, Normalisation, Fully Connected]
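The split between operations and coefficients can be illustrated by counting both for one convolution layer and one fully connected layer. The layer shapes below are the commonly published AlexNet shapes for conv3 and fc6; they are assumptions here, since the slides do not list them, and biases are ignored.

```python
def conv_layer_cost(out_h, out_w, out_channels, k_h, k_w, in_channels):
    """Return (multiply-accumulates, coefficients) for a convolution layer."""
    coefficients = out_channels * k_h * k_w * in_channels
    macs = out_h * out_w * coefficients  # coefficients are reused at every output position
    return macs, coefficients

def fc_layer_cost(inputs, outputs):
    """Return (multiply-accumulates, coefficients) for a fully connected layer."""
    coefficients = inputs * outputs
    return coefficients, coefficients  # each coefficient is used exactly once

# Assumed shapes: conv3 = 384 filters of 3x3x256 on a 13x13 output; fc6 = 9216 -> 4096.
conv_macs, conv_coeffs = conv_layer_cost(13, 13, 384, 3, 3, 256)
fc_macs, fc_coeffs = fc_layer_cost(9216, 4096)
print(conv_macs, conv_coeffs)  # 149520384 884736
print(fc_macs, fc_coeffs)      # 37748736 37748736
```

Under these assumptions the convolution layer performs about 169 operations per coefficient, while the fully connected layer performs one: convolutions dominate the arithmetic, fully connected layers dominate the coefficient storage.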
Convolutions - Matrix Multiply
Naïve Implementation
Create as many work-items as there are elements in the output matrix
Each work-item reads its row of A and its column of B and produces their dot product
Requires a large number of memory accesses
A x B = C
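The naïve scheme can be sketched in plain Python, with each (row, column) iteration standing in for one work-item. This is an illustrative model rather than GPU code; the `reads` counter is a hypothetical addition that tallies how many matrix elements would be fetched from memory.

```python
def naive_matmul(a, b):
    """C = A x B with one simulated work-item per output element."""
    rows, inner, cols = len(a), len(b), len(b[0])
    c = [[0.0] * cols for _ in range(rows)]
    reads = 0
    for r in range(rows):
        for col in range(cols):
            # One work-item: dot product of row r of A and column col of B.
            acc = 0.0
            for k in range(inner):
                acc += a[r][k] * b[k][col]
                reads += 2  # one element of A and one of B per step
            c[r][col] = acc
    return c, reads

c, reads = naive_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(c, reads)  # [[19.0, 22.0], [43.0, 50.0]] 16
```

For square n x n matrices the read count is 2n³, so every input element is fetched n times; that repeated traffic is the cost the slide refers to.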
Convolutions - Matrix Multiply
Tiling Approach
[Figure: Time (s), log scale from 0.1 to 1000, against Matrix Size, for Naïve and Tiled matrix multiply]
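A minimal sketch of the tiling idea follows, again as a plain-Python model rather than GPU code. It assumes square matrices whose size is a multiple of the tile size; the `a_tile` and `b_tile` lists stand in for a work-group's local memory, and the `global_reads` counter is a hypothetical addition for comparing memory traffic.

```python
def tiled_matmul(a, b, tile):
    """C = A x B computed block by block, loading each tile once per block step."""
    n = len(a)  # assumes n x n inputs with n divisible by tile
    c = [[0.0] * n for _ in range(n)]
    global_reads = 0
    for r0 in range(0, n, tile):
        for c0 in range(0, n, tile):
            # One simulated work-group owns this tile x tile block of C.
            for k0 in range(0, n, tile):
                # Cooperative load of one tile of A and one of B into "local memory".
                a_tile = [a[r0 + i][k0:k0 + tile] for i in range(tile)]
                b_tile = [b[k0 + i][c0:c0 + tile] for i in range(tile)]
                global_reads += 2 * tile * tile
                for i in range(tile):
                    for j in range(tile):
                        acc = 0.0
                        for k in range(tile):
                            acc += a_tile[i][k] * b_tile[k][j]
                        c[r0 + i][c0 + j] += acc
    return c, global_reads
```

For n = 4 and a tile size of 2 this makes 64 global reads, against 2n³ = 128 when every output element fetches its own row and column: the traffic drops by a factor equal to the tile size.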
Convolutions – Frequency Domain
Example number of operations (Mflops) required to implement convolutions
Implementation      AlexNet/conv3 (3x3)   AlexNet/conv2 (5x5)   GoogleNet/conv1 (7x7)
Matrix Multiply     299                   448                   236
Frequency Domain    90                    55                    79
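The Matrix Multiply row can be reproduced with a short calculation, assuming two operations (one multiply, one add) per multiply-accumulate and the commonly published layer shapes for these networks. The shapes are assumptions, as the slides do not state them. The Frequency Domain row depends on implementation details not given in the slides, so it is not derived here.

```python
def conv_mflops(out_h, out_w, out_channels, k, in_channels_per_filter):
    """Millions of operations for a direct (matrix multiply) convolution."""
    macs = out_h * out_w * out_channels * k * k * in_channels_per_filter
    return 2 * macs / 1e6  # multiply + add per MAC

# Assumed shapes: (output height, output width, filters, kernel size, input channels per filter)
layers = {
    "AlexNet/conv3 (3x3)": (13, 13, 384, 3, 256),
    "AlexNet/conv2 (5x5)": (27, 27, 256, 5, 48),   # two groups of 48 input channels
    "GoogleNet/conv1 (7x7)": (112, 112, 64, 7, 3),
}
for name, shape in layers.items():
    print(name, round(conv_mflops(*shape)))  # 299, 448, 236
```

Under these assumptions the results round to the 299, 448 and 236 Mflops quoted for the matrix multiply implementation.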
Convolution in time domain corresponds to multiplication in frequency domain
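This property can be checked numerically on a 1D signal. The sketch below uses a simple O(n²) discrete Fourier transform so that it stays self-contained; a practical implementation would use an FFT, which is where any saving in operations would come from. The function names are illustrative, not taken from any PowerVR library.

```python
import cmath

def dft(signal, inverse=False):
    """Plain discrete Fourier transform (O(n^2)), forward or inverse."""
    n = len(signal)
    sign = 1 if inverse else -1
    result = []
    for k in range(n):
        total = sum(signal[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t in range(n))
        result.append(total / n if inverse else total)
    return result

def direct_convolution(x, h):
    """Linear convolution computed in the time domain."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            out[i + j] += xv * hv
    return out

def frequency_domain_convolution(x, h):
    """Same result via transform, pointwise multiply, inverse transform."""
    n = len(x) + len(h) - 1  # zero-pad so circular convolution equals linear
    xf = dft(list(x) + [0.0] * (n - len(x)))
    hf = dft(list(h) + [0.0] * (n - len(h)))
    product = [a * b for a, b in zip(xf, hf)]
    return [value.real for value in dft(product, inverse=True)]

x, h = [1, 2, 3, 4], [1, 0, -1]
print(direct_convolution(x, h))  # [1.0, 2.0, 2.0, 2.0, -3.0, -4.0]
print([round(v, 6) for v in frequency_domain_convolution(x, h)])
```

Both routes give the same values up to floating-point rounding; the frequency-domain route replaces the sliding multiply-accumulate with one multiplication per frequency bin.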
Performance Analysis: GPU v CPU*
* CPU results based on Caffe (with ATLAS)
[Figure: AlexNet, relative FPS performance (higher is better), log scale from 0.1 to 100, by layer type (Convolutions, Pooling, Normalisation, Fully Connected) for CPU (1.6GHz), PowerVR 2 Cluster GPU (384MHz) - MatMul and PowerVR 2 Cluster GPU (384MHz) - FFT]
Fully Connected Layers: Low precision data types
[Figure: relative FPS performance (higher is better), axis from 0 to 1.8, by fully connected weights data-type: float, ushort, uchar]
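One way to store fully connected weights as uchar is linear quantization to 8-bit codes plus a scale and an offset. The slides do not say how the weights were converted, so the scheme below is an assumption shown only to illustrate the idea; the function names are hypothetical.

```python
def quantize_uchar(weights):
    """Map float weights to integer codes in 0..255 with a shared scale and offset."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 if all weights are equal
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo

def dequantize(codes, scale, offset):
    """Recover approximate float weights from the 8-bit codes."""
    return [code * scale + offset for code in codes]

weights = [-0.5, -0.1, 0.0, 0.25, 0.5]
codes, scale, offset = quantize_uchar(weights)
restored = dequantize(codes, scale, offset)
worst = max(abs(w - r) for w, r in zip(weights, restored))
print(codes)
print(f"largest rounding error: {worst:.5f} (at most half a step of {scale:.5f})")
```

Storing each weight in 8 bits instead of a 32-bit float cuts the coefficient data to a quarter of its size, at the price of a bounded rounding error per weight.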
Conclusions
CNNs are an integral part of computer vision applications for semi-autonomous and autonomous cars
Numerous applications can be addressed with CNNs
PowerVR GPUs offer up to 12x higher performance for CNN deployment (GPU Compute)
Convolution performance can be improved by working in the frequency domain
Fully connected layer performance can be improved by using low precision data types
PowerVR GPUs scale to allow for higher levels of performance and lower power for current and future generations of vision-enabled products
www.imgtec.com
Thank you
Resources
PowerVR GPU Compute
https://imgtec.com/tools/powervr-gpu-compute/
Guide to writing OpenCL
http://blog.imgtec.com/powervr/a-quick-guide-to-writing-opencl-kernels-for-rogue
PowerVR Imaging Framework
http://blog.imgtec.com/powervr/powervr-imaging-framework-sdk
PowerVR CNN Demo
OpenCL Tutorial
https://handsonopencl.github.io/