Leveraging PowerVR GPU Compute for Automotive Convolutional Neural Networks

Bryce Johnstone & Paul Brasnett

9 Nov 2016

www.imgtec.com

© Imagination Technologies, CNNs in Automotive Webinar, Nov 2016

Agenda

About Imagination Technologies

ADAS/Autonomous Driving: why?

Why PowerVR GPUs for Vision Processing?

Implementing Convolutional Neural Networks (CNNs)

Performance Analysis

Conclusions

Leveraging GPU Compute for Automotive CNNs


Core IP for low power, high performance SoCs

Ultra-low power; class-leading efficiency; designed for IP-based SoCs

Our technologies address what really matters to help our customers create innovations for success

• PowerVR Graphics & GPU Compute Processors
• Ensigma Communications Processors
• PowerVR Vision Processors
• MIPS Processors
• Fabric
• PowerVR Video Processors


Enabling customers to fully leverage their own IP

• Domain solutions: AR/VR, networking, IoT, consumer, automotive, mobile
• Customer technologies & know-how
• Customizable IP platforms
• Scalable IP
• Ecosystems: software, tools, apps, middleware, hardware


Autonomous Driving and ADAS

• Reduce road deaths and their GDP costs worldwide: 1.2m road deaths in 2015
• Increase road utilisation: 2x with 80% autonomous cars
• Reduce congestion & parking time
• The US/EU are already driving legislation change to support the nascent market
• Issues of liability, safety and security will have to be resolved before wide adoption
• Complex vision processing (deep learning/AI) needs are increasing rapidly
• Platooning -> autotaxi/lift -> semi-autonomous -> fully autonomous

ADAS is the backbone for Autonomous Driving

Today: ASSIST
• Driver active
• Fail safe

2020: AUTOMATE
• Sensor fusion
• Co-pilot

2030: AUTONOMOUS
• 3D maps
• Driverless
• Hands off / mind off


ADAS: Levels of Processing from Sensor to Actuator

Sensor data (visual) -> low level processing -> intermediate level processing -> high level processing -> control logic -> action (actuator)

• Low level processing (pixel processing): hundreds of millions of pixels per second; similar processing per pixel
• Intermediate level processing (object processing): thousands of objects per second; similar processing per object
• High level processing (object recognition): dozens of objects per second
• Control logic: sensor fusion, decision making, application control

Processing engines shown: MIPS, Prop HWA, GPU Compute

As complexity increases, specifically designed hardware acceleration delivers the best performance and power efficiency.

ADAS -> fully autonomous: an orders-of-magnitude increase in processing


Automotive GPU Requirements: Wide Range of Possibilities

Entry Level: single screen • low resolution • single task • digital/mechanical
Mid Range: single/multi-screen • high resolution • multi-task (HMI/entertainment) • full digital • basic ADAS functions
High End: multi-screen/HUD • high (retina) resolution • virtualised multi-tasking • full digital • full ADAS functionality

PowerVR GPU Series 6XE/7XE
• Low-res 2D/3D UIs • Small silicon area • Low power & memory • High-end functionality

PowerVR GPU Series 6/7XT
• 4-16 clusters • Virtualization • Ray tracing for photorealism • TFLOPs of GPU compute • FP16 & FP32

PowerVR GPU Series 7XE/8XE
• Entry-level 3D UI • Advanced 3D graphics • Simple GPU compute • Basic ADAS • Secure multi-tasking


Evolution of Compute GPU APIs

OpenCL 1.2 • OpenCV • OpenVX • Vulkan • OpenCL 2.0 Full Profile

New APIs: OpenGL ES SC


Why PowerVR GPUs for Vision Processing?

CPUs can generate large amounts of heat

• CPUs can deliver high peak/burst performance
• But they generate large amounts of heat
• PowerVR GPUs provide:
  • Lowest-power FP16/FP32 & integer pipelines
  • Local memory for highly efficient data access in compute operations
  • Power-saving features, such as gating the non-compute parts of the GPU, for efficient compute operation


Why GPUs for Vision Processing?

Performance relative to CPU (CPU = 100%):

Benchmark                     CPU     PowerVR Series6
Provence (ray tracing)        100%    265%
Particle Simulation (32k)     100%    407%
Particle Simulation (4k)      100%    517%
Julia Set                     100%    963%
Ambient Occlusion             100%    1126%
Denoise                       100%    482%
Gaussian Blur                 100%    383%


Why CNNs?

• State-of-the-art performance
• Rapid development cycles
• Range of vision tasks: classification, detection, segmentation, recognition, tracking, feature detection, feature description
• Other tasks…
  • Camera localisation (PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization, Kendall, A., Grimes, M., Cipolla, R., ICCV 2015)


CNN uses in Autonomous Driving

• Pedestrian/cyclist/motorcyclist detection
• Sign detection & classification
• Road user detection
• Driver monitoring
• Vehicle occupancy classification
• Drivable path analysis
• Road scene understanding


What is a CNN?

CNN architecture basic building blocks: convolution, activation, normalization, pooling, fully connected, softmax
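To make two of these building blocks concrete, here is a minimal C++ sketch of an activation layer and a softmax output layer. ReLU is assumed as the activation function (the slides do not name one), and the function names are illustrative only, not part of any PowerVR or OpenCL API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// ReLU activation: negative values are clamped to zero, element by element.
std::vector<float> relu(std::vector<float> x) {
    for (float &v : x) v = std::max(v, 0.0f);
    return x;
}

// Softmax: turns a vector of class scores into probabilities that sum to 1.
// The maximum score is subtracted first so std::exp cannot overflow; softmax
// is unchanged when the same constant is added to every score.
std::vector<float> softmax(const std::vector<float> &scores) {
    assert(!scores.empty());
    const float max_score = *std::max_element(scores.begin(), scores.end());
    std::vector<float> out(scores.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < scores.size(); ++i) {
        out[i] = std::exp(scores[i] - max_score);
        sum += out[i];
    }
    for (float &p : out) p /= sum;
    return out;
}
```

Both operations are cheap, element-wise passes over the data; the expensive blocks are convolution and fully connected layers.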


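What is a CNN? Convolution layer

The convolution coefficients are slid across the input image; at each position, the weighted sum of the pixels under the coefficients gives one value of the output.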
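A minimal C++ sketch of the convolution layer above, assuming a single input channel, a single filter, stride 1 and no padding. `Tensor2D` and `conv2d_valid` are hypothetical names used only for illustration. Like most CNN libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major single-channel image / feature map.
struct Tensor2D {
    std::size_t rows, cols;
    std::vector<float> data;
    Tensor2D(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}
    float &at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Direct "valid" 2D convolution, stride 1, no padding. Each output pixel is the
// dot product of the kernel with the input window under it, so the output is
// (H - K + 1) x (W - K + 1).
Tensor2D conv2d_valid(const Tensor2D &in, const Tensor2D &kernel) {
    assert(kernel.rows <= in.rows && kernel.cols <= in.cols);
    Tensor2D out(in.rows - kernel.rows + 1, in.cols - kernel.cols + 1);
    for (std::size_t y = 0; y < out.rows; ++y)
        for (std::size_t x = 0; x < out.cols; ++x) {
            float acc = 0.0f;
            for (std::size_t ky = 0; ky < kernel.rows; ++ky)
                for (std::size_t kx = 0; kx < kernel.cols; ++kx)
                    acc += in.at(y + ky, x + kx) * kernel.at(ky, kx);
            out.at(y, x) = acc;
        }
    return out;
}
```

Real layers sum this over every input channel and repeat it for every output filter, which is where the operation count grows.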


What is a CNN?

CNN example network: the image passes through repeated convolution -> activation -> pooling stages (with normalization), followed by fully connected layers and a softmax.
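Pooling is the simplest block in this pipeline to sketch. The C++ below assumes 2x2 max pooling with stride 2 on one feature map. The window size is an illustrative choice (AlexNet, for example, uses overlapping 3x3 windows with stride 2), and `max_pool_2x2` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// 2x2 max pooling with stride 2 on a row-major H x W feature map.
// Each output is the largest of the four inputs in its window, halving the
// width and height (an odd trailing row/column is dropped in this sketch).
std::vector<float> max_pool_2x2(const std::vector<float> &in,
                                std::size_t h, std::size_t w) {
    assert(in.size() == h * w);
    const std::size_t oh = h / 2, ow = w / 2;
    std::vector<float> out(oh * ow);
    for (std::size_t y = 0; y < oh; ++y)
        for (std::size_t x = 0; x < ow; ++x) {
            const std::size_t r = 2 * y, c = 2 * x;
            out[y * ow + x] = std::max({in[r * w + c], in[r * w + c + 1],
                                        in[(r + 1) * w + c], in[(r + 1) * w + c + 1]});
        }
    return out;
}
```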


CNN Object Classification

Training (offline): architecture + data -> CNN library (compute + time) -> model coefficients

Inference (online): architecture + model coefficients + image -> CNN library (compute on PowerVR GPU) -> classification


Where is the Cost in CNN Inference?

Number of operations and coefficients required by layer type for AlexNet (charts: operations by layer type and coefficients by layer type, split across convolutions, pooling, normalisation and fully connected layers)
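The split between operations and coefficients follows directly from the layer shapes. The C++ sketch below counts multiply-accumulates (MACs) and weights per layer, ignoring biases. The AlexNet shapes used to check it (conv3: 3x3 kernels over 256 input channels producing a 13x13x384 output; fc6: 9216 inputs to 4096 outputs) come from the published AlexNet architecture rather than from these slides, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Cost of one layer: multiply-accumulates (MACs) and weights (coefficients).
// Biases are ignored to keep the counts simple.
struct LayerCost {
    std::uint64_t macs;
    std::uint64_t weights;
};

// Convolution: each of the out_h*out_w*out_c outputs needs a k*k*in_c dot
// product, but the k*k*in_c*out_c weights are shared by all output positions.
LayerCost conv_cost(std::uint64_t out_h, std::uint64_t out_w, std::uint64_t out_c,
                    std::uint64_t k, std::uint64_t in_c) {
    assert(k > 0 && in_c > 0);
    return {out_h * out_w * out_c * k * k * in_c, k * k * in_c * out_c};
}

// Fully connected: every weight is used exactly once per input image,
// so MACs and weights are equal.
LayerCost fc_cost(std::uint64_t inputs, std::uint64_t outputs) {
    return {inputs * outputs, inputs * outputs};
}
```

For conv3 this gives about 150M MACs from fewer than 0.9M weights (each weight is reused 13 x 13 = 169 times), while fc6 needs about 37.7M weights but performs only one MAC per weight. Counting each MAC as two operations, conv3 comes to roughly 299 Mflops, the figure quoted for the matrix multiply implementation of AlexNet/conv3 later in the deck.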


Convolutions - Matrix Multiply

Naïve Implementation

• Create as many work-items as there are elements in the output matrix
• Each work-item reads its row and column and produces their dot product
• Requires a large number of memory accesses

A x B = C
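The slides do not show how a convolution is turned into a matrix multiply. A common approach, assumed here, is im2col: copy every input window into a column of a matrix, so that the filter weights times that matrix gives the convolution output. The C++ sketch below runs serially on the CPU, with each (i, j) iteration of `matmul_naive` standing in for one OpenCL work-item. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive C = A x B, row-major, A is M x K and B is K x N.
// Each (i, j) iteration plays the role of one work-item: it reads row i of A
// and column j of B and writes their dot product. Nothing is shared between
// work-items, which is why memory traffic is so high.
std::vector<float> matmul_naive(const std::vector<float> &A, const std::vector<float> &B,
                                std::size_t M, std::size_t K, std::size_t N) {
    assert(A.size() == M * K && B.size() == K * N);
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t j = 0; j < N; ++j) {   // one "work-item" per C[i][j]
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k)
                acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] = acc;
        }
    return C;
}

// im2col for a single-channel H x W image and a k x k kernel (stride 1, no
// padding). Column p holds the k*k pixels under output position p, giving a
// (k*k) x P matrix, so convolution becomes (1 x k*k weights) x (k*k x P).
std::vector<float> im2col(const std::vector<float> &img, std::size_t H, std::size_t W,
                          std::size_t k) {
    assert(img.size() == H * W && k <= H && k <= W);
    const std::size_t oh = H - k + 1, ow = W - k + 1, P = oh * ow;
    std::vector<float> cols(k * k * P);
    for (std::size_t y = 0; y < oh; ++y)
        for (std::size_t x = 0; x < ow; ++x)
            for (std::size_t ky = 0; ky < k; ++ky)
                for (std::size_t kx = 0; kx < k; ++kx)
                    cols[(ky * k + kx) * P + (y * ow + x)] = img[(y + ky) * W + (x + kx)];
    return cols;
}
```

With C_in input channels and C_out filters, the column matrix becomes (k*k*C_in) x P and the weights form a C_out x (k*k*C_in) matrix, so one large matrix multiply computes the whole layer.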


Convolutions - Matrix Multiply

Tiling Approach

Chart: execution time (s, log scale from 0.1 to 1000) against matrix size for the naïve and tiled matrix multiply
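The tiling idea can be sketched in C++, run serially on the CPU. On the GPU, each work-group would load a tile of A and a tile of B into local memory once and reuse every loaded value many times, instead of each work-item re-reading whole rows and columns from global memory. The tile size of 16 and the name `matmul_tiled` are illustrative assumptions, not values from the slides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative tile size; a real kernel would tune this to the local memory size.
constexpr std::size_t TILE = 16;

// Tiled (blocked) C = A x B, row-major, A is M x K and B is K x N.
// Work proceeds one TILE x TILE block of C at a time, accumulating the
// products of matching tiles of A and B, so the working set stays small.
std::vector<float> matmul_tiled(const std::vector<float> &A, const std::vector<float> &B,
                                std::size_t M, std::size_t K, std::size_t N) {
    assert(A.size() == M * K && B.size() == K * N);
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t i0 = 0; i0 < M; i0 += TILE)
        for (std::size_t j0 = 0; j0 < N; j0 += TILE)
            for (std::size_t k0 = 0; k0 < K; k0 += TILE) {
                // Tile bounds, clipped at the matrix edges.
                const std::size_t i1 = std::min(i0 + TILE, M);
                const std::size_t j1 = std::min(j0 + TILE, N);
                const std::size_t k1 = std::min(k0 + TILE, K);
                for (std::size_t i = i0; i < i1; ++i)
                    for (std::size_t k = k0; k < k1; ++k) {
                        const float a = A[i * K + k];   // reused across the whole j loop
                        for (std::size_t j = j0; j < j1; ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
            }
    return C;
}
```

The arithmetic is identical to the naïve version; only the order of memory accesses changes, which is what produces the gap between the two curves.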


Convolutions - Frequency Domain

Example number of operations (Mflops) required to implement convolutions:

Implementation      AlexNet/conv3 (3x3)   AlexNet/conv2 (5x5)   GoogleNet/conv1 (7x7)
Matrix Multiply     299                   448                   236
Frequency Domain    90                    55                    79

Convolution in the time domain corresponds to multiplication in the frequency domain.
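The convolution theorem behind these figures can be checked numerically. The C++ sketch below is a 1D, circular example that uses a naive O(n^2) DFT for clarity. A practical implementation would instead use an FFT and 2D transforms, zero-pad to turn circular into linear convolution, and reuse each transformed input across many filters; none of that is shown. The slides do not describe the GPU FFT implementation, and the function names here are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Naive discrete Fourier transform: sign = -1 for forward, +1 for inverse
// (the 1/n scaling of the inverse is applied by the caller).
std::vector<cd> dft(const std::vector<cd> &x, int sign) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<cd> X(n);
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t t = 0; t < n; ++t)
            X[k] += x[t] * std::polar(1.0, sign * 2.0 * pi * double(k * t % n) / double(n));
    return X;
}

// Circular convolution via the convolution theorem: transform both signals,
// multiply element-wise, then transform back.
std::vector<double> conv_freq(const std::vector<double> &a, const std::vector<double> &b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    std::vector<cd> A = dft(std::vector<cd>(a.begin(), a.end()), -1);
    std::vector<cd> B = dft(std::vector<cd>(b.begin(), b.end()), -1);
    for (std::size_t k = 0; k < n; ++k) A[k] *= B[k];
    std::vector<cd> c = dft(A, +1);
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i) out[i] = c[i].real() / double(n);
    return out;
}

// Direct circular convolution for comparison: out[i] = sum_j a[j] * b[(i - j) mod n].
std::vector<double> conv_direct(const std::vector<double> &a, const std::vector<double> &b) {
    const std::size_t n = a.size();
    std::vector<double> out(n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            out[i] += a[j] * b[(i + n - j) % n];
    return out;
}
```

The element-wise product costs one multiply per frequency bin, independent of the filter size. This is why the frequency-domain advantage in the table is largest for the 5x5 and 7x7 filters.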


Performance Analysis - GPU v CPU*

* CPU results based on Caffe (with ATLAS)

Chart: AlexNet relative FPS performance (higher is better, log scale) for convolutions, pooling, normalisation and fully connected layers; series: CPU (1.6GHz) and PowerVR 2 Cluster GPU (384MHz) - MatMul


Performance Analysis - GPU v CPU*

* CPU results based on Caffe (with ATLAS)

Chart: AlexNet relative FPS performance (higher is better, log scale) for convolutions, pooling, normalisation and fully connected layers; series: CPU (1.6GHz), PowerVR 2 Cluster GPU (384MHz) - MatMul, and PowerVR 2 Cluster GPU (384MHz) - FFT


Fully Connected Layers: Low Precision Data Types

Chart: relative FPS performance (higher is better) against fully connected weights data type (float, ushort, uchar)
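Fully connected layers read every weight once per image, so they tend to be limited by memory bandwidth rather than arithmetic. Shrinking the weights therefore helps directly. The C++ sketch below stores the weights as 8-bit codes (the uchar case) using a simple linear min/max scheme, w ≈ w_min + step × q. That quantization scheme and the names `QuantizedFC`, `quantize` and `fc_forward` are assumptions for illustration, since the slides do not say how the weights were converted.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fully connected weights stored as 8-bit codes: w ~= w_min + step * q.
struct QuantizedFC {
    std::size_t inputs, outputs;
    float w_min, step;
    std::vector<std::uint8_t> q;   // outputs x inputs, row-major
};

// Linear min/max quantization of float weights to 0..255 (assumed scheme).
QuantizedFC quantize(const std::vector<float> &w, std::size_t inputs, std::size_t outputs) {
    assert(w.size() == inputs * outputs && !w.empty());
    const auto [lo, hi] = std::minmax_element(w.begin(), w.end());
    QuantizedFC fc{inputs, outputs, *lo, (*hi - *lo) / 255.0f, {}};
    if (fc.step == 0.0f) fc.step = 1.0f;   // all weights equal: every code is 0
    fc.q.reserve(w.size());
    for (float v : w)
        fc.q.push_back(static_cast<std::uint8_t>(std::lround((v - fc.w_min) / fc.step)));
    return fc;
}

// y[o] = sum_i x[i] * (w_min + step * q[o][i])
//      = w_min * sum_i x[i] + step * sum_i x[i] * q[o][i]
std::vector<float> fc_forward(const QuantizedFC &fc, const std::vector<float> &x) {
    assert(x.size() == fc.inputs);
    float x_sum = 0.0f;
    for (float v : x) x_sum += v;
    std::vector<float> y(fc.outputs);
    for (std::size_t o = 0; o < fc.outputs; ++o) {
        float acc = 0.0f;
        for (std::size_t i = 0; i < fc.inputs; ++i)
            acc += x[i] * float(fc.q[o * fc.inputs + i]);
        y[o] = fc.w_min * x_sum + fc.step * acc;
    }
    return y;
}
```

Compared with float weights, this reads a quarter of the bytes per weight. The cost is a rounding error of at most step/2 per weight.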


Conclusions

• CNNs are an integral part of computer vision applications for semi-autonomous and autonomous cars
• Numerous applications can be addressed with CNNs
• PowerVR GPUs offer up to 12x higher performance for CNN deployment (GPU Compute)
• Convolution performance can be improved by using the frequency domain
• Fully connected layer performance can be improved by using low precision data types
• PowerVR GPUs scale to allow for higher levels of performance & lower power for current and future generations of vision-enabled products

Thank you


Resources

• PowerVR GPU Compute: https://imgtec.com/tools/powervr-gpu-compute/
• Guide to writing OpenCL: http://blog.imgtec.com/powervr/a-quick-guide-to-writing-opencl-kernels-for-rogue
• PowerVR Imaging Framework: http://blog.imgtec.com/powervr/powervr-imaging-framework-sdk
• PowerVR CNN Demo
• OpenCL Tutorial: https://handsonopencl.github.io/
