"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Presentation from CEVA
Transcript of "Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Presentation from CEVA
Copyright © 2016 CEVA 1
Yair Siegel
May 3, 2016
Fast Deployment of Low-power Deep Learning on CEVA Vision Processors
CEVA: the leading licensor of ultra-low-power signal processing IPs for embedded devices
• Imaging & Vision
• Audio, Voice, Sensing
• Connectivity
• Communication
>7 billion CEVA-powered devices shipped worldwide
Scope
• CEVA Deep Neural Network (CDNN) Software Framework
• Accelerates machine learning deployment for embedded systems
• Utilizes CEVA-XM4 imaging & vision DSP
• Targeted at object recognition and vision analytics
• Automatic conversion from offline neural networks to real-time networks
• 30x lower power*
• 15x lower memory bandwidth**
• 30% faster processing*
* Vs. GPU-based systems
** Vs. typical implementation
Presentation Outline
1. Backgrounder
2. CEVA Deep Neural Networks Introduction
3. Neural Networks Development Flow
4. AlexNet Example
5. Summary
CEVA in the Vision Space
Enabling Intelligent Vision Processing
3D vision
• Image signal processor (ISP)*
• Image registration
• Depth map generation
• Point cloud processing
• 3D scanning
• 3D content creation
Computational photography
• Refocus image
• Video stabilization
• Low-light image enhance
• Zoom
• Super-resolution
• Background removal
• HDR
Visual perception
• Deep learning (CNN, DNN)
• Object detection, recognition & tracking
• Augmented reality (AR)
• Natural user interface (NUI)
• Context aware algorithms
• Biometric authentication
[Diagram labels: Left Image, Right Image, Depth Data, Images, Data, Encode*]
* These are most appropriately implemented by external HW accelerators
CEVA-XM4™ Imaging & Vision DSP
• 4th-generation imaging and vision processor IP
• Brings embedded systems closer to human vision and visual perception
• Vector-type processor; combines fixed- and floating-point math; up to 4096-bit processing per cycle
• Includes vision processor, libraries, tools and applications (CEVA, SW partners, service experts)
• Mature: 10+ design wins, silicon available in Q2/2016
• CNN-based algorithms combined w/traditional algorithms
Neural Networks Basics
• Human brain based on neural networks, used for any cognitive processing: visual, audio, other senses
• Networks develop over time, data collected & analyzed
• “Training” phase: learning new types from examples
• “The hunt” to mimic human perception in computers
• Limiters: horsepower, efficient engine, algorithmic quality
• Big progress here recently
[Figure labels: Input Layer, Hidden Layers, Output Layer, Neurons, Connections, Weights]
“...a computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs.”*
* “Neural Network Primer: Part I” by Maureen Caudill, AI Expert, Feb. 1989
High time for neural networks in embedded systems
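As a rough illustration of the input layer, hidden layers, output layer, neurons and weights named on this slide, the following minimal sketch runs one forward pass through a tiny fully connected network. The layer sizes, weight values and the ReLU activation are all assumptions chosen for readability; none of them come from the presentation.

```python
def relu(x):
    """Rectified linear activation: negative sums are clamped to zero."""
    return max(0.0, x)


def identity(x):
    """Pass-through activation for the output layer."""
    return x


def dense(inputs, weights, biases, activation):
    """One layer: each neuron sums its weighted inputs, adds a bias, then activates."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Feed the inputs through every layer in order."""
    for weights, biases, activation in layers:
        inputs = dense(inputs, weights, biases, activation)
    return inputs


# Hypothetical toy network: 3 inputs -> 2 hidden neurons -> 1 output neuron.
HIDDEN_W = [[0.5, -1.0, 0.25], [1.0, 1.0, -0.5]]
HIDDEN_B = [0.0, -1.0]
OUTPUT_W = [[2.0, -1.0]]
OUTPUT_B = [0.5]
LAYERS = [(HIDDEN_W, HIDDEN_B, relu), (OUTPUT_W, OUTPUT_B, identity)]

output = forward([1.0, 0.5, 2.0], LAYERS)
print(output)  # [1.5]
```

Training would adjust the weight and bias values from examples; this sketch only shows inference with fixed values.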
Deep Learning Neural Networks
• Deep Learning
  • Family of neural network methods, high number of layers (hence deep)
• Convolutional Neural Networks (CNN)
  • Most popular deep learning neural network method
• Benefits
  1. Best recognition quality (vs. alternatives)
  2. Re-trainable without code changes (implement once, use many times)
• Caffe: deep learning framework (caffe.berkeleyvision.org)
  • Popular open source software framework, used to build, train, activate neural networks
  • Targets expression, speed, and modularity
[Images: Object Recognition; Driver Assistance (ADAS); Vision Analytics; Artificial Intelligence (AI); Augmented Reality / Virtual Reality]
Neural Network Embedded Challenges
• Computation intensive
  • 1Meg-Ops/layer typical
  • Training in floating point: limited performance in embedded
• High memory bandwidth
  • Between layers, fetching weights for layers
  • Example: AlexNet, 12MB in layers, 243MB weights in FP
• Multi-ROI processing using same network
• Evolving, TTM (time-to-market)
  • Ability to modify network, change characteristics, quickly
All of the above in a cost- and energy-efficient form factor: a must-have for mass market adoption
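The 243MB weight figure quoted for AlexNet can be sanity-checked with simple arithmetic. The sketch below assumes the layer shapes of the Caffe reference AlexNet model (the slide does not list them) and 4 bytes per floating-point parameter. The 16-bit fixed-point width used for comparison is a hypothetical choice, since the slides do not state which word length CDNN uses.

```python
# Assumed layer shapes of the Caffe reference AlexNet (not given on the slide).
# Convolution: (name, output channels, input channels per group, kernel h, kernel w)
CONV_LAYERS = [
    ("conv1", 96, 3, 11, 11),
    ("conv2", 256, 48, 5, 5),    # 2 groups, so 48 of the 96 input channels each
    ("conv3", 384, 256, 3, 3),
    ("conv4", 384, 192, 3, 3),   # 2 groups
    ("conv5", 256, 192, 3, 3),   # 2 groups
]
# Fully connected: (name, outputs, inputs)
FC_LAYERS = [
    ("fc6", 4096, 9216),
    ("fc7", 4096, 4096),
    ("fc8", 1000, 4096),
]


def count_parameters():
    """Return {layer name: weights + biases} for every layer."""
    counts = {}
    for name, out_ch, in_ch, kh, kw in CONV_LAYERS:
        counts[name] = out_ch * in_ch * kh * kw + out_ch
    for name, outputs, inputs in FC_LAYERS:
        counts[name] = outputs * inputs + outputs
    return counts


per_layer = count_parameters()
total_params = sum(per_layer.values())
fp32_bytes = total_params * 4    # 32-bit floating point
fixed16_bytes = total_params * 2  # hypothetical 16-bit fixed point

print(f"parameters:   {total_params:,}")
print(f"FP32 weights: {fp32_bytes / 1e6:.1f} MB")
print(f"16-bit fixed: {fixed16_bytes / 1e6:.1f} MB")
fc_share = sum(per_layer[name] for name, _, _ in FC_LAYERS) / total_params
print(f"share held by fully connected layers: {fc_share:.0%}")
```

Under these assumptions the total comes to roughly 61 million parameters, or just under 244 MB in 32-bit floating point, which is in line with the slide's figure. Most of that sits in the fully connected layers.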
CEVA Deep Neural Network (CDNN) Features
CEVA Network Generator (offline)
• Auto converts for power-efficiency
• Floating to fixed point conversion
• Adapts for embedded constraints
• Keeps high accuracy, 1% deviation
Real-time Neural Network Libraries
• RT algo development and deployment
• Optimized for CEVA-XM4 vision DSP
• Any network portion/layer
• Fixed or variable input sizes
• On-the-fly bandwidth optimizations
CDNN deliverables include real-time example models for image classification, localization, object detection
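The slides do not describe how the CEVA Network Generator performs its floating to fixed point conversion. As a generic illustration of the idea only, the sketch below quantizes a handful of made-up weights to a signed 16-bit Q format and measures the rounding error. The word length, the way the fractional bit count is chosen, and all function names are assumptions.

```python
def choose_frac_bits(values, total_bits=16):
    """Pick the most fractional bits that still leave room for the largest magnitude."""
    max_abs = max(abs(v) for v in values)
    int_bits = 0
    while (1 << int_bits) <= max_abs:  # bits needed for the integer part
        int_bits += 1
    return total_bits - 1 - int_bits   # one bit is reserved for the sign


def quantize(values, frac_bits, total_bits=16):
    """Scale by 2**frac_bits, round to nearest, and saturate to the signed range."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return [min(hi, max(lo, round(v * scale))) for v in values]


def dequantize(codes, frac_bits):
    """Map integer codes back to real values for comparison."""
    scale = 1 << frac_bits
    return [c / scale for c in codes]


# Made-up example weights; real layers hold thousands to millions of values.
weights = [0.75, -0.5, 0.125, -0.3, 0.02]

frac_bits = choose_frac_bits(weights)
codes = quantize(weights, frac_bits)
restored = dequantize(codes, frac_bits)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(f"Q format: {15 - frac_bits}.{frac_bits}")
print(f"codes: {codes}")
print(f"largest rounding error: {max_error:.2e}")
```

Choosing the fractional bit count per layer, rather than once for the whole network, is a common way to keep rounding error small, although whether CDNN does this is not stated in the presentation.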
Simplified Developer Flow via CDNN
• Example application steps to run on device using CDNN
a. Create CDNN CEVA handle
• CDNNCreate()
b. Create network model (based on CDNN conversion tool outputs)
• CDNNCreateNetwork()
c. Initialize CDNN library (create a network and a memory database)
• CDNNInitialize()
d. Execute the network (no need for re-initialization)
• CDNNNetworkClassify()
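The slide names four calls but gives no signatures, so the real CDNN API cannot be reproduced here. The sketch below is a toy stand-in, written in Python, that models only the order of the steps and the point that classification can be repeated without re-initialization. The class, its methods, its arguments and the callable used as a "model" are all hypothetical.

```python
class CdnnFlowSketch:
    """Toy state machine mirroring the a-d steps; not the CDNN API."""

    def __init__(self):
        self.state = "new"
        self.log = []
        self.model = None

    def _advance(self, expected, new_state, call_name):
        # Enforce the step order shown on the slide.
        if self.state != expected:
            raise RuntimeError(f"{call_name} called in state {self.state!r}")
        self.state = new_state
        self.log.append(call_name)

    def create(self):
        """Step a, standing in for CDNNCreate()."""
        self._advance("new", "created", "CDNNCreate")

    def create_network(self, model):
        """Step b, standing in for CDNNCreateNetwork(); model is a plain callable here."""
        self._advance("created", "network", "CDNNCreateNetwork")
        self.model = model

    def initialize(self):
        """Step c, standing in for CDNNInitialize()."""
        self._advance("network", "ready", "CDNNInitialize")

    def classify(self, image):
        """Step d, standing in for CDNNNetworkClassify(); repeatable once ready."""
        if self.state != "ready":
            raise RuntimeError("classify called before initialization")
        self.log.append("CDNNNetworkClassify")
        return self.model(image)


session = CdnnFlowSketch()
session.create()
session.create_network(lambda image: "label-for-" + image)  # placeholder model
session.initialize()
results = [session.classify(frame) for frame in ("frame0", "frame1")]
print(session.log)
print(results)
```

The one-time setup in steps a to c is kept separate from the per-frame call in step d, which is what lets a real-time loop avoid repeated initialization cost.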
Neural Networks on CEVA-XM4
Reducing Bandwidth
• Compress via prior knowledge
• Reduce network redundancies
• E.g., AlexNet fully connected -> 6MB
• Data reused on entry point
Programmability & Time-To-Market
• Flexible solution supporting any network
• Quick turn-around time via port automation
Performance Optimization
• Maximize MAC utilization
• Combine small maps
• Use fixed-point for higher performance
• Utilize dedicated instructions
• Parallel scatter-gather for activation layer
Example CNN: AlexNet
• Example based on Caffe open source implementation for CNN
Classification Probabilities
Object               AlexNet PC probability (floating point)   AlexNet on XM4 probability (fixed point)
Labrador retriever   90.44%                                    91.01%
Golden retriever     4.45%                                     3.98%
Beagle               0.21%                                     0.18%
Kuvasz               0.12%                                     0.10%
Deviation between floating point and fixed point: <1%
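The "<1%" note can be checked directly against the four probability pairs listed on this slide. The short sketch below computes the absolute difference, in percentage points, between the floating-point PC result and the fixed-point XM4 result for each class. The variable names are illustrative; the numbers are the ones shown above.

```python
# (object, PC floating-point probability %, XM4 fixed-point probability %)
RESULTS = [
    ("Labrador retriever", 90.44, 91.01),
    ("Golden retriever", 4.45, 3.98),
    ("Beagle", 0.21, 0.18),
    ("Kuvasz", 0.12, 0.10),
]

# Absolute difference per class, rounded to the two decimals the slide reports.
deviations = {
    name: round(abs(fixed - floating), 2)
    for name, floating, fixed in RESULTS
}
max_deviation = max(deviations.values())

# The top-1 class should be the same in both implementations.
top1_float = max(RESULTS, key=lambda row: row[1])[0]
top1_fixed = max(RESULTS, key=lambda row: row[2])[0]

for name, delta in deviations.items():
    print(f"{name:20s} {delta:.2f} percentage points")
print(f"largest deviation: {max_deviation:.2f} percentage points")
print(f"top-1 agrees: {top1_float == top1_fixed}")
```

The largest gap is 0.57 percentage points, on the top class, and both implementations rank Labrador retriever first.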
CEVA-XM4 CDNN Development Platform
[Figure: XM4 FPGA connected over PCIe to an i.MX6 host running Linux applications]
CEVA-XM4 CDNN Demo
• Live AlexNet object recognition: come visit our booth!
• Enables milli-watt products vs. watts on GPU
[Block diagram labels: Webcam, FHD, FHD to 224x224 Conversion, Input Images, iMX6 (Host), CEVA Host Link, Shared Memory, DMA, DDR, PCIe, XM4 FPGA, CEVA Link, CDNN Engine, Data TCM, Code TCM, Code Cache, JBOX, PC Debugger, USB, Daisy, HDMI]
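The demo diagram includes an "FHD to 224x224 Conversion" stage between the webcam and the network input, but the slides do not say how it is done. One plausible approach, sketched below, is a center crop to a square followed by nearest-neighbor downscaling. The crop-then-resize strategy, the single grayscale channel and the function names are assumptions made to keep the example short.

```python
def center_crop_square(image):
    """Keep the largest centered square of a row-major image (list of rows)."""
    height, width = len(image), len(image[0])
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    return [row[left:left + side] for row in image[top:top + side]]


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbor resize: each output pixel copies one source pixel."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def fhd_to_network_input(frame, size=224):
    """Crop a frame to a square, then scale it to size x size."""
    return resize_nearest(center_crop_square(frame), size, size)


# Synthetic 1920x1080 grayscale frame standing in for a webcam capture.
FHD_WIDTH, FHD_HEIGHT = 1920, 1080
frame = [[(x + y) % 256 for x in range(FHD_WIDTH)] for y in range(FHD_HEIGHT)]

net_input = fhd_to_network_input(frame)
print(len(net_input), len(net_input[0]))  # 224 224
```

A production pipeline would more likely use an averaging or bilinear filter to limit aliasing, and would handle three color channels. Cropping rather than stretching preserves the aspect ratio at the cost of discarding the left and right edges of the frame.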
CEVA Deep Neural Network (CDNN) Summary
• SW framework for real-time, efficient object recognition & vision analytics
• Accelerates deep learning application deployment
• Harnessing CEVA-XM4 imaging & vision DSP
• Lowest power & memory bandwidth solution
• Enables real-time classification with pre-trained networks
1. Receives network model & weights as input (via “Caffe”)
2. Automatically converts to real-time network, via CEVA Network Generator
3. Utilizes real-time network models in CNN applications on CEVA-XM4
CEVA: the leading licensor of ultra-low-power signal processing IPs for embedded devices
More than 300 licensees to date
>7 billion CEVA-powered devices shipped worldwide to date
100 licensees of Wi-Fi & Bluetooth IP, and more than 1 billion chips shipped
3X the market share in DSP over any other DSP IP vendor
1 in 3 handsets worldwide are powered by CEVA DSP
5 billion DSP cores in audio/voice devices shipped to date
>20 licensees for imaging and vision, shipping for the first time in 2016
CEVA-XM4 Imaging & Vision IP Platform
• Face Detection & Recognition
• Universal Object Recognition
• Pedestrian Detection
• ADAS Algorithms (FCW, LDW)
• 3D Depth Map Creation
• Digital Video Stabilizer (DVS)
• Super-Resolution (SR)
[Platform diagram labels: Hardware Layer; Software Layer; CEVA-XM4 DSP Core; Hardware Development Kit; SW Toolset; RTOS; CPU-DSP Link – Communication Layer; Host CV / OpenVX API; App Dev. Kit (ADK); Auto system handle; CEVA-CV Libraries; CEVA CNN Framework (CDNN); Android Framework (AMF); CEVA Software Products; Partner Software Products; Provides OEM differentiation; CPU offload; Source code provided]