"The Evolution of Object Recognition in Embedded Systems," a Presentation from CEVA

Transcript of "The Evolution of Object Recognition in Embedded Systems," a Presentation from CEVA

Page 1: "The Evolution of Object Recognition in Embedded Systems," a Presentation from CEVA

Copyright © 2015 CEVA 1

Moshe Shahar

12 May 2015

The Evolution of Object Recognition in Embedded Systems

Page 2

Timeline of CEVA’s Feature Extraction and Classification Algorithms

[Figure: cycle count (scale) versus time, 2012 to 2016, for SIFT, SURF, HARRIS, GFTT, ORB, LBP, HOG, CNN and 3D objects (KD-Tree), spanning the CEVA-MM3101 and CEVA-XM4 processors]

Page 3

SIFT—Scale Invariant Feature Transform

• Find peak responses over scale in the Laplacian pyramid
• Find responses with sub-pixel accuracy
• Keep only “corner-like” responses
• Assign orientation
• Create recognition signature
• (Patent protected)

Performance estimates were 30-50 MCycles per VGA frame

[Figure: Gaussian and Difference-of-Gaussian images across scales, first octave and next octave]
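The first step above, finding peak responses over scale, can be sketched as a 3x3x3 extremum test on a difference-of-Gaussian (DoG) stack. This is a minimal pure-Python sketch, assuming the Gaussian-blurred layers already exist as equal-sized 2D lists; the function names are hypothetical, and sub-pixel refinement, the corner-likeness filter and orientation assignment are omitted.

```python
from typing import List

Layer = List[List[float]]


def difference_of_gaussians(gaussians: List[Layer]) -> List[Layer]:
    """Subtract each blurred layer from the next, more blurred one."""
    return [
        [[b - a for a, b in zip(row_a, row_b)] for row_a, row_b in zip(ga, gb)]
        for ga, gb in zip(gaussians, gaussians[1:])
    ]


def is_dog_extremum(dog: List[Layer], s: int, y: int, x: int) -> bool:
    """True if dog[s][y][x] is strictly above (or strictly below) all 26
    neighbours in the 3x3x3 block spanning scales s-1..s+1.

    Assumes 1 <= s <= len(dog) - 2 and (y, x) is at least 1 px from the border.
    """
    centre = dog[s][y][x]
    neighbours = [
        dog[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (ds, dy, dx) != (0, 0, 0)
    ]
    return all(centre > n for n in neighbours) or all(centre < n for n in neighbours)
```

Only points that pass this cheap neighbourhood test go on to the more expensive refinement and descriptor steps.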

Page 4

SURF—Speeded Up Robust Features

• SIFT is very accurate but very complex to implement on a programmable platform
• SURF offers a fast implementation and fast descriptor matching
• Invariant to various image transforms, such as rotation and illumination changes
• Still widely used
• Partial patent protection

[Figure: SURF flow. Detector: image → integral image → response map calculation → 3x3x3 maxima + interpolation → detected interest points. Sort: sort according to levels and regions → sorted interest points. Descriptor: integral sums → build 64-sum descriptor → descriptor DB]
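Both the detector and the descriptor in this flow rely on the integral image, which turns any rectangular box sum into four lookups regardless of the box size. A minimal pure-Python sketch with hypothetical function names; the Hessian response map and the 64-element descriptor built on top of it are not shown.

```python
from typing import List


def integral_image(img: List[List[int]]) -> List[List[int]]:
    """ii[y][x] = sum of img over rows < y and columns < x (extra zero row/column)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii: List[List[int]], y0: int, x0: int, y1: int, x1: int) -> int:
    """Sum of img rows y0..y1-1 and columns x0..x1-1 from four lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
```

Because a box sum costs the same four lookups at any size, box-filter responses can be evaluated at larger scales by enlarging the filter instead of downsampling the image.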

Page 5

HOG—Histogram of Oriented Gradients

[Figure: HOG flow: input image → bilinear scaling to scaled images (scale 1 through scale 9) → gamma normalization → gradient calculation → descriptor calculation]

The HOG algorithm is based on the Dalal & Triggs paper (2005)

Its common use is object detection, especially pedestrian detection

Reference code: OpenCV 2.4.3
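The gradient and descriptor stages can be illustrated for a single cell: compute centred-difference gradients, then vote each pixel's gradient magnitude into an unsigned orientation histogram. This is a simplified sketch, assuming 9 bins over 0 to 180 degrees (the Dalal & Triggs default) and nearest-bin voting; the OpenCV reference code also interpolates between bins and normalizes over blocks, which is omitted here. The function name is hypothetical.

```python
import math
from typing import List


def cell_histogram(cell: List[List[float]], bins: int = 9) -> List[float]:
    """Unsigned-orientation histogram of gradients for one cell.

    Gradients use the centred [-1, 0, 1] kernel; border pixels are skipped.
    """
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    h, w = len(cell), len(cell[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # fold to [0, 180)
            hist[int(angle // bin_width) % bins] += magnitude  # nearest-bin vote
    return hist
```

A vertical edge, for example, puts all of its energy into the 0-degree bin, since its gradient points horizontally.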

Page 6

HOG—Bilinear Scaling

1. Load 2 vectors in a single cycle
2. Perform multiple operations in a single cycle
3. Store a transposed 4x4-pixel rectangle in a single cycle
4. Perform the load and filter again
5. Store the transposed 4x4 block to memory in a single cycle

[Figure: memory → vector registers vA, vB, vC, vD → filter → transpose → memory]
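The load, filter, transpose, filter-again sequence matches the usual separable formulation of bilinear scaling: interpolate along rows, transpose so that columns become rows, and run the same row filter again. The sketch below shows that data flow on 2D lists, assuming the slide uses this separable scheme; the vector processor's 4x4 register transposes are modelled by a plain whole-image transpose, and the end-point-aligned coordinate mapping is an illustrative choice, not necessarily the one OpenCV uses. Function names are hypothetical.

```python
from typing import List

Image = List[List[float]]


def scale_rows(img: Image, out_w: int) -> Image:
    """Linearly interpolate every row to out_w samples (end points aligned)."""
    in_w = len(img[0])
    out = []
    for row in img:
        new_row = []
        for i in range(out_w):
            pos = i * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(pos)
            x1 = min(x0 + 1, in_w - 1)
            frac = pos - x0
            new_row.append(row[x0] * (1.0 - frac) + row[x1] * frac)
        out.append(new_row)
    return out


def transpose(img: Image) -> Image:
    return [list(col) for col in zip(*img)]


def bilinear_resize(img: Image, out_h: int, out_w: int) -> Image:
    """Horizontal pass, transpose, same pass again (now vertical), transpose back."""
    tmp = transpose(scale_rows(img, out_w))
    return transpose(scale_rows(tmp, out_h))
```

Reusing one row filter for both directions is what makes the transpose worthwhile: only a single, contiguous-access kernel has to be optimized.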

Page 7

HOG—Gamma Normalization

• Implemented using a look-up table (LUT): N-way parallel access to local memory in one cycle
• Parallel load mechanism: load N gamma values in a cycle

[Figure: memory map]
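Gamma normalization through a LUT replaces a per-pixel power function with a table lookup, which is what allows N lookups per cycle on the vector unit. A minimal sketch for 8-bit input, assuming square-root gamma compression (one of the options Dalal & Triggs evaluated) and a 0 to 255 output range; the function names are hypothetical, and the list comprehension stands in for the N-way parallel gather.

```python
from typing import List


def build_gamma_lut(gamma: float = 0.5) -> List[int]:
    """256-entry table mapping an 8-bit value v to round(255 * (v / 255) ** gamma)."""
    return [round(255 * (v / 255) ** gamma) for v in range(256)]


def apply_lut(pixels: List[int], lut: List[int]) -> List[int]:
    """Replace every pixel with its table entry (one gather per pixel)."""
    return [lut[p] for p in pixels]
```

The table is computed once; the per-pixel cost is then a single memory access, independent of the gamma curve chosen.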

Page 8

ORB—Oriented FAST and Rotated BRIEF

• An efficient alternative to SIFT (and patent free)
• A pyramid is used for scale invariance
• Features are detected using FAST9, Harris and non-maximum suppression
• Descriptors are based on BRIEF with normalized orientation

ORB—Feature Extraction

[Figure: ORB feature extraction flow over an image pyramid: input image → FAST9 → Harris → non-max suppression → oriented BRIEF → descriptors list]
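The descriptor stage can be sketched as a BRIEF-style binary test: compare the intensities of fixed pixel pairs in a patch around the keypoint and pack the results into bits; matching is then a Hamming distance (XOR plus popcount). This is a simplified, non-oriented sketch: the random sampling pattern, 32-bit length and patch radius are illustrative assumptions (ORB itself uses 256 learned pairs rotated by the keypoint orientation), and all names are hypothetical.

```python
import random
from typing import List, Tuple

Pair = Tuple[Tuple[int, int], Tuple[int, int]]


def make_pattern(n_bits: int = 32, radius: int = 7, seed: int = 0) -> List[Pair]:
    """Fixed random pixel-pair offsets (dy, dx) inside a square patch."""
    rng = random.Random(seed)

    def offset() -> Tuple[int, int]:
        return (rng.randint(-radius, radius), rng.randint(-radius, radius))

    return [(offset(), offset()) for _ in range(n_bits)]


def brief_descriptor(img: List[List[int]], y: int, x: int, pattern: List[Pair]) -> int:
    """Bit i is 1 when the first pixel of pair i is darker than the second."""
    desc = 0
    for i, ((dy1, dx1), (dy2, dx2)) in enumerate(pattern):
        if img[y + dy1][x + dx1] < img[y + dy2][x + dx2]:
            desc |= 1 << i
    return desc


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two descriptors."""
    return bin(a ^ b).count("1")
```

Binary descriptors compared by Hamming distance are much cheaper to match than floating-point SIFT or SURF vectors, which is a large part of ORB's efficiency.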

Page 9

ORB—FAST9 Implementation

A pixel p is a corner if a contiguous arc of 9 or more pixels on the circle around it is either:
• all much brighter than (p + Th), or
• all much darker than (p - Th)
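The segment test above can be written directly: label each of the 16 pixels on a radius-3 Bresenham circle as brighter, darker or similar, then look for a run of at least 9 equal, non-similar labels, allowing the run to wrap around. A plain sketch with a hypothetical function name; it leaves out the early-exit and vectorization tricks used in the implementation and the corner score needed for non-maximum suppression.

```python
from typing import List

# (dx, dy) offsets of the 16-pixel circle of radius 3, clockwise from the top.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def is_fast9_corner(img: List[List[int]], y: int, x: int, th: int, arc: int = 9) -> bool:
    """True if `arc` or more contiguous circle pixels are all brighter than p + th
    or all darker than p - th. (y, x) must be at least 3 px from the border."""
    p = img[y][x]
    labels = []
    for dx, dy in CIRCLE:
        v = img[y + dy][x + dx]
        labels.append(1 if v > p + th else (-1 if v < p - th else 0))
    run, prev = 0, 0
    for lab in labels + labels:  # doubled list catches runs that wrap around
        if lab != 0 and lab == prev:
            run += 1
        else:
            run = 1 if lab != 0 else 0
        prev = lab
        if run >= arc:
            return True
    return False
```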

Page 10

ORB—FAST9 Implementation

• Early exit is used to detect potential positions
• Long 32-byte memory accesses are used to quickly load consecutive pixels
• Vector compares test the center pixel of the candidate corner against the border pixels
• A binary (bit) map is built with the positions that need to be calculated
• N positions are calculated in parallel
• Different two-dimensional loads are used
• Vector predicates are used to selectively calculate only the locations that pass the threshold
• N-way parallel lookup is used
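The early-exit and bit-map steps can be sketched as a cheap pre-test over one row. Any contiguous arc of 9 on the 16-pixel circle must contain at least two of the four compass pixels (top, right, bottom, left), so a position where fewer than two of them are brighter, and fewer than two are darker, can be rejected before the full segment test. The surviving positions are packed into an integer bitmap, standing in for the vector-compare and predicate hardware; the function names and row-at-a-time layout are assumptions, not the CEVA implementation.

```python
from typing import List

# (dx, dy) of the circle pixels at 12, 3, 6 and 9 o'clock (radius 3).
COMPASS = [(0, -3), (3, 0), (0, 3), (-3, 0)]


def candidate_bitmap(img: List[List[int]], y: int, th: int) -> int:
    """Bit x is set when pixel (y, x) survives the compass pre-test.

    Only x in [3, width - 3) is tested; row y must be >= 3 px from top and bottom.
    """
    bitmap = 0
    for x in range(3, len(img[0]) - 3):
        p = img[y][x]
        brighter = sum(img[y + dy][x + dx] > p + th for dx, dy in COMPASS)
        darker = sum(img[y + dy][x + dx] < p - th for dx, dy in COMPASS)
        if brighter >= 2 or darker >= 2:
            bitmap |= 1 << x
    return bitmap


def set_positions(bitmap: int) -> List[int]:
    """Column indices that still need the full segment test."""
    return [x for x in range(bitmap.bit_length()) if bitmap >> x & 1]
```

On typical images most pixels fail the pre-test, so the expensive 16-pixel test runs only on the few set bits.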

Page 11

Vision Processor Evolution

• More MACs per cycle
• Complex multi-operation instructions
• Wider bandwidth to local memory (while conserving power)
• More orthogonal memory accesses
• ISA support for conditional execution per vector element
• More operations per loaded bit
• Higher performance per mW
• Improved local data reuse

[Figure: diagram labeled “Computing power”, “Multi Scalar” and “Improve efficiency”]

Page 12

2-Dimensional Data Processing Capability

• The ability to process 2D data efficiently is critical for many algorithms, especially convolutions
• The idea is to take advantage of pixel overlap in image processing by reusing the same data to produce multiple outputs
• Significantly increases processing capability
• Saves external memory bandwidth and frees system buses for other tasks
• Reduces power consumption

[Figure: input image with overlapping, reused data]

For 16 MACs with 512-bit bandwidth, only 176 bits are actually loaded
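The 512-bit versus 176-bit figure can be reproduced by simple counting. The slide does not state the configuration, so the sketch assumes one consistent with its numbers: 16 MACs arranged as 4 outputs of a 4-tap filter along one row, with 16-bit pixels and 16-bit coefficients. Without reuse every MAC loads its own pixel and coefficient; with reuse the 4 outputs share a sliding window of 4 + 4 - 1 = 7 pixels plus the 4 coefficients. Function names are hypothetical.

```python
def bits_without_reuse(outputs: int, taps: int, bits: int = 16) -> int:
    """Every MAC fetches its own pixel and its own coefficient."""
    return outputs * taps * 2 * bits


def bits_with_reuse(outputs: int, taps: int, bits: int = 16) -> int:
    """Overlapping windows share pixels: outputs + taps - 1 pixels, plus taps coefficients."""
    return ((outputs + taps - 1) + taps) * bits


print(bits_without_reuse(4, 4), bits_with_reuse(4, 4))  # 512 176
```

Other splits give the same totals, for example 8 outputs of a 2-tap filter: (8 + 2 - 1 + 2) x 16 = 176 bits, so the configuration assumed above is only one possibility.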

Page 13

Vector Accelerated Deep Neural Network

• Convolutional neural network (CNN)
• A deep learning neural network algorithm
• Used for classification, localization and detection of objects in images
• CNN value:
  1. Best recognition quality
  2. Re-trainable to any object without code change
• CNN combines 2D convolutions, 2D max and 1D MAC operations
• Good match for vector DSPs

[Figure: input image → NxM convolution → subsampling → NxM convolution → subsampling → fully connected]

Page 14

CNN—2D Convolution Layer

• The algorithm can exploit overlapping convolutions for efficient processing
• One or several filters execute in parallel on the same input, which is ideal for a 2-dimensional data processing capability

[Figure: convolution layer]
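Running several filters on the same input lets each loaded input window be reused by every filter. A minimal pure-Python sketch of a convolution layer with "valid" borders and stride 1, assuming a single input channel; as in most CNN frameworks, it computes cross-correlation (no kernel flip). The function name is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def conv_layer(image: Matrix, kernels: List[Matrix]) -> List[Matrix]:
    """Apply every NxM kernel to the same image; returns one feature map per kernel."""
    n, m = len(kernels[0]), len(kernels[0][0])
    out_h, out_w = len(image) - n + 1, len(image[0]) - m + 1
    maps = [[[0.0] * out_w for _ in range(out_h)] for _ in kernels]
    for y in range(out_h):
        for x in range(out_w):
            # Load the input window once...
            window = [row[x:x + m] for row in image[y:y + n]]
            # ...and reuse it for every filter.
            for k, kernel in enumerate(kernels):
                maps[k][y][x] = sum(
                    w * v
                    for krow, wrow in zip(kernel, window)
                    for w, v in zip(krow, wrow)
                )
    return maps
```

Adjacent windows also overlap in all but one column, which is the second form of reuse the 2D data processing capability targets.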

Page 15

CNN—Pooling Layer

• Subsampling stage: a max filter finds the strongest response in each MxN patch of the previous layer, reducing the resolution along each axis
• Example processor capability: calculate an MxN max filter using a 3-input vector max (16 elements per vector, 16 bits per element)

[Figure: pooling layer over an m x n patch]
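Max pooling over non-overlapping patches can be sketched as a vertical reduction across the patch rows followed by a horizontal reduction across each group of columns. The stride equal to the patch size and the dropping of incomplete edge patches are assumptions. The hypothetical helper `vmax3` models the 3-input vector max as an element-wise max of three equal-length lists; with it, a 3-row band reduces in one call, which on the processor would cover 16 columns at a time.

```python
from typing import List

Matrix = List[List[int]]


def vmax3(a: List[int], b: List[int], c: List[int]) -> List[int]:
    """Element-wise maximum of three vectors (models a 3-input vector max)."""
    return [max(x, y, z) for x, y, z in zip(a, b, c)]


def max_pool(fmap: Matrix, m: int, n: int) -> Matrix:
    """Max over non-overlapping m-row by n-column patches (stride = patch size)."""
    out = []
    for y in range(0, len(fmap) - m + 1, m):
        rows = fmap[y:y + m]
        # Vertical reduction, folding in up to two new rows per vmax3 call.
        col_max, rest = rows[0], rows[1:]
        for i in range(0, len(rest), 2):
            r1 = rest[i]
            r2 = rest[i + 1] if i + 1 < len(rest) else r1
            col_max = vmax3(col_max, r1, r2)
        # Horizontal reduction over each group of n columns.
        out.append([max(col_max[x:x + n]) for x in range(0, len(col_max) - n + 1, n)])
    return out
```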

Page 16

CNN—Fully Connected Layer

• Involves many multiplications by different weights, accumulated into a single result
• Requires high accumulation precision and a large number of MAC operations
• Ideal for a vector processor

[Figure: fully connected layer (m, n)]
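The precision requirement can be made concrete with a fixed-point dot product: 16-bit inputs and weights produce 32-bit products, and summing many of them needs a wider accumulator before the result is rounded and saturated back to 16 bits. The sketch assumes Q15 weights and a 40-bit accumulator (a common DSP width, used here only for illustration); since Python integers never overflow, the accumulator bound is checked explicitly. All names are hypothetical.

```python
from typing import List

INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1
ACC_BITS = 40  # assumed accumulator width


def saturate16(v: int) -> int:
    return max(INT16_MIN, min(INT16_MAX, v))


def fc_layer(x: List[int], weights: List[List[int]], bias: List[int],
             frac_bits: int = 15) -> List[int]:
    """One output per weight row: wide accumulate, then round, shift and saturate to int16."""
    out = []
    for w_row, b in zip(weights, bias):
        acc = b << frac_bits  # align the bias with the Q15 products
        for xi, wi in zip(x, w_row):
            acc += xi * wi  # 16x16 -> 32-bit product, accumulated in a wide register
        if not -(1 << (ACC_BITS - 1)) <= acc < (1 << (ACC_BITS - 1)):
            raise OverflowError("accumulator overflow")
        rounded = (acc + (1 << (frac_bits - 1))) >> frac_bits  # round half up
        out.append(saturate16(rounded))
    return out
```

With four inputs of 30000 and weights close to 1.0, the sum already reaches about 3.9e9, beyond a 32-bit signed accumulator, which is why the extra guard bits are needed.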

Page 17

What Do We See as the Next Trends?

• CNN—the new king of the block
• Will dominate object recognition once real-time performance is possible
• Allows a lot of algorithmic freedom within the implementation
• Ideal for programmable or accelerator+processor solutions
• 3D is becoming more widely used
• 3D object detection, classification and recognition will evolve rapidly
• No clear winner; each vendor has its own flow
• The dominant database structure seems to be the KD-tree, which is very serial in nature
• Rapid developments, new innovations coming soon…
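The remark that KD-tree search is very serial shows up clearly in a minimal nearest-neighbour search: each descent step depends on a comparison at the current node, and whether to backtrack into the far subtree depends on the best distance found so far, so the work does not map naturally onto wide vector lanes. A compact pure-Python sketch for k-dimensional points with squared Euclidean distance; it illustrates the data structure only, not any vendor's 3D flow, and all names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


class Node:
    def __init__(self, point: Point, axis: int,
                 left: "Optional[Node]", right: "Optional[Node]") -> None:
        self.point, self.axis, self.left, self.right = point, axis, left, right


def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Split on the median along axis depth % k."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis, build(pts[:mid], depth + 1), build(pts[mid + 1:], depth + 1))


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node: Optional[Node], query: Point, best: Optional[Point] = None) -> Optional[Point]:
    """Descend toward the query, then backtrack only if the far side could hold a closer point."""
    if node is None:
        return best
    if best is None or sq_dist(query, node.point) < sq_dist(query, best):
        best = node.point
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    if diff * diff < sq_dist(query, best):  # data-dependent: known only at run time
        best = nearest(far, query, best)
    return best
```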

Page 18

Timeline of CEVA’s Feature Extraction and Classification Algorithms

[Figure: cycle count (scale) versus time, 2012 to 2016, for HOG, CNN, SURF and ORB, annotated with the enabling processor features: parallel access, data reuse, fast filters and 64-bit]

Page 19

Introducing CEVA-XM4™

• 4th-generation imaging and vision processor IP
• Vector-type processor; combines fixed- and floating-point math; up to 4096-bit processing per cycle
• Platform includes vision processor, libraries, tools and applications

Page 20

Come see us at our booth for real-time demos...