Transcript of Tutorial on AI, Part 2 (v1) - Leti Innovation Days, 24th June 2019
| 1Leti Innovation Days | Tutorial on AI | 24th June 2019
• 13:00 Registration & welcome coffee
• 13:30 Part 1 (2h): Know your AI - a panorama on AI and Deep Learning
• 15:30 Coffee break (30')
• 16:00 Part 2 (1h30): Run your AI - software and hardware platforms for Deep Learning
• 17:30 End
OUTLOOK
| 2Leti Innovation Days | Tutorial on AI | 24th June 2019
• Deep learning methodology
  • Problem definition
  • Data preparation and collection
  • Learn
    • Popular software frameworks
    • Troubleshooting
  • Debug
• Hardware platforms
  • Challenges
  • Various needs
    • Training vs Inference
    • Cloud vs Edge
  • Trends
    • The Bio-inspiration
• Edge AI activities @ LETI
• The Future of AI & conclusion
OUTLINE
| 3Leti Innovation Days | Tutorial on AI | 24th June 2019
• Before talking about DL frameworks, a few generalities…
• Deep learning requires a change of mindset
SOFTWARE FRAMEWORKS
You have to embrace uncertainty
• Models are built from data
  • E.g. you do not know in advance which features will be extracted from an image to recognize a dog
• You will not write a rigid set of instructions
• Code is usually much shorter
You have to experiment
• Your models will get good enough through trial and error
• Maybe you will need to enrich your dataset, or change the optimizer, the topology…
| 4Leti Innovation Days | Tutorial on AI | 24th June 2019
• Since you are an experimenter, you need to set up a scientific methodology
RECOMMENDED METHODOLOGY
Define the problem
Make assumptions
Collect data
Train the model
Verify the model
• Define the problem: "I want to sort out bananas from other fruits"
• Make assumptions: "I only need to focus on the yellow color"
• Collect data: build a database containing only the color of fruits
• Verify: "Oops, it wrongly classifies lemons" → not good enough
• New assumption: "I need to add shape information" → complement the database
• Verify again: "Oops, it misses green bananas"
• Finally: "I need to give full labeled images"
| 5Leti Innovation Days | Tutorial on AI | 24th June 2019
• What application do you target?
  • Classification, e.g. it is a dog, a cat → Supervised learning
  • Clustering, e.g. cucumbers on one side, tomatoes on the other → Unsupervised learning
  • Regression, e.g. predict the amount of pollen on a sunny day → Supervised learning
• What data do you need?
  • Supervised → you need labeled data
  • Unsupervised → you need correctly defined data to ensure correct clustering
  • Reinforcement → you need a virtual world and an agent (or a real agent, which is even more challenging)
FIRST STEP – DEFINE A DEEP LEARNING PROBLEM
(Flow: Deep Learning problem definition → … → DL project completion)
| 6Leti Innovation Days | Tutorial on AI | 24th June 2019
• Deep learning methodology
  • Problem definition
  • Data preparation and collection
  • Learn
    • Popular software frameworks
    • Troubleshooting
  • Debug
• Hardware platforms
• Edge AI activities @ LETI
• The Future of AI & conclusion
OUTLINE
| 7Leti Innovation Days | Tutorial on AI | 24th June 2019
• Data must be collected and prepared
  • This is the foundation for trusted DL → this takes time
  • It represents more than 50% of a ML project
• It is important to ensure that data is
  • Clean: remove outliers, biased data, exceptions
  • Consistent: format data consistently
  • Accurate: feature engineering might help, e.g. adding the day of the week for analyzing sales trends
SECOND STEP – COLLECT AND PREPARE DATA
(Flow: Deep Learning problem definition → Data collection & preparation → … → DL project completion)
Deep learning is all about transforming raw data into information
| 8Leti Innovation Days | Tutorial on AI | 24th June 2019
• Remove outliers
  • Data that lies at an abnormal distance from the other values
• Sample correctly
  • The training data distribution must reflect the actual environment
  • E.g. you build a database for a self-driving car
• Remove bias
  • Otherwise, your model will learn it
CLEAN DATA
(Figure: scatter plot with one outlier highlighted)
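A minimal sketch of the outlier-removal step, assuming a pandas DataFrame with a hypothetical "weight" column and a simple 3-sigma rule (the slides do not prescribe a specific method):

  import pandas as pd

  df = pd.read_csv("fruits.csv")  # hypothetical input file
  mean, std = df["weight"].mean(), df["weight"].std()
  # Keep only the rows lying within 3 standard deviations of the mean
  df_clean = df[(df["weight"] - mean).abs() <= 3 * std]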
| 9Leti Innovation Days | Tutorial on AI | 24th June 2019
• Inconsistent data issues arise when you are aggregating data from different sources
  • Dates
  • Addresses
  • Prices: $5.01, 5$ 1cent, 5.01
  • Images: 32*32 pixels, 128*128 pixels
CONSISTENT DATA
| 10Leti Innovation Days | Tutorial on AI | 24th June 2019
• Raw inaccurate data is useless
  • Deep learning will not learn meaningful feature extractors from it
• You might need to do feature engineering (a code sketch follows below)
  • Example 1: you want to analyze sales trends
    • 2015/05/22, 2015/05/23… might not be accurate enough
    • Add Monday, Tuesday…
  • Example 2: you want to classify seismic activity
    • Raw acceleration might not be accurate enough
    • Add an FFT transformation
ACCURATE DATA
S. Zarnani, “Numerical parametric study of expanded polystyrene geofoam seismic buffers”, Canadian Geotechnical Journal, 2009
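A hedged sketch of both feature-engineering examples above; the column names and the acceleration signal are made up for illustration:

  import numpy as np
  import pandas as pd

  # Example 1: derive the day of the week from raw dates
  sales = pd.DataFrame({"date": ["2015/05/22", "2015/05/23"], "amount": [120, 95]})
  sales["date"] = pd.to_datetime(sales["date"])
  sales["day_of_week"] = sales["date"].dt.day_name()  # Friday, Saturday…

  # Example 2: add a frequency-domain view of a raw acceleration signal
  acceleration = np.random.randn(1024)          # placeholder signal
  spectrum = np.abs(np.fft.rfft(acceleration))  # FFT magnitude features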
| 11Leti Innovation Days | Tutorial on AI | 24th June 2019
• How much data is enough data?
  • For good generalization, you need hundreds of thousands of examples
  • With less data, consider using non-ML methods first (e.g. PCA)
• Data augmentation can help in expanding the dataset (a code sketch follows below)
  • Basic operations: rescaling, flipping, normalization, affine transforms, filtering, DFT…
  • Advanced operations: elastic distortion, random slice/label extraction, morphological reconstructions…
SECOND STEP – COLLECT AND PREPARE DATA
(Flow: Deep Learning problem definition → Data collection & preparation → …)
© https://i.stack.imgur.com/haBYt.png
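A minimal data-augmentation sketch with Keras' ImageDataGenerator, covering a few of the basic operations listed above (x_train and y_train are an assumed, already prepared image dataset):

  from tensorflow.keras.preprocessing.image import ImageDataGenerator

  datagen = ImageDataGenerator(
      rescale=1.0 / 255,       # rescaling
      horizontal_flip=True,    # flipping
      rotation_range=15,       # affine: random rotations
      width_shift_range=0.1,   # affine: random translations
      zoom_range=0.1,          # affine: random zooms
  )
  # Each epoch then sees randomly transformed variants of the images:
  # model.fit(datagen.flow(x_train, y_train, batch_size=32), epochs=10)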
| 12Leti Innovation Days | Tutorial on AI | 24th June 2019
• Natural images
DATASETS SOURCES
MNIST, CIFAR10/100, Caltech101/256, SVHN, ImageNet (the de-facto standard), Coil100, NORB, LSUN, Pascal VOC, MS COCO, LabelMe, Google's Open Images
| 13Leti Innovation Days | Tutorial on AI | 24th June 2019
• There is nowadays a large variety of labelled datasets!
• Facial recognition
  • Faces: UMD Faces (367K images, >8K people), CASIA WebFace (453K images, >10K people), MS-Celeb-1M (1M images of celebrities), IndianFaceDatabase, Multi-Pie, Labelled Faces in the Wild, FERET
  • Emotions: JACFEE (Japanese and Caucasian Facial Expressions of Emotion), Face-in-Action, MMI Facial Expression Database
• Recommendation
  • Movies: Netflix Prize (100M ratings, >17K movies, ~500K people), Movielens
  • Music: Million Song Dataset (1M songs, 1M people), last.fm (>90K songs, ~2K people)
  • Reading: Book-Crossing dataset (>1M ratings, ~300K books, ~300K people)
  • Purchase: Amazon Co-Purchasing (>500K products)
• Speech
• Health
• Government data
• Question answering…
DATASETS SOURCES
| 14Leti Innovation Days | Tutorial on AI | 24th June 2019
• Need a particular dataset? Use a dataset search engine!
• For example, www.kaggle.com
  • > 17K datasets
DATASET SOURCES
| 15Leti Innovation Days | Tutorial on AI | 24th June 2019
• The original dataset is decomposed into several subsets (a code sketch follows below)
HOW IS THE DATASET USED?
(Figure: the original dataset is first split into a training dataset and a test dataset; the training dataset is further split into training and validation datasets. The deep learning algorithm trains the model on the training dataset, the validation dataset is used for validation during training, and the test dataset is kept for the final validation.)
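A minimal sketch of such a decomposition with scikit-learn (x and y are assumed arrays of examples and labels; the 60/20/20 proportions are illustrative):

  from sklearn.model_selection import train_test_split

  # First hold back the test dataset for the final validation
  x_tmp, x_test, y_tmp, y_test = train_test_split(x, y, test_size=0.2)
  # Then carve a validation dataset out of the remaining data
  x_train, x_val, y_train, y_val = train_test_split(x_tmp, y_tmp, test_size=0.25)
  # Result: 60% training, 20% validation, 20% final test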
| 16Leti Innovation Days | Tutorial on AI | 24th June 2019
• In supervised learning, it is the data that matters the most!
TAKE HOME MESSAGE
| 17Leti Innovation Days | Tutorial on AI | 24th June 2019
• Deep learning methodology
  • Problem definition
  • Data preparation and collection
  • Learn
    • Popular software frameworks
    • Troubleshooting
  • Debug
• Hardware platforms
• Edge AI activities @ LETI
• The Future of AI & conclusion
OUTLINE
| 18Leti Innovation Days | Tutorial on AI | 24th June 2019
• Use the learning strategy that
  • Fits your DL problem to be solved
  • Is consistent with the data you have
THIRD STEP – LEARN
(Flow: Deep Learning problem definition → Data collection & preparation → Learn → DL project completion)

| Learning strategy | DL problem | Outcome | Real-life example |
| Supervised | Classification | Labeled data | Image: a dog, a cat… Sound: a car, a truck… LIDAR: a pedestrian, a tree |
| Supervised | Regression | Numerical values | % of people liking a movie |
| Unsupervised | Clustering | Groups of similar data | Face clustering of your photos |
| 19Leti Innovation Days | Tutorial on AI | 24th June 2019
• There are many easy-to-use, open-source Deep Learning frameworks
  • To simplify the implementation of large-scale deep learning models
  • In a short period of time
• A Deep Learning framework is
  • A software tool
  • With an interface
  • And a library of pre-built components
• Popular frameworks are
  • TensorFlow
  • Keras
  • PyTorch
  • Caffe
  • Deeplearning4j
THIRD STEP – LEARN
| 20Leti Innovation Days | Tutorial on AI | 24th June 2019
• How to choose one among the others?
• You should look for the following aspects, depending on who you are
  • Beginner
    • Good community support
    • Available tutorials
    • Pre-written examples
  • Professional
    • Compact code, easy to understand and maintain
    • Optimized for performance
  • Expert
    • Distributed DL solution, for scalable production code
    • Multiple language support
• Other topics to consider
  • Open source
THIRD STEP – LEARN
| 21Leti Innovation Days | Tutorial on AI | 24th June 2019
By far, the most commonly used
• Open source, with excellent community support
• Originates from the Google Brain Team
• Extensive documentation
• Pre-written code for CNNs, RNNs…
• Supports multiple languages for creating DL models
  • Python, C++, R
• Important APIs
  • Dataset API: streamlines the pre-processing, batching and consumption of data
  • Keras API: high-level API, eases model definition
TENSORFLOW
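A minimal sketch of the Dataset API mentioned above (images and labels are assumed in-memory arrays, and preprocess is a hypothetical per-example function):

  import tensorflow as tf

  # Streamline the pre-processing, batching and consumption of data
  dataset = tf.data.Dataset.from_tensor_slices((images, labels))
  dataset = dataset.shuffle(10_000).map(preprocess).batch(32).prefetch(1)
  # The batches can then be fed directly to a model built with the Keras API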
| 22Leti Innovation Days | Tutorial on AI | 24th June 2019
The most flexible
• Rapidly developing DL framework
• Originates from Facebook
• Tensor computation
• Python-based
• Flexibility
  • Computational graphs can be built on the go
  • And even changed during runtime
  • Useful when you do not know beforehand the amount of memory needed
PYTORCH
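A small sketch of the "graphs built on the go" idea: in PyTorch the graph follows ordinary Python control flow, so it can differ at every forward pass (the network itself is a made-up example):

  import torch
  import torch.nn as nn

  class DynamicNet(nn.Module):
      def __init__(self):
          super().__init__()
          self.fc = nn.Linear(16, 16)

      def forward(self, x):
          # The number of layer applications is decided at runtime,
          # so the computational graph changes from call to call
          for _ in range(torch.randint(1, 4, (1,)).item()):
              x = torch.relu(self.fc(x))
          return x

  y = DynamicNet()(torch.randn(8, 16))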
| 23Leti Innovation Days | Tutorial on AI | 24th June 2019
The fastest on images
• A popular framework, initially geared towards image processing
  • Originates from UC Berkeley
  • Open source
• Consequently, support for RNNs is not as great
• Supports multiple languages
  • C, C++, Python
  • As well as a MATLAB interface
• Primarily used for building and deploying deep learning models for mobile phones
  • And other computationally constrained platforms
CAFFE
| 24Leti Innovation Days | Tutorial on AI | 24th June 2019
Implemented in Java
• Faster and more energy-efficient than Python
  • Supports both CPUs and GPUs
  • Can process a huge amount of data without sacrificing speed
• Java library for deep learning
  • ND4J library, for tensor computations
• Takes advantage of distributed frameworks for big data processing
  • Spark, Hadoop
• Primarily used by Java developers
DEEPLEARNING4J
| 25Leti Innovation Days | Tutorial on AI | 24th June 2019
High-level API
• For those who do not want to dig deep into DL frameworks
  • Enables fast experimentation
  • Without focusing on low-level library details
• Supports CNN and RNN topologies
  • VGG, InceptionV3, MobileNet
• Runs on top of
  • TensorFlow (the Keras API is natively integrated in TensorFlow 2.0)
  • Theano
  • CNTK
KERAS
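A minimal illustration of the "fast experimentation" promise: a small CNN defined and compiled in a few lines (the topology is an arbitrary example):

  from tensorflow import keras

  model = keras.Sequential([
      keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
      keras.layers.MaxPooling2D(2),
      keras.layers.Flatten(),
      keras.layers.Dense(10, activation="softmax"),
  ])
  model.compile(optimizer="adam", loss="categorical_crossentropy",
                metrics=["accuracy"])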
| 26Leti Innovation Days | Tutorial on AI | 24th June 2019
DNN shrinking, exploration and porting
• Embedding low-power DNNs remains challenging
  • DNN topologies must be adapted and simplified
    • Reduce layer complexity (number of operations)
    • Reduce precision (8-bit integer or less)
• The N2D2 framework automates
  • DNN shrinking exploration and evaluation
  • Performance projection
  • And porting onto embedded platforms
• Various hardware targets
  • Even spiking accelerators
N2D2
| 27Leti Innovation Days | Tutorial on AI | 24th June 2019
• Educated advice if you are a complete beginner
  • Start with Keras
  • Then move on to TensorFlow
COMPARISON OF THESE DEEP LEARNING FRAMEWORKS

| Framework | Language | CUDA support | Pre-trained models | Open source | Why choose it? |
| TensorFlow | Python, C++ | Yes | Yes | Yes | Most used; various languages |
| PyTorch | Python, C | Yes | Yes | Yes | Most flexible; dynamic graphs |
| Caffe | C++ | Yes | Yes | Yes | Image processing; constrained platforms |
| Deeplearning4j | Java, C++ | Yes | Yes | Yes | Java programming; speed |
| Keras | Python | Yes | Yes | Yes | High-level API |
| N2D2 | C++ | Yes | Yes | Yes | Embedded targets |
| 28Leti Innovation Days | Tutorial on AI | 24th June 2019
• A new initiative for interchangeable DL models
  • Introduced in 2017 by Microsoft and Facebook
  • Enables models to be trained in one framework and transferred to another for inference
• Supported frameworks, with converters to/from many others (figure: framework logos)
ONNX – OPEN NEURAL NETWORK EXCHANGE
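A minimal sketch of the train-in-one-framework, infer-in-another flow, here exporting an assumed trained PyTorch model to ONNX:

  import torch

  # model is an assumed trained torch.nn.Module; the dummy input
  # fixes the input shape of the exported graph
  dummy = torch.randn(1, 3, 224, 224)
  torch.onnx.export(model, dummy, "model.onnx")
  # The .onnx file can then be loaded by another framework or runtime
  # for inference, e.g. onnxruntime.InferenceSession("model.onnx")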
| 29Leti Innovation Days | Tutorial on AI | 24th June 2019
• You must choose a Deep Learning framework depending on your needs
• Knowing that models are more and more interchangeable
TAKE HOME MESSAGE
| 30Leti Innovation Days | Tutorial on AI | 24th June 2019
• Deep learning methodology
  • Problem definition
  • Data preparation and collection
  • Learn
    • Popular software frameworks
    • Troubleshooting
  • Debug
• Hardware platforms
• Edge AI activities @ LETI
• The Future of AI & conclusion
OUTLINE
| 31Leti Innovation Days | Tutorial on AI | 24th June 2019
• Visualize Deep Network models and metrics
  • Use pre-built packages like TensorBoard (a code sketch follows below)
TROUBLESHOOTING – WHAT TO LOOK FOR DURING LEARNING?
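A minimal sketch of hooking TensorBoard into a Keras training run (model and data are assumed to exist):

  import tensorflow as tf

  # Logs metrics plus weight/bias histograms for visualization
  tb = tf.keras.callbacks.TensorBoard(log_dir="logs", histogram_freq=1)
  # model.fit(x_train, y_train, validation_split=0.1, callbacks=[tb])
  # then inspect with: tensorboard --logdir logs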
| 32Leti Innovation Days | Tutorial on AI | 24th June 2019
• Plot the Cost function to tune the learning rate
TROUBLESHOOTING – WHAT TO LOOK FOR DURING LEARNING?
(Figure: cost vs. iterations for different learning rates: a very high rate diverges, a high rate plateaus early, a low rate decreases slowly, a good rate converges quickly to a low cost)

weight_new = weight_old - learning_rate × gradient
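A toy numerical illustration of this update rule, minimizing cost(w) = (w - 3)² whose gradient is 2(w - 3):

  def train(learning_rate, steps=20):
      w = 0.0
      for _ in range(steps):
          w -= learning_rate * 2 * (w - 3)  # the weight update rule above
      return w

  print(train(0.01))  # too low: converges slowly towards 3
  print(train(0.1))   # good: quickly approaches 3
  print(train(1.1))   # too high: diverges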
| 33Leti Innovation Days | Tutorial on AI | 24th June 2019
• Plot the Accuracy to tune regularization
• A large gap between validation and training accuracies
  • Signals overfitting of the model
  • To reduce overfitting, increase regularization
TROUBLESHOOTING – WHAT TO LOOK FOR DURING LEARNING?
(Figure: accuracy vs. iterations for the training dataset, a validation dataset with good generalization, and a validation dataset that overfits)

Regularization techniques: Dropout, L1 regularization, …
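A minimal sketch of both regularizations in Keras (layer sizes and coefficients are arbitrary examples):

  from tensorflow import keras
  from tensorflow.keras import layers, regularizers

  model = keras.Sequential([
      layers.Dense(128, activation="relu", input_shape=(784,),
                   kernel_regularizer=regularizers.l1(1e-4)),  # L1 penalty
      layers.Dropout(0.5),  # randomly silence half the units while training
      layers.Dense(10, activation="softmax"),
  ])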
| 34Leti Innovation Days | Tutorial on AI | 24th June 2019
• Monitor Accuracy and Cross-entropy
  • Accuracy: is my model performing well?
  • Cross-entropy: how close is my model to the ground-truth classes?
TROUBLESHOOTING – WHAT TO LOOK FOR DURING LEARNING?
(Example: for a ground truth of Cat=1, Dog=0, Bird=0, an early output of Cat=0.61, Dog=0.33, Bird=0.06 has a higher cross-entropy than a later output of Cat=0.94, Dog=0.05, Bird=0.01)
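A small sketch reproducing the example above: both outputs pick the right class (same accuracy), but the later one has a much lower cross-entropy:

  import numpy as np

  def cross_entropy(predicted, target):
      return -np.sum(target * np.log(predicted))

  target = np.array([1.0, 0.0, 0.0])    # ground truth: cat
  early = np.array([0.61, 0.33, 0.06])  # early in training
  late = np.array([0.94, 0.05, 0.01])   # later in training

  print(cross_entropy(early, target))   # ~0.49
  print(cross_entropy(late, target))    # ~0.06, closer to ground truth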
| 35Leti Innovation Days | Tutorial on AI | 24th June 2019
• Monitor Weights and Biases
  • Weights: a normal weight distribution is a good sign that the training is going well
  • Biases: large (positive or negative) biases are abnormal
TROUBLESHOOTING – WHAT TO LOOK FOR DURING LEARNING?
(Figure: weight and bias histograms; with a large bias, the neuron input would not matter)
| 36Leti Innovation Days | Tutorial on AI | 24th June 2019
• Monitor Pre-activations, Activations and Gradients
  • Pre-activations: must be normally distributed, otherwise apply a normalization
  • Activations: monitor zero activations (i.e. dead nodes)
  • Gradients: monitor layer gradients and track gradient evolution from the output layers to the input layers, to prevent vanishing or exploding gradient problems
TROUBLESHOOTING – WHAT TO LOOK FOR DURING LEARNING?
(Figure: pre-activation and gradient distributions)
| 37Leti Innovation Days | Tutorial on AI | 24th June 2019
• You must also monitor
  • Memory footprint
  • Latency
  • Applicative performance
• Example
  • The DeepManta solution for autonomous driving
  • Needs to run in real time on the DrivePX2 platform
TROUBLESHOOTING – WHAT TO LOOK FOR DURING LEARNING?
Initial implementation: 8 FPS; final performance: 25 FPS on the NVIDIA DrivePX2, i.e. 3.12× faster with the same level of recognition
| 38Leti Innovation Days | Tutorial on AI | 24th June 2019
• The more you observe what is going on in your model, the better the training will be
• And the final application performance
TAKE HOME MESSAGE
| 39Leti Innovation Days | Tutorial on AI | 24th June 2019
• Models will make strange mistakes that are difficult to debug
  • Due to anything from skewed training data
  • To unexpected interpretations of data during training
• Furthermore, production models
  • Will interact with other pieces of software
  • Or will be used in never-before-seen situations
FOURTH STEP – DEBUG
Google only solved this issue by removing the Gorilla category altogether!
(Flow: Deep Learning problem definition → Data collection & preparation → Learn → Debug → DL project completion)
| 40Leti Innovation Days | Tutorial on AI | 24th June 2019
• Use smaller filter sizes
  • 3x3 and 5x5 filters usually perform best
• Add layers
  • Deeper networks lead to more complex models
• Add skip connections to tackle the vanishing gradient problem
  • As in the ResNet topology
IMPROVING A DEEP NETWORK
The intent is to overfit the model, since we can later correct that with regularization
| 41Leti Innovation Days | Tutorial on AI | 24th June 2019
• Learning rate decay schedule
  • Instead of a fixed learning rate, use a regularly decaying learning rate
  • E.g. a 0.95 decay rate every 100,000 iterations
• Momentum
  • A high momentum will prevent weights from oscillating
• Early stopping
  • Prevent overfitting by stopping learning when the validation loss keeps increasing
ADVANCED TUNING
(Figures: learning rate vs. iterations under a decay schedule; accuracy vs. iterations on the training and test sets with the early stopping point marked)
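A minimal Keras sketch of the three techniques above, using the decay figures from the example (model and data are assumed):

  import tensorflow as tf

  # 0.95 decay rate every 100,000 iterations
  schedule = tf.keras.optimizers.schedules.ExponentialDecay(
      initial_learning_rate=0.01, decay_steps=100_000, decay_rate=0.95)
  optimizer = tf.keras.optimizers.SGD(learning_rate=schedule,
                                      momentum=0.9)  # damp weight oscillations

  # Stop when the validation loss keeps increasing
  early_stop = tf.keras.callbacks.EarlyStopping(
      monitor="val_loss", patience=5, restore_best_weights=True)
  # model.compile(optimizer=optimizer, ...) then
  # model.fit(..., callbacks=[early_stop])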
| 42Leti Innovation Days | Tutorial on AI | 24th June 2019
• Model ensembles are very common in some DL competitions
  • Since they are effective in pushing the accuracy up a few percentage points
• Model ensembles
  • Combine predictions from multiple models
  • Reducing the variance of predictions and the generalization error
• This is a kind of model averaging
  • It works because different models will usually not make all the same errors on the test set
ENSEMBLE LEARNING

Law of Large Numbers: the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed
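A minimal model-averaging sketch (models is an assumed list of already trained models exposing a predict method over the same classes):

  import numpy as np

  def ensemble_predict(models, x):
      # Average the class probabilities across models; uncorrelated
      # errors tend to cancel out, reducing prediction variance
      predictions = np.stack([m.predict(x) for m in models])
      return predictions.mean(axis=0)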
| 43Leti Innovation Days | Tutorial on AI | 24th June 2019
• It is common practice to start from an existing neural network
  • Given the difficulty and time needed to train a neural network from scratch
• What has been learned in one context
  • Is exploited to improve generalization in another setting
• Especially used for Image and Natural Language Processing
TRANSFER LEARNING
| 44Leti Innovation Days | Tutorial on AI | 24th June 2019
• Features to be transferred need to be general enough
  • To be suitable for the target tasks
  • E.g., for image classification, start from CNN features pre-trained on ImageNet
• Freeze some layers
  • CNN features are more generic in early layers
  • And more original-dataset-specific in later layers
• Benefits
  • Faster learning
  • Potentially better end result
TRANSFER LEARNING
(Figure: early layers frozen, middle layers fine-tuned, last layers learned from scratch)
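A minimal transfer-learning sketch in Keras, starting from CNN features pre-trained on ImageNet and freezing them (the 2-class head is an arbitrary example):

  from tensorflow import keras

  base = keras.applications.MobileNet(weights="imagenet", include_top=False,
                                      pooling="avg",
                                      input_shape=(224, 224, 3))
  base.trainable = False  # freeze the generic early layers

  model = keras.Sequential([
      base,
      keras.layers.Dense(2, activation="softmax"),  # new task-specific head
  ])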
| 45Leti Innovation Days | Tutorial on AI | 24th June 2019
• There are many fine-tuning options and methods
• This is a field of research that is evolving every week, so you need to keep yourself up to date
TAKE HOME MESSAGE
| 46Leti Innovation Days | Tutorial on AI | 24th June 2019
• Deep learning methodology
• Hardware platforms
  • Challenges
  • Various needs
    • Training vs Inference
    • Cloud vs Edge
  • Trends
    • The Bio-inspiration
• Edge AI activities @ LETI
• The Future of AI & conclusion
OUTLINE
| 47Leti Innovation Days | Tutorial on AI | 24th June 2019
• Challenges
  • Many parameters to be stored
  • Many operations to be performed
  • And this is not even considering training!
DNN TOPOLOGIES AND COMPUTATION DEMANDS
A. Canziani et al., "An Analysis of Deep Neural Network Models for Practical Applications", 2017
(Figure: benchmark of CNN models on the ImageNet database (2 million labeled objects), showing the tradeoff between accuracy, speed and complexity; the size of the bubbles is proportional to the number of parameters; the best models combine high accuracy with low complexity)
| 48Leti Innovation Days | Tutorial on AI | 24th June 2019
IT PROJECTED TO CHALLENGE FUTURE ELECTRICITY SUPPLY
(Figure: forecast of exponential growth in total consumer power consumption)
Anders S.G. Andrae, “Total Consumer Power Consumption Forecast”, 2017
| 49Leti Innovation Days | Tutorial on AI | 24th June 2019
EVOLUTION TOWARDS EMBEDDED INTELLIGENCE
(Figure: evolution in three stages. 1) Simple sensor + cloud computing: raw data goes up to the cloud, commands come back; learning and inference both run in the cloud. 2) Multi-sensor with data fusion and pre-processing: pre-processed data and configuration are exchanged with the cloud; inference moves into the device (Edge AI, embedded intelligence) while learning stays in the cloud. 3) Cognitive cyber-physical systems: multi-sensor and behavior sensors cooperate and share knowledge through federated learning (Federated Open Personal Intelligence); learning and inference are both embedded.)
| 50Leti Innovation Days | Tutorial on AI | 24th June 2019
SOLVING THE ENERGY CHALLENGE: COST OF MOVING DATA
(Figure: the energy cost of fetching data from off-chip memory is ×800 higher than that of the compute operation itself)
Bill Dally, “To ExaScale and Beyond”, 2010
| 51Leti Innovation Days | Tutorial on AI | 24th June 2019
• Focus on low-power and low-area digital computations
  • Limit the number of weights as much as possible
    • Weight pruning and quantization
  • Avoid FP representation and FP operations
• Integrate memory cuts close to the processing elements
  • Store the most-accessed parameters on-chip
• Ultimately, do the computations in the memory
  • In-memory computing paradigm
  • Possible with non-volatile memories
  • In a digital or analog manner
SOLUTIONS FOR INCREASING ENERGY EFFICIENCY
| 52Leti Innovation Days | Tutorial on AI | 24th June 2019
• GPUs and CPUs currently lead in market share
• But ASICs will capture the lead in 2022
• With opportunities for SoC accelerators
NEED FOR SPECIALIZED HARDWARE
| 53Leti Innovation Days | Tutorial on AI | 24th June 2019
• Dedicated ASICs will fuel the growth of Deep Learning applications
TAKE HOME MESSAGE
| 54Leti Innovation Days | Tutorial on AI | 24th June 2019
• Deep learning methodology
• Hardware platforms
  • Challenges
  • Various needs
    • Training vs Inference
    • Cloud vs Edge
  • Trends
    • The Bio-inspiration
• Edge AI activities @ LETI
• The Future of AI & conclusion
OUTLINE
| 55Leti Innovation Days | Tutorial on AI | 24th June 2019
TRAINING VERSUS INFERENCE – DIFFERENT USAGES
(Figure: 2x2 matrix of usages: training and inference, each in the cloud/HPC and at the edge/embedded)
| 56Leti Innovation Days | Tutorial on AI | 24th June 2019
TRAINING VERSUS INFERENCE – DIFFERENT NEEDS
Cloud/HPC training:
• High throughput (TFLOPS)
• Large memory bandwidth
• High precision (32b/64b FP)
• High configurability (any layer)
• Distributed

Cloud/HPC inference:
• High throughput (TFLOPS)
• Low latency
• Energy efficiency
• Distributed

Edge/embedded training:
• High energy efficiency (100 GFLOPS/W)
• Large memory bandwidth
• Moderate precision (16b FP)
• High configurability (any layer)

Edge/embedded inference:
• Very high energy efficiency (5-10 TOPS/W)
• Short latency (batch size of 1)
• Reduced throughput
• Low cost (as low as $5)
• The above tradeoffs depend on the application: ADAS, delivery drones, wearables
| 57Leti Innovation Days | Tutorial on AI | 24th June 2019
TRAINING VERSUS INFERENCE – DIFFERENT HARDWARE TARGETS
Cloud/HPC training: mostly GPUs, some FPGAs, some ASICs (TPUs)
Cloud/HPC inference: CPUs, GPUs, FPGAs, ASICs (TPUs)
Edge/embedded training: small-scale GPUs
Edge/embedded inference: small-scale GPUs, ASICs, CPUs
(Pictured: NVIDIA V100, Intel Xeon, NVIDIA Jetson Nano, Google Edge TPU, Intel Compute Stick, Raspberry Pi)
| 58Leti Innovation Days | Tutorial on AI | 24th June 2019
• Deep learning methodology
• Hardware platforms
  • Challenges
  • Various needs
    • Training vs Inference
    • Cloud vs Edge
  • Trends
    • The Bio-inspiration
• Edge AI activities @ LETI
• The Future of AI & conclusion
OUTLINE
| 59Leti Innovation Days | Tutorial on AI | 24th June 2019
TRENDS IN CLOUD COMPUTING
• The NVIDIA V100 GPU and the Google Cloud TPU are the benchmarks for commercial AI chips in the cloud

Increased parallelism for higher throughput:
• FPGAs
  • Especially useful for low-batch-size inference
• More tensor cores
  • NVIDIA V100: 640 tensor cores, 120 TFLOPS
  • Google TPUv2: 180 TFLOPS

Increased storage requirements:
• High-bandwidth memory
  • NVIDIA V100: 900 GB/s bandwidth
  • Google TPUv2: 600 GB/s bandwidth
• Large capacity
  • NVIDIA V100: 16 GB HBM2
  • Google TPUv2: 16 GB HBM
| 60Leti Innovation Days | Tutorial on AI | 24th June 2019
• Example: evolution of the TPU
  • It started with an inference-only accelerator
TRENDS IN CLOUD COMPUTING
“Different versions of TPU compared”, Teich 2018
| 61Leti Innovation Days | Tutorial on AI | 24th June 2019
TRENDS IN EDGE COMPUTING
Increased computing efficiency:
• Weight quantization (a code sketch follows below)
  • Reduced bit accuracy
  • Smaller memory footprint
  • Lighter operations
  • Currently used for inference-only tasks
  • Can lead to a 4x memory reduction ("Quantization in TensorFlow", Google Cloud Blog, 2019)
• Variable bit precision
  • Handling higher bit accuracy when needed
  • For higher inference precision
• Sparsity
  • Clock-gating MAC operators when a weight or intermediate result is 0
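A minimal sketch of post-training weight quantization with the TensorFlow Lite converter (model is an assumed trained Keras model):

  import tensorflow as tf

  converter = tf.lite.TFLiteConverter.from_keras_model(model)
  converter.optimizations = [tf.lite.Optimize.DEFAULT]  # quantize the weights
  tflite_model = converter.convert()
  # The resulting file is roughly 4x smaller than the float32 model
  open("model_quant.tflite", "wb").write(tflite_model)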
| 62Leti Innovation Days | Tutorial on AI | 24th June 2019
TRENDS IN EDGE COMPUTING
Increased computing efficiency:
• Weight quantization
  • Reduced bit accuracy
  • Smaller memory footprint
  • Lighter operations
• Variable bit precision
  • Handling higher bit accuracy when needed
  • For higher inference precision
• Sparsity
  • Skip MAC operations when a weight or intermediate result is 0

Increased storage efficiency:
• Near-memory computing
  • Avoid external memory accesses (200x more energy consuming)
  • Weights: embedded non-volatile memory
  • Intermediate results: SRAM or embedded DRAM
  • Currently used for inference-only tasks
• In-memory computing
  • SRAM or embedded NVM
  • Digital or analog
| 63Leti Innovation Days | Tutorial on AI | 24th June 2019
• CPU
  • Quick prototyping that requires maximum flexibility
  • Small models with small effective batch sizes
  • Models that use many custom layers or operations written in C++
  • Models that are dominated by the networking bandwidth of the host system
• GPU
  • Medium-to-large models with larger effective batch sizes
  • Workloads that require high-precision arithmetic, e.g. double precision
• TPU
  • Exclusively with TensorFlow, and without custom operations
  • Very large models, with very large effective batch sizes, that train for weeks
• FPGA
  • When low latency on small batch sizes is needed
  • Efficiently exploiting sparsity, e.g. DeepCompress
WHEN TO USE GPU/TPU/CPU/FPGA?
| 64Leti Innovation Days | Tutorial on AI | 24th June 2019
• Each hardware target has its own PROS/CONS
• Choose one (or several) depending on your needs
TAKE HOME MESSAGE
| 65Leti Innovation Days | Tutorial on AI | 24th June 2019
• Deep learning methodology
• Hardware platforms
  • Challenges
  • Various needs
    • Training vs Inference
    • Cloud vs Edge
  • Trends
    • The Bio-inspiration
• Edge AI activities @ LETI
• The Future of AI & conclusion
OUTLINE
| 66Leti Innovation Days | Tutorial on AI | 24th June 2019
BIO-INSPIRED NEURAL NETWORKS
• Network
  • A set of neurons
  • Interconnected through synapses
  • Connected in 3D
• Neuron
  • Compute element → integration of inputs
  • 1k-10k inputs
  • 1 output only, but with very high fan-out
• Synapse
  • Memory element → modulation of inputs
  • Synapses define the function of the network
→ Low frequency (1-10 Hz) usage but huge connectivity
(Figure: action potential = spike)
| 67Leti Innovation Days | Tutorial on AI | 24th June 2019
WHAT IS THE DIFFERENCE BETWEEN CLASSICAL CODING NNS AND BIOLOGY?
• Classical coding is an abstraction from biology
  • The spike train is converted into a value representing its mean frequency
• Neuron
  • MAC operation (multiplication-accumulation)
  • Non-linear activation function (sigmoid, ReLU…)
• Synapse
  • Weight stored in DRAM
• The brain works very differently
  • Computation is analog: the neuron soma is a synaptic current integrator
  • Communication is digital: spikes are unary events, very robust to noise
  • Compute and memory cells are co-located
| 68Leti Innovation Days | Tutorial on AI | 24th June 2019
• The promises of spike-coding NNs:
  • Reduced computing complexity, and natural temporal and spatial parallelism
  • Simple and efficient performance tunability
  • Spiking NNs best exploit NVMs such as RRAM, for massively parallel synaptic memory
SPIKE CODING FOR DEEP NETWORKS
(Figure: the three-step flow. 1) A standard CNN topology with offline learning: a 24x24-pixel cropped digit input feeds a convolutional layer of 16 4x4 kernels (16 maps of 11x11 neurons), followed by a convolutional layer of 90 5x5 kernels (24 maps of 4x4 neurons). 2) Lossless spike transcoding: pixel brightness is rate-coded into spike frequencies between fMIN and fMAX, and the correct output emerges over time. 3) Performance vs. computing time tunability (approximated computing): test error rate and spikes per connection both vary with the decision threshold.)

Formal neurons vs. spiking neurons:
| | Formal neurons | Spiking neurons |
| Base operation | Multiply-Accumulate (MAC) | Accumulate only |
| Activation function | Non-linear function | Simple threshold |
| Parallelism | Spatial multiplexing | Spatial and temporal multiplexing |
| 69Leti Innovation Days | Tutorial on AI | 24th June 2019
• The most well-known
  • Kind of a benchmark
• Fully digital implementation
  • Time-multiplexed
• Scalable architecture with 4,096 cores
  • 1M neurons, 256M synapses
• Each neurosynaptic core has
  • 256 neurons with 256 inputs
  • Implemented as a 256x256 binary crossbar
• Memory and computation are intertwined
  • No memory bottleneck
  • Energy efficient (70 mW)
• Demonstrated on
  • Audio and image classification
  • Hand gesture recognition with an event-based camera
IBM TRUENORTH
A. Andreopoulos, “Visual saliency on networks of neurosynaptic cores”, IBM, 2015
“Deep learning inference possible in embedded systems thanks to TrueNorth”, IBM Blog, 2016
| 70Leti Innovation Days | Tutorial on AI | 24th June 2019
• The most versatile
  • Newer than TrueNorth
• Fully digital implementation
  • Time-multiplexed
• Scalable architecture with 128 cores
  • 130K neurons, 130M synapses
• Online learning
  • Spikes are asynchronous
  • Different time-based learning rules
• Demonstrated on
  • Keyword spotting, Natural Language Processing
  • Through the Intel Neuromorphic Research Community (50 research groups)
INTEL LOIHI
M. Davies, “Loihi: A Neuromorphic Manycore Processor with On-Chip Learning”, IEEE Micro, 2018
“Kapoho Bay USB stick”, Intel Newsroom, 2018
| 71Leti Innovation Days | Tutorial on AI | 24th June 2019
UNIVERSITY OF ZURICH DYNAPSEL
• The most biomimetic
  • True analog neurons and synapses
• Mixed-signal implementation
• Scalable architecture with 5 cores
  • 4 non-plastic: 256 neurons, 16K synapses
  • 1 plastic: 64 neurons, 8K plastic synapses
• Fully integrated SNN
  • Parallel processing of spikes
• Online learning
  • Spikes are asynchronous
  • STDP learning rule
• Demonstrated on
  • Heartbeat anomaly detection (~15 nW for 60 beats per minute)
G. Indiveri, “A mixed-signal multi-core spiking chip for models of cortical computation”, NeuRAM3 project , 2018
“Heartbeat anomaly detection”, NeuRAM3 project , 2018
| 72Leti Innovation Days | Tutorial on AI | 24th June 2019
| | IBM TrueNorth | Intel Loihi | Zurich DynapSEL |
| Technology | 28nm CMOS | 14nm CMOS | 28nm FDSOI |
| Supply voltage | 0.7-1.05 V | 0.5-1.25 V | 0.73-1 V |
| Design type | Digital | Digital | Mixed-signal |
| Number of neurons | 1000K | 130K | 1K |
| Neurons per core | 256 | max. 1K | 256 |
| Core area | 0.094 mm² | 0.4 mm² | 0.36 mm² |
| Computation | Time multiplexing | Time multiplexing | Parallel processing |
| Fan in/out | 256/256 | 16/4K | 2K/8K |
| On-line learning | No | Programmable rules | STDP |
| Synaptic operations/s/W | 46 G | - | 300 G |
| Energy per synaptic op. | 26 pJ | 23.6 pJ | 2 pJ |
BENCHMARK
• Lessons learnt
  • Parallel processing increases energy efficiency
  • A large fan in/out is a must for enabling dense layers
| 73Leti Innovation Days | Tutorial on AI | 24th June 2019
• Neuromorphic circuits are seen by some as the future of AI chips
• Although there is no killer application yet
TAKE HOME MESSAGE
| 74Leti Innovation Days | Tutorial on AI | 24th June 2019
• Deep learning methodology
• Hardware platforms
  • Challenges
  • Various needs
    • Training vs Inference
    • Cloud vs Edge
  • Trends
    • The Bio-inspiration
• Edge AI activities @ LETI
• The Future of AI & conclusion
OUTLINE
| 75Leti Innovation Days | Tutorial on AI | 24th June 2019
N2D2: DNN DESIGN ENVIRONMENT
• A unique platform for the design and exploration of DNN applications
• Available on GitHub
(Figure: learning & test databases feed data conditioning; modeling, learning, test and optimization then produce a trained DNN. Considered criteria: accuracy (approximate computing…), memory need, computational complexity. Code generation and execution target COTS hardware (many-core CPUs such as MPPA, ASMP, ARM…; GPUs; FPGAs) through SW DNN libraries (OpenCL, OpenMP, CuDNN, CUDA, TensorRT; PNeuro, ASMP), and HW accelerators (PNeuro, DNeuro) through HW DNN libraries (DNeuro, C/HLS).)
| 76Leti Innovation Days | Tutorial on AI | 24th June 2019
N2D2: FAST AND ACCURATE DNN EXPLORATION
  ; Environment
  [env]
  SizeX=8
  SizeY=8
  ConfigSection=env.config

  [env.config]
  ImageScale=0

  ; First layer (convolutional)
  [conv1]
  Input=env
  Type=Conv
  KernelWidth=3
  KernelHeight=3
  NbChannels=32
  Stride=1

  ; Second layer (pooling)
  [pool1]
  Input=conv1
  Type=Pool
  PoolWidth=2
  PoolHeight=2
  NbChannels=32
  Stride=2

  ; Third layer (fully connected)
  [fc1]
  Input=pool1
  Type=Fc
  NbOutputs=100

  ; Output layer (fully connected)
  [fc2]
  Input=fc1
  Type=Fc
  NbOutputs=10
(Figure: the four-step workflow: 1) deep network builder (the INI description above), 2) learning on a database, 3) analysis of network performance (recognition rate during learning and test, output categories and localization), 4) CPU, GPU and FPGA-based real-time implementation (parallel CPU via OpenMP, GPU via OpenCL/CUDA, FPGA via HLS))
→ Wide range of targets, with performance and power metrics
| 77Leti Innovation Days | Tutorial on AI | 24th June 2019
• L-IOT platform
  • Ultra-low-power BUT always-responsive
  • Advanced wake-up mechanisms
• On-demand subsystem
  • 32-bit RISC-V processor
  • DNN accelerator
    • 2 clusters of 4 neurocores
    • Optimized MAC operators
• Always-responsive subsystem
  • Asynchronous wake-up controller
  • Wake-up radio
IOT PLATFORM WITH DNN ACCELERATOR
| 78Leti Innovation Days | Tutorial on AI | 24th June 2019
BRAIN VS. COMPUTER: A ×10⁶ POWER DISCREPANCY
• Biological system computations are
  • 3 to 6 orders of magnitude more energy efficient than current dedicated silicon systems
• Brain-inspired computing might just be the key!
• The human brain is
  • Massively parallel: 86B neurons and 10⁴ times more synapses
  • Doing processing using memory elements
  • Event-driven, with spike-based induced activity (no system clock)
  • Self-learning, self-organizing
• Embedded brain-inspired solutions need
  • High-density storage, close to the neurons (computational storage)
  • A time-code will be a must
  • Scalability, re-configurability
  • Online learning to come
| 79Leti Innovation Days | Tutorial on AI | 24th June 2019
N2D2 – BIO-INSPIRED MODELS EXPLORATION
• Tool flow for network-level simulation of bio-inspired synapses, neurons and learning rules
(Figure: the N2D2 neuromorphic simulator combines input stimuli from a 128x128 CMOS retina (16,384 spiking pixels), a network topology (two layers with lateral inhibition), a learning rule (STDP conductance change ΔW vs. Δt = t_post - t_pre, fitted to the experimental LTP/LTD data of Bi & Poo), a neuron model (e.g. leaky integrate & fire, whose integration decays exponentially since the last spike) and a synaptic model (conductance vs. pulse number). Outputs include neuron activity, membrane potential traces and the learned synaptic weights.)
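A toy sketch of the leaky integrate & fire neuron model used in such simulations (the time step, leak time constant and threshold are arbitrary illustrative values):

  import numpy as np

  def lif(input_current, dt=1e-3, tau=20e-3, v_threshold=1.0):
      v, spike_times = 0.0, []
      for step, i_in in enumerate(input_current):
          v += dt * (-v / tau + i_in)        # leaky integration of the input
          if v >= v_threshold:
              spike_times.append(step * dt)  # emit a spike…
              v = 0.0                        # …and reset the membrane
      return spike_times

  spikes = lif(np.full(1000, 60.0))  # regular spiking for a constant input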
| 80Leti Innovation Days | Tutorial on AI | 24th June 2019
MEMORY: A UNIQUE VALUE PROPOSITION
200/300 MM INTEGRATION (© Guilly/cea, © Jayet/cea)
DEFINITION OF TECHNOLOGY SPECIFICATIONS
MODULE DEVELOPMENT
TEST & CHARACTERIZATION
DESIGN ENABLEMENT
MODELING, SIMULATION & NANO-CHARACTERIZATION
Large variety of materials available: HfAlxOy, SiOx, TaOx, ZrOx, AlOx, VOx, GeSbTe, GeAsSbTe
Large variety of memories available: Conductive Bridge RAM, Oxide Resistive RAM, Ferro-electric RAM, Phase-Change Memory, pSTT-Magnetic RAM
| 81Leti Innovation Days | Tutorial on AI | 24th June 2019
• Via collaborations
  • MAD Shuttle
RRAM BENCHMARK FOR TRADEOFF UNDERSTANDING
Published at VLSI 2018, IMW 2018, EDL 2018 and IRPS 2018 → towards circuit implementation
| 82Leti Innovation Days | Tutorial on AI | 24th June 2019
• SNN with
  • OxRAM synapses (1T-1R)
  • Analog neurons
• MNIST application
  • Proof of concept
• Topology
  • Fully connected
  • 10 neurons (10 output classes)
  • 1,440 synapses (11.5k OxRAMs)
• Technology
  • Bulk 130nm
SPIKING NEURAL NETWORK ACCELERATOR
(Chip photo: the analog neurons and the OxRAM array)
| 83Leti Innovation Days | Tutorial on AI | 24th June 2019
SPIKING NEURAL NETWORK ACCELERATOR – THE DEMONSTRATOR
• Live demonstration of handwritten digit classification
  • Wednesday afternoon
  • Thursday all day
| 84Leti Innovation Days | Tutorial on AI | 24th June 2019
• LETI has skills ranging from Technology …
• Through Circuits …
• To Applications
TAKE HOME MESSAGE
| 85Leti Innovation Days | Tutorial on AI | 24th June 2019
• Deep learning methodology
• Hardware platforms
  • Challenges
  • Various needs
    • Training vs Inference
    • Cloud vs Edge
  • Trends
    • The Bio-inspiration
• Edge AI activities @ LETI
• The Future of AI & conclusion
OUTLINE
| 86Leti Innovation Days | Tutorial on AI | 24th June 2019
• Explainable models & trustworthy AI
  • How to trust an AI model? Today's AIs have no notion of common sense!
  • What is extremely obvious for a human can lead to mistakes from the AI, and mistakes made with very high confidence by the model
  • Examples:
    • Clear CNN mistakes
    • A person "disappearing" from a detector while holding a printed adversarial pattern on an A4 sheet in front of them, etc.
  • Certifications?
THE FUTURE OF AI
Image from [Thys and Van Ranst, 2019] and [Nguyen, Yosinski and Clune 2015]
| 87Leti Innovation Days | Tutorial on AI | 24th June 2019
• Reduce the labeling constraint & the dataset size
  • How to bring AI to fields where labelled datasets are not commonly available?
  • How to learn from only a few examples?
  • Unsupervised learning, representation learning, self-supervised learning, etc.
• Incremental learning
  • How to add a new class of data to an already trained model?
THE FUTURE OF AI
| 88Leti Innovation Days | Tutorial on AI | 24th June 2019
• Deep learning is now mainstream
  • Lots of Deep Learning frameworks
  • Lots of datasets
• Several dedicated hardware platforms are available
  • Cloud/HPC applications
  • Embedded applications
• Custom ASICs will keep improving performance and energy efficiency
  • Increased parallelism
  • Embedded memory
  • In-memory computing
• Future challenges are
  • Certification
  • Lifelong learning
CONCLUSION
Leti, technology research institute
Commissariat à l'énergie atomique et aux énergies alternatives
Minatec Campus | 17 avenue des Martyrs | 38054 Grenoble Cedex | France
www.leti-cea.com
Thanks a lot!