Transcript of Tutorial on AI, Part 2 (v1) - Leti Innovation Days, 24th June 2019
| 1Leti Innovation Days | Tutorial on AI | 24th June 2019
• 13:00 Registration & welcome coffee
• 13:30 Part 1 (2h): Know your AI - a panorama on AI and Deep Learning
• 15:30 Coffee break (30')
• 16:00 Part 2 (1h30): Run your AI - software and hardware platforms for Deep Learning
• 17:30 End
OUTLOOK
| 2Leti Innovation Days | Tutorial on AI | 24th June 2019
• Deep learning methodology
  • Problem definition
  • Data preparation and collection
  • Learn
    • Popular software frameworks
    • Troubleshooting
  • Debug
• Hardware platforms
  • Challenges
  • Various needs
    • Training vs Inference
    • Cloud vs Edge
  • Trends
    • The Bio-inspiration
• Edge AI activities @ LETI
• The Future of AI & conclusion
OUTLINE
| 3Leti Innovation Days | Tutorial on AI | 24th June 2019
• Before talking about DL frameworks, a few generalities…
• Deep learning requires a change of mindset
SOFTWARE FRAMEWORKS
You have to embrace uncertainty
• Models are built from data
  • E.g. you do not know in advance which features will be extracted from an image to recognize a dog
• You will not write a rigid set of instructions
• Code is usually much shorter
You have to experiment
• Your models will get good enough through trial and error
• Maybe you will need to enrich your dataset, or change the optimizer, the topology…
| 4Leti Innovation Days | Tutorial on AI | 24th June 2019
• Since you are an experimenter, you need to set up a scientific methodology
RECOMMENDED METHODOLOGY
Define the problem
Make assumptions
Collect data
Train the model
Verify the model
• Define the problem: "I want to sort out bananas from other fruits"
• Make assumptions: "I only need to focus on the yellow color"
• Collect data: build a database containing only the color of fruits
• Verify: "Oops, it wrongly classifies lemons" → not good enough
• New assumption: "I need to add shape information" → complement the database
• Verify again: "Oops, it misses green bananas"
• Finally: "I need to give full labeled images"
| 5Leti Innovation Days | Tutorial on AI | 24th June 2019
• What application do you target?
  • Classification, e.g. it is a dog, a cat → Supervised learning
  • Clustering, e.g. cucumbers on one side, tomatoes on the other → Unsupervised learning
  • Regression, e.g. predict the amount of pollen on a sunny day → Supervised learning
• What data do you need?
  • Supervised → you need labeled data
  • Unsupervised → you need correctly defined data to ensure correct clustering
  • Reinforcement → you need a virtual world and an agent (or a real agent, which is even more challenging)
FIRST STEP – DEFINE A DEEP LEARNING PROBLEM
(Flow: Deep Learning problem definition → … → DL project completion)
| 6Leti Innovation Days | Tutorial on AI | 24th June 2019
• Deep learning methodology
  • Problem definition
  • Data preparation and collection
  • Learn
    • Popular software frameworks
    • Troubleshooting
  • Debug
• Hardware platforms
• Edge AI activities @ LETI
• The Future of AI & conclusion
OUTLINE
| 7Leti Innovation Days | Tutorial on AI | 24th June 2019
• Data must be collected and prepared
  • This is the foundation for trusted DL → this takes time
  • It represents more than 50% of a ML project
• It is important to ensure that data is
  • Clean: remove outliers, biased data, exceptions
  • Consistent: format data consistently
  • Accurate: feature engineering might help, e.g. adding the day of the week for analyzing sales trends
SECOND STEP – COLLECT AND PREPARE DATA
(Flow: Deep Learning problem definition → Data collection & preparation → … → DL project completion)
Deep learning is all about transforming raw data into information
| 8Leti Innovation Days | Tutorial on AI | 24th June 2019
• Remove outliers
  • Data that lies at an abnormal distance from the other values
• Sample correctly
  • The training data distribution must reflect the actual environment
  • E.g. you build a database for a self-driving car
• Remove bias
  • Otherwise, your model will learn it
CLEAN DATA
(Figure: scatter plot with one outlier highlighted)
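A minimal sketch of the outlier-removal step, assuming a pandas DataFrame with a hypothetical "weight" column and a simple 3-sigma rule (the slides do not prescribe a specific method):

  import pandas as pd

  df = pd.read_csv("fruits.csv")  # hypothetical input file
  mean, std = df["weight"].mean(), df["weight"].std()
  # Keep only the rows lying within 3 standard deviations of the mean
  df_clean = df[(df["weight"] - mean).abs() <= 3 * std]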
| 9Leti Innovation Days | Tutorial on AI | 24th June 2019
• Inconsistent data issues arise when you are aggregating data from different sources
  • Dates
  • Addresses
  • Prices: $5.01, 5$ 1cent, 5.01
  • Images: 32*32 pixels, 128*128 pixels
CONSISTENT DATA
| 10Leti Innovation Days | Tutorial on AI | 24th June 2019
• Raw inaccurate data is useless
  • Deep learning will not learn meaningful feature extractors from it
• You might need to do feature engineering (a code sketch follows below)
  • Example 1: you want to analyze sales trends
    • 2015/05/22, 2015/05/23… might not be accurate enough
    • Add Monday, Tuesday…
  • Example 2: you want to classify seismic activity
    • Raw acceleration might not be accurate enough
    • Add an FFT transformation
ACCURATE DATA
S. Zarnani, “Numerical parametric study of expanded polystyrene geofoam seismic buffers”, Canadian Geotechnical Journal, 2009
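A hedged sketch of both feature-engineering examples above; the column names and the acceleration signal are made up for illustration:

  import numpy as np
  import pandas as pd

  # Example 1: derive the day of the week from raw dates
  sales = pd.DataFrame({"date": ["2015/05/22", "2015/05/23"], "amount": [120, 95]})
  sales["date"] = pd.to_datetime(sales["date"])
  sales["day_of_week"] = sales["date"].dt.day_name()  # Friday, Saturday…

  # Example 2: add a frequency-domain view of a raw acceleration signal
  acceleration = np.random.randn(1024)          # placeholder signal
  spectrum = np.abs(np.fft.rfft(acceleration))  # FFT magnitude features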
| 11Leti Innovation Days | Tutorial on AI | 24th June 2019
• How much data is enough data?
  • For good generalization, you need hundreds of thousands of examples
  • With less data, consider using non-ML methods first (e.g. PCA)
• Data augmentation can help in expanding the dataset (a code sketch follows below)
  • Basic operations: rescaling, flipping, normalization, affine transforms, filtering, DFT…
  • Advanced operations: elastic distortion, random slice/label extraction, morphological reconstructions…
SECOND STEP – COLLECT AND PREPARE DATA
(Flow: Deep Learning problem definition → Data collection & preparation → …)
© https://i.stack.imgur.com/haBYt.png
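A minimal data-augmentation sketch with Keras' ImageDataGenerator, covering a few of the basic operations listed above (x_train and y_train are an assumed, already prepared image dataset):

  from tensorflow.keras.preprocessing.image import ImageDataGenerator

  datagen = ImageDataGenerator(
      rescale=1.0 / 255,       # rescaling
      horizontal_flip=True,    # flipping
      rotation_range=15,       # affine: random rotations
      width_shift_range=0.1,   # affine: random translations
      zoom_range=0.1,          # affine: random zooms
  )
  # Each epoch then sees randomly transformed variants of the images:
  # model.fit(datagen.flow(x_train, y_train, batch_size=32), epochs=10)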
| 12Leti Innovation Days | Tutorial on AI | 24th June 2019
• Natural images
DATASETS SOURCES
MNIST, CIFAR10/100, Caltech101/256, SVHN, ImageNet (the de-facto standard), Coil100, NORB, LSUN, Pascal VOC, MS COCO, LabelMe, Google's Open Images
| 13Leti Innovation Days | Tutorial on AI | 24th June 2019
• There is nowadays a large variety of labelled datasets!
• Facial recognition
  • Faces: UMD Faces (367K images, >8K people), CASIA WebFace (453K images, >10K people), MS-Celeb-1M (1M images of celebrities), IndianFaceDatabase, Multi-Pie, Labelled Faces in the Wild, FERET
  • Emotions: JACFEE (Japanese and Caucasian Facial Expressions of Emotion), Face-in-Action, MMI Facial Expression Database
• Recommendation
  • Movies: Netflix Prize (100M ratings, >17K movies, ~500K people), Movielens
  • Music: Million Song Dataset (1M songs, 1M people), last.fm (>90K songs, ~2K people)
  • Reading: Book-Crossing dataset (>1M ratings, ~300K books, ~300K people)
  • Purchase: Amazon Co-Purchasing (>500K products)
• Speech
• Health
• Government data
• Question answering…
DATASETS SOURCES
| 14Leti Innovation Days | Tutorial on AI | 24th June 2019
• Need a particular dataset? Use a dataset search engine!
• For example, www.kaggle.com
  • > 17K datasets
DATASET SOURCES
| 15Leti Innovation Days | Tutorial on AI | 24th June 2019
• The original dataset is decomposed into several subsets (a code sketch follows below)
HOW IS THE DATASET USED?
(Figure: the original dataset is first split into a training dataset and a test dataset; the training dataset is further split into training and validation datasets. The deep learning algorithm trains the model on the training dataset, the validation dataset is used for validation during training, and the test dataset is kept for the final validation.)
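A minimal sketch of such a decomposition with scikit-learn (x and y are assumed arrays of examples and labels; the 60/20/20 proportions are illustrative):

  from sklearn.model_selection import train_test_split

  # First hold back the test dataset for the final validation
  x_tmp, x_test, y_tmp, y_test = train_test_split(x, y, test_size=0.2)
  # Then carve a validation dataset out of the remaining data
  x_train, x_val, y_train, y_val = train_test_split(x_tmp, y_tmp, test_size=0.25)
  # Result: 60% training, 20% validation, 20% final test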
| 16Leti Innovation Days | Tutorial on AI | 24th June 2019
• In supervised learning, it is the data that matters the most!
TAKE HOME MESSAGE
| 17Leti Innovation Days | Tutorial on AI | 24th June 2019
• Deep learning methodology
  • Problem definition
  • Data preparation and collection
  • Learn
    • Popular software frameworks
    • Troubleshooting
  • Debug
• Hardware platforms
• Edge AI activities @ LETI
• The Future of AI & conclusion
OUTLINE
| 18Leti Innovation Days | Tutorial on AI | 24th June 2019
• Use the learning strategy that
  • Fits your DL problem to be solved
  • Is consistent with the data you have
THIRD STEP – LEARN
(Flow: Deep Learning problem definition → Data collection & preparation → Learn → DL project completion)

| Learning strategy | DL problem | Outcome | Real-life example |
| Supervised | Classification | Labeled data | Image: a dog, a cat… Sound: a car, a truck… LIDAR: a pedestrian, a tree |
| Supervised | Regression | Numerical values | % of people liking a movie |
| Unsupervised | Clustering | Groups of similar data | Face clustering of your photos |
| 19Leti Innovation Days | Tutorial on AI | 24th June 2019
• There are many easy-to-use, open-source Deep Learning frameworks
  • To simplify the implementation of large-scale deep learning models
  • In a short period of time
• A Deep Learning framework is
  • A software tool
  • With an interface
  • And a library of pre-built components
• Popular frameworks are
  • TensorFlow
  • Keras
  • PyTorch
  • Caffe
  • Deeplearning4j
THIRD STEP – LEARN
| 20Leti Innovation Days | Tutorial on AI | 24th June 2019
• How to choose one among the others?
• You should look for the following aspects, depending on who you are
  • Beginner
    • Good community support
    • Available tutorials
    • Pre-written examples
  • Professional
    • Compact code, easy to understand and maintain
    • Optimized for performance
  • Expert
    • Distributed DL solution, for scalable production code
    • Multiple language support
• Other topics to consider
  • Open source
THIRD STEP – LEARN
| 21Leti Innovation Days | Tutorial on AI | 24th June 2019
By far, the most commonly used
• Open source, with excellent community support
• Originates from the Google Brain Team
• Extensive documentation
• Pre-written code for CNNs, RNNs…
• Supports multiple languages for creating DL models
  • Python, C++, R
• Important APIs
  • Dataset API: streamlines the pre-processing, batching and consumption of data
  • Keras API: high-level API, eases model definition
TENSORFLOW
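A minimal sketch of the Dataset API mentioned above (images and labels are assumed in-memory arrays, and preprocess is a hypothetical per-example function):

  import tensorflow as tf

  # Streamline the pre-processing, batching and consumption of data
  dataset = tf.data.Dataset.from_tensor_slices((images, labels))
  dataset = dataset.shuffle(10_000).map(preprocess).batch(32).prefetch(1)
  # The batches can then be fed directly to a model built with the Keras API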
| 22Leti Innovation Days | Tutorial on AI | 24th June 2019
The most flexible
• Rapidly developing DL framework
• Originates from Facebook
• Tensor computation
• Python-based
• Flexibility
  • Computational graphs can be built on the go
  • And even changed during runtime
  • Useful when you do not know beforehand the amount of memory needed
PYTORCH
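A small sketch of the "graphs built on the go" idea: in PyTorch the graph follows ordinary Python control flow, so it can differ at every forward pass (the network itself is a made-up example):

  import torch
  import torch.nn as nn

  class DynamicNet(nn.Module):
      def __init__(self):
          super().__init__()
          self.fc = nn.Linear(16, 16)

      def forward(self, x):
          # The number of layer applications is decided at runtime,
          # so the computational graph changes from call to call
          for _ in range(torch.randint(1, 4, (1,)).item()):
              x = torch.relu(self.fc(x))
          return x

  y = DynamicNet()(torch.randn(8, 16))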
| 23Leti Innovation Days | Tutorial on AI | 24th June 2019
The fastest on images
• A popular framework, initially geared towards image processing
  • Originates from UC Berkeley
  • Open source
• Consequently, support for RNNs is not as great
• Supports multiple languages
  • C, C++, Python
  • As well as a MATLAB interface
• Primarily used for building and deploying deep learning models for mobile phones
  • And other computationally constrained platforms
CAFFE
| 24Leti Innovation Days | Tutorial on AI | 24th June 2019
Implemented in Java
• Faster and more energy-efficient than Python
  • Supports both CPUs and GPUs
  • Can process a huge amount of data without sacrificing speed
• Java library for deep learning
  • ND4J library, for tensor computations
• Takes advantage of distributed frameworks for big data processing
  • Spark, Hadoop
• Primarily used by Java developers
DEEPLEARNING4J
| 25Leti Innovation Days | Tutorial on AI | 24th June 2019
High-level API
• For those who do not want to dig deep into DL frameworks
  • Enables fast experimentation
  • Without focusing on low-level library details
• Supports CNN and RNN topologies
  • VGG, InceptionV3, MobileNet
• Runs on top of
  • TensorFlow (the Keras API is natively integrated in TensorFlow 2.0)
  • Theano
  • CNTK
KERAS
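A minimal illustration of the "fast experimentation" promise: a small CNN defined and compiled in a few lines (the topology is an arbitrary example):

  from tensorflow import keras

  model = keras.Sequential([
      keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
      keras.layers.MaxPooling2D(2),
      keras.layers.Flatten(),
      keras.layers.Dense(10, activation="softmax"),
  ])
  model.compile(optimizer="adam", loss="categorical_crossentropy",
                metrics=["accuracy"])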
| 26Leti Innovation Days | Tutorial on AI | 24th June 2019
DNN shrinking, exploration and porting
• Embedding low-power DNNs remains challenging
  • DNN topologies must be adapted and simplified
    • Reduce layer complexity (number of operations)
    • Reduce precision (8-bit integer or less)
• The N2D2 framework automates
  • DNN shrinking exploration and evaluation
  • Performance projection
  • And porting onto embedded platforms
• Various hardware targets
  • Even spiking accelerators
N2D2
| 27Leti Innovation Days | Tutorial on AI | 24th June 2019
• Educated advice if you are a complete beginner
  • Start with Keras
  • Then move on to TensorFlow
COMPARISON OF THESE DEEP LEARNING FRAMEWORKS

| Framework | Language | CUDA support | Pre-trained models | Open source | Why choose it? |
| TensorFlow | Python, C++ | Yes | Yes | Yes | Most used; various languages |
| PyTorch | Python, C | Yes | Yes | Yes | Most flexible; dynamic graphs |
| Caffe | C++ | Yes | Yes | Yes | Image processing; constrained platforms |
| Deeplearning4j | Java, C++ | Yes | Yes | Yes | Java programming; speed |
| Keras | Python | Yes | Yes | Yes | High-level API |
| N2D2 | C++ | Yes | Yes | Yes | Embedded targets |
| 28Leti Innovation Days | Tutorial on AI | 24th June 2019
• A new initiative for interchangeable DL models
  • Introduced in 2017 by Microsoft and Facebook
  • Enables models to be trained in one framework and transferred to another for inference
• Supported frameworks, with converters to/from many others (figure: framework logos)
ONNX – OPEN NEURAL NETWORK EXCHANGE
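A minimal sketch of the train-in-one-framework, infer-in-another flow, here exporting an assumed trained PyTorch model to ONNX:

  import torch

  # model is an assumed trained torch.nn.Module; the dummy input
  # fixes the input shape of the exported graph
  dummy = torch.randn(1, 3, 224, 224)
  torch.onnx.export(model, dummy, "model.onnx")
  # The .onnx file can then be loaded by another framework or runtime
  # for inference, e.g. onnxruntime.InferenceSession("model.onnx")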
| 29Leti Innovation Days | Tutorial on AI | 24th June 2019
• You must choose a Deep Learning framework depending on your needs
• Knowing that models are more and more interchangeable
TAKE HOME MESSAGE
| 30Leti Innovation Days | Tutorial on AI | 24th June 2019
• Deep learning methodology
  • Problem definition
  • Data preparation and collection
  • Learn
    • Popular software frameworks
    • Troubleshooting
  • Debug
• Hardware platforms
• Edge AI activities @ LETI
• The Future of AI & conclusion
OUTLINE
| 31Leti Innovation Days | Tutorial on AI | 24th June 2019
• Visualize Deep Network models and metrics
  • Use pre-built packages like TensorBoard (a code sketch follows below)
TROUBLESHOOTING – WHAT TO LOOK FOR DURING LEARNING?
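A minimal sketch of hooking TensorBoard into a Keras training run (model and data are assumed to exist):

  import tensorflow as tf

  # Logs metrics plus weight/bias histograms for visualization
  tb = tf.keras.callbacks.TensorBoard(log_dir="logs", histogram_freq=1)
  # model.fit(x_train, y_train, validation_split=0.1, callbacks=[tb])
  # then inspect with: tensorboard --logdir logs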
| 32Leti Innovation Days | Tutorial on AI | 24th June 2019
• Plot the Cost function to tune the learning rate
TROUBLESHOOTING – WHAT TO LOOK FOR DURING LEARNING?
(Figure: cost vs. iterations for different learning rates: a very high rate diverges, a high rate plateaus early, a low rate decreases slowly, a good rate converges quickly to a low cost)

weight_new = weight_old - learning_rate × gradient
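A toy numerical illustration of this update rule, minimizing cost(w) = (w - 3)² whose gradient is 2(w - 3):

  def train(learning_rate, steps=20):
      w = 0.0
      for _ in range(steps):
          w -= learning_rate * 2 * (w - 3)  # the weight update rule above
      return w

  print(train(0.01))  # too low: converges slowly towards 3
  print(train(0.1))   # good: quickly approaches 3
  print(train(1.1))   # too high: diverges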
| 33Leti Innovation Days | Tutorial on AI | 24th June 2019
• Plot the Accuracy to tune regularization
• A large gap between validation and training accuracies
  • Signals overfitting of the model
  • To reduce overfitting, increase regularization
TROUBLESHOOTING – WHAT TO LOOK FOR DURING LEARNING?
(Figure: accuracy vs. iterations for the training dataset, a validation dataset with good generalization, and a validation dataset that overfits)

Regularization techniques: Dropout, L1 regularization, …
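A minimal sketch of both regularizations in Keras (layer sizes and coefficients are arbitrary examples):

  from tensorflow import keras
  from tensorflow.keras import layers, regularizers

  model = keras.Sequential([
      layers.Dense(128, activation="relu", input_shape=(784,),
                   kernel_regularizer=regularizers.l1(1e-4)),  # L1 penalty
      layers.Dropout(0.5),  # randomly silence half the units while training
      layers.Dense(10, activation="softmax"),
  ])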
| 34Leti Innovation Days | Tutorial on AI | 24th June 2019
• Monitor Accuracy and Cross-entropy
  • Accuracy: is my model performing well?
  • Cross-entropy: how close is my model to the ground-truth classes?
TROUBLESHOOTING – WHAT TO LOOK FOR DURING LEARNING?
(Example: for a ground truth of Cat=1, Dog=0, Bird=0, an early output of Cat=0.61, Dog=0.33, Bird=0.06 has a higher cross-entropy than a later output of Cat=0.94, Dog=0.05, Bird=0.01)
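A small sketch reproducing the example above: both outputs pick the right class (same accuracy), but the later one has a much lower cross-entropy:

  import numpy as np

  def cross_entropy(predicted, target):
      return -np.sum(target * np.log(predicted))

  target = np.array([1.0, 0.0, 0.0])    # ground truth: cat
  early = np.array([0.61, 0.33, 0.06])  # early in training
  late = np.array([0.94, 0.05, 0.01])   # later in training

  print(cross_entropy(early, target))   # ~0.49
  print(cross_entropy(late, target))    # ~0.06, closer to ground truth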
| 35Leti Innovation Days | Tutorial on AI | 24th June 2019
• Monitor Weights and Biases
  • Weights: a normal weight distribution is a good sign that the training is going well
  • Biases: large (positive or negative) biases are abnormal
TROUBLESHOOTING – WHAT TO LOOK FOR DURING LEARNING?
(Figure: weight and bias histograms; with a large bias, the neuron input would not matter)
| 36Leti Innovation Days | Tutorial on AI | 24th June 2019
• Monitor Pre-activations, Activations and Gradients
  • Pre-activations: must be normally distributed, otherwise apply a normalization
  • Activations: monitor zero activations (i.e. dead nodes)
  • Gradients: monitor layer gradients and track gradient evolution from the output layers to the input layers, to prevent vanishing or exploding gradient problems
TROUBLESHOOTING – WHAT TO LOOK FOR DURING LEARNING?
(Figure: pre-activation and gradient distributions)
| 37Leti Innovation Days | Tutorial on AI | 24th June 2019
• You must also monitor
  • Memory footprint
  • Latency
  • Applicative performance
• Example
  • The DeepManta solution for autonomous driving
  • Needs to run in real time on the DrivePX2 platform
TROUBLESHOOTING – WHAT TO LOOK FOR DURING LEARNING?
Initial implementation: 8 FPS; final performance: 25 FPS on the NVIDIA DrivePX2, i.e. 3.12× faster with the same level of recognition
| 38Leti Innovation Days | Tutorial on AI | 24th June 2019
• The more you observe what is going on in your model, the better the training will be
• And the final application performance
TAKE HOME MESSAGE
| 39Leti Innovation Days | Tutorial on AI | 24th June 2019
• Models will make strange mistakes that are difficult to debug
  • Due to anything from skewed training data
  • To unexpected interpretations of data during training
• Furthermore, production models
  • Will interact with other pieces of software
  • Or will be used in never-before-seen situations
FOURTH STEP – DEBUG
Google only solved this issue by removing the Gorilla category altogether!
(Flow: Deep Learning problem definition → Data collection & preparation → Learn → Debug → DL project completion)
| 40Leti Innovation Days | Tutorial on AI | 24th June 2019
• Use smaller filter sizes
  • 3x3 and 5x5 filters usually perform best
• Add layers
  • Deeper networks lead to more complex models
• Add skip connections to tackle the vanishing gradient problem
  • As in the ResNet topology
IMPROVING A DEEP NETWORK
The intent is to overfit the model, since we can later correct that with regularization
| 41Leti Innovation Days | Tutorial on AI | 24th June 2019
• Learning rate decay schedule
  • Instead of a fixed learning rate, use a regularly decaying learning rate
  • E.g. a 0.95 decay rate every 100,000 iterations
• Momentum
  • A high momentum will prevent weights from oscillating
• Early stopping
  • Prevent overfitting by stopping learning when the validation loss keeps increasing
ADVANCED TUNING
(Figures: learning rate vs. iterations under a decay schedule; accuracy vs. iterations on the training and test sets with the early stopping point marked)
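A minimal Keras sketch of the three techniques above, using the decay figures from the example (model and data are assumed):

  import tensorflow as tf

  # 0.95 decay rate every 100,000 iterations
  schedule = tf.keras.optimizers.schedules.ExponentialDecay(
      initial_learning_rate=0.01, decay_steps=100_000, decay_rate=0.95)
  optimizer = tf.keras.optimizers.SGD(learning_rate=schedule,
                                      momentum=0.9)  # damp weight oscillations

  # Stop when the validation loss keeps increasing
  early_stop = tf.keras.callbacks.EarlyStopping(
      monitor="val_loss", patience=5, restore_best_weights=True)
  # model.compile(optimizer=optimizer, ...) then
  # model.fit(..., callbacks=[early_stop])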
| 42Leti Innovation Days | Tutorial on AI | 24th June 2019
• Model ensembles are very common in some DL competitions
  • Since they are effective in pushing the accuracy up a few percentage points
• Model ensembles
  • Combine predictions from multiple models
  • Reducing the variance of predictions and the generalization error
• This is a kind of model averaging
  • It works because different models will usually not make all the same errors on the test set
ENSEMBLE LEARNING

Law of Large Numbers: the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed
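A minimal model-averaging sketch (models is an assumed list of already trained models exposing a predict method over the same classes):

  import numpy as np

  def ensemble_predict(models, x):
      # Average the class probabilities across models; uncorrelated
      # errors tend to cancel out, reducing prediction variance
      predictions = np.stack([m.predict(x) for m in models])
      return predictions.mean(axis=0)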
| 43Leti Innovation Days | Tutorial on AI | 24th June 2019
• It is common practice to start from an existing neural network
  • Given the difficulty and time needed to train a neural network from scratch
• What has been learned in one context
  • Is exploited to improve generalization in another setting
• Especially used for Image and Natural Language Processing
TRANSFER LEARNING
| 44Leti Innovation Days | Tutorial on AI | 24th June 2019
• Features to be transferred need to be general enough
  • To be suitable for the target tasks
  • E.g., for image classification, start from CNN features pre-trained on ImageNet
• Freeze some layers
  • CNN features are more generic in early layers
  • And more original-dataset-specific in later layers
• Benefits
  • Faster learning
  • Potentially better end result
TRANSFER LEARNING
(Figure: early layers frozen, middle layers fine-tuned, last layers learned from scratch)
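A minimal transfer-learning sketch in Keras, starting from CNN features pre-trained on ImageNet and freezing them (the 2-class head is an arbitrary example):

  from tensorflow import keras

  base = keras.applications.MobileNet(weights="imagenet", include_top=False,
                                      pooling="avg",
                                      input_shape=(224, 224, 3))
  base.trainable = False  # freeze the generic early layers

  model = keras.Sequential([
      base,
      keras.layers.Dense(2, activation="softmax"),  # new task-specific head
  ])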
| 45Leti Innovation Days | Tutorial on AI | 24th June 2019
• There are many fine-tuning options and methods
• This is a field of research that is evolving every week, so you need to keep yourself up to date
TAKE HOME MESSAGE
| 46Leti Innovation Days | Tutorial on AI | 24th June 2019
• Deep learning methodology
• Hardware platforms
  • Challenges
  • Various needs
    • Training vs Inference
    • Cloud vs Edge
  • Trends
    • The Bio-inspiration
• Edge AI activities @ LETI
• The Future of AI & conclusion
OUTLINE
| 47Leti Innovation Days | Tutorial on AI | 24th June 2019
• Challenges
  • Many parameters to be stored
  • Many operations to be performed
  • And this is not even considering training!
DNN TOPOLOGIES AND COMPUTATION DEMANDS
A. Canziani et al., "An Analysis of Deep Neural Network Models for Practical Applications", 2017
(Figure: benchmark of CNN models on the ImageNet database (2 million labeled objects), showing the tradeoff between accuracy, speed and complexity; the size of the bubbles is proportional to the number of parameters; the best models combine high accuracy with low complexity)
| 48Leti Innovation Days | Tutorial on AI | 24th June 2019
IT PROJECTED TO CHALLENGE FUTURE ELECTRICITY SUPPLY
(Figure: forecast of exponential growth in total consumer power consumption)
Anders S.G. Andrae, “Total Consumer Power Consumption Forecast”, 2017
| 49Leti Innovation Days | Tutorial on AI | 24th June 2019
EVOLUTION TOWARDS EMBEDDED INTELLIGENCE
(Figure: evolution in three stages. 1) Simple sensor + cloud computing: raw data goes up to the cloud, commands come back; learning and inference both run in the cloud. 2) Multi-sensor with data fusion and pre-processing: pre-processed data and configuration are exchanged with the cloud; inference moves into the device (Edge AI, embedded intelligence) while learning stays in the cloud. 3) Cognitive cyber-physical systems: multi-sensor and behavior sensors cooperate and share knowledge through federated learning (Federated Open Personal Intelligence); learning and inference are both embedded.)
| 50Leti Innovation Days | Tutorial on AI | 24th June 2019
SOLVING THE ENERGY CHALLENGE: COST OF MOVING DATA
(Figure: the energy cost of fetching data from off-chip memory is ×800 higher than that of the compute operation itself)
Bill Dally, “To ExaScale and Beyond”, 2010
| 51Leti Innovation Days | Tutorial on AI | 24th June 2019
• Focus on low-power and low-area digital computations
  • Limit the number of weights as much as possible
    • Weight pruning and quantization
  • Avoid FP representation and FP operations
• Integrate memory cuts close to the processing elements
  • Store the most-accessed parameters on-chip
• Ultimately, do the computations in the memory
  • In-memory computing paradigm
  • Possible with non-volatile memories
  • In a digital or analog manner
SOLUTIONS FOR INCREASING ENERGY EFFICIENCY
| 52Leti Innovation Days | Tutorial on AI | 24th June 2019
• GPUs and CPUs currently lead in market share
• But ASICs will capture the lead in 2022
• With opportunities for SoC accelerators
NEED FOR SPECIALIZED HARDWARE
| 53Leti Innovation Days | Tutorial on AI | 24th June 2019
• Dedicated ASICs will fuel the growth of Deep Learning applications
TAKE HOME MESSAGE
| 54Leti Innovation Days | Tutorial on AI | 24th June 2019
• Deep learning methodology
• Hardware platforms
  • Challenges
  • Various needs
    • Training vs Inference
    • Cloud vs Edge
  • Trends
    • The Bio-inspiration
• Edge AI activities @ LETI
• The Future of AI & conclusion
OUTLINE
| 55Leti Innovation Days | Tutorial on AI | 24th June 2019
TRAINING VERSUS INFERENCE – DIFFERENT USAGES
(Figure: 2x2 matrix of usages: training and inference, each in the cloud/HPC and at the edge/embedded)
| 56Leti Innovation Days | Tutorial on AI | 24th June 2019
TRAINING VERSUS INFERENCE – DIFFERENT NEEDS
Cloud/HPC training:
• High throughput (TFLOPS)
• Large memory bandwidth
• High precision (32b/64b FP)
• High configurability (any layer)
• Distributed

Cloud/HPC inference:
• High throughput (TFLOPS)
• Low latency
• Energy efficiency
• Distributed

Edge/embedded training:
• High energy efficiency (100 GFLOPS/W)
• Large memory bandwidth
• Moderate precision (16b FP)
• High configurability (any layer)

Edge/embedded inference:
• Very high energy efficiency (5-10 TOPS/W)
• Short latency (batch size of 1)
• Reduced throughput
• Low cost (as low as $5)
• The above tradeoffs depend on the application: ADAS, delivery drones, wearables
| 57Leti Innovation Days | Tutorial on AI | 24th June 2019
TRAINING VERSUS INFERENCE – DIFFERENT HARDWARE TARGETS
Cloud/HPC training: mostly GPUs, some FPGAs, some ASICs (TPUs)
Cloud/HPC inference: CPUs, GPUs, FPGAs, ASICs (TPUs)
Edge/embedded training: small-scale GPUs
Edge/embedded inference: small-scale GPUs, ASICs, CPUs
(Pictured: NVIDIA V100, Intel Xeon, NVIDIA Jetson Nano, Google Edge TPU, Intel Compute Stick, Raspberry Pi)
| 58Leti Innovation Days | Tutorial on AI | 24th June 2019
• Deep learning methodology
• Hardware platforms
  • Challenges
  • Various needs
    • Training vs Inference
    • Cloud vs Edge
  • Trends
    • The Bio-inspiration
• Edge AI activities @ LETI
• The Future of AI & conclusion
OUTLINE
| 59Leti Innovation Days | Tutorial on AI | 24th June 2019
TRENDS IN CLOUD COMPUTING
• The NVIDIA V100 GPU and the Google Cloud TPU are the benchmarks for commercial AI chips in the cloud

Increased parallelism for higher throughput:
• FPGAs
  • Especially useful for low-batch-size inference
• More tensor cores
  • NVIDIA V100: 640 tensor cores, 120 TFLOPS
  • Google TPUv2: 180 TFLOPS

Increased storage requirements:
• High-bandwidth memory
  • NVIDIA V100: 900 GB/s bandwidth
  • Google TPUv2: 600 GB/s bandwidth
• Large capacity
  • NVIDIA V100: 16 GB HBM2
  • Google TPUv2: 16 GB HBM
| 60Leti Innovation Days | Tutorial on AI | 24th June 2019
• Example: evolution of the TPU
  • It started with an inference-only accelerator
TRENDS IN CLOUD COMPUTING
“Different versions of TPU compared”, Teich 2018
| 61Leti Innovation Days | Tutorial on AI | 24th June 2019
TRENDS IN EDGE COMPUTING
Increased computing efficiency:
• Weight quantization (a code sketch follows below)
  • Reduced bit accuracy
  • Smaller memory footprint
  • Lighter operations
  • Currently used for inference-only tasks
  • Can lead to a 4x memory reduction ("Quantization in TensorFlow", Google Cloud Blog, 2019)
• Variable bit precision
  • Handling higher bit accuracy when needed
  • For higher inference precision
• Sparsity
  • Clock-gating MAC operators when a weight or intermediate result is 0
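A minimal sketch of post-training weight quantization with the TensorFlow Lite converter (model is an assumed trained Keras model):

  import tensorflow as tf

  converter = tf.lite.TFLiteConverter.from_keras_model(model)
  converter.optimizations = [tf.lite.Optimize.DEFAULT]  # quantize the weights
  tflite_model = converter.convert()
  # The resulting file is roughly 4x smaller than the float32 model
  open("model_quant.tflite", "wb").write(tflite_model)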
| 62Leti Innovation Days | Tutorial on AI | 24th June 2019
TRENDS IN EDGE COMPUTING
Increased computing efficiency:
• Weight quantization
  • Reduced bit accuracy
  • Smaller memory footprint
  • Lighter operations
• Variable bit precision
  • Handling higher bit accuracy when needed
  • For higher inference precision
• Sparsity
  • Skip MAC operations when a weight or intermediate result is 0

Increased storage efficiency:
• Near-memory computing
  • Avoid external memory accesses (200x more energy consuming)
  • Weights: embedded non-volatile memory
  • Intermediate results: SRAM or embedded DRAM
  • Currently used for inference-only tasks
• In-memory computing
  • SRAM or embedded NVM
  • Digital or analog
| 63Leti Innovation Days | Tutorial on AI | 24th June 2019
• CPU
  • Quick prototyping that requires maximum flexibility
  • Small models with small effective batch sizes
  • Models that use many custom layers or operations written in C++
  • Models that are dominated by the networking bandwidth of the host system
• GPU
  • Medium-to-large models with larger effective batch sizes
  • Workloads that require high-precision arithmetic, e.g. double precision
• TPU
  • Exclusively with TensorFlow, and without custom operations
  • Very large models, with very large effective batch sizes, that train for weeks
• FPGA
  • When low latency on small batch sizes is needed
  • Efficiently exploiting sparsity, e.g. DeepCompress
WHEN TO USE GPU/TPU/CPU/FPGA?
| 64Leti Innovation Days | Tutorial on AI | 24th June 2019
• Each hardware target has its own PROS/CONS
• Choose one (or several) depending on your needs
TAKE HOME MESSAGE
| 65Leti Innovation Days | Tutorial on AI | 24th June 2019
• Deep learning methodology
• Hardware platforms
  • Challenges
  • Various needs
    • Training vs Inference
    • Cloud vs Edge
  • Trends
    • The Bio-inspiration
• Edge AI activities @ LETI
• The Future of AI & conclusion
OUTLINE
| 66Leti Innovation Days | Tutorial on AI | 24th June 2019
BIO-INSPIRED NEURAL NETWORKS
• Network
  • A set of neurons
  • Interconnected through synapses
  • Connected in 3D
• Neuron
  • Compute element → integration of inputs
  • 1k-10k inputs
  • 1 output only, but with very high fan-out
• Synapse
  • Memory element → modulation of inputs
  • Synapses define the function of the network
→ Low frequency (1-10 Hz) usage but huge connectivity
(Figure: action potential = spike)
| 67Leti Innovation Days | Tutorial on AI | 24th June 2019
WHAT IS THE DIFFERENCE BETWEEN CLASSICAL CODING NNS AND BIOLOGY?
• Classical coding is an abstraction from biology
  • The spike train is converted into a value representing its mean frequency
• Neuron
  • MAC operation (multiplication-accumulation)
  • Non-linear activation function (sigmoid, ReLU…)
• Synapse
  • Weight stored in DRAM
• The brain works very differently
  • Computation is analog: the neuron soma is a synaptic current integrator
  • Communication is digital: spikes are unary events, very robust to noise
  • Compute and memory cells are co-located
| 68Leti Innovation Days | Tutorial on AI | 24th June 2019
• The promises of spike-coding NNs:
  • Reduced computing complexity, and natural temporal and spatial parallelism
  • Simple and efficient performance tunability
  • Spiking NNs best exploit NVMs such as RRAM, for massively parallel synaptic memory
SPIKE CODING FOR DEEP NETWORKS
(Figure: the three-step flow. 1) A standard CNN topology with offline learning: a 24x24-pixel cropped digit input feeds a convolutional layer of 16 4x4 kernels (16 maps of 11x11 neurons), followed by a convolutional layer of 90 5x5 kernels (24 maps of 4x4 neurons). 2) Lossless spike transcoding: pixel brightness is rate-coded into spike frequencies between fMIN and fMAX, and the correct output emerges over time. 3) Performance vs. computing time tunability (approximated computing): test error rate and spikes per connection both vary with the decision threshold.)

Formal neurons vs. spiking neurons:
| | Formal neurons | Spiking neurons |
| Base operation | Multiply-Accumulate (MAC) | Accumulate only |
| Activation function | Non-linear function | Simple threshold |
| Parallelism | Spatial multiplexing | Spatial and temporal multiplexing |
| 69Leti Innovation Days | Tutorial on AI | 24th June 2019
• The most well-known
  • Kind of a benchmark
• Fully digital implementation
  • Time-multiplexed
• Scalable architecture with 4,096 cores
  • 1M neurons, 256M synapses
• Each neurosynaptic core has
  • 256 neurons with 256 inputs
  • Implemented as a 256x256 binary crossbar
• Memory and computation are intertwined
  • No memory bottleneck
  • Energy efficient (70 mW)
• Demonstrated on
  • Audio and image classification
  • Hand gesture recognition with an event-based camera
IBM TRUENORTH
A. Andreopoulos, “Visual saliency on networks of neurosynaptic cores”, IBM, 2015
“Deep learning inference possible in embedded systems thanks to TrueNorth”, IBM Blog, 2016
| 70Leti Innovation Days | Tutorial on AI | 24th June 2019
• The most versatile
  • Newer than TrueNorth
• Fully digital implementation
  • Time-multiplexed
• Scalable architecture with 128 cores
  • 130K neurons, 130M synapses
• Online learning
  • Spikes are asynchronous
  • Different time-based learning rules
• Demonstrated on
  • Keyword spotting, Natural Language Processing
  • Through the Intel Neuromorphic Research Community (50 research groups)
INTEL LOIHI
M. Davies, “Loihi: A Neuromorphic Manycore Processor with On-Chip Learning”, IEEE Micro, 2018
“Kapoho Bay USB stick”, Intel Newsroom, 2018
| 71Leti Innovation Days | Tutorial on AI | 24th June 2019
UNIVERSITY OF ZURICH DYNAPSEL
• The most biomimetic
  • True analog neurons and synapses
• Mixed-signal implementation
• Scalable architecture with 5 cores
  • 4 non-plastic: 256 neurons, 16K synapses
  • 1 plastic: 64 neurons, 8K plastic synapses
• Fully integrated SNN
  • Parallel processing of spikes
• Online learning
  • Spikes are asynchronous
  • STDP learning rule
• Demonstrated on
  • Heartbeat anomaly detection (~15 nW for 60 beats per minute)
G. Indiveri, “A mixed-signal multi-core spiking chip for models of cortical computation”, NeuRAM3 project , 2018
“Heartbeat anomaly detection”, NeuRAM3 project , 2018
| 72Leti Innovation Days | Tutorial on AI | 24th June 2019
| | IBM TrueNorth | Intel Loihi | Zurich DynapSEL |
| Technology | 28nm CMOS | 14nm CMOS | 28nm FDSOI |
| Supply voltage | 0.7-1.05 V | 0.5-1.25 V | 0.73-1 V |
| Design type | Digital | Digital | Mixed-signal |
| Number of neurons | 1000K | 130K | 1K |
| Neurons per core | 256 | max. 1K | 256 |
| Core area | 0.094 mm² | 0.4 mm² | 0.36 mm² |
| Computation | Time multiplexing | Time multiplexing | Parallel processing |
| Fan in/out | 256/256 | 16/4K | 2K/8K |
| On-line learning | No | Programmable rules | STDP |
| Synaptic operations/s/W | 46 G | - | 300 G |
| Energy per synaptic op. | 26 pJ | 23.6 pJ | 2 pJ |
BENCHMARK
• Lessons learnt
  • Parallel processing increases energy efficiency
  • A large fan in/out is a must for enabling dense layers
| 73Leti Innovation Days | Tutorial on AI | 24th June 2019
• Neuromorphic circuits are seen by some as the future of AI chips
• Although there is no killer application yet
TAKE HOME MESSAGE
| 74Leti Innovation Days | Tutorial on AI | 24th June 2019
• Deep learning methodology
• Hardware platforms
  • Challenges
  • Various needs
    • Training vs Inference
    • Cloud vs Edge
  • Trends
    • The Bio-inspiration
• Edge AI activities @ LETI
• The Future of AI & conclusion
OUTLINE
| 75Leti Innovation Days | Tutorial on AI | 24th June 2019
N2D2: DNN DESIGN ENVIRONMENT
• A unique platform for the design and exploration of DNN applications
• Available on GitHub
(Figure: learning & test databases feed data conditioning; modeling, learning, test and optimization then produce a trained DNN. Considered criteria: accuracy (approximate computing…), memory need, computational complexity. Code generation and execution target COTS hardware (many-core CPUs such as MPPA, ASMP, ARM…; GPUs; FPGAs) through SW DNN libraries (OpenCL, OpenMP, CuDNN, CUDA, TensorRT; PNeuro, ASMP), and HW accelerators (PNeuro, DNeuro) through HW DNN libraries (DNeuro, C/HLS).)
| 76Leti Innovation Days | Tutorial on AI | 24th June 2019
N2D2: FAST AND ACCURATE DNN EXPLORATION
  ; Environment
  [env]
  SizeX=8
  SizeY=8
  ConfigSection=env.config

  [env.config]
  ImageScale=0

  ; First layer (convolutional)
  [conv1]
  Input=env
  Type=Conv
  KernelWidth=3
  KernelHeight=3
  NbChannels=32
  Stride=1

  ; Second layer (pooling)
  [pool1]
  Input=conv1
  Type=Pool
  PoolWidth=2
  PoolHeight=2
  NbChannels=32
  Stride=2

  ; Third layer (fully connected)
  [fc1]
  Input=pool1
  Type=Fc
  NbOutputs=100

  ; Output layer (fully connected)
  [fc2]
  Input=fc1
  Type=Fc
  NbOutputs=10
(Figure: the four-step workflow: 1) deep network builder (the INI description above), 2) learning on a database, 3) analysis of network performance (recognition rate during learning and test, output categories and localization), 4) CPU, GPU and FPGA-based real-time implementation (parallel CPU via OpenMP, GPU via OpenCL/CUDA, FPGA via HLS))
→ Wide range of targets, with performance and power metrics
| 77Leti Innovation Days | Tutorial on AI | 24th June 2019
• L-IOT platform
  • Ultra-low-power BUT always-responsive
  • Advanced wake-up mechanisms
• On-demand subsystem
  • 32-bit RISC-V processor
  • DNN accelerator
    • 2 clusters of 4 neurocores
    • Optimized MAC operators
• Always-responsive subsystem
  • Asynchronous wake-up controller
  • Wake-up radio
IOT PLATFORM WITH DNN ACCELERATOR
| 78Leti Innovation Days | Tutorial on AI | 24th June 2019
BRAIN VS. COMPUTER: A ×10⁶ POWER DISCREPANCY
• Biological system computations are
  • 3 to 6 orders of magnitude more energy efficient than current dedicated silicon systems
• Brain-inspired computing might just be the key!
• The human brain is
  • Massively parallel: 86B neurons and 10⁴ times more synapses
  • Doing processing using memory elements
  • Event-driven, with spike-based induced activity (no system clock)
  • Self-learning, self-organizing
• Embedded brain-inspired solutions need
  • High-density storage, close to the neurons (computational storage)
  • A time-code will be a must
  • Scalability, re-configurability
  • Online learning to come
| 79Leti Innovation Days | Tutorial on AI | 24th June 2019
N2D2 – BIO-INSPIRED MODELS EXPLORATION
• Tool flow for network-level simulation of bio-inspired synapses, neurons and learning rules
(Figure: the N2D2 neuromorphic simulator combines input stimuli from a 128x128 CMOS retina (16,384 spiking pixels), a network topology (two layers with lateral inhibition), a learning rule (STDP conductance change ΔW vs. Δt = t_post - t_pre, fitted to the experimental LTP/LTD data of Bi & Poo), a neuron model (e.g. leaky integrate & fire, whose integration decays exponentially since the last spike) and a synaptic model (conductance vs. pulse number). Outputs include neuron activity, membrane potential traces and the learned synaptic weights.)
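A toy sketch of the leaky integrate & fire neuron model used in such simulations (the time step, leak time constant and threshold are arbitrary illustrative values):

  import numpy as np

  def lif(input_current, dt=1e-3, tau=20e-3, v_threshold=1.0):
      v, spike_times = 0.0, []
      for step, i_in in enumerate(input_current):
          v += dt * (-v / tau + i_in)        # leaky integration of the input
          if v >= v_threshold:
              spike_times.append(step * dt)  # emit a spike…
              v = 0.0                        # …and reset the membrane
      return spike_times

  spikes = lif(np.full(1000, 60.0))  # regular spiking for a constant input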
| 80Leti Innovation Days | Tutorial on AI | 24th June 2019
MEMORY: A UNIQUE VALUE PROPOSITION
200/300 MM INTEGRATION (© Guilly/cea, © Jayet/cea)
DEFINITION OF TECHNOLOGY SPECIFICATIONS
MODULE DEVELOPMENT
TEST & CHARACTERIZATION
DESIGN ENABLEMENT
MODELING, SIMULATION & NANO-CHARACTERIZATION
Large variety of materials available: HfAlxOy, SiOx, TaOx, ZrOx, AlOx, VOx, GeSbTe, GeAsSbTe
Large variety of memories available: Conductive Bridge RAM, Oxide Resistive RAM, Ferro-electric RAM, Phase-Change Memory, pSTT-Magnetic RAM
| 81Leti Innovation Days | Tutorial on AI | 24th June 2019
• Via collaborations
  • MAD Shuttle
RRAM BENCHMARK FOR TRADEOFF UNDERSTANDING
Published at VLSI 2018, IMW 2018, EDL 2018 and IRPS 2018 → towards circuit implementation
| 82Leti Innovation Days | Tutorial on AI | 24th June 2019
• SNN with
  • OxRAM synapses (1T-1R)
  • Analog neurons
• MNIST application
  • Proof of concept
• Topology
  • Fully connected
  • 10 neurons (10 output classes)
  • 1,440 synapses (11.5k OxRAMs)
• Technology
  • Bulk 130nm
SPIKING NEURAL NETWORK ACCELERATOR
(Chip photo: the analog neurons and the OxRAM array)
| 83Leti Innovation Days | Tutorial on AI | 24th June 2019
SPIKING NEURAL NETWORK ACCELERATOR – THE DEMONSTRATOR
• Live demonstration of handwritten digit classification
  • Wednesday afternoon
  • Thursday all day
| 84Leti Innovation Days | Tutorial on AI | 24th June 2019
• LETI has skills ranging from Technology …
• Through Circuits …
• To Applications
TAKE HOME MESSAGE
| 85Leti Innovation Days | Tutorial on AI | 24th June 2019
• Deep learning methodology
• Hardware platforms
  • Challenges
  • Various needs
    • Training vs Inference
    • Cloud vs Edge
  • Trends
    • The Bio-inspiration
• Edge AI activities @ LETI
• The Future of AI & conclusion
OUTLINE
| 86Leti Innovation Days | Tutorial on AI | 24th June 2019
• Explainable models & trustworthy AI
  • How to trust an AI model? Today's AIs have no notion of common sense!
  • What is extremely obvious for a human can lead to mistakes from the AI, and mistakes made with very high confidence by the model
  • Examples:
    • Clear CNN mistakes
    • A person "disappearing" from a detector while holding a printed adversarial pattern on an A4 sheet in front of them, etc.
  • Certifications?
THE FUTURE OF AI
Image from [Thys and Van Ranst, 2019] and [Nguyen, Yosinski and Clune 2015]
| 87Leti Innovation Days | Tutorial on AI | 24th June 2019
• Reduce the labeling constraint & the dataset size
  • How to bring AI to fields where labelled datasets are not commonly available?
  • How to learn from only a few examples?
  • Unsupervised learning, representation learning, self-supervised learning, etc.
• Incremental learning
  • How to add a new class of data to an already trained model?
THE FUTURE OF AI
| 88Leti Innovation Days | Tutorial on AI | 24th June 2019
• Deep learning is now mainstream
  • Lots of Deep Learning frameworks
  • Lots of datasets
• Several dedicated hardware platforms are available
  • Cloud/HPC applications
  • Embedded applications
• Custom ASICs will keep improving performance and energy efficiency
  • Increased parallelism
  • Embedded memory
  • In-memory computing
• Future challenges are
  • Certification
  • Lifelong learning
CONCLUSION
Leti, technology research institute
Commissariat à l'énergie atomique et aux énergies alternatives
Minatec Campus | 17 avenue des Martyrs | 38054 Grenoble Cedex | France
www.leti-cea.com
Thanks a lot!