11/12/15 ShiDianNao: Shifting Vision Processing Closer to the Sensor
ShiDianNao: Shifting Vision Processing Closer to the Sensor
Authors – Zidong Du et al.
Presented by – Gokul Subramanian Ravi
November 12, 2015
Summary
• Fact: Neural network accelerators achieve high energy efficiency/performance for recognition and mining applications.
• Problem: Further improvements are limited by memory bandwidth constraints.
• Proposal:
– Mapping the entire CNN into SRAM: eliminates DRAM accesses for weights.
– Moving closer to the sensor: eliminates DRAM accesses for input/output.
• Result:
– CNN accelerator placed next to a CMOS or CCD sensor.
– Absence of DRAM accesses + exploitation of access patterns: 60x better energy efficiency than DianNao.
– Synthesis at 65 nm: large speedup over CPUs/GPUs/DianNao.
Outline
• Overview of Neural Networks
• Memory Constrained Acceleration
• Primer on CNNs
• Mapping Principles
• Accelerator Architecture
– Computation
– Storage
– Control
• CNN Mapping
• Results
• Conclusion
Overview of Neural Networks
• Feed-forward networks are trained by trial and error or by back-propagation.
• Machine learning implemented in FPGAs/accelerators provides high performance/efficiency in multiple applications.
• Application trends are converging on recognition and mining workloads, and neural-network-based algorithms can tackle a significant share of these applications.
• Best of both worlds: accelerators with high performance/efficiency and yet broad application scope.
• Two types of NN: Convolutional (CNN) and Deep (DNN).
CNN vs. DNN
• Deep Neural Networks:
– Used in object detection, parsing, language modeling.
– Each neuron has its own unique weights (no weight sharing).
– Sizes ranging up to 10 billion neurons.
• Convolutional Neural Networks:
– Used in computer vision, recognition, etc.
– Each neuron shares its weights with other neurons.
– Sizes are smaller (e.g., 60 million weights).
– Because the weights have a small memory footprint, a whole CNN can be stored within a small SRAM next to the computational operators.
– DRAM accesses are no longer needed to fetch the weights in order to process each input.
Memory Constrained Acceleration
• The highest energy expense comes from data movement, in particular DRAM accesses, rather than from computation.
• DRAM accesses are needed to fetch weights and inputs.
• Conventionally, the image is acquired by the CMOS/CCD sensor, sent to DRAM, and later fetched by the CPU/GPU for recognition processing.
• The small size of the CNN accelerator makes it possible to hoist it next to the sensor and send only the few output bytes of the recognition process to DRAM or the host processor.
Shi + DianNao = ShiDianNao
• DianNao*: a synthesized (place & route) accelerator design for large-scale CNNs and DNNs.
• Achieves high throughput in a small area, power and energy footprint.
• Exploits the locality properties of processing layers and introduces custom-designed storage structures that reduce memory overhead.
• ShiDianNao builds atop this to almost completely eliminate DRAM accesses.
* DianNao: A Small-Footprint High-Throughput Accelerator for Ubiquitous Machine-Learning
Convolutional Neural Networks
• Input: 2D arrays of input pixels/neurons.
• Convolution Layer:
– Set of local filters designed for identifying characteristics of input feature maps.
– Processes a convolutional window capturing Kx × Ky input neurons in one input feature map.
– A 2D array of local filters produces an output feature map, where each local filter corresponds to an output neuron.
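The convolution layer described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper or the slides: it assumes a single input feature map, stride 1, and no padding, and the function name `conv2d_single_map` and the toy values are made up for the example.

```python
def conv2d_single_map(inp, kernel):
    """Stride-1, no-padding convolution of one input feature map.

    inp    -- 2D list of input neurons, indexed inp[y][x]
    kernel -- 2D list of Ky x Kx shared weights (one local filter)

    As is usual for CNNs, the kernel is not flipped (cross-correlation).
    """
    ky, kx = len(kernel), len(kernel[0])
    out_h = len(inp) - ky + 1
    out_w = len(inp[0]) - kx + 1
    out = [[0] * out_w for _ in range(out_h)]
    for oy in range(out_h):
        for ox in range(out_w):
            acc = 0
            # Each output neuron sees a Kx x Ky window of input neurons.
            for j in range(ky):
                for i in range(kx):
                    acc += inp[oy + j][ox + i] * kernel[j][i]
            out[oy][ox] = acc
    return out


# Toy 3x4 input feature map and 2x2 kernel (illustrative values only).
feature_map = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
]
kernel = [
    [1, 0],
    [0, -1],
]
result = conv2d_single_map(feature_map, kernel)
print(result)
```

Every output neuron reuses the same Kx × Ky weights, which is the weight sharing that keeps the CNN's memory footprint small.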
Convolutional Neural Networks
• Pooling Layer:
– Down-samples an input feature map by performing maximum or average operations on non-overlapping windows of input neurons.
• Normalization Layer:
– Two types: local response normalization (LRN) and local contrast normalization (LCN).
– Improves the recognition accuracy of the CNN.
• Classifier Layer: a CNN integrates one or more classifier layers to compute the final result.
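As a hedged illustration of the pooling layer above, the sketch below down-samples a feature map over non-overlapping square windows with either a maximum or an average. The helper name `pool2d`, the square-window restriction, and the toy data are assumptions made for the example.

```python
def pool2d(inp, window, mode="max"):
    """Down-sample a 2D feature map over non-overlapping window x window tiles."""
    h, w = len(inp), len(inp[0])
    out = []
    # Stepping by `window` makes the tiles non-overlapping.
    for y in range(0, h - window + 1, window):
        row = []
        for x in range(0, w - window + 1, window):
            vals = [inp[y + j][x + i]
                    for j in range(window)
                    for i in range(window)]
            if mode == "max":
                row.append(max(vals))
            else:  # average pooling
                row.append(sum(vals) / len(vals))
        out.append(row)
    return out


fm = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 8],
    [2, 6, 3, 4],
]
pooled_max = pool2d(fm, 2, "max")
pooled_avg = pool2d(fm, 2, "avg")
print(pooled_max, pooled_avg)
```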
Code Snippets
* DianNao: A Small-Footprint High-Throughput Accelerator for Ubiquitous Machine-Learning
Mapping Principles
• Processing elements (PEs):
– represent neurons,
– are organized in a 2D mesh,
– receive broadcasted kernel elements,
– receive the input feature map through right-left and up-down shifts,
– accumulate locally the resulting output feature map.
• Temporal sharing / sequential mapping.
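The mapping principles above can be mimicked in software. The following is a simplified behavioral sketch, not the hardware design: it assumes one PE per output neuron in a `px` by `py` mesh, one kernel element broadcast per cycle, and it models the right-left and up-down shifts of the input feature map with an index offset rather than real inter-PE transfers. The function name `conv_on_pe_mesh` and the toy values are illustrative.

```python
def conv_on_pe_mesh(inp, kernel, px, py):
    """Simulate a px x py PE mesh computing one tile of an output feature map.

    Each PE (x, y) owns one accumulator, i.e. one output neuron.
    Returns the accumulators and the number of broadcast cycles.
    """
    ky, kx = len(kernel), len(kernel[0])
    acc = [[0] * px for _ in range(py)]  # local accumulator in every PE
    cycles = 0
    for j in range(ky):
        for i in range(kx):
            w = kernel[j][i]  # one kernel element broadcast to all PEs
            for y in range(py):
                for x in range(px):
                    # The (i, j) offset stands in for the shifted input
                    # that the PE would receive from the mesh.
                    acc[y][x] += w * inp[y + j][x + i]
            cycles += 1  # all PEs work in parallel within a cycle
    return acc, cycles


feature_map = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
]
kernel = [
    [1, 0],
    [0, -1],
]
acc, cycles = conv_on_pe_mesh(feature_map, kernel, px=3, py=2)
print(acc, cycles)
```

In this model a tile of px × py output neurons takes Kx × Ky cycles, which is the temporal sharing the slide refers to: each PE is time-shared across the input neurons of its window.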
Architecture: Computation
• Two buffers for input and output neurons (NBin and NBout), and a buffer for synapses (SB).
• A neural functional unit (NFU) plus an arithmetic unit (ALU) for computing output neurons.
• 16-bit fixed-point operations.
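To make the 16-bit arithmetic concrete, here is a small sketch of a saturating 16-bit fixed-point multiply. The split into 8 integer and 8 fractional bits (`FRAC_BITS = 8`) is an assumption chosen for illustration; the slides do not state the number format, and the helper names are hypothetical.

```python
FRAC_BITS = 8            # assumed Q8.8 split; not specified in the slides
INT16_MIN, INT16_MAX = -32768, 32767


def saturate(v):
    """Clamp an integer to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, v))


def to_fixed(x):
    """Convert a float to a saturated 16-bit fixed-point integer."""
    return saturate(int(round(x * (1 << FRAC_BITS))))


def fixed_mul(a, b):
    """Multiply two fixed-point values and rescale back to 16 bits."""
    return saturate((a * b) >> FRAC_BITS)


def to_float(a):
    return a / (1 << FRAC_BITS)


a = to_fixed(1.5)
b = to_fixed(-2.25)
product = fixed_mul(a, b)
print(a, b, product, to_float(product))
```

Values outside the representable range saturate instead of wrapping around, which is a common choice for neural-network datapaths but is also an assumption here.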
Architecture: Computation
• The NFU is organized as a 2D array to handle the 2D data used in convolution.
• The intuitive mapping assigns a Kx × Ky kernel to Kx × Ky PEs that together produce one output neuron; this is disadvantageous.
• Instead: a single PE per output neuron, time-shared across input neurons (increased latency?).
• Each PE can perform addition, multiplication or comparison.
• A lightweight ALU implements the non-linear activation function (in the form of piecewise linear interpolation).
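Piecewise linear interpolation of an activation function can be sketched as follows. This is an illustrative software model only: the choice of `tanh`, the input range of -4 to 4, and the 16 segments are assumptions, and the helper names are hypothetical.

```python
import math


def build_pwl_table(fn, lo, hi, segments):
    """Precompute (slope, intercept) for each equal-width segment of [lo, hi]."""
    step = (hi - lo) / segments
    table = []
    for s in range(segments):
        x0 = lo + s * step
        x1 = x0 + step
        slope = (fn(x1) - fn(x0)) / step
        intercept = fn(x0) - slope * x0
        table.append((slope, intercept))
    return table, step


def pwl_eval(x, table, lo, hi, step):
    """Approximate fn(x) with one multiply and one add per evaluation."""
    x = max(lo, min(hi, x))                          # clamp to the table range
    idx = min(int((x - lo) / step), len(table) - 1)  # pick the segment
    slope, intercept = table[idx]
    return slope * x + intercept


LO, HI, SEGMENTS = -4.0, 4.0, 16   # assumed range and segment count
table, step = build_pwl_table(math.tanh, LO, HI, SEGMENTS)

# Worst-case absolute error on a 0.01 grid over the table range.
max_err = max(
    abs(pwl_eval(k / 100.0, table, LO, HI, step) - math.tanh(k / 100.0))
    for k in range(-400, 401)
)
print(max_err)
```

The appeal for hardware is that each evaluation needs only a table lookup, one multiplication, and one addition, which fits the add/multiply capabilities listed above.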
Inter-PE data propagation
• The required data is available in NBin/NBout, but repeatedly reading it requires high bandwidth.
• Inter-PE data propagation allows efficient data reuse.
• Each PE temporarily stores the input neurons it receives and forwards them to its left and lower neighbors.
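A back-of-the-envelope model shows why inter-PE propagation saves buffer bandwidth. The sketch below is an idealized count, not a measurement from the paper: it assumes stride 1, that without propagation every PE reads all Kx × Ky of its inputs from the buffer, and that with propagation each distinct input neuron of the tile is read from the buffer exactly once. The 8 × 8 mesh and 5 × 5 kernel are assumed example sizes.

```python
def reads_without_propagation(px, py, kx, ky):
    """Every PE fetches its full Kx x Ky window from the buffer."""
    return px * py * kx * ky


def reads_with_propagation(px, py, kx, ky):
    """Each distinct input neuron covered by the tile is fetched once.

    With stride 1, a px x py tile of outputs touches
    (px + kx - 1) x (py + ky - 1) input neurons.
    """
    return (px + kx - 1) * (py + ky - 1)


PX, PY = 8, 8   # assumed PE mesh size
KX, KY = 5, 5   # assumed kernel size

naive = reads_without_propagation(PX, PY, KX, KY)
reused = reads_with_propagation(PX, PY, KX, KY)
print(naive, reused, naive / reused)
```

Under these assumptions the number of buffer reads per tile drops by roughly an order of magnitude; real savings depend on stride, kernel size, and how the controller schedules reads.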
Architecture: Storage
• On-chip SRAM stores data and instructions.
• ~136 KB of storage is sufficient for the total data of practical CNNs.
• The design implements 288 KB of SRAM.
Architecture: Control
• Supports efficient computation and data reuse.
• NB (NBin/NBout): 2 × Py banks, each Px × 2 bytes wide (one row of Px 16-bit neurons per bank).
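The bank arithmetic above is easy to check numerically. The sketch below simply evaluates the two expressions from the slide; the 8 × 8 PE mesh is an assumed example size and the function name `nb_layout` is hypothetical.

```python
def nb_layout(px, py, bytes_per_neuron=2):
    """Return (number of banks, bank width in bytes, bytes across all banks)."""
    banks = 2 * py                         # 2*Py banks
    bank_width = px * bytes_per_neuron     # Px*2 bytes, i.e. Px 16-bit neurons
    return banks, bank_width, banks * bank_width


banks, bank_width, row_bytes = nb_layout(px=8, py=8)  # assumed mesh size
print(banks, bank_width, row_bytes)
```

A bank width of Px neurons means a single bank access can supply one full row of the PE mesh.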
Read modes
Hierarchical control FSM
• A two-level hierarchical FSM describes the execution flow.
• Level 1: ALU task / layer type etc.
• Level 2: Within-layer execution steps.
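A two-level control structure of this kind can be sketched as nested state machines. The state names below (`CONV`, `POOL`, `CLASSIFIER`, `LOAD`, `COMPUTE`, `STORE`) are hypothetical placeholders chosen for illustration; the slides do not list the actual states.

```python
from enum import Enum


class LayerState(Enum):
    """Level 1: which kind of layer/task is running (hypothetical names)."""
    CONV = "conv"
    POOL = "pool"
    CLASSIFIER = "classifier"


class StepState(Enum):
    """Level 2: execution steps inside one layer (hypothetical names)."""
    LOAD = "load"
    COMPUTE = "compute"
    STORE = "store"


def run(layers):
    """Walk the level-1 states; for each one, walk all level-2 steps."""
    trace = []
    for layer in layers:           # level-1 transition
        for step in StepState:     # level-2 transitions, in definition order
            trace.append((layer.name, step.name))
    return trace


trace = run([LayerState.CONV, LayerState.POOL, LayerState.CLASSIFIER])
for entry in trace:
    print(entry)
```

Splitting control this way keeps each state machine small: the outer level only needs to know the layer sequence, and the inner level only the steps within one layer.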
Mapping Conv. Layer to design
Results: Parameters/overheads
Results: Performance
Results: Energy
Conclusions
• Versatile accelerator for visual recognition algorithms.
• About 50x, 30x, and 1.8x faster than CPU, GPU, and DianNao, respectively.
• About 4700x and 60x less energy than GPU and DianNao, respectively.
• “Only” 3x the area of DianNao.
• 320 mW at 1 GHz.
Questions