Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor...
Transcript of Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor...
![Page 1: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/1.jpg)
Neural Network Accelerationon Mobile Devices
Dongyoung KimMachine learning team
HyperconnectSeoul, South Korea
![Page 2: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/2.jpg)
Agenda
● Introduction
● Neural Processing Unit (NPU)○ NPU utilizing sparsity
Zero-aware neural network accelerator (ZeNA)
○ NPU utilizing reduced precision Outlier quantization and Precision highway
● Working as an AI System Architect in the Industry
● On-device Machine Learning○ Neural network acceleration on mobile CPU
○ Optimizing neural network for mobile devices
Temporal convolution for real-time keyword spotting on mobile devices
![Page 3: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/3.jpg)
Introduction
![Page 4: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/4.jpg)
Deep Neural Network (DNN)
Self driving car Voice recognition Feed and ad
● Deep neural networks are ubiquitous in various applications.
![Page 5: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/5.jpg)
DNN on Real‐time Mobile Devices
Self driving car Virtual Reality (VR) Augmented Reality (AR)
● Especially, DNN shows reliable result on real-time mobile devices.
![Page 6: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/6.jpg)
Challenges of DNN Applications on Mobile Devices
Embedded systemsGPU server
● DNN based applications perform well on large devices which have abundant resources but they are unsuitable for mobile devices.
![Page 7: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/7.jpg)
Neural Processing Unit (NPU)
![Page 8: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/8.jpg)
Neural Processing Unit (NPU)
A11 / A12 Bionic (Apple)
Snapdragon 855 (Qualcomm)
Kirin 970 (Huawei) Exynos 9820 (Samsung)
Booming of NPU
![Page 9: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/9.jpg)
NPU Utilizing Sparsity
![Page 10: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/10.jpg)
Sparsity
● Significant portion of input values in a convolutional layer is zero.● Large number of ineffectual computations can be skipped.● However, it is difficult to utilize sparsity on CPU.
AlexNet
Layer Zero Weight [%] Zero Activation [%]conv1 15.7 0
conv2 62.1 50.9
conv3 65.4 76.3
conv4 62.8 61.8
conv5 63.1 59.0
Pruning ReLU
![Page 11: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/11.jpg)
Previous Works
AlexNet
Layer Zero Weight [%] Zero Activation [%]conv1 15.7 0
conv2 62.1 50.9
conv3 65.4 76.3
conv4 62.8 61.8
conv5 63.1 59.0
Energy ↓
Run me ↓Energy ↓
Run me ↓
● DaDianNao: wide SIMD-like architecture.● Eyeriss: clock-gating for multiplications with zero activations.● Cnvlutin: utilizes zero activations for performance and energy.● Cambricon-X: utilizes zero weights for performance and energy.
Exploit zero values in both kernel weights and input activations
![Page 12: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/12.jpg)
Basic Idea of Zero‐aware Neural Network Accelerator (ZeNA)
PE 0
PE 1
Proportional to intersection of non-zero inputs
K: kernel weightsA: input activationsP: partial sums
Cambricon-X
Cnvlutin
BroadcastZeNA
Zero-induced load imbalance
*
*
Zero bit-vectors
Zero bit-vectors
![Page 13: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/13.jpg)
ZeNA Architecture Overview
Act SRAM
ReLUmodule
PE array
Activation
Weight
Conv result
Psum
Bitvec
Output FM
PE PE PE PE
PE PE PE PE
PE PE PE PE
PE PE PE PE
WeightSRAM
Weight SRAM storesKernel weightsNon-zero bit-vectors
Output FM
BitvecReLU module generates
Output feature mapsNon-zero bit-vectors
Bus connectsOn-chip SRAM and PE array
Act SRAM storesActivationsOutput partial sumsOutput feature mapsNon-zero bit-vectors
![Page 14: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/14.jpg)
Work Group (WG)
Activation Kernel
Activation Kernel
WG 1
Activation Kernel
WG 0
● Work group (WG): Spatial dimension of input activations is divided into work groups.
![Page 15: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/15.jpg)
Work Group (WG)
Activation Kernel
Activation Kernel
WG 0
WG 1
Sub-WG: WG is divided into multiple subsets in the channel dimension of output feature map.
Activation Kernel
Sub-WG 0 of WG 0
Activation Kernel
Sub-WG 1 of WG 0
Activation Kernel
Sub-WG 0 of WG 1
Activation Kernel
Sub-WG 1 of WG 1
![Page 16: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/16.jpg)
Work Group (WG)
Activation Kernel
Sub-WG 0 of WG 0
Activation Kernel
Sub-WG 0 of WG 1
Sub-WG 0 of WG 0
Activation tiles Kernel tiles
Sub-WG 0 of WG 1
Activation tiles Kernel tiles
● Each sub-WG consists of activation tiles and kernel tiles.
![Page 17: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/17.jpg)
Computation ProcedureActivation Kernel
Output feature map
WG 0
WG 1
Sub-WG 0
Sub-WG 1
● Our accelerator performs convolution with activation and kernel tiles iterativelyto compute output feature maps.
![Page 18: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/18.jpg)
Data Flow and Computation: Kernel Broadcast
Weight
Act SRAM
ReLUmodule
Activation
Conv result
Psum
Bitvec
Output FM
WeightSRAM
PE array
PE group 0PE 0 PE 1
PE 2 PE 3
PE 0 PE 1
PE 2 PE 3
PE group 1
![Page 19: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/19.jpg)
Data Flow and Computation: Kernel Broadcast
Weight SRAM
Weight
Activation Output feature mapKernel
* =
PE array
PE group 0PE 0 PE 1
PE 2 PE 3
PE 0 PE 1
PE 2 PE 3
PE group 1Broadcast
WG1
WG0 Sub-WG 0
Sub-WG 1
![Page 20: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/20.jpg)
Data Flow and Computation: Activation Broadcast
Weight
Act SRAM
ReLUmodule
Activation
Conv result
Psum
Bitvec
Output FM
WeightSRAM
PE array
PE group 0PE 0 PE 1
PE 2 PE 3
PE 0 PE 1
PE 2 PE 3
PE group 1
![Page 21: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/21.jpg)
Data Flow and Computation: Activation Broadcast
Act SRAM
PE array
PE group 0PE 0 PE 1
PE 2 PE 3
PE 0 PE 1
PE 2 PE 3
PE group 1
BroadcastBroadcast only the difference
Previous activation tiles are stored in local buffer
Zing-zag scan
Activation Output feature mapKernel
* =
WG1
WG0 Sub-WG 0
Sub-WG 1
Activation
![Page 22: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/22.jpg)
Zero‐aware PE
Non-zero index
Act buffer
Act zero bit-vector1 1 0 0 1
0 1 0 1 1Weight zero bit-vector
Fetch controller
Weight buffer
Psum buffer+
AND
× ×
PE
3 0
![Page 23: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/23.jpg)
Zero‐induced Load Imbalance
PE 0
PE 1
*
*
PE 0
PE 1
*
*
Sub-WG 0 Sub-WG 1
Before kernel allocation is applied Total runtime
● Typically, kernel weights are allocated to PEs based on the kernel index.
![Page 24: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/24.jpg)
Zero‐aware Kernel Allocation
PE 0
PE 1
*
*
PE 0
PE 1
*
*
Sub-WG 0 Sub-WG 1
Before kernel allocation is applied Total runtime
After kernel allocation is applied Total runtime● First, we sort the sets of kernel weights based on the number of non-zero weights in the sets.
Then, allocate sets of kernel weights to PEs in the sorted order
![Page 25: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/25.jpg)
Performance
0
2
4
6
Alexnet VGG-16
Spee
dup
WZ AZ WAZ WAZ+KA
AlexNet
4x (AlexNet) and 5.2x (VGG-16) speed up w.r.t. Eyeriss. 1.8x (AlexNet) and 2.1x (VGG-16) speed up w.r.t. AZ (Cnvlutin). 19.6% (AlexNet) and 25.8% (VGG-16) speed up w.r.t. WAZ.
4x
5.2x
1.8x
2.1x
19.6%
25.8%
![Page 26: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/26.jpg)
Energy
11.3% (AlexNet) and 18% (VGG-16) energy reduction w.r.t. Eyeriss.
0
0.2
0.4
0.6
0.8
1
AlexNet VGG-16
Norm
alize
d en
ergy
/ Im
age
Eyeriss WAZ WAZ+KA
0
0.2
0.4
0.6
0.8
1
AlexNet VGG-16
Norm
alize
d en
ergy
/ Im
age
Eyeriss WAZ WAZ+KA
0
0.2
0.4
0.6
0.8
1
AlexNet VGG-16
Norm
alize
d en
ergy
/ Im
age
Eyeriss WAZ WAZ+KA
0
0.2
0.4
0.6
0.8
1
AlexNet VGG-16
Norm
alize
d en
ergy
/ Im
age
Eyeriss WAZ WAZ+KA
0
0.2
0.4
0.6
0.8
1
AlexNet VGG-16
Norm
alize
d en
ergy
/ Im
age
Eyeriss WAZ WAZ+KA
0
0.2
0.4
0.6
0.8
1
AlexNet VGG-16
Nor
mal
ized
ene
rgy
/ Im
age
SRAM leakage SRAM Local buffer Logic Bus
Exploiting zero weights
11.3% 18%
AlexNet VGG-16
![Page 27: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/27.jpg)
NPU Utilizing Reduced Precision
![Page 28: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/28.jpg)
Reduced Precision
● Neural network shows comparable accuracy after applying quantization.● Quantization method reduces computation complexity and memory
footprint.
![Page 29: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/29.jpg)
Outlier Quantization
Outlier: weight and activation having larger value than threshold. Outlier incurs quantization error. Outlier-aware Quantization
○ Keep outliers in high-precision.○ Apply linear quantization to data except outliers.
Outliers
High-precision outliers (1~3 %)(No quantization error)
Low-precision linear quantization
(Small quantization error)
![Page 30: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/30.jpg)
Accuracy
● 4-bit quantization with outliers of 0.5%already gives good accuracy without fine-tuning.
● Outliers of 3.5% lose only <1% accuracy.
![Page 31: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/31.jpg)
Precision Highway
Conventional approach Precision highway
● Precision Highway
○ Keep residual path in high-precision (8-bit).
○ Apply low-bit linear quantization (2-bit) to data except residual path.
![Page 32: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/32.jpg)
Precision Highway vs. Zhuang’s (2‐bit)
Zhuang’s: [Bohan Zhuang et al. Towards effective low-bitwidth convolutional neural networks. Computer Vision and Pattern Recognition(CVPR), 2018.]
Precision highway shows 66.71% and 73.55% TOP-1 accuracy in ResNet-18 and ResNet-50, respectively.
![Page 33: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/33.jpg)
NPU Supporting Mixed‐precision Computation (OLAccel)
● Mixed-precision computation
○ In case of outlier:
4-bit data (dense) + 16-bit activation/8-bit weight outliers (sparse)
○ In case of precision highway:
2-bit data (dense)+ 8-bit residual path (sparse)
![Page 34: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/34.jpg)
Area and Energy Reduction
● 82.3 % reduction in chip area (16-bit vs. 3-bit).● 73.1 % reduction in energy consumption (16-bit vs. 3-bit).
![Page 35: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/35.jpg)
Working as an AI System Architect in the Industry
![Page 36: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/36.jpg)
![Page 37: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/37.jpg)
![Page 38: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/38.jpg)
![Page 39: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/39.jpg)
Neural Network Acceleration on Mobile CPU
![Page 40: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/40.jpg)
Running AI Applications Using Mobile CPU
● AI applications are widely used even on low-end mobile devices where NPU is absent.
● CPU is required for executing specific operations utilized in state-of-the-art neural networks.
![Page 41: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/41.jpg)
Example of 1X1 Convolution
Input activation
Weight
* =
Output activation
∈
∈
∈
8
1
16
1
18
1
16
![Page 42: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/42.jpg)
Example of 1X1 Convolution
1
Input activation
Weight
* =
Output activation
∈
∈
∈
8
1
16
1
18
1
16
0 ,… , 7
0 ,… , 7
![Page 43: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/43.jpg)
1X1 Convolution on ARM CPU with 8‐bit Quantization (w/o quality loss)
Example of 1X1 convolution Pseudo code (int8)
Input activation
Weight
* =
Output activation
∈
∈
∈
8
1
16
1
18
1
16
0 ,… , 7
0 ,… , 7
![Page 44: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/44.jpg)
1X1 Convolution on ARM CPU with 8‐bit Quantization (w/o quality loss)
4 cycle
64 cycle
4 cycle
4 cycle
40 cycle
4 cycle
1.5x speed up
Pseudo code (int8; 48 cycle)Pseudo code (float32; 72 cycle)
![Page 45: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/45.jpg)
1X1 Convolution on ARM CPU with 8‐bit Quantization (w/ quality loss)
Pseudo code (int8 w/ quality loss)Pseudo code (int8)
![Page 46: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/46.jpg)
1X1 Convolution on ARM CPU with 8‐bit Quantization (w/ quality loss)
4 cycle
64 cycle
4 cycle
2 cycle
24 cycle
2 cycle
2.6x speed up
Pseudo code (float32; 72 cycle) Pseudo code (int8 w/ quality loss; 28 cycle)
![Page 47: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/47.jpg)
Optimizing Neural Network for Mobile Devices
![Page 48: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/48.jpg)
Neural Network Optimization for Mobile Devices
● IT industries are interested in IoT, AR/VR, robotics and self driving car.● They require tremendous computations.● Neural network optimization for mobile devices is required.
AR glass (Google, Microsoft)
Robot (Amazon)
IoT devices(Naver, Google, Amazon,
Apple, …)
Self driving car (Tesla, Waymo, Uber)
![Page 49: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/49.jpg)
Keyword Spotting (KWS)
● Keyword spotting (KWS) deals with the identification of predefined keywords in utterances.
● Recognizing wake-up word (“Hey Siri” and “OK Google”) and distinguishing common command (“yes” or “no”).
Siri: “Hey Siri” Alexa: “Alexa” Google assistant: “OK, Google”Widely used for hands-free control of mobile applications
![Page 50: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/50.jpg)
CNN‐based KWS
● CNN-based KWS studies show remarkable accuracy.● Most of CNN-based KWS approaches receive features as a 2D input of a
convolutional network.
Utterance2D input
(e.g., MFCC) CNN
![Page 51: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/51.jpg)
Challenges of KWS on Real‐time Mobile Devices
● Since use of KWS is commonly concentrated on mobile devices, the response of KWS should be both fast and accurate.
It is challenging to implement fast and accurate KWS on mobile devices with restricted hardware resources
![Page 52: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/52.jpg)
Preliminary: Spectrogram
● Time-frequency representation of a speech signal is referred to as spectrogram.
Speech signal
Ampl
itude
Time
Spectrogram
Sampling
FFT
Ampl
itude
Frequency
Ampl
itude
Frequency
![Page 53: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/53.jpg)
Preliminary: Mel‐frequency Filter
● It is observed that human ears act as filter.
● Mel-frequency filters are non-uniformly spaced on the frequency axis.
Speech signalAm
plitu
de
Time
Spectrogram
Ampl
itude
Frequency
Mel-frequency filters
Mel-spectrogram
![Page 54: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/54.jpg)
Preliminary: Mel‐frequency Cepstral Coefficients (MFCC)
● Cepstral coefficients obtained from Mel-spectrogram are referred to as Mel-Frequency Cepstral Coefficients (MFCC).
Speech signalAm
plitu
de
Time
Mel-spectrogram
MFCC
Ampl
itude
Frequency
Cepstral analysis
![Page 55: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/55.jpg)
Preliminary: Mel‐frequency Cepstral Coefficients (MFCC)
● MFCC: speech signal is represented as a sequence of cepstral vectors.
Speech signalAm
plitu
de
Time
MFCC
n
1
Tiime
MFC
Coe
ffici
ents
(re
pres
ent f
requ
ency
) Time (==1)
MFC Coefficients
MFC Coefficient
![Page 56: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/56.jpg)
Conventional 2D Convolution for KWS
Input feature map
MFCC
time ( t )frequ
ency
( f )
Output feature mapWeights
1 3
31
1* =
● Conventional 2D convolution for KWS utilizes input tensor where w = t, h = f (or vice versa), and c = 1.
![Page 57: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/57.jpg)
Conventional 2D Convolution for KWS
Input feature map
MFCC
time ( t )frequ
ency
( f )
Output feature mapWeights
1 3
3
* =(1)(2)
● Conventional 2D convolution slides window along the large spatial dimension (2D) of input feature map.
![Page 58: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/58.jpg)
Proposed Temporal Convolution for KWS
Input feature map
MFCC
time ( t )
3
I
Input feature map
MFCC
time ( t )frequ
ency
( f )
Conventional approach Proposed temporal convolution
![Page 59: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/59.jpg)
Proposed Temporal Convolution for KWS
Input feature map
MFCC
time ( t )
Weights Output feature map
1* =3
I
(1)
(2)
● Proposed temporal convolution slides window along the small spatial dimension (1D) of input feature map.
31
![Page 60: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/60.jpg)
Problem of Conventional 2D Convolution for KWS
Tiime (t)
MFC
Coe
ffici
ents
(f)
(max
= 8
0)Conventional
2D convolution
1st conv(3×3 weight)
Receptive field(3×3)
t
f
● Both low-and-high frequency data at the same time step includes informative features.
● Since modern CNNs commonly utilize small kernels, it is difficult to capture informative features from both low and high frequencies.
Input MFCC
![Page 61: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/61.jpg)
Small Receptive Field of Conventional 2D Convolution
Tiime (t)
MFC
Coe
ffici
ents
(f)
(max
= 8
0)Conventional
2D convolution
1st conv(3×3 weight)
Receptive field(2N+1) × (2N+1)
t
f
● Assume that N convolutional layers of 3 × 3 weights with a stride of one exist, the receptive field of the network only grows up to 2N+ 1.
● Conventional 2D convolution requires a large number of operations to increase receptive field.
2nd conv (3×3 weight)
N th conv (3×3 weight)
Not suitable for running on real-time mobile devices
Input MFCC
![Page 62: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/62.jpg)
Large Receptive Field of Proposed Temporal Convolution
Tiime (t)1st conv
(3×1 weight)
ProposedTemporal
convolution
MFC
Coe
ffici
ents
(f)
(max
= 8
0)
● In the proposed method, all lower-level features always participate in forming the higher-level features in the next layer.
Tiime (t)
MFC
Coe
ffici
ents
(f)
(max
= 8
0)Conventional
2D convolution
1st conv(3×3 weight)
t
f
2nd conv (3×3 weight)
N th conv (3×3 weight)
Every frequency is covered
Input MFCC
Input MFCC
![Page 63: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/63.jpg)
Small Footprint and Low Computational ComplexityInput feature map Output feature mapWeights
1 3
3
* =
Input feature map Output feature mapWeights
3
Conventional2D convolution
* =ProposedTemporal
convolution
Note that both the parameters of a conventional 2D convolution and that of the temporal convolution have the same size in this example by setting t = 98, f = 40, c = 160, and c` = 12.
MACs = 3 × 3 × 1 × × ×
= 5,644,800
MACs = = 3 × 1 × × × 1 × ′
= 141,120
31
![Page 64: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/64.jpg)
End‐to‐end Pipeline for Mobile Devices
● End-to-end pipeline for Mobile Devices
○ We release end-to-end pipeline codebase for training, evaluating, and benchmarking the baseline models and together with the proposed models.
● Proposed codebase includes following components:
○ TensorFlow models
○ Scripts to convert the models into the TensorFlow Lite models (mobile devices)
○ Pre-built TensorFlow Lite Android benchmark tool (mobile performance measurement)
● Git repo
○ https://github.com/hyperconnect/TC-ResNet
![Page 65: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/65.jpg)
Accuracy and Inference Time
● TC-ResNet8 improves 11.5%p accuracy with comparable latency compared to latency state-of-the-art (CNN-2).
![Page 66: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/66.jpg)
Accuracy and Inference Time
● TC-ResNet8 achieves 385x speedup while improving 0.3%p accuracy compared to accuracy state-of-the-art (Res15).
![Page 67: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/67.jpg)
Conclusion
![Page 68: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/68.jpg)
Neural Network Optimization in the Future
● Edge devices will run more complex tasks in the future.● Neural network acceleration becomes more important in the future.
AR glass (Google, Microsoft)
Robot (Amazon)
IoT devices(Naver, Google, Amazon,
Apple, …)
Self driving car (Tesla, Waymo, Uber)
![Page 69: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/69.jpg)
Thanks
![Page 70: Neural Network Acceleration on Mobile Devices · 2019. 9. 30. · Neural Network Optimizationfor Mobile Devices IT industries are interested in IoT, AR/VR, roboticsand self driving](https://reader034.fdocuments.in/reader034/viewer/2022051920/600db28104111826b24ffcb8/html5/thumbnails/70.jpg)
References● ZeNA
○ Dongyoung Kim, Soobeom Kim, Sungjoo Yoo, "FPGA Prototyping of Low-Precision Zero-Skipping Accelerator for Neural Networks," in Rapid System Prototyping (RSP), Nov. 2018.
○ Dongyoung Kim, Junwhan Ahn, and Sungjoo Yoo, "ZeNA: Zero-Aware Neural Network Accelerator," in IEEE Design & Test, Aug. 2017.
○ Dongyoung Kim, Junwhan Ahn, and Sungjoo Yoo, "A Novel Zero Weight/Activation-Aware Hardware Architecture of Convolutional Neural Network," in Proc. Design Automation and Test in Europe (DATE), Mar. 2017.
● Outlier quantization and Precision highway
○ Eunhyeok Park, Dongyoung Kim, Sungjoo Yoo, “Energy-Efficient Neural Network Accelerator based on Outlier-Aware Low Precision Computation," in International Symposium on Computer Architecture (ISCA), June 2018.
○ Eunhyeok Park, Dongyoung Kim, Sungjoo Yoo, Peter Vajda, “Precision Highway for Ultra Low-Precision Quantization,” arXiv preprint arXiv:1812.09818, Dec. 2018.
● Temporal convolution for real-time KWS
○ Seungwoo Choi, Seokjun Seo, Beomjun Shin, Hyeongmin Byun, Martin Kersner, Beomsu Kim, Dongyoung Kim, Sungjoo Ha, “Temporal Convolution for Real-time Keyword Spotting on Mobile Devices,” in Annual Conference of the International Speech Communication Associatio(INTERSPEECH), Sep. 2019.