ECE 636 Reconfigurable Computing Lecture 11 Reconfigurable Computing Applications
OrthrusPE: Runtime Reconfigurable Processing Elements for ...
Transcript of OrthrusPE: Runtime Reconfigurable Processing Elements for ...
![Page 1: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/1.jpg)
OrthrusPE: Runtime Reconfigurable Processing Elements
for Binary Neural Networks
Nael Fasfous
Technical University of Munich
Department of Electrical and Computer Engineering
Chair of Integrated Systems
![Page 2: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/2.jpg)
โข Optimization of Convolutional Neural Networks
โข Challenges of Binary Neural Networks
โข Motivation for Runtime Reconfigurable BNN Processing Elements
โข OrthrusPE: Dual Modes
โข Experimentation and Results
Outline
Nael Fasfous | Chair of Integrated Systems | TUM
![Page 3: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/3.jpg)
Optimization of Convolutional Neural Networks
Nael Fasfous | Chair of Integrated Systems | TUM
Structural Algorithmic Hardware
![Page 4: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/4.jpg)
Optimization of Convolutional Neural Networks
Nael Fasfous | Chair of Integrated Systems | TUM
Structural Algorithmic Hardware
Low-Rank
Decomp.Pruning Quantization Winograd FFT GEMM Vectorization PartitioningDataflow
![Page 5: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/5.jpg)
Optimization of Convolutional Neural Networks
Nael Fasfous | Chair of Integrated Systems | TUM
Structural Algorithmic Hardware
Low-Rank
Decomp.Pruning Quantization Winograd FFT GEMM Vectorization
Bypass
PartitioningDataflow
Tucker
Decomp.
CP
Decomp.
Channel-wise
Filter-wise
Element-wise
Winograd
Sparsity
Spectral
Sparsity
Filter
Sharing
Floating
PointFixed Point
Binarization
Pow. 2
Log Base
Weight
Sharing
Array
MappingIn Out
IFmap Filter
IO OW
IW
OS IS
RS WS
![Page 6: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/6.jpg)
Overview of Binary Neural Networks
Nael Fasfous | Chair of Integrated Systems | TUM
โข Binary Weights
โข Binary Activations
โข Binary Weights AND Activations
![Page 7: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/7.jpg)
Overview of Binary Neural Networks
Nael Fasfous | Chair of Integrated Systems | TUM
โข Binary Weights
โข Binary Activations
โข Binary Weights AND Activations Replace Multiplications by XNOR ops
Replace Accumulations by Popcount ops
![Page 8: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/8.jpg)
Overview of Binary Neural Networks
Nael Fasfous | Chair of Integrated Systems | TUM
โข Binary Weights
โข Binary Activations
โข Binary Weights AND Activations Replace Multiplications by XNOR ops
Replace Accumulations by Popcount ops
Reduce Memory requirements (1/16 of FP-16)
![Page 9: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/9.jpg)
Overview of Binary Neural Networks
Nael Fasfous | Chair of Integrated Systems | TUM
โข Binary Weights
โข Binary Activations
โข Binary Weights AND Activations Replace Multiplications by XNOR ops
Replace Accumulations by Popcount ops
Severe Information Loss
Reduce Memory requirements (1/16 of FP-16)
![Page 10: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/10.jpg)
Overview of Binary Neural Networks
Nael Fasfous | Chair of Integrated Systems | TUM
โข Binary Weights
โข Binary Activations
โข Binary Weights AND Activations Replace Multiplications by XNOR ops
Replace Accumulations by Popcount ops
Severe Information Loss
MNIST, CIFAR-10, SVHN
ImageNet
Reduce Memory requirements (1/16 of FP-16)
![Page 11: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/11.jpg)
Overview of Binary Neural Networks
Nael Fasfous | Chair of Integrated Systems | TUM
Naรฏve Binarization
1 -2 4
10 0.1 3
-5 7 -3
๐ ๐๐๐ ๐ฅ
1 0 1
1 1 1
0 1 0
Severe information loss:
10 and 0.1 have the same effect on the
network.
๐ ๐๐๐ ๐ฅ = 0, ๐ฅ < 01, ๐ฅ โฅ 0
![Page 12: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/12.jpg)
Overview of Binary Neural Networks
Nael Fasfous | Chair of Integrated Systems | TUM
Approximation through binary bases
1 -2 4
10 0.1 3
-5 7 -3
๐ ๐๐๐ ๐ฅ = 0, ๐ฅ < 01, ๐ฅ โฅ 0
๐ ๐๐๐ ๐ฅ โ 50 0 0
1 0 0
0 1 0
0 0 1
1 0 1
0 1 0
0 0 1
1 0 0
0 1 0
๐ ๐๐๐ ๐ฅ โ 3
๐ ๐๐๐ ๐ฅ โ 4
![Page 13: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/13.jpg)
Overview of Binary Neural Networks
Nael Fasfous | Chair of Integrated Systems | TUM
Approximation through binary bases
1 -2 4
10 0.1 3
-5 7 -3
๐ ๐๐๐ ๐ฅ โ 50 0 0
1 0 0
0 1 0
๐ ๐๐๐ ๐ฅ = 0, ๐ฅ < 01, ๐ฅ โฅ 0
0 0 1
1 0 1
0 1 0
0 0 1
1 0 0
0 1 0
๐ ๐๐๐ ๐ฅ โ 3
๐ ๐๐๐ ๐ฅ โ 4
Captured Information:
0.1 is less positive than 10
![Page 14: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/14.jpg)
Overview of Binary Neural Networks
Nael Fasfous | Chair of Integrated Systems | TUM
Approximation through binary bases
1 -2 4
10 0.1 3
-5 7 -3
๐ ๐๐๐ ๐ฅ โ 50 0 0
1 0 0
0 1 0
๐ ๐๐๐ ๐ฅ = 0, ๐ฅ < 01, ๐ฅ โฅ 0
0 0 1
1 0 1
0 1 0
0 0 1
1 0 0
0 1 0
๐ ๐๐๐ ๐ฅ โ 3
๐ ๐๐๐ ๐ฅ โ 4
Captured Information:
0.1 is less positive than 10
4 and 3 lie between 10 and 0.1
![Page 15: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/15.jpg)
Overview of Binary Neural Networks
Nael Fasfous | Chair of Integrated Systems | TUM
Approximation through binary bases
1 -2 4
10 0.1 3
-5 7 -3
๐ ๐๐๐ ๐ฅ โ 50 0 0
1 0 0
0 1 0
๐ ๐๐๐ ๐ฅ = 0, ๐ฅ < 01, ๐ฅ โฅ 0
0 0 1
1 0 1
0 1 0
0 0 1
1 0 0
0 1 0
๐ ๐๐๐ ๐ฅ โ 3
๐ ๐๐๐ ๐ฅ โ 4
Captured Information:
0.1 is less positive than 10
4 and 3 lie between 10 and 0.1
4 and 3 are single unit away from
each other
![Page 16: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/16.jpg)
Overview of Binary Neural Networks
Nael Fasfous | Chair of Integrated Systems | TUM
Approximation through binary bases
1 -2 4
10 0.1 3
-5 7 -3
๐ ๐๐๐ ๐ฅ โ 50 0 0
1 0 0
0 1 0
๐ ๐๐๐ ๐ฅ = 0, ๐ฅ < 01, ๐ฅ โฅ 0
0 0 1
1 0 1
0 1 0
0 0 1
1 0 0
0 1 0
๐ ๐๐๐ ๐ฅ โ 3
๐ ๐๐๐ ๐ฅ โ 4
Captured Information:
0.1 is less positive than 10
4 and 3 lie between 10 and 0.1
4 and 3 are single unit away from
each other
Information can be extracted for
negative numbers, e.g. ๐ ๐๐๐ ๐ฅ + 3
![Page 17: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/17.jpg)
Overview of Binary Neural Networks
Nael Fasfous | Chair of Integrated Systems | TUM
Approximation through binary bases
1 -2 4
10 0.1 3
-5 7 -3
๐ ๐๐๐ ๐ฅ โ 50 0 0
1 0 0
0 1 0
๐ ๐๐๐ ๐ฅ = 0, ๐ฅ < 01, ๐ฅ โฅ 0
0 0 1
1 0 1
0 1 0
0 0 1
1 0 0
0 1 0
๐ ๐๐๐ ๐ฅ โ 3
๐ ๐๐๐ ๐ฅ โ 4
๐๐๐๐ถ๐๐๐ฃ
๐๐๐๐ถ๐๐๐ฃ
๐๐๐๐ถ๐๐๐ฃ
๐ถ๐๐๐ฃ
ร ๐ผ1
ร ๐ผ2
ร ๐ผ3
[1] X. Lin et al., โTowards accurate binary convolutional neural network,โ NIPS, 2017.
![Page 18: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/18.jpg)
Overview of Binary Neural Networks
Nael Fasfous | Chair of Integrated Systems | TUM
Approximation through binary bases
1 -2 4
10 0.1 3
-5 7 -3
๐ ๐๐๐ ๐ฅ โ 50 0 0
1 0 0
0 1 0
๐ ๐๐๐ ๐ฅ = 0, ๐ฅ < 01, ๐ฅ โฅ 0
0 0 1
1 0 1
0 1 0
0 0 1
1 0 0
0 1 0
๐ ๐๐๐ ๐ฅ โ 3
๐ ๐๐๐ ๐ฅ โ 4
๐๐๐๐ถ๐๐๐ฃ
๐๐๐๐ถ๐๐๐ฃ
๐๐๐๐ถ๐๐๐ฃ
๐ถ๐๐๐ฃ
ร ๐ผ1
ร ๐ผ2
ร ๐ผ3
ฮฑ: learned during
training to induce
further diversity in
information.
[1] X. Lin et al., โTowards accurate binary convolutional neural network,โ NIPS, 2017.
![Page 19: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/19.jpg)
Overview of Binary Neural Networks
Nael Fasfous | Chair of Integrated Systems | TUM
Approximation through binary bases
1 -2 4
10 0.1 3
-5 7 -3
๐ ๐๐๐ ๐ฅ โ 50 0 0
1 0 0
0 1 0
๐ ๐๐๐ ๐ฅ = 0, ๐ฅ < 01, ๐ฅ โฅ 0
0 0 1
1 0 1
0 1 0
0 0 1
1 0 0
0 1 0
๐ ๐๐๐ ๐ฅ โ 3
๐ ๐๐๐ ๐ฅ โ 4
๐๐๐๐ถ๐๐๐ฃ
๐๐๐๐ถ๐๐๐ฃ
๐๐๐๐ถ๐๐๐ฃ
๐ถ๐๐๐ฃ
ร ๐ผ1
ร ๐ผ2
ร ๐ผ3
ฮฑ: learned during
training to induce
further diversity in
information.
BNNs with only 4-5%
accuracy degradation
from full-precision
CNN on ImageNet [1].
[1] X. Lin et al., โTowards accurate binary convolutional neural network,โ NIPS, 2017.
![Page 20: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/20.jpg)
How Binary are Binary Neural Networks?
Nael Fasfous | Chair of Integrated Systems | TUM
K Ci
Xi
Yi
CiK Ci
๐ด๐ = ๐ถ๐๐๐ฃ(๐๐ , ๐ด๐โ1)
Co
๐๐ โ โ๐พร๐พร๐ถ๐ร๐ถ๐๐ด๐โ1 โ โ๐๐ร๐๐ร๐ถ๐
![Page 21: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/21.jpg)
How Binary are Binary Neural Networks?
Nael Fasfous | Chair of Integrated Systems | TUM
K Ci
Xi
Yi
CiK Ci
๐ต๐ โ ๐น๐พร๐พร๐ถ๐ร๐ร๐ถ๐๐ป๐โ1 โ ๐น๐๐ร๐๐ร๐ถ๐ร๐
๐น = {0,1}
๐ด๐ = ๐ถ๐๐๐ฃ(๐๐ , ๐ด๐โ1)
Co
![Page 22: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/22.jpg)
How Binary are Binary Neural Networks?
Nael Fasfous | Chair of Integrated Systems | TUM
๐ด๐ = ๐ถ๐๐๐ฃ(๐๐ , ๐ด๐โ1)
K Ci
Xi
Yi
N
M
Ci
K Ci
M
โฆ
Co
๐ต๐ โ ๐น๐พร๐พร๐ถ๐ร๐ร๐ถ๐๐ป๐โ1 โ ๐น๐๐ร๐๐ร๐ถ๐ร๐
๐น = {0,1}
![Page 23: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/23.jpg)
How Binary are Binary Neural Networks?
Nael Fasfous | Chair of Integrated Systems | TUM
K Ci
Xi
Yi
N
M
Ci
K Ci
M
โฆ
Co
๐ต๐ โ ๐น๐พร๐พร๐ถ๐ร๐ร๐ถ๐๐ป๐โ1 โ ๐น๐๐ร๐๐ร๐ถ๐ร๐
๐น = {0,1}
๐ด๐ =
๐=1
๐
๐=1
๐
๐ผ๐๐ฝ๐๐ต๐๐๐ถ๐๐๐ฃ(๐ต๐๐ , ๐ป๐
๐โ1)
![Page 24: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/24.jpg)
How Binary are Binary Neural Networks?
Nael Fasfous | Chair of Integrated Systems | TUM
๐น = {0,1}
๐ด๐ =
๐=1
๐
๐=1
๐
๐ผ๐๐ฝ๐๐ต๐๐๐ถ๐๐๐ฃ(๐ต๐๐ , ๐ป๐
๐โ1)
K Ci
Xi
Yi
N
M
Ci
K Ci
M
โฆ
Co
๐ต๐๐ โ ๐น๐พร๐พร๐ถ๐ร๐ถ๐๐ป๐
๐โ1 โ ๐น๐๐ร๐๐ร๐ถ๐
![Page 25: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/25.jpg)
How Binary are Binary Neural Networks?
Nael Fasfous | Chair of Integrated Systems | TUM
๐น = {0,1}
๐๐,๐ =
๐๐=1
๐ถ๐
(๐๐,๐,๐๐)
๐ด๐ =
๐=1
๐
๐=1
๐
๐ผ๐๐ฝ๐๐ต๐๐๐ถ๐๐๐ฃ(๐ต๐๐ , ๐ป๐
๐โ1)
K Ci
Xi
Yi
N
M
Ci
K Ci
M
โฆ
Co
๐ต๐๐ โ ๐น๐พร๐พร๐ถ๐ร๐ถ๐๐ป๐
๐โ1 โ ๐น๐๐ร๐๐ร๐ถ๐
![Page 26: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/26.jpg)
How Binary are Binary Neural Networks?
Nael Fasfous | Chair of Integrated Systems | TUM
K Ci
Xi
Yi
N
M
Ci
K Ci
M
โฆ
Co
๐น = {0,1}
๐ด๐ =
๐=1
๐
๐=1
๐
๐ผ๐๐ฝ๐๐ต๐๐๐ถ๐๐๐ฃ(๐ต๐๐ , ๐ป๐
๐โ1)
๐ต๐๐ โ ๐น๐พร๐พร๐ถ๐ร๐ถ๐๐ป๐
๐โ1 โ ๐น๐๐ร๐๐ร๐ถ๐
๐๐,๐,๐๐ =
๐๐ฅ=1
๐พ
๐๐ฆ=1
๐พ
๐ฅ๐๐๐ ๐๐๐ฅ, ๐๐ฆ , โ๐ฅ๐+๐๐ฅ, ๐ฆ๐+๐๐ฆ
๐๐,๐ =
๐๐=1
๐ถ๐
(๐๐,๐,๐๐)
![Page 27: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/27.jpg)
How Binary are Binary Neural Networks?
Nael Fasfous | Chair of Integrated Systems | TUM
K Ci
Xi
Yi
N
M
Ci
K Ci
M
โฆ
Co
๐น = {0,1}
๐ด๐ =
๐=1
๐
๐=1
๐
๐ผ๐๐ฝ๐๐ต๐๐๐ถ๐๐๐ฃ(๐ต๐๐ , ๐ป๐
๐โ1)
๐ต๐๐ โ ๐น๐พร๐พร๐ถ๐ร๐ถ๐๐ป๐
๐โ1 โ ๐น๐๐ร๐๐ร๐ถ๐
๐๐,๐ =
๐๐=1
๐ถ๐
(๐๐,๐,๐๐)
๐๐,๐,๐๐ = ๐๐๐๐๐๐ก (๐ฅ๐๐๐ ๐๐๐ฅ, ๐๐ฆ , โ๐ฅ๐+๐๐ฅ, ๐ฆ๐+๐๐ฆ )
![Page 28: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/28.jpg)
How Binary are Binary Neural Networks?
Nael Fasfous | Chair of Integrated Systems | TUM
K Ci
Xi
Yi
N
M
Ci
K Ci
M
โฆ
Co
๐น = {0,1}
๐ด๐ =
๐=1
๐
๐=1
๐
๐ผ๐๐ฝ๐๐ต๐๐๐ถ๐๐๐ฃ(๐ต๐๐ , ๐ป๐
๐โ1)
๐๐,๐ =
๐๐=1
๐ถ๐
(๐๐,๐,๐๐)
๐ต๐๐ โ ๐น๐พร๐พร๐ถ๐ร๐ถ๐๐ป๐
๐โ1 โ ๐น๐๐ร๐๐ร๐ถ๐
โ๐โ1 ๐๐
๐๐,๐,๐๐ = ๐๐๐๐๐๐ก (๐ฅ๐๐๐ ๐๐๐ฅ, ๐๐ฆ , โ๐ฅ๐+๐๐ฅ, ๐ฆ๐+๐๐ฆ )
![Page 29: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/29.jpg)
How Binary are Binary Neural Networks?
Nael Fasfous | Chair of Integrated Systems | TUM
K Ci
Xi
Yi
N
M
Ci
K Ci
M
โฆ
Co
๐น = {0,1}
๐ด๐ =
๐=1
๐
๐=1
๐
๐ผ๐๐ฝ๐๐ต๐๐๐ถ๐๐๐ฃ(๐ต๐๐ , ๐ป๐
๐โ1)
๐๐,๐ =
๐๐=1
๐ถ๐
(๐๐,๐,๐๐)
๐ต๐๐ โ ๐น๐พร๐พร๐ถ๐ร๐ถ๐๐ป๐
๐โ1 โ ๐น๐๐ร๐๐ร๐ถ๐
โ๐โ1 ๐๐
๐๐,๐,๐๐ = ๐๐๐๐๐๐ก (๐ฅ๐๐๐ ๐๐๐ฅ, ๐๐ฆ , โ๐ฅ๐+๐๐ฅ, ๐ฆ๐+๐๐ฆ )
Binary
Fixed-point
![Page 30: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/30.jpg)
More FP in Binary Neural Networks
Nael Fasfous | Chair of Integrated Systems | TUM
โข Binary Weight and Activation Bases (Scale/Shift)
โข First Layer Remains Non-Binarized
โข Batch Normalization
![Page 31: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/31.jpg)
More FP in Binary Neural Networks
Nael Fasfous | Chair of Integrated Systems | TUM
โข Binary Weight and Activation Bases (Scale/Shift)
โข First Layer Remains Non-Binarized
โข Batch Normalization
Motivation for Reconfigurable Processing Elements
Fixed-Point Ops Binary Ops
![Page 32: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/32.jpg)
OrthrusPE: Runtime Reconfigurable PEs for BNNs
Nael Fasfous | Chair of Integrated Systems | TUM
Dual Modes
โข Fixed-precision mode: First layer, Batch-norm, Scale and Shift Operations
โข Binary mode: SIMD Binary Hadamard Products and Popcounts
โข Achieved with high resource reuse
![Page 33: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/33.jpg)
OrthrusPE: Binary Mode
Nael Fasfous | Chair of Integrated Systems | TUM
Efficient SIMD Binary Hadamard Product Execution
1 1 1 1 1 1 0 X
0 0 0 0 0 0 0 X
1 0 1 0 1 0 0 X
X X X X X X X X
1 0 1
1 0 1
1 0 1
Input: hl-1 โ Hnl-1
Kernel: bl โ Bml
![Page 34: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/34.jpg)
OrthrusPE: Binary Mode
Nael Fasfous | Chair of Integrated Systems | TUM
Efficient SIMD Binary Hadamard Product Execution
1 1 1 1 1 1 0 X
0 0 0 0 0 0 0 X
1 0 1 0 1 0 0 X
X X X X X X X X
1 0 1
1 0 1
1 0 1
Input: hl-1 โ Hnl-1
Kernel: bl โ Bml
Sig A
Sig B
1 1 1
0 0 0
1 0 1
1 1 1
0 0 0
0 1 0
1 1 1
0 0 0
1 0 1
1
0
0
1 1
0 0
1 0
1 1 0
0 0 0
1 0 0
X
X
X
1 0 1
1 0 1
1 0 1
1 0 1
1 0 1
1 0 1
1 0 1
1 0 1
1 0 1
1 0 1
1 0 1
1 0 1
1 0 1
1 0 1
1 0 1
X
X
X
Sig C
![Page 35: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/35.jpg)
OrthrusPE: Binary Mode
Nael Fasfous | Chair of Integrated Systems | TUM
Efficient SIMD Binary Hadamard Product Execution
1 1 1 1 1 1 0 X
0 0 0 0 0 0 0 X
1 0 1 0 1 0 0 X
X X X X X X X X
1 0 1
1 0 1
1 0 1
Input: hl-1 โ Hnl-1
Kernel: bl โ Bml
DSP
Sig P
Sig A
Sig B
1 1 1
0 0 0
1 0 1
1 1 1
0 0 0
0 1 0
1 1 1
0 0 0
1 0 1
1
0
0
1 1
0 0
1 0
1 1 0
0 0 0
1 0 0
X
X
X
1 0 1
1 0 1
1 0 1
1 0 1
1 0 1
1 0 1
1 0 1
1 0 1
1 0 1
1 0 1
1 0 1
1 0 1
1 0 1
1 0 1
1 0 1
X
X
X
Sig C
![Page 36: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/36.jpg)
OrthrusPE: Dual Modes
Nael Fasfous | Chair of Integrated Systems | TUM
Runtime Reconfigurability
Binary Mode
Fixed-Precision Mode
Reconfiguration Signals
Using the same
hardware resource for
two distinct, critical
BNN operations
![Page 37: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/37.jpg)
OrthrusPE: Dual Modes
Nael Fasfous | Chair of Integrated Systems | TUM
Runtime Reconfigurability
Binary Mode
Fixed-Precision Mode
Reconfiguration Signals
Using the same
hardware resource for
two distinct, critical
BNN operations
![Page 38: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/38.jpg)
OrthrusPE: Dual Modes
Nael Fasfous | Chair of Integrated Systems | TUM
Runtime Reconfigurability
Binary Mode
Fixed-Precision Mode
Reconfiguration Signals
Using the same
hardware resource for
two distinct, critical
BNN operations
![Page 39: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/39.jpg)
OrthrusPE: Dual Modes
Nael Fasfous | Chair of Integrated Systems | TUM
Runtime Reconfigurability
Binary Mode
Fixed-Precision Mode
Reconfiguration Signals
Using the same
hardware resource for
two distinct, critical
BNN operations
![Page 40: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/40.jpg)
Experimentation and Evaluation
Nael Fasfous | Chair of Integrated Systems | TUM
Synthesized Four Throughput-Equivalent Configurations:
โข OrthrusPE
โข OrthrusPE-DS (Dual-Static): SIMD Binary Hadamard Products on Static DSP
โข Hybrid (Common): Binary operations on LUTs, FP operations on DSP
โข All-LUT: Execution restricted to LUTs
![Page 41: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/41.jpg)
Experimentation and Evaluation
Nael Fasfous | Chair of Integrated Systems | TUM
Resource Utilization
โข OrthrusPE and OrthrusPE-DS are more
resource efficient across all target
accelerator frequencies.
![Page 42: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/42.jpg)
Experimentation and Evaluation
Nael Fasfous | Chair of Integrated Systems | TUM
Resource Utilization
โข OrthrusPEโs closest FINN configuration
@200MHz
โข 16 Extra Bit Accumulations
โข 3 MACs (through reconfigurability)
โข 32% fewer LUTs
![Page 43: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/43.jpg)
Experimentation and Evaluation
Nael Fasfous | Chair of Integrated Systems | TUM
Dynamic Power Estimation
โข OrthrusPE more efficient across all
frequencies
โข Results scale as accelerators use 100-
1000s of PEs
![Page 44: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/44.jpg)
Conclusion and Future Work
Nael Fasfous | Chair of Integrated Systems | TUM
โข Accurate BNNs cannot be achieved without fixed-point operations and reliance on
DSP blocks.
โข OrthrusPE improves the efficiency of computation by executing both fixed-point and
binary ops on FPGA hard blocks.
โข Accurate BNNs solve many of the computation and memory challenges for deep neural
network workloads on edge devices.
![Page 45: OrthrusPE: Runtime Reconfigurable Processing Elements for ...](https://reader030.fdocuments.in/reader030/viewer/2022012709/61a9afd3f6262e788339a599/html5/thumbnails/45.jpg)
Nael Fasfous | Chair of Integrated Systems | TUM
Thank you for your attention