OrthrusPE: Runtime Reconfigurable Processing Elements for ...

45
OrthrusPE: Runtime Reconfigurable Processing Elements for Binary Neural Networks Nael Fasfous Technical University of Munich Department of Electrical and Computer Engineering Chair of Integrated Systems

Transcript of OrthrusPE: Runtime Reconfigurable Processing Elements for ...

Page 1: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

OrthrusPE: Runtime Reconfigurable Processing Elements

for Binary Neural Networks

Nael Fasfous

Technical University of Munich

Department of Electrical and Computer Engineering

Chair of Integrated Systems

Page 2: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

โ€ข Optimization of Convolutional Neural Networks

โ€ข Challenges of Binary Neural Networks

โ€ข Motivation for Runtime Reconfigurable BNN Processing Elements

โ€ข OrthrusPE: Dual Modes

โ€ข Experimentation and Results

Outline

Nael Fasfous | Chair of Integrated Systems | TUM

Page 3: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

Optimization of Convolutional Neural Networks

Nael Fasfous | Chair of Integrated Systems | TUM

Structural Algorithmic Hardware

Page 4: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

Optimization of Convolutional Neural Networks

Nael Fasfous | Chair of Integrated Systems | TUM

Structural Algorithmic Hardware

Low-Rank

Decomp.Pruning Quantization Winograd FFT GEMM Vectorization PartitioningDataflow

Page 5: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

Optimization of Convolutional Neural Networks

Nael Fasfous | Chair of Integrated Systems | TUM

Structural Algorithmic Hardware

Low-Rank

Decomp.Pruning Quantization Winograd FFT GEMM Vectorization

Bypass

PartitioningDataflow

Tucker

Decomp.

CP

Decomp.

Channel-wise

Filter-wise

Element-wise

Winograd

Sparsity

Spectral

Sparsity

Filter

Sharing

Floating

PointFixed Point

Binarization

Pow. 2

Log Base

Weight

Sharing

Array

MappingIn Out

IFmap Filter

IO OW

IW

OS IS

RS WS

Page 6: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

Overview of Binary Neural Networks

Nael Fasfous | Chair of Integrated Systems | TUM

โ€ข Binary Weights

โ€ข Binary Activations

โ€ข Binary Weights AND Activations

Page 7: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

Overview of Binary Neural Networks

Nael Fasfous | Chair of Integrated Systems | TUM

โ€ข Binary Weights

โ€ข Binary Activations

โ€ข Binary Weights AND Activations Replace Multiplications by XNOR ops

Replace Accumulations by Popcount ops

Page 8: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

Overview of Binary Neural Networks

Nael Fasfous | Chair of Integrated Systems | TUM

โ€ข Binary Weights

โ€ข Binary Activations

โ€ข Binary Weights AND Activations Replace Multiplications by XNOR ops

Replace Accumulations by Popcount ops

Reduce Memory requirements (1/16 of FP-16)

Page 9: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

Overview of Binary Neural Networks

Nael Fasfous | Chair of Integrated Systems | TUM

โ€ข Binary Weights

โ€ข Binary Activations

โ€ข Binary Weights AND Activations Replace Multiplications by XNOR ops

Replace Accumulations by Popcount ops

Severe Information Loss

Reduce Memory requirements (1/16 of FP-16)

Page 10: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

Overview of Binary Neural Networks

Nael Fasfous | Chair of Integrated Systems | TUM

โ€ข Binary Weights

โ€ข Binary Activations

โ€ข Binary Weights AND Activations Replace Multiplications by XNOR ops

Replace Accumulations by Popcount ops

Severe Information Loss

MNIST, CIFAR-10, SVHN

ImageNet

Reduce Memory requirements (1/16 of FP-16)

Page 11: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

Overview of Binary Neural Networks

Nael Fasfous | Chair of Integrated Systems | TUM

Naรฏve Binarization

1 -2 4

10 0.1 3

-5 7 -3

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ

1 0 1

1 1 1

0 1 0

Severe information loss:

10 and 0.1 have the same effect on the

network.

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ = 0, ๐‘ฅ < 01, ๐‘ฅ โ‰ฅ 0

Page 12: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

Overview of Binary Neural Networks

Nael Fasfous | Chair of Integrated Systems | TUM

Approximation through binary bases

1 -2 4

10 0.1 3

-5 7 -3

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ = 0, ๐‘ฅ < 01, ๐‘ฅ โ‰ฅ 0

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ โˆ’ 50 0 0

1 0 0

0 1 0

0 0 1

1 0 1

0 1 0

0 0 1

1 0 0

0 1 0

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ โˆ’ 3

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ โˆ’ 4

Page 13: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

Overview of Binary Neural Networks

Nael Fasfous | Chair of Integrated Systems | TUM

Approximation through binary bases

1 -2 4

10 0.1 3

-5 7 -3

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ โˆ’ 50 0 0

1 0 0

0 1 0

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ = 0, ๐‘ฅ < 01, ๐‘ฅ โ‰ฅ 0

0 0 1

1 0 1

0 1 0

0 0 1

1 0 0

0 1 0

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ โˆ’ 3

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ โˆ’ 4

Captured Information:

0.1 is less positive than 10

Page 14: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

Overview of Binary Neural Networks

Nael Fasfous | Chair of Integrated Systems | TUM

Approximation through binary bases

1 -2 4

10 0.1 3

-5 7 -3

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ โˆ’ 50 0 0

1 0 0

0 1 0

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ = 0, ๐‘ฅ < 01, ๐‘ฅ โ‰ฅ 0

0 0 1

1 0 1

0 1 0

0 0 1

1 0 0

0 1 0

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ โˆ’ 3

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ โˆ’ 4

Captured Information:

0.1 is less positive than 10

4 and 3 lie between 10 and 0.1

Page 15: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

Overview of Binary Neural Networks

Nael Fasfous | Chair of Integrated Systems | TUM

Approximation through binary bases

1 -2 4

10 0.1 3

-5 7 -3

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ โˆ’ 50 0 0

1 0 0

0 1 0

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ = 0, ๐‘ฅ < 01, ๐‘ฅ โ‰ฅ 0

0 0 1

1 0 1

0 1 0

0 0 1

1 0 0

0 1 0

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ โˆ’ 3

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ โˆ’ 4

Captured Information:

0.1 is less positive than 10

4 and 3 lie between 10 and 0.1

4 and 3 are single unit away from

each other

Page 16: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

Overview of Binary Neural Networks

Nael Fasfous | Chair of Integrated Systems | TUM

Approximation through binary bases

1 -2 4

10 0.1 3

-5 7 -3

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ โˆ’ 50 0 0

1 0 0

0 1 0

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ = 0, ๐‘ฅ < 01, ๐‘ฅ โ‰ฅ 0

0 0 1

1 0 1

0 1 0

0 0 1

1 0 0

0 1 0

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ โˆ’ 3

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ โˆ’ 4

Captured Information:

0.1 is less positive than 10

4 and 3 lie between 10 and 0.1

4 and 3 are single unit away from

each other

Information can be extracted for

negative numbers, e.g. ๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ + 3

Page 17: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

Overview of Binary Neural Networks

Nael Fasfous | Chair of Integrated Systems | TUM

Approximation through binary bases

1 -2 4

10 0.1 3

-5 7 -3

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ โˆ’ 50 0 0

1 0 0

0 1 0

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ = 0, ๐‘ฅ < 01, ๐‘ฅ โ‰ฅ 0

0 0 1

1 0 1

0 1 0

0 0 1

1 0 0

0 1 0

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ โˆ’ 3

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ โˆ’ 4

๐‘๐‘–๐‘›๐ถ๐‘œ๐‘›๐‘ฃ

๐‘๐‘–๐‘›๐ถ๐‘œ๐‘›๐‘ฃ

๐‘๐‘–๐‘›๐ถ๐‘œ๐‘›๐‘ฃ

๐ถ๐‘œ๐‘›๐‘ฃ

ร— ๐›ผ1

ร— ๐›ผ2

ร— ๐›ผ3

[1] X. Lin et al., โ€œTowards accurate binary convolutional neural network,โ€ NIPS, 2017.

Page 18: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

Overview of Binary Neural Networks

Nael Fasfous | Chair of Integrated Systems | TUM

Approximation through binary bases

1 -2 4

10 0.1 3

-5 7 -3

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ โˆ’ 50 0 0

1 0 0

0 1 0

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ = 0, ๐‘ฅ < 01, ๐‘ฅ โ‰ฅ 0

0 0 1

1 0 1

0 1 0

0 0 1

1 0 0

0 1 0

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ โˆ’ 3

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ โˆ’ 4

๐‘๐‘–๐‘›๐ถ๐‘œ๐‘›๐‘ฃ

๐‘๐‘–๐‘›๐ถ๐‘œ๐‘›๐‘ฃ

๐‘๐‘–๐‘›๐ถ๐‘œ๐‘›๐‘ฃ

๐ถ๐‘œ๐‘›๐‘ฃ

ร— ๐›ผ1

ร— ๐›ผ2

ร— ๐›ผ3

ฮฑ: learned during

training to induce

further diversity in

information.

[1] X. Lin et al., โ€œTowards accurate binary convolutional neural network,โ€ NIPS, 2017.

Page 19: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

Overview of Binary Neural Networks

Nael Fasfous | Chair of Integrated Systems | TUM

Approximation through binary bases

1 -2 4

10 0.1 3

-5 7 -3

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ โˆ’ 50 0 0

1 0 0

0 1 0

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ = 0, ๐‘ฅ < 01, ๐‘ฅ โ‰ฅ 0

0 0 1

1 0 1

0 1 0

0 0 1

1 0 0

0 1 0

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ โˆ’ 3

๐‘ ๐‘–๐‘”๐‘› ๐‘ฅ โˆ’ 4

๐‘๐‘–๐‘›๐ถ๐‘œ๐‘›๐‘ฃ

๐‘๐‘–๐‘›๐ถ๐‘œ๐‘›๐‘ฃ

๐‘๐‘–๐‘›๐ถ๐‘œ๐‘›๐‘ฃ

๐ถ๐‘œ๐‘›๐‘ฃ

ร— ๐›ผ1

ร— ๐›ผ2

ร— ๐›ผ3

ฮฑ: learned during

training to induce

further diversity in

information.

BNNs with only 4-5%

accuracy degradation

from full-precision

CNN on ImageNet [1].

[1] X. Lin et al., โ€œTowards accurate binary convolutional neural network,โ€ NIPS, 2017.

Page 20: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

How Binary are Binary Neural Networks?

Nael Fasfous | Chair of Integrated Systems | TUM

K Ci

Xi

Yi

CiK Ci

๐ด๐‘™ = ๐ถ๐‘œ๐‘›๐‘ฃ(๐‘Š๐‘™ , ๐ด๐‘™โˆ’1)

Co

๐‘Š๐‘™ โˆˆ โ„๐พร—๐พร—๐ถ๐‘–ร—๐ถ๐‘œ๐ด๐‘™โˆ’1 โˆˆ โ„๐‘‹๐‘–ร—๐‘Œ๐‘–ร—๐ถ๐‘–

Page 21: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

How Binary are Binary Neural Networks?

Nael Fasfous | Chair of Integrated Systems | TUM

K Ci

Xi

Yi

CiK Ci

๐ต๐‘™ โˆˆ ๐”น๐พร—๐พร—๐ถ๐‘–ร—๐‘€ร—๐ถ๐‘œ๐ป๐‘™โˆ’1 โˆˆ ๐”น๐‘‹๐‘–ร—๐‘Œ๐‘–ร—๐ถ๐‘–ร—๐‘

๐”น = {0,1}

๐ด๐‘™ = ๐ถ๐‘œ๐‘›๐‘ฃ(๐‘Š๐‘™ , ๐ด๐‘™โˆ’1)

Co

Page 22: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

How Binary are Binary Neural Networks?

Nael Fasfous | Chair of Integrated Systems | TUM

๐ด๐‘™ = ๐ถ๐‘œ๐‘›๐‘ฃ(๐‘Š๐‘™ , ๐ด๐‘™โˆ’1)

K Ci

Xi

Yi

N

M

Ci

K Ci

M

โ€ฆ

Co

๐ต๐‘™ โˆˆ ๐”น๐พร—๐พร—๐ถ๐‘–ร—๐‘€ร—๐ถ๐‘œ๐ป๐‘™โˆ’1 โˆˆ ๐”น๐‘‹๐‘–ร—๐‘Œ๐‘–ร—๐ถ๐‘–ร—๐‘

๐”น = {0,1}

Page 23: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

How Binary are Binary Neural Networks?

Nael Fasfous | Chair of Integrated Systems | TUM

K Ci

Xi

Yi

N

M

Ci

K Ci

M

โ€ฆ

Co

๐ต๐‘™ โˆˆ ๐”น๐พร—๐พร—๐ถ๐‘–ร—๐‘€ร—๐ถ๐‘œ๐ป๐‘™โˆ’1 โˆˆ ๐”น๐‘‹๐‘–ร—๐‘Œ๐‘–ร—๐ถ๐‘–ร—๐‘

๐”น = {0,1}

๐ด๐‘™ =

๐‘š=1

๐‘€

๐‘›=1

๐‘

๐›ผ๐‘š๐›ฝ๐‘›๐ต๐‘–๐‘›๐ถ๐‘œ๐‘›๐‘ฃ(๐ต๐‘š๐‘™ , ๐ป๐‘›

๐‘™โˆ’1)

Page 24: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

How Binary are Binary Neural Networks?

Nael Fasfous | Chair of Integrated Systems | TUM

๐”น = {0,1}

๐ด๐‘™ =

๐‘š=1

๐‘€

๐‘›=1

๐‘

๐›ผ๐‘š๐›ฝ๐‘›๐ต๐‘–๐‘›๐ถ๐‘œ๐‘›๐‘ฃ(๐ต๐‘š๐‘™ , ๐ป๐‘›

๐‘™โˆ’1)

K Ci

Xi

Yi

N

M

Ci

K Ci

M

โ€ฆ

Co

๐ต๐‘š๐‘™ โˆˆ ๐”น๐พร—๐พร—๐ถ๐‘–ร—๐ถ๐‘œ๐ป๐‘›

๐‘™โˆ’1 โˆˆ ๐”น๐‘‹๐‘–ร—๐‘Œ๐‘–ร—๐ถ๐‘–

Page 25: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

How Binary are Binary Neural Networks?

Nael Fasfous | Chair of Integrated Systems | TUM

๐”น = {0,1}

๐‘Ž๐‘š,๐‘› =

๐‘๐‘–=1

๐ถ๐‘–

(๐‘๐‘š,๐‘›,๐‘๐‘–)

๐ด๐‘™ =

๐‘š=1

๐‘€

๐‘›=1

๐‘

๐›ผ๐‘š๐›ฝ๐‘›๐ต๐‘–๐‘›๐ถ๐‘œ๐‘›๐‘ฃ(๐ต๐‘š๐‘™ , ๐ป๐‘›

๐‘™โˆ’1)

K Ci

Xi

Yi

N

M

Ci

K Ci

M

โ€ฆ

Co

๐ต๐‘š๐‘™ โˆˆ ๐”น๐พร—๐พร—๐ถ๐‘–ร—๐ถ๐‘œ๐ป๐‘›

๐‘™โˆ’1 โˆˆ ๐”น๐‘‹๐‘–ร—๐‘Œ๐‘–ร—๐ถ๐‘–

Page 26: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

How Binary are Binary Neural Networks?

Nael Fasfous | Chair of Integrated Systems | TUM

K Ci

Xi

Yi

N

M

Ci

K Ci

M

โ€ฆ

Co

๐”น = {0,1}

๐ด๐‘™ =

๐‘š=1

๐‘€

๐‘›=1

๐‘

๐›ผ๐‘š๐›ฝ๐‘›๐ต๐‘–๐‘›๐ถ๐‘œ๐‘›๐‘ฃ(๐ต๐‘š๐‘™ , ๐ป๐‘›

๐‘™โˆ’1)

๐ต๐‘š๐‘™ โˆˆ ๐”น๐พร—๐พร—๐ถ๐‘–ร—๐ถ๐‘œ๐ป๐‘›

๐‘™โˆ’1 โˆˆ ๐”น๐‘‹๐‘–ร—๐‘Œ๐‘–ร—๐ถ๐‘–

๐‘๐‘š,๐‘›,๐‘๐‘– =

๐‘˜๐‘ฅ=1

๐พ

๐‘˜๐‘ฆ=1

๐พ

๐‘ฅ๐‘›๐‘œ๐‘Ÿ ๐‘๐‘˜๐‘ฅ, ๐‘˜๐‘ฆ , โ„Ž๐‘ฅ๐‘–+๐‘˜๐‘ฅ, ๐‘ฆ๐‘–+๐‘˜๐‘ฆ

๐‘Ž๐‘š,๐‘› =

๐‘๐‘–=1

๐ถ๐‘–

(๐‘๐‘š,๐‘›,๐‘๐‘–)

Page 27: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

How Binary are Binary Neural Networks?

Nael Fasfous | Chair of Integrated Systems | TUM

K Ci

Xi

Yi

N

M

Ci

K Ci

M

โ€ฆ

Co

๐”น = {0,1}

๐ด๐‘™ =

๐‘š=1

๐‘€

๐‘›=1

๐‘

๐›ผ๐‘š๐›ฝ๐‘›๐ต๐‘–๐‘›๐ถ๐‘œ๐‘›๐‘ฃ(๐ต๐‘š๐‘™ , ๐ป๐‘›

๐‘™โˆ’1)

๐ต๐‘š๐‘™ โˆˆ ๐”น๐พร—๐พร—๐ถ๐‘–ร—๐ถ๐‘œ๐ป๐‘›

๐‘™โˆ’1 โˆˆ ๐”น๐‘‹๐‘–ร—๐‘Œ๐‘–ร—๐ถ๐‘–

๐‘Ž๐‘š,๐‘› =

๐‘๐‘–=1

๐ถ๐‘–

(๐‘๐‘š,๐‘›,๐‘๐‘–)

๐‘๐‘š,๐‘›,๐‘๐‘– = ๐‘๐‘œ๐‘๐‘๐‘›๐‘ก (๐‘ฅ๐‘›๐‘œ๐‘Ÿ ๐‘๐‘˜๐‘ฅ, ๐‘˜๐‘ฆ , โ„Ž๐‘ฅ๐‘–+๐‘˜๐‘ฅ, ๐‘ฆ๐‘–+๐‘˜๐‘ฆ )

Page 28: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

How Binary are Binary Neural Networks?

Nael Fasfous | Chair of Integrated Systems | TUM

K Ci

Xi

Yi

N

M

Ci

K Ci

M

โ€ฆ

Co

๐”น = {0,1}

๐ด๐‘™ =

๐‘š=1

๐‘€

๐‘›=1

๐‘

๐›ผ๐‘š๐›ฝ๐‘›๐ต๐‘–๐‘›๐ถ๐‘œ๐‘›๐‘ฃ(๐ต๐‘š๐‘™ , ๐ป๐‘›

๐‘™โˆ’1)

๐‘Ž๐‘š,๐‘› =

๐‘๐‘–=1

๐ถ๐‘–

(๐‘๐‘š,๐‘›,๐‘๐‘–)

๐ต๐‘š๐‘™ โˆˆ ๐”น๐พร—๐พร—๐ถ๐‘–ร—๐ถ๐‘œ๐ป๐‘›

๐‘™โˆ’1 โˆˆ ๐”น๐‘‹๐‘–ร—๐‘Œ๐‘–ร—๐ถ๐‘–

โ„Ž๐‘™โˆ’1 ๐‘๐‘™

๐‘๐‘š,๐‘›,๐‘๐‘– = ๐‘๐‘œ๐‘๐‘๐‘›๐‘ก (๐‘ฅ๐‘›๐‘œ๐‘Ÿ ๐‘๐‘˜๐‘ฅ, ๐‘˜๐‘ฆ , โ„Ž๐‘ฅ๐‘–+๐‘˜๐‘ฅ, ๐‘ฆ๐‘–+๐‘˜๐‘ฆ )

Page 29: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

How Binary are Binary Neural Networks?

Nael Fasfous | Chair of Integrated Systems | TUM

K Ci

Xi

Yi

N

M

Ci

K Ci

M

โ€ฆ

Co

๐”น = {0,1}

๐ด๐‘™ =

๐‘š=1

๐‘€

๐‘›=1

๐‘

๐›ผ๐‘š๐›ฝ๐‘›๐ต๐‘–๐‘›๐ถ๐‘œ๐‘›๐‘ฃ(๐ต๐‘š๐‘™ , ๐ป๐‘›

๐‘™โˆ’1)

๐‘Ž๐‘š,๐‘› =

๐‘๐‘–=1

๐ถ๐‘–

(๐‘๐‘š,๐‘›,๐‘๐‘–)

๐ต๐‘š๐‘™ โˆˆ ๐”น๐พร—๐พร—๐ถ๐‘–ร—๐ถ๐‘œ๐ป๐‘›

๐‘™โˆ’1 โˆˆ ๐”น๐‘‹๐‘–ร—๐‘Œ๐‘–ร—๐ถ๐‘–

โ„Ž๐‘™โˆ’1 ๐‘๐‘™

๐‘๐‘š,๐‘›,๐‘๐‘– = ๐‘๐‘œ๐‘๐‘๐‘›๐‘ก (๐‘ฅ๐‘›๐‘œ๐‘Ÿ ๐‘๐‘˜๐‘ฅ, ๐‘˜๐‘ฆ , โ„Ž๐‘ฅ๐‘–+๐‘˜๐‘ฅ, ๐‘ฆ๐‘–+๐‘˜๐‘ฆ )

Binary

Fixed-point

Page 30: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

More FP in Binary Neural Networks

Nael Fasfous | Chair of Integrated Systems | TUM

โ€ข Binary Weight and Activation Bases (Scale/Shift)

โ€ข First Layer Remains Non-Binarized

โ€ข Batch Normalization

Page 31: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

More FP in Binary Neural Networks

Nael Fasfous | Chair of Integrated Systems | TUM

โ€ข Binary Weight and Activation Bases (Scale/Shift)

โ€ข First Layer Remains Non-Binarized

โ€ข Batch Normalization

Motivation for Reconfigurable Processing Elements

Fixed-Point Ops Binary Ops

Page 32: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

OrthrusPE: Runtime Reconfigurable PEs for BNNs

Nael Fasfous | Chair of Integrated Systems | TUM

Dual Modes

โ€ข Fixed-precision mode: First layer, Batch-norm, Scale and Shift Operations

โ€ข Binary mode: SIMD Binary Hadamard Products and Popcounts

โ€ข Achieved with high resource reuse

Page 33: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

OrthrusPE: Binary Mode

Nael Fasfous | Chair of Integrated Systems | TUM

Efficient SIMD Binary Hadamard Product Execution

1 1 1 1 1 1 0 X

0 0 0 0 0 0 0 X

1 0 1 0 1 0 0 X

X X X X X X X X

1 0 1

1 0 1

1 0 1

Input: hl-1 โŠ‚ Hnl-1

Kernel: bl โŠ‚ Bml

Page 34: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

OrthrusPE: Binary Mode

Nael Fasfous | Chair of Integrated Systems | TUM

Efficient SIMD Binary Hadamard Product Execution

1 1 1 1 1 1 0 X

0 0 0 0 0 0 0 X

1 0 1 0 1 0 0 X

X X X X X X X X

1 0 1

1 0 1

1 0 1

Input: hl-1 โŠ‚ Hnl-1

Kernel: bl โŠ‚ Bml

Sig A

Sig B

1 1 1

0 0 0

1 0 1

1 1 1

0 0 0

0 1 0

1 1 1

0 0 0

1 0 1

1

0

0

1 1

0 0

1 0

1 1 0

0 0 0

1 0 0

X

X

X

1 0 1

1 0 1

1 0 1

1 0 1

1 0 1

1 0 1

1 0 1

1 0 1

1 0 1

1 0 1

1 0 1

1 0 1

1 0 1

1 0 1

1 0 1

X

X

X

Sig C

Page 35: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

OrthrusPE: Binary Mode

Nael Fasfous | Chair of Integrated Systems | TUM

Efficient SIMD Binary Hadamard Product Execution

1 1 1 1 1 1 0 X

0 0 0 0 0 0 0 X

1 0 1 0 1 0 0 X

X X X X X X X X

1 0 1

1 0 1

1 0 1

Input: hl-1 โŠ‚ Hnl-1

Kernel: bl โŠ‚ Bml

DSP

Sig P

Sig A

Sig B

1 1 1

0 0 0

1 0 1

1 1 1

0 0 0

0 1 0

1 1 1

0 0 0

1 0 1

1

0

0

1 1

0 0

1 0

1 1 0

0 0 0

1 0 0

X

X

X

1 0 1

1 0 1

1 0 1

1 0 1

1 0 1

1 0 1

1 0 1

1 0 1

1 0 1

1 0 1

1 0 1

1 0 1

1 0 1

1 0 1

1 0 1

X

X

X

Sig C

Page 36: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

OrthrusPE: Dual Modes

Nael Fasfous | Chair of Integrated Systems | TUM

Runtime Reconfigurability

Binary Mode

Fixed-Precision Mode

Reconfiguration Signals

Using the same

hardware resource for

two distinct, critical

BNN operations

Page 37: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

OrthrusPE: Dual Modes

Nael Fasfous | Chair of Integrated Systems | TUM

Runtime Reconfigurability

Binary Mode

Fixed-Precision Mode

Reconfiguration Signals

Using the same

hardware resource for

two distinct, critical

BNN operations

Page 38: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

OrthrusPE: Dual Modes

Nael Fasfous | Chair of Integrated Systems | TUM

Runtime Reconfigurability

Binary Mode

Fixed-Precision Mode

Reconfiguration Signals

Using the same

hardware resource for

two distinct, critical

BNN operations

Page 39: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

OrthrusPE: Dual Modes

Nael Fasfous | Chair of Integrated Systems | TUM

Runtime Reconfigurability

Binary Mode

Fixed-Precision Mode

Reconfiguration Signals

Using the same

hardware resource for

two distinct, critical

BNN operations

Page 40: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

Experimentation and Evaluation

Nael Fasfous | Chair of Integrated Systems | TUM

Synthesized Four Throughput-Equivalent Configurations:

โ€ข OrthrusPE

โ€ข OrthrusPE-DS (Dual-Static): SIMD Binary Hadamard Products on Static DSP

โ€ข Hybrid (Common): Binary operations on LUTs, FP operations on DSP

โ€ข All-LUT: Execution restricted to LUTs

Page 41: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

Experimentation and Evaluation

Nael Fasfous | Chair of Integrated Systems | TUM

Resource Utilization

โ€ข OrthrusPE and OrthrusPE-DS are more

resource efficient across all target

accelerator frequencies.

Page 42: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

Experimentation and Evaluation

Nael Fasfous | Chair of Integrated Systems | TUM

Resource Utilization

โ€ข OrthrusPEโ€™s closest FINN configuration

@200MHz

โ€ข 16 Extra Bit Accumulations

โ€ข 3 MACs (through reconfigurability)

โ€ข 32% fewer LUTs

Page 43: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

Experimentation and Evaluation

Nael Fasfous | Chair of Integrated Systems | TUM

Dynamic Power Estimation

โ€ข OrthrusPE more efficient across all

frequencies

โ€ข Results scale as accelerators use 100-

1000s of PEs

Page 44: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

Conclusion and Future Work

Nael Fasfous | Chair of Integrated Systems | TUM

โ€ข Accurate BNNs cannot be achieved without fixed-point operations and reliance on

DSP blocks.

โ€ข OrthrusPE improves the efficiency of computation by executing both fixed-point and

binary ops on FPGA hard blocks.

โ€ข Accurate BNNs solve many of the computation and memory challenges for deep neural

network workloads on edge devices.

Page 45: OrthrusPE: Runtime Reconfigurable Processing Elements for ...

Nael Fasfous | Chair of Integrated Systems | TUM

Thank you for your attention