SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks

Angshuman Parashar, Minsoo Rhu*, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W. Keckler, and William J. Dally

* Now at POSTECH (Pohang University of Science and Technology), South Korea

Transcript of the slide deck (54 pages): people.csail.mit.edu/anurag_m/papers/2017.scnn.isca.slides.pdf

Page 2: Motivation

© NVIDIA 2017

Page 3: Convolution (dense)

Inner product between every input pixel and every filter coefficient.

The sliding window is intuitive and maps to a reasonable hardware implementation.

[Figure: input activation maps * filters (weights) = output activation maps]

Pages 4-7: Convolution (dense), continued

Animation frames of the same slide: multiplications in which an operand is zero are flagged as useless MUL ops.

Page 8: Convolution with Sparsity

Most operand values are zero.

Static sparsity: pruned network weights are set to '0' during training.*

* Han et al., "Learning Both Weights and Connections for Efficient Neural Network", NIPS-2015

Page 9: Convolution with Sparsity

Dynamic sparsity: negative-valued activations are clamped to '0' during inference.*

* Albericio et al., "CNVLUTIN: Ineffectual-Neuron-Free Deep Neural Network Computing", ISCA-2016

Page 10: Convolution with Sparsity

Most operand values are zero, so sliding-window convolution is wasteful.

The fraction of non-zero (NZ) activations and weights is roughly 20~50% per layer.

[Figure: per-layer density of activations and weights for VGGNet]
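A back-of-the-envelope sketch of why these densities matter (the helper name and the 30% mid-range figure are illustrative assumptions; only the 20~50% densities come from the slide): a dense MUL does useful work only when both of its operands are non-zero, so the useful fraction is roughly the product of the two densities.

```python
# Illustrative estimate, assuming non-zero positions in activations and
# weights are roughly independent: a MUL is useful only when BOTH of
# its operands are non-zero.
def useful_mul_fraction(act_density: float, wgt_density: float) -> float:
    return act_density * wgt_density

# At ~30% NZ activations and ~30% NZ weights (mid-range of the 20~50%
# per-layer figures above), only ~9% of dense MULs do useful work.
print(round(useful_mul_fraction(0.3, 0.3), 2))  # → 0.09
```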

Pages 11-15: Convolution with Sparsity, continued

Animation frames of the same slide: combining dynamic sparsity (activations) with static sparsity (weights), the sliding window issues many useless MUL ops on zero-valued operands.

Page 16: Motivation

CNN inference is often performed in power-limited environments.

Our goal: a sparsity-optimized CNN accelerator for high energy-efficiency.

Page 17: Possible Solutions

Page 18: Option 1: Leverage Dense CNN Design

Employ a pair of bit-masks to track the non-zero weights and/or activations.

[Figure: NZ activations * NZ weights with a 3x3 sliding window]

Pages 19-20: Option 1, continued

Animation frames: an NZ bitmask for the weights and an NZ bitmask for the activations are built alongside the compressed operands.

Pages 21-23: Option 1, continued

Cycle-by-cycle animation: ANDing the activation and weight NZ bitmasks selects which of the four vector-ALU lanes do useful work. CLK#0 yields 2 useful MULs, CLK#1 yields 1, and CLK#2 yields 0.

Page 24: Option 1: Leverage Dense CNN Design

At CLK#2, the AND of the NZ bitmasks leaves 0 useful MULs for the four ALUs.

Challenge: find enough useful MULs to fully populate the vector ALUs.
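The cycle-by-cycle lane occupancy above can be modeled in a few lines (a minimal sketch, not the paper's hardware; the helper name and example bitmasks are hypothetical): AND the two NZ bitmasks and count the surviving lanes.

```python
# Minimal model of the Option-1 occupancy problem (hypothetical helper,
# not the paper's RTL): each cycle presents `lanes` operand pairs; a
# lane does useful work only when both the activation bit and the
# weight bit of the NZ bitmasks are set.
def useful_lanes(act_mask: int, wgt_mask: int, lanes: int = 4) -> int:
    combined = act_mask & wgt_mask & ((1 << lanes) - 1)
    return bin(combined).count("1")

# Mirroring the animation: occupancy of the 4-wide vector ALU varies
# from cycle to cycle, so some lanes sit idle.
print(useful_lanes(0b1011, 0b0111))  # → 2 useful MULs (a CLK#0-style cycle)
print(useful_lanes(0b0100, 0b0100))  # → 1
print(useful_lanes(0b1010, 0b0101))  # → 0
```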

Page 25: SCNN Intuition & Approach

Page 26: Intuition behind SCNN

Forget sliding-window-based convolution: every NZ activation must (at some point in time) be multiplied by every NZ weight.

This holds true for a convolution stride of '1'.

[Figure: input activations containing NZ values x, y, z convolved with a filter]

Pages 27-35: Intuition behind SCNN, continued

Animation frames: each NZ activation (x, then y, then z) is walked across the filter, showing that it is eventually multiplied against every NZ weight position.

Page 36: The SCNN approach

The Cartesian-product (i.e., all-to-all) based convolution operation.

Assuming a convolution stride of '1', the minimum number of MULs is the Cartesian product of the NZ activations and the NZ weights.

[Figure: activations with NZ values x, y, z; weights with NZ values a, b, c, d, e, f]

Page 37: The SCNN approach, continued

Compress the activations down to the NZ activations (x, y, z) and the weights down to the NZ weights (a, b, c, d, e, f) before multiplying.

Page 38: The SCNN approach, continued

The PE frontend forms the Cartesian product of the compressed NZ activations (x, y, z) and NZ weights (a, b, ...), producing the products x*a, y*a, z*a, x*b, y*b, z*b, and so on; a scatter network routes each product to the PE backend, which accumulates the MULs.
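The frontend/backend split above can be sketched as a toy 1-D, stride-1 sparse convolution (the function name and the 1-D simplification are assumptions for illustration; the paper operates on 2-D planar tiles): the frontend pairs every NZ activation with every NZ weight, and the backend scatters and accumulates each product at its output coordinate.

```python
from collections import defaultdict

# Toy 1-D, stride-1 Cartesian-product convolution (illustrative sketch,
# not SCNN's actual datapath). acts and wgts hold only the non-zeros as
# (index, value) pairs, i.e. a compressed-sparse representation.
def sparse_conv1d(acts, wgts, out_len):
    out = defaultdict(float)
    for ai, av in acts:             # PE frontend: every NZ activation...
        for wi, wv in wgts:         # ...meets every NZ weight
            oi = ai - wi            # scatter network: output coordinate
            if 0 <= oi < out_len:   # products falling off the edge are dropped
                out[oi] += av * wv  # PE backend: accumulate
    return [out[i] for i in range(out_len)]

# Dense input [0, 2, 0, 3] and kernel [1, 0, -1] compress to two
# activations and two weights; the Cartesian product considers only
# 2 x 2 = 4 candidate products instead of the 6 dense MULs.
print(sparse_conv1d([(1, 2.0), (3, 3.0)], [(0, 1.0), (2, -1.0)], 2))  # → [0.0, -1.0]
```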

Page 39: SCNN Architecture

Page 40: SCNN Architecture

2D spatially arranged processing elements (PEs).

Page 41: SCNN Architecture: Workload distribution

Weights are broadcast to all PEs: every PE has a copy of all the NZ weights of the CNN model.

Page 42: SCNN Architecture: Workload distribution

Each PE is allocated a partial volume of the input and output activations.

Input and output activations stay local to their PE.

[Figure: input and output activation maps tiled 2x2 across PE-0..PE-3]

Page 43: SCNN Architecture: Halo resolution?

Output halos: PE-0 calculates (A x B), but the result should be accumulated in PE-1 (at location X).

[Figure: activation A in PE-0's input tile, weight B, and output location X in PE-1's output tile]

Page 44: SCNN Architecture

An inter-PE communication channel handles halo resolution: each PE holds weights, input activations (IA), and output activations (OA), and forwards halo traffic to its NE and SE neighbors.
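The routing decision this channel serves can be sketched as follows (the coordinates, half-open tiling, and helper name are assumptions for illustration, not the paper's exact scheme): with stride 1, the product of activation (ax, ay) and weight offset (wx, wy) lands at output (ax - wx, ay - wy); if that coordinate falls outside the PE's own output tile, the partial sum is halo traffic for a neighbor.

```python
# Hypothetical halo router (illustrative only, not SCNN's exact logic).
# tile = (x0, x1, y0, y1): this PE's output tile, half-open ranges.
def route_product(ax, ay, wx, wy, tile):
    ox, oy = ax - wx, ay - wy       # output coordinate of this product
    x0, x1, y0, y1 = tile
    ew = "W" if ox < x0 else ("E" if ox >= x1 else "")
    ns = "N" if oy < y0 else ("S" if oy >= y1 else "")
    return (ns + ew) or "local"     # e.g. "NE", "SE", or "local"

# As with (A x B) -> PE-1 above: an activation near the tile edge,
# combined with a negative weight offset, produces an output coordinate
# inside the neighboring PE's tile.
print(route_product(3, 1, -1, 0, (0, 4, 0, 4)))  # → E
print(route_product(2, 2, 1, 1, (0, 4, 0, 4)))   # → local
```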

Page 45: SCNN PE microarchitecture

(Compressed-sparse frontend) + (Scattered-dense backend)

[Figure: PE datapath diagram]

Page 46: SCNN PE microarchitecture, continued

[Figure: the same PE datapath, with the PE frontend and PE backend regions highlighted]

Page 47: Evaluation

Page 48: Evaluation: Methodology

Network models and input activations: trained → pruned → retrained models using Caffe.

Area & power: SystemC → Catapult HLS → Verilog RTL → synthesis of an SCNN PE.

Performance & energy: a performance model for cycle-level simulation of SCNN, plus an analytical model for design-space exploration (dataflows, sparse vs. dense).

Page 49: Evaluation: Architecture configurations

DCNN: operates solely on dense weights and activations.

DCNN-opt: DCNN with (de)compression of activations + ALU power-gating.

SCNN: the sparse-optimized CNN accelerator.

Page 50: Performance: Dense vs. sparse

[Figure: per-layer speedup on VGGNet]

* Network-wide improvement over DCNN: AlexNet (2.37x), GoogLeNet (2.19x)

Page 51: Energy consumption: Dense vs. sparse

[Figure: per-layer energy on VGGNet]

* Network-wide improvement over DCNN: AlexNet (2.13x), GoogLeNet (2.51x)

Page 52: Related Work: Qualitative comparison to prior work

                             Sparse optimizations
  Architecture               Weights        Activations    Convolution dataflow
  DaDianNao [ASPLOS '14]     -              -              (Variant of) sliding-window
  Eyeriss [ISCA '16]         -              Power-gating   (Variant of) sliding-window
  CNVLUTIN [ISCA '16]        -              Zero-skipping  (Variant of) sliding-window
  Cambricon-X [MICRO '16]    Zero-skipping  -              (Variant of) sliding-window
  SCNN                       Zero-skipping  Zero-skipping  Cartesian-product

Page 53: Follow-up questions?

Contact the authors. Technical leads:

SCNN architecture & sparse models: Minsoo Rhu

TimeLoop (CNN analytical model): Angshuman Parashar

Power & area modeling: Rangharajan Venkatesan

Page 54: Conclusion

SCNN: a compressed-sparse CNN accelerator

A novel Cartesian-product based convolution operation

  Simple/compressed/sparse PE frontend

  Scatter/dense PE backend

Superior performance and energy-efficiency

  Average 2.7x higher performance than a dense CNN architecture

  Average 2.3x higher energy-efficiency than a dense CNN architecture