Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP...

62
Blade – A Path Towards Average- Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th , 2015

Transcript of Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP...

Page 1: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design

Peter A. Beerel

CHIP in Bahia, Salvador Brazil

Sept 4th, 2015

Page 2: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Collaborations and Acknowledgements

MOTIVATION | 2

PUCRS (Brazil): Ney Calazans, Matheus Moreira, Guilherme Heck, Leandro Heck, Matheus Gibiluka

USCS (Brazil): Frederico Butzke

Tsinghua (China): Benmao Cheng

BITS (India): Ajay Singhvi

USC (USA): Dylan Hand, Fei Huang, Ramy Tadros, Alan Huang, Yang Zhang, Mel Breuer, Danlei Chen, Weizhe Hua, Huimei Cheng, Austin Katzin, Yuqi Song, Zhe Liu, Jun He

Page 3: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Motivation: Delay Overheads

Traditional synchronous design suffers from increased margins• Worse at low and near-threshold regions

MOTIVATION | 2

PVT margin

Clock margin

Flip-flop alignment

Cycle time ofclocked logic

Logic gates

Logic Time

[Dreslinksi et al., IEEE Proc. 2010]

Page 4: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Data delay in Plasma CPU

Motivation: Data Dependent Delays

Delay

Num

ber

of O

pera

tions

Slower

Logic gates

Logic Time

Worst-CaseData

Worst-Case DelayAverage DelayCycle time ofclocked logic

MOTIVATION | 3

Delay variation due to data is rarely exploited in traditional designs

Page 5: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Outline

Resiliency Design and its Pitfalls

Blade Design, Operation, and Special Cells

Blade Analysis and Optimization

Blade CAD Flow and Case Study

Conclusions and On-Going Work

OUTLINE | 5

Page 6: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Resilient Design

Allow and detect timing errors in datapath • Correct via architectural replay or gating/pausing clock

Effective approaches have been elusive!

RESILIENT DESIGN | 5

ShadowSequential

LogicNext PipelineStage

To Control LogicErrorDetection

SequentialGate

CLKCLK = 0

0

CLK = 1

1

Page 7: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

The Problem [Beer et al, 2014]

• Flop metastability can propagate to ERR signal

• Metastability in control path can cause system failure

RESILIENT DESIGN | 6

[Bubble Razor, Fojtik, 2013]

Latc

hS

hado

w

FF

Logic Next Pipeline Stage

CLK

X To Control LogicX

Metastability Issue

Page 8: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Handling Metastability

Transition detector may exhibit metastability

• Error signal must go through synchronizer, increasing delay

• Uses architectural replay to recover from errors

Latch

Transition Detector

Logic

To Control Logic

Next Pipeline Stage

CLK

FF FF

Synchronizer[RazorII, Das, 2009]

X 1 or 0

RESILIENT DESIGN | 7

Page 9: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Hold Time Concerns

Relies on latch for error correction

Hold times are problematic

RESILIENT DESIGN | 8

Data In

Latch

MSDetector

FF

STOP

In

OutInternal Data

Comb.Logic

Ring Oscillator Clock Generator

[SafeRazor, Cannizzaro, 2014]

CLKCLK = 1

CLK = 0

Page 10: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Design Template

Sync / Async

MTBF Safe

Avoids Replay Logic

Hold Time

Robust

Low Error Penalty

Bubble Razor

Sync No Yes Yes Yes

Razor II Sync Yes No No No

SafeRazor Async Yes* Yes No Yes

Blade combines the best features of past resiliency schemes

RESILIENT DESIGN | 9

Design Template

Sync / Async

MTBF Safe

Avoids Replay Logic

Hold Time

Robust

Low Error Penalty

Bubble Razor

Sync No Yes Yes Yes

Razor II Sync Yes No No No

SafeRazor Async Yes* Yes No Yes

Blade Async Yes Yes Yes Yes

Resiliency Landscape

Page 11: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Outline

Resiliency Design and its Pitfalls

Blade Design, Operation, and Special Cells

Blade Analysis and Optimization

Blade CAD Flow and Case Study

Conclusions and On-Going Work

OUTLINE | 11

Page 12: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

EDL R.dataCombinational

LogicL.data

δ Err

L.ackL.req

LE.reqLE.ack

R.ack

RE.ack

R.req

RE.req

Sam

ple

CLK

Blade StageError Detection Logic

2

BladeController

Δ

Asynchronous Controller

Error Detecting Latch

Single Rail Datapath

Reconfigurable Delay Lines

BLADE TEMPLATE | 11

Blade Template

Page 13: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Blade Template Operation

Send request speculatively before data is guaranteed stable

Timing errors delay handshaking signals

EDLCombinational

Logic

δ

Error Detection Logic

ControllerB

Δ

Error Detection Logic

ControllerA

Δ

EDL

BLADE TEMPLATE | 12

CLK Err

Sam

ple

CLK Err

Sam

ple

Page 14: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

EDLCombinational

Logic

δ

Error Detection Logic

ControllerB

Δ

Error Detection Logic

ControllerA

Δ

EDL

Blade Template Operation

Send request speculatively before data is guaranteed stable

Timing errors delay handshaking signals

BLADE TEMPLATE | 12

CLK Err

Sam

ple

CLK Err

Sam

ple

Err = 0

Page 15: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

EDLCombinational

Logic

δ

Error Detection Logic

ControllerB

Δ

Error Detection Logic

ControllerA

Δ

EDL

Blade Template Operation

Send request speculatively before data is guaranteed stable

Timing errors delay handshaking signals

BLADE TEMPLATE | 12

CLK Err

Sam

ple

CLK Err

Sam

ple

Err = 1

Page 16: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

EDLCombinational

Logic

δ

Error Detection Logic

ControllerB

Δ

Error Detection Logic

ControllerA

Δ

EDL

Positive Hold Margins

BLADE TEMPLATE | 13

CLK Err

Sam

ple

CLK Err

Sam

ple

Err = 0

Page 17: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

EDLCombinational

Logic

δ

Error Detection Logic

ControllerB

Δ

Error Detection Logic

ControllerA

Δ

EDL

Positive Hold Margins

BLADE TEMPLATE | 13

CLK Err

Sam

ple

CLK Err

Sam

ple

Handshaking delays create positive hold margin!

Page 18: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Error Detection Logic

C-element stores error signal, which is sampled by Q-Flop

BLADE TEMPLATE | 14

OutIn

Error Detection Logic

Blade Controller

C+

D QLatch

delayX

EDL

Q-Flop

{ {From other Q-FlopstTD

CLK

Err1

Err0

CLK = 0

CE = 0

CLK = 1

CE = 1

Err1

= 0

Err0

= 0

Sam

ple

Sam

ple

= 0

Sam

ple

= 1

Err1

= 1

Page 19: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Metastable-Safe Operation

Q-Flop prevents metastability propagation to control path

BLADE TEMPLATE | 15

OutIn

Error Detection Logic

Blade Controller

C+

D QLatch

delayX

EDL

Q-Flop

{ {From other Q-FlopstTD

CLK

Err1

Err0

CLK = 0

CE = 0

CLK = 1

CE = X

Err1

= 0

Err0

= 0

Sam

ple

Sam

ple

= 0

Sam

ple

= 1

Err0

= 1

Some time later…

Page 20: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Out

CLK

In

Sample

Error Detection Logic

Blade Controller

C+

D Q

delay

Err0

Latch

delayX

EDL

Q-Flop

Err1

From other C-elements

{

{ {

From other Q-Flops

tcomp

tTD

Overhead Reduction

C-element and OR gates amortize overhead over many EDLs

BLADE TEMPLATE | 16

4-Input

C-element

collects output

from 3 EDLs

4-Input

OR gate collects

output from 4

C-elements

Each Q-Flop

covers 12

EDLs

Page 21: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Controller Implementation

BLADE TEMPLATE | 17

Bottom

Left

goL

goR

goD

L.reqL.ack

LE.reqLE.ack

R.reqR.ackRE.reqRE.ack

Err[1] Err[0]Sample CLK

δ

Δ

Δ edo

edi

delayBlade Controller

Right

Three part burst-mode state machine• Implemented using 3D [Yun, 1992]

Page 22: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Bottom

Left

goL

goR

goD

L.reqL.ack

LE.reqLE.ack

R.reqR.ackRE.reqRE.ack

Err[1] Err[0]Sample CLK

δ

Δ

Δ edo

edi

delayBlade Controller

Right

Three part burst-mode state machine• Implemented using 3D [Yun, 1992]

Controller Implementation

BLADE TEMPLATE | 17

L.req+ / LE.req+

LE.ack+ / goR+

goL+ / Lack+ L.req- / LE.req-

LE.ack- / goR-

goL- / L.ack-RE.req+ Err[1]+ /

edo+

edi+ /RE.ack+ goD- edo-

Err[1]- edi- / Err[0]- /

RE.req+ Err[0]+ /RE.ack+ goD-

RE.req- Err[1]+ /edo+

Err[1]- edi- / Err[0]- /

RE.req- Err[0]+ /RE.ack- goD-

edi+ /RE.ack- goD- edo-

goR+ R.ack- / goD+ R.req+

goR- R.ack+ / goD+ R.req-

goD+ / CLK+

delay+ /goL+ Sample+ CLK-

goD- delay- /Sample-

goD+ / CLK+

delay+ /goL- Sample+ CLK-

goD- delay- /Sample-

Page 23: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

DELAY ELEMENTS | 18

Delay Elements: MUX-DE

Most popular programmable Delay Elements (DEs)

Delay controlled by number of buffers in the signal path

Binary codewords N. Mahapatra et al., “Comparison and Analysis of Delay Elements,” in MWSCAS, 2002, pp. 473–476.

Page 24: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

DELAY ELEMENTS | 19

Delay Elements: One-Hot DCCS

Lengths of the current source transistors increase linearly

• (1L, 2L, 3L, ..., nL)

Replicate inverting stages to balance rise-fall delays

For a particular codeword, only one of the current source transistors is ON [Singhvi et al., ISVLSI 2015]

Page 25: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Delay Elements: Delay Shift Inverter

Simple inverter structure

3 flavors for back-biasing

• For RBB, PBIAS > 1V

and NBIAS < 0V

• Use flavor (b)

Amount of delay shift

controlled by two factors:

• Inverter sizing

• Magnitude of back-bias

voltage

DELAY ELEMENTS | 20

[Singhvi et al., ISVLSI 2015]

Page 26: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Delay Lines and Voltage Scaling

*Delays range: 300ps to 1.4ns *Sized to match rise/fall delays

Voltage Scaled Delay Ratio

• Delay at nominal voltage over delay at near threshold voltage

The problem

• VSDR changes with codeword

• Forces needed margin

Solution

• Replicate and size designs to reduce margins

DELAY ELEMENTS | 23

[Singhvi et al, VLSID 2015]

Page 27: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

DELAY ELEMENTS | 21

Comparison of Characteristics

Energy vs Quantization Error Trade-offs Exist Can be Exploited

*DQE: Delay Quantization Error*EPT: Energy Per Transition

*

[Singhvi et al., ISVLSI 2015]

Page 28: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Delay Line Quantification Effects

Quantification effects reduced due to inherenttradeoff between nominal delayδ and error penalty p * Δ

Del

ay

δ p * Δ

𝑑=𝛿+ p∗ ΔRecall:

DELAY LINE QUANTIZATION | 25

Page 29: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Delay Line Quantification in BD

Linear relationship between delay line quantizationand average stage delay in Bundled Data

Del

ay

δ

𝑑=𝛿Bundled Data:

DELAY LINE QUANTIZATION | 26

Bundled Data

Page 30: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Error Detecting Latch: TBTD

Formed by a transition detector and an error latch (an asymmetric C-Element)

TD EL

Transition Detector (TD) and Error Latch (EL)

[Bowman et al, ISSCC 2007]

ERROR DETECTING LATCH | 27

Page 31: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Increased sensitivity Enables safe operation even in the presence of glitches

Transistor size XOR• OX-TD+EL

Use static C-element• SOX-TD+EL

Analyze for glitch sensitivity

[Moreira et al., ISQED 2015]

ERROR DETECTING LATCH | 28

Error Detecting Latch: Glitch Sensitivity

twin

Page 32: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Outline

Resiliency Design and its Pitfalls

Blade Design, Operation, and Special Cells

Blade Analysis and Optimization

Blade CAD Flow and Case Study

Conclusions and On-Going Work

OUTLINE | 29

Page 33: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Resiliency Performance Benefit

ANALYSIS | 30

Data delay in Plasma CPU

Delay

Num

ber

of O

pera

tions

Worst-Case DelayNominal Delay

Key Question: How do we set δ to optimize performance

δ δ + Δ

δ Δ

Page 34: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Delay Models

Our approach

• Analyze the performance of Blade for a variety of delay models

NormalDistribution

Logic Delay

Pro

babi

lity

Logic Delay

Pro

babi

lity

Log-NormalDistribution

Logic Delay

Fre

quen

cy Real WorldDistribution

DELAY MODELS | 31

Page 35: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Definitions• C : Clock Period / Cycle Time• EC : Effective Clock Period• p : Probability of error• : Average delay of Blade stage

Optimal performance achieved by minimizing

Optimal Average-Case Performance

DELAY MODELS | 32

Pro

babi

lity

δ δ + Δ

PR( d ≤ δ )

Pro

babi

lity

δ δ + Δ

1-PR( d ≤ δ )

Probability oferror (p)

𝐶=𝛿+ Δ

𝑑=𝛿+ p∗ Δ

σ

μ

*Assumes backward latency is hidden via latch retiming

Page 36: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

popt observations

• Varies between 20%

and 35% for log-

normal distributions

• Significantly higher

than in sync resiliency

• Constant for normal

distributions!

Optimal Probability of Error - popt

DELAY MODELS | 33

Higher Variance

Page 37: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Proof of constant popt

DELAY MODELS | 34

Implication: Tuning of delay line may target fixed probability!

𝐾=𝜇+𝑚∗𝜎𝐾=𝛿+ΔAssume worst case delay per stage is constant

Worst case delay is set by mean, variance, and SER

Systematic Error Rate (ξ) sets the worst-case delay per stage, K

𝑚= 𝑓 (𝜉 )𝜉=1− [𝑃𝑅 {𝑑≤𝐶 } ]𝑁

Recall: 𝑑=𝛿+𝑝∗ Δ¿ (1 −𝑝 )∗𝛿+𝑝∗𝐾For Normal distribution:

Taking inverse error function of both sides:

erf −1 (1− 2𝑝 )=𝛿−𝜇√2𝜎

𝛿=√2𝜎 [erf − 1 (1−2𝑝 )]+𝜇

Rewrite:𝑑=(1−𝑝 )[√2𝜎 [erf −1 (1−2𝑝 )]+𝜇]+ p∗K

𝜕𝑑𝜕𝑝

=(1+𝑦 )[√2𝜕erf −1 (𝑦 )

𝜕 𝑦 ]−√2erf −1 𝑦+𝑚=0

𝑦=1 −2𝑝

Minimize by taking derivative and setting it equal to zero :

Note y and p are independent of σ and μ! m depends only on

Page 38: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Optimal Size of Resiliency Window - Δ

Blade supports maximum Δ of 50% of clock cycleOptimal Δ is larger for designs with high-variance!

OPTIMIZATION | 35

(%)

Page 39: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Comparison to Sync Resiliency

Synchronous• EC set by systematic

error rate

Bubble Razor [Zhang,2014]

Blade

COMPARISON | 36

FF

FF

M S M S

ED

L

ED

L

ED

L

ED

L

N-Stage Rings

Page 40: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Comparison to Sync Resiliency

Synchronous• EC set by systematic

error rate

Bubble Razor [Zhang,2014]

Blade

COMPARISON | 36

Normal Distribution

Synchronous BR

Blade

35% 23%

Page 41: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Comparison to Sync Resiliency

Synchronous• EC set by systematic

error rate

Bubble Razor [Zhang,2014]

Blade

COMPARISON | 36

Normal Distribution

Page 42: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Outline

Resiliency Design and its Pitfalls

Blade Design, Operation, and Special Circuits

Blade Analysis and Optimization

Blade CAD Flow and Case Study

Conclusions and On-Going Work

OUTLINE | 37

Page 43: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Automatic Conversion Flow

Convert single-clock sync RTL design to Blade

Re-uses synchronous EDA tools and libraries

Seamless integration into existing flows

CASE STUDY | 38

Simulation

Blade Design Flow

SyncSynthesis

Latch Retiming

FF to Latch ConversionR

TL

Spe

c

Asy

ncN

etlis

t

Add EDL + Async Control

Page 44: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Synthesize sync RTL design using standard EDA tools

CASE STUDY | 39

Synthesis

FF FF

CLK

SimulationSyncSynthesis

Latch Retiming

Add EDL + Async Control

FF to Latch Conversion

Blade Design Flow

RT

L S

pec

Asy

ncN

etlis

t

SimulationSyncSynthesis

Latch Retiming

FF to Latch Conversion

Add EDL + Async Control

Page 45: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

CASE STUDY | 40

Latches

M S M S

Replace flip flops with master-slave latches

Two-phase non-overlapping clocking

CLK1CLK2

SimulationSyncSynthesis

Latch Retiming

Add EDL + Async Control

FF to Latch Conversion

Page 46: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

CASE STUDY | 41

Latch Retiming

M M

CLK1

S

Retime latches to spread logic delay across stages

Allow time borrowing to reduce area overhead

CLK2

S

SimulationSyncSynthesis

Latch Retiming

Add EDL + Async Control

FF to Latch Conversion

Page 47: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

CASE STUDY | 42

EDL Insertion

M

CLK1

S

Replace non-TB latches with error detecting latches

Not all latches need be error detecting

CLK2

SEDLMEDL TB TB

SimulationSyncSynthesis

Latch Retiming

Add EDL + Async Control

FF to Latch Conversion

Page 48: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

CASE STUDY | 43

Async Control

TB

Remove synchronous clock trees

Add Blade controllers and delay lines

TBEDLEDL

BladeController

BladeController

BladeController

BladeController

SimulationSyncSynthesis

Latch Retiming

Add EDL + Async Control

FF to Latch Conversion

Page 49: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

CASE STUDY | 44

Simulation

TB

Back annotated SDF simulation using final netlist

TBEDLEDL

BladeController

BladeController

BladeController

BladeController

SimulationSyncSynthesis

Latch Retiming

Add EDL + Async Control

FF to Latch Conversion

Page 50: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Resynthesis

CASE STUDY | 45

EDL

A

LogicEDL

B

EDL

C

EDL

D

EDL

E

EDL

F

Critical Path

Delay > δ

Correlated

EDLDelay > δ

Latch

F

Add set_max_delay from A to F = δ

Latch

E

Delay < δ

Delay < δ

Page 51: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Brute Force Resynthesis

Evaluated hundreds of resynthesis runs• Each run sets a max delay constraint to a single latch

Chose result that led to largest reduction in area and error rate

CASE STUDY | 46

Best run reduced error rate by 39% and EDLs by 27% (-1.9% area)

Best Run122 Correlated EDLs!

Page 52: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

MIPS OpenCore

3-stage pipeline

28nm FDSOI @ 666MHz

(w/ ideal clock and Vdd)Type Count

Combinational 11,740

Buf/Inv 1,683

Seq. (Non-RF) 531

RF 2,048

Total 14,319

http://opencores.org/project,plasma

CASE STUDY | 47

Case Study: Plasma

Page 53: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Case Study: Area and Performance

Overall area overhead is 8.4%

CASE STUDY | 48

2%

24%

6%

32%

12%

24%

FF to Latch + EDL

Q-Flops

C-elements

AND/OR Trees

DelayElements

Controllers

Performance increases• Average: 19%• Peak: 42%

Plasma running CoreMark

Page 54: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Performance Comparison with Margins

Must add margins for PVT variation, clock skew / jitter, and/or aging

• Synchronous frequency degraded to accommodate

Margin in Blade design is only imposed when an error occurs

• A 30% error rate reduces impact of margin by ~70%

CASE STUDY | 49

Margin Synchronous Blade (30% ER) % Advantage

0% 666MHz 800MHz 20%

15% 566MHz 764MHz 35%

30% 466Mhz 728MHz 56%

Page 55: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Outline

Resiliency Design and its Pitfalls

Blade Design, Operation, and Special Circuits

Blade Analysis and Optimization

Blade CAD Flow and Case Study

Conclusions and On-Going Work

OUTLINE | 50

Page 56: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Other Related Work

Canary Circuits [Sato, 2007]• Removes some PVT margins

• But cannot take advantage of data dependency

Bundled Data Designs [Sutherland’89, Nowick’97]• Speculative completion sensing exploits some data dependency

• Margins impact performance on every cycle

• No observability of errors

Soft Mousetrap [Liu, 2013]• Hold time constraints remain difficult to meet

CONCLUSIONS | 51

Page 57: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

ConclusionsBlade Template

• Achieves higher performance by exploiting data dependency

• Benefits from average vs worst-case MS resolution times

• Reduces impact of margins for PVT variations

• Enables voltage scaling for power savings

• Supported by design flow w/ commercial EDA tools

Plasma Case Study

• Highlights design and CAD techniques for area efficiency

• Achieves 19% increase in performance with 8.4% area overhead

CONCLUSIONS | 52

Page 58: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

On-Going WorkDesign

• Further design and analysis of efficient DEs and EDLs

Automated Flow• Support designs specified in System Verilog CSP

• Develop average-case-driven re-synthesis tools

• Automated resilient-aware PnR Flow

Testing Strategy• On-line tuning of delay lines

• Screen chips with variations too large to correct

Applications• Encryption Designs (Triple DES and Light Weight Crypto)

ON-GOING WORK | 53

Page 59: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

D. Hand, M. Moreira, H.-H. Huang, D. Chen, F. Butzke, Z. Li, M. Gibiluka, M. A. Breuer, N. L. V. Calazans, P. A. Beerel: Blade - A Timing Violation Resilient Asynchronous Template. ASYNC 2015: 21-28

D. Hand, H.-H. Huang, B. Cheng, Y. Zhang, M. Moreira, M. A. Breuer, N. L. V. Calazans, P. A. Beerel: Performance Optimization and Analysis of Blade Designs under Delay Variability. ASYNC 2015: 61-68

Y. Zhang, L. S. Heck, . M. Moreira, D. Zar, M. A. Breuer, N. L. V. Calazans, P. A. Beerel: Design and Analysis of Testable Mutual Exclusion Elements. ASYNC 2015: 124-131

M. Moreira, D. Hand, P. A. Beerel, N. Calazans: TDTB error detecting latches: Timing violation sensitivity analysis and optimization. ISQED 2015: 379-383

G. Heck, L. S. Heck, A. Singhvi, M. Moreira, P. A. Beerel, N. L. V. Calazans: Analysis and Optimization of Programmable Delay Elements for 2-Phase Bundled-Data Circuits. VLSI Design 2015: 321-326.

A. Singhvi, M. T. Moreira, R. N. Tadros, N. L. V. Calazans, P. A. Beerel: A Fine-Grained, Uniform, Energy-Efficient Delay Element for FD-SOI Technologies, ISLVSI 2015

P. A. Beerel, N. Calazans: A Path Towards Average Case Silicon using Resilient Bundled Data Design, ECCTD 2015 (Invited)

ON-GOING WORK | 55

Blade Publications (2015)

Page 60: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

The Blade Team++

Page 61: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Asynchronous Circuit in Industry

3D Network on chips (STMicroelectronics)

Ethernet Switches (Intel)

Ultra high-speed FPGAs (Achronix)

Low-power smart cards (Tiempo)

Neuromorphic Computing (IBM)

Internet of Things (???)IBM True North Multi-Chip

Neuromorphic System

Tiempo TAM16 - Clockless 16-bit Microcontroller

STMicroelectronics WIOMING 3D-IC

Achronix FPGA. 1.7 M LUTs. 2.1 Gbps IO

Fulcrum/Intel Ethernet Switch Chip

Page 62: Blade – A Path Towards Average-Case Silicon via Bundled-Data Resilient Design Peter A. Beerel CHIP in Bahia, Salvador Brazil Sept 4 th, 2015.

Obrigado!

Do not miss....

22nd IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC)

MAY 8 - 11, 2016 PORTO ALEGRE, BRAZIL

Homepage: http://www.inf.pucrs.br/async2016

We hope to see you there!