EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/Lectures/Lectu… ·...

21
1 EE241 - Spring 2013 Advanced Digital Integrated Circuits Lecture 13: Timing revisited 2 Announcements Homework 2 due today Quiz #2 on Monday Midterm project report due next Wednesday

Transcript of EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/Lectures/Lectu… ·...

Page 1: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/Lectures/Lectu… · Additional pessimism: Clock tree view LF1 LF2 LF3 Comb. CF Comb. Comb. FP1 and FP2 FP3 ICCAD

1

EE241 - Spring 2013Advanced Digital Integrated Circuits

Lecture 13: Timing revisited

2

Announcements

Homework 2 due today

Quiz #2 on Monday

Midterm project report due next Wednesday

Page 2: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/Lectures/Lectu… · Additional pessimism: Clock tree view LF1 LF2 LF3 Comb. CF Comb. Comb. FP1 and FP2 FP3 ICCAD

2

3

Outline

Last lectureECC in SRAM

SRAM scaling options

This lectureBack to timing

Latch-based timing

Variability and timing

4. Design for performance

A. Timing

Page 3: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/Lectures/Lectu… · Additional pessimism: Clock tree view LF1 LF2 LF3 Comb. CF Comb. Comb. FP1 and FP2 FP3 ICCAD

3

5

Flip-Flop Parameters

D

Clk

Q

D

Q

Clk

TCQ

TH

PWm

TSU

Delays can be different for rising and falling data transitions

6

Latch Parameters

D

Clk

Q

D

Q

Clk

TCQ

TH

PWmTSU

TDQ

Delays can be different for rising and falling data transitions

Unger and TanTrans. on Comp.10/86

Page 4: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/Lectures/Lectu… · Additional pessimism: Clock tree view LF1 LF2 LF3 Comb. CF Comb. Comb. FP1 and FP2 FP3 ICCAD

4

7

Example Clock System

Courtesy of IEEE Press, New York. 2000

8

Clock Nonidealities

Clock skewSpatial variation in temporally equivalent clock edges; deterministic + random, tSK

Clock jitterTemporal variations in consecutive edges of the clock signal; modulation + random noise

Cycle-to-cycle (short-term) - tJSLong-term - tJL

Variation of the pulse width for level-sensitive clocking

Page 5: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/Lectures/Lectu… · Additional pessimism: Clock tree view LF1 LF2 LF3 Comb. CF Comb. Comb. FP1 and FP2 FP3 ICCAD

5

9

Clock Skew and Jitter

Both skew and jitter affect the effective cycle time

Only skew affects the race margin, if jitter is from the sourceDistribution-induced jitter affects both

Clk1

Clk2

tSK

tJS

10

Clock Uncertainties

2

4

3

Power Supply

Interconnect

5 Temperature

6 Capacitive Load

7 Coupling to Adjacent Lines

1 Clock Generation

Devices

Sources of clock uncertainty

Page 6: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/Lectures/Lectu… · Additional pessimism: Clock tree view LF1 LF2 LF3 Comb. CF Comb. Comb. FP1 and FP2 FP3 ICCAD

6

11

Clock Constraints in Edge-Triggered Systems

Courtesy of IEEE Press, New York. 2000

12

Latch timing

D

Clk

Q

tDQ

tCQ

When data arrives to transparent latch

When data arrives to closed latch

Data has to be ‘re-launched’

Latch is a ‘soft’ barrier

Page 7: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/Lectures/Lectu… · Additional pessimism: Clock tree view LF1 LF2 LF3 Comb. CF Comb. Comb. FP1 and FP2 FP3 ICCAD

7

13

Single-Phase Clock with Latches

Latch

Logic

Clk

Clk

tCY

PW

Tskl Tskl TsktTskt

Unger and TanTrans. on Comp.10/86

sktsklsk TTT

In Rabaey Chapter 10:

14

Preventing Late Arrivals

Clk

tCY PW

TSU Datamustarrive

Clk

TCQ TLM

TSU

Clk

TDQ TLM

TSUPW

TSU

Page 8: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/Lectures/Lectu… · Additional pessimism: Clock tree view LF1 LF2 LF3 Comb. CF Comb. Comb. FP1 and FP2 FP3 ICCAD

8

15

Preventing Late Arrivals

,max

skl skt SU CQMCY LM

DQM

T T T T PWt T

T

CY CQM LM SU skl sktt T T T T T PW

CY DQM LMt T T

Or:

16

Preventing Premature Arrivals

ClkTHPW

TCQ TLm

Two cases, reduce to one:

Lm skl skt H CQmT T T T PW T

Page 9: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/Lectures/Lectu… · Additional pessimism: Clock tree view LF1 LF2 LF3 Comb. CF Comb. Comb. FP1 and FP2 FP3 ICCAD

9

17

Single-Latch Timing Summary

Latch

Logic

Clk

,max

skl skt SU CQMCY LM

DQM

T T T T PWt T

T

Lm skl skt H CQmT T T T PW T

Bounds on logic delay:

Solutions: 1) Balance logic delays2) Locally generated short PW

18

Latch-Based Design

L1Latch

Logic

Logic

L2Latch

Clk

L1 latch is transparentwhen Clk = 0

L2 latch is transparent when Clk = 1

Page 10: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/Lectures/Lectu… · Additional pessimism: Clock tree view LF1 LF2 LF3 Comb. CF Comb. Comb. FP1 and FP2 FP3 ICCAD

10

19

Latch-Based Timing

As long as transitions are within the assertion period of the latch,no impact of position of clock edges

20

Latch Design and Hold Times

Page 11: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/Lectures/Lectu… · Additional pessimism: Clock tree view LF1 LF2 LF3 Comb. CF Comb. Comb. FP1 and FP2 FP3 ICCAD

11

21

Latch-Based Timing

Longest path

2CY DQM LHM LLMT T T T

l Short paths

CLLm SK H CQmT T T T

CLHm SK H CQmT T T T

Same as register-based design but holdsfor both clock edges

Independent of skew

22

Latch-Based Timing

L1Latch

Logic

Logic

L2Latch

Clk

Clk

Clk

L1 latch

L2 latch

Skew

Can tolerate skew!

Longpath

Shortpath

Static logic

Page 12: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/Lectures/Lectu… · Additional pessimism: Clock tree view LF1 LF2 LF3 Comb. CF Comb. Comb. FP1 and FP2 FP3 ICCAD

12

23

Soft-Edge Properties of Latches

Slack passing – logical partition uses left over time (slack) from the previous partitionTime borrowing – logical partition utilizes a portion of time allotted to the next partitionMakes most impact in unbalanced pipelines

Bernstein et al, Chapter 8, Chandrakasan, Chap 11 (by Partovi)

24

Slack-Passing and Cycle Borrowing

For N stage pipeline, overall logic delay should be < N Tcl

Page 13: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/Lectures/Lectu… · Additional pessimism: Clock tree view LF1 LF2 LF3 Comb. CF Comb. Comb. FP1 and FP2 FP3 ICCAD

13

4. Design for performance

B. Statistical timing

26

Pictorial view of setup and hold tests

0 or moreswitching(s)allowed

Latest clock arrival time Earliest clock arrival time(next cycle)

Data must be stable

Hold time

Early RAT

Data must be stable

Setup time

Late RAT

Actual early AT Actual late AT

Earlyslack

Lateslack

ICCAD '07 Tutorial Chandu Visweswariah

Page 14: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/Lectures/Lectu… · Additional pessimism: Clock tree view LF1 LF2 LF3 Comb. CF Comb. Comb. FP1 and FP2 FP3 ICCAD

14

27

LF CF

Hold test

Handling of across-chip variation

Each gate has a range of delay: [lb, ub]The lower bound is used for early timingThe upper bound is used for late timing

This is called an early/late splitStatic timing obtains bounds on timing slacks

Timing is sperformed as one forward pass and one backward pass

LF CF

Setup test

Capturing early path

Launching late path

Capturing late path

Launching early path

ICCAD '07 Tutorial Chandu Visweswariah

28

How is the early/late split computed?

The best way is to take known effects into account during characterization of library cells

History effect, simultaneous switching, pre-charging of internal nodes, etc.This drives separate characterization for early and late; this is the most accurate method

Failing that, the most common method is derating factorsExample: Late delay = library delay * 1.05

Early delay = library delay * 0.95The IBM way of achieving derating is LCD factors (Linear Combination of Delay) (FC=fast chip, SC=slow chip, see next page)

Late delay = L * FC_delay + L * NOM_delay + L * SC_delayEarly delay = E * FC_delay + E * NOM_delay + E * SC_delayAcross-chip variation is therefore assumed to be a fixed proportion of chip-to-chip variation for each cell type

ICCAD '07 Tutorial Chandu Visweswariah

Page 15: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/Lectures/Lectu… · Additional pessimism: Clock tree view LF1 LF2 LF3 Comb. CF Comb. Comb. FP1 and FP2 FP3 ICCAD

15

29

IBM delay modeling*

Intrinsic:Chip means

SystematicACV

RandomACV

Early Late Early Late*P. S. Zuchowski, ICCAD’04

At a given cornerlate delay = intrinsic + systematic + randomearly delay = intrinsic – systematic – random

ICCAD '07 Tutorial Chandu Visweswariah

30

Traditional timing corners

Fast

chi

p ea

rly

Fast

chi

p la

te

Intra-chip variation

Slow

chi

p ea

rly

Slow

chi

p la

te

Intra-chip variation

Chip-to-chip variation

Fast chip Slow chip

ICCAD '07 Tutorial Chandu Visweswariah

Page 16: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/Lectures/Lectu… · Additional pessimism: Clock tree view LF1 LF2 LF3 Comb. CF Comb. Comb. FP1 and FP2 FP3 ICCAD

16

31

The problem with an early/late splitThe early/late split is very useful

Allows bounds during delay modelingAny unknown or hard-to-model effect can be swept under the rug of an early/late split

But, it has problemsAdditional pessimism (which may be desirable)Unnecessary pessimism (which is never desirable)

LF CF

Setup test

Capturing early path

Launching late path

This physically commonportion can’t be both fastand slow at the same time

ICCAD '07 Tutorial Chandu Visweswariah

32

Additional pessimism: Clock tree view

LF1 LF2 CFLF3 Comb.

Comb.

Comb.

FP1 and FP2

FP3

ICCAD '07 Tutorial Chandu Visweswariah

Page 17: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/Lectures/Lectu… · Additional pessimism: Clock tree view LF1 LF2 LF3 Comb. CF Comb. Comb. FP1 and FP2 FP3 ICCAD

17

33

How to have less pessimism?

Common path pessimism removal

Account for correlations

Credit for statistical averaging of random

34

Statistical timing

Deterministic

Statistical

a

b

c+

+ MAX

b

a

c+

+ MAX

ICCAD '07 Tutorial Chandu Visweswariah

Page 18: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/Lectures/Lectu… · Additional pessimism: Clock tree view LF1 LF2 LF3 Comb. CF Comb. Comb. FP1 and FP2 FP3 ICCAD

18

35

The problem of correlationsThere are many reasons for correlations

Chip-to-chip variations are perfectly correlated within a single chipSame circuit typesSame device familiesSame metal levelsSame voltage islandsSame regions of the chipDependence on common sources of variationReconvergent fanoutEtc.

In a reasonable-sized chip, there may be 100 million timing quantities, so we don’t handle correlations in the classical way

Not by storing and manipulating a 100M x 100M covariance matrix

36

Canonical form

All timing quantities are parameterized by the sources of variation

Correlation can be judged on-demand by inspection

annn RaXaXaXaa 122110

Constant(nominalvalue)

Independentlyrandomuncertainty

Sensitivities Deviation ofglobal sourcesof variation fromnominal values

Page 19: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/Lectures/Lectu… · Additional pessimism: Clock tree view LF1 LF2 LF3 Comb. CF Comb. Comb. FP1 and FP2 FP3 ICCAD

19

37

Statistical timing basicsRepresent all timing quantities in canonical form

Delays, slews, guard times, ATs, RATs, slacks, PLL adjusts, constraints, CPPR adjusts

Propagate ATs forward through the timing graphAddition of two canonical forms is easyMax/min operations are also easy with the help of some analytic formulas

Propagate RATs backward through the timing graphSubtraction of two canonical forms is easyUse statistical max/min operations

Slack is simply the difference between AT and RATSince this is available in canonical form, we get sensitivities of circuit performance to sources of variation for freeThese can be used to ensure a robust design

38

Statistical max operation

*C. E. Clark, “The greatest of a finite set of random variables,” OR Journal, March-April 1961, pp. 145—162**M. Cain, “The moment-generating function of the minimum of bivariate normal random variables,” American Statistician, May ’94, 48(2)

Page 20: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/Lectures/Lectu… · Additional pessimism: Clock tree view LF1 LF2 LF3 Comb. CF Comb. Comb. FP1 and FP2 FP3 ICCAD

20

39

Unified view of correlations

Independentlyrandom part

Spatially correlated part:within-chip distance-related correlation

Globally correlated part: chip-to-chip, wafer-to-wafer, batch-to-batch variation

1

0 Distance

Correlation Coefficient

Rrii XaXaaD 0

ICCAD '07 Tutorial Chandu Visweswariah

40

Spatial correlation vs. early/late split

LF1 LF2 CFLF3

early clock

0

1

2

3

4 6

7

8

9

10

5 11

12

13

14

15

16

17 19

20

21

22

23

18 24

25

26

27

28

29

30 32

33

34

35

36

31 37

38

39

40

41

42

43 45

46

47

48

49

44 50

51

52

53

54

55

56 58

59

60

61

62

57 63

64

65

66

67

68

69 71

72

73

74

75

70 76

77

7879

8081

82 84

85

86

87

88

83 89

90

Dependence on common virtual variables cancels out at the timing testICCAD '07 Tutorial Chandu Visweswariah

Page 21: EE241 - Spring 2013bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s13/Lectures/Lectu… · Additional pessimism: Clock tree view LF1 LF2 LF3 Comb. CF Comb. Comb. FP1 and FP2 FP3 ICCAD

21

41

Next Lecture

Latches and flip-flops