# EE241 - Spring 2011 (bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s11/...)

Date posted: 18-Feb-2021


EE241 - Spring 2011: Advanced Digital Integrated Circuits

Lecture 22: Adders

Announcements

• Homework #4 due on Monday
• Quiz #4 on Monday
• Final exam next Wednesday!
  • 80 minutes, in class
• Project reports due Wednesday, May 3, noon
  • 6 pages, double column
• Project presentations on Wednesday, May 4, at 2pm in BWRC
  • 20 min + 5 min Q&A

Outline

• Last lecture
  • Domino logic
• This lecture
  • Other dynamic styles
  • Digital arithmetic
• Reading: Selected publications

Other Dynamic Logic Styles

Self-Resetting Domino

• Signals exist as pulses, not levels
• Used in Pentium 4 (130nm generation)

Pulsed Static CMOS

• RH: Reset high
• RL: Reset low

[Figure: fast pull-up and fast pull-down stages]

Chen, Ditlow, US Pat. 5,495,188, Feb. 1996.

Sense-Amplifying Logic

Matsui, JSSC 12/94

SA-F/F

[Figure: falling edge and rising edge]

Dynamic Logic with SA-F/F

Example

4-Bit Adder

20-Bit Carry-Skip Adder

Pentium 4 (Prescott) 7GHz Path

Deleganes, ISSCC’04

Sense Amplifier

• Can build in logic if needed

Timing

Carry-Skip Adder

Adders

Arithmetic Circuits

• Chapter 11, Rabaey, 2nd ed.
• Selected journal publications
• Books:
  • Ercegovac and Lang, “Digital Arithmetic,” Elsevier 2004
  • “High-Speed VLSI Arithmetic Units: Adders and Multipliers,” by V. Oklobdzija, in Chandrakasan et al.

Adders

• EE141
  • Ripple carry & implementation
  • Carry bypass (skip)
  • Carry select
  • Carry lookahead (basic)
• EE241
  • Conditional sum
  • More carry lookahead

Conditional Sum Adders

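Conditional sums and carries for bit i, for carry-in 0 and carry-in 1:

$$s_i^0 = x_i \oplus y_i \qquad s_i^1 = \overline{x_i \oplus y_i}$$

$$c_{o,i}^0 = x_i y_i \qquad c_{o,i}^1 = x_i + y_i$$

Sklansky, Trans on Comp, 6/60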
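The conditional-sum idea can be sketched in Python as follows. This is a minimal behavioral model, not the circuit from the slides: the function name `conditional_sum_add`, the dictionary-based block representation, and the restriction to bit widths that are powers of two are all assumptions made for illustration.

```python
def conditional_sum_add(x, y, n, cin=0):
    """Add two n-bit integers (n a power of two) with a conditional-sum scheme.

    Returns (sum modulo 2**n, carry_out).
    """
    # Per-bit conditional results: blocks[i][c] = (sum_bits, carry_out)
    # computed under the assumption that the carry into bit i equals c.
    blocks = []
    for i in range(n):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        s0, c0 = xi ^ yi, xi & yi          # carry-in = 0
        s1, c1 = 1 - (xi ^ yi), xi | yi    # carry-in = 1
        blocks.append({0: ([s0], c0), 1: ([s1], c1)})

    # Merge adjacent blocks; every level doubles the block width.
    while len(blocks) > 1:
        merged = []
        for lo, hi in zip(blocks[0::2], blocks[1::2]):
            new = {}
            for c in (0, 1):
                lo_bits, lo_carry = lo[c]
                # 2-way MUX: the low block's carry picks the high block's result.
                hi_bits, hi_carry = hi[lo_carry]
                new[c] = (lo_bits + hi_bits, hi_carry)
            merged.append(new)
        blocks = merged

    bits, cout = blocks[0][cin]
    total = sum(b << i for i, b in enumerate(bits))
    return total, cout
```

For example, `conditional_sum_add(200, 100, 8)` returns `(44, 1)`, since 300 = 256 + 44. The number of merge levels is log2(n), which is where the logarithmic depth of the scheme comes from.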

Conditional Sum Adders

TG Conditional Sum

[Figure panels: Conditional Cell; Conditional Sum Adder]

• 2-way MUXes

Rothermel, JSSC 89

TG Conditional Sum

• Serial connection of transmission gates
• Chain length = $1 + \log_2 n$

[Figure: signal propagation]

DPL Conditional Sum

• CLA
• “Conditional carry select”

DPL Conditional Sum

• Block Conditional Sums

Carry-Lookahead Adders

• Adder trees
  • Radix of a tree
  • Minimum depth trees
  • Sparse trees
• Logic manipulations
  • Conventional vs. Ling
  • Stack height limiting

Lookahead Adder: Basic Idea

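[Figure: bit slices with inputs (A0, B0), (A1, B1), ..., (AN-1, BN-1); propagate signals P0, P1, ..., PN-1; carries Ci,0, Ci,1, ..., Ci,N-1; sums S0, S1, ..., SN-1]

$$C_{i,k+1} = C_{o,k} = f(A_k, B_k, C_{i,k}) = g_k + p_k C_{i,k}$$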

Propagate and Generate Signals

• Define 2 (or 3) new variables which ONLY depend on inputs $a_k$, $b_k$:
  • Generate: $g_k = a_k b_k$
  • Propagate: $p_k = a_k + b_k$ (could be XOR as well)
  • (Delete: $d_k = \overline{a_k}\,\overline{b_k}$)

$$c_{out}(g_k, p_k) = g_k + p_k c_{in}$$

$$s(g_k, p_k) = a_k \oplus b_k \oplus c_{in}$$

• Can also derive expressions for $s$ and $c_{out}$ based on $d_k$ and $p_k$
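As a quick sanity check of the generate/propagate formulation, the sketch below evaluates the signals for one bit position and compares them with ordinary addition over all eight input combinations. The helper names `gpd`, `carry_out`, `sum_bit` and `matches_full_adder` are illustrative, and the OR form of propagate is assumed.

```python
from itertools import product

def gpd(a, b):
    """Return (generate, propagate, delete) for one bit position."""
    g = a & b
    p = a | b                # OR form; a ^ b gives the same carry
    d = (1 - a) & (1 - b)
    return g, p, d

def carry_out(a, b, cin):
    """c_out = g + p * c_in"""
    g, p, _ = gpd(a, b)
    return g | (p & cin)

def sum_bit(a, b, cin):
    """s = a xor b xor c_in"""
    return a ^ b ^ cin

def matches_full_adder():
    """True if the g/p form agrees with a + b + c_in for every input."""
    for a, b, cin in product((0, 1), repeat=3):
        total = a + b + cin
        if carry_out(a, b, cin) != total >> 1 or sum_bit(a, b, cin) != total & 1:
            return False
    return True
```

Note that exactly one of generate, delete, or "propagate only" (a XOR b) is active for any input pair, which is why two or three signals are enough to describe the carry behavior of a bit.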

Lookahead Adder

Lookahead Equations

• Position k:

$$c_k = g_k + p_k c_{k-1}$$

• Position k + 1:

$$c_{k+1} = g_{k+1} + p_{k+1} c_k = g_{k+1} + p_{k+1}(g_k + p_k c_{k-1}) = g_{k+1} + p_{k+1} g_k + p_{k+1} p_k c_{k-1}$$

• Carry exists if:
  • generated in stage k + 1
  • generated in stage k and propagated through k + 1
  • propagated through both k and k + 1

Lookahead Adder

• Unrolling of carry recurrence can be continued
• If unrolled to level k, the result is a two-level AND-OR structure
• AND fan-in = k + 1, OR fan-in = k + 1
• k + 1 transistors in the MOS stack
• Limits k to 2 to 4
• Later referred to as the radix of an adder
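The fan-in growth can be illustrated with a small sketch that unrolls the recurrence $c_j = g_j + p_j c_{j-1}$ symbolically. The function names and the string-based term representation are assumptions made for this example, and positions are numbered 1 to k with input carry `c0`.

```python
def unrolled_carry_terms(k):
    """Product terms of c_k after unrolling c_j = g_j + p_j c_{j-1} down to c0."""
    terms = []
    for j in range(k, 0, -1):
        # g_j must be propagated through positions j+1 .. k.
        props = [f"p{m}" for m in range(k, j, -1)]
        terms.append(props + [f"g{j}"])
    # The input carry must be propagated through every position.
    terms.append([f"p{m}" for m in range(k, 0, -1)] + ["c0"])
    return terms

def fan_in(k):
    """Return (largest AND fan-in, OR fan-in) of the two-level structure."""
    terms = unrolled_carry_terms(k)
    return max(len(t) for t in terms), len(terms)

def eval_terms(terms, values):
    """Evaluate the AND-OR structure for a dict of signal values."""
    return int(any(all(values[name] for name in t) for t in terms))
```

For k = 2 the terms are `g2`, `p2 g1` and `p2 p1 c0`, so `fan_in(2)` is `(3, 3)`; `fan_in(4)` is `(5, 5)`, which matches the k + 1 rule and shows why the stack height becomes the limiting factor.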

Carry Lookahead Trees

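$$C_{o,0} = G_0 + P_0 C_{i,0}$$

$$C_{o,1} = G_1 + P_1 G_0 + P_1 P_0 C_{i,0}$$

$$C_{o,2} = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_{i,0}$$

$$C_{o,2} = (G_2 + P_2 G_1) + (P_2 P_1)(G_0 + P_0 C_{i,0}) = G_{2:1} + P_{2:1} C_{o,0}$$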

Can continue building the tree hierarchically
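A brute-force check of the grouping step, as a hedged sketch: the flat expression for $C_{o,2}$ is compared against the hierarchical form built from the group signals $G_{2:1}$ and $P_{2:1}$. The function names are illustrative only.

```python
from itertools import product

def co2_flat(G, P, ci0):
    """C_o,2 written as a single two-level AND-OR expression."""
    return (G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0])
            | (P[2] & P[1] & P[0] & ci0))

def co2_grouped(G, P, ci0):
    """C_o,2 built hierarchically: bits 2:1 form a group on top of C_o,0."""
    co0 = G[0] | (P[0] & ci0)
    g21 = G[2] | (P[2] & G[1])    # group generate G_2:1
    p21 = P[2] & P[1]             # group propagate P_2:1
    return g21 | (p21 & co0)

def all_inputs_agree():
    """True if both forms match for every combination of G, P and C_i,0."""
    return all(
        co2_flat(v[0:3], v[3:6], v[6]) == co2_grouped(v[0:3], v[3:6], v[6])
        for v in product((0, 1), repeat=7)
    )
```

The check holds for all 128 combinations because the grouping is just the distributive law applied to the flat expression; the same step can be repeated on groups of groups.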

Tree Adders

$$P_G = p_m p_l$$

$$G_G = g_m + p_m g_l$$

• m: more significant
• l: less significant
• Start from the input P, G, and continue up the tree
• 2-bit groups, then 4-bit groups, …

$$(G_G, P_G) = (g_m, p_m) \bullet (g_l, p_l) = (g_m + p_m g_l,\; p_m p_l)$$

Kogge, Stone, Trans on Comp, ’73 (radix 2)
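The merge operator and a radix-2 Kogge-Stone prefix tree can be sketched behaviorally as below. This is an illustrative model under stated assumptions: the names `dot` and `kogge_stone_add` are hypothetical, the carry-in is fixed to 0, the OR form of propagate is used, and the bit width `n` is assumed to be a power of two.

```python
def dot(more, less):
    """(g_m, p_m) . (g_l, p_l) = (g_m + p_m g_l, p_m p_l)"""
    g_m, p_m = more
    g_l, p_l = less
    return g_m | (p_m & g_l), p_m & p_l

def kogge_stone_add(a, b, n=16):
    """Radix-2 Kogge-Stone addition of two n-bit integers, carry-in 0.

    Returns (sum modulo 2**n, carry_out).
    """
    # Bitwise (generate, propagate) pairs and half-sum bits.
    gp = [(((a >> i) & (b >> i)) & 1, ((a >> i) | (b >> i)) & 1)
          for i in range(n)]
    x = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]

    # log2(n) levels; at each level bit i merges with bit i - span.
    span = 1
    while span < n:
        gp = [dot(gp[i], gp[i - span]) if i >= span else gp[i]
              for i in range(n)]
        span *= 2

    # gp[i][0] is now G_{i:0}, the carry out of bit i.
    carries = [0] + [g for g, _ in gp]
    s = sum((x[i] ^ carries[i]) << i for i in range(n))
    return s, carries[n]
```

For instance, `kogge_stone_add(0xFFFF, 1)` returns `(0, 1)`. Because the operator is associative, other trees can group the same merges differently and still produce the same carries.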

Adder Structure

• Carry tree and sum precompute operate in parallel
• Sum select: selects the correct precomputed sum based on final carry

Adder Optimization

• If given:
  • Input capacitance
  • Overall fanout (loading capacitance)
  • Wiring structure
  • Adder topology
• Optimization can be performed to:
  • Minimize the delay subject to a power constraint
  • Minimize the power for a given delay constraint

Design Considerations for CLA Adders

• Wire capacitance is determined by the microarchitecture
  • Carry signals cross a certain number of bitslices
• The adder topology determines the wire capacitance
  • Weak function of gate sizing
• The capacitance of wires depends on the tree topology and wiring/shielding methodology

[Figure: datapath bit slices 0 to 63 with inputs from register files / cache / bypass; adder stages 1 to 3 with wiring between stages; sum select; shifter; multiplexers; loopback bus; outputs to register files / cache]

Specifying the Output Capacitance

• Fanout is dictated by the architecture
• In Itanium, each IEU drives 6 other IEUs, register files and the cache, through a long bus
• Thus the fanout is larger than 15-20, but depends on the ratio of the IEU input capacitance to the bus capacitance
• Bus is driven through a buffer, thus reducing the adder fanout to close to 1

Specifying the Input Capacitance

• Larger Cin:
  • Less impact of internal wires
  • Less fanout (less impact of the bus)
  • Faster adder
  • Power grows linearly with Cin
• Smaller Cin:
  • Larger impact of internal wires
  • Larger fanout
  • Slower, lower power adder
• Optimum tradeoff:
  • For desired dE/dD (for both adder and 6 IEUs) find optimal Cg/Cw
  • For example, dE/dD = 2, Cg/Cw = 2.5-3

Carry Tree Considerations

• Number of signals merging at each stage (radix)
  • Uniform vs. non-uniform
• Number of logic levels
  • Full vs. sparse trees

Tree Adders: Kogge-Stone

[Figure: 16-bit radix-2 Kogge-Stone tree; inputs (A0, B0) to (A15, B15), sum outputs S0 to S15]

Tree Adders: Other Trees

• Ladner-Fischer

[Figure: 16-bit Ladner-Fischer tree; inputs (A0, B0) to (A15, B15), sum outputs S0 to S15]

Tree Adders: Radix 4

[Figure: 16-bit radix-4 Kogge-Stone tree; inputs (a0, b0) to (a15, b15), sum outputs S0 to S15]

Ling Adder

CLA:

$$g_i = a_i b_i$$

$$p_i = a_i + b_i$$

$$G_{i:0} = g_i + p_i G_{i-1:0}$$

$$S_i = a_i \oplus b_i \oplus G_{i-1:0}$$

Ling’s equations:

$$g_i = a_i b_i$$

$$t_i = a_i + b_i$$

$$H_{i:0} = g_i + t_{i-1} H_{i-1:0}$$

$$S_i = (t_i \oplus H_{i:0}) + g_i t_{i-1} H_{i-1:0}$$

Ling, IBM J. Res. Dev, 5/81
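Ling's recurrence and sum equation can be exercised with a bit-serial behavioral sketch. The function name `ling_add` is illustrative, the carry-in is assumed to be 0 (so the pseudo-carry and transmit signal below bit 0 are taken as 0), and the final carry uses the standard identity $G_{i:0} = t_i H_{i:0}$, which is not written on the slide.

```python
def ling_add(a, b, n):
    """n-bit addition (carry-in 0) using Ling's pseudo-carry H.

    Returns (sum modulo 2**n, carry_out).
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    t = [((a >> i) | (b >> i)) & 1 for i in range(n)]

    s = 0
    h_prev, t_prev = 0, 0    # assumed boundary values below bit 0
    for i in range(n):
        # H_{i:0} = g_i + t_{i-1} H_{i-1:0}
        h = g[i] | (t_prev & h_prev)
        # S_i = (t_i xor H_{i:0}) + g_i t_{i-1} H_{i-1:0}
        s_i = (t[i] ^ h) | (g[i] & t_prev & h_prev)
        s |= s_i << i
        h_prev, t_prev = h, t[i]

    # Real carry recovered from the pseudo-carry: G_{n-1:0} = t_{n-1} H_{n-1:0}
    carry_out = t_prev & h_prev
    return s, carry_out
```

For example, `ling_add(7, 9, 4)` returns `(0, 1)`. The loop is serial only to keep the sketch short; in hardware the H terms are produced by a prefix tree, just as the G terms are in a conventional CLA.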

Ling Adder

Conventional radix-4:

$$G_{3:0} = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0$$

Ling’s radix-4:

$$H_{3:0} = g_3 + t_2 g_2 + t_2 t_1 g_1 + t_2 t_1 t_0 g_0 = g_3 + g_2 + t_2 g_1 + t_2 t_1 g_0$$

• Reduces the stack height (or width)
• Reduces input loading
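The radix-4 simplification can be checked exhaustively over all 4-bit operand pairs. In this sketch the function names are illustrative, propagate is assumed to be the OR form (so $p_i = t_i$), and the relation $G_{3:0} = t_3 H_{3:0}$ is the standard Ling identity rather than something stated on the slide.

```python
from itertools import product

def radix4_forms(a, b):
    """a, b: 4-tuples of bits, index 0 = LSB.

    Returns (G, H_long, H_short, t3) for the radix-4 group 3:0.
    """
    g = [x & y for x, y in zip(a, b)]
    t = [x | y for x, y in zip(a, b)]
    p = t    # assumption: OR-type propagate
    G = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
         | (p[3] & p[2] & p[1] & g[0]))
    H_long = (g[3] | (t[2] & g[2]) | (t[2] & t[1] & g[1])
              | (t[2] & t[1] & t[0] & g[0]))
    H_short = g[3] | g[2] | (t[2] & g[1]) | (t[2] & t[1] & g[0])
    return G, H_long, H_short, t[3]

def ling_radix4_holds():
    """True if both H forms agree and G = t3 * H for every input."""
    for bits in product((0, 1), repeat=8):
        G, H_long, H_short, t3 = radix4_forms(bits[:4], bits[4:])
        if H_long != H_short or G != (t3 & H_short):
            return False
    return True
```

The shortened form works because $t_i g_i = g_i$: whenever a bit generates, it also transmits. The longest product in H has three literals of `g`/`t` where G has four, which is the stack-height reduction noted above.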

Ling vs. CLA

[Figure: circuit schematics of conventional G3 and Ling’s H3, with clock CK and inputs a0 to a3, b0 to b3]

Ling vs. CLA: Sum Pre-Computation

Conventional CLA:

$$S_i^0 = a_i \oplus b_i$$

$$S_i^1 = \overline{a_i \oplus b_i}$$

Ling’s:

$$S_i^0 = a_i \oplus b_i$$

$$S_i^1 = a_i \oplus b_i \oplus (a_{i-1} + b_{i-1})$$
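A hedged sketch of how the two sum pre-computation schemes could be used: each bit precomputes $S_i^0$ and $S_i^1$, and a select signal picks one. It is assumed here that the conventional select is the carry $G_{i-1:0}$ and Ling's select is the pseudo-carry $H_{i-1:0}$; the function names, the zero carry-in, and the serial evaluation are illustrative choices.

```python
def bits(v, n):
    """Little-endian list of the n low bits of v."""
    return [(v >> i) & 1 for i in range(n)]

def ling_select_add(a, b, n):
    """Sum (mod 2**n) with S0/S1 selected by Ling's pseudo-carry H_{i-1:0}."""
    a_bits, b_bits = bits(a, n), bits(b, n)
    total, h_prev, t_prev = 0, 0, 0      # assumed boundary values, carry-in 0
    for i in range(n):
        x = a_bits[i] ^ b_bits[i]
        s0 = x                           # S_i^0 = a_i xor b_i
        s1 = x ^ t_prev                  # S_i^1 = a_i xor b_i xor (a_{i-1} + b_{i-1})
        total |= (s1 if h_prev else s0) << i
        g_i, t_i = a_bits[i] & b_bits[i], a_bits[i] | b_bits[i]
        h_prev = g_i | (t_prev & h_prev) # H_{i:0} = g_i + t_{i-1} H_{i-1:0}
        t_prev = t_i
    return total

def cla_select_add(a, b, n):
    """Sum (mod 2**n) with S0/S1 selected by the conventional carry G_{i-1:0}."""
    a_bits, b_bits = bits(a, n), bits(b, n)
    total, carry = 0, 0
    for i in range(n):
        x = a_bits[i] ^ b_bits[i]
        s0, s1 = x, 1 - x                # S_i^1 is the complement of S_i^0
        total |= (s1 if carry else s0) << i
        carry = (a_bits[i] & b_bits[i]) | ((a_bits[i] | b_bits[i]) & carry)
    return total
```

Both functions return the same sums. The difference is in the precompute: Ling's $S_i^1$ needs the neighboring bit's inputs as well, which is the added sum complexity that buys the simpler first carry stage.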

Ling vs. CLA (64 bit)

[Figure: energy [pJ] vs. delay [FO4] for 64-bit adders: Radix-2 Ling, Radix-4 Ling, Radix-2 CLA, Radix-4 CLA; annotation: 0.5 FO4]

Ling vs. CLA

• Tradeoff between the first carry and the sum circuit complexity
  • Later carry stages are unchanged from conventional CLA
  • Reducing the input loading and smaller stack speed up the carry
  • Sum gets more complex
• With tight power constraints Ling is slower than CLA

Next Lecture

• Finish adders
• Wrap-up