EE241 - Spring 2011 (bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s11/...)


    EE241 - Spring 2011
    Advanced Digital Integrated Circuits

    Lecture 22: Adders

    Announcements
    Homework #4 due on Monday
    Quiz #4 on Monday
    Final exam next Wednesday! 80 minutes, in class
    Project reports due Wednesday, May 3, noon (6 pages, double column)
    Project presentations on Wednesday, May 4, at 2pm in BWRC (20 min + 5 min Q&A)

    Outline
    Last lecture: Domino logic
    This lecture: Other dynamic styles; Digital arithmetic
    Reading: Selected publications

    Other Dynamic Logic Styles

    Self-Resetting Domino
    Signals exist as pulses, not levels
    Used in Pentium 4 (130nm generation)

    Pulsed Static CMOS
    RH – Reset high; RL – Reset low
    Fast pull-up, fast pull-down
    Chen, Ditlow, US Pat. 5,495,188, Feb. 1996.

    Sense-Amplifying Logic
    Matsui, JSSC 12/94

    SA-F/F
    [Figure: SA-F/F operation, falling edge and rising edge]

    Dynamic Logic with SA-F/F

    Example

    4-Bit Adder

    20-Bit Carry-Skip Adder

    Pentium 4 (Prescott) 7GHz Path
    Deleganes, ISSCC’04

    Sense Amplifier
    Can build in logic if needed

    Timing

    Carry-Skip Adder

    Adders

    Arithmetic Circuits
    Chapter 11, Rabaey, 2nd ed.
    Selected journal publications
    Books:
    Ercegovac and Lang, “Digital Arithmetic,” Elsevier 2004
    High-Speed VLSI Arithmetic Units: Adders and Multipliers, by V. Oklobdzija, in Chandrakasan et al.

    Adders
    EE141: Ripple carry & implementation; Carry bypass (skip); Carry select; Carry lookahead (basic)
    EE241: Conditional sum; More carry lookahead

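    Conditional Sum Adders
    Per-bit sum and carry out, precomputed for carry-in 0 (superscript 0) and carry-in 1 (superscript 1):
    $s_i^0 = x_i \oplus y_i$, $s_i^1 = \overline{x_i \oplus y_i}$
    $c_{oi}^0 = x_i y_i$, $c_{oi}^1 = x_i + y_i$
    Sklansky, Trans on Comp 6/60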
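    The conditional-sum scheme can be sketched behaviorally in Python. This is a minimal model, assuming unsigned n-bit integers with n a power of two; the function name `conditional_sum_add` is hypothetical. Each block keeps one result per possible carry-in, and adjacent blocks are merged by 2-way multiplexers driven by the lower block's conditional carry, so n bits need log2 n merge levels.

```python
def conditional_sum_add(x, y, n, cin=0):
    """Conditional-sum addition of two n-bit unsigned ints (n a power of two).

    Every block stores {carry_in: (block_sum_bits, block_carry_out)}.
    Returns (sum, carry_out) for the requested carry-in.
    """
    blocks = []
    for i in range(n):
        xi, yi = (x >> i) & 1, (y >> i) & 1
        blocks.append({
            0: (xi ^ yi, xi & yi),        # s^0 = x XOR y,  c^0 = x AND y
            1: ((xi ^ yi) ^ 1, xi | yi),  # s^1 = x XNOR y, c^1 = x OR y
        })
    width = 1  # bits per block at the current merge level
    while len(blocks) > 1:
        merged = []
        for lo, hi in zip(blocks[0::2], blocks[1::2]):
            entry = {}
            for c in (0, 1):
                lo_sum, lo_cout = lo[c]
                # 2-way mux: the lower block's conditional carry picks the upper result
                hi_sum, hi_cout = hi[lo_cout]
                entry[c] = (lo_sum | (hi_sum << width), hi_cout)
            merged.append(entry)
        blocks = merged
        width *= 2
    return blocks[0][cin]


print(conditional_sum_add(0b1011, 0b0110, 4))  # (1, 1): 11 + 6 = 17
```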

    Conditional Sum Adders

    TG Conditional Sum
    [Figure: conditional cell and conditional sum adder built from 2-way MUXes]
    Rothermel, JSSC 89

    TG Conditional Sum
    Serial connection of transmission gates: chain length = 1 + log2 n
    [Figure: signal propagation]

    DPL Conditional Sum
    CLA: “Conditional carry select”

    DPL Conditional Sum
    Block Conditional Sums

    Carry-Lookahead Adders
    Adder trees: radix of a tree; minimum depth trees; sparse trees
    Logic manipulations: conventional vs. Ling; stack height limiting

    Lookahead Adder: Basic Idea
    [Figure: N-bit adder with inputs (A0, B0) to (AN-1, BN-1), propagate signals P0 to PN-1, carries Ci,0 to Ci,N-1, and sums S0 to SN-1]

    $C_{i,k+1} = C_{o,k} = f(A_k, B_k, C_{i,k}) = f(g_k, p_k, C_{i,k})$

    Propagate and Generate Signals
    Define 2 (or 3) new variables which ONLY depend on inputs $a_k$, $b_k$:
    Generate: $g_k = a_k b_k$
    Propagate: $p_k = a_k + b_k$ (could be XOR as well)
    (Delete: $d_k = \bar{a}_k \bar{b}_k$)

    $c_{out,k}(g_k, p_k) = g_k + p_k c_{in,k}$

    Can also derive expressions for $s$ and $c_{out}$ based on $d_k$ and $p_k$.

    $s_k = a_k \oplus b_k \oplus c_{in,k}$ (equal to $p_k \oplus c_{in,k}$ when $p_k$ is the XOR propagate)
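    A minimal Python sketch of the carry recurrence written with generate and propagate, assuming n-bit unsigned integer inputs; `ripple_add_pg` is a hypothetical name. It illustrates why the OR and XOR propagate definitions give the same carries: when $a_k = b_k = 1$, $g_k = 1$ already forces the carry, so the value of $p_k$ does not matter. The sum is formed as $a_k \oplus b_k \oplus c_{in,k}$ so that it is correct for either choice.

```python
def ripple_add_pg(a, b, n, cin=0, xor_propagate=False):
    """Ripple-carry addition expressed with per-bit generate/propagate.

    xor_propagate=False uses p_k = a_k OR b_k; True uses p_k = a_k XOR b_k.
    Returns (sum, carry_out).
    """
    s, c = 0, cin
    for k in range(n):
        ak, bk = (a >> k) & 1, (b >> k) & 1
        g = ak & bk                                    # generate
        p = (ak ^ bk) if xor_propagate else (ak | bk)  # propagate
        s |= (ak ^ bk ^ c) << k                        # s_k = a_k XOR b_k XOR c_in,k
        c = g | (p & c)                                # c_out,k = g_k + p_k c_in,k
    return s, c


print(ripple_add_pg(0b1011, 0b0110, 4), ripple_add_pg(0b1011, 0b0110, 4, xor_propagate=True))
# (1, 1) (1, 1)
```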

    Lookahead Adder
    Lookahead Equations
    Position k: $c_k = g_k + p_k c_{k-1}$
    Position k + 1: $c_{k+1} = g_{k+1} + p_{k+1} c_k = g_{k+1} + p_{k+1} g_k + p_{k+1} p_k c_{k-1}$
    Carry exists if it is:
    - generated in stage k + 1,
    - generated in stage k and propagated through stage k + 1, or
    - the incoming carry $c_{k-1}$, propagated through both stages k and k + 1.

    Lookahead Adder
    • Unrolling of the carry recurrence can be continued
    • If unrolled to level k, the result is a two-level AND-OR structure
    • AND fan-in = k + 1, OR fan-in = k + 1
    • k + 1 transistors in the MOS stack
    • This limits k to 2 – 4
    • Later referred to as the radix of an adder
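    The unrolling can be made concrete by listing the product terms of the carry out of an m-bit group. This sketch assumes that "level k" means an expression spanning k bit positions, which reproduces the k + 1 fan-in stated on the slide; `lookahead_terms` and `eval_terms` are hypothetical helper names.

```python
def lookahead_terms(m):
    """Product terms of the carry out of an m-bit group (bit m-1 most significant).

    Each term is a list of literal names; the carry is the OR of all terms.
    """
    terms = []
    for j in range(m - 1, -1, -1):
        # g_j generated at position j and propagated through every higher position
        terms.append([f"p{i}" for i in range(m - 1, j, -1)] + [f"g{j}"])
    # carry-in propagated through the whole group
    terms.append([f"p{i}" for i in range(m - 1, -1, -1)] + ["cin"])
    return terms


def eval_terms(terms, values):
    """Evaluate the two-level AND-OR expression for a dict of 0/1 literal values."""
    return int(any(all(values[lit] for lit in term) for term in terms))


terms = lookahead_terms(2)
print(" + ".join("*".join(t) for t in terms))  # g1 + p1*g0 + p1*p0*cin
print(len(terms), max(len(t) for t in terms))  # 3 3  (OR fan-in, widest AND fan-in)
```

    Spanning two positions reproduces the position k + 1 expansion above; every extra position adds one OR input and one transistor to the widest AND stack, which is what limits the radix in practice.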

    Carry Lookahead Trees
    $C_{o,0} = G_0 + P_0 C_{i,0}$
    $C_{o,1} = G_1 + P_1 G_0 + P_1 P_0 C_{i,0}$
    $C_{o,2} = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_{i,0}$
    $\quad = (G_2 + P_2 G_1) + P_2 P_1 (G_0 + P_0 C_{i,0}) = G_{2:1} + P_{2:1} C_{o,0}$

    Can continue building the tree hierarchically

    Tree Adders
    $G = g_m + p_m g_l$, $P = p_m p_l$ (m – more significant, l – less significant)
    $(G, P) = (g_m, p_m) \circ (g_l, p_l) = (g_m + p_m g_l, p_m p_l)$
    Start from the input P, G, and continue up the tree: 2-bit groups, then 4-bit groups, …
    Kogge, Stone, Trans on Comp, ’73; radix 2
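    A small Python model of the $\circ$ operator, assuming per-bit (g, p) pairs as 0/1 integers with XOR propagate; `pg_combine` and `group_pg` are hypothetical names. Because the operator is associative, groups can be merged in any tree shape, which is what allows the carry to be built hierarchically, e.g. $C_{o,2} = G_{2:1} + P_{2:1} C_{o,0}$.

```python
def pg_combine(more, less):
    """(g_m, p_m) o (g_l, p_l) = (g_m + p_m g_l, p_m p_l)."""
    g_m, p_m = more
    g_l, p_l = less
    return (g_m | (p_m & g_l), p_m & p_l)


def group_pg(a, b, hi, lo):
    """Group (G, P) of bits hi..lo of a + b, merging 1-bit pairs from lo upward."""
    acc = None
    for i in range(lo, hi + 1):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        bit = (ai & bi, ai ^ bi)  # (g_i, p_i)
        acc = bit if acc is None else pg_combine(bit, acc)
    return acc


print(group_pg(0b0110, 0b0011, 2, 0))  # (1, 0): bits 2..0 of 6 + 3 generate a carry
```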

    Adder Structure
    Carry tree and sum precompute operate in parallel
    Sum select – selects the correct precomputed sum based on the final carry
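    The carry tree / sum precompute / sum select split can be sketched as below. This assumes a 16-bit adder with one carry per 4-bit block; the block size and the name `sum_select_add` are illustrative assumptions, not taken from the slides. The per-block sums for carry-in 0 and 1 use ordinary integer addition as a stand-in for the local sum logic, and the carry tree is modeled as a serial scan over block (G, P) pairs rather than an actual prefix tree.

```python
def sum_select_add(a, b, n=16, block=4, cin=0):
    """Sketch: one carry per `block` bits; each block's sum precomputed for both carry-ins."""
    mask = (1 << block) - 1
    nblocks = n // block
    pre, gp = [], []
    for j in range(nblocks):
        x, y = (a >> (j * block)) & mask, (b >> (j * block)) & mask
        # Sum precompute (independent of the carries): sums for carry-in 0 and 1
        pre.append(((x + y) & mask, (x + y + 1) & mask))
        # Block generate (carry out with carry-in 0) and propagate (carry-in passes through)
        gp.append((int(x + y > mask), int(x + y == mask)))
    # Stand-in for the carry tree: carry into each block from block (G, P)
    carries = [cin]
    for G, P in gp:
        carries.append(G | (P & carries[-1]))
    # Sum select: the final carry into each block picks one precomputed sum
    s = 0
    for j in range(nblocks):
        s |= pre[j][carries[j]] << (j * block)
    return s, carries[nblocks]


print(sum_select_add(0x0FFF, 0x0001))  # (4096, 0)
```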

    Adder Optimization
    If given:
    Input capacitance
    Overall fanout (loading capacitance)
    Wiring structure
    Adder topology
    Optimization can be performed to:
    Minimize the delay subject to a power constraint
    Minimize the power for a given delay constraint

    Design Considerations for CLA Adders
    Wire capacitance is determined by the microarchitecture
    Carry signals cross a certain number of bit slices
    The adder topology determines the wire capacitance, which is only a weak function of gate sizing
    The capacitance of wires depends on the tree topology and the wiring/shielding methodology
    [Figure: 64-bit datapath floorplan with bit slices 0 to 63; adder stages 1, 2 and 3 separated by wiring, then sum select; shifter, multiplexers and loopback bus; inputs from register files / cache / bypass, outputs to register files / cache]

    Specifying the Output Capacitance
    Fanout is dictated by the architecture
    In Itanium, each IEU drives 6 other IEUs, the register files and the cache, through a long bus
    Thus the fanout is larger than 15-20, but depends on the ratio of the IEU input capacitance to the bus capacitance
    The bus is driven through a buffer, thus reducing the adder fanout to close to 1.

    Specifying the Input Capacitance
    Larger Cin:
    Less impact of internal wires
    Less fanout (less impact of the bus)
    Faster adder
    Power grows linearly with Cin
    Smaller Cin:
    Larger impact of internal wires
    Larger fanout
    Slower, lower-power adder
    Optimum tradeoff:
    For the desired dE/dD (for both the adder and the 6 IEUs), find the optimal Cg/Cw
    For example, for dE/dD = 2, Cg/Cw = 2.5-3

    Carry Tree Considerations
    Number of signals merging at each stage (radix)
    Uniform vs. non-uniform
    Number of logic levels
    Full vs. sparse trees

    Tree Adders: Kogge-Stone
    [Figure: 16-bit radix-2 Kogge-Stone tree; inputs (A0, B0) to (A15, B15), sums S0 to S15]
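    A behavioral Python sketch of the radix-2 Kogge-Stone carry tree, assuming n-bit unsigned inputs with n a power of two; `kogge_stone_add` is a hypothetical name. At the level with span d, every position i ≥ d merges with the group ending d positions lower, so all prefixes $G_{i:0}$ are ready after log2 n levels.

```python
def kogge_stone_add(a, b, n=16, cin=0):
    """Radix-2 Kogge-Stone parallel-prefix adder. Returns (sum, carry_out, levels)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    G, P = g[:], p[:]
    G[0] = g[0] | (p[0] & cin)  # fold the carry-in into bit 0's generate
    d, levels = 1, 0
    while d < n:
        # Every position i >= d merges with the group ending d positions lower
        G, P = (
            [G[i] | (P[i] & G[i - d]) if i >= d else G[i] for i in range(n)],
            [P[i] & P[i - d] if i >= d else P[i] for i in range(n)],
        )
        d *= 2
        levels += 1
    carries = [cin] + G[:-1]  # carry into bit i is G_{i-1:0}
    s = sum((p[i] ^ carries[i]) << i for i in range(n))
    return s, G[-1], levels


print(kogge_stone_add(0x1234, 0x0FFF))  # (8755, 0, 4)
```

    For n = 16 the tree uses 15 + 14 + 12 + 8 = 49 $\circ$ cells spread over 4 levels.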

    Tree Adders: Other Trees
    Ladner-Fischer
    [Figure: 16-bit Ladner-Fischer tree; inputs (A0, B0) to (A15, B15), sums S0 to S15]
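    The Ladner-Fischer figure itself is not recoverable from this transcript. As a hedged stand-in, the sketch below builds a Sklansky-style divide-and-conquer prefix tree, closely related to (but not necessarily identical with) the tree on the slide, to show the usual contrast with Kogge-Stone: fewer cells at the same depth, but higher fanout. `sklansky_prefix` and `sklansky_add` are hypothetical names, and fanout is counted only as the number of cells reading each less-significant operand.

```python
def sklansky_prefix(g, p):
    """Divide-and-conquer prefix over per-bit (g, p) lists (LSB first, length a power of 2).

    Returns (G, nodes, max_fanout), where G[i] = G_{i:0}.
    """
    n = len(g)
    G, P = list(g), list(p)
    nodes, max_fanout, l = 0, 0, 0
    while (1 << l) < n:
        fanout = {}
        for i in range(n):
            if (i >> l) & 1:                  # i is in the upper half of its 2^(l+1) block
                j = ((i >> l) << l) - 1       # top bit of the lower half (unchanged this level)
                G[i] = G[i] | (P[i] & G[j])
                P[i] = P[i] & P[j]
                nodes += 1
                fanout[j] = fanout.get(j, 0) + 1
        max_fanout = max(max_fanout, max(fanout.values()))
        l += 1
    return G, nodes, max_fanout


def sklansky_add(a, b, n=16):
    """Carry-in 0. Returns (sum, carry_out, nodes, max_fanout)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    G, nodes, fanout = sklansky_prefix(g, p)
    carries = [0] + G[:-1]
    s = sum((p[i] ^ carries[i]) << i for i in range(n))
    return s, G[-1], nodes, fanout


print(sklansky_add(0xFFFF, 0x0001))  # (0, 1, 32, 8)
```

    For n = 16 this tree needs 32 cells in 4 levels, versus 49 cells for the radix-2 Kogge-Stone tree, but one cell drives 8 others in the last level.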

    Tree Adders: Radix 4
    [Figure: 16-bit radix-4 Kogge-Stone tree; inputs (a0, b0) to (a15, b15), sums S0 to S15]
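    A radix-4 variant of the same behavioral sketch, assuming carry-in 0 and n a power of four; `kogge_stone_radix4_add` is a hypothetical name. Each cell merges up to four groups, so a 16-bit adder needs log4 16 = 2 levels instead of 4, at the cost of wider cells with the four-term AND-OR discussed for lookahead.

```python
def kogge_stone_radix4_add(a, b, n=16):
    """Radix-4 Kogge-Stone sketch (carry-in 0). Returns (sum, carry_out, levels)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    G, P = g[:], p[:]
    d, levels = 1, 0
    while d < n:
        newG, newP = G[:], P[:]
        for i in range(n):
            # One radix-4 cell: merge the groups ending at i, i-d, i-2d, i-3d
            # (most significant first): G = g3 + p3 g2 + p3 p2 g1 + p3 p2 p1 g0
            for k in (1, 2, 3):
                j = i - k * d
                if j < 0:
                    break
                newG[i] = newG[i] | (newP[i] & G[j])
                newP[i] = newP[i] & P[j]
        G, P = newG, newP
        d *= 4
        levels += 1
    carries = [0] + G[:-1]  # carry into bit i is G_{i-1:0}
    s = sum((p[i] ^ carries[i]) << i for i in range(n))
    return s, G[-1], levels


print(kogge_stone_radix4_add(0x1234, 0x0FFF))  # (8755, 0, 2)
```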

    Ling Adder
    CLA:
    $g_i = a_i b_i$, $p_i = a_i \oplus b_i$
    $G_{i:0} = g_i + p_i G_{i-1:0}$
    $S_i = a_i \oplus b_i \oplus G_{i-1:0}$
    Ling's equations:
    $g_i = a_i b_i$, $t_i = a_i + b_i$
    $H_{i:0} = g_i + t_{i-1} H_{i-1:0}$
    $S_i = (t_i \oplus H_{i:0}) + g_i t_{i-1} H_{i-1:0}$
    Ling, IBM J. Res. Dev, 5/81
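    Ling's equations can be checked behaviorally. A minimal sketch, assuming carry-in 0 and n-bit unsigned inputs; `ling_add` is a hypothetical name. The carry out is recovered from $G_{i:0} = t_i H_{i:0}$, which follows from $H_{i:0} = g_i + G_{i-1:0}$ and $g_i t_i = g_i$.

```python
def ling_add(a, b, n):
    """Ling adder sketch (carry-in 0). Returns (sum, carry_out)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    t = [((a >> i) | (b >> i)) & 1 for i in range(n)]
    H = []
    for i in range(n):
        # H_{i:0} = g_i + t_{i-1} H_{i-1:0}, with H_{0:0} = g_0
        H.append(g[i] | (t[i - 1] & H[i - 1]) if i > 0 else g[i])
    s = 0
    for i in range(n):
        # S_i = (t_i XOR H_{i:0}) + g_i t_{i-1} H_{i-1:0}
        extra = g[i] & t[i - 1] & H[i - 1] if i > 0 else 0
        s |= ((t[i] ^ H[i]) | extra) << i
    cout = t[n - 1] & H[n - 1]  # G_{n-1:0} = t_{n-1} H_{n-1:0}
    return s, cout


print(ling_add(0b101101, 0b011011, 6))  # (8, 1): 45 + 27 = 72
```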

    Ling Adder
    Conventional radix-4:
    $G_{3:0} = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0$
    Ling's radix-4:
    $H_{3:0} = g_3 + t_2 g_2 + t_2 t_1 g_1 + t_2 t_1 t_0 g_0 = g_3 + g_2 + t_2 g_1 + t_2 t_1 g_0$ (since $g_i t_i = g_i$)
    Reduces the stack height (or width)
    Reduces input loading
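    The two radix-4 expressions can be compared by listing their product terms. This sketch assumes 4-bit groups with XOR propagate in the conventional form; the term lists and helper names are hypothetical. It checks that both give the group carry (via $G_{3:0} = t_3 H_{3:0}$) and that the widest Ling term has 3 literals instead of 4, i.e. one fewer transistor in the series stack.

```python
# Product terms of the 4-bit group signals (bit 3 most significant)
G30_CONVENTIONAL = [["g3"], ["p3", "g2"], ["p3", "p2", "g1"], ["p3", "p2", "p1", "g0"]]
H30_LING = [["g3"], ["g2"], ["t2", "g1"], ["t2", "t1", "g0"]]


def eval_sop(terms, lit):
    """OR of ANDs over a dict of 0/1 literal values."""
    return int(any(all(lit[x] for x in term) for term in terms))


def literals(a, b):
    """Per-bit generate, XOR propagate and OR transmit for 4-bit a and b."""
    lit = {}
    for i in range(4):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        lit[f"g{i}"] = ai & bi
        lit[f"p{i}"] = ai ^ bi
        lit[f"t{i}"] = ai | bi
    return lit


# Widest product term = number of series transistors in the stack
print(max(map(len, G30_CONVENTIONAL)), max(map(len, H30_LING)))  # 4 3
```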

    Ling vs. CLA
    [Figure: CK-clocked gate schematics of the conventional G3 and Ling's H3, with inputs a0 to a3 and b0 to b3]

    Ling vs. CLA: Sum Pre-Computation
    Conventional CLA:
    $S_i^0 = a_i \oplus b_i$, $S_i^1 = \overline{a_i \oplus b_i}$
    Ling's:
    $S_i^0 = a_i \oplus b_i$, $S_i^1 = a_i \oplus b_i \oplus (a_{i-1} + b_{i-1})$
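    A sketch of both pre-computation schemes followed by the final select, assuming carry-in 0; the function names are hypothetical. The conventional sums are selected by the true carry $G_{i-1:0}$ and Ling's by $H_{i-1:0}$. When $H_{i-1:0} = 1$ the actual carry equals $t_{i-1} = a_{i-1} + b_{i-1}$, which is why Ling's $S_i^1$ carries the extra term.

```python
def precomputed_sums(a, b, n):
    """Per-bit (S^0, S^1) pairs for the conventional CLA and for Ling's adder."""
    conv, ling = [], []
    for i in range(n):
        p = ((a >> i) ^ (b >> i)) & 1
        t_prev = (((a >> (i - 1)) | (b >> (i - 1))) & 1) if i > 0 else 0
        conv.append((p, p ^ 1))     # S^0 = a XOR b, S^1 = a XNOR b
        ling.append((p, p ^ t_prev))  # S^1 = a_i XOR b_i XOR (a_{i-1} + b_{i-1})
    return conv, ling


def select_sums(a, b, n):
    """Return (conventional sum, Ling sum) after the sum-select stage (carry-in 0)."""
    conv, ling = precomputed_sums(a, b, n)
    s_conv = s_ling = 0
    G = H = 0   # G_{i-1:0} and H_{i-1:0}; both 0 below bit 0
    t_prev = 0  # t_{i-1}; 0 below bit 0
    for i in range(n):
        s_conv |= conv[i][G] << i  # select on the true carry G_{i-1:0}
        s_ling |= ling[i][H] << i  # select on the Ling pseudo-carry H_{i-1:0}
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g, t = ai & bi, ai | bi
        G = g | (t & G)            # G_{i:0} = g_i + t_i G_{i-1:0}
        H = g | (t_prev & H)       # H_{i:0} = g_i + t_{i-1} H_{i-1:0}
        t_prev = t
    return s_conv, s_ling


print(select_sums(0b101101, 0b011011, 6))  # (8, 8)
```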

    Ling vs. CLA (64 bit)
    [Figure: energy (pJ) vs. delay (FO4) of 64-bit radix-2 Ling, radix-4 Ling, radix-2 CLA and radix-4 CLA adders, with a 0.5 FO4 annotation]

    Ling vs. CLA
    Tradeoff between the first carry and the sum circuit complexity
    Later carry stages are unchanged from the conventional CLA
    Reduced input loading and a smaller stack speed up the carry
    The sum gets more complex
    With tight power constraints, Ling is slower than CLA

    Next Lecture
    Finish adders
    Wrap-up