EE241 - Spring 2011, bwrcs.eecs.berkeley.edu/Classes/icdesign/ee241_s11/... Carry Tree Considerations...


  • 1

    EE241 - Spring 2011: Advanced Digital Integrated Circuits

    Lecture 23: Wrap-up

    Announcements
    - Homework #4 due today
    - Quiz #4 today
    - Final exam on Wednesday! 80 minutes, in class
    - Project reports due next Wednesday, May 4, noon (6 pages, double column)
    - Project presentations next Wednesday, May 4, at 2pm in BWRC (20 min + 5 min Q&A)

  • 2

    Outline
    - Last lecture: other dynamic logic styles; adders: conditional sum and carry-lookahead
    - This lecture: finish adders; perspective
    - Reading: selected publications

    Adders

  • 3

    Carry Tree Considerations
    - Number of signals merging at each stage (radix)
    - Uniform vs. non-uniform
    - Number of logic levels
    - Full vs. sparse trees
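    The radix and the number of logic levels are tied together: a full, uniform tree in which every node merges `radix` groups needs ceil(log_radix N) levels for N bits. A minimal sketch under that assumption (the helper name `prefix_levels` is ours, purely illustrative):

```python
def prefix_levels(n_bits: int, radix: int) -> int:
    """Logic levels of a full, uniform prefix tree where each node merges `radix` groups."""
    levels, span = 0, 1
    while span < n_bits:
        span *= radix  # each level multiplies the bit span covered by one node by the radix
        levels += 1
    return levels


for n in (16, 32, 64):
    print(n, "bits:", {r: prefix_levels(n, r) for r in (2, 4)})
```

    For 16 bits this gives 4 radix-2 levels vs. 2 radix-4 levels, matching the two 16-bit Kogge-Stone trees in these slides; fewer levels come at the cost of more inputs per node.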

    Tree Adders: Kogge-Stone

    [Figure: 16-bit radix-2 Kogge-Stone tree; inputs (A0, B0) to (A15, B15), sum outputs S0 to S15]
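    A bit-level sketch of the radix-2 Kogge-Stone tree, assuming generate g_i = a_i b_i, propagate p_i = a_i xor b_i and no carry-in. At distance d = 1, 2, 4, 8 every column i >= d merges with the group ending d columns below, so after log2(N) levels each column holds G_{i:0}. The function name is hypothetical and models logic only, not circuits.

```python
def kogge_stone_add(a: int, b: int, n: int = 16) -> tuple[int, int]:
    """Bit-level model of an n-bit radix-2 Kogge-Stone adder; returns (sum, carry_out)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]    # g_i = a_i b_i
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # p_i = a_i xor b_i
    G, P = g[:], p[:]
    d = 1
    while d < n:
        # Every column i >= d merges with the group that ends d columns below it.
        # Both lists are built from the old G, P before either name is rebound.
        G, P = (
            [G[i] | (P[i] & G[i - d]) if i >= d else G[i] for i in range(n)],
            [P[i] & P[i - d] if i >= d else P[i] for i in range(n)],
        )
        d *= 2
    carries = [0] + G[:-1]  # carry into bit i is G_{i-1:0}; no carry-in assumed
    s = sum((p[i] ^ carries[i]) << i for i in range(n))
    return s, G[n - 1]


print(kogge_stone_add(0xFFFF, 1))  # (0, 1)
print(all(kogge_stone_add(a, b, 8) == ((a + b) & 0xFF, (a + b) >> 8)
          for a in range(256) for b in range(256)))  # True
```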

  • 4

    Tree Adders: Other Trees

    Ladner-Fischer

    [Figure: 16-bit Ladner-Fischer tree; inputs (A0, B0) to (A15, B15), sum outputs S0 to S15]
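    A sketch of the Ladner-Fischer tree, assuming the minimum-depth divide-and-conquer form (also called Sklansky; this is how Knowles's naming uses "Ladner-Fischer"). At distance d, every column in the upper half of a 2d-bit block reads the top column of the lower half. Depth is still log2(N), with fewer cells than Kogge-Stone, but the source column drives d cells, so fanout doubles at each level. The function name is hypothetical.

```python
def ladner_fischer_add(a: int, b: int, n: int = 16) -> tuple[int, int]:
    """Bit-level model of a minimum-depth divide-and-conquer prefix adder; returns (sum, carry_out)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    G, P = g[:], p[:]
    d = 1
    while d < n:
        for i in range(n):
            if i & d:  # column lies in the upper half of a 2d-bit block
                j = (i & ~(2 * d - 1)) + d - 1  # top column of the lower half (drives d cells)
                # j has bit d clear, so it is never overwritten at this level
                G[i] = G[i] | (P[i] & G[j])
                P[i] = P[i] & P[j]
        d *= 2
    carries = [0] + G[:-1]  # carry into bit i is G_{i-1:0}
    s = sum((p[i] ^ carries[i]) << i for i in range(n))
    return s, G[n - 1]


print(all(ladner_fischer_add(a, b, 8) == ((a + b) & 0xFF, (a + b) >> 8)
          for a in range(256) for b in range(256)))  # True
```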

    Tree Adders: Radix 4

    [Figure: 16-bit radix-4 Kogge-Stone tree; inputs (a0, b0) to (a15, b15), sum outputs S0 to S15]
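    The radix-4 version differs only in how many groups a node merges: at distance d = 1, 4, 16, ... each column combines its own group with the groups ending at i-d, i-2d and i-3d, so 16 bits need two levels instead of four. A logic-level sketch with a hypothetical function name; it also returns the level count.

```python
def kogge_stone_radix4_add(a: int, b: int, n: int = 16) -> tuple[int, int, int]:
    """Bit-level radix-4 Kogge-Stone model; returns (sum, carry_out, levels)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    G, P = g[:], p[:]
    d, levels = 1, 0
    while d < n:
        new_G, new_P = [], []
        for i in range(n):
            gi, pi = G[i], P[i]
            # Merge the groups ending at i-d, i-2d, i-3d (when they exist) below column i.
            for k in (1, 2, 3):
                j = i - k * d
                if j < 0:
                    break
                gi, pi = gi | (pi & G[j]), pi & P[j]
            new_G.append(gi)
            new_P.append(pi)
        G, P = new_G, new_P
        d *= 4
        levels += 1
    carries = [0] + G[:-1]
    s = sum((p[i] ^ carries[i]) << i for i in range(n))
    return s, G[n - 1], levels


print(kogge_stone_radix4_add(0xFFFF, 1))  # (0, 1, 2)
```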

  • 5

    Ling Adder

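    CLA:

    $g_i = a_i b_i,\quad p_i = a_i \oplus b_i,\quad G_{i:0} = g_i + p_i G_{i-1:0},\quad S_i = a_i \oplus b_i \oplus G_{i-1:0}$

    Ling’s equations:

    $g_i = a_i b_i,\quad t_i = a_i + b_i,\quad H_{i:0} = g_i + t_{i-1} H_{i-1:0},\quad S_i = (t_i \oplus H_{i:0}) + g_i t_{i-1} H_{i-1:0}$

    Ling, IBM J. Res. Dev., 5/81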
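    One way to convince yourself that Ling's sum equation is right is to evaluate it serially and compare against ordinary addition. The sketch below assumes g_i = a_i b_i, t_i = a_i + b_i, the recurrence H_{i:0} = g_i + t_{i-1} H_{i-1:0}, and the identity G_{i:0} = t_i H_{i:0} for the carry out. It is a ripple evaluation used only as a check, not a tree implementation; `ling_add` is a hypothetical name.

```python
def ling_add(a: int, b: int, n: int = 16) -> tuple[int, int]:
    """Serial (ripple) evaluation of Ling's equations, used only to check them."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]    # g_i = a_i b_i
    t = [((a >> i) | (b >> i)) & 1 for i in range(n)]  # t_i = a_i + b_i
    H = []
    for i in range(n):
        below = t[i - 1] & H[i - 1] if i > 0 else 0  # t_{i-1} H_{i-1:0}
        H.append(g[i] | below)                       # H_{i:0} = g_i + t_{i-1} H_{i-1:0}
    s = 0
    for i in range(n):
        below = t[i - 1] & H[i - 1] if i > 0 else 0
        # S_i = (t_i xor H_{i:0}) + g_i t_{i-1} H_{i-1:0}
        s |= ((t[i] ^ H[i]) | (g[i] & below)) << i
    carry_out = t[n - 1] & H[n - 1]  # G_{n-1:0} = t_{n-1} H_{n-1:0}
    return s, carry_out


print(all(ling_add(a, b, 8) == ((a + b) & 0xFF, (a + b) >> 8)
          for a in range(256) for b in range(256)))  # True
```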

    Ling Adder

    Conventional radix-4:

    $G_{3:0} = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0$

    Ling’s radix-4:

    $H_{3:0} = g_3 + t_2 g_2 + t_2 t_1 g_1 + t_2 t_1 t_0 g_0 = g_3 + g_2 + t_2 g_1 + t_2 t_1 g_0$

    - Reduces the stack height (or width)
    - Reduces input loading
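    The simplified radix-4 Ling term relies on t_k g_k = g_k (a bit that generates also satisfies a_k + b_k). An exhaustive check over all 256 combinations of four bit pairs, under the definitions g = ab, t = a + b, p = a xor b, confirms both the simplification and the identity G_{3:0} = t_3 H_{3:0} that recovers the conventional carry. It also makes the stack reduction visible: the longest product term drops from four factors (p_3 p_2 p_1 g_0) to three (t_2 t_1 g_0). The function name is hypothetical.

```python
from itertools import product


def check_ling_radix4() -> bool:
    """Exhaustively compare Ling's radix-4 H_{3:0} with the conventional G_{3:0}."""
    for bits in product((0, 1), repeat=8):
        a, b = bits[:4], bits[4:]
        g = [a[i] & b[i] for i in range(4)]
        t = [a[i] | b[i] for i in range(4)]
        p = [a[i] ^ b[i] for i in range(4)]
        G = g[3] | p[3] & g[2] | p[3] & p[2] & g[1] | p[3] & p[2] & p[1] & g[0]
        H_full = g[3] | t[2] & g[2] | t[2] & t[1] & g[1] | t[2] & t[1] & t[0] & g[0]
        H = g[3] | g[2] | t[2] & g[1] | t[2] & t[1] & g[0]  # simplified form
        if H != H_full or G != (t[3] & H):
            return False
    return True


print(check_ling_radix4())  # True
```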

  • 6

    Ling vs. CLA

    [Figure: clocked (CK) gate schematics for conventional G3 and Ling’s H3; inputs a3, b3, a2, b2, a1, b1, a0, b0]

    Ling vs. CLA: Sum Pre-Computation

    Conventional CLA:

    $S_i^0 = a_i \oplus b_i,\quad S_i^1 = \overline{a_i \oplus b_i}$

    Ling’s:

    $S_i^0 = a_i \oplus b_i,\quad S_i^1 = a_i \oplus b_i \oplus (a_{i-1} + b_{i-1})$
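    Both styles precompute two candidate sums per bit and let the carry tree pick one: the CLA selects with G_{i-1:0}, Ling with H_{i-1:0}. Ling's S_i^1 must fold in t_{i-1} because G_{i-1:0} = t_{i-1} H_{i-1:0}. The sketch below assumes those selection signals and computes them by ripple for brevity (a tree would deliver the same values); `presum_add` is a hypothetical name.

```python
def presum_add(a: int, b: int, n: int = 16) -> tuple[int, int]:
    """Sums built by selection: CLA selects on G_{i-1:0}, Ling on H_{i-1:0}."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    t = [((a >> i) | (b >> i)) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    # Carries by ripple for brevity; a carry tree would deliver the same values.
    H, G = [], []
    for i in range(n):
        H.append(g[i] | (t[i - 1] & H[i - 1] if i else 0))  # H_{i:0}
        G.append(t[i] & H[i])                               # G_{i:0} = t_i H_{i:0}
    s_cla = s_ling = 0
    for i in range(n):
        s0 = p[i]                                 # S_i^0 = a_i xor b_i (both styles)
        s1_cla = 1 - p[i]                         # S_i^1 = xnor(a_i, b_i)
        s1_ling = p[i] ^ t[i - 1] if i else p[i]  # S_i^1 = a_i xor b_i xor (a_{i-1} + b_{i-1})
        sel_cla = G[i - 1] if i else 0            # no carry into bit 0
        sel_ling = H[i - 1] if i else 0
        s_cla |= (s1_cla if sel_cla else s0) << i
        s_ling |= (s1_ling if sel_ling else s0) << i
    return s_cla, s_ling


print(presum_add(0x7FFF, 0x0001))  # (32768, 32768)
```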

  • 7

    Ling vs. CLA (64 bit)

    [Figure: energy (pJ) vs. delay (FO4) for 64-bit radix-2 Ling, radix-4 Ling, radix-2 CLA, and radix-4 CLA adders]

    Ling vs. CLA
    - Tradeoff between the first carry stage and the sum circuit complexity
    - Later carry stages are unchanged from conventional CLA
    - Reduced input loading and a smaller stack speed up the carry
    - The sum gets more complex
    - With tight power constraints, Ling is slower than CLA

  • 8

    Sparse Trees
    - Not all the carries are calculated, only every 2nd or 4th
    - Reduced input capacitance
    - More complex sum
    - A sparseness of 2 doubles the sum-select loading, effectively shifting the fanout toward the back of the tree

    Sparse Trees

    [Figure: 16-bit radix-2 sparse Kogge-Stone tree with sparseness of 2 (Han-Carlson); inputs (a0, b0) to (a15, b15), sum outputs S0 to S15]
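    A logic-level sketch of the textbook Han-Carlson arrangement, assuming its usual form: odd columns first absorb their even neighbour, then run Kogge-Stone among themselves (distances 2, 4, 8), and a final level hands each even column the completed prefix of the odd column below. Only half the columns carry the dense part of the tree. The function name is hypothetical.

```python
def han_carlson_add(a: int, b: int, n: int = 16) -> tuple[int, int]:
    """Bit-level model of a radix-2 Han-Carlson (sparseness-2 Kogge-Stone) adder."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    G, P = g[:], p[:]
    # First level: each odd column absorbs its even neighbour (2-bit groups).
    for i in range(1, n, 2):
        G[i], P[i] = G[i] | (P[i] & G[i - 1]), P[i] & P[i - 1]
    # Kogge-Stone on the odd columns only, distances 2, 4, 8, ...
    d = 2
    while d < n:
        new_G, new_P = G[:], P[:]
        for i in range(1 + d, n, 2):  # odd columns that have an odd column d below
            new_G[i] = G[i] | (P[i] & G[i - d])
            new_P[i] = P[i] & P[i - d]
        G, P = new_G, new_P
        d *= 2
    # Last level: each even column combines with the complete prefix of the odd column below.
    for i in range(2, n, 2):
        G[i], P[i] = G[i] | (P[i] & G[i - 1]), P[i] & P[i - 1]
    carries = [0] + G[:-1]  # carry into bit i is G_{i-1:0}
    s = sum((p[i] ^ carries[i]) << i for i in range(n))
    return s, G[n - 1]


print(han_carlson_add(0xFFFF, 1))  # (0, 1)
```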

  • 9

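    Sparse Trees

    Precomputed sums for sparse Ling adder:
    - Even bit sums unchanged
    - Odd bit sums complex

    $S_i^0 = a_i \oplus b_i \oplus a_{i-1} b_{i-1}$

    $S_i^1 = a_i \oplus b_i \oplus \left(a_{i-1} b_{i-1} + (a_{i-1} + b_{i-1})(a_{i-2} + b_{i-2})\right)$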
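    Why the odd-bit sums grow: assuming a sparseness-2 tree that delivers H_{j:0} only for odd j, an odd bit i can only be selected by H_{i-2:0}. Expanding G_{i-1:0} = t_{i-1} H_{i-1:0} = g_{i-1} + t_{i-1} t_{i-2} H_{i-2:0} gives the two candidates S_i^0 (H_{i-2:0} = 0) and S_i^1 (H_{i-2:0} = 1) above. Even bits still see H_{i-1:0} and keep the dense-tree sums. The sketch below checks this against integer addition; the ripple H is only a stand-in for the tree, and the function name is hypothetical.

```python
def sparse_ling_add(a: int, b: int, n: int = 16) -> int:
    """Sum of a sparseness-2 Ling adder: the tree supplies H_{j:0} only for odd j."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]    # a_i b_i
    t = [((a >> i) | (b >> i)) & 1 for i in range(n)]  # a_i + b_i
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # a_i xor b_i
    # Ling carries by ripple for brevity; only the odd-indexed entries are read below.
    H = []
    for i in range(n):
        H.append(g[i] | (t[i - 1] & H[i - 1] if i else 0))
    s = 0
    for i in range(n):
        if i % 2 == 0:
            # Even bit: H_{i-1:0} exists (i-1 is odd), so the dense-tree sums are reused.
            sel = H[i - 1] if i else 0
            s0, s1 = p[i], p[i] ^ (t[i - 1] if i else 0)
        else:
            # Odd bit: only H_{i-2:0} exists, so the candidates absorb bit i-1.
            sel = H[i - 2] if i >= 2 else 0
            t2 = t[i - 2] if i >= 2 else 0
            s0 = p[i] ^ g[i - 1]
            s1 = p[i] ^ (g[i - 1] | (t[i - 1] & t2))
        s |= (s1 if sel else s0) << i
    return s


print(all(sparse_ling_add(a, b, 8) == (a + b) & 0xFF
          for a in range(256) for b in range(256)))  # True
```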

    Sparse Trees

    Ladner-Fischer

  • 10

    Sparse Trees

    [Figure: energy (pJ) vs. delay (FO4) in three panels, full trees, sparse-2 trees, and sparse-4 trees; curves 1 to 6 correspond to lateral fanouts 1-1-1-1-1-1, 2-2-2-2-2-1, 4-4-4-4-2-1, 8-8-8-4-2-1, 16-16-8-4-2-1, and 32-16-8-4-2-1]

    Zlatanovici, JSSC’09
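    The lateral-fanout labels span a family of 64-bit radix-2 trees whose endpoints are Kogge-Stone and Ladner-Fischer. The sketch below computes, for each level, the largest number of cells that read one node laterally, for those two endpoints only; the intermediate members are not modeled. Assumptions: "lateral fanout" counts cells at the same level reading a node, and the labels list levels from the last one to the first (as in Knowles's naming). `lateral_fanouts` is a hypothetical helper.

```python
def lateral_fanouts(n: int, tree: str) -> list[int]:
    """Largest lateral fanout at each level (first level first) of a radix-2 prefix tree."""
    fanouts = []
    d = 1
    while d < n:
        readers: dict[int, int] = {}  # source column -> cells at this level that read it
        for i in range(n):
            if tree == "kogge-stone":
                src = i - d if i >= d else None
            else:  # "ladner-fischer": upper half of each 2d block reads the lower half's top
                src = (i & ~(2 * d - 1)) + d - 1 if i & d else None
            if src is not None:
                readers[src] = readers.get(src, 0) + 1
        fanouts.append(max(readers.values()))
        d *= 2
    return fanouts


for tree in ("kogge-stone", "ladner-fischer"):
    print(tree, "-".join(str(f) for f in reversed(lateral_fanouts(64, tree))))
# kogge-stone 1-1-1-1-1-1
# ladner-fischer 32-16-8-4-2-1
```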

    Other Sparse Trees

    Mathew, VLSI’02

  • 11

    Intel’s 65nm 32b ALU

    Wijeratne, ISSCC’06

    - Radix-2 carry tree generates every fourth carry
    - 73% fewer carry-merge gates and 80% reduction in wiring complexity vs. dense parallel-prefix adders
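    A logic-level sketch of the "every fourth carry" idea, as a hypothetical model and not the published Intel circuit: each 4-bit block forms its group generate/propagate, a radix-2 Kogge-Stone tree runs over the blocks only (so it produces carries into bits 0, 4, 8, ...), and each block precomputes its 4-bit sum for carry-in 0 and 1 and lets the tree carry select. The tree here has a quarter of the columns of a dense one. Names are illustrative.

```python
def sparse4_add(a: int, b: int, n: int = 32) -> tuple[int, int]:
    """Sparse-4 model: a radix-2 tree over 4-bit blocks yields every 4th carry."""
    assert n % 4 == 0, "model assumes whole 4-bit blocks"
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    m = n // 4
    # 1. Block generate/propagate G_{4k+3:4k}, P_{4k+3:4k}.
    BG, BP = [], []
    for k in range(m):
        gk, pk = 0, 1
        for i in range(4 * k, 4 * k + 4):
            gk, pk = g[i] | (p[i] & gk), p[i] & pk
        BG.append(gk)
        BP.append(pk)
    # 2. Radix-2 Kogge-Stone over blocks: only carries into bits 0, 4, 8, ... are formed.
    d = 1
    while d < m:
        BG, BP = (
            [BG[k] | (BP[k] & BG[k - d]) if k >= d else BG[k] for k in range(m)],
            [BP[k] & BP[k - d] if k >= d else BP[k] for k in range(m)],
        )
        d *= 2
    block_cin = [0] + BG[:-1]
    # 3. Each block precomputes its sum for carry-in 0 and 1; the tree carry selects one.
    s = 0
    for k in range(m):
        options = []
        for cin in (0, 1):
            c, val = cin, 0
            for j in range(4):
                i = 4 * k + j
                val |= (p[i] ^ c) << j
                c = g[i] | (p[i] & c)
            options.append(val)
        s |= options[block_cin[k]] << (4 * k)
    return s, BG[m - 1]


print(sparse4_add(0xFFFFFFFF, 1))  # (0, 1)
```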

    Grouping Gates

    [Figure: energy (pJ) vs. delay (FO4) for radix-2 full, radix-4 full, radix-2 Ladner-Fischer full, radix-4 sparse-2, and radix-4 Ladner-Fischer sparse-4 trees; g = grouped sizing, f = flat sizing]

  • 12

    Sparse Trees

    [Figure: energy (pJ) vs. delay (FO4) in two panels, 1-1-1-1-1-1 trees and 32-16-8-4-2-1 trees, each comparing full, sparse-2, and sparse-4 versions]

    What is the fastest 64-b adder?

    32-Bit Adders

    Patil, ARITH’07

  • 13

    Hybrid Adders

    Dobberpuhl, JSSC 11/92, DEC Alpha 21064

    DEC Adder

    Combination:
    - 8-bit tapered, pre-discharged Manchester carry chains, with Cin = 0 and Cin = 1
    - 32-bit LSB carry-lookahead
    - 32-bit MSB conditional sum adder
    - Carry-select on most significant bits
    - Latch-based timing
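    A logic-level sketch of how such a hybrid fits together, assuming a 64-bit add split into halves: every 8-bit chain is evaluated for both Cin = 0 and Cin = 1, the upper 32 bits are assembled as a conditional sum for both possible carry-ins, and the carry out of the lower half selects one. The Manchester chain is modeled as a plain ripple (tapering and pre-discharge are circuit details not captured), and the lower-half carry-lookahead is replaced by integer addition as a stand-in. All names are hypothetical.

```python
MASK32 = (1 << 32) - 1


def chain8(a8: int, b8: int, cin: int) -> tuple[int, int]:
    """8-bit carry chain (a ripple stand-in for a Manchester chain): (sum byte, carry out)."""
    c, s = cin, 0
    for j in range(8):
        aj, bj = (a8 >> j) & 1, (b8 >> j) & 1
        p, g = aj ^ bj, aj & bj  # the kill case is neither p nor g
        s |= (p ^ c) << j
        c = g | (p & c)
    return s, c


def conditional_sum32(a: int, b: int) -> dict[int, tuple[int, int]]:
    """Both 32-bit results (for carry-in 0 and 1), assembled from dual 8-bit chains."""
    chains = [
        (chain8((a >> 8 * k) & 0xFF, (b >> 8 * k) & 0xFF, 0),
         chain8((a >> 8 * k) & 0xFF, (b >> 8 * k) & 0xFF, 1))
        for k in range(4)
    ]
    results = {}
    for cin in (0, 1):
        c, s = cin, 0
        for k, (res0, res1) in enumerate(chains):
            byte, c = res1 if c else res0  # select this byte's precomputed result
            s |= byte << (8 * k)
        results[cin] = (s, c)
    return results


def hybrid_add64(a: int, b: int) -> tuple[int, int]:
    low = (a & MASK32) + (b & MASK32)  # stand-in for the 32-bit LSB carry-lookahead
    low_sum, c32 = low & MASK32, low >> 32
    high_sum, carry_out = conditional_sum32(a >> 32, b >> 32)[c32]
    return (high_sum << 32) | low_sum, carry_out


print(hybrid_add64((1 << 64) - 1, 1))  # (0, 1)
```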

  • 14

    Another DEC Adder

    Propagate-kill cell

    Group propagate-kill

    Kowaleski, ISSCC’96
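    Carries can also be encoded with propagate and kill instead of generate. A sketch assuming the usual bit-level definitions p_i = a_i xor b_i and k_i = NOT a_i AND NOT b_i (not taken from the cited paper): groups merge as P = P_hi P_lo and K = K_hi + P_hi K_lo, and a group that neither propagates nor kills must generate, so carry out = (NOT P AND NOT K) + P cin. Function names are hypothetical.

```python
def pk_merge(hi: tuple[int, int], lo: tuple[int, int]) -> tuple[int, int]:
    """Merge (P, K) of a group with the adjacent, less significant group below it."""
    p_hi, k_hi = hi
    p_lo, k_lo = lo
    # The pair propagates only if both halves propagate; it kills if the upper half
    # kills, or if the upper half propagates a kill coming from the lower half.
    return p_hi & p_lo, k_hi | (p_hi & k_lo)


def carries_pk(a: int, b: int, n: int, cin: int = 0) -> list[int]:
    """Carries into bits 0..n computed from group propagate-kill signals."""
    cells = [(((a >> i) ^ (b >> i)) & 1, ~((a >> i) | (b >> i)) & 1) for i in range(n)]
    carries = [cin]
    group = cells[0]
    for i in range(n):
        if i > 0:
            group = pk_merge(cells[i], group)  # extend the group ending at bit i down to bit 0
        P, K = group
        generate = (1 - P) & (1 - K)  # neither propagate nor kill means generate
        carries.append(generate | (P & cin))
    return carries


print(carries_pk(0b0111, 0b0001, 4))  # [0, 1, 1, 1, 0]
```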

    This Class
    - Tried to put design choices in the perspective of technology
    - The design constraints have changed and will keep changing: cost, energy (power, leakage, …), performance
    - Stressed variability and power-performance tradeoffs
    - Did not cover multipliers, power regulation/distribution (class projects), or I/O

  • 15

    This Field
    - Moore’s law will end sometime during your (my?) career
    - 28nm in 2011 scales to 0.1nm by 2050 with 2-year cycles (or to 1nm with 4-year cycles)
    - Physics will stop CMOS somewhere around 5nm; we will see a different CMOS device beforehand
    - Economics will likely stop it earlier, and the nodes will be stretched out

    Don’t worry: there are plenty of problems that we don’t know how to solve today, and they will be around for a while!

    Even filling 10B/100B/1 trillion transistor chips with SRAM is not trivial!

    Technology Strategy / Roadmap

    [Figure: roadmap timeline, 2000 to 2030, with research and R&D phases for Plan A: Extending Si CMOS; Plan B: Subsystem Integration; Plan C: Post Si CMOS Options; Plan Q: Quantum Computing]

    T.C. Chen, Where Si-CMOS is going: Trendy Hype vs. Real Technology, ISSCC’06