579.Trends.lect1.08

download 579.Trends.lect1.08

of 32

Transcript of 579.Trends.lect1.08

  • 7/29/2019 579.Trends.lect1.08

    1/32

    Lecture 1RAS 1

    Lecture 1

    Design and Technology Trends

    R. Saleh

    Dept. of ECE

    University of British Columbia

    [email protected]

  • 7/29/2019 579.Trends.lect1.08

    2/32

    Lecture 1RAS 2

    Recently Designed Chips

    Itanium chip (Intel), 2B tx, 700mm2 , 8 layer 65nm

    CMOS (4 processors) TILE64 Processor, 64-Core SoC with Mesh NoC

    Interconnect, 90nm CMOS

    153Mb-SRAM (Intel), 45nm, high-k metal-gate CMOS

    FPGAs recently fabricated in 45nm

    What are the major technology and design issues

    that are driving the IC industry?Lets start from the simple rules of MOS scaling

  • 7/29/2019 579.Trends.lect1.08

    3/32

    Lecture 1RAS 3

    MOS Transistor Scaling(1974 to present)

    Scaling factor s=0.7 per node (0.5x per 2 nodes)

    Metal pitch Technology Node

    set by 1/2 pitch

    (interconnect)

    Gate length

    (transistor)

    Poly width

  • 7/29/2019 579.Trends.lect1.08

    4/32

    Lecture 1RAS 4

    Ideal Technology Scaling (constant field)

    Quantity Before Scaling After Scaling

    Channel Length L L = L * sChannel Width W W = W * s

    Gate Oxide thickness tox tox = tox * s

    Junction depth xj xj = xj * s

    Power Supply Vdd Vdd = Vdd * s

    Threshold Voltage Vth Vth = Vth * s

    Doping Density, pn+

    NAND

    NA = NA / sND = ND / s

  • 7/29/2019 579.Trends.lect1.08

    5/32

    Lecture 1RAS 5

    Technology Nodes 1999-2019

    180nm 130nm 90nm 65nm 45nm 32nm 22nm 16nm

    1999 2001 2004 2007 2010 2013 2016 2019

    0.7x 0.7x

    0.5x

    N-1 N N+1

    Two year cycle between nodes until 2001, then 3 year cycle begins.

  • 7/29/2019 579.Trends.lect1.08

    6/32

    Lecture 1RAS 6

    Forecast Technology Parameters

    Year TechnologyNode(nm)

    PhysicalGate(nm)

    tox(nm)

    Dielec-tric K

    Vdd(V)

    Vth(V)

    Na(/cm

    3)

    Nd(/cm

    3)

    xj(nm)

    2001 130 90 3.0 3.7 1.2 0.34 1.0e16 1.0e19 67.5

    2004 90 53 2.4 3.0 1.1 0.32 1.4e16 1.4e19 46.7

    2007 65 32 1.7 2.5 0.9 0.29 2.0e16 2.0e19 33.8

    2010 45 22 1.5 2.0 0.8 0.29 2.9e16 2.9e19 23.4

    2013 32 16 1.4 1.9 0.7 0.25 4.0e16 4.0e19 16.6

    2016 22 11 1.3 1.7 0.6 0.22 5.9e16 5.9e19 11.4

  • 7/29/2019 579.Trends.lect1.08

    7/32

    Lecture 1RAS 7

    Where are we now?

    130nm and 90nm CMOS volume production

    Early production of 65nm, Leading-edge use of 45nm

    Scaling of gate is leading scaling of wire

    Scaling is driven by DIGITAL design needs

  • 7/29/2019 579.Trends.lect1.08

    8/32

    Lecture 1RAS 8

    Making Photolithograph Work

    Extensive use of OPC and PSM in 90nm and below:

  • 7/29/2019 579.Trends.lect1.08

    9/32

    Lecture 1RAS 9

    Deep Submicron Technology Generations

    Table 1: Time overlap of semiconductor generations

    Each generation spans ~17 yearswe are unlikely to be totally suprised

    Univerisity research Industry development Production

    95 96 97 98 99 00 01 02 03 04 05 06 07 08 09 10 11 12

    350nm

    1 2 3 4 5

    -2 -1 250nm

    1 2 3 4 5

    -4 -3 -2 -1 180nm

    1 2 3 4 5

    -6 -5 -4 -3 -2 -1 130nm

    1 2 3 4 5

    -9 -8 -7 -6 -5 -4 -3 -2 -1 90nm

    1 2 3 4 5

    -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 65nm

    1 2 3 4 5

    -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 45nm

    1 2

    -11 -10 -9 -8 -7 -6 -5 -4 -3 -2 -1

    -11 -10 -9 -8 -7 -6 -5 -4

    6 7 8

    6

  • 7/29/2019 579.Trends.lect1.08

    10/32

    Lecture 1RAS 10

    MPU Trends - Moores Law

    4004400480088008

    80808080

    80858085 80868086

    286286

    386386

    486486PentiumPentium procproc

    P6P6

    0.0010.001

    0.010.01

    0.10.1

    11

    1010

    101000

    10001000

    10,00010,000

    7070 8080 9090 0000 1010

    TransistorsTransistors

    (MT)(MT)

    2X Growth2X Growth

    in 2 Years!in 2 Years!

    Transistors DoubleTransistors Double

    Every Two YearsEvery Two Years

    Source: Intel

  • 7/29/2019 579.Trends.lect1.08

    11/32

    Lecture 1RAS 11

    More MPU Trends

    PentiumPentium Pro procPro proc

    PentiumPentium procproc486486

    386386

    28628680868086

    80858085

    80808080

    80088008

    40044004

    40403636

    32322828

    11

    1010

    100100

    7070 8080 9090 0000 1010

    Die sizeDie size

    (mm)(mm)

    ~7% growth per year~7% growth per year

    ~2X growth in 10 years~2X growth in 10 years

    ~40mm Die in 2010?~40mm Die in 2010?

    Source: Intel

  • 7/29/2019 579.Trends.lect1.08

    12/32

    Lecture 1RAS 12

    Delay vs Fanout

    0

    1

    2

    3

    4

    5

    6

    0 2 4 6 8

    Fanout

    Delay

    =0.0

    =0.5

    =1.0

    =2.0

    where

    CIN Cload

    1X 4X 16X

    Delay Metric - FO4 Concept

    Use FO4 delay

    as optimal delay

  • 7/29/2019 579.Trends.lect1.08

    13/32

    Lecture 1RAS 13

    FO4 INV Delay Scaling

    0

    100

    200

    300

    400

    500

    600

    700

    0.20.40.60.811.2

    pS)

    Technology Ldrawn (um)

    Fanout

    FO4 delay 425ps * Ldrawn

    0

    100

    200

    300

    400

    500

    600

    700

    0.20.40.60.811.2

    FO4G

    atedelay(pS)

    Technology Ldrawn (um)

    Fanout =4 inverter delay at TT, 90% Vdd, 125 oC

    Ldrawn

    For scaling purposes, the alpha-power model is very useful:Idsat = K W Leff

    -0.5Tox-0.8 (Vgs -Vth)

    1.25

    If L,Tox

    V all scale (note V scaling will be limited by Vth

    scaling),

    Current should remains constant per micron of width (approx. 600 to 800uA/m)t = CV/i = st since C, V, i all scale down by s

  • 7/29/2019 579.Trends.lect1.08

    14/32

    Lecture 1RAS 14

    10

    100

    1000

    Dec-83 Dec-86 Dec-89 Dec-92 Dec-95 Dec-98

    8038680486Pentium

    Pentium II

    MPU Clock Frequency Trend

    Intel: Borkar/Parkhurst

  • 7/29/2019 579.Trends.lect1.08

    15/32

    Lecture 1RAS 15

    10

    100

    1000

    Dec-83 Dec-86 Dec-89 Dec-92 Dec-95 Dec-98

    8038680486

    PentiumPentium II

    Expon.

    MPU Clock Frequency Trend

    Intel: Borkar/Parkhurst

    Dec-99 Dec-00 Dec-01 Dec-02

    10000Forward projection

    may be too optimistic

    P4

  • 7/29/2019 579.Trends.lect1.08

    16/32

    Lecture 1RAS 16

    10.00

    100.00

    Dec-83 Dec-86 Dec-89 Dec-92 Dec-95 Dec-98

    8038680486PentiumPentium II

    MPU Clock Cycle Trend (FO4 Delays)

    Intel: Borkar/Parkhurst

  • 7/29/2019 579.Trends.lect1.08

    17/32

    Lecture 1RAS 17

    10.00

    100.00

    Dec-83 Dec-86 Dec-89 Dec-92 Dec-95 Dec-98

    8038680486PentiumPentium II

    Expon.

    MPU Clock Cycle Trend (FO4 Delays)

    Intel: Borkar/Parkhurst

    Dec-99 Dec-00 Dec-01 Dec-02

    Forward projection

    does not make sense

    Curve actually

    flattens at 14-16 FO4

  • 7/29/2019 579.Trends.lect1.08

    18/32

    Lecture 1RAS 18

    Power Trend - Ever Increasing

    Powerpe

    rchip[W]

    1980 1985 1990 1995 20000.01

    0.1

    1

    10

    100

    1000

    Year

    MPU

    x4/3ye

    ars

    DSP

    x1.4/3ye

    ars

    Processors

    published

    in ISSCC

  • 7/29/2019 579.Trends.lect1.08

    19/32

    Lecture 1RAS 19

    Dynamic vs. Leakage Power

    Krishnamurthy, et al., CICC 2002

    250nm 180nm 130nm 90nm 65nm

    Technology Node

    Power(watts)

    Dynamic Power

    Leakage Power

  • 7/29/2019 579.Trends.lect1.08

    20/32

    Lecture 1RAS 20

    Leakage Current Contributions

    130nm 90nm 65nm

  • 7/29/2019 579.Trends.lect1.08

    21/32

    Lecture 1RAS 21

    MPU Diminishing Returns

    Power knob running out Speed == Power 10W/cm2 limit for convection cooling, 50W/cm2 limit for forced-air cooling Large currents, large power surges on wakeup Cf. 125A supply current, 150W total power at 1.2V Vdd for EV8 (Compaq) die size will not continue to increase unless more memory is used to occupy

    the additional area additional power dissipation coming from subthreshold leakage

    Speed knob running out Historically, 2x clock frequency every process generation

    1.4x from device scaling 1.4x from pipelining, hence fewer logic stages (from 40-100 down to around 16 FO4

    INV delays) Clocks cannot be generated with period < 6-8 FO4 INV delays Around 14-16 FO4 INV delays is limit for clock period

    Unrealistic to continue 2x frequency trend!

  • 7/29/2019 579.Trends.lect1.08

    22/32

    Lecture 1RAS 22

    Low-Power Design Techniques

    Supply Voltage Scaling

    Frequency Scaling

    Multiple Supply Voltages (Voltage Islands)

    Clock Gating

    Power Gating

    Multiple Threshold Voltages: LVT, SVT, HVT

    Substrate Biasing

    Power Shut Off

    HW/SW Power Management

  • 7/29/2019 579.Trends.lect1.08

    23/32

    Lecture 1RAS 23

    Low-Power Application: PDA

    0.18um / 400MHz / 470mW (typical)

    CPU

    I-cache

    32KB

    D-cache

    32KB

    I2C

    FICP

    USB

    MMC

    UART AC97

    I2S

    OST

    GPIO

    SSP

    PWM RTC

    DMA controller

    LCDCnt.

    MEMCnt.

    PWR CPG

    SDRAM64MB

    Flash32MB

    LCDPeripheral Area4 48MHz

    Data TransferArea100MHz

    ProcessorArea

    Max 400MHz

    MM Application

    MP3JPEGSimple Moving Picture

    6.5MTrs.

    Available Time6-10Hr

    USB

    MMC

    KEY

    Sound

  • 7/29/2019 579.Trends.lect1.08

    24/32

    Lecture 1RAS 24

    Trends in Low-Power Design Content

    Today, such designs contain embedded processing enginessuch as CPU and DSP, and memory blocks such as SRAM and

    embedded DRAM As we scale technology and keep power constant how does the

    amount of logic vs. memory change?

    Consider the following assumptions to develop trends for on-

    chip logic/memory percentages Die size is 100mm2

    Clock frequency starts at 150MHz increases by about 40% pertechnology node

    Average power dissipation in limited to 100mW at 100o

    C Initial condition at Year 2001: area percentage 75% logic, 25%

    memory

  • 7/29/2019 579.Trends.lect1.08

    25/32

    Lecture 1RAS 25

    Logic/Memory Content Trend

    0%

    10%

    20%

    30%

    40%

    50%

    60%

    70%

    80%

    90%

    100%

    2001 2004 2007 2010 2013 2016Year

    PercentageofArea(%)

    Logic Area Contribution (%) LSTP

    Total Memory Area (%) LSTP

    Die Size = 1cm2

  • 7/29/2019 579.Trends.lect1.08

    26/32

    Lecture 1RAS 26

    ASIC Core Composition Breakout

    0

    10

    20

    30

    40

    50

    60

    1999 2000 2001

    PercentgaeofDieArea

    (I/OsExclud

    ed)

    Random Logic

    Memory

    Analog

    Cores

    ASIC Logic/Memory Content Trends

    Source: Dataquest (2001)

  • 7/29/2019 579.Trends.lect1.08

    27/32

    Lecture 1RAS 27

    Design Trend: Productivity Gap

    Year Technology Chip Complexity ASIC Frequency

    1997 250 nm 50M Tr. 100MHz

    1999 180 nm 150M Tr. 200MHz

    2002 130 nm 250M Tr. 400MHz

    2004 90 nm 500M Tr. 600MHz

  • 7/29/2019 579.Trends.lect1.08

    28/32

    Lecture 1RAS 28

    Designing a 50M Transistor IC

    Gates Required ~12.5M

    Gates/Day (Verified) 1K (including memory) Total Eng. Days 12,500

    Total Eng. Years 35

    Cost/Eng./Year $200K

    Total People Cost $7M Other costs (masks, tools, etc.) $8M

    Actual Cost is $10-15M to get actual prototypes after fabrication.

  • 7/29/2019 579.Trends.lect1.08

    29/32

    Lecture 1RAS 29

    Productivity Gap

    Deep submicron (DSM) technology allows hundreds of millions

    of transistors to be integrated on a single chip

    Number of transistors that a designer can design per day(~1000 gates/day) is not going up significantly

    New design methodologies are needed to address the

    integration/productivity issues

    System on a chip Design with reusable IP

    new design methodology, IP development

    new HW/SW design and verification issues

    new test issues

  • 7/29/2019 579.Trends.lect1.08

    30/32

    Lecture 1RAS 30

    SoC Design Hierarchy

    SOC consists of new logic blocks and existing IP

    New Logic blocks

    Existing IP including memory

    Each logic block can be implementedby newly designed portion and a re-useportion based on IPs

    Newly designed portion

    Re-use portion including memory

  • 7/29/2019 579.Trends.lect1.08

    31/32

    Lecture 1RAS 31

    SoC Platform Design Concept

    SoC Verification Flow

    System-Level PerformanceEvaluationRapid Prototype for

    End-Customer Evaluation

    SoC Derivative DesignMethodologies

    System-level performanceevaluation environment

    HW/SW Co-synthesis

    SoC IC Design Flows

    ApplicationSpace

    Methodology / Flows:

    Foundation Block

    MEM

    FPGACPU Processor(s), RTOS(es)

    and SW architecture

    *IP can be hardware (digitalor analog) or software.

    IP can be hard, soft orfirm (HW), source orobject (SW)

    Scaleable

    bus, test, power, IO,

    clock, timing architectures

    + Reference Design

    Foundry-SpecificPre-Qualification

    Programmable IP

    SW IP

    Hardware IP

    Pre-Qualified/VerifiedFoundation-IP*

  • 7/29/2019 579.Trends.lect1.08

    32/32

    Lecture 1RAS 32

    Purpose of this Course

    This course addresses SoC/IP design in DSM technologies

    It is a very broad subject, one that industry is grappling with on a

    daily basis one course cannot address all the issue properly The goal is to present an overview of the various issues from

    Systems to Silicon to provide a perspective on what ishappening in technology and design.

    We will begin with the Systems Level and work our way down tothe Silicon Level

    The projects, presentations, and assignments will provide in-depth analysis of the subjects that are of interest to you