High-Bandwidth Memory Interface Design

  • 5/25/2018 High-Bandwidth Memory Interface Design

    1/86

    High-Bandwidth Memory Interface Design

    Chulwoo Kim

    Dept. of Electrical Engineering

    Korea University, Seoul, Korea

    February 17, 2013

    Chulwoo Kim 1 of 86

  • 5/25/2018 High-Bandwidth Memory Interface Design

    2/86

    Outline

    Introduction

    Clock Generation and Distribution

    Transceiver Design

    TSV Interface for DRAM

    Summary

    References

    Chulwoo Kim 2 of 86

  • 5/25/2018 High-Bandwidth Memory Interface Design

    3/86

    Outline

    Introduction
      DRAM 101
      Simplified DRAM Architecture and Operation
      Differences of DRAM (DDRx, GDDRx, LPDDRx)
      Trend
      Memory Interface: Differences and Issues

    Clock Generation and Distribution

    Transceiver Design

    TSV Interface for DRAM

    Summary

    References

    Chulwoo Kim 3 of 86

  • 5/25/2018 High-Bandwidth Memory Interface Design

    4/86

    DRAM 101

    SDRAM: Synchronous Dynamic Random Access Memory

    SDR (Single Data Rate): one data (D) on DQ per CLK cycle
    DDR (Double Data Rate): two data (D D) on DQ per CLK cycle

    Main Memory: DDRx (PC, Notebook, Server)
    Graphics Memory: GDDRx (Graphic Card, Console)
    Mobile Memory: LPDDRx (Phone, Tablet PC)

    [Block diagram: MCU and SDRAM connected by CLK & Command and by Data]
    [Timing diagram: CLK; Command (C); CAS* Latency; Burst Length; DQ (D D D D D D D D)]

    * CAS : Column Address Strobe

  • 5/25/2018 High-Bandwidth Memory Interface Design

    5/86

    DRAM DDR4 Die Photo

    [1] K. B. Koo et al., ISSCC 2012, pp. 40-41

    [Die photo: Bank0 to Bank15]

    Supply Voltage: VDD=1.2V, VPP=2.5V
    Process: 38nm CMOS / 3-metal
    Banks: 4-Bank Group, 16 Bank
    Data Rate: 2400 Mbps
    Number of IOs: X4 / X8

  • 5/25/2018 High-Bandwidth Memory Interface Design

    6/86

    Simplified DRAM Architecture

    [Block diagram]
    Bank: Cell Array (WL, BLT, BLB, BLSA*), Row Decoder, Word Line Driver, Row Repair Fuse, Column Decoder, Column Repair Fuse, Write Drv. / Read Amp.
    Peripheral Circuit: CLK/ADD/CMD Buffer, CMD Controller, DLL, Generator, ICLK, DCLK, DQ RX (Serial to parallel), DQ TX (Parallel to serial)

    * BLSA : Bit line sense amplifier

  • 5/25/2018 High-Bandwidth Memory Interface Design

    7/86

    Concept of DRAM operation

    WRITE: Serial to parallel (DQ -> GIO)
    READ: Parallel to serial (GIO -> DQ)

    [Block diagram: Banks with BLSA* connect to the Peripheral Circuit over GIO* (Np×Ndq bits); Serial to parallel / Parallel to serial blocks connect to DQ RX / DQ TX (Ndq bits)]

    * BLSA : Bit line sense amplifier
    * GIO : Global I/O
    * Np : Number of pre-fetch
    * Ndq : Number of DQ

  • 5/25/2018 High-Bandwidth Memory Interface Design

    8/86

    Pre-fetch Timing (DDR1, BL*=2)

    [2] JEDEC, JESD79F, pp. 24-29

    [Timing diagram: CLK; two RD commands with tCCD*=1; GIO; DQS; DQ bursts (0 1, 0 1) after CL*, BL=2]

    Number of GIO channel = Np × Ndq = 2 × 8 = 16 (DDR1 x8)

    * tCCD : CAS to CAS delay
    * CL : CAS latency
    * BL : Burst length

  • 5/25/2018 High-Bandwidth Memory Interface Design

    9/86

    Pre-fetch Diagram (DDR1)

    Num. of GIO channel = 2 × Ndq

    Pre-fetch operation
      2-bit pre-fetch
      [2 × Ndq] data access
      (If the output data rate is 400Mbps, the internal data rate is 200Mbps)

    [Figure: eight Banks sharing the GIO]

  • 5/25/2018 High-Bandwidth Memory Interface Design

    10/86

    Pre-fetch Timing (DDR2, BL=4)

    [3] JEDEC, JESD79-2F, pp. 35

    [Timing diagram: CLK; two RD commands with tCCD=2; GIO; DQS; DQ bursts (0 1 2 3, 0 1 2 3) after RL*, BL=4]

    Number of GIO channel = Np × Ndq = 4 × 8 = 32 (DDR2 x8)

    * RL : READ latency

  • 5/25/2018 High-Bandwidth Memory Interface Design

    11/86

    Pre-fetch Diagram (DDR2)

    Num. of GIO channel = 4 × Ndq

    Pre-fetch operation
      4-bit pre-fetch
      [4 × Ndq] data access
      (If the output data rate is 800Mbps, the internal data rate is 200Mbps, same as DDR1)

    [Figure: eight Banks sharing the GIO]

  • 5/25/2018 High-Bandwidth Memory Interface Design

    12/86

    Pre-fetch Timing (DDR3, BL=8)

    [4] JEDEC, JESD79-3F, pp. 62

    [Timing diagram: CLK; two RD commands with tCCD=4; GIO; DQS; DQ bursts (0 to 7, 0 to 7) after RL, BL=8]

    Number of GIO channel = Np × Ndq = 8 × 8 = 64 (DDR3 x8)

  • 5/25/2018 High-Bandwidth Memory Interface Design

    13/86

    Pre-fetch Diagram (DDR3)

    Num. of GIO channel = 8 × Ndq

    Pre-fetch operation
      8-bit pre-fetch
      [8 × Ndq] data access
      (If the output data rate is 1.6Gbps, the internal data rate is 200Mbps, same as DDR1)

    [Figure: eight Banks sharing the GIO]
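The pre-fetch slides for DDR1, DDR2 and DDR3 all quote the same 200Mbps internal rate. A minimal sketch of that arithmetic follows, assuming the internal (core-side) rate is simply the per-pin output rate divided by the pre-fetch depth; the function name `internal_data_rate_mbps` is illustrative and not from the slides.

```python
def internal_data_rate_mbps(output_rate_mbps: float, prefetch_bits: int) -> float:
    """Core-side data rate for a given per-pin output rate and pre-fetch depth."""
    if prefetch_bits <= 0:
        raise ValueError("prefetch_bits must be positive")
    return output_rate_mbps / prefetch_bits


# (generation, per-pin output rate in Mbps, pre-fetch depth) as quoted on the slides
EXAMPLES = [("DDR1", 400, 2), ("DDR2", 800, 4), ("DDR3", 1600, 8)]

for name, rate, prefetch in EXAMPLES:
    print(f"{name}: {rate} Mbps out / {prefetch}-bit pre-fetch = "
          f"{internal_data_rate_mbps(rate, prefetch):.0f} Mbps internal")
```

Doubling the pre-fetch depth each generation is what lets the pin rate double while the cell array keeps running at the same speed.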

  • 5/25/2018 High-Bandwidth Memory Interface Design

    14/86

    Bank Grouping Timing (DDR4, BL=8)

    [5] JEDEC, JESD79-4, pp. 77-78
    [6] T. Y. Oh et al., ISSCC 2010, pp. 434-435

    [Timing diagram: CLK; RD commands to bank groups G0, G1, G1 with tCCD_S=4 and tCCD_L=5; GIO_BG0 to GIO_BG3; DQS; DQ bursts (0 to 7, three times) after RL, BL=8]

    Number of GIO channel = Np × Ndq × Ngroup = 8 × 8 × 4 = 256 (DDR4 x8)
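The GIO channel counts quoted on the pre-fetch timing slides all follow one product. A small sketch, assuming the count is Np × Ndq × Ngroup with Ngroup = 1 for generations without bank grouping (the function name is illustrative):

```python
def gio_channels(n_prefetch: int, n_dq: int, n_groups: int = 1) -> int:
    """Number of global I/O lines: pre-fetch depth x DQ width x bank groups."""
    return n_prefetch * n_dq * n_groups


# x8 devices, using the pre-fetch depths from the slides
print("DDR1:", gio_channels(2, 8))              # 2 x 8
print("DDR2:", gio_channels(4, 8))              # 4 x 8
print("DDR3:", gio_channels(8, 8))              # 8 x 8
print("DDR4:", gio_channels(8, 8, n_groups=4))  # 8 x 8 x 4
```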

  • 5/25/2018 High-Bandwidth Memory Interface Design

    15/86

    Pre-fetch & Bank Grouping (DDR4)

    [1] K. B. Koo et al., ISSCC 2012, pp. 40-41

    Num. of GIO channel = 8 × Ndq

    Pre-fetch operation
      8-bit pre-fetch
      Bank grouping

    [Figure: Banks arranged in Group0, Group1, Group2, Group3, connected through a GIO MUX]

  • 5/25/2018 High-Bandwidth Memory Interface Design

    16/86

    Differences of DDRx, GDDRx, LPDDRx

                       DDRx         GDDRx          LPDDRx
    Application        PC/Server    Graphic card   Mobile/Consumer
    Socket             DIMM         On board       MCP*/PoP*/SiP*
    IO                 4/8          16/32          16/32

    Architecture: [Figure: Bank and PAD placement for each type]

    Unique Function
      GDDRx: Single uni-directional WDQS, RDQS; VDDQ termination; CRC, DBI, ABI
      LPDDRx: No DLL; DPD*; PASR*; TCSR*

    * MCP : Multi chip package
    * PoP : Package on package
    * SiP : System in package
    * DPD : Deep power down
    * PASR : Partial array self refresh
    * TCSR : Temperature compensated self refresh

  • 5/25/2018 High-Bandwidth Memory Interface Design

    17/86

    DDR Comparison

                          DDR1        DDR2        DDR3        DDR4
    VDD [V]               2.5         1.8         1.5         1.2
    Data Rate [bps/pin]   200M~400M   400M~800M   800M~2.1G   1.6G~3.2G
    Pre-Fetch             2 bit       4 bit       8 bit       8 bit
    STROBE                Single DQS  Differential DQS, DQSB (DDR2 to DDR4)
    Interface             SSTL_2      SSTL_18     SSTL_15     POD_12

    New Feature
      DDR2: OCD calibration, ODT
      DDR3: Dynamic ODT, ZQ calibration, Write leveling
      DDR4: CA parity, DBI*, CRC*, Gear down, CAL*, PDA*, FGREF*, TCAR*, Bank grouping

    * DBI : Data bus inversion
    * CRC : Cyclic redundancy check
    * CAL : Command address latency
    * PDA : Per DRAM addressability
    * FGREF : Fine granularity refresh
    * TCAR : Temperature controlled array refresh

  • 5/25/2018 High-Bandwidth Memory Interface Design

    18/86

    GDDR Comparison

                          GDDR1      gDDR2      GDDR3       GDDR4       GDDR5
    VDD [V]               2.5        1.8        1.5         1.5         1.5/1.35
    Data Rate [bps/pin]   300~900M   800M~1G    700M~2.6G   2.0G~3.0G   3.6G~7.0G
    Pre-Fetch             2 bit      4 bit      4 bit       8 bit       8 bit
    Interface             SSTL_2     SSTL_2     POD-18      POD-15      POD-15

    STROBE (left to right): Single DQS; Differential bi-directional DQS*, DQSB; Single uni-directional WDQS, RDQS

    New Feature (left to right): OCD* calibration, ODT*; ZQ; DBI, Parity (opt); No DLL, PLL (option), WCK/WCKB, CRC, ABI*, RDQS (option), Bank grouping

    * DQS : DQ strobe signal (DQ is the data I/O pin)
    * OCD : Off chip driver
    * ODT : On die termination
    * ABI : Address bus inversion

  • 5/25/2018 High-Bandwidth Memory Interface Design

    19/86

    LPDDR Comparison

                          LPDDR1      LPDDR2         LPDDR3
    VDD [V]               1.8         1.2            1.2
    Data Rate [bps/pin]   200M~400M   200M~1066M     333M~1600M
    Pre-Fetch             2 bit       4 bit          8 bit
    STROBE                DQS         DQS_T, DQS_C   DQS_T, DQS_C
    Interface             SSTL_18*    HSUL_12*       HSUL_12*
    DLL                   X           X              X

    New Feature
      LPDDR2: CA pin
      LPDDR3: ODT (High tapped termination)

    * SSTL : Stub series terminated logic
    * HSUL : High speed un-terminated logic

  • 5/25/2018 High-Bandwidth Memory Interface Design

    20/86

    Trend

    [Plot: VDD [V] (1.2 to 2.5) versus Data Rate [Gbps] (0.2 to 7.0) for DDR1, DDR2, DDR3, DDR4, GDDR1, gDDR2, GDDR3, GDDR4, GDDR5, LPDDR1, LPDDR2, LPDDR3]

    Although all types of DRAM are reaching their limits in supply voltage, the demand for high-bandwidth memory keeps increasing

  • 5/25/2018 High-Bandwidth Memory Interface Design

    21/86

    Memory Interface

    System Feature
      Single-ended / high speed
      Many channels (vulnerable to coupling effects)
      DDR: multi-drop (multi rank, multi DIMM)
      GDDR: point to point
      Impedance discontinuities (stubs, connector, via, etc.)

    Issue
      Reflection
      Inter-symbol interference
      Simultaneous switching output noise
      Pin to pin skew
      Poor transistor performance

    [Figure: CPU with multi-drop DRAMs; GPU with point-to-point DRAMs]

  • 5/25/2018 High-Bandwidth Memory Interface Design

    22/86

    Outline

    Introduction

    Clock Generation and Distribution
      Delay-locked loop (DLL)
      Duty cycle corrector (DCC)
      Clock distribution

    Transceiver Design

    TSV

    Conclusions

    References

    Chulwoo Kim 22 of 86

  • 5/25/2018 High-Bandwidth Memory Interface Design

    23/86

    Basic DLL Architecture

    [Block diagram: External Clock -> input path (tD1) -> I_CLK -> Variable Delay Line (tDVDL) -> O_CLK -> output path (tD2), which clocks DATA from memory core out as Data; O_CLK -> Replica Delay (tDREP) -> FB_CLK; PD compares I_CLK and FB_CLK and drives the Controller of the Variable Delay Line]

    tCK × N = tDVDL + tDREP
    tDREP ≈ tD1 + tD2
    tCK × N = tDVDL + tD1 + tD2 + Δ
    Δ = tDREP − (tD1 + tD2)   (replica mismatch)
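The lock condition on the basic DLL slide, N × tCK = tDVDL + tDREP, can be turned into a small calculation. The sketch below is a behavioural illustration only: it assumes integer picosecond units, a variable delay line with a hypothetical usable range [t_dvdl_min, t_dvdl_max], and that the loop settles at the smallest N that fits. None of these names or values come from the slides.

```python
def dll_lock_point(t_ck, t_drep, t_dvdl_min, t_dvdl_max):
    """Return (N, tDVDL) with N*tCK = tDVDL + tDREP for the smallest usable N.

    All arguments are integers in picoseconds. Returns None when the delay
    needed for that N exceeds the range of the variable delay line.
    """
    # Smallest integer N with N*tCK >= tDVDL(min) + tDREP (integer ceiling division)
    n = max(1, -(-(t_dvdl_min + t_drep) // t_ck))
    t_dvdl = n * t_ck - t_drep
    if t_dvdl > t_dvdl_max:
        return None
    return n, t_dvdl


# Hypothetical numbers: 1.8 ns replica delay, delay line tunable from 0.3 ns to 3 ns
print(dll_lock_point(t_ck=2500, t_drep=1800, t_dvdl_min=300, t_dvdl_max=3000))
print(dll_lock_point(t_ck=1250, t_drep=1800, t_dvdl_min=300, t_dvdl_max=3000))
```

At the shorter clock period the replica delay no longer fits in one cycle, so the sketch locks over two cycles (N = 2) instead.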

  • 5/25/2018 High-Bandwidth Memory Interface Design

    24/86

    Replica Delay Mismatch

    [Figure: Valid Data Window within tCK for Long and Short replica delay; tDQSCK* (or tAC) variation [ps] versus Supply Voltage [V] between LVDD and HVDD, shown for two cases]

    * tDQSCK (or tAC) : DQS output access time from CK/CKb

  • 5/25/2018 High-Bandwidth Memory Interface Design

    25/86

    Locking Range Considerations

    [7] H.-W. Lee et al., submitted to TVLSI

    [Figure: tDQSCK (or tAC) versus tCK with "bird's beak" regions; I_CLK and FB_CLK waveforms for the Short and Long cases]

    tDINIT+tDREPtDREQUIRED
    tDINIT+tDREP tDREQUIRED
    tDINIT = tDVDL(0) + tDREP
    N×tCK > tDVDL(0) + tDREP
    tCK = tDVDL+ tDREP+ t

  • 5/25/2018 High-Bandwidth Memory Interface Design

    26/86

    Synchronous Mirror Delay (SMD)

    Basic Operation
      Measure and replicate the delay
      No feedback
      Match delay in two cycles

    [8] T. Saeki et al., ISSCC 1996, pp. 374-375

    [Block diagram: Clock -> tD1 -> I_CLK -> Replica Delay (tD1+tD2) -> Measure Delay Line (tD3) -> Replicate Delay Line (tD3) -> tD2 -> OUT]
    [Timing diagram: Clock, I_CLK, OUT with delays tD1, tD1+tD2, tD3, tD3, tD2]

  • 5/25/2018 High-Bandwidth Memory Interface Design

    27/86

    Disadvantages of SMD

    Disadvantages
      Mismatch between replica delay and input buffer & clock distribution
      Coarse resolution
      Input jitter multiplication

    [Block diagram: Clock -> tD1 -> I_CLK -> Replica Delay (tD1+tD2) -> Measure Delay Line -> Replicate Delay Line -> tD2 -> OUT]
    [Timing diagram: Clock without and with jitter; the measured delay tCK-(tD1+tD2) is applied twice, so the input pk-pk jitter appears doubled as output pk-pk jitter at OUT]
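The two-cycle matching and the jitter doubling of the SMD can be checked with a few lines of arithmetic. This is an idealized sketch that assumes the delay chain shown on the SMD slides (tD1, then the replica tD1+tD2, then the measured delay tD3 twice, then tD2) and ignores the coarse quantization of the delay lines; the function name and the numbers are illustrative.

```python
def smd_output_delay(t_ck_measured, t_d1, t_d2):
    """Delay from the external clock edge to OUT in an idealized SMD (same unit as inputs)."""
    t_d3 = t_ck_measured - (t_d1 + t_d2)   # delay found by the measure line
    if t_d3 < 0:
        raise ValueError("clock period shorter than tD1 + tD2")
    # input path + replica + measure line + replicate line + output path
    return t_d1 + (t_d1 + t_d2) + t_d3 + t_d3 + t_d2


nominal = smd_output_delay(2500, t_d1=300, t_d2=400)
jittered = smd_output_delay(2520, t_d1=300, t_d2=400)   # one period is 20 ps long
print(nominal)             # total delay equals two nominal clock periods
print(jittered - nominal)  # the 20 ps period error shows up twice at OUT
```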

  • 5/25/2018 High-Bandwidth Memory Interface Design

    28/86

    Register Controlled DLL

    Locking information is stored digitally in register
    Vernier type delay line increases resolution

    [9] A. Hatakeyama et al., ISSCC 1997, pp. 72-73

    [Schematic: Main Delay Line and Sub Delay Line between IN and OUT, tapped by switches SW0 to SW4 (SW(n-1), SW(n)); unit delays tD+Δ (fan-out=2) and tD (fan-out=1)]

  • 5/25/2018 High-Bandwidth Memory Interface Design

    29/86

    Single Register Controlled Delay Line

    [Block diagram: I_CLK -> Coarse Delay line of unit delays (tUD) selected by CSL1, CSL2, CSL3 -> OUT1 and OUT2, one tUD apart -> IN1 and IN2 of the Phase Mixer (weights 1-K and K) -> OUT12; Fine Delay Controller driven by UP/DN* from PD]

    * DN = Down

  • 5/25/2018 High-Bandwidth Memory Interface Design

    30/86

    Boundary Switching Problem

    Phase Mixer output: OUT12 = IN1×(1-K) + IN2×K
    Coarse shift & fine reset do not occur simultaneously

    [Figure: I_CLK passing through 3 UDCs* and, after a shift left, through 4 UDCs; IN1 (K=0) and IN2 (K=1) are one tUD apart; the shift is shown at K=0.9]

    * UDC = Unit delay cell

  • 5/25/2018 High-Bandwidth Memory Interface Design

    31/86

    Seamless Boundary Switching

    [10] J.-T. Kwak et al., VLSI 2003, pp. 283-284

    Dual Coarse Delay Line
    Phase Mixer output: OUT12 = IN1×(1-K) + IN2×K, with 0 ≤ K ≤ 1
    Fine set first and then coarse shift

    [Figure: Clock -> dual coarse delay line of Unit Delay Cells (tUD) -> IN1 and IN2 -> Phase Mixer -> OUT12; K moves from 0.9 to 1.0 before the shift left, so IN2 (K=1) alone sets the output while IN1 is shifted]
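The boundary switching problem and its fix can be illustrated numerically. The sketch below is a simplification: it assumes the phase mixer interpolates edge delay linearly, so the output delay in units of tUD is d_in1×(1−K) + d_in2×K, where d_in1 and d_in2 are the number of unit delay cells each mixer input has passed through. The function name and tap numbers are illustrative.

```python
def mixer_delay(d_in1: float, d_in2: float, k: float) -> float:
    """Output delay of an ideal phase mixer, in units of tUD."""
    if not 0.0 <= k <= 1.0:
        raise ValueError("K must lie in [0, 1]")
    return d_in1 * (1.0 - k) + d_in2 * k


# Problem case: the coarse shift lands while the fine code is still K = 0.9
before = mixer_delay(3, 4, 0.9)         # 3.9 tUD
glitch = mixer_delay(4, 5, 0.9)         # 4.9 tUD: a jump of one full unit delay

# Seamless case: fine set first (K = 1.0), then shift only the unused input
fine_set = mixer_delay(3, 4, 1.0)       # 4.0 tUD, output comes from IN2 alone
after_shift = mixer_delay(5, 4, 1.0)    # still 4.0 tUD: IN1 has zero weight
next_step = mixer_delay(5, 4, 0.9)      # 4.1 tUD: delay keeps increasing smoothly

print(before, glitch, fine_set, after_shift, next_step)
```

Because IN1 carries no weight at K = 1.0, moving it two taps ahead does not disturb the output, and the fine code can then count back down to keep increasing the delay.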

  • 5/25/2018 High-Bandwidth Memory Interface Design

    32/86

    Adaptive Bandwidth DLL w/ SDVS*

    [11] H.-W. Lee et al., ISSCC 2011, pp. 502-504

    [Block diagram: I_CLK -> Variable Delay Line -> O_CLK (to upper block); O_CLK -> Replica Delay -> FB_CLK; PD -> Controller; Update Period Pulse Gen. controlled by NCODE generates the Update Pulse]

    Update Period: m×tCK - tDREP + tDREP = m×tCK; for m=2, BWDLL = 1/(2×tCK)

    [Plot: Fine Unit Delay vs. Mode [ps], Low-Speed Mode and High-Speed Mode: DN 15.9 ps, BASE 10.2 ps, UP 7.8 ps]

    * SDVS : Self-dynamic voltage scaling

  • 5/25/2018 High-Bandwidth Memory Interface Design

    33/86

    Duty Cycle Corrector (DCC)

    DCC
      Reduces duty cycle error
      Enlarges valid data window for DDR
      Needs to correct 15% duty error at max speed
      Can be implemented either in analog or digital type

    DCC Design Issues
      Location of DCC (before/after DLL)
      Embedded in DLL or not
      Power consumption
      Area
      Operating frequency range
      Locking time in case of digital DCC
      Offset of duty cycle detector
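Why duty cycle error shrinks the valid data window for DDR can be seen from the two half-periods of the clock. The sketch below is an idealization that ignores jitter, skew and setup/hold time; it also assumes that a "15% duty error" means a 35%/65% clock, which is an interpretation rather than something stated on the slide.

```python
def ddr_bit_windows(t_ck_ps: float, duty: float):
    """Ideal durations of the two DDR bit slots (clock-high phase, clock-low phase)."""
    if not 0.0 < duty < 1.0:
        raise ValueError("duty must be between 0 and 1")
    high = t_ck_ps * duty
    return high, t_ck_ps - high


def worst_case_window(t_ck_ps: float, duty: float) -> float:
    """The shorter of the two bit slots limits the valid data window."""
    return min(ddr_bit_windows(t_ck_ps, duty))


t_ck = 1250.0  # ps, i.e. 1.6 Gbps per pin at double data rate
print(worst_case_window(t_ck, 0.50))  # ideal 50% duty
print(worst_case_window(t_ck, 0.35))  # assumed 15% duty error before correction
```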

  • 5/25/2018 High-Bandwidth Memory Interface Design

    34/86

    Digital DCC

    [Block diagrams of three digital DCC types, each producing a 50%/50% output:
      IN -> Invert-Delay Clock Generator (invert and delay) -> Phase Mixer -> OUT
      IN -> Pulse Width Controller -> OUT, with Duty Cycle Detector
      IN -> Half-Cycle Delayed Clock Generator (HD_IN) -> Edge Combiner -> OUT]

  • 5/25/2018 High-Bandwidth Memory Interface Design

    35/86

    DCC in GDDR5

    [12] D. Shin et al., VLSI 2009, pp. 138-139

    [Block diagram with: WCK/WCKb RX (rxclk, rxclkb), Divider, CML2CMOS, PLL and PLL sel., 4-phase clocks, Global Driver, Repeater, Clk Distribution Network to DQ; Duty Cycle Corrector consisting of Duty Cycle Detector, Adder-based Counter, Control Pulse Generator (up/dn), Decoder and Duty-Cycle Adjuster (DCA) with X1/X2/X4/X8 legs acting on CML_bias; CML-only section]

    DCA is not in clock path
      No jitter addition

  • 5/25/2018 High-Bandwidth Memory Interface Design

    36/86

    DLL-related Parameters & Reference

                       DDR1         DDR2         DDR3/DDR3L       GDDR3            GDDR4
    VDD                2.5V         1.8V         1.5V/1.35V       1.8V             1.5V
    Lock time          200 cycles   200 cycles   512 cycles       2~5K cycles      2~20K cycles
    Max. tDQSCK        600ps        300ps        225ps            180ps            140ps
    Nominal speed      166MHz       333MHz       333MHz~800MHz    600MHz~1.37GHz   1.6GHz
    tXPDLL* (tXARD)    1tCK         2tCK         10tCK            7tCK+tIS         9tCK+tIS
    Max. tCK           12ns         8ns          3.3ns            3.3ns            2.5ns

    RELATED AREA

    DCC block

    Variable

    Delay LineDelay

    Control Logic

    Replica

    Low Jitter

    REFERENCE Type

    23**141819**2022 2425*26

    23* 2613 15**16182021**

    3132*33**

    27[28]** [29] [30]

    2930** 34*35*

    3227[28** 30**

    14 [36*15**16 32*24262717**19**

    14 25* 28**

    tXPDLL*(tXARD) Timing for exit precharge power-down to any non-READ command

    Clock Generation and Distribution

    digital

    *mixed

    **analog

    131415**1617** 19**2021**18

    Chulwoo Kim 36 of 86

  • 5/25/2018 High-Bandwidth Memory Interface Design

    37/86

    Clock Distribution

    [37] S.-J. Bae, et al., ISSCC, 2011, pp. 498-500

    Clock Distribution Issues
      Clock skew among DQs
      Low power
      Robust under PVT variations
      CML to CMOS converter jitter

    [Figure: Global Clock Buffer distributing CK/CKB to the DQs]

  • 5/25/2018 High-Bandwidth Memory Interface Design

    38/86

    CML to CMOS Converter

    Global Clock Buffer
      Current mode logic (CML) : high-speed clock

    CML to CMOS Converter Issue
      Susceptible to noise
      Jitter

    [Schematics: Global Clock Buffer (CLKP, CLKN -> OUTP, OUTN); CML to CMOS Converter (CLKP, CLKN -> CLKOUT) at the DQ]

  • 5/25/2018 High-Bandwidth Memory Interface Design

    39/86

    Outline

    Introduction

    Clock Generation and Distribution

    Transceiver Design
      Channel
      Pre-emphasis
      Equalizer
      Crosstalk and skew
      Training
      Input buffer
      Output driver
      DBI/CRC

    TSV Interface for DRAM

    Summary

    References

    [Figure: transmitter side (Output driver, Pre-emphasis, Training, DBI/CRC) -> CH -> receiver side (Input buffer, Equalizer, Training, DBI/CRC)]

  • 5/25/2018 High-Bandwidth Memory Interface Design

    40/86

    Channel Characteristics

    GDDRx
      Point to point connection
      Performance target: High data rate
      Few reflection components: PCB vias

    DDRx
      Multidrop
      Performance and power
      Many reflection components: PCB vias, DIMM connector

    [Figure: GPU with point-to-point GDDRx devices; CPU Socket with DIMM Slots]

  • 5/25/2018 High-Bandwidth Memory Interface Design

    41/86

    Emphasis for Channel Compensation

    [Figure, time domain (Amplitude vs. Time): Original Signal -> Channel -> Distorted Signal; D(in) -> FFE -> Channel -> D(out)]
    [Figure, frequency domain (Amplitude vs. Freq., marked at fdata/2): Channel, FFE, and FFE + Channel responses]

  • 5/25/2018 High-Bandwidth Memory Interface Design

    42/86

    Pre-emphasis vs. De-emphasis

    Pre-emphasis : Transition Bit Boosting

    De-emphasis : Non-transition Bit Suppression

    1-tap pre-emphasis

    No emphasis

    1-tap de-emphasis

    Time

    Transceiver DesignChulwoo Kim 42 of 86

  • 5/25/2018 High-Bandwidth Memory Interface Design

    43/86

    Basic De-emphasis Circuit

    The Number of Taps
      Depends on the channel quality and bit rate
      Usually from one to three taps

    [Schematic: Din, X(n) -> gain K0 -> summer -> Dout, Y(n); Din -> D flip-flop (Unit delay) -> gain -K1 -> summer]
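The basic de-emphasis circuit is a two-tap feed-forward filter: the current bit weighted by K0 plus the previous bit weighted by -K1. A minimal sketch, assuming symbols are mapped to ±1 and using illustrative tap weights K0 = 0.75 and K1 = 0.25 (the slide does not give values):

```python
def de_emphasis(bits, k0=0.75, k1=0.25):
    """Two-tap FFE: y(n) = K0*x(n) - K1*x(n-1), with bits mapped to +/-1 levels."""
    symbols = [1.0 if b else -1.0 for b in bits]
    out = []
    prev = symbols[0]  # assume the line already sat at the first bit's level
    for x in symbols:
        out.append(k0 * x - k1 * prev)
        prev = x
    return out


print(de_emphasis([0, 1, 1, 1, 0, 0]))
# Transition bits keep the full swing (magnitude K0 + K1);
# repeated bits are suppressed to K0 - K1.
```

This matches the slide's description of de-emphasis as non-transition bit suppression: the peak swing is unchanged while the low-frequency content is attenuated.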

  • 5/25/2018 High-Bandwidth Memory Interface Design

    44/86

    Pre-emphasis Circuit [1/2]

    Cascaded Pre-emphasis
      Internal-node ISI due to limited transistor performance at high speed
      Internal-node pre-emphasis ratio would not be affected by the channel
      Less sensitive to the system environment or channel variations

    [38] K.-H. Kim et al., JSSC, Jan 2006, pp. 127-134

    [Schematic: Din(n), Din(n-1), Din(n-2) -> 4:2 and 2:1 multiplexer stages -> Pre-emph. and Driver -> DQ, DQB]
    [Waveforms: Voltage [V] vs. Time [psec] for No Pre-emphasis, Conventional Pre-emphasis and Proposed Pre-emphasis]

  • 5/25/2018 High-Bandwidth Memory Interface Design

    45/86

    Pre-emphasis Circuit [2/2]

    [39] H. Partovi et al., ISSCC, 2009, pp. 136-137

    Voltage Mode Driver Pre-emphasis
      Additional zero by CC
      Time continuous pre-emphasis

    [Schematic: Din -> Pre-Driver -> Main Driver (RT) -> Dout; parallel Pre-Driver -> Pre-Emph. Driver (RC) -> Boosting Capacitor CC -> Dout; TX in the GPU drives the channel (CH) with CP, CL and RT]
    [Equivalent Linear Model: Din, RT, RC, CC, CP, CL, Dout]

  • 5/25/2018 High-Bandwidth Memory Interface Design

    46/86

    Decision Feedback Equalization (DFE)

    DFE cancels ISI without noise amplification
    Clock must be provided by DLL or PLL
    Critical path (feedback path) is important

    [40] Y. Hidaka, CMOS Emerging Technologies Workshop, May 2010

    [Figure, Amplitude vs. Time: (A) transmitted pulse of 1UI, (B) received pulse with ISI, (C) emulated ISI, (D) result with no ISI]
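The DFE idea of subtracting emulated ISI using the previous decision can be sketched in a few lines. This is a toy model, not the circuit from the slides: it assumes ±1 symbols, a channel with a single post-cursor of 0.7, a one-tap DFE whose weight equals that post-cursor, and a known previous symbol of -1 before the burst.

```python
def channel(symbols, post_cursor=0.7, initial=-1.0):
    """Toy channel: each received sample carries ISI from the previous symbol."""
    received, prev = [], initial
    for x in symbols:
        received.append(x + post_cursor * prev)
        prev = x
    return received


def dfe_receive(samples, tap=0.7, initial=-1.0):
    """One-tap DFE: subtract tap * previous decision, then slice at zero."""
    corrected, decisions, prev = [], [], initial
    for s in samples:
        c = s - tap * prev              # remove the emulated ISI
        d = 1.0 if c > 0.0 else -1.0    # decision, fed back for the next bit
        corrected.append(c)
        decisions.append(d)
        prev = d
    return corrected, decisions


tx = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0]
rx = channel(tx)
corrected, decisions = dfe_receive(rx)
print(min(abs(v) for v in rx))         # worst-case margin before DFE
print(min(abs(v) for v in corrected))  # worst-case margin after DFE
```

Because the correction uses the already sliced decision rather than the noisy sample, the noise is not amplified; the cost is that the feedback must settle within one UI.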

  • 5/25/2018 High-Bandwidth Memory Interface Design

    47/86

    [41] S.-J. Bae et al., ISSCC, 2008, pp. 278-279

    The previously captured data must be fed back to the receiver within 1UI

    [Schematic: DQ and Vref feed four DFE sense amplifiers (DFE SA) clocked by WCK/2_0, WCK/2_90, WCK/2_180 and WCK/2_270; their outputs P0, P90, P180, P270 drive SR Latches producing D0, D90, D180, D270; the WCK/2_0 DFE SA takes P270/P270b as feedback and outputs P0/P0b]
    [Timing diagram: Precharge and Evaluation phases of WCK/2_270 and WCK/2_0; P270, P0; D270, D0, D90; TFB=TSA]

  • 5/25/2018 High-Bandwidth Memory Interface Design

    48/86

    Crosstalk

    Crosstalk is coupling of energy from one line to another
      Timing Effect
      Timing Jitter
      Signal Integrity

    [Figure: input signal and input signal at far end; mutual capacitance Cm and mutual inductance Lm between lines, with coupled currents ICm and ILm; near-end crosstalk and far-end crosstalk]

    Inear = ICm + ILm
    Ifar = ICm − ILm

  • 5/25/2018 High-Bandwidth Memory Interface Design

    49/86

    Staggered Memory Bus

    No discrepancy of propagation delay due to the crosstalk
    Difference of transition point is /2
    Distance between channels with the same transition is increased
    Jitter due to coupling from the adjacent channel is reduced

    [42] K.-I. Oh et al., JSSC, Aug. 2009, pp. 2222-2232

    [Figure: MCU and DRAM connected by channels of a Staggered Memory Bus]

  • 5/25/2018 High-Bandwidth Memory Interface Design

    50/86

    Glitch Canceller

    Compensation for glitch by adding or subtracting current
      Rise : ICOMP is added to the main driver
      Fall : ICOMP is subtracted from the main driver

    [42] K.-I. Oh et al., JSSC, Aug. 2009, pp. 2222-2232

    [Schematic: drivers TX1, TX2, TX3 with data DTX1, DTX2, DTX3; Transition Detector senses Rise/Fall on the Aggressor and adjusts the Victim driver current (IBIAS+ICOMP)]

  • 5/25/2018 High-Bandwidth Memory Interface Design

    51/86

    Crosstalk equalization at transmitter

    Cancel the crosstalk by the impedance calibration

    Crosstalk Equalizer (TX)

    [37] S.-J. Bae et al., ISSCC, Feb. 2011, pp. 498-500

    DO[0]

    DO[1:3]

    DQ[0]

    EN[0:5] DO[0]

    t

    DO[1]

    DQ[0]

    Crosstalk Equalizing Driver

    EN[1]

    EN[0] EN[1]

    EN[0]

    Transceiver DesignChulwoo Kim 51 of 86

  • 5/25/2018 High-Bandwidth Memory Interface Design

    52/86

    Skew

    Differences of flight time between signals
    Skew can cause timing errors
    Key design criterion in high-speed systems

    [Figure: MCU/GPU driving CLK, Command, Address, DQS and DQ to the DRAM (Banks, Peripheral Circuit, DLL, CMD Controller, Serial/Parallel, Generator), each signal with its own flight time TD]

  • 5/25/2018 High-Bandwidth Memory Interface Design

    53/86

    Pre/De-skew with Preamble Signal

    Skew cancellation circuit is put in each DRAM
    With estimated skew information
      De-skew the data during write mode
      Pre-skew the data during read mode

    [43] S. H. Wang et al., JSSC, Apr. 2001, pp. 648-657

    [Block diagram: Ext. Clk -> PLL -> Mux -> Sampling Clk; Skewed Data (Data[n]) -> Data Delay Lines -> De-skewed Data; Skew Estimator and Register Files hold the skew settings]

  • 5/25/2018 High-Bandwidth Memory Interface Design

    54/86

    Fly-by Topology for DDR3

    [4] JEDEC, JESD79-3E, pp. 56-59

    T-branch Topology
      CLK/CMD/Address are applied to each DRAM in parallel
      Small skew between CLK and DQS

    Fly-by Topology
      Better signal integrity by reducing the number of stubs and the stub length
      Easy to apply a single termination at the end of the signal line
      DQ and DQS are applied to each DRAM at the same time
      Large skew between CLK and DQS
      Need to calibrate skew

    [Figure: T-branch routing of CLK, CMD, Address to DRAM #1 to #8; fly-by routing of CLK, CMD, Address through DRAM #1 to #8 with VTT termination, and DQ & DQS routed to each DRAM; Skew [s] per DRAM for both topologies]

  • 5/25/2018 High-Bandwidth Memory Interface Design

    55/86

    Write Leveling for DDR3

    Write Leveling
      Timing mismatch compensation between CLK and DQS
      Write leveling is applied to each DRAM individually

    [4] JEDEC, JESD79-3F, pp. 56-59

    [Timing diagram: CK/CK# and diff_DQS at the Source (T0 to T7) and at the Destination (T0 to Tn); DQ returns 0 or 1; push DQS to capture the 0-to-1 transition]
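The write leveling search ("push DQS to capture the 0-to-1 transition") can be modelled as a simple delay sweep. The sketch below is a behavioural illustration under stated assumptions: an ideal 50% duty clock, integer picosecond times, a DRAM that samples CK on the rising DQS edge and returns that level on DQ, and hypothetical flight times and step size. It is not the JEDEC procedure in detail.

```python
def ck_level(t, ck_flight, t_ck):
    """Level of CK seen at the DRAM at time t; CK rises at ck_flight + n*t_ck."""
    return 1 if ((t - ck_flight) % t_ck) < t_ck / 2 else 0


def write_leveling(ck_flight, dqs_flight, t_ck, step, max_steps):
    """Increase the DQS delay until the sampled CK level goes from 0 to 1.

    Returns the DQS delay (same unit as the inputs) that aligns the DQS rising
    edge with a CK rising edge at this DRAM, or None if no transition is found.
    """
    prev = ck_level(dqs_flight, ck_flight, t_ck)
    for n in range(1, max_steps + 1):
        delay = n * step
        cur = ck_level(dqs_flight + delay, ck_flight, t_ck)
        if prev == 0 and cur == 1:
            return delay
        prev = cur
    return None


# Fly-by routing: CK arrives later at the far DRAM, DQS flight time is similar for both
print(write_leveling(ck_flight=300, dqs_flight=200, t_ck=1250, step=10, max_steps=200))
print(write_leveling(ck_flight=900, dqs_flight=200, t_ck=1250, step=10, max_steps=200))
```

Each DRAM on the fly-by bus ends up with its own DQS delay, which is why the procedure is run per device.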

  • 5/25/2018 High-Bandwidth Memory Interface Design

    56/86

    Training for GDDR5

    Adaptive Interface Training Ensure the Widest Timing Margins for All Signals

    Controlled by MCU

    [44] W. Hubert et al., ATS, 2008, pp. 24-27

    CK

    CMD

    ADDR

    WCK

    DQ

    GDDR5 Timing after Training

    Transceiver DesignChulwoo Kim 56 of 86

  • 5/25/2018 High-Bandwidth Memory Interface Design

    57/86

    Training Sequence for GDDR5

    [45] JEDEC, JESD212, pp. 23-39

    Power Up
      Detect the configuration and mirror function
      ODT setting
    Address Training (Optional)
      Optimize address input data eye
    WCK2CK Alignment Training
      Clock alignment
    READ Training
      Search for best read data eye
      Detect burst boundaries of read stream
    WRITE Training
      Search for best write data eye
      Detect burst boundaries of write stream
    Exit
      Ready for read/write

  • 5/25/2018 High-Bandwidth Memory Interface Design

    58/86

    Training Example : Write Training

    [44] W. Hubert et al., ATS, 2008, pp. 24-27

    [Figure: write data eyes between Memory Controller and GDDR5 Device, before and after training; per-signal timing offsets t0, t1, t2 are adjusted in the controller so that the data eyes line up at the device]

  • 5/25/2018 High-Bandwidth Memory Interface Design

    59/86

    Input Buffer

    Convert attenuated external signal to rail-to-rail signal
    Trade-off between high speed operation and power consumption

    [Figure: MCU/GPU driving CLK, Command, Address (m*), DQS and DQ to the DRAM input buffers; DRAM with Banks, Peripheral Circuit, DLL, CMD Controller, Serial/Parallel, GEN]

    * m : The number of address channels, which depends on the kind of memory and its density

  • 5/25/2018 High-Bandwidth Memory Interface Design

    60/86

    Input Buffer Comparison

    CMOS Type

    Simple circuit

    Low-speed input (CKE)

    Susceptible to noise

    Unstable threshold

    Differential Type

    Complex circuit

    High-speed input

    Robust to noise

    Stable threshold

    Commonly used

    [Schematics: CMOS type buffer (In, En, OUT); differential type buffer (In, Vref, En, OUT)]

  • 5/25/2018 High-Bandwidth Memory Interface Design

    61/86

    DDR4 Input Buffer

    [46] K. Sohn et al., ISSCC, 2012, pp. 38-40

    Gain Enhanced Buffer
      Signal transition detector is added
      The bias level (I) is controlled
      Sensitivity can be enhanced at higher frequencies

    Wide Common-Mode Range DQ Buffer
      Delivers stable inputs to the second stage Amp.
      Feedback network reduces the output common-mode variation

    [Schematics: In and Vref -> Buffer with Transition Detector controlling bias I; In and Vref -> first stage with CMFB* -> Amp.]

    * CMFB : Common-mode feedback

  • 5/25/2018 High-Bandwidth Memory Interface Design

    62/86

    Pseudo Open Drain (POD)

    Impedance Calibration
      Manual vs. Automatic
      External resistor: 240 Ω

    [Figure: POD I/O buffer and channel. Labels: Din, Pull-UP, Pull-DOWN, I/O Buffer, Channel, 240 Ω]

    Impedance Calibration

    Thermometer Code Control

    [Figure: impedance calibration loop with thermometer-coded legs. Labels: PU, PD, REG, Vref, En, ZQ PAD with external resistor (outside the DRAM), n-bit codes PUcon and PDcon, equal-sized legs (WP with R, WN with R), Din + PUcon, Din + PDcon, Dout]

    [47] C. Park et al., JSSC, Apr. 2006, pp. 831-838

    Multi Slew-rate Output Driver

    Binary-weighted Code Control

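    DF = Digital LPF + UP/DOWN Counter

    [Figure: impedance calibration loop with binary-weighted legs. Pull-up legs: widths WP/4, WP/2, WP, ..., 32WP with resistors 128R, 64R, 32R, ..., R. Pull-down legs: widths WN/4, WN/2, WN, ..., 32WN with resistors 128R, 64R, 32R, ..., R. Other labels: PU, PD, DF, Vref, En, ZQ PAD with external resistor (outside the DRAM), n-bit codes PUcon and PDcon, Din + PUcon, Din + PDcon, Dout, and impedances of 60, 120, and 240 Ω]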

    [48] D. U. Lee et al., ISSCC, 2008, pp. 280-613
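
    The calibration loop on this slide (a comparator against Vref driving an up/down counter that sets a binary-weighted code) can be sketched behaviorally. Everything below is an assumption made for illustration: the pull-up is modeled as an ideal resistance `r_unit / code`, the supply and `VREF = VDD / 2` are chosen arbitrarily, the digital low-pass filter in DF is omitted, and the function names are hypothetical. It is not the circuit of [48].

```python
R_EXT = 240.0    # external ZQ resistor in ohms, as labeled on the slides
VDD = 1.2        # assumed supply voltage
VREF = VDD / 2   # assumed reference: the pad sits at VDD/2 when R_pu == R_EXT


def pull_up_resistance(code: int, r_unit: float) -> float:
    """Binary-weighted legs in parallel: conductance grows linearly with code."""
    return r_unit / code


def calibrate_pull_up(r_unit: float, bits: int = 6, cycles: int = 100) -> int:
    """Up/down counter: one comparison and one code step per cycle."""
    max_code = (1 << bits) - 1
    code = 1
    for _ in range(cycles):
        r_pu = pull_up_resistance(code, r_unit)
        v_pad = VDD * R_EXT / (r_pu + R_EXT)   # divider: pull-up vs. R_EXT
        if v_pad < VREF:
            code = min(code + 1, max_code)     # pull-up too weak: add legs
        elif v_pad > VREF:
            code = max(code - 1, 1)            # pull-up too strong: remove legs
    return code


code = calibrate_pull_up(r_unit=5000.0)
print(code, round(pull_up_resistance(code, 5000.0), 1))
```

    Without a filter the counter dithers between the two codes on either side of the target, which is one reason a digital low-pass filter is placed in front of the counter.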

    Global ZQ Calibration

    Global Impedance Mismatch Error < 1%

    PVT variation sensor

    [Figure: global ZQ calibration. An ODT calibration block at the ZQ pin (Zcal, i0cal) distributes a global reference signal to DQ0 and DQn (n = 1~31); each DQ has LS, PA, CP, LO, and Ref. blocks]

    CP: Comparator
    PA: Pre-amplifier
    LS: Local PVT sensor
    LO: Local controller

    [49] J. Koo et al., CICC, 2009, pp. 717-720

    Data Bus Inversion (DBI)

    Power reduction technique independent of data pattern

    Dominant power (I/O buffer): $P = \alpha \times C_{PCB} \times V_{DD}^{2}$, with $\alpha < 0.5$

    For high-BW memory, inversion time + CRC can be a bottleneck

    [50] S.-S. Yoon et al., ASSCC 2008, pp.249-252
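
    A byte-wise inversion rule makes the idea concrete. The sketch below assumes a DC-style DBI in which a byte is inverted when more than four of its eight bits are 0 (on the assumption that a driven 0 is the costly level in a pull-up-terminated interface); the threshold and the function names `dbi_encode` and `dbi_decode` are illustrative assumptions, not taken from [50].

```python
from typing import Tuple


def dbi_encode(byte: int) -> Tuple[int, bool]:
    """Return (transmitted byte, DBI flag). Invert when more than 4 bits are 0."""
    byte &= 0xFF
    zeros = 8 - bin(byte).count("1")
    if zeros > 4:
        return (~byte) & 0xFF, True
    return byte, False


def dbi_decode(byte: int, inverted: bool) -> int:
    """Undo the inversion at the receiver using the DBI flag."""
    byte &= 0xFF
    return (~byte) & 0xFF if inverted else byte


print(dbi_encode(0x00))  # (255, True)
print(dbi_encode(0x0F))  # (15, False)
```

    After encoding, no transmitted byte carries more than four zeros, at the cost of one extra flag line per byte and the inversion delay noted on the slide.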

    Cyclic Redundancy Check (CRC)

    Data error check for every unit interval (64 bits data only)
    Redundancy bit: 1 bit/byte
    Speed bottleneck for high-BW
      Time (READ DBI + READ CRC + CRC calculator) < 9 periods

    Error type           Detection rate
    Random single bit    100%
    Random double bit    100%
    Random odd count     100%
    Burst 8              100%

    [50] S.-S. Yoon et al., ASSCC 2008, pp.249-252

    CRC (contd)

    Polynomial: $x^{8} + x^{2} + x + 1$ with an initial value of 0
    Algorithm for GDDR5: ATM-0x83

    Logic for the algorithm takes a long time

    To increase CRC speed: XOR logic optimization

    CRC calculation time < T_CRC
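
    For reference, a bit-serial software model of a CRC-8 with this polynomial and a zero initial value is sketched below. It only illustrates the arithmetic: the function name `crc8` is hypothetical, the mapping of GDDR5 data lanes and burst positions onto the input bytes is not modeled, and hardware would use a flattened parallel XOR tree rather than this loop.

```python
POLY = 0x07  # x^8 + x^2 + x + 1, with the x^8 term implicit


def crc8(data: bytes, init: int = 0x00) -> int:
    """MSB-first, bit-serial CRC-8 over the given bytes."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ POLY) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


print(hex(crc8(b"123456789")))  # 0xf4
```

    Each output bit is an XOR of a fixed subset of input bits, so the eight loops per byte can be unrolled into combinational logic; the depth of that XOR tree is what the optimization on this slide targets.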

    Outline

    Introduction

    Clock Generation and Distribution

    Transceiver Design

    TSV Interface for DRAM

    Bandwidth requirement

    DRAM with TSV

    TSV DRAM type

    DRAM stacking type

    Data confliction issue & solution

    Failed TSV issue & solution

    Summary

    References

    Bandwidth Requirements

    Requirement
      Next GDDR will require a data rate of over 10 Gb/s/pin

    Restrictions
      Very difficult to exceed 10 Gb/s/pin
      Cost for performance improvements
      Power consumption

    [Figure: DDRx / GDDRx Data Rate/Pin Trend. x-axis: year, from 2000; y-axis: Data Rate/Pin [Gbps], 2 to 12; series: DDR, DDR2, DDR3, DDR4, GDDR3, GDDR4, GDDR5]

    Type     Gb/s/chip    Gb/s/pin
    GDDR1    32           1
    GDDR3    51.2         1.6
    GDDR4    102.4        3.2
    GDDR5    224          7
    GDDR?    448 (?)      14 (?)

    DRAM with TSV

    Advantages of DRAM with TSV

    Higher density per area

    Shorter interconnection : lower power, faster flight time

    Higher bandwidth with wide I/O

    Wide I/O easily achieves 448 Gb/s/chip at next GDDR

    (Example: 800 Mb/s/pin × 512 I/O = 409.6 Gb/s/chip; 448 Gb/s/chip corresponds to 875 Mb/s/pin)
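
    The per-chip figures quoted for GDDR and for wide I/O come from multiplying the per-pin rate by the number of I/O. The short check below assumes 32 I/O per GDDR chip, which is inferred from the table (32 Gb/s/chip at 1 Gb/s/pin) rather than stated on the slides; the function names are hypothetical.

```python
IO_PER_GDDR_CHIP = 32  # assumption inferred from the Gb/s/chip vs. Gb/s/pin table


def chip_bandwidth_gbps(pin_rate_gbps: float, io_count: int) -> float:
    """Aggregate bandwidth of one chip in Gb/s."""
    return pin_rate_gbps * io_count


def required_pin_rate_gbps(target_gbps: float, io_count: int) -> float:
    """Per-pin rate needed to reach a target aggregate bandwidth."""
    return target_gbps / io_count


for name, rate in [("GDDR1", 1.0), ("GDDR3", 1.6), ("GDDR4", 3.2),
                   ("GDDR5", 7.0), ("GDDR?", 14.0)]:
    print(name, chip_bandwidth_gbps(rate, IO_PER_GDDR_CHIP))

print(chip_bandwidth_gbps(0.8, 512))     # wide I/O at 800 Mb/s/pin
print(required_pin_rate_gbps(448, 512))  # pin rate needed for 448 Gb/s/chip
```

    With 512 I/O, 800 Mb/s/pin gives 409.6 Gb/s per chip, and the 448 Gb/s target needs 0.875 Gb/s/pin, more than an order of magnitude below the 14 Gb/s/pin that a 32-I/O part would need.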

    [Figure: two TSV configurations. One shows an MCU/GPU and a wide I/O memory connected by TSVs; the other shows an MCU/GPU and stacked memory dies on an interposer]

    TSV DRAM Type

    Type            Main Memory              Mobile                                Graphics
    Architecture    [figures; labels: Package, GPU, Controller, Interposer]
    No. of TSV      500~1000 EA              1000~1500 EA                          2000~3000 EA
    Feature         Low power, high speed    Low power, multi channel, wide I/O    Max bandwidth, multi channel

    Stacking Type

    Type            Homogeneous             Heterogeneous
    Architecture    [figure]                [figure: Slave, Slave, Slave, Master]
    Feature         Same chips, low cost    Slave: only cells; Master: with peripheral

    Data Confliction Issue

    PVT variations cause data skew
    Data confliction increases the short-circuit current

    [Figure: DQ timing of the slowest chip and the fastest chip under PVT variations, with the data confliction interval marked; output drivers of CHIP 0 (MP0, MN0, EN0, /EN0) and CHIP 3 (MP3, MN3, EN3, /EN3) connected through a TSV to the DQ pin, one driving HIGH and the other LOW]

    [51] H.-W. Lee et al., ISSCC, 2012, pp. 48-50

    Separate Data Bus per Group

    Separate data bus per bank group
    Less dependent on the PVT variation

    [Figure: four stacked ranks (Rank 0 to Rank 3); each rank has bank groups A and B, each group with four banks and its own TSV array]

    [52] U. Kang et al., ISSCC, 2009, pp. 130-131

    DLL-Based Self-Aligner

    Data alignment to the external clock or to the clock of the slowest chip

    [Figure: DLL-based self-aligner across CHIP 0 to CHIP 3. Blocks: Skew Detector, Skew Compensator, Fine Aligner, Replica, TSV Model, Pipe Latches, PD1, PD2. Signals: CK, READ, READb, TRCLK, RFBCLK, TFBCLK, C_CLK, CLKOUT, UP/DN, MODE, SAMMODE. Output: DQS pin or dummy pin; data shown before and after alignment]

    [51] H.-W. Lee et al., ISSCC, 2012, pp. 48-50

    Failed TSV Issue

    [Figure: failed TSVs. (a) TSV plating defect, (b) pinch-off]

    Decreasing the assembly yield

    Increasing the total cost

    [53] D. Malta et al., ECTC, 2010, pp. 1769-1775

    TSV Check

    TSV connectivity check using the internal circuit
      Test signal generating circuits
      Scan-chain-based testing circuits

    [Figure: TSV_0 to TSV_4 between the sender end and the receiver end, with inputs In_0 to In_4 and outputs Out_0 to Out_4]

    [54] A.-C. Hsieh et al., TVLSI, Apr. 2012, pp. 711-722
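
    A connectivity check of this kind amounts to driving known patterns at the sender end, capturing them at the receiver end, and comparing. The sketch below is a toy model under stated assumptions: an open TSV is assumed to read as 0 regardless of the driven value, only all-0 and all-1 patterns are applied, and `check_tsvs` and `make_stack` are hypothetical names. The actual test circuits of [54] are not reproduced.

```python
from typing import Callable, List, Set


def check_tsvs(transmit: Callable[[List[int]], List[int]], n: int) -> Set[int]:
    """Drive all-0 and all-1 patterns and return the indices of TSVs
    whose captured bit differs from the driven bit."""
    failed: Set[int] = set()
    for pattern in ([0] * n, [1] * n):
        received = transmit(pattern)
        failed |= {i for i, (sent, got) in enumerate(zip(pattern, received))
                   if sent != got}
    return failed


def make_stack(open_tsvs: Set[int]) -> Callable[[List[int]], List[int]]:
    """Toy stack model: an open TSV always reads as 0 at the receiver."""
    def transmit(bits: List[int]) -> List[int]:
        return [0 if i in open_tsvs else b for i, b in enumerate(bits)]
    return transmit


print(check_tsvs(make_stack({2}), n=5))  # {2}
```

    Detecting shorts between neighboring TSVs would need additional patterns, such as a walking one, which this sketch leaves out.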

    TSV Repair

    Redundant TSVs for failed TSVs
      Conventional: redundant TSVs are dedicated and fixed
      Proposed: a failed TSV is repaired with a neighboring TSV

    [Figure: signals A to D routed between Chip1 and Chip2. Conventional: TSVs a to d plus redundant TSVs r1 and r2. Proposed: TSVs a to f]

    [52] U. Kang et al., ISSCC, 2009, pp. 130-131
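
    Repairing with a neighboring TSV can be pictured as shifting every signal past the failed positions. The sketch below assumes four signals and six TSVs, matching the labels A to D and a to f in the figure; the function name `assign_signals` and the simple in-order assignment are illustrative assumptions, not the multiplexer structure of [52].

```python
from typing import List, Set


def assign_signals(num_signals: int, num_tsvs: int, failed: Set[int]) -> List[int]:
    """Map signal i to the i-th working TSV, so each signal shifts to a
    neighbor by at most the number of failed TSVs before it."""
    good = [t for t in range(num_tsvs) if t not in failed]
    if len(good) < num_signals:
        raise ValueError("not enough working TSVs to repair")
    return good[:num_signals]


print(assign_signals(4, 6, set()))   # [0, 1, 2, 3]
print(assign_signals(4, 6, {1}))     # [0, 2, 3, 4]
print(assign_signals(4, 6, {0, 5}))  # [1, 2, 3, 4]
```

    Because each signal only ever moves to a nearby TSV, no signal has to be routed to a distant dedicated spare.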

    Outline

    Introduction

    Clock Generation and Distribution

    Transceiver Design

    TSV Interface for DRAM

    Summary

    References

    Summary

    Although all types of DRAM are reaching their limits in supply voltage, the demand for high-bandwidth memory keeps increasing

    For synchronizing the external clock and the DRAM output, low power, small area, and low skew are important design parameters

    To achieve high-BW memory, many design techniques have been and will be adopted from other high-speed wireline transceivers

    A TSV interface for DRAM might be a good solution for achieving high bandwidth and low power

    Suggested Papers to See

    17.1 A 6.4Gb/s near-ground single-ended transceiver for dual-rank DIMM memory interface systems

    17.2 A 27% reduction in transceiver power for single-ended point-to-point DRAM interface with the termination resistance of 4×Z0 at both TX and RX

    17.3 A 5.7mW/Gb/s 24-to-240Ω 1.6Gb/s thin-oxide DDR transmitter with 1.9-to-7.6V/ns clock-feathering slew-rate control in 22nm CMOS

    17.4 An adaptive-bandwidth PLL for avoiding noise interference and DFE-less fast precharge sampling for over 10Gb/s/pin graphics DRAM interface

    References

    [1] K. Koo et al., "A 1.2V 38nm 2.4Gb/s/pin 2Gb DDR4 SDRAM with bank group and ×4 half-page architecture," in IEEE ISSCC Dig. Tech. Papers, pp. 40-41, 2012.

    [2] JEDEC, JESD79F.

    [3] JEDEC, JESD79-2F.

    [4] JEDEC, JESD79-3F.

    [5] JEDEC, JESD79-4.

    [6] T.-Y. Oh et al., "A 7Gb/s/pin GDDR5 SDRAM with 2.5ns bank-to-bank active time and no bank-group restriction," in IEEE ISSCC Dig. Tech. Papers, pp. 434-435, 2010.

    [7] H.-W. Lee et al., "Survey and analysis of delay-locked loops used in DRAM interfaces," submitted to IEEE Trans. VLSI Syst.

    [8] T. Saeki et al., "A 2.5 ns clock access 250 MHz 256 Mb SDRAM with a synchronous mirror delay," in IEEE ISSCC Dig. Tech. Papers, pp. 374-375, 1996.

    [9] A. Hatakeyama et al., "A 256 Mb SDRAM using a register-controlled digital DLL," in IEEE ISSCC Dig. Tech. Papers, pp. 72-73, 1997.

    [10] J.-T. Kwak et al., "A low cost high performance register-controlled digital DLL for 1Gbps x32 DDR SDRAM," in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 283-284, 2003.

    [11] H.-W. Lee et al., "A 1.6V 1.4Gb/s/pin consumer DRAM with self-dynamic voltage-scaling technique in 44nm CMOS technology," in IEEE ISSCC Dig. Tech. Papers, pp. 502-504, 2011.

    [12] D. Shin et al., "Wide-range fast-lock duty-cycle corrector with offset-tolerant duty-cycle detection scheme for 54nm 7Gb/s GDDR5 DRAM interface," in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 138-139, 2009.

    [13] W.-J. Yun et al., "A 3.57 Gb/s/pin low jitter all-digital DLL with dual DCC circuit for GDDR3 DRAM in 54-nm CMOS technology," IEEE Trans. VLSI Sys., vol. 19, no. 9, pp. 1718-1722, Nov. 2011.

    [14] H.W. Lee et al., "A 7.7mW/1.0ns/1.35V delay locked loop with racing mode and OA-DCC for DRAM interface," in Proc. of Int. Symp. Circuits and Syst., pp. 3861-3864, 2010.

    [15] B.-G. Kim et al., "A DLL with jitter reduction techniques and quadrature phase generation for DRAM interfaces," IEEE J. Solid-State Circuits, vol. 44, no. 5, pp. 1522-1530, May 2009.

    [16] W.J. Yun et al., "A 0.1-to-1.5GHz 4.2mW all-digital DLL with dual duty-cycle correction circuit and update gear circuit for DRAM in 66nm CMOS Technology," in IEEE ISSCC Dig. Tech. Papers, pp. 282-283, 2008.

    [17] S. Kim et al., "A low jitter, fast recoverable, fully analog DLL using tracking ADC for high speed and low stand-by power DDR I/O interface," in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 285-286, 2003.

    [18] T. Matano et al., "A 1-Gb/s/pin 512-Mb DDRII SDRAM using a digital DLL and a slew-rate-controlled output buffer," IEEE J. Solid-State Circuits, vol. 38, no. 5, pp. 762-768, May 2003.

    [19] K.-H. Kim et al., "Built-in duty cycle corrector using coded phase blending scheme for DDR/DDR2 synchronous DRAM application," in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 287-288, 2003.

    [20] J.-T. Kwak et al., "A low cost high performance register-controlled digital DLL for 1 Gbps x32 DDR SDRAM," in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 283-284, 2003.

    [21] O. Okuda et al., "A 66-400 MHz, adaptive-lock-mode DLL circuit with duty-cycle error correction [for SDRAMs]," in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 37-38, 2001.

    [22] F. Lin et al., "A wide-range mixed-mode DLL for a combination 512 Mb 2.0 Gb/s/pin GDDR3 and 2.5 Gb/s/pin GDDR4 SDRAM," IEEE J. Solid-State Circuits, vol. 43, no. 3, pp. 631-641, Mar. 2008.

    [23] K.-W. Kim et al., "A 1.5-V 3.2 Gb/s/pin Graphic DDR4 SDRAM With dual-clock system, four-phase input strobing, and low-jitter fully analog DLL," IEEE J. Solid-State Circuits, vol. 42, no. 11, pp. 2369-2377, Nov. 2007.

    [24] D.U. Lee et al., "A 2.5Gb/s/pin 256Mb GDDR3 SDRAM with series pipelined CAS latency control and dual-loop digital DLL," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pp. 547-548, 2006.

    [25] S.J. Bae et al., "A 3Gb/s 8b single-ended transceiver for 4-drop DRAM interface with digital calibration of equalization skew and offset coefficients," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pp. 520-521, 2005.

    [26] Y.-J. Jeon et al., "A 66-333-MHz 12-mW register-controlled DLL with a single delay line and adaptive-duty-cycle clock dividers for production DDR SDRAMs," IEEE J. Solid-State Circuits, vol. 39, no. 11, pp. 2087-2092, Nov. 2004.

    [27] T. Hamamoto et al., "A 667-Mb/s operating digital DLL architecture for 512-Mb DDR," IEEE J. Solid-State Circuits, vol. 39, no. 1, pp. 194-206, Jan. 2004.

    [28] S. Kim et al., "A low-jitter wide-range skew-calibrated dual-loop DLL using antifuse circuitry for high-speed DRAM," IEEE J. Solid-State Circuits, vol. 37, no. 6, pp. 726-734, Jun. 2002.

    [29] J.B. Lee et al., "Digitally-controlled DLL and I/O circuits for 500 Mb/s/pin x16 DDR SDRAM," in IEEE ISSCC Dig. Tech. Papers, pp. 68-69, 2001.

    [30] S. Kuge et al., "A 0.18um 256-Mb DDR-SDRAM with low-cost post-mold tuning method for DLL replica," IEEE J. Solid-State Circuits, vol. 35, no. 11, pp. 726-734, Nov. 2000.

    [31] H.W. Lee et al., "A 1.6V 1.4Gb/s/pin consumer DRAM with self-dynamic voltage-scaling technique in 44nm CMOS technology," IEEE J. Solid-State Circuits, vol. 47, no. 1, pp. 131-140, Jan. 2012.

    [32] Y. K. Kim et al., "A 1.5V, 1.6Gb/s/pin, 1Gb DDR3 SDRAM with an address queuing scheme and bang-bang jitter reduced DLL scheme," in IEEE Symp. VLSI Dig. Tech. Papers, pp. 182-183, 2007.

    [33] K.H. Kim et al., "A 1.4 Gb/s DLL using 2nd order charge-pump scheme with low phase/duty error for high-speed DRAM application," in IEEE ISSCC Dig. Tech. Papers, pp. 213-214, 2004.

    [34] J.H. Lee et al., "A 330 MHz low-jitter and fast-locking direct skew compensation DLL," in IEEE ISSCC Dig. Tech. Papers, pp. 352-353, 2000.

    [35] J. Kim et al., "A low-jitter mixed-mode DLL for high-speed DRAM applications," IEEE J. Solid-State Circuits, vol. 35, no. 10, pp. 1430-1436, Oct. 2000.

    [36] H.W. Lee et al., "A 1.6V 3.3Gb/s GDDR3 DRAM with dual-mode phase- and delay-locked loop using power-noise management with unregulated power supply in 54nm CMOS," in IEEE ISSCC Dig. Tech. Papers, pp. 140-141, 2009.

    [37] S.-J. Bae et al., "A 40nm 2Gb 7Gb/s/pin GDDR5 SDRAM with a Programmable DQ Ordering Crosstalk Equalizer and Adjustable clock-Tracing BW," in IEEE ISSCC Dig. Tech. Papers, pp. 498-500, 2011.

    [38] K.-h. Kim et al., "A 20-Gb/s 256-Mb DRAM with an inductorless quadrature PLL and a cascaded pre-emphasis transmitter," IEEE J. Solid-State Circuits, vol. 41, no. 1, pp. 127-134, Jan. 2006.

    [39] H. Partovi et al., "Single-ended transceiver design techniques for 5.33Gb/s graphics applications," in IEEE ISSCC Dig. Tech. Papers, pp. 136-137, 2009.

    [40] Y. Hidaka, "Sign-based-Zero-Forcing Adaptive Equalizer Control," in CMOS Emerging Technologies Workshop, May 2010.

    [41] S.-J. Bae et al., "A 60nm 6Gb/s/pin GDDR5 graphics DRAM with multifaceted clocking and ISI/SSN-reduction techniques," in IEEE ISSCC Dig. Tech. Papers, pp. 278-279, 2008.

    [42] K.-I. Oh et al., "A 5-Gb/s/pin transceiver for DDR memory interface with a crosstalk suppression scheme," IEEE J. Solid-State Circuits, vol. 44, no. 8, pp. 2222-2232, Aug. 2009.

    [43] S. H. Wang et al., "A 500-Mb/s quadruple data rate SDRAM interface using a skew cancellation technique," IEEE J. Solid-State Circuits, vol. 36, no. 4, pp. 648-657, Apr. 2001.

    [44] W. Hubert et al., "GDDR5 training-challenges and solution for ATE-based test," in Asian Test Symposium, pp. 24-27, Nov. 2008.

    [45] JEDEC, JESD212.

    [46] K. Sohn et al., "A 1.2V 30nm 3.2Gb/s/pin 4Gb DDR4 SDRAM with dual-error detection and PVT-tolerant data-fetch scheme," in IEEE ISSCC Dig. Tech. Papers, pp. 38-40, 2012.

    [47] C. Park et al., "A 512-Mb DDR3 SDRAM prototype with CIO minimization and self-calibration techniques," IEEE J. Solid-State Circuits, vol. 41, no. 4, pp. 831-838, Apr. 2006.

    [48] D. Lee et al., "Multi-slew-rate output driver and optimized impedance-calibration circuit for 66nm 3.0Gb/s/pin DRAM interface," in IEEE ISSCC Dig. Tech. Papers, pp. 280-613, 2008.

    [49] J. Koo et al., "Small-area high-accuracy ODT/OCD by calibration of global on-chip for 512M GDDR5 application," in Proc. IEEE CICC, pp. 717-720, Sep. 2009.

    [50] S.-S. Yoon et al., "A fast GDDR5 read CRC calculation circuit with read DBI operation," in IEEE Asian Solid-State Circuits Conference, pp. 249-252, 2008.

    [51] H.-W. Lee et al., "A 283.2μW 800Mb/s/pin DLL-based data self-aligner for through silicon via (TSV) interface," in IEEE ISSCC Dig. Tech. Papers, pp. 48-50, 2012.

    [52] U. Kang et al., "8Gb 3D DDR3 DRAM using through-silicon-via technology," in IEEE ISSCC Dig. Tech. Papers, pp. 130-131, 2009.

    [53] D. Malta et al., "Integrated process for defect-free copper plating and chemical-mechanical polishing of through-silicon vias for 3D interconnects," in ECTC, pp. 1769-1775, 2010.

    [54] A.-C. Hsieh et al., "TSV redundancy: architecture and design issues in 3-D IC," IEEE Trans. VLSI Systems, pp. 711-722, Apr. 2012.