High-Bandwidth Memory Interface Design Chulwoo Kim Dept. of Electrical Engineering Korea University, Seoul, Korea February 17, 2013 Chulwoo Kim 1 of 86


  • High-Bandwidth Memory Interface Design

    Chulwoo Kim

    Dept. of Electrical Engineering, Korea University, Seoul, Korea

    February 17, 2013

    Chulwoo Kim 1 of 86

  • Outline

    Introduction

    Clock Generation and Distribution

    Transceiver Design

    TSV Interface for DRAM

    Summary

    References

    Chulwoo Kim 2 of 86

  • Outline

    Introduction

    DRAM 101

    Simplified DRAM Architecture and Operation

    Differences of DRAM (DDRx, GDDRx, LPDDRx)

    Trend

    Memory Interface: Differences and Issues

    Clock Generation and Distribution

    Transceiver Design

    TSV Interface for DRAM

    Summary

    References

    Chulwoo Kim 3 of 86

  • DRAM 101

    Introduction

    SDRAM: Synchronous Dynamic Random Access Memory

    SDR (Single Data Rate): one data word (D) on DQ per CLK cycle

    DDR (Double Data Rate): two data words (D D) on DQ per CLK cycle

    Main Memory: DDRx (PC, Notebook, Server)

    Graphics Memory: GDDRx (Graphic Card, Console)

    Mobile Memory: LPDDRx (Phone, Tablet PC)

    [Figure: the MCU sends CLK & Command to the SDRAM and exchanges Data with it; the timing diagram shows CLK, Command (C), CAS* Latency, and Burst Length]

    *CAS : Column Address Strobe

    Chulwoo Kim 4 of 86

  • DRAM DDR4 Die Photo

    [1] K. B. Koo et al., ISSCC 2012, pp. 40-41

    [Figure: die photo showing 16 banks, labeled Bank 0 to Bank 15]

    Supply Voltage: VDD=1.2V, VPP=2.5V

    Process: 38nm CMOS / 3-metal

    Banks: 4-Bank Group, 16 Banks

    Data Rate: 2400 Mbps

    Number of I/Os: X4 / X8

    Introduction Chulwoo Kim 5 of 86

  • Simplified DRAM Architecture

    [Figure: four Banks around a central Peripheral Circuit region]

    Bank: Cell Array (WL, BLT, BLB, BLSA*), Row Decoder, Word Line Driver, Row Repair Fuse, Column Decoder, Column Repair Fuse, Write Drv. / Read Amp.

    Peripheral Circuit: CLK/ADD/CMD Buffer, CMD Controller, DLL, Generator, DQ RX (Serial to parallel), DQ TX (Parallel to serial); clocks ICLK and DCLK

    * BLSA : Bit line sense amplifier

    Introduction Chulwoo Kim 6 of 86

  • Concept of DRAM operation

    [Figure: Banks with BLSA connect to the Peripheral Circuit over GIO (Np×Ndq bits); DQ RX and DQ TX each carry Ndq bits]

    WRITE : Serial to parallel (DQ → GIO)

    READ : Parallel to serial (GIO → DQ)

    *BLSA : Bit line sense amplifier

    *Np: Number of pre-fetch

    *Ndq: Number of DQ

    *GIO : Global I/O

    Introduction Chulwoo Kim 7 of 86

  • Pre-fetch Timing (DDR1, BL*=2)

    [2] JEDEC, JESD79F, pp. 24-29

    [Figure: timing diagram with CLK, two RD commands (tCCD*=1), GIO, DQS, and DQ; after CL*, each RD returns a burst of 2 data bits (0 1) on DQ]

    Number of GIO channel = Np×Ndq = 2×8 = 16 (DDR1 x8)

    * tCCD : CAS to CAS delay

    * CL : CAS latency

    * BL : Burst length

    Introduction

    Chulwoo Kim 8 of 86

  • Pre-fetch Diagram(DDR1)

    Num. of GIO channel = 2×Ndq

    Pre-fetch operation 2-bit pre-fetch

    [2×Ndq] data access

    (If the output data rate is 400Mbps, the internal data rate is 200Mbps)

    [Figure: pre-fetch diagram with eight banks]

    Introduction Chulwoo Kim 9 of 86

  • Pre-fetch Timing (DDR2, BL=4)

    [3] JEDEC, JESD79-2F, pp. 35

    [Figure: timing diagram with CLK, two RD commands (tCCD=2), GIO, DQS, and DQ; after RL*, each RD returns a burst of 4 data bits (0 1 2 3) on DQ]

    Number of GIO channel = Np×Ndq = 4×8 = 32 (DDR2 x8)

    * RL : READ latency

    Introduction

    Chulwoo Kim 10 of 86

  • Pre-fetch Diagram(DDR2)

    Num. of GIO channel = 4×Ndq

    Pre-fetch operation 4-bit pre-fetch

    [4×Ndq] data access

    (If the output data rate is 800Mbps, the internal data rate is 200Mbps, same as DDR1)

    [Figure: pre-fetch diagram with eight banks]

    Introduction Chulwoo Kim 11 of 86

  • Pre-fetch Timing (DDR3, BL=8)

    [4] JEDEC, JESD79-3F, pp. 62

    [Figure: timing diagram with CLK, two RD commands (tCCD=4), GIO, DQS, and DQ; after RL, each RD returns a burst of 8 data bits (0 to 7) on DQ]

    Number of GIO channel = Np×Ndq = 8×8 = 64 (DDR3 x8)

    Introduction

    Chulwoo Kim 12 of 86

  • Pre-fetch Diagram(DDR3)

    Num. of GIO channel = 8×Ndq

    Pre-fetch operation 8-bit pre-fetch

    [8×Ndq] data access

    (If the output data rate is 1.6Gbps, the internal data rate is 200Mbps, same as DDR1)

    [Figure: pre-fetch diagram with eight banks]

    Introduction Chulwoo Kim 13 of 86
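    The pre-fetch figures on the DDR1 to DDR3 slides follow from two relations: the number of GIO channels is Np×Ndq, and the internal data rate is the per-pin data rate divided by Np. A minimal Python sketch of that arithmetic follows; the function names are illustrative and do not come from the slides.

```python
def gio_channels(n_prefetch: int, n_dq: int) -> int:
    """Number of GIO channels = Np x Ndq."""
    return n_prefetch * n_dq


def internal_rate_mbps(pin_rate_mbps: float, n_prefetch: int) -> float:
    """Internal (core) data rate for a given per-pin output data rate."""
    return pin_rate_mbps / n_prefetch


# (generation, Np, per-pin data rate in Mbps) as quoted on the slides, x8 devices
for name, n_p, pin_rate in [("DDR1", 2, 400), ("DDR2", 4, 800), ("DDR3", 8, 1600)]:
    print(name, gio_channels(n_p, 8), internal_rate_mbps(pin_rate, n_p))
```

    Each generation doubles the GIO width so that the core can stay at the same 200 Mbps internal rate while the pin rate doubles.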

  • Bank Grouping Timing (DDR4, BL=8)

    [5] JEDEC, JESD79-4, pp. 77-78

    [6] T. Y. Oh et al., ISSCC 2010, pp. 434-435

    [Figure: timing diagram with CLK and the commands RD G0, RD G1, RD G1 (tCCD_S=4 between different bank groups, tCCD_L=5 within the same bank group), per-group GIO_BG0 to GIO_BG3, DQS, and DQ; after RL, each RD returns a burst of 8 data bits (0 to 7) on DQ]

    Number of GIO channel = Np×Ndq×Ngroup = 8×8×4 = 256 (DDR4 x8)

    Introduction

    Chulwoo Kim 14 of 86

  • Pre-fetch & Bank Grouping (DDR4)

    [1] K. B. Koo et al., ISSCC 2012, pp. 40-41

    Num. of GIO channel = 8×Ndq

    Pre-fetch operation: 8-bit pre-fetch

    Bank grouping

    [Figure: banks arranged in Group0 to Group3 and connected through a GIO MUX]

    Introduction Chulwoo Kim 15 of 86
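    The benefit of bank grouping can be seen from the two tCCD values on the timing slide. A BL=8 burst occupies 4 clock cycles on a DDR pin, so READs spaced by tCCD_S=4 keep DQ fully busy, while READs spaced by tCCD_L=5 leave one idle cycle. The sketch below is a simplified utilization estimate under that assumption; the function name is hypothetical.

```python
def bus_utilization(burst_length: int, t_ccd_ck: int) -> float:
    """Fraction of clock cycles in which DQ carries data for back-to-back READs.

    A DDR burst of `burst_length` bits occupies burst_length / 2 clock cycles.
    """
    burst_ck = burst_length / 2
    return min(1.0, burst_ck / t_ccd_ck)


T_CCD_S = 4  # clocks, READs to different bank groups (from the slide)
T_CCD_L = 5  # clocks, READs to the same bank group (from the slide)

different_groups = bus_utilization(8, T_CCD_S)
same_group = bus_utilization(8, T_CCD_L)
print(different_groups, same_group)
```

    Interleaving accesses across bank groups is therefore what sustains the full pin data rate with an 8-bit pre-fetch.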

  • Differences of DDRx, GDDRx, LPDDRx

    Item | DDRx | GDDRx | LPDDRx

    Architecture | [Figure: Bank and PAD floorplan for each type]

    Application | PC/Server | Graphic card | Mobile/Consumer

    Socket | DIMM | On board | MCP*/PoP*/SiP*

    IO | ×4/×8 | ×16/×32 | ×16/×32

    Unique Function | - | Single uni-directional WDQS, RDQS; VDDQ termination; CRC, DBI; ABI | No DLL; DPD*; PASR*; TCSR*

    * MCP: Multi chip package

    * PoP : Package on package

    * SiP : System in package

    * DPD: Deep power down

    * PASR : Partial array self refresh

    * TCSR : Temperature compensated self refresh

    Introduction Chulwoo Kim 16 of 86

  • DDR Comparison

    Item | DDR1 | DDR2 | DDR3 | DDR4

    VDD [V] | 2.5 | 1.8 | 1.5 | 1.2

    Data Rate [bps/pin] | 200M~400M | 400M~800M | 800M~2.1G | 1.6G~3.2G

    Pre-Fetch | 2 bit | 4 bit | 8 bit | 8 bit

    STROBE | Single DQS | Differential DQS, DQSB (DDR2 to DDR4)

    Interface | SSTL_2 | SSTL_18 | SSTL_15 | POD_12

    New Feature | - | OCD calibration, ODT | Dynamic ODT, ZQ calibration, Write leveling | CA parity, DBI*, CRC*, Gear down, CAL*, PDA*, FGREF*, TCAR*, Bank grouping

    * DBI: Data bus inversion

    * CRC: Cyclic redundancy check

    * CAL: Command address latency

    * PDA: Per DRAM addressability

    * FGREF: Fine granularity refresh

    * TCAR: Temperature controlled array refresh

    Introduction Chulwoo Kim 17 of 86

  • GDDR Comparison

    Item | GDDR1 | gDDR2 | GDDR3 | GDDR4 | GDDR5

    VDD [V] | 2.5 | 1.8 | 1.5 | 1.5 | 1.5/1.35

    Data Rate [bps/pin] | 300M~900M | 800M~1G | 700M~2.6G | 2.0G~3.0G | 3.6G~7.0G

    Pre-Fetch | 2 bit | 4 bit | 4 bit | 8 bit | 8 bit

    Interface | SSTL_2 | SSTL_2 | POD-18 | POD-15 | POD-15

    STROBE (left to right): Single DQS; Differential Bi-direction DQS*, DQSB; Single Uni-direction WDQS, RDQS

    New Feature (left to right): OCD* calibration, ODT*; ZQ, DBI, Parity(opt); No DLL, PLL(option), WCK, WCKB, CRC, ABI*, RDQS(option), Bank grouping

    * DQS: DQ strobe signal, DQ is data I/O pin

    * OCD: Off chip driver

    * ODT: On die termination

    * ABI: Address bus inversion

    Introduction Chulwoo Kim 18 of 86

  • LPDDR Comparison

    Item | LPDDR1 | LPDDR2 | LPDDR3

    VDD [V] | 1.8 | 1.2 | 1.2

    Data Rate [bps/pin] | 200M~400M | 200M~1066M | 333M~1600M

    Pre-Fetch | 2 bit | 4 bit | 8 bit

    STROBE | DQS | DQS_T, DQS_C | DQS_T, DQS_C

    Interface | SSTL_18* | HSUL_12* | HSUL_12*

    DLL | X | X | X

    New Feature | - | CA pin | ODT (High tapped termination)

    * SSTL: Stub series terminated logic

    * HSUL: High speed un-terminated logic

    Introduction Chulwoo Kim 19 of 86

  • Trend

    [Figure: VDD [V] (1.2 to 2.5) versus Data Rate [Gbps] (0.2 to 7.0) for DDR1, DDR2, DDR3, DDR4, GDDR1, gDDR2, GDDR3, GDDR4, GDDR5, LPDDR1, LPDDR2, and LPDDR3]

    Although all types of DRAM are reaching their limits in supply voltage, the demand for high-bandwidth memory keeps increasing.

    Introduction Chulwoo Kim 20 of 86

  • Memory Interface

    System Feature

    Single-ended, high speed

    Many channels (vulnerable to coupling effects)

    DDR: multi-drop (multi rank, multi DIMM)

    GDDR: point to point

    Impedance discontinuities (stubs, connectors, vias, etc.)

    Issue

    Reflection

    Inter-symbol interference

    Simultaneous switching output noise

    Pin to pin skew

    Poor transistor performance

    [Figure: a CPU connected to many DRAMs (multi-drop) and a GPU connected to its DRAMs (point to point)]

    Introduction Chulwoo Kim 21 of 86

  • Outline

    Introduction

    Clock Generation and Distribution

    Delay-locked loop (DLL)

    Duty cycle corrector (DCC)

    Clock distribution

    Transceiver Design

    TSV

    Conclusions

    References

    Chulwoo Kim 22 of 86

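  • Basic DLL Architecture

    [Figure: DLL block diagram. The external Clock enters the DRAM with delay tD1 to become I_CLK, passes the Variable Delay Line (tDVDL) to become O_CLK, and clocks the DATA from the memory core out with delay tD2. O_CLK is fed back through the Replica Delay (tDREP) as FB_CLK; the PD compares I_CLK with FB_CLK and the Controller adjusts the Variable Delay Line.]

    tCK × N = tDVDL + tDREP

    tDREP ≈ tD1 + tD2

    tCK × N = tDVDL + tD1 + tD2 + γ

    γ = tDREP - (tD1 + tD2)

    Clock Generation and Distribution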

    Chulwoo Kim 23 of 86
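    The lock condition above can be turned into a small calculation: given tCK, tD1, tD2, and tDREP, pick the smallest N for which the variable delay line setting tDVDL is non-negative, then read off the replica mismatch γ. This is a minimal sketch; the function name, the integer-picosecond units, and the example values are assumptions for illustration only.

```python
import math


def dll_lock(t_ck: int, t_d1: int, t_d2: int, t_drep: int, t_dvdl_min: int = 0):
    """Solve tCK x N = tDVDL + tDREP for the smallest usable N.

    All times are integers in picoseconds. Returns (N, tDVDL, gamma),
    where gamma = tDREP - (tD1 + tD2) is the replica delay mismatch.
    """
    n = max(1, math.ceil((t_drep + t_dvdl_min) / t_ck))
    t_dvdl = n * t_ck - t_drep
    gamma = t_drep - (t_d1 + t_d2)
    return n, t_dvdl, gamma


# Hypothetical numbers: 1 ns clock, replica slightly longer than tD1 + tD2.
n, t_dvdl, gamma = dll_lock(t_ck=1000, t_d1=400, t_d2=900, t_drep=1350)
print(n, t_dvdl, gamma)
```

    With these numbers the loop locks over two clock periods, and the 50 ps of γ appears directly as output timing error because the loop only nulls the I_CLK to FB_CLK phase.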

  • Replica Delay Mismatch

    [Figure: Valid Data Window within tCK, bounded by tDQSCK* (or tAC), drawn for short and long tCK at VDDL, VDD, and VDDH]

    [Figure: γ variation [ps] versus Supply Voltage [V], with the cases γ ≈ 0 and γ > 0 marked]

    *tDQSCK (or tAC) – DQS output access time from CK/CKb

    Clock Generation and Distribution

  • Locking Range Considerations

    [7] H.-W. Lee et al., submitted to TVLSI

    [Figure: tDQSCK (or tAC) versus tCK showing a bird's beak; I_CLK and FB_CLK waveforms for short and long tCK with tDINIT+tDREP and tDREQUIRED marked]

    tDINIT = tDVDL(0) + tDREP

    N×tCK > tDVDL(0) + tDREP

    tCK = tDVDL + tDREP + t∆

    Clock Generation and Distribution

    Chulwoo Kim 25 of 86

  • Synchronous Mirror Delay (SMD)

    [8] T. Saeki et al., ISSCC 1996, pp. 374-375

    Basic Operation

    Measure and replicate the delay

    No feedback

    Match delay in two cycles

    [Figure: Clock → tD1 → I_CLK → Replica Delay (tD1+tD2) → Delay Measure Delay Line (tD3) → Replicate Delay Line (tD3) → tD2 → OUT]

    Clock Generation and Distribution

    Chulwoo Kim 26 of 86

  • Disadvantages of SMD

    Disadvantages

    Mismatch between replica delay and input buffer & clock distribution

    Coarse resolution

    Input jitter multiplication

    [Figure: SMD timing without and with jitter. Without jitter the measured and replicated delay is tCK-(tD1+tD2). With input pk-pk jitter (±Δ) the delay becomes tCK-(tD1+tD2)+2Δ, giving output pk-pk jitter (±2Δ).]

    Clock Generation and Distribution

    Chulwoo Kim 27 of 86
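    The jitter multiplication can be reproduced with an idealized timing model of the SMD chain (tD1, replica tD1+tD2, measure tD3, replicate tD3, tD2). The sketch below is an assumption-laden simplification: delay lines are treated as continuous, and the function name and numbers are hypothetical.

```python
def smd_output_edge(t_prev: float, t_curr: float, t_d1: float, t_d2: float) -> float:
    """Output edge time of an idealized SMD for two consecutive input clock edges.

    The measure line records tD3, the time from the previous edge (after the
    input delay and the replica delay) to the current edge at I_CLK. The
    replicate line then adds the same tD3 to the current edge.
    """
    t_d3 = (t_curr + t_d1) - (t_prev + t_d1 + (t_d1 + t_d2))
    return t_curr + t_d1 + t_d3 + t_d2


T_CK, T_D1, T_D2 = 1000.0, 100.0, 200.0  # ps, illustrative values
DELTA = 10.0                             # jitter on the second input edge

clean = smd_output_edge(0.0, T_CK, T_D1, T_D2)
jittered = smd_output_edge(0.0, T_CK + DELTA, T_D1, T_D2)
print(clean, jittered - clean)
```

    The clean output lands on the external clock edge two cycles after the first input edge, while a shift of Δ on one input edge moves the output by 2Δ, once through the measured delay and once through the edge that launches the replicate line.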

  • Register Controlled DLL

    Locking information is stored digitally in register

    Vernier type delay line increases resolution

    [9] A. Hatakeyama et al., ISSCC 1997, pp. 72-73

    [Figure: vernier delay line between IN and OUT, built from a Main Delay Line and a Sub Delay Line. Unit cells have delay tD (fan-out=1) and tD+Δ (fan-out=2); switches SW0 to SW4 (SW(n-1), SW(n)) select the tap.]

    Clock Generation and Distribution Chulwoo Kim 28 of 86

  • Single Register Controlled Delay Line

    [Figure: I_CLK drives a Coarse Delay line with tap selects CSL1, CSL2, CSL3, producing IN1 and IN2 spaced tUD apart. The Fine Delay Phase Mixer weights IN1 by 1-K and IN2 by K to give OUT12, which falls between OUT1 and OUT2. The Controller receives UP/DN* from the PD.]

    *DN=Down

    Clock Generation and Distribution

    Chulwoo Kim 29 of 86

  • Boundary Switching Problem

    Phase Mixer: OUT12 = IN1×(1-K) + IN2×K

    [Figure: I_CLK paths passing through 3 UDCs* and 4 UDCs feed the Phase Mixer inputs IN1 (K=0) and IN2 (K=1), tUD apart; a shift left of the coarse taps occurs at K=0.9]

    Coarse shift & fine reset do not occur simultaneously

    *UDC=Unit delay cell

    Clock Generation and Distribution

    Chulwoo Kim 30 of 86

  • Seamless Boundary Switching

    [10] J.-T. Kwak et al., VLSI 2003, pp. 283-284

    Phase Mixer: OUT12 = IN1×(1-K) + IN2×K, K (0≤K≤1)

    [Figure: Clock drives a Dual Coarse Delay Line of Unit Delay Cells (tUD) that feeds IN1 (K=0) and IN2 (K=1); K moves from 0.9 to 1.0 before the shift left]

    Fine set first and then coarse shift

    Clock Generation and Distribution

    Chulwoo Kim 31 of 86
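    The difference between the two boundary-switching schemes can be illustrated with an idealized linear phase mixer, where the delay of OUT12 is the K-weighted average of the IN1 and IN2 delays. The sketch below is a simplified reading of the slides, not the circuit from [10]; the tap counts, tUD value, and function name are assumptions.

```python
T_UD = 100.0  # unit delay cell (UDC) delay in ps; hypothetical value


def mixed_delay(taps_in1: int, taps_in2: int, k: float, t_ud: float = T_UD) -> float:
    """Idealized phase mixer: delay of OUT12 = IN1 x (1-K) + IN2 x K."""
    return (1.0 - k) * taps_in1 * t_ud + k * taps_in2 * t_ud


# Single coarse delay line: IN1 after 3 UDCs, IN2 after 4 UDCs, K = 0.9.
before = mixed_delay(3, 4, 0.9)
# The coarse shift lands first; the fine reset (K -> 0) has not happened yet.
glitch = mixed_delay(4, 5, 0.9)
after = mixed_delay(4, 5, 0.0)

# Dual coarse delay line: fine set first (K -> 1.0), then shift only IN1.
fine_set = mixed_delay(3, 4, 1.0)
coarse_shift = mixed_delay(5, 4, 1.0)  # IN1 has zero weight, so no jump
next_step = mixed_delay(5, 4, 0.9)     # K now decreases to add delay

print(before, glitch, after)
print(fine_set, coarse_shift, next_step)
```

    In the single-line case the output jumps by about one tUD until K is reset; in the dual-line case the tap that moves carries no weight at the moment of the shift, so the delay stays continuous.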

  • Adaptive Bandwidth DLL w/ SDVS*

    [11] H.-W. Lee et al., ISSCC 2011, pp. 502-504

    [Figure: DLL with Variable Delay Line, Replica Delay, PD, Controller, and Update Period Pulse Gen.; signals I_CLK, FB_CLK, O_CLK, and Update Pulse; NCODE goes to the upper block]

    Update Period: m×tCK - tDREP + tDREP = m×tCK

    m=2, BWDLL = 1/(2×tCK)

    [Figure: Fine Unit Delay vs. Mode, in ps (axis 6 to 18), for DN, BASE, UP (Low-Speed Mode, Base, High-Speed Mode); labeled values 15.9 ps, 10.2 ps, 7.8 ps]

    *SDVS: Self-dynamic voltage scaling

    Clock Generation and Distribution

    Chulwoo Kim 32 of 86

  • Duty Cycle Corrector (DCC)

    DCC

    Reduces duty cycle error

    Enlarges valid data window for DDR

    Needs to correct ±15% duty error at max speed

    Can be implemented either in analog or digital type

    DCC Design Issues

    Location of DCC (before/after DLL)

    Embedded in DLL or not

    Power consumption

    Area

    Operating frequency range

    Locking time in case of digital DCC

    Offset of duty cycle detector

    Clock Generation and Distribution Chulwoo Kim 33 of 86

  • Digital DCC

    [Figure: digital DCC structures with IN and OUT waveforms (50% / 50% output duty): Invert-Delay Clock Generator (invert and delay) with Phase Mixer; Pulse Width Controller with Duty Cycle Detector; Half-Cycle Delayed Clock Generator (HD_IN) with Edge Combiner]

    Clock Generation and Distribution Chulwoo Kim 34 of 86

  • DCC in GDDR5

    [12] D. Shin et al., VLSI 2009, pp. 138-139

    [Figure: WCK/WCKb RX, Divider, PLL (DQPLL sel.), CML2CMOS, Global Driver, Repeater, and the 4-phase Clk Distribution Network to the DQ (CML only path); Duty Cycle Detector, Adder-based Counter (up/dn), Control Pulse Generator, Decoder, and Duty-Cycle Adjuster (DCA, weights X1 X2 X4 X8) acting on rxclk/rxclkb by decreasing CML_bias]

    DCA is not in clock path

    No jitter addition

    Clock Generation and Distribution

    Chulwoo Kim 35 of 86

  • DLL-related Parameters & Reference

    Parameter | DDR1 | DDR2 | DDR3/DDR3L | GDDR3 | GDDR4

    VDD | 2.5V | 1.8V | 1.5V/1.35V | 1.8V | 1.5V

    Lock time | 200 cycles | 200 cycles | 512 cycles | 2~5K cycles | 2~20K cycles

    Max. tDQSCK | 600ps | 300ps | 225ps | 180ps | 140ps

    Nominal speed | 166MHz | 333MHz | 333MHz~800MHz | 600MHz~1.37GHz | 1.6GHz

    tXPDLL*(tXARD) | 1×tCK | 2×tCK | 10×tCK | 7×tCK+tIS | 9×tCK+tIS

    Max. tCK | 12ns | 8ns | 3.3ns | 3.3ns | 2.5ns

    tXPDLL*(tXARD) – Timing for exit precharge power-down to any non-READ command

    RELATED AREA: DCC block, Variable Delay Line, Delay Control Logic, Replica, Low Jitter

    REFERENCE Type: [ ] digital, [ ]* mixed, [ ]** analog

    [23]** [14] [18] [19]** [20] [22] [24] [25]* [26]

    [23]* [26] [13] [15]** [16] [18] [20] [21]** [31] [32]* [33]**

    [27][28]** [29] [30]

    [29] [30]** [34]* [35]*

    [32] [27] [28]** [30]**

    [14] [36]* [15]** [16] [32]* [24] [26] [27] [17]** [19]**

    [14] [25]* [28]**

    [13] [14] [15]** [16] [17]** [19]** [20] [21]** [18]

    Clock Generation and Distribution

    Chulwoo Kim 36 of 86

  • Clock Distribution

    [37] S.-J. Bae, et al., ISSCC, 2011, pp. 498-500

    [Figure: CK/CKB Global Clock Buffer distributing the clock to the DQ pads; dimensions labeled 1,200μm and 93,750μm]

    Clock Distribution Issues

    Clock skew among DQs

    Low power

    Robust under PVT variations

    CML to CMOS converter jitter

    Clock Generation and Distribution Chulwoo Kim 37 of 86

  • CML to CMOS Converter

    Global Clock Buffer

    Current mode logic (CML) : high-speed clock

    CML to CMOS Converter Issue

    Susceptible to noise

    Jitter

    [Figure: Global Clock Buffer (CLKP, CLKN in; OUTP, OUTN out) driving a 1700μm line to the CML to CMOS Converter (CLKP, CLKN in; CLKOUT out) at the DQ]

    Clock Generation and Distribution Chulwoo Kim 38 of 86

  • Outline

    Introduction

    Clock Generation and Distribution

    Transceiver Design

    Channel

    Pre-emphasis

    Equalizer

    Crosstalk and skew

    Training

    Input buffer

    Output driver

    DBI/CRC

    TSV Interface for DRAM

    Summary

    References

    [Figure: transmit side (Output driver, Training, Pre-emphasis, DBI/CRC) and receive side (Input buffer, Training, Equalizer, DBI/CRC) connected through the channel (CH)]

    Chulwoo Kim 39 of 86

  • Channel Characteristics

    GDDRx

    Point to point connection

    Performance target

    • High data rate

    Few reflection components

    • PCB VIAS

    DDRx

    Multidrop

    Performance and power

    Many reflection components

    • PCB VIAS, DIMM connector….

    [Figure: GPU with on-board GDDRx devices; CPU Socket with DIMM Slots]

    Transceiver Design Chulwoo Kim 40 of 86

  • Emphasis for Channel Compensation

    [Figure, Amplitude vs. Time: the Original Signal becomes a Distorted Signal after the Channel; with FFE, D(in) → FFE → Channel → D(out)]

    [Figure, Amplitude vs. Freq. (up to fdata/2): Channel response, FFE response, and the combined FFE + Channel response]

    Transceiver Design Chulwoo Kim 41 of 86

  • Pre-emphasis vs. De-emphasis

    Pre-emphasis : Transition Bit Boosting

    De-emphasis : Non-transition Bit Suppression

    [Figure: waveforms versus Time for 1-tap pre-emphasis, no emphasis, and 1-tap de-emphasis, each drawn relative to amplitude Va]

    Transceiver Design Chulwoo Kim 42 of 86

  • Basic De-emphasis Circuit

    The Number of Taps

    Depends on the channel quality and bit rate

    Usually from one to three taps

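    [Figure: Din, X(n), feeds the K0 tap directly and the -K1 tap through a D flip-flop (D, Q, QB) that acts as a unit delay; the two taps are summed to form Dout, Y(n)]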

    Transceiver Design Chulwoo Kim 43 of 86
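    Reading the tap labels in the figure as a 2-tap FIR filter gives Y(n) = K0·X(n) - K1·X(n-1). The sketch below applies that filter to a bit stream mapped to ±1; the tap values K0=0.75 and K1=0.25 and the function name are illustrative assumptions, not values from the slides.

```python
def de_emphasis(bits, k0: float = 0.75, k1: float = 0.25):
    """1-tap de-emphasis: Y(n) = K0 * X(n) - K1 * X(n-1), with X in {-1, +1}."""
    symbols = [1.0 if b else -1.0 for b in bits]
    out = []
    prev = symbols[0]  # assume the line idled at the first symbol's level
    for x in symbols:
        out.append(k0 * x - k1 * prev)
        prev = x
    return out


levels = de_emphasis([0, 0, 1, 1, 1, 0])
print(levels)
```

    Bits that follow a transition are sent at full swing (±1.0), while repeated bits are suppressed to ±0.5, which is the non-transition bit suppression described on the previous slide.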

  • Pre-emphasis Circuit [1/2]

    Cascaded Pre-emphasis

    Internal node ISI due to limited TR performance at high speed

    Internal node pre-emphasis ratio would not be affected by the channel

    Less sensitive to the system environment or channel variations

    [38] K.-H. Kim et al., JSSC, Jan 2006, pp. 127-134

    [Figure: Din(n), Din(n-1), and Din(n-2) pass through 4:2 and 2:1 multiplexer stages to the Driver and Pre-emph. blocks, producing DQ and DQB]

    [Figure: Voltage [V] vs. Time [psec] for No Pre-emphasis, Conventional Pre-emphasis, and Proposed Pre-emphasis; labeled levels 1.00, 1.04, 1.08, and 1.20]

    Transceiver Design Chulwoo Kim 44 of 86

  • Pre-emphasis Circuit [2/2]

    [39] H. Partovi et al., ISSCC, 2009, pp.136-137

    Voltage Mode Driver Pre-emphasis

    Additional zero by Cc

    Time continuous pre-emphasis

    [Figure: TX with Pre-Driver and Main Driver (RT), plus a Pre-Emph. Driver coupled through the Boosting Capacitor CC (with RC); Dout (CP) drives the channel (CH) to the GPU (RT, CL); BW annotations]

    [Figure: Equivalent Linear Model with Din, RC, CC, RT, CP, Dout, and CL]

    Transceiver Design Chulwoo Kim 45 of 86

  • Decision Feedback Equalization (DFE)

    DFE cancels ISI without noise amplification

    Clock must be provided by DLL or PLL

    Critical path (feedback path) is important

    [Figure: Amplitude vs. Time panels (A) to (D): 1UI pulse, received pulse with ISI, Emulated ISI, and the result with No ISI]

    Transceiver Design

    [40] Y. Hidaka, CMOS Emerging Technologies Workshop, May 2010

    Chulwoo Kim 46 of 86
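    The idea of subtracting emulated ISI can be shown with a 1-tap DFE on a toy channel that adds a single post-cursor term. This is a minimal sketch under stated assumptions: the channel model (main cursor 1.0, post-cursor 0.6), the noise-free setting, and all names are hypothetical.

```python
def channel(symbols, h1: float = 0.6):
    """Toy channel: each received sample is the symbol plus h1 x previous symbol."""
    out, prev = [], 0.0
    for s in symbols:
        out.append(s + h1 * prev)
        prev = s
    return out


def dfe(received, alpha: float = 0.6):
    """1-tap DFE: subtract alpha x previous decision before slicing.

    Returns (decisions, equalized samples seen by the slicer).
    """
    decisions, equalized, prev = [], [], 0.0
    for r in received:
        eq = r - alpha * prev            # cancel the emulated ISI
        d = 1.0 if eq >= 0 else -1.0     # slicer decision
        decisions.append(d)
        equalized.append(eq)
        prev = d
    return decisions, equalized


tx = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, 1.0]
rx = channel(tx)
decisions, equalized = dfe(rx)
print(min(abs(v) for v in rx), min(abs(v) for v in equalized))
```

    Without equalization the worst-case sample magnitude is 0.4; after the feedback tap it is restored to 1.0. Because the correction uses a clean decision rather than an amplified input, noise is not boosted, but the decision must be available within 1UI.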

  • [41] S.-J. Bae et al., ISSCC, 2008, pp. 278-279

    The previously captured data must be fed back to the receiver within 1UI

    [Figure: four DFE sense amplifiers (DFE SA) compare DQ with Vref on the clock phases WCK/2_0, WCK/2_90, WCK/2_180, and WCK/2_270; their outputs P0, P90, P180, and P270 pass through SR Latches to D0, D90, D180, and D270. Each DFE SA also receives a previous output weighted by α (for example P270/P270b into the WCK/2_0 stage).]

    [Figure: timing of WCK/2_270 and WCK/2_0 with Precharge and Evaluation phases, P270, P0, and D270, D0, D90; TFB = TSA]

  • Crosstalk

    Crosstalk is coupling of energy from one line to another

    Timing Effect

    Timing Jitter

    Signal Integrity

    Near end crosstalk, Far end crosstalk

    [Figure: coupled lines with mutual capacitance Cm and mutual inductance Lm between the Near and Far ends; currents ICm and ILm; input signal and input signal at far end]

    Inear = ICm + ILm

    Ifar = ICm - ILm

    Transceiver Design Chulwoo Kim 48 of 86

  • Staggered Memory Bus

    No discrepancy of propagation delay due to the crosstalk

    Difference of transition point is τ/2

    Distance between channels with the same transition is increased

    Jitter due to coupling from the adjacent channel is reduced

    [42] K.-I. Oh et al., JSSC, Aug. 2009, pp. 2222-2232

    [Figure: MCU and DRAM connected by a Staggered Memory Bus; adjacent Channels have offset transition points, with τ marked]

    Transceiver Design Chulwoo Kim 49 of 86

  • Glitch Canceller

    Compensation for glitch by adding or subtracting current

    Rise : ICOMP is added to the main driver

    Fall : ICOMP is subtracted from the main driver

    [42] K.-I. Oh et al., JSSC, Aug. 2009, pp. 2222-2232

    Transceiver Design

    [Figure: drivers TX1, TX2, TX3 with data DTX1, DTX2, DTX3; a Transition Detector senses Rise/Fall on the Aggressor data and sets the Victim driver current to IBIAS+ICOMP]

    Chulwoo Kim 50 of 86

  • Crosstalk Equalizer (TX)

    Crosstalk equalization at transmitter

    Cancel the crosstalk by the impedance calibration

    [37] S.-J. Bae et al., ISSCC, Feb. 2011, pp. 498-500

    [Figure: Crosstalk Equalizing Driver for DQ[0], driven by DO[0] and the neighboring data DO[1:3] with enables EN[0:5]; timing of DO[0], DO[1], DQ[0], EN[0], and EN[1] with offset ∆t]

    Transceiver Design Chulwoo Kim 51 of 86

  • Skew

    Differences of flight time between signals

    Skew can cause timing errors

    Key design criterion in high-speed systems

    Transceiver Design

    [Figure: MCU/GPU connected to the DRAM (Banks, Peripheral Circuit, DLL, CMD Controller, Serial/Parallel, Generator) over CLK, Command, Address, DQS, and DQ lines with flight times TD, TD′, and TD″]

    Chulwoo Kim 52 of 86

  • Pre/De-skew with Preamble Signal

    Skew cancellation circuit is put in each DRAM

    With estimated skew information

    De-skew the data during write mode

    Pre-skew the data during read mode

    [43] S. H. Wang et al., JSSC, Apr. 2001, pp. 648-657

    [Figure: Skewed Data → Data Delay Lines → De-skewed Data; a PLL (Ext.Clk) and Mux provide the Sampling Clk; the Skew Estimator and Register Files hold the Data[n] Skew; bus widths of 8 and 3 are labeled]

    Transceiver Design Chulwoo Kim 53 of 86

  • Fly-by Topology for DDR3

    [4] JEDEC, JESD79-3E, pp. 56-59

    T-branch Topology

    CLK/CMD/Address are applied to each DRAM in parallel

    Small skew between CLK and DQS

    Fly-by Topology

    Better signal integrity by reducing the number of stubs and the stub length

    Easy to apply a single termination at the end of signal

    DQ and DQS are applied to each DRAM at the same time

    Large skew between CLK and DQS

    Need to calibrate skew

    [Figure: T-branch and Fly-by (terminated to VTT) routing of CLK, CMD, Address to DRAM#1 through DRAM#8, with DQ & DQS routed to each DRAM; Skew [s] plotted for DRAM#1 to DRAM#8 in each topology]

    Transceiver Design Chulwoo Kim 54 of 86

  • Write Leveling for DDR3

    Write Leveling

    Timing mismatch compensation between CLK and DQS

    Write leveling is applied to each DRAM individually

    [4] JEDEC, JESD79-3F, pp. 56-59

    [Figure: at the Source, CK/CK# and diff_DQS are aligned over T0 to T7; at the Destination, diff_DQS arrives skewed relative to CK/CK# (T0 to Tn) and DQ returns 0 or 1. Push DQS to capture 0-1 transition: DQ returns 0 0 0 before the transition and 1 1 1 after it.]

    Transceiver Design Chulwoo Kim 55 of 86
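    The write leveling loop can be sketched as a search: the controller delays DQS step by step, the DRAM samples CK on each DQS rising edge and returns the sample on DQ, and the search stops at the first 0 to 1 transition. The model below is a simplification with hypothetical flight times in picoseconds, an ideal 50% duty clock, and illustrative function names.

```python
def ck_level_at_dram(t: float, t_ck: float) -> int:
    """CK level at the DRAM at time t (high during the first half of each cycle)."""
    return 1 if (t % t_ck) < t_ck / 2 else 0


def write_leveling(ck_flight, dqs_flight, t_ck, step, max_steps=200):
    """Return the DQS delay at which the DRAM first reports a 0 -> 1 transition."""
    previous = None
    for i in range(max_steps):
        delay = i * step
        # The DQS edge launched with `delay` arrives after dqs_flight; the CK seen
        # by the DRAM is the controller clock delayed by ck_flight.
        sample = ck_level_at_dram(dqs_flight + delay - ck_flight, t_ck)
        if previous == 0 and sample == 1:
            return delay
        previous = sample
    return None


# Fly-by routing makes CK arrive later than DQS at this DRAM (assumed values).
dqs_delay = write_leveling(ck_flight=700, dqs_flight=300, t_ck=1000, step=25)
print(dqs_delay)
```

    With these numbers the search settles at 400 ps, the difference between the CK and DQS flight times, and the same procedure is repeated per DRAM because each device sits at a different point on the fly-by bus.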

  • Training for GDDR5

    Adaptive Interface Training

    Ensure the Widest Timing Margins for All Signals

    Controlled by MCU

    [44] W. Hubert et al., ATS, 2008, pp. 24-27

    [Figure: GDDR5 Timing after Training for CK, CMD, ADDR, WCK, and DQ]

    Transceiver Design Chulwoo Kim 56 of 86

  • Training Sequence for GDDR5

    [45] JEDEC, JESD212, pp. 23-39

    Power Up: Detect the configuration and mirror function; ODT setting

    Address Training (Optional): Optimize address input data eye

    WCK2CK Alignment Training: Clock alignment

    READ Training: Search for best read data eye; Detect burst boundaries of read stream

    WRITE Training: Search for best write data eye; Detect burst boundaries of write stream

    Exit: Ready for read/write

    Transceiver Design

    Chulwoo Kim 57 of 86

  • Training Example : Write Training

    [44] W. Hubert et al., ATS, 2008, pp. 24-27

    [Figure: Write Data eyes at the Memory Controller and Data eyes at the GDDR5 Device, before and after write training; per-pin offsets t1 and t2 relative to t0 are compensated with launch times t0 + t1 and t0 - t2]

    Transceiver Design Chulwoo Kim 58 of 86

  • Input Buffer

    Convert attenuated external signal to rail-to-rail signal

    Trade-off between high speed operation and power consumption

    Transceiver Design

    [Figure: MCU/GPU driving the DRAM inputs CLK, Command, Address (m* lines), DQS, and DQ (bus widths 4 and n labeled); DRAM blocks shown are Banks, Peripheral Circuit, DLL, CMD Controller, Serial/Parallel, and GEN]

    * m: The number of address channels, which depends on the kind of memory and its density

    Chulwoo Kim 59 of 86

  • Input Buffer Comparison

    CMOS Type

    Simple circuit

    Low-speed input (CKE)

    Susceptible to noise

    Unstable threshold

    Differential Type

    Complex circuit

    High-speed input

    Robust to noise

    Stable threshold

    Commonly used

    [Figure: CMOS type buffer (In, En, OUT) and differential type buffer (In, Vref, En, OUT)]

    Transceiver Design Chulwoo Kim 60 of 86

  • DDR4 Input Buffer

    [46] K. Sohn et al., ISSCC, 2012, pp. 38-40

    Gain Enhanced Buffer

    Signal transition detector is added

    The bias level (I) is controlled

    Sensitivity can be enhanced at higher frequencies

    Wide Common-Mode Range DQ Buffer

    Delivers stable inputs to the second stage Amp.

    Feedback network reduces the output common-mode variation

    [Figure: wide common-mode range buffer (In, Vref, CMFB*, Amp.) and gain enhanced buffer (In, Vref, In Buffer, Transition Detector controlling the bias current I)]

    * CMFB : Common-mode feedback

    Transceiver Design Chulwoo Kim 61 of 86

  • Pseudo Open Drain (POD)

    Impedance Calibration

    Manual vs. Automatic

    External Resistor: 240Ω

    [Figure: I/O Buffer with Pull-UP and Pull-DOWN legs driven by Din, connected to the Channel; 240Ω resistor]

    Transceiver Design

    Chulwoo Kim 62 of 86

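  • Impedance Calibration

    Thermometer Code Control

    [47] C. Park et al., JSSC, Apr. 2006, pp. 831-838

    [Figure: ZQ PAD with External resistor at the DRAM boundary; PU and PD legs compared with Vref by comparators (En); PU REG and PD REG produce the n-bit PUcon and PDcon codes; output driver built from identical legs (WP, R) and (WN, R), enabled by Din+PUcon and Din+PDcon, driving Dout]

    Transceiver Design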

    Chulwoo Kim 63 of 86
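    A thermometer-coded calibration loop can be sketched as follows: identical pull-up legs are enabled one at a time until the comparator at the ZQ pad flips, which happens when the pull-up impedance drops to the external 240Ω reference. This is a simplified behavioral model, not the circuit in [47]; the leg resistances, the 16-leg limit, and the function names are assumptions.

```python
R_EXT = 240  # external ZQ reference resistor in ohms (from the POD slide)


def comparator_high(code: int, leg_ohms: int, r_ext: int = R_EXT) -> bool:
    """True when the ZQ pad voltage reaches Vref = VDDQ/2.

    With `code` identical legs in parallel the pull-up is leg_ohms / code, and the
    pad voltage reaches VDDQ/2 exactly when that impedance is <= r_ext.
    """
    return leg_ohms <= r_ext * code


def calibrate_pullup(leg_ohms: int, n_legs: int = 16) -> int:
    """Enable legs one by one (thermometer code) until the comparator flips."""
    for code in range(1, n_legs + 1):
        if comparator_high(code, leg_ohms):
            return code
    return n_legs


def thermometer(code: int, n_legs: int = 16) -> str:
    """Thermometer-code string: `code` ones followed by zeros."""
    return "1" * code + "0" * (n_legs - code)


for corner, leg in [("fast", 1600), ("typical", 1920), ("slow", 2400)]:
    code = calibrate_pullup(leg)
    print(corner, code, thermometer(code), round(leg / code, 1))
```

    Because every leg is identical, each step changes the conductance by the same amount and the code is monotonic, at the cost of needing more control bits than a binary-weighted driver.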

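  • Multi Slew-rate Output Driver

    Binary-weighted Code Control

    [48] D. U. Lee et al., ISSCC, 2008, pp. 280-613

    [Figure: ZQ PAD calibration loop at the DRAM boundary with PU and PD legs, Vref comparators (En), and digital filters (DF) producing the n-bit PUcon and PDcon codes; binary-weighted driver legs WP/4, WP/2, WP, ..., 32WP with resistors 128R, 64R, 32R, ..., R (and WN/4, WN/2, WN, ..., 32WN for pull-down), enabled by Din+PUcon and Din+PDcon; 60Ω, 120Ω, and 240Ω settings; Dout]

    DF = Digital LPF + UP/DOWN Counter

    Transceiver Design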

    Chulwoo Kim 64 of 86

  • Global ZQ Calibration

    Global Impedance Mismatch Error < 1%

    PVT variation sensor

    [49] J. Koo et al., CICC, 2009, pp. 717-720

    [Figure: the ODT calibration block at ZQ pin (Zcal) generates a Global Reference Signal (i0cal) that is distributed to DQ0 and DQn (n=1~31); each DQ has a local loop with LS, PA, CP, LO, and Ref.]

    CP: Comparator

    PA: Pre-amplifier

    LS: Local PVT sensor

    LO: Local controller

    Transceiver Design

    Chulwoo Kim 65 of 86

  • Data Bus Inversion (DBI)

    Power reduction technique independent of data pattern

    Dominant power (I/O Buffer)

    P = α × CPCB × VDD², α < 0.5

    For high-BW memory, inversion time + CRC can be a bottleneck

    [50] S.-S. Yoon et al., ASSCC 2008, pp.249-252

    Transceiver Design Chulwoo Kim 66 of 86
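    One common way to bound the switching activity α is bus-invert coding per byte: if more than half of the eight data lines would toggle, the inverted byte is sent and a DBI flag is set. The sketch below shows that toggle-based (AC) variant as an assumption; the slide does not say which DBI variant it refers to, and the function names are hypothetical.

```python
def dbi_ac_encode(prev_bus: int, data: int):
    """Return (bus_value, dbi_flag) so that at most 4 of 8 data lines toggle."""
    toggles = bin((prev_bus ^ data) & 0xFF).count("1")
    if toggles > 4:
        return (~data) & 0xFF, 1  # send inverted data and assert the DBI flag
    return data & 0xFF, 0


def dbi_decode(bus: int, dbi_flag: int) -> int:
    """Recover the original byte at the receiver."""
    return (~bus) & 0xFF if dbi_flag else bus & 0xFF


bus, flag = dbi_ac_encode(prev_bus=0x00, data=0xFF)
print(hex(bus), flag, hex(dbi_decode(bus, flag)))
```

    The encoder sits in the data path before the output driver, which is why the slide lists inversion time, together with CRC, as a possible speed bottleneck.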

  • Cyclic Redundancy Check (CRC)

    Data error check for every unit interval (64 bits – data only)

    Redundancy bit : 1 bit/byte

    Speed bottleneck for high-BW

    Time (READ DBI + READ CRC + CRC calculator) < 9 periods

    [50] S.-S. Yoon et al., ASSCC 2008, pp.249-252

    Transceiver Design

    Error type | Detection rate

    random single bit | 100%

    random double bit | 100%

    random odd count | 100%

    burst ≤ 8 | 100%

    Chulwoo Kim 67 of 86

  • CRC (cont’d)

    Polynomial: $x^{8} + x^{2} + x + 1$ with an initial value of 0

    Algorithm for GDDR5 ATM-0M83

    Logic for the algorithm takes a long time

    To increase CRC speed: XOR logic optimization

    CRC calculation time < $T_{CRC}$
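
A bit-serial reference model of the polynomial above makes it clear why the logic is slow and why XOR optimization helps. Every input bit goes through a shift and a conditional XOR, and a hardware implementation flattens these serial steps into a parallel XOR tree. The sketch below is generic CRC-8 arithmetic with polynomial 0x07 and initial value 0. The function name is hypothetical, and the GDDR5-specific mapping of data and DBI bits onto the CRC input is not modeled.

```python
POLY = 0x07  # x^8 + x^2 + x + 1 (the x^8 term is implicit)


def crc8(data, init=0x00):
    """Bit-serial CRC-8, MSB first, no reflection, no final XOR."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ POLY) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


if __name__ == "__main__":
    frame = bytes(range(9))          # an arbitrary 72-bit frame for illustration
    good = crc8(frame)
    # Flip each bit in turn and confirm that the checksum changes.
    detected = 0
    for bit in range(len(frame) * 8):
        corrupted = bytearray(frame)
        corrupted[bit // 8] ^= 1 << (bit % 8)
        detected += crc8(bytes(corrupted)) != good
    print(hex(good), detected, "of", len(frame) * 8, "single-bit errors detected")
```

Because each output bit of this CRC is a fixed XOR of input bits, the eight checksum bits can be computed in parallel, and the optimization lies in reducing XOR depth and sharing common terms.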


  • Outline

    Introduction

    Clock Generation and Distribution

    Transceiver Design

    TSV Interface for DRAM

      • Bandwidth requirement

      • DRAM with TSV

      • TSV DRAM type

      • DRAM stacking type

      • Data confliction issue & solution

      • Failed TSV issue & solution

    Summary

    References


  • Bandwidth Requirements

    Requirement: the next GDDR will require a data rate over 10 Gb/s/pin

    Restrictions:

      • Very difficult to exceed 10 Gb/s/pin

      • Cost of performance improvements

      • Power consumption

    [Figure: DDRx / GDDRx data rate per pin trend, 2000 to 2015. Y-axis: data rate/pin (Gbps), 0 to 12. Series: DDR, DDR2, DDR3, DDR4, GDDR3, GDDR4, GDDR5.]

    Generation | Gb/s/chip | Gb/s/pin

    GDDR1 | 32 | 1

    GDDR3 | 51.2 | 1.6

    GDDR4 | 102.4 | 3.2

    GDDR5 | 224 | 7

    GDDR? | 448 (?) | 14 (?)


  • DRAM with TSV

    Advantages of DRAM with TSV

    Higher density per area

    Shorter interconnection : lower power, faster flight time

    Higher bandwidth with wide I/O

    Wide I/O easily achieves 448 Gb/s/chip at next GDDR

    (Example: 800 Mb/s/pin × 512 I/O = 409.6 Gb/s/chip, on the order of the 448 Gb/s/chip target)
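
The per-chip figures on this and the previous slide are simply data rate per pin times I/O count. The sketch below reproduces them, assuming a x32 interface for the GDDR rows (the ratio 224 / 7 = 32 implies this, but the slides do not state it). The function names are hypothetical.

```python
def chip_bandwidth_gbps(rate_per_pin_gbps, io_count):
    """Aggregate bandwidth in Gb/s for a given per-pin rate and I/O width."""
    return rate_per_pin_gbps * io_count


def required_pin_rate_gbps(target_gbps, io_count):
    """Per-pin rate needed to hit a target aggregate bandwidth."""
    return target_gbps / io_count


if __name__ == "__main__":
    gddr_pin_rates = {"GDDR1": 1.0, "GDDR3": 1.6, "GDDR4": 3.2, "GDDR5": 7.0}
    for name, rate in gddr_pin_rates.items():
        print(name, chip_bandwidth_gbps(rate, 32), "Gb/s/chip at x32 (assumed)")

    # Wide I/O: many slow pins instead of a few fast ones.
    print("512 I/O at 0.8 Gb/s/pin:", chip_bandwidth_gbps(0.8, 512), "Gb/s/chip")
    print("pin rate for 448 Gb/s over 512 I/O:",
          required_pin_rate_gbps(448, 512), "Gb/s/pin")
```

With 512 I/Os, the 448 Gb/s/chip target needs only 0.875 Gb/s/pin, compared with 14 Gb/s/pin on a x32 interface.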

    [Figure: two TSV configurations. Labels in the first: MCU/GPU, wide I/O memory, TSV. Labels in the second: MCU/GPU, four stacked memory dies, interposer.]


  • TSV DRAM Type

    Type | Main Memory | Mobile | Graphics

    Architecture | (figure) | (figure) | (figure)

    No. of TSVs | 500~1000 EA | 1000~1500 EA | 2000~3000 EA

    Feature | Low power, high speed | Low power, multi-channel, wide I/O | Max bandwidth, multi-channel

    Package | (figure) | (figure) | (figure)

    [Figure labels in the Package row: GPU, controller, interposer.]


  • Stacking Type

    Type | Homogeneous | Heterogeneous

    Architecture | (figure) | (figure)

    Feature | Same chips, low cost | Slave: only cells; Master: with peripheral

    [Figure labels: three slave dies stacked with one master die.]


  • Data Confliction Issue

    PVT variations cause data skew between the stacked chips

    Data confliction increases the short-circuit current

    [Figure: DQ timing of the slowest chip and the fastest chip under PVT variations; the region where their outputs overlap is marked as data confliction.]

    [51] H.-W. Lee et al., ISSCC, 2012, pp. 48-50

    [Figure: output drivers of CHIP 0 (MP0/MN0, enabled by EN0 and /EN0) and CHIP 3 (MP3/MN3, enabled by EN3 and /EN3) share one TSV to the DQ pin; one DQ is HIGH while the other is LOW.]

  • Separate Data Bus per Bank Group

    Separate data bus per bank group: less dependent on PVT variation

    [Figure: four stacked ranks (Rank 0 to Rank 3). Each rank has bank groups A and B, each with four banks and its own TSV array.]

    [52] U. Kang et al., ISSCC, 2009, pp. 130-131


  • DLL-Based Self-Aligner

    Data alignment to the external clock or to the clock of the slowest chip

    [Figure: DLL-based data self-aligner across CHIP 0 to CHIP 3. Blocks: skew detector (PD1, PD2), skew compensator, fine aligner, replica, TSV model, and pipe latches. Signals: CK, TRCLK, TFBCLK, RFBCLK, C_CLK, CLKOUT, READ/READb, UP/DN, and MODE (SAM mode). Data are aligned at the pin (DQS or dummy pin).]

    [51] H.-W. Lee et al., ISSCC, 2012, pp. 48-50

  • Failed TSV Issue

    Failed TSVs decrease the assembly yield and increase the total cost

    [Figure: failed TSV examples: (a) TSV plating defect, (b) pinch-off.]

    [53] D. Malta et al., ECTC, 2010, pp. 1769-1775


  • TSV Check

    TSV connectivity is checked using internal circuits

    [Figure: test signal generating circuits and scan-chain-based testing circuits connected through TSV_0 to TSV_4, with inputs In_0 to In_4 and outputs Out_0 to Out_4; the two sides are labeled sender end and receiver end.]

    [54] A.-C. Hsieh et al., TVLSI, Apr. 2012, pp. 711-722


  • Redundant TSVs for Failed TSV

    Conventional: redundant TSVs are dedicated and fixed

    Proposed: a failed TSV is repaired with a neighboring TSV

    [Figure: TSV repair between Chip1 and Chip2. Conventional: signals A to D and A' to D' use TSVs a to d plus dedicated redundant TSVs r1 and r2. Proposed: signals A to D and A' to D' are spread over TSVs a to f.]

    [52] U. Kang et al., ISSCC, 2009, pp. 130-131
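
One way to read "repaired with a neighboring TSV" is as a shift: each signal takes the next working TSV in the row, so a failure pushes the following signals over by one position rather than routing to a fixed spare. The sketch below illustrates that idea only. It is not the circuit from [52], and the function name and the 4-signal, 6-TSV grouping are assumptions taken from the figure labels.

```python
def assign_tsvs(signals, tsv_ok):
    """Map each signal to the next working TSV, in order.

    signals: ordered list of signal names.
    tsv_ok:  list of booleans, one per physical TSV (True = working).
    Returns a dict {signal: tsv_index}; raises if too many TSVs have failed.
    """
    working = [index for index, ok in enumerate(tsv_ok) if ok]
    if len(working) < len(signals):
        raise ValueError("not enough working TSVs to carry all signals")
    return dict(zip(signals, working))


if __name__ == "__main__":
    signals = ["A", "B", "C", "D"]
    print(assign_tsvs(signals, [True] * 6))                               # no failure
    print(assign_tsvs(signals, [True, False, True, True, True, True]))    # TSV 1 failed
```

With this mapping, no signal moves further than the number of failed TSVs before it, so each reroute stays local to its neighbors.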



  • Summary

    Although all types of DRAM are reaching their limits in supply voltage, the demand for high-bandwidth memory keeps increasing

    For synchronizing the external clock with the DRAM output, low power, small area, and low skew are important design parameters

    To achieve high-BW memory, many design techniques have been and will be adopted from other high-speed wireline transceivers

    TSV interface for DRAM might be a good solution to achieve high bandwidth and low power


  • Suggested Papers to See

    17.1 “A 6.4Gb/s near-ground single-ended transceiver for dual-rank DIMM memory interface systems”

    17.2 “A 27% reduction in transceiver power for single-ended point-to-point DRAM interface with the termination resistance of 4×Z0 at both TX and RX”

    17.3 “A 5.7mW/Gb/s 24-to-240Ω 1.6Gb/s thin-oxide DDR transmitter with 1.9-to-7.6V/ns clock-feathering slew-rate control in 22nm CMOS”

    17.4 “An adaptive-bandwidth PLL for avoiding noise interference and DFE-less fast precharge sampling for over 10Gb/s/pin graphics DRAM interface”


  • References

    [1] K. Koo et al., “A 1.2V 38nm 2.4Gb/s/pin 2Gb DDR4 SDRAM with bank group and ×4 half-page architecture”, in IEEE ISSCC Dig. Tech. Papers, pp. 40–41, 2012.

    [2] JEDEC, JESD79F.

    [3] JEDEC, JESD79-2F.

    [4] JEDEC, JESD79-3F.

    [5] JEDEC, JESD79-4.

    [6] T.-Y. Oh et al., “A 7Gb/s/pin GDDR5 SDRAM with 2.5ns bank-to-bank active time and no bank-group restriction”, in IEEE ISSCC Dig. Tech. Papers, pp. 434–435, 2010.

    [7] H.-W. Lee et al., “Survey and analysis of delay-locked loops used in DRAM interfaces”, submitted to IEEE Trans. VLSI Syst.

    [8] T. Saeki et al., “A 2.5 ns clock access 250 MHz 256 Mb SDRAM with a synchronous mirror delay”, in IEEE ISSCC Dig. Tech. Papers, pp. 374-375, 1996.

    [9] A. Hatakeyama et al., “A 256 Mb SDRAM using a register-controlled digital DLL”, in IEEE ISSCC Dig. Tech. Papers, pp. 72-73, 1997.

    [10] J.-T. Kwak et al., “A low cost high performance register-controlled digital DLL for 1Gbps x32 DDR SDRAM”, in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 283-284, 2003.

    [11] H.-W. Lee et al., “A 1.6V 1.4Gb/s/pin consumer DRAM with self-dynamic voltage-scaling technique in 44nm CMOS technology”, in IEEE ISSCC Dig. Tech. Papers, pp. 502-504, 2011.

    [12] D. Shin et al., “Wide-range fast-lock duty-cycle corrector with offset-tolerant duty-cycle detection scheme for 54nm 7Gb/s GDDR5 DRAM interface”, in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 138-139, 2009.

    [13] W.-J. Yun et al., “A 3.57 Gb/s/pin low jitter all-digital DLL with dual DCC circuit for GDDR3 DRAM in 54-nm CMOS technology,” IEEE Trans. VLSI Sys., vol. 19, no. 9, pp. 1718-1722, Nov. 2011.

    [14] H.-W. Lee et al., “A 7.7mW/1.0ns/1.35V delay locked loop with racing mode and OA-DCC for DRAM interface,” in Proc. of Int. Symp. Circuits and Syst., pp. 3861-3864, 2010.

    [15] B.-G. Kim et al., “A DLL with jitter reduction techniques and quadrature phase generation for DRAM interfaces,” IEEE J. Solid-State Circuits, vol. 44, no. 5, pp. 1522-1530, May 2009.


    [16] W.-J. Yun et al., “A 0.1-to-1.5GHz 4.2mW all-digital DLL with dual duty-cycle correction circuit and update gear circuit for DRAM in 66nm CMOS Technology,” in IEEE ISSCC Dig. Tech. Papers, pp. 282-283, 2008.

    [17] S. Kim et al., “A low jitter, fast recoverable, fully analog DLL using tracking ADC for high speed and low stand-by power DDR I/O interface” in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 285-286, 2003.

    [18] T. Matano et al., “A 1-Gb/s/pin 512-Mb DDRII SDRAM using a digital DLL and a slew-rate-controlled output buffer,” IEEE J. Solid-State Circuits, vol. 38, no. 5, pp. 762-768, May 2003.

    [19] K.-H. Kim et al., “Built-in duty cycle corrector using coded phase blending scheme for DDR/DDR2 synchronous DRAM application” in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 287-288, 2003.

    [20] J.-T. Kwak et al., “A low cost high performance register-controlled digital DLL for 1 Gbps x32 DDR SDRAM” in IEEE Symp. VLSI Circuits Dig. Tech. Papers , pp. 283-284, 2003.

    [21] O. Okuda et al., “A 66-400 MHz, adaptive-lock-mode DLL circuit with duty-cycle error correction [for SDRAMs]” in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 37-38, 2001.

    [22] F. Lin et al., “A wide-range mixed-mode DLL for a combination 512 Mb 2.0 Gb/s/pin GDDR3 and 2.5 Gb/s/pin GDDR4 SDRAM,” IEEE J. Solid-State Circuits, vol. 43, no. 3, pp. 631-641, Mar. 2008.

    [23] K.-W. Kim et al., “A 1.5-V 3.2 Gb/s/pin Graphic DDR4 SDRAM With dual-clock system, four-phase input strobing, and low-jitter fully analog DLL,” IEEE J. Solid-State Circuits, vol. 42, no. 11, pp. 2369-2377, Nov. 2007.

    [24] D.-U. Lee et al., “A 2.5Gb/s/pin 256Mb GDDR3 SDRAM with series pipelined CAS latency control and dual-loop digital DLL,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pp. 547-548, 2006.

    [25] S.-J. Bae et al., “A 3Gb/s 8b single-ended transceiver for 4-drop DRAM interface with digital calibration of equalization skew and offset coefficients,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, pp. 520-521, 2005.

    [26] Y.-J. Jeon et al., “A 66-333-MHz 12-mW register-controlled DLL with a single delay line and adaptive-duty-cycle clock dividers for production DDR SDRAMs,” IEEE J. Solid-State Circuits, vol. 39, no. 11, pp. 2087-2092, Nov. 2004.

    [27] T. Hamamoto et al., “A 667-Mb/s operating digital DLL architecture for 512-Mb DDR,” IEEE J. Solid-State Circuits, vol. 39, no. 1, pp. 194-206, Jan. 2004.


    [28] S. Kim et al., “A low-jitter wide-range skew-calibrated dual-loop DLL using antifuse circuitry for high-speed DRAM,” IEEE J. Solid-State Circuits, vol. 37, no. 6, pp. 726-734, Jun. 2002.

    [29] J.-B. Lee et al., “Digitally-controlled DLL and I/O circuits for 500 Mb/s/pin x16 DDR SDRAM,” in IEEE ISSCC Dig. Tech. Papers, pp. 68-69, 2001.

    [30] S. Kuge et al., “A 0.18um 256-Mb DDR-SDRAM with low-cost post-mold tuning method for DLL replica,” IEEE J. Solid-State Circuits, vol. 35, no. 11, pp. 726-734, Nov. 2000.

    [31] H.-W. Lee et al., “A 1.6V 1.4Gb/s/pin consumer DRAM with self-dynamic voltage-scaling technique in 44nm CMOS technology,” IEEE J. Solid-State Circuits, vol. 47, no. 1, pp. 131-140, Jan. 2012.

    [32] Y. K. Kim et al., “A 1.5V, 1.6Gb/s/pin, 1Gb DDR3 SDRAM with an address queuing scheme and bang-bang jitter reduced DLL scheme” in IEEE Symp. VLSI Dig. Tech. Papers, pp. 182-183, 2007.

    [33] K.-H. Kim et al., “A 1.4 Gb/s DLL using 2nd order charge-pump scheme with low phase/duty error for high-speed DRAM application,” in IEEE ISSCC Dig. Tech. Papers, pp. 213-214, 2004.

    [34] J.-H. Lee et al., “A 330 MHz low-jitter and fast-locking direct skew compensation DLL,” in IEEE ISSCC Dig. Tech. Papers, pp. 352-353, 2000.

    [35] J. Kim et al., “A low-jitter mixed-mode DLL for high-speed DRAM applications,” IEEE J. Solid-State Circuits, vol. 35, no. 10, pp. 1430-1436, Oct. 2000.

    [36] H.-W. Lee et al., “A 1.6V 3.3Gb/s GDDR3 DRAM with dual-mode phase- and delay-locked loop using power-noise management with unregulated power supply in 54nm CMOS,” in IEEE ISSCC Dig. Tech. Papers, pp. 140-141, 2009.

    [37] S.-J. Bae et al., “A 40nm 2Gb 7Gb/s/pin GDDR5 SDRAM with a Programmable DQ Ordering Crosstalk Equalizer and Adjustable clock-Tracing BW,” in IEEE ISSCC Dig. Tech. Papers, pp. 498-500, 2011.

    [38] K.-h. Kim et al., “A 20-Gb/s 256-Mb DRAM with an inductorless quadrature PLL and a cascaded pre-emphasis transmitter,” IEEE J. Solid-State Circuits, vol.41, no. 1, pp. 127-134, Jan. 2006.

    [39] H. Partovi et al., “Single-ended transceiver design techniques for 5.33Gb/s graphics applications,” in IEEE ISSCC Dig. Tech. Papers, pp. 136-137, 2009.

    [40] Y. Hidaka, “Sign-based-Zero-Forcing Adaptive Equalizer Control,” in CMOS Emerging Technologies Workshop, May 2010.


    [41] S.-J. Bae et al., “A 60nm 6Gb/s/pin GDDR5 graphics DRAM with multifaceted clocking and ISI/SSN-reduction techniques,” in IEEE ISSCC Dig. Tech. Papers, pp. 278-279, 2008.

    [42] K.-I. Oh et al., “A 5-Gb/s/pin transceiver for DDR memory interface with a crosstalk suppression scheme,” IEEE J. Solid-State Circuits, vol. 44, no. 8, pp. 2222-2232, Aug. 2009.

    [43] S. H. Wang et al., “A 500-Mb/s quadruple data rate SDRAM interface using a skew cancellation technique,” IEEE J. Solid-State Circuits, vol. 36, no. 4, pp. 648-657, Apr. 2001.

    [44] W. Hubert et al., “GDDR5 training-challenges and solution for ATE-based test,” in Asian Test Symposium, pp. 24-27, Nov. 2008.

    [45] JEDEC, JESD212.

    [46] K. Sohn et al., “A 1.2V 30nm 3.2Gb/s/pin 4Gb DDR4 SDRAM with dual-error detection and PVT-tolerant data-fetch scheme,” in IEEE ISSCC Dig. Tech. Papers, pp. 38-40, 2012.

    [47] C. Park et al., “A 512-mb DDR3 SDRAM prototype with CIO minimization and self-calibration techniques,” IEEE J. Solid-State Circuits, vol. 41, no. 4, pp. 831-838, Apr. 2006.

    [48] D. Lee et al., “Multi-slew-rate output driver and optimized impedance-calibration circuit for 66nm 3.0Gb/s/pin DRAM interface,” in IEEE ISSCC Dig. Tech. Papers, pp. 280-613, 2008.

    [49] J. Koo et al., “Small-area high-accuracy ODT/OCD by calibration of global on-chip for 512M GDDR5 application,” in Proc. IEEE CICC, pp. 717-720, Sep. 2009.

    [50] S.-S. Yoon et al., “A fast GDDR5 read CRC calculation circuit with read DBI operation,” in IEEE Asian Solid-State Circuits Conference, pp. 249-252, 2008.

    [51] H.-W. Lee et al., “A 283.2μW 800Mb/s/pin DLL-based data self-aligner for through silicon via (TSV) interface,” in IEEE ISSCC Dig. Tech. Papers, pp. 48-50, 2012.

    [52] U. Kang et al., “8Gb 3D DDR3 DRAM using through-silicon-via technology,” in IEEE ISSCC Dig. Tech. Papers, pp. 130-131, 2009.

    [53] D. Malta et al., “Integrated process for defect-free copper plating and chemical-mechanical polishing of through-silicon vias for 3D interconnects,” in ECTC, pp. 1769-1775, 2010.

    [54] A.-C. Hsieh et al., “TSV redundancy: architecture and design issues in 3-D IC,” IEEE Trans. VLSI Systems, pp. 711-722, Apr. 2012.
