High-Bandwidth Memory Interface Design
-
5/25/2018 High-Bandwidth Memory Interface Design
1/86
High-Bandwidth Memory Interface Design
Chulwoo Kim
Dept. of Electrical Engineering
Korea University, Seoul, Korea
February 17, 2013
Chulwoo Kim 1 of 86
-
Outline
Introduction
Clock Generation and Distribution
Transceiver Design
TSV Interface for DRAM
Summary
References
-
Outline
Introduction
DRAM 101
Simplified DRAM Architecture and Operation
Differences of DRAM (DDRx, GDDRx, LPDDRx)
Trend
Memory Interface: Differences and Issues
Clock Generation and Distribution
Transceiver Design
TSV Interface for DRAM
Summary
References
-
DRAM 101
SDRAM: Synchronous Dynamic Random Access Memory
SDR (Single Data Rate): one data bit per clock cycle
DDR (Double Data Rate): two data bits per clock cycle
Main memory: DDRx (PC, notebook, server)
Graphics memory: GDDRx (graphics card, console)
Mobile memory: LPDDRx (phone, tablet PC)
[Figure: the MCU sends CLK and commands to the SDRAM and exchanges data with it; the timing diagram marks CAS* latency and burst length]
*CAS: Column Address Strobe
-
DRAM DDR4 Die Photo
[1] K. B. Koo et al., ISSCC 2012, pp. 40-41
[Figure: die photo with 16 banks, Bank0 to Bank15]
Supply voltage: VDD = 1.2 V, VPP = 2.5 V
Process: 38 nm CMOS, 3-metal
Banks: 4 bank groups, 16 banks
Data rate: 2400 Mbps
Number of IOs: x4 / x8
-
Simplified DRAM Architecture
[Figure: four banks around a peripheral circuit. Each bank contains a cell array (WL, BLT/BLB, BLSA*), row decoder, word line driver, row repair fuse, column decoder, column repair fuse and write driver / read amplifier. The peripheral circuit contains the CLK/ADD/CMD buffer, CMD controller, DLL, generator (ICLK, DCLK), DQ RX with serial-to-parallel conversion and DQ TX with parallel-to-serial conversion]
* BLSA: Bit line sense amplifier
-
Concept of DRAM operation
[Figure: four banks with BLSAs* connected through the GIO* to the peripheral circuit; the GIO carries Np × Ndq bits and each DQ path carries Ndq bits]
WRITE: serial to parallel (DQ → GIO), through DQ RX
READ: parallel to serial (GIO → DQ), through DQ TX
*BLSA: Bit line sense amplifier
*GIO: Global I/O
*Np: Number of pre-fetch
*Ndq: Number of DQ
-
Pre-fetch Timing (DDR1, BL* = 2)
[Figure: timing diagram of CLK, two RD commands, GIO, DQS and DQ with tCCD* = 1; each 2-bit burst (bits 0 and 1) appears on DQ after CL*]
Number of GIO channels = $N_p \times N_{dq} = 2 \times 8 = 16$ (DDR1 x8)
[2] JEDEC, JESD79F, pp. 24-29
* tCCD: CAS to CAS delay
* CL: CAS latency
* BL: Burst length
-
Pre-fetch Diagram (DDR1)
Number of GIO channels = $2 \times N_{dq}$
Pre-fetch operation: 2-bit pre-fetch, [$2 \times N_{dq}$] data access
(If the output data rate is 400 Mbps, the internal data rate is 200 Mbps)
[Figure: eight banks sharing the GIO]
-
Pre-fetch Timing (DDR2, BL = 4)
[Figure: timing diagram of CLK, two RD commands, GIO, DQS and DQ with tCCD = 2; each 4-bit burst (bits 0 to 3) appears on DQ after RL*]
Number of GIO channels = $N_p \times N_{dq} = 4 \times 8 = 32$ (DDR2 x8)
[3] JEDEC, JESD79-2F, p. 35
* RL: READ latency
-
Pre-fetch Diagram (DDR2)
Number of GIO channels = $4 \times N_{dq}$
Pre-fetch operation: 4-bit pre-fetch, [$4 \times N_{dq}$] data access
(If the output data rate is 800 Mbps, the internal data rate is 200 Mbps, same as DDR1)
[Figure: eight banks sharing the GIO]
-
Pre-fetch Timing (DDR3, BL = 8)
[Figure: timing diagram of CLK, two RD commands, GIO, DQS and DQ with tCCD = 4; each 8-bit burst (bits 0 to 7) appears on DQ after RL]
Number of GIO channels = $N_p \times N_{dq} = 8 \times 8 = 64$ (DDR3 x8)
[4] JEDEC, JESD79-3F, p. 62
-
Pre-fetch Diagram (DDR3)
Number of GIO channels = $8 \times N_{dq}$
Pre-fetch operation: 8-bit pre-fetch, [$8 \times N_{dq}$] data access
(If the output data rate is 1.6 Gbps, the internal data rate is 200 Mbps, same as DDR1)
[Figure: eight banks sharing the GIO]
-
Bank Grouping Timing (DDR4, BL = 8)
[Figure: timing diagram of CLK, RD commands to bank groups G0, G1 and G1 again, the per-group GIO buses (GIO_BG0 to GIO_BG3), DQS and DQ; tCCD_S = 4 between different bank groups and tCCD_L = 5 within the same bank group; each 8-bit burst (bits 0 to 7) appears on DQ after RL]
Number of GIO channels = $N_p \times N_{dq} \times N_{group} = 8 \times 8 \times 4 = 256$ (DDR4 x8)
[5] JEDEC, JESD79-4, pp. 77-78
[6] T. Y. Oh et al., ISSCC 2010, pp. 434-435
-
Pre-fetch & Bank Grouping (DDR4)
Number of GIO channels = $8 \times N_{dq}$
Pre-fetch operation: 8-bit pre-fetch with bank grouping
[Figure: banks arranged in Group0 to Group3 and connected through a GIO MUX]
[1] K. B. Koo et al., ISSCC 2012, pp. 40-41
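The GIO counts and internal data rates quoted on the pre-fetch slides follow from two small formulas. The sketch below is only an illustration of that arithmetic; the function names `gio_channels` and `internal_rate_mbps` are hypothetical, and the pin rates are the example values from the slides.

```python
def gio_channels(n_prefetch, n_dq, n_groups=1):
    """GIO line count: Np x Ndq, times Ngroup when banks are grouped."""
    return n_prefetch * n_dq * n_groups


def internal_rate_mbps(pin_rate_mbps, n_prefetch):
    """Rate of each GIO line when n_prefetch bits are fetched per access."""
    return pin_rate_mbps / n_prefetch


# x8 parts; pin rates are the examples quoted on the pre-fetch slides.
for name, n_prefetch, pin_rate in [("DDR1", 2, 400), ("DDR2", 4, 800), ("DDR3", 8, 1600)]:
    print(name, gio_channels(n_prefetch, 8), internal_rate_mbps(pin_rate, n_prefetch))

# DDR4 x8 with four bank groups
print("DDR4", gio_channels(8, 8, n_groups=4))
```

Each doubling of the pre-fetch depth doubles the pin rate while the core keeps running at the same 200 Mbps, which is why the GIO width grows from generation to generation.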
-
Differences of DDRx, GDDRx, LPDDRx
                 DDRx         GDDRx           LPDDRx
Application      PC/Server    Graphics card   Mobile/Consumer
Socket           DIMM         On board        MCP*/PoP*/SiP*
IO               4/8          16/32           16/32
Unique functions of GDDRx: single uni-directional WDQS and RDQS, VDDQ termination, CRC, DBI, ABI
Unique functions of LPDDRx: no DLL, DPD*, PASR*, TCSR*
[Figure: bank and pad (PAD) placement for each architecture]
* MCP: Multi chip package
* PoP: Package on package
* SiP: System in package
* DPD: Deep power down
* PASR: Partial array self refresh
* TCSR: Temperature compensated self refresh
-
DDR Comparison
                      DDR1        DDR2        DDR3        DDR4
VDD [V]               2.5         1.8         1.5         1.2
Data rate [bps/pin]   200M~400M   400M~800M   800M~2.1G   1.6G~3.2G
Pre-fetch             2 bit       4 bit       8 bit       8 bit
Interface             SSTL_2      SSTL_18     SSTL_15     POD_12
Strobe: single DQS (DDR1); differential DQS, DQSB (later generations)
New features, DDR2: OCD calibration, ODT
New features, DDR3: dynamic ODT, ZQ calibration, write leveling
New features, DDR4: CA parity, DBI*, CRC*, gear down, CAL*, PDA*, FGREF*, TCAR*, bank grouping
* DBI: Data bus inversion
* CRC: Cyclic redundancy check
* CAL: Command address latency
* PDA: Per DRAM addressability
* FGREF: Fine granularity refresh
* TCAR: Temperature controlled array refresh
-
GDDR Comparison
                      GDDR1      gDDR2     GDDR3       GDDR4       GDDR5
VDD [V]               2.5        1.8       1.5         1.5         1.5/1.35
Data rate [bps/pin]   300~900M   800M~1G   700M~2.6G   2.0G~3.0G   3.6G~7.0G
Pre-fetch             2 bit      4 bit     4 bit       8 bit       8 bit
Interface             SSTL_2     SSTL_2    POD-18      POD-15      POD-15
Strobe (in order of generation): single DQS; differential bi-directional DQS*, DQSB; single uni-directional WDQS, RDQS
New features (in order of generation): OCD* calibration, ODT*; ZQ; DBI, parity (optional); no DLL, PLL (optional), WCK/WCKB, CRC, ABI*, RDQS (optional), bank grouping
* DQS: DQ strobe signal (DQ is the data I/O pin)
* OCD: Off chip driver
* ODT: On die termination
* ABI: Address bus inversion
-
LPDDR Comparison
                      LPDDR1      LPDDR2         LPDDR3
VDD [V]               1.8         1.2            1.2
Data rate [bps/pin]   200M~400M   200M~1066M     333M~1600M
Pre-fetch             2 bit       4 bit          8 bit
Strobe                DQS         DQS_T, DQS_C   DQS_T, DQS_C
Interface             SSTL_18*    HSUL_12*       HSUL_12*
DLL                   X           X              X
New features                      CA pin         ODT (high tapped termination)
* SSTL: Stub series terminated logic
* HSUL: High speed un-terminated logic
-
Trend
[Figure: supply voltage VDD [V] versus data rate [Gbps] for DDR1 to DDR4, GDDR1, gDDR2, GDDR3 to GDDR5 and LPDDR1 to LPDDR3]
Although all types of DRAM are reaching their limits in supply voltage, the demand for high-bandwidth memory keeps increasing
-
Memory Interface
System features:
Single-ended, high speed
Many channels (vulnerable to coupling)
DDR: multi-drop (multi-rank, multi-DIMM)
GDDR: point to point
Impedance discontinuities (stubs, connectors, vias, etc.)
Issues:
Reflection
Inter-symbol interference
Simultaneous switching output noise
Pin-to-pin skew
Poor transistor performance
[Figure: a CPU connected to many DRAMs on multi-drop buses; a GPU connected point to point to a few DRAMs]
-
Outline
Introduction
Clock Generation and Distribution
Delay-locked loop (DLL)
Duty cycle corrector (DCC)
Clock distribution
Transceiver Design
TSV
Conclusions
References
-
Basic DLL Architecture
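[Figure: the external clock enters the DRAM through the input path (delay tD1) as I_CLK, passes the variable delay line (delay tDVDL) to become O_CLK, and clocks DATA from the memory core through the output path (delay tD2). A replica delay (tDREP) returns O_CLK as FB_CLK to the phase detector (PD), and the controller adjusts the variable delay line]
Lock condition: $N \cdot t_{CK} = t_{DVDL} + t_{DREP}$
Replica delay: $t_{DREP} \approx t_{D1} + t_{D2}$
With replica mismatch: $N \cdot t_{CK} = t_{DVDL} + t_{D1} + t_{D2} + \Delta$, where $\Delta = t_{DREP} - (t_{D1} + t_{D2})$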
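The lock relations above can be checked numerically. This is a minimal sketch, assuming all delays are integers in picoseconds and that the loop settles on the smallest N the delay line can reach; the names `dll_lock` and `replica_mismatch` and the example numbers are hypothetical.

```python
import math


def dll_lock(t_ck, t_drep, t_dvdl_min, t_dvdl_max):
    """Smallest N and matching tDVDL with N*tCK = tDVDL + tDREP (all in ps)."""
    n = max(1, math.ceil((t_dvdl_min + t_drep) / t_ck))
    t_dvdl = n * t_ck - t_drep
    if t_dvdl > t_dvdl_max:
        raise ValueError("delay line too short to lock at this tCK")
    return n, t_dvdl


def replica_mismatch(t_drep, t_d1, t_d2):
    """Residual error Delta = tDREP - (tD1 + tD2) left after lock."""
    return t_drep - (t_d1 + t_d2)


print(dll_lock(t_ck=1250, t_drep=900, t_dvdl_min=200, t_dvdl_max=1600))
print(dll_lock(t_ck=625, t_drep=900, t_dvdl_min=200, t_dvdl_max=1600))
print(replica_mismatch(t_drep=900, t_d1=400, t_d2=480))
```

At the shorter clock period the loop has to lock over two cycles (N = 2), and whatever mismatch remains between the replica and the real paths shows up directly as output timing error.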
-
Replica Delay Mismatch
[Figure: tDQSCK* (or tAC) variation [ps] versus supply voltage [V], from low VDD (LVDD) to high VDD (HVDD), for long and short tCK; the variation shifts the valid data window]
*tDQSCK (or tAC): DQS output access time from CK/CKb
-
Locking Range Considerations
[7] H.-W. Lee et al., submitted to TVLSI
[Figure: tDQSCK (or tAC) versus tCK from short to long tCK, showing a bird's beak; I_CLK and FB_CLK timing for the two cases of tDINIT + tDREP relative to tDREQUIRED]
$t_{DINIT} = t_{DVDL}(0) + t_{DREP}$
$N \cdot t_{CK} > t_{DVDL}(0) + t_{DREP}$
tCK = tDVDL + tDREP + t
-
Synchronous Mirror Delay (SMD)
Basic operation:
Measure and replicate the delay
No feedback
Match delay in two cycles
[Figure: the clock passes the input buffer (tD1) to become I_CLK, then a replica delay (tD1 + tD2), the measure delay line (tD3), the replicate delay line (tD3) and the output path (tD2) to OUT]
[8] T. Saeki et al., ISSCC 1996, pp. 374-375
-
Disadvantages of SMD
Mismatch between the replica delay and the input buffer and clock distribution
Coarse resolution
Input jitter multiplication
[Figure: SMD timing without and with jitter. The measured delay is tCK − (tD1 + tD2); an input peak-to-peak jitter of Δ becomes an output peak-to-peak jitter of 2Δ]
-
Register Controlled DLL
Locking information is stored digitally in a register
A Vernier-type delay line increases resolution
[9] A. Hatakeyama et al., ISSCC 1997, pp. 72-73
[Figure: Vernier delay line from IN to OUT with switches SW0 to SW4, built from a main delay line and a sub delay line; the unit delays are tD (fan-out = 1) and tD + Δ (fan-out = 2)]
-
Single Register Controlled Delay Line
[Figure: I_CLK feeds a coarse delay line of unit delays (tUD) selected by CSL1 to CSL3. Two taps, IN1 and IN2, one tUD apart, drive a phase mixer with weights 1−K and K whose output OUT12 lies between OUT1 and OUT2. The fine delay controller receives UP/DN* from the PD]
*DN = Down
-
Boundary Switching Problem
Phase mixer output: $OUT12 = IN1 \cdot (1-K) + IN2 \cdot K$
IN1 alone is selected at K = 0 and IN2 alone at K = 1
Coarse shift and fine reset do not occur simultaneously
[Figure: at K = 0.9, a shift left moves I_CLK from passing through 3 UDCs* to passing through 4 UDCs]
*UDC = Unit delay cell
-
Seamless Boundary Switching
Dual coarse delay line
Phase mixer output: $OUT12 = IN1 \cdot (1-K) + IN2 \cdot K$, with $0 \le K \le 1$
Fine set first, and then coarse shift
[Figure: starting from K = 0.9, the fine code is first set to K = 1.0, and only then is the coarse delay line shifted left by unit delay cells (tUD)]
[10] J.-T. Kwak et al., VLSI 2003, pp. 283-284
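The difference between the two switching orders can be illustrated with the mixer equation. The sketch below is an idealized model, not the circuit of [10]: it assumes a linear mixer, a 100 ps unit delay, and a fine code in tenths (K = `k_code` / 10); the function `mixer_delay_ps` is hypothetical.

```python
def mixer_delay_ps(taps_in1, taps_in2, k_code, t_ud_ps=100, k_steps=10):
    """Delay of OUT12 = IN1*(1-K) + IN2*K, with K = k_code / k_steps."""
    weighted_taps = (k_steps - k_code) * taps_in1 + k_code * taps_in2
    return weighted_taps * t_ud_ps // k_steps


# Single coarse line: IN1 = 3 UDCs, IN2 = 4 UDCs, K = 0.9.
before = mixer_delay_ps(3, 4, 9)
# Coarse shift happens before the fine code is reset: both taps move, K stays.
glitch = mixer_delay_ps(4, 5, 9)
print(before, glitch, glitch - before)  # jump of one full unit delay

# Dual coarse lines: fine set first (K = 1.0), so IN1 carries no weight...
fine_set = mixer_delay_ps(3, 4, 10)
# ...then IN1 is moved to 5 UDCs without disturbing the output...
coarse_shift = mixer_delay_ps(5, 4, 10)
# ...and the delay keeps growing by lowering K toward IN1 again.
next_step = mixer_delay_ps(5, 4, 9)
print(fine_set, coarse_shift, next_step)
```

In this model the output moves only in fine steps when the tap that is shifted has zero weight at the moment of the shift, which is the point of setting the fine code first.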
-
Adaptive Bandwidth DLL w/ SDVS*
[Figure: DLL with variable delay line, replica delay, PD and controller. An update period pulse generator, set by NCODE, produces the update pulse that times the controller; O_CLK goes to the upper block]
Update period: $m \cdot t_{CK} - t_{DREP} + t_{DREP} = m \cdot t_{CK}$; for $m = 2$, $BW_{DLL} = 1/(2\,t_{CK})$
[Plot: fine unit delay [ps] versus mode (DN, BASE, UP) from low-speed mode to high-speed mode, with values of 15.9 ps, 10.2 ps and 7.8 ps]
[11] H.-W. Lee et al., ISSCC 2011, pp. 502-504
*SDVS: Self-dynamic voltage scaling
-
Duty Cycle Corrector (DCC)
DCC:
Reduces duty cycle error
Enlarges valid data window for DDR
Needs to correct 15% duty error at max speed
Can be implemented in either analog or digital form
DCC design issues:
Location of DCC (before/after DLL)
Embedded in DLL or not
Power consumption
Area
Operating frequency range
Locking time in case of digital DCC
Offset of duty cycle detector
-
Digital DCC
[Figure: three digital DCC structures, each turning IN into an OUT with 50%/50% duty. (1) Invert-delay clock generator with phase mixer: IN is inverted and delayed, then mixed. (2) Pulse width controller with a duty cycle detector in feedback. (3) Half-cycle delayed clock generator with edge combiner: IN and the half-cycle delayed HD_IN are combined]
-
DCC in GDDR5
[Figure: WCK/WCKb enter the receiver (RX), followed by a divider, CML-to-CMOS conversion, PLL selection, a global driver, repeaters and the clock distribution network to the DQs (4-phase clocks). A duty cycle detector, adder-based counter and control pulse generator form the duty cycle corrector loop, which steers a duty-cycle adjuster (DCA) with X1/X2/X4/X8 weighted legs at the RX outputs rxclk/rxclkb]
DCA is not in the clock path
No jitter addition
[12] D. Shin et al., VLSI 2009, pp. 138-139
-
DLL-related Parameters & Reference
                  DDR1         DDR2         DDR3/DDR3L      GDDR3            GDDR4
VDD               2.5V         1.8V         1.5V/1.35V      1.8V             1.5V
Lock time         200 cycles   200 cycles   512 cycles      2~5K cycles      2~20K cycles
Max. tDQSCK       600ps        300ps        225ps           180ps            140ps
Nominal speed     166MHz       333MHz       333MHz~800MHz   600MHz~1.37GHz   1.6GHz
tXPDLL* (tXARD)   1tCK         2tCK         10tCK           7tCK+tIS         9tCK+tIS
Max. tCK          12ns         8ns          3.3ns           3.3ns            2.5ns
[Table: references [13] to [36] grouped by related area (DCC block, variable delay line, delay control logic, replica, low jitter) and by type (digital, *mixed, **analog)]
* tXPDLL (tXARD): timing for exit from precharge power-down to any non-READ command
-
Clock Distribution
[Figure: a global clock buffer distributes CK/CKB to the DQ blocks across the chip]
Clock distribution issues:
Clock skew among DQs
Low power
Robust under PVT variations
CML to CMOS converter jitter
[37] S.-J. Bae et al., ISSCC, 2011, pp. 498-500
-
CML to CMOS Converter
Global clock buffer:
Current mode logic (CML): high-speed clock
CML to CMOS converter issues:
Susceptible to noise
Jitter
[Figure: global clock buffer (inputs CLKP/CLKN, outputs OUTP/OUTN) driving, over a 1700 µm line, a CML to CMOS converter (inputs CLKP/CLKN, output CLKOUT) at the DQ]
-
Outline
Introduction
Clock Generation and Distribution
Transceiver Design
Channel
Pre-emphasis
Equalizer
Crosstalk and skew
Training
Input buffer
Output driver
DBI/CRC
TSV Interface for DRAM
Summary
References
[Figure: transmit side (output driver, training, pre-emphasis, DBI/CRC), channel (CH) and receive side (input buffer, training, equalizer, DBI/CRC)]
-
Channel Characteristics
GDDRx:
Point-to-point connection
Performance target: high data rate
Few reflection components: PCB vias
DDRx:
Multi-drop
Performance and power
Many reflection components: PCB vias, DIMM connector
[Figure: a GPU connected directly to GDDRx devices; a CPU socket connected to DIMM slots]
-
Emphasis for Channel Compensation
[Figure: in the time domain, the channel turns the original signal into a distorted signal, and an FFE between D(in) and D(out) compensates the distortion. In the frequency domain, the channel amplitude rolls off toward fdata/2, the FFE amplitude rises toward fdata/2, and the combined FFE + channel response is flattened up to fdata/2]
-
Pre-emphasis vs. De-emphasis
Pre-emphasis: transition bit boosting
De-emphasis: non-transition bit suppression
[Figure: waveforms versus time for 1-tap pre-emphasis, no emphasis and 1-tap de-emphasis]
-
Basic De-emphasis Circuit
The Number of Taps
Depends on the channel quality and bit rate
Usually from one to three taps
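[Figure: Din is delayed by one unit delay in a D flip-flop (D, Q, QB); the current bit X(n) is weighted by K0, the delayed bit by −K1, and the two are summed to give Dout, Y(n)]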
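The one-tap circuit above behaves like a two-coefficient FIR filter, Y(n) = K0·X(n) − K1·X(n−1). The sketch below illustrates that relation only; it assumes ±1 symbols, example weights K0 = 0.75 and K1 = 0.25, and a line that idles at the first symbol's level. The function name `de_emphasis` is hypothetical.

```python
def de_emphasis(bits, k0=0.75, k1=0.25):
    """Apply Y(n) = K0*X(n) - K1*X(n-1) to a bit sequence mapped to +/-1."""
    symbols = [1 if b else -1 for b in bits]
    previous = symbols[0]  # assumption: the line idles at the first level
    output = []
    for current in symbols:
        output.append(k0 * current - k1 * previous)
        previous = current
    return output


print(de_emphasis([0, 1, 1, 1, 0, 0]))
```

Transition bits keep the full swing (magnitude K0 + K1), while repeated bits are suppressed to K0 − K1, which is the de-emphasis behavior described on the previous slide.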
-
Pre-emphasis Circuit [1/2]
Cascaded pre-emphasis:
Internal nodes suffer ISI because of limited transistor performance at high speed
The internal node pre-emphasis ratio is not affected by the channel
Less sensitive to the system environment or channel variations
[38] K.-H. Kim et al., JSSC, Jan. 2006, pp. 127-134
[Figure: 4:2 and 2:1 multiplexer stages serialize Din(n), Din(n-1) and Din(n-2) into the driver and the pre-emphasis driver, which drive DQ/DQB. Waveforms of voltage [V] versus time [ps] compare no pre-emphasis, conventional pre-emphasis and proposed pre-emphasis, with annotated levels between 1.00 V and 1.20 V]
-
Pre-emphasis Circuit [2/2]
[39] H. Partovi et al., ISSCC, 2009, pp. 136-137
Voltage-mode driver pre-emphasis:
Additional zero by CC
Time-continuous pre-emphasis
[Figure: TX with pre-drivers, a main driver (RT) and a pre-emphasis driver coupled to Dout through the boosting capacitor (CC, RC), with pad capacitance CP, the channel (CH) and load CL; equivalent linear model with Din, RT, RC, CC, CP, CL and Dout]
-
Decision Feedback Equalization (DFE)
DFE cancels ISI without noise amplification
Clock must be provided by a DLL or PLL
Critical path (feedback path) is important
[Figure: amplitude versus time. (A) transmitted pulse of 1 UI, (B) received pulse with ISI, (C) emulated ISI, (D) result with no ISI]
[40] Y. Hidaka, CMOS Emerging Technologies Workshop, May 2010
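The four panels can be reproduced with a toy one-tap model. This is a sketch under simplifying assumptions, not the circuit of any cited paper: symbols are ±1, the channel adds a single post-cursor of weight `isi` (0.5 here), there is no noise, and no symbol precedes the first bit. The names `channel` and `dfe` are hypothetical.

```python
def channel(bits, isi=0.5):
    """Received samples r(n) = s(n) + isi * s(n-1) for +/-1 symbols."""
    symbols = [1 if b else -1 for b in bits]
    received, previous = [], 0  # assumption: nothing precedes the first bit
    for s in symbols:
        received.append(s + isi * previous)
        previous = s
    return received


def dfe(received, isi=0.5):
    """Subtract the ISI emulated from the previous decision, then slice."""
    corrected, decisions, previous = [], [], 0
    for r in received:
        sample = r - isi * previous      # cancel emulated post-cursor ISI
        decision = 1 if sample > 0 else -1
        corrected.append(sample)
        decisions.append(1 if decision > 0 else 0)
        previous = decision              # must be ready within 1 UI
    return corrected, decisions


rx = channel([1, 0, 0, 1, 1, 0])
print(rx)
print(dfe(rx))
```

Because the correction is built from already-decided bits rather than from the noisy input, it removes the ISI without amplifying noise, but each decision has to be fed back before the next sample is taken.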
-
[41] S.-J. Bae et al., ISSCC, 2008, pp. 278-279
The previously captured data must be fed back to the receiver within 1 UI
[Figure: four DFE sense amplifiers (DFE SA) compare DQ with Vref on the clocks WCK/2_0, WCK/2_90, WCK/2_180 and WCK/2_270. Their outputs P0, P90, P180 and P270 drive SR latches that produce D0, D90, D180 and D270, and the SA clocked by WCK/2_0 receives P270/P270b from the preceding phase. The timing diagram shows alternating precharge and evaluation phases, with TFB = TSA]
-
Crosstalk
Crosstalk is coupling of energy from one line to another
Timing effect
Timing jitter
Signal integrity
[Figure: an input signal on one line induces near-end and far-end crosstalk on its neighbor through the mutual capacitance Cm (current ICm) and the mutual inductance Lm (current ILm)]
$I_{near} = I_{Cm} + I_{Lm}$
$I_{far} = I_{Cm} - I_{Lm}$
-
Staggered Memory Bus
No discrepancy of propagation delay due to the crosstalk
Difference of transition point is /2
Distance between channels with the same transition is increased
Jitter due to coupling from the adjacent channel is reduced
[42] K.-I. Oh et al., JSSC, Aug. 2009, pp. 2222-2232
[Figure: MCU and DRAM connected by the channels of a staggered memory bus]
-
Glitch Canceller
Compensation for glitch by adding or subtracting current
Rise: ICOMP is added to the main driver
Fall: ICOMP is subtracted from the main driver
[42] K.-I. Oh et al., JSSC, Aug. 2009, pp. 2222-2232
[Figure: drivers TX1 to TX3 with data DTX1 to DTX3 and a transition detector; a rise or fall on the aggressor selects whether ICOMP is added to IBIAS in the victim driver]
-
Crosstalk Equalizer (TX)
Crosstalk equalization at transmitter
Cancel the crosstalk by the impedance calibration
[37] S.-J. Bae et al., ISSCC, Feb. 2011, pp. 498-500
[Figure: crosstalk equalizing driver for DQ[0]; enable signals EN[0:5], derived from DO[0] and the neighboring data DO[1:3], adjust the driver over time]
-
Skew
Differences of flight time between signals
Skew can cause timing errors
Key design criterion in high-speed systems
[Figure: the MCU/GPU drives CLK, command, address, DQS and DQ to the DRAM (banks, peripheral circuit, DLL, CMD controller, serial/parallel conversion), each signal with its own flight time TD]
-
Pre/De-skew with Preamble Signal
Skew cancellation circuit is put in each DRAM
With estimated skew information:
De-skew the data during write mode
Pre-skew the data during read mode
[43] S. H. Wang et al., JSSC, Apr. 2001, pp. 648-657
[Figure: skewed data passes through data delay lines to become de-skewed data; a skew estimator, register files, a PLL and a mux derive the sampling clock from the external clock]
-
Fly-by Topology for DDR3
[4] JEDEC, JESD79-3E, pp. 56-59
Fly-by topology:
Better signal integrity by reducing the number of stubs and the stub length
Easy to apply a single termination at the end of the signal line
DQ and DQS are applied to each DRAM at the same time
Large skew between CLK and DQS
Need to calibrate skew
T-branch topology:
CLK/CMD/Address are applied to each DRAM in parallel
Small skew between CLK and DQS
[Figure: T-branch routing of CLK, CMD and address to DRAM #1 to #8 compared with fly-by routing terminated to VTT, with DQ & DQS routed to each DRAM; plots of skew [s] for DRAM #1 to DRAM #8 in each topology]
-
Write Leveling for DDR3
Write leveling:
Timing mismatch compensation between CLK and DQS
Write leveling is applied to each DRAM individually
[4] JEDEC, JESD79-3F, pp. 56-59
[Figure: CK/CK# and diff_DQS at the source and at the destination over T0 to T7. The DRAM returns the sampled CK level (0 or 1) on DQ, and the controller pushes DQS until it captures the 0-to-1 transition]
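The search performed during write leveling can be sketched as a simple delay sweep. The model below is an assumption-laden illustration, not the JEDEC procedure: CK is an ideal 50% duty clock rising at t = 0, times are integer picoseconds, and the flight times, step size and function names (`ck_sampled_by_dqs`, `write_level`) are hypothetical.

```python
def ck_sampled_by_dqs(dqs_delay, ck_flight, dqs_flight, t_ck):
    """CK level seen at the DRAM when the delayed DQS rising edge arrives."""
    phase = (dqs_flight + dqs_delay - ck_flight) % t_ck
    return 1 if phase < t_ck // 2 else 0


def write_level(ck_flight, dqs_flight, t_ck, step):
    """Push DQS later in fixed steps until the feedback goes from 0 to 1."""
    previous = ck_sampled_by_dqs(0, ck_flight, dqs_flight, t_ck)
    delay = step
    while delay <= 2 * t_ck:
        current = ck_sampled_by_dqs(delay, ck_flight, dqs_flight, t_ck)
        if previous == 0 and current == 1:
            return delay  # DQS edge now lines up with the CK rising edge
        previous = current
        delay += step
    raise RuntimeError("no 0-to-1 transition found")


# Fly-by CK arrives 600 ps later than the directly routed DQS.
print(write_level(ck_flight=900, dqs_flight=300, t_ck=1250, step=10))
```

In a fly-by system the controller repeats this search per DRAM, because each device sees a different CK flight time.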
-
Training for GDDR5
Adaptive interface training:
Ensures the widest timing margins for all signals
Controlled by MCU
[44] W. Hubert et al., ATS, 2008, pp. 24-27
[Figure: GDDR5 timing after training for CK, CMD, ADDR, WCK and DQ]
-
Training Sequence for GDDR5
Power up: detect the configuration and mirror function, ODT setting
Address training (optional): optimize address input data eye
WCK-to-CK alignment training: clock alignment
READ training: search for best read data eye, detect burst boundaries of read stream
WRITE training: search for best write data eye, detect burst boundaries of write stream
Exit: ready for read/write
[45] JEDEC, JESD212, pp. 23-39
-
Training Example: Write Training
[44] W. Hubert et al., ATS, 2008, pp. 24-27
[Figure: write data eyes between the memory controller and the GDDR5 device, before and after training. The data eyes reach the device with offsets t1 and t2 around t0; after training the controller launches them at t0 + t1, t0 and t0 − t2 so that all data eyes align at t0 in the device]
-
Input Buffer
Converts the attenuated external signal to a rail-to-rail signal
Trade-off between high-speed operation and power consumption
[Figure: the MCU/GPU drives CLK, command, address (m* channels), DQS and DQ (n channels) into the DRAM (banks, peripheral circuit, DLL, CMD controller, serial/parallel conversion)]
* m: the number of address channels, which depends on the kind of memory and its density
-
Input Buffer Comparison
CMOS Type
Simple circuit
Low-speed input (CKE)
Susceptible to noise
Unstable threshold
Differential Type
Complex circuit
High-speed input
Robust to noise
Stable threshold
Commonly used
[Figure: CMOS-type buffer with In, En and OUT; differential-type buffer with In, Vref, En and OUT]
-
DDR4 Input Buffer
[46] K. Sohn et al., ISSCC, 2012, pp. 38-40
Gain enhanced buffer:
Signal transition detector is added
The bias level (I) is controlled
Sensitivity can be enhanced at higher frequencies
Wide common-mode range DQ buffer:
Delivers stable inputs to the second-stage amplifier
Feedback network reduces the output common-mode variation
[Figure: gain enhanced buffer with In, Vref, a transition detector and controlled bias current I; wide common-mode range buffer with In, Vref, CMFB* and amplifier]
* CMFB: Common-mode feedback
-
Pseudo Open Drain (POD)
Impedance calibration:
Manual vs. automatic
External resistor: 240 Ω
[Figure: I/O buffer with pull-up and pull-down legs driven by Din, connected to the channel]
-
Impedance Calibration
Thermometer code control
[Figure: the ZQ pad connects an external resistor to pull-up (PU) and pull-down (PD) replica legs inside the DRAM; comparators against Vref update registers (REG) that hold the n-bit codes PUcon and PDcon. The output driver combines Din with PUcon and PDcon to switch equal legs (WP, WN, R) onto Dout]
[47] C. Park et al., JSSC, Apr. 2006, pp. 831-838
-
Multi Slew-rate Output Driver
Binary-weighted code control
DF = digital LPF + UP/DOWN counter
[Figure: the ZQ calibration loop with digital filters (DF) producing the n-bit codes PUcon and PDcon; binary-weighted legs from WP/4 with 128R up to 32WP with R (and WN/4 up to 32WN); output impedance settings of 60, 120 and 240 Ω]
[48] D. U. Lee et al., ISSCC, 2008, pp. 280-613
-
Global ZQ Calibration
Global impedance mismatch error < 1%
PVT variation sensor
[Figure: the ODT calibration block at the ZQ pin generates a global reference signal; DQ0 and each DQn (n = 1~31) have a local PVT sensor (LS), pre-amplifier (PA), comparator (CP) and local controller (LO)]
CP: Comparator
PA: Pre-amplifier
LS: Local PVT sensor
LO: Local controller
[49] J. Koo et al., CICC, 2009, pp. 717-720
-
Data Bus Inversion (DBI)
Power reduction technique independent of data pattern
Dominant power (I/O buffer): $P = \alpha \times C_{PCB} \times V_{DD}^2$, with $\alpha < 0.5$
For high-BW memory, inversion time + CRC can be a bottleneck
[50] S.-S. Yoon et al., ASSCC 2008, pp.249-252
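As an illustration of the idea, the sketch below implements one common flavor, DBI-DC, for a single byte. It rests on assumptions that are not stated on the slide: the bus is terminated to VDDQ so that driving a 0 costs power, and a byte is inverted when it contains more than four zeros. The function names are hypothetical and the flag polarity is simplified to a boolean.

```python
def dbi_dc_encode(byte):
    """Return (byte_on_bus, inverted) so that at most four 0s are driven."""
    byte &= 0xFF
    zeros = 8 - bin(byte).count("1")
    if zeros > 4:
        return (~byte) & 0xFF, True
    return byte, False


def dbi_dc_decode(byte_on_bus, inverted):
    """Undo the inversion using the DBI flag sent alongside the byte."""
    return (~byte_on_bus) & 0xFF if inverted else byte_on_bus & 0xFF


for value in (0x00, 0x0F, 0x01, 0xFF):
    on_bus, flag = dbi_dc_encode(value)
    print(hex(value), hex(on_bus), flag, hex(dbi_dc_decode(on_bus, flag)))
```

The zero count and conditional inversion sit in the data path, which is why the slide lists inversion time as a potential bottleneck at high bandwidth.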
-
Cyclic Redundancy Check (CRC)
Data error check for every unit interval (64 bits data only)
Redundancy bit: 1 bit/byte
Speed bottleneck for high-BW
Time (READ DBI + READ CRC + CRC calculator) < 9 periods
[50] S.-S. Yoon et al., ASSCC 2008, pp. 249-252
Error type          Detection rate
random single bit   100%
random double bit   100%
random odd count    100%
burst ≤ 8           100%
-
CRC (cont'd)
Polynomial: $X^8 + X^2 + X^1 + 1$ with an initial value of 0
Algorithm for GDDR5: ATM-0x83
Logic for the algorithm takes a long time
To increase CRC speed: XOR logic optimization
CRC calculation time < TCRC
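A bit-serial software model makes the polynomial concrete. This is a generic CRC-8 sketch with polynomial X^8 + X^2 + X + 1 (0x07) and initial value 0; the MSB-first byte ordering is an assumption and does not reproduce the GDDR5 mapping of burst bits to CRC inputs. The function name `crc8_atm` is hypothetical.

```python
def crc8_atm(data, init=0x00):
    """CRC-8 with polynomial x^8 + x^2 + x + 1, processed MSB first."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ 0x07) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


payload = bytes(range(8))              # 64 data bits
checksum = crc8_atm(payload)
corrupted = bytes([payload[0] ^ 0x10]) + payload[1:]
print(hex(checksum), crc8_atm(corrupted) != checksum)
```

Hardware does not iterate bit by bit: each CRC output bit is a fixed XOR of input bits, and the depth of that XOR tree is what the XOR logic optimization on the slide is meant to reduce.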
-
Outline
Introduction
Clock Generation and Distribution
Transceiver Design
TSV Interface for DRAM
Bandwidth requirement
DRAM with TSV
TSV DRAM type
DRAM stacking type
Data confliction issue & solution
Failed TSV issue & solution
Summary
References
-
Bandwidth Requirements
Requirement:
Next GDDR will require over 10 Gb/s/pin data rate
Restrictions:
Very difficult over 10 Gb/s/pin
Cost for performance improvements
Power consumption
[Figure: DDRx / GDDRx data rate/pin trend, data rate/pin [Gbps] versus year from 2000 to 2010, for DDR, DDR2, DDR3, DDR4, GDDR3, GDDR4 and GDDR5]
          Gb/s/chip   Gb/s/pin
GDDR1     32          1
GDDR3     51.2        1.6
GDDR4     102.4       3.2
GDDR5     224         7
GDDR?     448 (?)     14 (?)
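The per-chip figures in the table are the per-pin rate multiplied by the I/O count. The sketch below assumes 32 I/Os per chip, which is inferred from the ratio of the two columns rather than stated on the slide; the function name `chip_bandwidth_gbps` is hypothetical.

```python
import math


def chip_bandwidth_gbps(pin_rate_mbps, n_io=32):
    """Aggregate bandwidth in Gb/s for n_io pins at pin_rate_mbps each."""
    return pin_rate_mbps * n_io / 1000


for name, rate_mbps in [("GDDR1", 1000), ("GDDR3", 1600), ("GDDR4", 3200), ("GDDR5", 7000)]:
    print(name, chip_bandwidth_gbps(rate_mbps))

# A wide interface reaches the same aggregate figure at a far lower pin rate.
print(chip_bandwidth_gbps(875, n_io=512))
print(math.isclose(chip_bandwidth_gbps(14000), 448.0))
```

The same product explains the appeal of wide I/O: multiplying the pin count relaxes the per-pin rate needed for a given bandwidth.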
-
DRAM with TSV
Advantages of DRAM with TSV
Higher density per area
Shorter interconnection : lower power, faster flight time
Higher bandwidth with wide I/O
Wide I/O easily approaches the 448 Gb/s/chip expected of the next GDDR
(Example: 800 Mb/s/pin × 512 I/O ≈ 410 Gb/s/chip)
[Figure: an MCU/GPU stacked with wide I/O memory through TSVs; an MCU/GPU and stacked memories side by side on an interposer]
-
TSV DRAM Type
Type          Main memory   Mobile         Graphics
No. of TSVs   500~1000      1000~1500      2000~3000
Features, main memory: low power, high speed
Features, mobile: low power, multi-channel, wide I/O
Features, graphics: max bandwidth, multi-channel
[Figure: architecture and package for each type; the graphics package places the GPU and the memory stack with its controller on an interposer]
-
Stacking Type
Type: homogeneous, heterogeneous
Homogeneous: same chips
Heterogeneous: low cost; slave chips contain only cells, the master chip contains the peripheral circuits
[Figure: stack of three slave chips on one master chip]
-
Data Confliction Issue
PVT variations cause the data skew
Data confliction increases the short-circuit current
[Figure: DQ timing of the slowest and fastest chips under PVT variations, with data confliction where they overlap. On the shared DQ pin/TSV, the DQ driver of CHIP 0 (MP0/MN0, enabled by EN0 and /EN0) drives HIGH while the DQ driver of CHIP 3 (MP3/MN3, enabled by EN3 and /EN3) drives LOW]
[51] H.-W. Lee et al., ISSCC, 2012, pp. 48-50
-
Separate Data Bus per Group
Separate data bus per bank group
Less dependent on the PVT variation
[Figure: four stacked ranks, Rank 0 to Rank 3, each with bank Group A and Group B and a separate TSV array per group]
[52] U. Kang et al., ISSCC, 2009, pp. 130-131
-
DLL-Based Self-Aligner
Data alignment to the external clock or to the clock of the slowest chip
[51] H.-W. Lee et al., ISSCC, 2012, pp. 48-50
[Figure: in each of CHIP 0 to CHIP 3, a skew detector (PD1, PD2), skew compensator and fine aligner, with a replica and a TSV model in the feedback paths (TFBCLK, RFBCLK), align CLKOUT to CK or TRCLK according to MODE; pipe latches then output the aligned data through the TSVs to the DQS or dummy pin]
-
Failed TSV Issue
Failed TSV:
Decreasing the assembly yield
Increasing the total cost
[Figure: (a) TSV plating defect, (b) pinch-off]
[53] D. Malta et al., ECTC, 2010, pp. 1779-1775
-
TSV Check
A TSV connectivity check using internal circuits
Test signal generating circuits
Scan chain based testing circuits
[Figure: TSV_0 to TSV_4 between the sender end (In_0 to In_4) and the receiver end (Out_0 to Out_4)]
[54] A.-C. Hsieh et al., TVLSI, Apr. 2012, pp. 711-722
-
TSV Repair
Redundant TSVs for Failed TSV
Conventional: redundant TSVs are dedicated and fixed
Proposed: failed TSV is repaired with a neighboring TSV
[Figure: Chip1-to-Chip2 TSV connections for signals A to D. Conventional: TSVs a, b, c, d plus dedicated redundant TSVs r1 and r2. Proposed: TSVs a, b, c, d, e, f]
[52] U. Kang et al., ISSCC, 2009, pp. 130-131
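The difference between the two repair styles can be illustrated with a small mapping model. This is a sketch under assumptions, not the circuit of [52]: the function names, the rule that signals shift in order onto the next healthy TSV, and the choice of which TSV fails are all hypothetical; only the signal and TSV labels follow the slide.

```python
from typing import Dict, List, Sequence, Set


def repair_dedicated(signals: Sequence[str], tsvs: Sequence[str],
                     spares: List[str], failed: Set[str]) -> Dict[str, str]:
    """Conventional style: each signal owns a fixed TSV, and a failed
    TSV is rerouted to one of the dedicated redundant TSVs."""
    free = [s for s in spares if s not in failed]
    mapping = {}
    for sig, tsv in zip(signals, tsvs):
        if tsv in failed:
            if not free:
                raise ValueError("no redundant TSV left")
            tsv = free.pop(0)
        mapping[sig] = tsv
    return mapping


def repair_neighbor_shift(signals: Sequence[str], tsvs: Sequence[str],
                          failed: Set[str]) -> Dict[str, str]:
    """Neighbor style (assumed rule): skip failed TSVs and let the
    signals shift, in order, onto the next healthy TSV."""
    healthy = [t for t in tsvs if t not in failed]
    if len(healthy) < len(signals):
        raise ValueError("not enough healthy TSVs")
    return dict(zip(signals, healthy))


print(repair_dedicated("ABCD", "abcd", ["r1", "r2"], {"b"}))
print(repair_neighbor_shift("ABCD", "abcdef", {"b"}))
```

In the dedicated case only signal B moves, but its new route may be far from the original one; in the shifted case several signals move, each by at most one neighboring position per failed TSV.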
-
Summary
Although all types of DRAM are reaching their limits in supply voltage, the demand for high-bandwidth memory keeps increasing
For synchronizing the external clock and the DRAM output, low power, small area, and low skew are important design parameters
To achieve high-BW memory, many design techniques have been and will be adopted from other high-speed wireline transceivers
A TSV interface for DRAM might be a good solution to achieve high bandwidth and low power
-
Suggested Papers to See
17.1 A 6.4Gb/s near-ground single-ended transceiver for dual-rank DIMM memory interface systems
17.2 A 27% reduction in transceiver power for single-ended point-to-point DRAM interface with the termination resistance of 4Z0 at both TX and RX
17.3 A 5.7mW/Gb/s 24-to-240Ω 1.6Gb/s thin-oxide DDR transmitter with 1.9-to-7.6V/ns clock-feathering slew-rate control in 22nm CMOS
17.4 An adaptive-bandwidth PLL for avoiding noise interference and DFE-less fast precharge sampling for over 10Gb/s/pin graphics DRAM interface
-
References
[1] K. Koo et al., "A 1.2V 38nm 2.4Gb/s/pin 2Gb DDR4 SDRAM with bank group and ×4 half-page architecture," in IEEE ISSCC Dig. Tech. Papers, pp. 40-41, 2012.
[2] JEDEC, JESD79F.
[3] JEDEC, JESD79-2F.
[4] JEDEC, JESD79-3F.
[5] JEDEC, JESD79-4.
[6] T.-Y. Oh et al., "A 7Gb/s/pin GDDR5 SDRAM with 2.5ns bank-to-bank active time and no bank-group restriction," in IEEE ISSCC Dig. Tech. Papers, pp. 434-435, 2010.
[7] H.-W. Lee et al., "Survey and analysis of delay-locked loops used in DRAM interfaces," submitted to IEEE Trans. VLSI Syst.
[8] T. Saeki et al., "A 2.5 ns clock access 250 MHz 256 Mb SDRAM with a synchronous mirror delay," in IEEE ISSCC Dig. Tech. Papers, pp. 374-375, 1996.
[9] A. Hatakeyama et al., "A 256 Mb SDRAM using a register-controlled digital DLL," in IEEE ISSCC Dig. Tech. Papers, pp. 72-73, 1997.
[10] J.-T. Kwak et al., "A low cost high performance register-controlled digital DLL for 1Gbps x32 DDR SDRAM," in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 283-284, 2003.
[11] H.-W. Lee et al., "A 1.6V 1.4Gb/s/pin consumer DRAM with self-dynamic voltage-scaling technique in 44nm CMOS technology," in IEEE ISSCC Dig. Tech. Papers, pp. 502-504, 2011.
[12] D. Shin et al., "Wide-range fast-lock duty-cycle corrector with offset-tolerant duty-cycle detection scheme for 54nm 7Gb/s GDDR5 DRAM interface," in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 138-139, 2009.
[13] W.-J. Yun et al., "A 3.57 Gb/s/pin low jitter all-digital DLL with dual DCC circuit for GDDR3 DRAM in 54-nm CMOS technology," IEEE Trans. VLSI Syst., vol. 19, no. 9, pp. 1718-1722, Nov. 2011.
[14] H.W. Lee et al., "A 7.7mW/1.0ns/1.35V delay locked loop with racing mode and OA-DCC for DRAM interface," in Proc. of Int. Symp. Circuits and Syst., pp. 3861-3864, 2010.
[15] B.-G. Kim et al., "A DLL with jitter reduction techniques and quadrature phase generation for DRAM interfaces," IEEE J. Solid-State Circuits, vol. 44, no. 5, pp. 1522-1530, May 2009.
[16] W.J. Yun et al., "A 0.1-to-1.5GHz 4.2mW all-digital DLL with dual duty-cycle correction circuit and update gear circuit for DRAM in 66nm CMOS Technology," in IEEE ISSCC Dig. Tech. Papers, pp. 282-283, 2008.
[17] S. Kim et al., "A low jitter, fast recoverable, fully analog DLL using tracking ADC for high speed and low stand-by power DDR I/O interface," in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 285-286, 2003.
[18] T. Matano et al., "A 1-Gb/s/pin 512-Mb DDRII SDRAM using a digital DLL and a slew-rate-controlled output buffer," IEEE J. Solid-State Circuits, vol. 38, no. 5, pp. 762-768, May 2003.
[19] K.-H. Kim et al., "Built-in duty cycle corrector using coded phase blending scheme for DDR/DDR2 synchronous DRAM application," in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 287-288, 2003.
[20] J.-T. Kwak et al., "A low cost high performance register-controlled digital DLL for 1 Gbps x32 DDR SDRAM," in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 283-284, 2003.
[21] O. Okuda et al., "A 66-400 MHz, adaptive-lock-mode DLL circuit with duty-cycle error correction [for SDRAMs]," in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 37-38, 2001.
[22] F. Lin et al., "A wide-range mixed-mode DLL for a combination 512 Mb 2.0 Gb/s/pin GDDR3 and 2.5 Gb/s/pin GDDR4 SDRAM," IEEE J. Solid-State Circuits, vol. 43, no. 3, pp. 631-641, Mar. 2008.
[23] K.-W. Kim et al., "A 1.5-V 3.2 Gb/s/pin Graphic DDR4 SDRAM With dual-clock system, four-phase input strobing, and low-jitter fully analog DLL," IEEE J. Solid-State Circuits, vol. 42, no. 11, pp. 2369-2377, Nov. 2007.
[24] D.U. Lee et al., "A 2.5Gb/s/pin 256Mb GDDR3 SDRAM with series pipelined CAS latency control and dual-loop digital DLL," in IEEE ISSCC Dig. Tech. Papers, pp. 547-548, 2006.
[25] S.J. Bae et al., "A 3Gb/s 8b single-ended transceiver for 4-drop DRAM interface with digital calibration of equalization skew and offset coefficients," in IEEE ISSCC Dig. Tech. Papers, pp. 520-521, 2005.
[26] Y.-J. Jeon et al., "A 66-333-MHz 12-mW register-controlled DLL with a single delay line and adaptive-duty-cycle clock dividers for production DDR SDRAMs," IEEE J. Solid-State Circuits, vol. 39, no. 11, pp. 2087-2092, Nov. 2004.
[27] T. Hamamoto et al., "A 667-Mb/s operating digital DLL architecture for 512-Mb DDR," IEEE J. Solid-State Circuits, vol. 39, no. 1, pp. 194-206, Jan. 2004.
[28] S. Kim et al., "A low-jitter wide-range skew-calibrated dual-loop DLL using antifuse circuitry for high-speed DRAM," IEEE J. Solid-State Circuits, vol. 37, no. 6, pp. 726-734, Jun. 2002.
[29] J.B. Lee et al., "Digitally-controlled DLL and I/O circuits for 500 Mb/s/pin x16 DDR SDRAM," in IEEE ISSCC Dig. Tech. Papers, pp. 68-69, 2001.
[30] S. Kuge et al., "A 0.18um 256-Mb DDR-SDRAM with low-cost post-mold tuning method for DLL replica," IEEE J. Solid-State Circuits, vol. 35, no. 11, pp. 726-734, Nov. 2000.
[31] H.W. Lee et al., "A 1.6V 1.4Gb/s/pin consumer DRAM with self-dynamic voltage-scaling technique in 44nm CMOS technology," IEEE J. Solid-State Circuits, vol. 47, no. 1, pp. 131-140, Jan. 2012.
[32] Y. K. Kim et al., "A 1.5V, 1.6Gb/s/pin, 1Gb DDR3 SDRAM with an address queuing scheme and bang-bang jitter reduced DLL scheme," in IEEE Symp. VLSI Dig. Tech. Papers, pp. 182-183, 2007.
[33] K.H. Kim et al., "A 1.4 Gb/s DLL using 2nd order charge-pump scheme with low phase/duty error for high-speed DRAM application," in IEEE ISSCC Dig. Tech. Papers, pp. 213-214, 2004.
[34] J.H. Lee et al., "A 330 MHz low-jitter and fast-locking direct skew compensation DLL," in IEEE ISSCC Dig. Tech. Papers, pp. 352-353, 2000.
[35] J. Kim et al., "A low-jitter mixed-mode DLL for high-speed DRAM applications," IEEE J. Solid-State Circuits, vol. 35, no. 10, pp. 1430-1436, Oct. 2000.
[36] H.W. Lee et al., "A 1.6V 3.3Gb/s GDDR3 DRAM with dual-mode phase- and delay-locked loop using power-noise management with unregulated power supply in 54nm CMOS," in IEEE ISSCC Dig. Tech. Papers, pp. 140-141, 2009.
[37] S.-J. Bae et al., "A 40nm 2Gb 7Gb/s/pin GDDR5 SDRAM with a Programmable DQ Ordering Crosstalk Equalizer and Adjustable clock-Tracing BW," in IEEE ISSCC Dig. Tech. Papers, pp. 498-500, 2011.
[38] K.-h. Kim et al., "A 20-Gb/s 256-Mb DRAM with an inductorless quadrature PLL and a cascaded pre-emphasis transmitter," IEEE J. Solid-State Circuits, vol. 41, no. 1, pp. 127-134, Jan. 2006.
[39] H. Partovi et al., "Single-ended transceiver design techniques for 5.33Gb/s graphics applications," in IEEE ISSCC Dig. Tech. Papers, pp. 136-137, 2009.
[40] Y. Hidaka, "Sign-based-Zero-Forcing Adaptive Equalizer Control," in CMOS Emerging Technologies Workshop, May 2010.
[41] S.-J. Bae et al., "A 60nm 6Gb/s/pin GDDR5 graphics DRAM with multifaceted clocking and ISI/SSN-reduction techniques," in IEEE ISSCC Dig. Tech. Papers, pp. 278-279, 2008.
[42] K.-I. Oh et al., "A 5-Gb/s/pin transceiver for DDR memory interface with a crosstalk suppression scheme," IEEE J. Solid-State Circuits, vol. 44, no. 8, pp. 2222-2232, Aug. 2009.
[43] S. H. Wang et al., "A 500-Mb/s quadruple data rate SDRAM interface using a skew cancellation technique," IEEE J. Solid-State Circuits, vol. 36, no. 4, pp. 648-657, Apr. 2001.
[44] W. Hubert et al., "GDDR5 training-challenges and solution for ATE-based test," in Asian Test Symposium, pp. 24-27, Nov. 2008.
[45] JEDEC, JESD212.
[46] K. Sohn et al., "A 1.2V 30nm 3.2Gb/s/pin 4Gb DDR4 SDRAM with dual-error detection and PVT-tolerant data-fetch scheme," in IEEE ISSCC Dig. Tech. Papers, pp. 38-40, 2012.
[47] C. Park et al., "A 512-mb DDR3 SDRAM prototype with CIO minimization and self-calibration techniques," IEEE J. Solid-State Circuits, vol. 41, no. 4, pp. 831-838, Apr. 2006.
[48] D. Lee et al., "Multi-slew-rate output driver and optimized impedance-calibration circuit for 66nm 3.0Gb/s/pin DRAM interface," in IEEE ISSCC Dig. Tech. Papers, pp. 280-613, 2008.
[49] J. Koo et al., "Small-area high-accuracy ODT/OCD by calibration of global on-chip for 512M GDDR5 application," in Proc. IEEE CICC, pp. 717-720, Sep. 2009.
[50] S.-S. Yoon et al., "A fast GDDR5 read CRC calculation circuit with read DBI operation," in IEEE Asian Solid-State Circuits Conference, pp. 249-252, 2008.
[51] H.-W. Lee et al., "A 283.2μW 800Mb/s/pin DLL-based data self-aligner for through silicon via (TSV) interface," in IEEE ISSCC Dig. Tech. Papers, pp. 48-50, 2012.
[52] U. Kang et al., "8Gb 3D DDR3 DRAM using through-silicon-via technology," in IEEE ISSCC Dig. Tech. Papers, pp. 130-131, 2009.
[53] D. Malta et al., "Integrated process for defect-free copper plating and chemical-mechanical polishing of through-silicon vias for 3D interconnects," in ECTC, pp. 1769-1775, 2010.
[54] A.-C. Hsieh et al., "TSV redundancy: architecture and design issues in 3-D IC," IEEE Trans. VLSI Syst., pp. 711-722, Apr. 2012.