Victor Cai, Marketing Manager, 8/20/2011
Impact of Memory
Application Requirements
What do users want?
― Zero latency
― Infinite bandwidth
― Zero power
― Zero cost
― Infinite capacity
― Backward compatibility, future extensibility
― Programmable access granularity
― Absolute reliability
More realistic user expectations
― Better performance at lower cost
How to get “Better Performance”
Higher Capacity
― In-memory databases
  ● More capability
― OLTP databases
  ● Less disk access, lower average latency
― Consolidated (virtual) servers
  ● More emulated servers in a single system = lower response latency
Higher Bandwidth
― “Typical” scientific code
  ● Higher throughput
Lower Latency
― Single-threaded code
Low Power
― Enables large-capacity applications
― Enables mobile/battery-based applications
Memory System Selection Criteria
No single answer defines “better performance”
― Sometimes lower power, lower data rate, and longer latency is “better performance”
― Sometimes lower data rate, longer latency, and higher capacity is “better performance”
― Sometimes higher data rate, same latency, and same capacity is “better performance”
JEDEC member companies drive future specification requirements based on their own respective application sets
The specification “frames” scaling decisions for near-future designs
Synchronous DRAM (SDRAM)
Moved from a signal interface to a command interface
― Enables programmable, pipelined operations
― Consecutive data phases in a single burst, designed for CPU cacheline fill (see the sketch below)
Multiple banks per device
― Facilitates pipelining; different banks may be in different phases of row-cycle operation
{DDR3, GDDR5, LPDDR2} are all direct evolutionary descendants of SDRAM
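As a minimal sketch of the cacheline-fill point above, the arithmetic below assumes a 64-byte CPU cacheline and a 64-bit (8-byte) memory channel; both are typical values, not figures taken from these slides.

```python
# Illustrative only: why a DRAM burst maps onto a CPU cacheline fill.
CACHELINE_BYTES = 64        # assumed: common CPU cacheline size
CHANNEL_WIDTH_BYTES = 8     # assumed: 64-bit DDRx data bus

# Each data phase (beat) moves one channel-width of data, so a full
# cacheline needs this many consecutive beats in a single burst:
burst_length = CACHELINE_BYTES // CHANNEL_WIDTH_BYTES
print(f"Beats per cacheline fill: {burst_length}")  # -> 8, matching DDR3's BL8
```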
From SDRAM to DDR3
                  SDRAM        DDR SDRAM    DDR2 SDRAM   DDR3 SDRAM   Unit
Data rate         66~133       200~400      400~1066     800~2133     MT/s
Voltage           3.3          2.5          1.8          1.5 (1.35)   V
Capacity          16~512       256~1024     256~2048     1024~4096    Mb
Burst length      1, 2, 4, 8   2, 4, 8      4, 8         4*, 8
I/O type          LVTTL        SSTL-2       SSTL-18      SSTL-15
Termination       Off chip     Off chip     On die       On die
Dynamic ODT       N/A          N/A          No           RttWR
Bank count        2*, 4        4            4, 8*        8
I/O calibration   None         None         OCD*         ZQ
• Small evolutionary steps to increase data rate, lower voltage, and add features
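To put the data-rate row in perspective, here is a hedged sketch of peak per-channel bandwidth, assuming a 64-bit (8-byte) data bus; that width is typical for a DIMM without ECC, but it is an assumption rather than something the table states.

```python
# Peak theoretical bandwidth of one 64-bit channel at the table's top data rates.
CHANNEL_WIDTH_BYTES = 8  # assumed 64-bit data bus (excluding ECC)

for name, mt_per_s in [("SDRAM-133", 133), ("DDR-400", 400),
                       ("DDR2-1066", 1066), ("DDR3-2133", 2133)]:
    gb_per_s = mt_per_s * CHANNEL_WIDTH_BYTES / 1000  # MT/s * B/transfer -> GB/s
    print(f"{name:10s}: {gb_per_s:5.1f} GB/s peak")
```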
Memory (System) Access Latency
[Chart: access latency (ns) for L1 cache, L2 cache, L3 cache, and DRAM]
Component Access Latency
― Commodity DRAM (DDR3)
  ● Open page ~13 ns
  ● Close page ~26 ns
― NAND Flash
  ● ~25,000 ns
System Access Latency Example
― “DRAM” only responsible for ~30% of the ~100 ns of “memory access latency” (see the breakdown below)
― TLB misses, buffering/re-drive, FIFO, serial-to-parallel, parallel-to-serial conversions, etc. all increase latency
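The slide's numbers can be turned into a simple breakdown. The DRAM access times come from the bullets above; how the remaining ~74 to ~87 ns splits across TLB misses, buffering, and conversions is not specified, so it is lumped together here as non-DRAM path overhead.

```python
# Rough breakdown of the ~100 ns system "memory access latency" cited above.
system_latency_ns = 100.0
dram_latency_ns = {"open page": 13.0, "close page": 26.0}  # from the slide

for label, dram_ns in dram_latency_ns.items():
    overhead_ns = system_latency_ns - dram_ns
    share = dram_ns / system_latency_ns
    print(f"{label}: DRAM {dram_ns:.0f} ns ({share:.0%} of total), "
          f"non-DRAM path ~{overhead_ns:.0f} ns")
```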
Commodity DRAM Latency Scaling Trend
[Chart: latency (ns) by year of introduction, 1996 to 2011. Row-cycle time (RC) falls from ~90 ns to ~50 ns; CAS latency falls from ~30 ns to ~12.5 ns.]
Commodity DRAM Datarate Scaling Trend
[Chart: data rate (MT/s) by year of introduction, 1996 to 2011. Data rate rises from 66.667 MT/s to 1600 MT/s.]
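Taken together, the two charts say that absolute CAS latency has stayed near 13-15 ns while the data rate has climbed, so the same latency costs more and more clock cycles. The sketch below converts a roughly constant ~13.5 ns CAS latency into cycles at several data rates; the 3200 MT/s point is the DDR4 target mentioned later, and the exact values are illustrative.

```python
import math

# A ~constant CAS latency in ns translates into ever more clock cycles as the
# data rate rises (illustrative values, roughly matching the charts above).
cas_latency_ns = 13.5

for data_rate_mt_s in (400, 800, 1600, 3200):
    clock_mhz = data_rate_mt_s / 2       # DDR: two transfers per clock
    tck_ns = 1000.0 / clock_mhz          # clock period in ns
    cl_cycles = math.ceil(cas_latency_ns / tck_ns)
    print(f"{data_rate_mt_s:4d} MT/s: tCK = {tck_ns:.3f} ns, CL ≈ {cl_cycles} cycles")
```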
Key Challenges
Signaling System
[Diagram: signaling path from Data In through the transmitter, transmission line, and terminator to the receiver (compared against Vref) at Data Out, with a clocking system]
1. Transmitter
2. Transmission Line
3. Receiver
4. Terminator
5. Clocks
Taken from “Signaling in High-Performance Memory Systems,” John Poulton, ISSCC Tutorial, 1999.
Channel Topology
Multiple DIMMs per channel
2 ranks (loads) per DIMM
Dynamic ODT
Asymmetric R/W topology
[Diagram: channel topology from the memory controller die and package, through the socket and system-board breakout/field traces, across the DIMM connectors, on-DIMM traces and DQ stub resistors, to the DRAM package wirebond and die DQ interface on DIMM 0, DIMM 1, and DIMM 2]
Current Typical System Memory Configuration
[Table: supported DIMM configurations by supply voltage (1.5 V, 1.35 V, 1.25 V) and loading (1, 2, or 3 DIMMs per channel, dual-rank DR or quad-rank QR) versus data rate (800-1066, 1333, 1600, 1866, 2133 MT/s)]
Looking Forward
LRDIMM 4Rx4
Density: 16 GB
Target platforms:
― Westmere
― Romley
― Interlagos
Compatible with DDR3 DIMMs
20% cost advantage over equivalent LRDIMM
Back to the Future
Best candidate to replace/augment DRAM?
― Flash
Why?
― It’s cheaper than DRAM
Both DRAM and Flash are running into process scaling challenges
― Alternative technologies are making claims and trying to win mindshare
― DRAM and Flash are multi-billion dollar industries with continuous re-investment
Revolution postponed until
― The effectiveness of billion-dollar investments in DRAM/Flash loses to million-dollar investments in alternatives
High Level Goals for DDR4
Latency
― Comparable to or slightly lower than DDR3
Bandwidth
― Data rate to 3.2 Gb/s and beyond
Power
― Lower power by lowering voltage AND new features to suppress idle and active power (see the sketch after this list)
Cost
― Minimize additional die size penalty relative to DDR3
Capacity
― Leverage TSV to create extreme-capacity systems
Compatibility
― Leverage as much infrastructure from DDR3 as practicable
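As a hedged sketch of why lowering the supply voltage is the main power lever: CMOS switching power scales roughly with the square of the voltage at a given frequency. The 1.2 V DDR4 figure below is an assumption (the spec was still subject to change); the DDR3 voltages come from the earlier table.

```python
def relative_switching_power(v_new: float, v_ref: float) -> float:
    """Dynamic power ratio P_new / P_ref at the same clock and capacitance."""
    return (v_new / v_ref) ** 2

# Compare against DDR3 at 1.5 V; 1.2 V for DDR4 is an assumed target.
for label, volts in [("DDR3L 1.35 V", 1.35), ("assumed DDR4 1.2 V", 1.20)]:
    ratio = relative_switching_power(volts, 1.5)
    print(f"{label}: {ratio:.0%} of DDR3 1.5 V switching power (~{1 - ratio:.0%} saving)")
```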
DDR4 Features (Subject to Change)
Pseudo Open Drain, faster I/O
― Leveraged from GDDR5
Single-ended signaling
― Continues the DDRx legacy; may be the end of the line
1:1 signal/ground ratio
― Better noise shielding on the connector for higher data rates
Higher data rate, lower voltages
Smaller rows for x4 devices
― x4 devices to have lower activation power and better performance
Same prefetch length as DDR3
― Supports existing CPU cache infrastructure
GDDR5-style bank groups
― High data rate, low DRAM core cycle rate, same prefetch length (see the sketch below)
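A hedged sketch of the bank-group arithmetic: keeping the 8n prefetch at a 3.2 Gb/s per-pin rate means a BL8 burst occupies the data bus for only 8 unit intervals, which is shorter than a single DRAM core column cycle, so back-to-back bursts must alternate between bank groups. The ~5 ns core column cycle time below is an assumed, typical figure, not one stated on this slide.

```python
import math

data_rate_gbps = 3.2           # DDR4 per-pin data rate targeted earlier in the deck
burst_length = 8               # same 8n prefetch as DDR3
core_column_cycle_ns = 5.0     # assumed typical DRAM core column-to-column time

burst_on_bus_ns = burst_length / data_rate_gbps   # time a BL8 burst occupies the bus
groups_needed = math.ceil(core_column_cycle_ns / burst_on_bus_ns)

print(f"BL8 burst occupies the bus for {burst_on_bus_ns:.2f} ns")
print(f"One bank group can source a burst only every ~{core_column_cycle_ns:.1f} ns")
print(f"Interleave across at least {groups_needed} bank groups to keep the bus full")
```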
DDR4 Solution Development
Basic idea: same register device for both RDIMM and LRDIMM
― DDR4 Register + DRAM + no data buffers = RDIMM
― DDR4 Register + DRAM + data buffers = LRDIMM
Same Register Chip for DDR4 RDIMM or DDR4 LRDIMM
[Diagram: DDR4 DIMM outline, 133.35 mm × 31.25 mm (0.9 mm taller than DDR3), with data buffer devices, 0.85 mm pitch fingers, and a 1:1 S:G ratio]
Through Silicon Via (TSV)
(Potentially) a real revolution in DRAM technology
― Within the DRAM generation event horizon, 3~5 years
Can enable direct attach memory to the processor
― Low power, low latency
― May have a slightly higher per-bit cost structure
Mitigates the bandwidth-vs-capacity issue
― Multiple DRAM dies in a single stack present a single electrical {address, data} load to the system
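For a sense of scale, here is a hedged capacity sketch assuming 4 Gb dies (the largest monolithic density in the DDR3 table earlier) and the 4-high stack shown on a later slide; both counts are illustrative.

```python
# Capacity of a TSV stack that still presents a single electrical load.
die_density_gbit = 4    # assumed: largest DDR3 die density from the earlier table
dies_per_stack = 4      # assumed: 4-high stack as pictured on a later slide

stack_gbit = die_density_gbit * dies_per_stack
print(f"{dies_per_stack}-high stack: {stack_gbit} Gb ({stack_gbit // 8} GB) "
      f"behind a single {{address, data}} load")
```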
DRAM Stacks with TSV
Traditional Dual Die Package (DDP)
― Same DRAM die as used in single die packages
― System sees all loads on all DRAM dies
DRAM with TSV
― Special die for TSV use
― System sees the single load of a separate I/O re-drive chip
[Diagram: package options: single die package; low-cost dual die package (DDP) with long wirebond on the top die; dual die package (DDP) with RDL on both dies; 4-high TSV stack with an interface die]
TSV Benefits and Drawbacks
Benefits
― Can be used as (in-package) direct attach memory
― Mitigates the capacity-vs-bandwidth trade-off
― Enables ultra-high capacity memory systems not possible today
Drawbacks
― Increased cost basis for DRAM manufacturers
― Different dies used for SDP and stacked devices
― Requires new manufacturing/testing models for the direct attach memory function between devices from different companies