INVITED PLENARY TALK FOR VLSI TECHNOLOGY SYMPOSIUM 2011 TECHNOLOGY … · 2011-08-28 · VLSI...
Transcript of INVITED PLENARY TALK FOR VLSI TECHNOLOGY SYMPOSIUM 2011 TECHNOLOGY … · 2011-08-28 · VLSI...
INVITED PLENARY TALK FORINVITED PLENARY TALK FORVLSIVLSI TECHNOLOGY SYMPOSIUM 2011TECHNOLOGY SYMPOSIUM 2011
TECHNOLOGY IMPACTS FROM THETECHNOLOGY IMPACTS FROM THE
VLSI VLSI TECHNOLOGY SYMPOSIUM 2011TECHNOLOGY SYMPOSIUM 2011
TECHNOLOGY IMPACTS FROM THE TECHNOLOGY IMPACTS FROM THE NEW WAVE OF ARCHITECTURES NEW WAVE OF ARCHITECTURES FOR MEDIAFOR MEDIA--RICH WORKLOADSRICH WORKLOADS
Samuel NaffzigerAMD Corporate Fellow
August 26th, 2011 (Original presentation June 14th , 2011)
1 VLSI Technology Symposium | June 2011 | Public
OutlineOutline
Introduction
The new workloads and demands on computationThe new workloads and demands on computation
Characteristics of serial and parallel computation
The Accelerated Processing Unit (APU) architectureThe Accelerated Processing Unit (APU) architecture
APU architecture implications for technology
Summary
2 VLSI Technology Symposium | June 2011 | Public
Now: Parallel/DataNow: Parallel/Data--DenseDense
The Big Experience/Small Form Factor ParadoxThe Big Experience/Small Form Factor ParadoxMid 2000sMid 2000sTechnologyTechnology Mid 1990sMid 1990s Now: Parallel/DataNow: Parallel/Data--DenseDense
16:9 @ 7 megapixels
HD video flipcams, phones,
4:3 @ 1.2 megapixels
Digital cameras, SD webcams (1-5
TechnologyTechnology Mid 1990sMid 1990s
DisplayDisplay 4:3 @ 0.5 megapixel
E il fil & webcams (1GB)
3D Internet apps and HD video online, social networking w/HD files
SD webcams (1-5 MB files)
WWW and streaming SD video
ContentContent Email, film & scanners
OnlineOnline Text and low res photos
3D Blu-ray HD
Multi-touch, facial/gesture/voice recognition + mouse & keyboard
DVDs
Mouse & keyboardMouse & keyboard
MultimediaMultimedia CD-ROM
InterfaceInterface Mouse & keyboard
All day computing (8+ Hours)All day computing (8+ Hours)
Immersive and Immersive and interactive performanceinteractive performance ss
33--4 Hours4 HoursStandardStandard--definition definition
InternetInternet
Battery Battery Life*Life* 1-2 Hours
rmrm tors
tors pp
Wor
kloa
dsW
orkl
oadsFor
For
Fact
Fact
Early Early Internet and Multimedia Internet and Multimedia ExperiencesExperiences
3 VLSI Technology Symposium| June 2011 | Public
*Resting battery life as measured with industry standard tests.
Focusing on the experiences that matterFocusing on the experiences that matterConsumer PC Usage New Experiences
Web browsing
Office productivity
New Experiences
Accelerated Internet Accelerated Internet Office productivity
Listen to music
Online chat
Watching online video
Photo editing
Accelerated Internet Accelerated Internet and HD Videoand HD Video
Photo editing
Personal finances
Taking notes
Online web-based games
Simplified Content Simplified Content ManagementManagement
Social networking
Calendar management
Locally installed games
Educational apps
ImmersiveImmersiveGamingGaming
Video editing
Internet phone
0% 20% 40% 60% 80% 100%
4 VLSI Technology Symposium| June 2011 | Public
Source: IDC's 2009 Consumer PC Buyer Survey
People Prefer Visual CommunicationsPeople Prefer Visual Communications
Visual Visual PerceptionPerceptionVerbal Verbal PerceptionPerception
Words are processedWords are processedat only 150 wordsat only 150 wordsper minuteper minute
Pictures and videoare processed 400 to
2000 times faster
Rich visual experiencesM lti l t tAugmenting Today’s Content:Augmenting Today’s Content: Multiple content sources Multi-Display Stereo 3D
5 VLSI Technology Symposium | June 2011 | Public
The Emerging World of New Data Rich Applications The Emerging World of New Data Rich Applications
The Ultimate VisualThe Ultimate Visual
Communicating
The Ultimate Visual Experience™
Fast Rich Web content, favorite HD Movies, games with realistic
graphics
The Ultimate Visual Experience™
Fast Rich Web content, favorite HD Movies, games with realistic
graphics • IM, Email, Facebook• Video Chat, NetMeeting
Using photos• Viewing& Sharing• Search, Recognition, Labeling? • Advanced Editing
graphicsgraphics
Gaming
• Advanced Editing
Using video• DVD, BLU-RAY™, HD • Search, Recognition, Labeling • Advanced Editing & Mixing g
• Mainstream Games• 3D gamesMusic
• Listening and Sharing• Editing and Mixing• Composing and compositing
6 VLSI Technology Symposium | June 2011 | Public
ArcSoftTotalMedia®
Theatre 5
ArcSoftMedia
Converter® 7
CyberLinkMedia
Espresso 6
CyberLinkPower
Director 9
Corel VideoStudioPro
Corel Digital Studio2010
InternetExplorer 9
Microsoft® PowerPoint® 2010
Windows Live
Essentials
CodemastersF1 2010Nuvixa
Be Present
ViVuDesktop
TelepresenceViewdle
Uploader
New Workload Examples: New Workload Examples: Changing Consumer BehaviorChanging Consumer Behavior
24 hoursof video
Approximately
9 billionvideo files owned are of video
uploaded to YouTube
every minute
video files owned are
high-definition
50 million +digital media files
1000 imagesd g ta ed a es
added to personal content libraries
every day
imagesare uploaded to Facebook
every second
7 | 2011 VLSI Symposium| June 2011 | Public
What Are the Implications for Computation?What Are the Implications for Computation?Insatiable demand for highInsatiable demand for high bandwidth processing–Visual image processing–Natural user interfaces–Massive data mining for
associative searchesassociative searches, recognition
Some of these compute needs b ffl d d tcan be offloaded to servers,
some must be done on the mobile device– Similar compute needs and
massive growth in both spaces
How must CPU architecture change to d l ith th t d ?
8 VLSI Technology Symposium | June 2011 | Public
pdeal with these trends?
Serial ComputationSerial Computation
Transistors(thousands)
35 Years of Microprocessor Trend DataSerial Code
(thousands)
Single-threadPerformance(SpecINT)
i 0
…Conditionalbranches
Frequency(MHz)
Typical Power(Watts)
i=0i++
load x(i)fmulstore
cmp i (16)bc
Number ofCores
Loops, branches and conditional evaluation
Original data collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond and C. Batten
9 VLSI Technology Symposium | June 2011 | Public
Parallel ComputationParallel Computation
GFLOPs Trendi=0
…
DataParallel Code
6000
7000
8000
P)
GFLOPs Trend
GPU
i++load x(i)
fmulstore
cmp i (1000000)bc
…
Loop 1M times for 1M pieces
of data
3000
4000
5000
eak GFlop
s (SPFP CPU
AMD projections
…
i,j=0i++j++
load x(i,j)fmulstore
0
1000
2000
2005 2006 2007 2008 2009 2010 2011 2012 2013 2014P projections
…cmp j (100000)
bccmp i (100000)
bc
2D array representingvery large dataset
Years
…
10 VLSI Technology Symposium | June 2011 | Public
GPU/CPU Design DifferencesGPU/CPU Design Differences
L t f i t ti littl d t
CPU (Serial compute) GPU (parallel compute)
Lots of instructions little data
• Out of order exec, Branch prediction
Few instructions lots of data
• Single Instruction Multiple Data• Extensive fine threading capability• Few hardware threads
Weak performance gains through density
• Extensive fine-threading capability
Nearly linear performance gains with density
Maximize speed with fast devices
Maximize density with cool devices
11 VLSI Technology Symposium | June 2011 | Public
Three Eras of Processor PerformanceThree Eras of Processor Performance
Single-Core Era
Enabled by:
Multi-Core Era
Enabled by: Moore’s Law
HeterogeneousSystems Era
Enabled by: M ’ L Moore’s Law
Voltage & Process Scaling Micro Architecture
Constrained by:P
Moore’s Law Desire for Throughput 20 years of SMP arch
Constrained by:Power
Moore’s Law Abundant data parallelism Power efficient GPUs
Temporarily constrained by:P i d lPower
ComplexityPowerParallel SW availabilityScalability
Programming modelsCommunication overheadsWorkloads
erfo
rman
ce
?o rform
ance
o
plic
atio
n an
ce
ingl
e-th
read
P
we arehere
Thro
ughp
ut P
e we arehere
Targ
eted
App
Per
form
a
we arehere
o
12 VLSI Technology Symposium | June 2011 | Public
S Time
T
Time(# of Processors)
Time(Data-parallel exploitation)
Heterogeneous Computing with an APU ArchitectureHeterogeneous Computing with an APU Architecture
2010 G2010 G (“ ) f(“ ) f 20112011 (“ ) f(“ ) f
CPU CPU CoresCores
~17 GB/sec~17 GB/sec~17 GB/sec~17 GB/sec
2010 IGP2010 IGP--based based (“Danube”) Platform(“Danube”) Platform 2011 APU2011 APU--based based (“Llano”) Platform(“Llano”) Platform
CoresCores
~~7 GB/sec7 GB/sec
UNBUNB
MC
MC
DDR3 DIMMDDR3 DIMMMemoryMemory
CPU ChipCPU Chip
DDR3 DIMMDDR3 DIMMMemoryMemory
APU ChipAPU Chip
CPU CPU CoresCores
UVDUVD
UN
B / M
CU
NB
/ MC
GPUGPU UVDUVD
SB FunctionsSB Functions
FCH ChipFCH Chip
Graphics requires memory Graphics requires memory BW to bring full capabilities BW to bring full capabilities
to lifeto life~27 GB/sec~27 GB/sec
~27 GB/sec~27 GB/secPCIe
GPUGPU
OptionalOptional
PCIe®®
Bandwidth pinch points and latency Bandwidth pinch points and latency hold back the GPU capabilitieshold back the GPU capabilities
Integration Provides ImprovementIntegration Provides Improvement Eliminate power and latency of extra chip Eliminate power and latency of extra chip
GPUGPU
crossingcrossing 3X 3X bandwidth between GPU and Memory!bandwidth between GPU and Memory! Same Same sized GPU is substantially more sized GPU is substantially more effectiveeffective
P ffi i d d h l f b hP ffi i d d h l f b h
13 VLSI Technology Symposium | June 2011 | Public
Power efficient, advanced technology for both Power efficient, advanced technology for both CPU and GPUCPU and GPU
The Challenges of Integration
erfo
rman
ceP
erDensity
Thick, fast metalBig devices
Dense, thin metal, small devicesBig devices devices
CPU flop area = 2 14
GPU flop area = 1 0
Flop count for 4 Llano CPU cores=0.66M
Flop count for Llano GPU =3.5M
14 VLSI Technology Symposium | June 2011 | Public
area 2.14 area 1.0
How to Balance the Metal Stack?Cu Resistivity
22.12.22.32.42.5
uohm
-cm
)
rman
ce With barrierWithout barrier
1.51.61.71.81.92
0 0 1 0 2 0 3 0 4 0 5 0 6 0 7 0 8 0 9 1Res
istiv
ity (
Per
for
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Line Width (um)Density
With the 20nm node, even local metal will be seeing large RC increase compromises more diffi lt
Add metal layers?Add metal layers? Thin, dense layers for the GPUThin, dense layers for the GPU
difficultyy
Thick, low resistance layers for the CPUThick, low resistance layers for the CPU Cost issues?Cost issues? Via resistance?Via resistance?
15 VLSI Technology Symposium | June 2011 | Public
Via resistance?Via resistance?Technology improvements in BEOL are requiredTechnology improvements in BEOL are required
R R vsvs C?C?
Given the grim RC prognosis, should we be re-shaping either the aspect ratio or stack
M1 M1 M1 – M2
M5 – M6
M5 – M1
composition?Maybe. However, there are times when
M1 – M1 M1 M2
However, there are times when RC is important, but there are also many times when only C matters 7000
8000
9000
m
Track Availability vs Distance
GPU Stack:8-1X; 1-2X
Moreover, metal stack aspect ratio is more or less maxed out, so that leaves stack
3000
4000
5000
6000
7000
able
trac
ks/1
00um CPU Stack:
2-1X; 2-1.3X; 4-2X; 1-4X
composition Different products will emphasize different metal 0
1000
2000
3000
0 20 40 60 80 100 120 140
Ava
ila
16 VLSI Technology Symposium | June 2011 | Public
pstacks
0 20 40 60 80 100 120 140
Norm Dist/ps
The growth in metal layer countThe growth in metal layer count
14Number of CPU Metal Levels vs Technology Node
10
12
els
6
8
Met
al L
eve
0
2
4
010 100 1000
Technology Node
17 VLSI Technology Symposium | June 2011 | Public
Factors driving growth in Metal LayersFactors driving growth in Metal LayersI t t i t f b i liInterconnect requirements from basic scaling–Transistor count N scales as S2 (with fixed die size)Total interconnect length (in lambda) scales N>1 because of semi-global and global routes Therefore interconnect length (in mm) global and global routes. Therefore, interconnect length (in mm) increases at a rate <1/S
Non-scaling design rules–In order to achieve tight pitch, more restrictive design rules In order to achieve tight pitch, more restrictive design rules are imposed that significantly reduce the routeability of metal layers: Unidirectional metal, increased overlap requirements, restrictive T2T
d T2L land T2L rulesEach metal layer is “worth less” in terms of routeability: need more metal layers
Reverse scalingReverse scaling–Long distance routes require lower RC than can be accommodated by scaled metal
–So move routes to thicker layers but fewer tracks available
18 VLSI Technology Symposium | June 2011 | Public
–So, move routes to thicker layers, but fewer tracks available, so pressure on layer count
Factors driving increase in Metal LayersFactors driving increase in Metal Layers
Electromigration/Power Supply GridElectromigration/Power Supply Grid–As cross section scales with S2, and current increases as Vdddrops, so current densities increase dramaticallyHigher Via R Metal resistances significantly degrade Higher Via R, Metal resistances significantly degrade Drives improved E-M sophistication, process techniques (alloys/barriers), denser power networksPower Gating and Power Islands may drive the need for multipleo e Gat g a d o e s a ds ay d e t e eed o u t p esupply grids More metal consumed by power supply grid
All of the above can have the effect of increasing the number of metal layers– But it can be a tradeoff of Metal layers vs die size and/or route timeBut it can be a tradeoff of Metal layers vs die size and/or route time
19 VLSI Technology Symposium | June 2011 | Public
Device OptimizationAPUVtMix
rman
ce
CPU GPU
0.8
1
1.2
APU Vt Mix
LC‐HVT
Per
for
0.2
0.4
0.6HVT
RVT
LC‐RVT
LVTDevice Ioff
To achieve breakthrough APU performance, the Llano GPU
0
Llano CPU Llano GPU
LVTDevice Ioff
p ,has ~5X the flops and ~5X the device count of the CPUs
200
250
FPG vs. Ioff
desired device range
Speed vs. Leakage
ed
A broader device suite is
Broader span of devices required
0
50
100
150
FPG
device range
RO
spe
e
LVT LC-RVTRVT HVT LC-HVT
suite is required
20 VLSI Technology Symposium | June 2011 | Public
0
175 50 20 4.3 2.7 0.5 0.4
Ioff (nA/um room temp)
Power Transfers
110.0110.0
100.0
105.0
100.0
105.0
90.0
95.0
90.0
95.0
85.0
Balanced workload
85.0
GPU-centric data parallel workload
V l i i i l bliV l i i i l bliVoltage range is critical to enabling Voltage range is critical to enabling the efficient power transfers that the efficient power transfers that make for compelling APU make for compelling APU performanceperformance
21 VLSI Technology Symposium | June 2011 | Public
performance performance
Power Transfers
110.0110.0
100.0
105.0
100.0
105.0
90.0
95.0
90.0
95.0
85.0
CPU-centric serial orkload
85.0
Balanced workloadV l i i i l bliV l i i i l bli workloadVoltage range is critical to enabling Voltage range is critical to enabling
the efficient power transfers that the efficient power transfers that make for compelling APU make for compelling APU performanceperformance
22 VLSI Technology Symposium | June 2011 | Public
performance performance
Operating Voltage RangeE/op vs V
Operating voltage Operating voltage requirements:requirements: Low voltage necessary forLow voltage necessary for
2
2.5
E/op vs. V
Low voltage necessary for Low voltage necessary for power efficiencypower efficiency High voltage necessary for High voltage necessary for 0.5
1
1.5
g g yg g ya snappy user experience a snappy user experience enabled by turbo modeenabled by turbo mode
0
0.7V 0.8V 0.9V 1.0V 1.1V 1.2V 1.3V
23 VLSI Technology Symposium | June 2011 | Public
Power Density Limited GPU
Operating Voltage Challenges
0.952V
0.915V
0 886V0.900V
0.950V
1.000V
3.5
4
4.5
5
age
40nm to 14nm
Frequency
To maintain cost effective To maintain cost effective performance growth with performance growth with technology node the GPUtechnology node the GPU 0.886V
0.805V
0 750V
0.800V
0.850V
1
1.5
2
2.5
3
Nom
inal Volta
Frequency
Power Density
Voltage
technology node, the GPU technology node, the GPU must:must:
–– Hold power density Hold power density constantconstant
0.700V
0.750V
0
0.5
40nm 28nm 20nm 14nm
constantconstant–– Exploit density gains to add Exploit density gains to add
compute units compute units Juniper FrequencyData40nm GPU Frequency Data
This necessarily drives This necessarily drives operating voltage downoperating voltage down
This would be good for energy This would be good for energy 700MHz
800MHz
900MHz
1000MHz
Juniper Frequency Data
1
2
3
4
5
6
40nm GPU Frequency Data
efficiency except …efficiency except …–– Variation impacts are much Variation impacts are much
greater at low voltagegreater at low voltage 300MHz
400MHz
500MHz
600MHz
6
7
8
9
10
11
12
13Frequency
24 VLSI Technology Symposium | June 2011 | Public
0MHz
100MHz
200MHz
0.85V 0.90V 0.95V 1.00V 1.10V 1.15V
14
15
16
17
18
spread increases at low voltage
The Operating Voltage Challenge
Many barriers to maintaining both high Many barriers to maintaining both high and low voltage as technology scalesand low voltage as technology scales
FD devices should enable FD devices should enable maintaining the functional maintaining the functional range for a generation or tworange for a generation or twoand low voltage as technology scalesand low voltage as technology scales
TDDB vs. SCE controlTDDB vs. SCE control ULK breakdown vs. denser pitchesULK breakdown vs. denser pitches
V i ti t lV i ti t l
g gg gWill turbo modes be too Will turbo modes be too
compromised?compromised?What’s next?What’s next? Variation controlVariation control What s next?What s next?
95.0
100.0
105.0
110.0
Poly
85.0
90.0
Fin
25 VLSI Technology Symposium | June 2011 | Public
BOXFin
Cost issuesCost issues
26 VLSI Technology Symposium | June 2011 | Public
1000 R = k1/NA
Lithography evolutionLithography evolution
R k1/NA is saturating
NA is saturating
k1 limit is about 0.25
i-linei-lineg-line 430nmg-line 430nm
100
"1:1"
i-line 365nmi-line
365nmKrF 248nmKrF 248nmArF
193 nmimmersion
430nm430nmArF193 nm
10 NA~1.35 NA< 0.8
27 VLSI Technology Symposium | June 2011 | Public
1010 100 1000
Technology Node or min Feat. size
NA 1.35 NA 0.8
Scaling implicationsScaling implications
R = k1/NA . – stuck at 193nm for now, NA at 1.35, and k1 limit at 0.25–Reducing k1 to <0 3 has very considerable cost:Reducing k1 to <0.3 has very considerable cost: –Much OPC and RDR needed to achieve tight pitches
K1=0.36
- Net, a significant erosion of pitch-based scaling entitlement- Scale factors are proprietary … but block area scaling > pitch scaling^2!
Fundamental pitch limitation for 193nm lithography is ~ 80nm
28 VLSI Technology Symposium | June 2011 | Public
Pitch splittingPitch splitting
Decomposing a layer into two effectively 72nm 144nm
doubles pitch, resolving k1 issue and allowing complex shapes
Decomposition requires Decomposition requires significant CAD effort to break the patterns into two printable layersp y
29 VLSI Technology Symposium | June 2011 | Public
Pitch splittingPitch splitting
Decomposing a layer into two effectively 72nm 144nm
doubles pitch, resolving k1 issue and allowing complex shapes
Decomposition requires Decomposition requires significant CAD effort to break the patterns into two printable layersp y
However, now have within-layer overlay issues, and min space can be a Vmax issue or a Cap issue
4σ min space ~ 16.5nm (PS @ 72nm) versus ~28nm ( @ 80nm)
a Vmax issue, or a Cap issue
30 VLSI Technology Symposium | June 2011 | Public
4σ max space ~ 44.8nm (PS @ 72nm) versus ~41.3nm (D@ 80nm) ~Ccap variation: +85% / -30% over nominal for PS @ 72nm
~Ccap variation: +25% / -15% over nominal for SE @ 80nm
Why do we care?Why do we care?
Foundries have settled on a 28nm node with a ~4:3 M1X:Poly pitch ratioFoundries have settled on a 28nm node with a ~4:3 M1X:Poly pitch ratio–Typical Design rules
assuming 0.7x scalingDesign Rule 28nm Desired 20nm
Contacted Poly Pitch ~113nm ~80nm
20nm node CPP is doable –but probably want >80nm for margin and gate oversize capability
Contacted Poly Pitch 113nm 80nmM1X Pitch ~90nm ~64nm
but probably want >80nm for margin and gate oversize capabilityDesired 1X metal scaling to 20nm is below pitch split limitCan get “true” scaling and pitch split 1X metals
–GPU’s have up to 8 1X metals–CPU’s have 2-5 1X metals
Choice: significant cost adder for “true” scaling, or reduced cost and reduced scaling
31 VLSI Technology Symposium | June 2011 | Public
Other cost considerationsOther cost considerationsMOL: Conventional contacts at <90nm CPP don’t work, and a more complex scheme is required, analogous to LI used by Intel at 32nm (+2 masks)
BEOL Options: Only scale 1X metals to ~80nm pitch, get reduced scaling but lower cost
Add metal layers at 80nm pitch to recover scaling; increased cost and cycle time
Use some combination of pitch split and non-pitch split layers to obtain greater scaling at higher cost
Key questions to resolve:
Additional cost of pitch split layers– Additional cost of pitch split layers
– Additional defectivity of pitch split layers (~64 vs ~80nm pitch)
– Whether or not to pitch split vias
32 VLSI Technology Symposium | June 2011 | Public
p p
Relative cost experimentRelative cost experiment
33 VLSI Technology Symposium | June 2011 | Public
What about EUV?What about EUV?At = 13.5nm, EUV should make lithography simple, and eliminate the
f O C ?need for pitch splitting, as well as most OPC. Right?Maybe:
– Very expensive capital equipment– Complex, expensive reflective masks– Very low throughput due to illuminator
output >10X below requirementsoutput 10X below requirements– Very high power requirements
Th i b l blThese issues may be solvable, unlikely by the leading edge of 14nm
EUV
Other forms of advanced lithography such as MEBL look attractive, but are even further behind EUV.
34 VLSI Technology Symposium | June 2011 | Public
3D Integration to the Rescue?
DRAM
TIM (Thermal Interface Material)
Heat Sink
DRAMi
Through GPU Die
Analog Die (SB, Power)Metal Layers
Metal LayersDRAMMicro-
bumps
ThroughSilicon Vias
(TSVs) CPU DieMetal Layers
G U eMetal Layers
Package Substrate
South Bridge
35 VLSI Technology Symposium | June 2011 | Public
3D Integration to the Rescue?
Stacking offers many attractive benefits Stacking offers many attractive benefitsHigher bandwidth to local memory
E bl ll l d i l t di t b i th iEnables parallel and serial compute die to be in their own separate optimized technology – interconnect speed vs. density, device optimization etc.
Allows IO and southbridge content to remain in older, more analog-friendly technology
36 VLSI Technology Symposium | June 2011 | Public
3D Integration Challenges Economical 3D stacking in high volume manufacturing presents co o ca 3 stac g g o u e a u actu g p ese tsmany challenges
Benefits must exceed the additional costs of TSVs, and yield fallout
Logistics of testing and assembling die from multiple sources can be immense
Countless mechanical and thermal issues to solve in high volume mfgCountless mechanical and thermal issues to solve in high volume mfg
DRAM
Clearly 3D provides compelling solutions to many problems, but the TIM (Thermal Interface Material)
Heat Sink
DRAMy
barriers to entry mean heavy R&D $$ and partnerships required
ThroughSilicon
Vias(TSVs)
CPU DieMetal Layers
GPU DieMetal Layers
Analog Die (SB, Power)Metal Layers
Metal LayersDRAMDie to
Die Vias
37 VLSI Technology Symposium | June 2011 | Public
p p qPackage Substrate
South Bridge
SummarySummary
Insatiable demand for high bandwidth computation–Visual image processing–Natural user interfacesNatural user interfaces–Massive data mining for associate searches, recognition
Some of these compute needs can be offloaded to servers, some must be done on the mobile devicesome must be done on the mobile device–Similar compute needs and massive growth in both spaces–Combined serial and parallel computation architectures are
key in both spaceskey in both spacesHuge technology challenges to meeting this opportunity
–Interconnect scaling is hitting a wall that must be overcomeA broad device suite is necessary that operates efficiently at–A broad device suite is necessary that operates efficiently at low voltage while enabling high speed for response time
–Cost issues present a very real barrier to further scaling3D integration offers a promising long term solution
38 VLSI Technology Symposium | June 2011 | Public
–3D integration offers a promising long term solution