Processor Development and current trends · 2019. 1. 29. · Phase Change Memory In phase change...
© 2010 IBM Corporation
Processor Development and current trends
Dr. Harry Barowski, Technology Development, IBM Systems & Technology Group
Guide Share Europe ESG
• Burlington
• Poughkeepsie
▲ Austin
• Fujisawa
▲ Almaden
▲ Beijing
▲ Zürich
• Tucson
• Yasu
• Yamato
▲ Tokyo
▲ Delhi
▲ Haifa
• Dublin
• Greenock
• Hursley
• East Fishkill
• Endicott
▲ Yorktown Heights
• Toronto
• Rochester
• Boulder
• Rome
• Raleigh
• Kraków
• Perth
• Bangalore
• Gold Coast
• Sydney
• Paris
• Pune
• Shanghai
• Taipei
• Moscow
• San Jose
• Santa Teresa
• Böblingen
• São Paulo
• Beaverton
• Cairo
Legend: ▲ Research; • Hardware Development, Software Development, or Hardware and Software Development
Processor and technology development GSE
IBM development and research centers
Milestones of semiconductor technology
Transistor (1947), Bardeen, Brattain & Shockley
Integrated circuit (1959), Kilby
Moore's Law (1965)
– "Cramming more components onto integrated circuits"
– Complexity and number of transistors double every 12 months, revised to every 24 months (1975 – …)
Microprocessor (1971)
– Intel 4004: 2,300 transistors, 700 kHz clock
Cell/B.E. (2005)
– 250 million transistors in 90nm CMOS SOI
– Heterogeneous multicore architecture
– 1 PPE + 8 SPEs @ 3.2 GHz
POWER6/POWER7 (2006/2010)
– Dual-core microprocessor with 790 million transistors @ 5GHz in 65nm CMOS SOI technology (POWER6)
– Octo-core microprocessor with 1.2 billion transistors @ >4GHz in 45nm CMOS SOI technology (POWER7)
– Record-setting clock frequencies beyond the 4GHz barrier!
From the Intel 4004 to POWER7: 500,000x transistors, 6,000x frequency
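The two growth factors quoted on this slide can be checked with a few lines of arithmetic. The sketch below uses only the figures given above; taking 4 GHz for POWER7 is an assumption, since the slide only states ">4GHz".

```python
# Figures quoted on the milestones slide.
INTEL_4004 = {"transistors": 2_300, "clock_hz": 700e3}
# Assumption: 4 GHz is used as the lower bound of the ">4GHz" POWER7 clock.
POWER7 = {"transistors": 1.2e9, "clock_hz": 4e9}


def growth_factor(new_value, old_value):
    """Return how many times larger new_value is than old_value."""
    return new_value / old_value


transistor_growth = growth_factor(POWER7["transistors"], INTEL_4004["transistors"])
frequency_growth = growth_factor(POWER7["clock_hz"], INTEL_4004["clock_hz"])

print(f"transistors: {transistor_growth:,.0f}x")
print(f"frequency:   {frequency_growth:,.0f}x")
```

Both results land close to the rounded 500,000x and 6,000x shown on the slide.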
Trends in silicon technology: CMOS generations and wafer size
[Figure: wafer sizes 200 mm (8"), 300 mm (12"), 450 mm (18")?; CMOS generations 180 nm, 130 nm, 90 nm, 65 nm, 45 nm, 32 nm, 23 nm]
Scaling benefits: past & future
Past
Smaller devices – more performance/power!
– Lower gate capacitance
– Shorter wiring distances (even when wires were scaled, RC stayed constant)
– => higher clock frequencies / performance
– => less power / increased efficiency
Higher density (almost doubled with each CMOS generation)
– => allows for more complexity => higher performance
– Larger caches
Reduced cost: more chips per wafer
Future
Devices will no longer gain performance from scaling
– Transistor gain will drop
Gate/subthreshold leakage effects
– Power loss worsens with smaller devices and thinner oxides
Density limited by wiring and vias
Costs increase due to photolithography limits
– (optical proximity, photoresist roughness, immersion lithography)
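The statement that density almost doubled with each CMOS generation can be illustrated with the node names from the wafer-size slide. This is a rough sketch under an idealised assumption that is not stated in the deck: the area per feature scales with the square of the node name.

```python
# CMOS generations named earlier in the deck, in nanometres.
NODES_NM = [180, 130, 90, 65, 45, 32]


def density_gain(prev_nm, next_nm):
    """Idealised density gain, assuming feature area scales with node size squared."""
    return (prev_nm / next_nm) ** 2


steps = list(zip(NODES_NM, NODES_NM[1:]))
gains = [density_gain(prev_nm, next_nm) for prev_nm, next_nm in steps]

for (prev_nm, next_nm), gain in zip(steps, gains):
    print(f"{prev_nm} nm -> {next_nm} nm: {gain:.2f}x density")
```

Each step shrinks linear dimensions by roughly a factor of 0.7, which is where the approximate doubling per generation comes from.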
Optical lithography: Ultraviolet => Deep UV => Extreme UV?
Extreme Ultra-Violet (EUV) Lithography
Exposure/Patterning
Double patterning/exposure with 193nm immersion?
(According to ITRS)
CMOS future: Saturation of technology?
Technology | General availability | Limits (technical limits / limits of technology) | Solution/Forecast
CMOS12 (45nm) | 2009 | 193nm: NA>1 required; gate leakage | Immersion (water-based) lithography; high-k, metal gate
CMOS13 (32nm) | 2011 | 193nm (ArF) lithography at limits; variability control; low-k slowdown | EUV light? Double exposure/patterning; UTB-FDSOI / DoubleGate (FinFET); AirGap
CMOS14 (22nm) | 2014 | 193nm lithography ends; economic limit?; no performance gains, reduced density gain; deterministic doping; quasi-ballistic transport: gate length ~ mean free path => current limited by contacts | 450mm wafer production; 3D integration; new transistor models/designs; high-refractive-index immersion lithography
CMOS15 (15nm) | 2017 | EUV limits; EUV destruction; photoresist limitations | Emerging new devices: CNT FET / SET / spin devices
CMOS16 (9nm) | 2021 | Physical limits reached? | New devices?
CMOS17 (5nm) | 2025 | Final frontier? |
Double patterning/exposure
1) Double patterning by Litho-Etch, Litho-Etch (LELE): 1st exposure, etch, 2nd exposure, etch
2) Self-aligned double patterning process (SADP): expose, cover, etch, sidewall formation, etch, spacer removal
[Figure labels: substrate, hard mask/etch layer, photoresist, masking material]
EUV lithography @ 13.5 nm wavelength exposure system
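Why the move from 193 nm to 13.5 nm matters can be seen from the Rayleigh criterion, which bounds the smallest printable half-pitch at k1 · λ / NA. The deck does not state this formula or any k1 and NA values; the numbers below (NA = 1.35 for water immersion, NA = 0.25 for an early EUV tool, and the k1 factors) are illustrative assumptions.

```python
def half_pitch_nm(wavelength_nm, numerical_aperture, k1):
    """Rayleigh criterion: minimum half-pitch = k1 * wavelength / NA."""
    return k1 * wavelength_nm / numerical_aperture


# Assumption: NA = 1.35 (water immersion) and k1 = 0.25, the single-exposure limit.
arf_immersion = half_pitch_nm(193.0, 1.35, 0.25)

# Assumption: NA = 0.25 and a relaxed k1 = 0.5 for a 13.5 nm EUV system.
euv = half_pitch_nm(13.5, 0.25, 0.5)

print(f"193 nm immersion: {arf_immersion:.1f} nm half-pitch")
print(f"13.5 nm EUV:      {euv:.1f} nm half-pitch")
```

Under these assumptions, 193 nm immersion needs the hardest possible k1 to reach the mid-30 nm range, which is why double patterning is used to go further, while EUV gets below that with a much more relaxed k1.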
End of scaling forces innovation!
• New materials
• Innovative device structures
Silicon-On-Insulator
Copper wiring, Low-K dielectric
Strained silicon
Double-Gate Transistor (FinFET)
High-K, metal gate technology in CMOS
Problem: The insulating SiO2 layer between gate and channel has reached atomic dimensions
• 11 Å ≈ only three atomic layers!
Innovation: Add high-k dielectric HfOx onto SiO2
• Reduces gate leakage by a factor of 10
• but: maintains device performance
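How a high-k layer can cut leakage while maintaining performance is usually explained with the equivalent oxide thickness (EOT): a physically thicker film with a higher permittivity gives the same gate capacitance as a much thinner SiO2 film. The sketch below assumes a relative permittivity of 20 for the hafnium oxide and a 30 Å film; neither value comes from the slide.

```python
K_SIO2 = 3.9  # relative permittivity of SiO2


def equivalent_oxide_thickness(physical_thickness_angstrom, k_dielectric):
    """Thickness of SiO2 that would give the same capacitance per unit area."""
    return physical_thickness_angstrom * K_SIO2 / k_dielectric


# Assumptions for illustration: k = 20 for the high-k film, 30 angstrom thick.
K_HIGH_K = 20.0
HIGH_K_THICKNESS_ANGSTROM = 30.0

eot = equivalent_oxide_thickness(HIGH_K_THICKNESS_ANGSTROM, K_HIGH_K)
print(f"{HIGH_K_THICKNESS_ANGSTROM:.0f} angstrom of high-k behaves like "
      f"{eot:.2f} angstrom of SiO2")
```

The film is several times thicker than the 11 Å SiO2 layer named on the slide, which suppresses tunnelling, yet electrically it behaves like an even thinner oxide.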
© 2009 IBM Corporation
State-of-the-art technology, e.g. SEM of 10 layers of metal
Layer thickness: 2x (0.25µm), 4x (0.50µm), 8x (1.2µm)
Current developments at the processor and memory level
IBM CMOS innovation path
[Figure: clock frequency/performance over time. Successive technology steps: bulk silicon with aluminium; bulk silicon with copper; SOI with copper; SOI with Cu/low-k; SOI, strained Si, Cu/lower-k; SOI, strained Si, high-k metal gate, Cu/ultralow-k; SOI, strained Si, UT-SOI/FinFET, high-k metal gate, Cu/airgap; … Grouped into back-end innovation and transistor innovations. Inset labels: PFET, NFET, tensile, compressive, gate poly, interlayer, high-k, metal gate]
Source: according to K. Bernstein, IBM Research
Energy density as design challenge!
[Figure: processor frequency (GHz, 0 to 10) versus year, 1984 to 2010, contrasting "wishful thinking" with "reality?"]
D.J. Frank et al., IBM J. RES. & DEV. VOL. 46 NO. 2/3 MARCH/MAY 2002
Power equations:
$P_{\mathrm{active}} = \alpha f C V_{dd}^{2}$
$P_{\mathrm{leakage}} = I_{\mathrm{leakage}} V_{dd}$ (~ $V_{dd}^{3}$)
Energy density restricts clock frequency!
Scaling into smaller dimensions aims for increased device performance
=> Increasing active switching power density
Physical effects: Scaling leads to leakage currents
=> Dramatic increase of passive power density
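The two power equations on this slide are easy to evaluate numerically. The following is a minimal sketch; all input values (activity factor, switched capacitance, supply voltage, leakage current) are hypothetical and chosen only to show how the terms interact.

```python
def active_power(alpha, f_hz, c_farad, vdd):
    """Dynamic switching power: P_active = alpha * f * C * Vdd^2."""
    return alpha * f_hz * c_farad * vdd ** 2


def leakage_power(i_leak_amp, vdd):
    """Static power: P_leakage = I_leakage * Vdd."""
    return i_leak_amp * vdd


# Hypothetical chip: 10% activity, 4 GHz, 100 nF switched capacitance, 1.0 V supply.
p_nominal = active_power(alpha=0.1, f_hz=4e9, c_farad=100e-9, vdd=1.0)

# Same chip with the supply lowered by 10%: active power drops quadratically.
p_reduced = active_power(alpha=0.1, f_hz=4e9, c_farad=100e-9, vdd=0.9)

# Hypothetical leakage current of 20 A at 1.0 V.
p_leak = leakage_power(i_leak_amp=20.0, vdd=1.0)

print(f"active @ 1.0 V: {p_nominal:.1f} W")
print(f"active @ 0.9 V: {p_reduced:.1f} W")
print(f"leakage:        {p_leak:.1f} W")
```

The quadratic dependence on Vdd is why lowering the supply voltage is the most effective lever on active power, and why leakage becomes a significant share once that lever is exhausted.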
System performance exploiting multicore technology
[Figure: module heat flux (W/cm², 0 to 14) versus year, 1950 to 2010, across the vacuum, bipolar, and NMOS/PMOS/CMOS eras. Data points: IBM 360, IBM 370, IBM 3090, IBM 3090S, IBM ES9000, IBM GP, Merced, Pentium II (DSIP), Squadrons, Pentium 4, Prescott, low-power multi-core. Annotations: performance, density]
Disruptive evolution steps as "phase transitions"!
Example: POWER7 Processor Chip - OctoCore
• 567mm², 45nm lithography, Cu, SOI, eDRAM
• 1.2B transistors
  • Equivalent function of 2.7B
  • eDRAM efficiency
• Eight processor cores
  • 12 execution units per core
  • 4-way SMT per core
  • 32 threads per chip
  • 256KB L2 per core
• 32MB on-chip eDRAM shared L3
• Dual DDR3 memory controllers
  • 100GB/s memory bandwidth per chip sustained
• Scalability up to 32 sockets
  • 360GB/s SMP bandwidth/chip
  • 20,000 coherent operations in flight
• Advanced pre-fetching: data and instruction
• Binary compatibility with POWER6
Ron Kalla, HotChips2009
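Several of the POWER7 figures above follow from one another. The short sketch below derives the per-chip and per-system totals from the per-core numbers on the slide; the derived quantities (total L2, L3 share per core, bandwidth per thread) are simple arithmetic and are not themselves quoted in the deck.

```python
# Figures quoted on the POWER7 slide.
CORES = 8
SMT_THREADS_PER_CORE = 4
L2_KB_PER_CORE = 256
L3_MB_SHARED = 32
MEMORY_BW_GB_S = 100
MAX_SOCKETS = 32

threads_per_chip = CORES * SMT_THREADS_PER_CORE       # matches "32 threads per chip"
total_l2_mb = CORES * L2_KB_PER_CORE / 1024           # aggregate private L2
l3_mb_per_core = L3_MB_SHARED / CORES                 # average share of the shared L3
bw_per_thread_gb_s = MEMORY_BW_GB_S / threads_per_chip
max_threads_per_system = threads_per_chip * MAX_SOCKETS

print(f"threads per chip:       {threads_per_chip}")
print(f"total L2:               {total_l2_mb} MB")
print(f"L3 share per core:      {l3_mb_per_core} MB")
print(f"bandwidth per thread:   {bw_per_thread_gb_s} GB/s")
print(f"threads in a 32-socket system: {max_threads_per_system}")
```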
Phase Change Memory
In phase change memory, data are stored by switching small regions of a "phase-change" material between one of two phases:
• a highly resistive amorphous phase, and
• a highly conductive crystalline phase.
Switching is done by using voltage/current pulses to control time and temperature within this small volume.
Phase change memory has a number of advantages as a prospective next-generation solid-state memory, including
• Solid-state memory array and potential for CMOS compatibility
• Large resistance contrast
• Media familiarity from CD-RW
• Inherently fast transformation
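The storage principle described above can be sketched as a toy model of a single cell. The resistance values, the read threshold, the method names, and the mapping of the crystalline phase to logical 1 are all assumptions made for illustration; real devices and their pulse shapes differ.

```python
from dataclasses import dataclass

# Hypothetical values chosen only to show a large resistance contrast.
R_AMORPHOUS_OHM = 1_000_000.0
R_CRYSTALLINE_OHM = 10_000.0
READ_THRESHOLD_OHM = 100_000.0


@dataclass
class PhaseChangeCell:
    """Toy model of one phase-change memory cell; starts in the amorphous phase."""

    resistance_ohm: float = R_AMORPHOUS_OHM

    def apply_reset_pulse(self):
        """Short, intense pulse: melt and quench into the resistive amorphous phase."""
        self.resistance_ohm = R_AMORPHOUS_OHM

    def apply_set_pulse(self):
        """Longer, weaker pulse: anneal into the conductive crystalline phase."""
        self.resistance_ohm = R_CRYSTALLINE_OHM

    def read(self):
        """Assumed convention: low resistance (crystalline) reads as 1."""
        return 1 if self.resistance_ohm < READ_THRESHOLD_OHM else 0


cell = PhaseChangeCell()
initial_bit = cell.read()
cell.apply_set_pulse()
bit_after_set = cell.read()
cell.apply_reset_pulse()
bit_after_reset = cell.read()
print(initial_bit, bit_after_set, bit_after_reset)
```

The large gap between the two resistance values is what makes the read threshold easy to place, which is the "large resistance contrast" advantage listed on the slide.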
3D Integration for new era of 3D circuitry?
[Figure labels: top SOI IC, bottom SOI IC, gates, bonded oxide interface, bonded interface]
A.W. Topol (IBM), IEDM 2005.
IBM J. RES. & DEV. VOL. 50 NO. 4/5 JULY/SEPTEMBER 2006
New device structures of future CMOS transistors
[Figure labels: source, drain, gate, back gate, back gate oxide]
Back-Gate
M. Ieong et al., IEDM 2001
FinFET
J. Kedzierski et al., IEDM 2001-3
C-Y Sung et al., IEDM 2005
Direct Substrate Bonding
B. Doris et al., VLSI 2004
Simplified HOT
B. Doris et al., IEDM 2002
UTSOI
6nm FET
Future CMOS alternatives: Advanced new nano technologies
Molecular Logic
Carbon Nanotubes for FET
Spintronics
Source: IBM Research public press room
Research in nano technology: 3D Chips and computation on atomic scales
State of the art
– Information stored with millions of atoms.
– Hard disk drives: some terabytes of capacity
Nanotechnology promises
– 3D chips
– Storage of data on atomic scales (atoms/molecules)
– Switches based on atoms/molecules
– Increases density by a factor of 1000
– iPod capable of storing 30,000 HD movies
Older research topics
– Millipede
– …
Illustration of the preferred magnetic orientation of an iron atom on a specially prepared copper surface. The stability of an atom's magnetic orientation can help determine that atom's suitability for storing data.
Schematic three-dimensional image of a molecular "logic gate" of two naphthalocyanine molecules, which are probed by the tip of the low-temperature scanning tunneling microscope.
An animated view of the Millipede nanomechanical storage device illustrates how an individual tip creates an indentation in a polymer surface (bottom) and how a large number of such tips are operated in parallel (top).
Trend: Multicore and application optimized Systems
Supercomputer application: protein folding, DOE applications, modelling of the neocortex
Gaming application: games, 3D rendering
BladeCenter application: networking, data mining, XML, Cell blades
Changing Paradigms in a Digital World/Future
Multimedia is key! Multimedia era everywhere
Search engines (like Google, Inktomi) extend search to document files, images, streaming media
Natural information processing (like the human brain)
– Information wrapped in multimedia containers needs high bandwidth and compute power
– New killer applications:
  video processing and digital entertainment
  Realistic information processing for (online) gaming
  Surveillance support (e.g. video cameras at airports)
  Simulation/visualization: polygon modelling > real data processing
  Media/streaming services: high bandwidth, new algorithms
Compute power of up to 10^18 – 10^20 FLOPs and high bandwidth needed!
Nature as example of power efficient computing?
Human brain:
• 100 billion neurons
• 100 trillion synapses
• Up to 10000 synapses per neuron
PowerPC core + 8x Synergistic Processor Element + Element Interconnect Bus mimic neuron
Heterogeneous multicore architecture:
* exploiting parallel computing with data parallelism (SIMD)
* distributed memory
Accelerator example: Cell Synergistic Processor Element (SPE)
Synergistic Processor Element (SPE):
• Provides the computational performance
• Simple RISC User Mode Architecture
• Dual issue VMX-like
• Graphics SP-Float
• IEEE DP-Float
• Dedicated resources: unified 128x128-bit RF, 256KB Local Store
• Dedicated DMA engine: Up to 16 outstanding requests
Source: „The cell processor“ J. Leenstra, IBM Entwicklung GmbH, technical talk CeBit 2006
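The SPE resources listed above translate into a peak single-precision throughput with a little arithmetic. The register geometry comes from this slide, and the clock and SPE count (3.2 GHz, 8 SPEs) from the Cell/B.E. figures elsewhere in the deck. Counting one fused multiply-add per SIMD lane per cycle as two floating-point operations is an assumption not stated on the slides.

```python
REGISTER_BITS = 128          # width of one register (slide: 128x128-bit RF)
REGISTER_COUNT = 128         # number of registers in the unified register file
SP_FLOAT_BITS = 32           # single-precision float width
CLOCK_GHZ = 3.2              # Cell/B.E. clock quoted in the deck
SPE_COUNT = 8                # SPEs per Cell/B.E. chip quoted in the deck
# Assumption: one fused multiply-add per lane per cycle, counted as 2 flops.
FLOPS_PER_LANE_PER_CYCLE = 2

simd_lanes = REGISTER_BITS // SP_FLOAT_BITS
register_file_bytes = REGISTER_COUNT * REGISTER_BITS // 8
peak_gflops_per_spe = simd_lanes * FLOPS_PER_LANE_PER_CYCLE * CLOCK_GHZ
peak_gflops_all_spes = peak_gflops_per_spe * SPE_COUNT

print(f"SIMD lanes (single precision): {simd_lanes}")
print(f"register file size:            {register_file_bytes} bytes")
print(f"peak per SPE:                  {peak_gflops_per_spe:.1f} GFLOPS")
print(f"peak for 8 SPEs:               {peak_gflops_all_spes:.1f} GFLOPS")
```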
History of fastest supercomputers (TOP500.org)
[Figure: teraflops (0 to 1,000, with a further tick at 10,000) versus year, 1994 to 2010 and beyond. Systems shown: ASCI Red, Blue Pacific, ASCI White, Earth Simulator, BlueGene/L, RoadRunner (hybrid architecture). Reference points: Cray 1s (1976), 170 MFlops; 10 billion neurons]
BlueGene/L: chip technology based on standard PPC440 (700MHz BG/L) (850MHz BG/P)
PowerXCell 8i: enhanced Cell/B.E., 45nm lithography, 3.2 GHz
Sources: www.top500.org; http://movementarian.com/2006/08/18/flops-mips-watts-and-the-human-brain/; not drawn to scale; all statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only
QPACE – most energy efficient HPC System based on PowerXCell!
Quantum Chromodynamics Parallel computing on Cell
[Figure: Node Card (NC) "Rialto" block diagram: PowerXCell 8i; Network Processor (NWP); memory channels CH0 and CH1; Service Processor (SP); Node Card CPLD (NCPLD); status LED; temperature sensors; flash; VPD; debug connector; Front-End Node (FEN); multiplicities 1 .. 2048 and 1 .. 64]
• Highly optimized system using hybrid, multicore, specialized processor architecture!
• 200 TFLOPs peak performance, 50 TFLOPs sustained performance
• System cost lower by a factor > 4 than state-of-the-art supercomputers
• Highest energy efficiency ever, above state-of-the-art supercomputers (>770 MFlops/W)
• 3D torus topology with 6 nearest-neighbour interactions, based on an FPGA network processor
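The 3D torus topology with six nearest neighbours can be made concrete with a small helper that computes a node's neighbours with wraparound at the edges. This is a generic sketch of a 3D torus; the function name and the 4 x 4 x 4 dimensions are hypothetical and do not describe the actual QPACE configuration.

```python
def torus_neighbours(node, dims):
    """Return the six nearest neighbours of node in a 3D torus of size dims."""
    x, y, z = node
    nx, ny, nz = dims
    return [
        ((x + 1) % nx, y, z), ((x - 1) % nx, y, z),  # +x and -x links
        (x, (y + 1) % ny, z), (x, (y - 1) % ny, z),  # +y and -y links
        (x, y, (z + 1) % nz), (x, y, (z - 1) % nz),  # +z and -z links
    ]


# Hypothetical 4 x 4 x 4 torus; the corner node wraps around to the far faces.
corner_neighbours = torus_neighbours((0, 0, 0), (4, 4, 4))
for neighbour in corner_neighbours:
    print(neighbour)
```

Because of the wraparound, every node has exactly six links and no node sits on an edge, which suits lattice calculations where each site only exchanges data with its direct neighbours.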
Trends of the processor architecture
Threading: Single-Thread (ST), (Simultaneous) Multi-Thread (SMT), Massive Multi-Threading (MMT)
Cores: Single Core (SC), homogeneous Multicore (MC), Accelerator (Acc), Heterogeneous Multi Core (SoC)
Trends of system structure
Massive parallel cluster:
- blade center based
- several possible OS in same chip/blade
- OS-based assignment of resources
SMP system:
- single thread orientated
- superscalar structures
- classic scale-up system
Uni processor system:
Hybrid setup:
- may be blade center based
- OS over various systems
- based on various kinds of systems
Many Core:
- still coherent memory
IBM: System Performance from Atoms towards application
Software: Application; S/W Development Tools; Middleware; Operating System; Hypervisor; Compilers; Virtualization, cluster management, etc.
Hardware:
- SMP System Structure: fabric, switches, busses, memory system, protocols, …
- Interconnect
- I/O
- Cache: cache levels, granularity, latency, throughput
- Microprocessor Core: microarchitecture, logic circuits, design methodology
- Package: I/Os, wiring level cooling, …
- Semiconductors: device, process, interconnect
[Figure annotations: SMP, Hardware, System; 30-40% -> 10-15% CAGR; 70%-80% CAGR]
Outlook: Hybrid Systems and heterogeneous ManyCore-Systems with distributed memory
[Figure: processor cores 1 to n, accelerators, caches 1 to n, main memory]
• Parallelism (Multi-Core)
• Accelerator cores (specialization/optimization)
• Integration (System on Chip, SoC)
• Modularity (flexibility)
• Software optimization
• Nanotechnology
"Post von Neumann" era begins!
Summary
Moore's law still valid!
CMOS scaling continued towards structure sizes beyond 20nm
– Lithography not the limiting issue.
– EUV still in development.
Technical challenges:
– Manufacturing defects, yield
– "Soft errors" and fault tolerance
  • "On-line" recognition of soft errors
  • Recovery/availability
– Power density
  • Optimize devices
  • Exploit new materials
Microarchitecture/system architecture
– Rethinking paradigms!
– Exploit parallelism by use of multicore techniques and vector/SIMD processing
– Exploit hybrid architectures
Thank you!
Green IT: www.ibm.com/de/ibm/
Disclaimer
(c) Copyright International Business Machines Corporation 2010. All Rights Reserved. The following are trademarks of International Business Machines Corporation in the United States, or other countries, or both.
IBM, IBM Logo, Power Architecture
Other company, product and service names may be trademarks or service marks of others.
All information contained in this document is subject to change without notice. The products described in this document are NOT intended for use in applications such as implantation, life support, or other hazardous uses where malfunction could result in death, bodily injury, or catastrophic property damage. The information contained in this document does not affect or change IBM product specifications or warranties. Nothing in this document shall operate as an express or implied license or indemnity under the intellectual property rights of IBM or third parties. All information contained in this document was obtained in specific environments, and is presented as an illustration. The results obtained in other operating environments may vary.
While the information contained herein is believed to be accurate, such information is preliminary, and should not be relied upon for accuracy or completeness, and no representations or warranties of accuracy or completeness are made.
THE INFORMATION CONTAINED IN THIS DOCUMENT IS PROVIDED ON AN "AS IS" BASIS. In no event will IBM be liable for damages arising directly or indirectly from any use of the information contained in this document.
IBM Microelectronics Division, 1580 Route 52, Bldg. 504, Hopewell Junction, NY 12533-6351
The IBM home page is http://www.ibm.com
The IBM Microelectronics Division home page is http://www.chips.ibm.com
Publicly Available Information
• Introduction to the Cell Broadband Engine White Paper
• Cell Broadband Engine Public Registers Guide (subset of CDA version)
• Cell Broadband Engine Linux Reference Implementation Application Binary Interface Specification
• SPU C/C++ Language Extensions Software Reference Manual
• SPU Application Binary Interface Specifications
• SPU Assembly Language Specifications
• Broadband Engine Linux Application Binary Interface Specification
• Cell Broadband Engine SDK Libraries, Overview and User’s Guide
• Cell Broadband Engine Architecture
• Cell Broadband Engine Datasheet
• SPU Instruction Set Architecture Specifications
• Cell Broadband Engine Processor Full System Simulator
• XLC Alpha Edition for Cell Broadband Engine
• IBM Cell Broadband Engine Software Sample and Library Source Code
• GCC Toolchain for Cell Broadband Engine
• Cell Broadband Engine SPE Management Library
• Linux Kernel patch for Cell Broadband Engine
• SDK Installation script
• Introduction to the Cell Microprocessor, Article
• A 4.8GHz Fully Pipelined Embedded SRAM in the Streaming Processor of a Cell Processor, Article
• A Double-Precision Multiplier with Fine-Grained Clock-Gating Support for a First-Generation Cell Processor, Article
• A Streaming Processing Unit for a Cell Processor, Article
• The Design and Implementation of a First-Generation Cell Processor, Article
• Microprocessor Report - Cell Moves into the Limelight, Analyst Report
• Microprocessor Reports - 2004 Technology Awards, Analyst Report