Atomistic nanoelectronic device engineering with sustained performances up to 1.44 PFlop/s M....
Atomistic nanoelectronic device engineering with sustained
performances up to 1.44 PFlop/s
M. Luisier, T. Boykin, G. Klimeck, and W. Fichtner
ETH Zurich, University of Alabama, Purdue University
Integrated Systems Laboratory, ETH Zurich
Friday, 21 April 2023
Overview
• Nanoelectronics and HPC
  From Moore’s Law to OMEN
• Numerical Simulation of Nanoscale Devices
  HEMT, CNT, and BTBT Transistors
• Code Implementation
  Physical Models
  Parallelization Scheme
  Numerical Algorithms
• Performance Benchmarks on Jaguar
• Outlook and Conclusion
Nanoelectronics in HPC
0.35μm (1995)
90nm (2002)
45nm (2008)
22nm (2011)
Number of transistors per chip doubles every 2 years (Moore’s law)
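2011 breakthrough: 3-D FinFETs instead of planar MOSFETs
[Figure: transistor images for each technology node with the gate length Lg marked; the device type at future nodes is shown as "???". Source: Intel Corporation]
8nm (2020)
To keep Moore’s law: new breakthrough required by 2020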
Next Generation Devices
Production: around 2020
P. Hashemi et al., EDL 30, 401 (2009)
L. Tapasztó et al., Nat. Nano. 3, 397 (2008)
Y.Q. Wu et al., EDL 30, 700 (2009)
W.Y. Choi et al., EDL 28, 743 (2007)
Nanowire Graphene III-V UTB BTB Tunneling CNT
NEEDED: Fast, cheap, and reliable platform to support the development and accelerate the innovation of novel nanoelectronic devices
Physics-based Numerical Device Simulator OMEN
Supratik Guha, IBM Research
What is OMEN?
Physical Models
• 3D Quantum Transport Solver
• Accurate Representation of the Semiconductor Properties
• Atomistic Description of Devices
• Multi-Physics Modeling
Device Engineering
• Industrial-Strength Nanoelectronic Device Simulator
• Multi-Geometry Capabilities
• Explore, Understand, Explain, Optimize Novel Designs
Efficient Parallel Computing (HPC in Nanoelectronics)
• Accelerate Simulation Time
• Investigate New Phenomena at the Nanometer Scale
• Move Hero Experiments to a Day-to-Day Basis
First Peta-scale Engineering Application
[Figures: electron density in a GAA NW; Id-Vgs characteristics; parallelization scheme]
(1) III-V HEMT Simulations
Expt: J. del Alamo @ MIT
[Figure: OMEN device structure with terminals VS, VG, VD; layer labels: strained InAs, In0.53Ga0.47As (twice), In0.52Al0.48As (twice), Si δ-doping]
Thermionic Current over a Potential Barrier
[Figure: conduction band (CB) profile in the OFF and ON states]
Publications: IEDM 2008, IEDM 2009, IEEE TED 2011
(2) CNT FET Simulations
Expt: A. Franklin @ IBM YH
[Figure: OMEN device structure with Lg=9nm; labels: Source, Drain, Gate, HfO2 (twice), Air]
Ambipolar Current Flow
[Figure: band diagram from source to drain with labels CB, VB, Efl, Efr]
[Figure: Id-Vgs characteristics]
Publication: IEDM 2011, submitted to Nano Letters 2011
(3) BTBT Diode Simulations
Expt: S. Rommel @ RIT, S. Datta @ PSU
[Figure: OMEN device structure with P+ source, N+ drain, and two contacts]
Band-to-band Tunneling Current
[Figure: band diagram (CB, VB) across the P+/N+ junction]
[Figure: current-voltage curves with the Zener current and NDR current regions labeled]
Discrepancy due to measurement setup
Publication: TECHCON 2010, submitted to APL 2011
One single, multi-geometry, multi-physics code for a wide range of different nanoscale applications
OMEN is already used by device engineers in industry at Intel and GlobalFoundries
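Physical Models: Quantum Transport (i)
Multi-Dimensional Schrödinger Equation with OBCs
$H\,|\psi_E\rangle = E\,|\psi_E\rangle$
Tight-Binding Ansatz for the Wave Function
$\langle \mathbf{r}|\psi_E\rangle = \sum_{\sigma,ijk,k_t} C_{ij}^{\sigma}(E,k_t)\,\Phi_{\sigma}(\mathbf{r}-\mathbf{R}_{ijk})\,e^{i\mathbf{k}_t\cdot\mathbf{r}_t}$
Ballistic (Wave Function): $Ax=b$
$(E - H - \Sigma^{RB})\cdot C = \mathrm{Inj}$
Scattering (NEGF): $AB=C$
$(E - H - \Sigma^{RB} + \Sigma^{RS})\cdot G^{R} = I$
$G^{<} = G^{R}\cdot(\Sigma^{<B} + \Sigma^{<S})\cdot G^{R\dagger}$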
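The ballistic wave-function case reduces to one linear system Ax=b per energy and momentum, with the open boundary conditions entering as boundary self-energies and an injection vector. A minimal sketch of that idea follows, assuming a single-orbital 1D chain with nearest-neighbour hopping -t and identical semi-infinite leads. The toy model, the function names, and the unpivoted tridiagonal solver are illustrative assumptions, not OMEN's implementation.

```python
import cmath
import math


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm (no pivoting) for a complex tridiagonal system."""
    n = len(diag)
    c = [0j] * n
    d = [0j] * n
    c[0] = upper[0] / diag[0] if n > 1 else 0j
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i - 1] * c[i - 1]
        if i < n - 1:
            c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / denom
    x = [0j] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def ballistic_transmission(onsite, t, k):
    """Solve (E - H - Sigma) C = Inj for a wave injected from the left lead.

    onsite: on-site energies of the device atoms (hypothetical toy input)
    t:      hopping magnitude; H has -t between neighbouring atoms
    k:      lead wave number (lattice constant 1), with 0 < k < pi
    Returns (transmission, reflection) probabilities.
    """
    n = len(onsite)
    energy = -2.0 * t * math.cos(k)         # lead dispersion E(k)
    sigma = -t * cmath.exp(1j * k)          # boundary self-energy of each lead
    diag = [complex(energy - eps) for eps in onsite]
    diag[0] -= sigma                        # left open boundary
    diag[-1] -= sigma                       # right open boundary
    off = [complex(t)] * (n - 1)            # (E - H) has +t next to the diagonal
    rhs = [0j] * n
    rhs[0] = t * (cmath.exp(2j * k) - 1.0)  # injection from the left lead
    psi = solve_tridiagonal(off, diag, off, rhs)
    reflection_amp = (psi[0] - cmath.exp(1j * k)) * cmath.exp(1j * k)
    return abs(psi[-1]) ** 2, abs(reflection_amp) ** 2


if __name__ == "__main__":
    print(ballistic_transmission([0.0] * 20, 1.0, 1.0))  # uniform chain
    print(ballistic_transmission([3.0] * 4, 1.0, 1.0))   # tunneling barrier
```

For a uniform chain the injected wave passes unreflected; for the barrier the two returned probabilities sum to one, which is a quick check that the boundary terms are consistent.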
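Physical Models: Quantum Transport (ii)
Carriers Localized around Atom Positions
$\rho(\mathbf{r}) = F_{\rho} \sum_{i,k} \int dE\, |C_i(E,k)|^2\, \delta(\mathbf{r}-\mathbf{r}_i)$
Current along Bonds Connecting two Atoms
$\mathbf{J}(\mathbf{r}) = F_{J} \sum_{ij,k} \int dE\, \mathrm{Im}\{C_i(E,k)\cdot H_{ij}\cdot C_j(E,k)\}\,(\mathbf{r}_j-\mathbf{r}_i)\,\delta(\mathbf{r}-\mathbf{r}_i)$
Solve Poisson Equation on FEM Grid
$\Delta V(\mathbf{r}) = -\rho(\mathbf{r})/\varepsilon(\mathbf{r})$
Repeat until ρ(r) and V(r) converge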
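The Poisson step can be illustrated in one dimension. The sketch below is an assumption-laden stand-in: it uses a uniform finite-difference grid with fixed boundary potentials rather than the FEM grid named on the slide, and the function name and arguments are hypothetical.

```python
def solve_poisson_1d(rho, eps, dx, v_left=0.0, v_right=0.0):
    """Solve d2V/dx2 = -rho/eps on interior grid points with fixed end values.

    rho, eps: charge density and permittivity at each interior point
    dx:       uniform grid spacing (finite differences, not FEM)
    """
    n = len(rho)
    # Discretization: V[i-1] - 2 V[i] + V[i+1] = -rho[i] dx^2 / eps[i]
    rhs = [-rho[i] * dx * dx / eps[i] for i in range(n)]
    rhs[0] -= v_left
    rhs[-1] -= v_right
    # Thomas algorithm specialised to the stencil (1, -2, 1)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = 1.0 / -2.0
    d[0] = rhs[0] / -2.0
    for i in range(1, n):
        denom = -2.0 - c[i - 1]
        c[i] = 1.0 / denom
        d[i] = (rhs[i] - d[i - 1]) / denom
    v = [0.0] * n
    v[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        v[i] = d[i] - c[i] * v[i + 1]
    return v


if __name__ == "__main__":
    # Constant charge on a unit interval: analytic solution V(x) = x (1 - x) / 2
    potential = solve_poisson_1d([1.0] * 9, [1.0] * 9, 0.1)
    print(potential[4])  # value at x = 0.5
```

In a self-consistent loop this solve would alternate with the transport solve that produces ρ(r), stopping once both ρ(r) and V(r) stop changing within a chosen tolerance.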
Parallelization Scheme
Objective:
• Nanoelectronic Device Simulations with Quantum Transport and Atomistic Basis
Approach:
• Multi-Level parallelism
  • Voltage
  • Momentum
  • Energy
  • Space
• Parameter sweep over voltages
• Dynamic load balancing in double integral
• Leverage of existing linear solvers (PARDISO, MUMPS, SuperLU, UMFPACK, …)
Novel:
• Development of new solvers (Block Cyclic Reduction) with Computational Interleaving between BC and sparse LSE
Flow chart:
Initialization of Structure and Hamiltonian Matrix
Loop over Voltages: Initialize New Bias V
  Self-consistent Poisson Iterations: Update Potential
    Loop over Momentum: Get Momentum k
      Loop over Energy: Get Energy E
        Solve Schrödinger Eq. for (V,k,E)
      All E?
    All k?
    Charge and Current
    Poisson Eq.
  Convergence?
All V?
Done
Quad-Level Parallelisation Scheme
Tested on multiple platforms
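One way to picture the four levels (voltage, momentum, energy, space) is as a nested grouping of core ranks. The sketch below is a static illustration only: the function name is hypothetical, and the split of 15 momentum groups by 82 energy groups is an assumed example chosen so that 20 × 15 × 82 × 9 = 221,400, not a figure from the slides. It also ignores the dynamic load balancing the deck mentions.

```python
def rank_to_levels(rank, n_bias, n_momentum, n_energy, n_space):
    """Map a flat core rank to (bias, momentum, energy, space) group indices.

    The spatial index varies fastest, so the cores that share one linear
    solve (one V, k, E triple) are neighbours in rank order.
    """
    total = n_bias * n_momentum * n_energy * n_space
    if not 0 <= rank < total:
        raise ValueError("rank outside the core count")
    rest, space = divmod(rank, n_space)
    rest, energy = divmod(rest, n_energy)
    bias, momentum = divmod(rest, n_momentum)
    return bias, momentum, energy, space


if __name__ == "__main__":
    # 20 bias points and 9 cores per solve come from the benchmark slides;
    # 15 and 82 are hypothetical group counts for momentum and energy.
    shape = (20, 15, 82, 9)
    print(rank_to_levels(0, *shape))
    print(rank_to_levels(221_399, *shape))
```

With this layout each bias point owns 15 × 82 × 9 = 11,070 consecutive ranks, which matches the per-bias core count quoted in the benchmarks.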
Benchmarks: End-to-end Device Simulations
Same code executable for both applications: no specific tuning
Patent Filed
Double-Gate InAs BTBT FET
[Figure: p+ InAs / intrinsic InAs / n- InAs body between two HfO2 layers; dimension labels: 16nm, 2nm, 2nm, 40nm, 25nm, 90nm]
Specifications:
• unsymmetric single-material structure
• electron and hole current flow
• sp3s* tight-binding with SO coupling
• NA=54,272 atoms in active region
• sizeof(A)=542,720 in Ax=b (parallel on 9 CPUs)
I-V Curve: 20 Bias Points
Single-Gate MQW III-V HEMT (MIT)
[Figure: layer stack of strained InAs, In0.53Ga0.47As (twice), In0.52Al0.48As (twice), Si δ-doping; dimension labels: 5nm, 2nm, 3nm, 40nm, 3nm, 3nm, 55nm, 55nm]
Specifications:
• symmetric multi-quantum-well structure
• electron flow only, mainly in s-InAs
• sp3d5s* tight-binding without SO
• NA=55,226 atoms in active region
• sizeof(A)=552,260 in Ax=b (parallel on 9 CPUs)
I-V Curve: 20 Bias Points
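The quoted matrix sizes follow from the number of atoms times the number of basis functions per atom. A short check, assuming the usual orbital counts (5 for sp3s*, 10 for sp3d5s*) and a factor of 2 when spin-orbit coupling is included; the helper name is hypothetical:

```python
# Orbitals per atom, not counting spin
ORBITALS_PER_ATOM = {"sp3s*": 5, "sp3d5s*": 10}


def matrix_size(n_atoms, basis, spin_orbit):
    """Linear dimension of A in Ax=b for an atomistic tight-binding basis."""
    spin_factor = 2 if spin_orbit else 1
    return n_atoms * ORBITALS_PER_ATOM[basis] * spin_factor


if __name__ == "__main__":
    print(matrix_size(54_272, "sp3s*", spin_orbit=True))     # BTBT FET
    print(matrix_size(55_226, "sp3d5s*", spin_orbit=False))  # HEMT
```

Both devices end up with 10 unknowns per atom, which is why two different basis sets give matrices of nearly the same size.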
Band-to-band Tunneling Transistor
Double Precision Strong Scaling up to 221,400 Cores
[Figure: device structure (p+ InAs, intrinsic InAs, n- InAs, HfO2) and strong-scaling plot annotated with 82×, 78.5×, 96% parallel efficiency, and 1.28 PFlop/s]
• 4 parallel levels
• maximum of 11,070 cores per bias
• ~20 years on a single core
• <1 hour on 221,400 cores
• almost ideal speed-up up to 221,400 cores
• 1.28 PFlop/s
• 55.4% of peak
High Electron Mobility Transistor
Double and Mixed Precision Scheme Strong Scaling from 2,700 up to 221,400 Cores
[Figure: HEMT layer stack (strained InAs, In0.53Ga0.47As, In0.52Al0.48As) and strong-scaling plot annotated with 82×, 75.5×, 92% parallel efficiency, 1.27 PFlop/s, and 1.44 PFlop/s]
• 4 parallel levels
• maximum of 11,070 cores per bias
• 5 Poisson iterations
• mixed: last Poisson iteration in double precision
• 1.27 PFlop/s double
• 54% of peak
• 1.44 PFlop/s mixed
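The efficiency figures on the two strong-scaling slides can be reproduced from the annotated speed-ups. The sketch below assumes that the 82× annotation is the core-count ratio 221,400 / 2,700 and that parallel efficiency means measured speed-up divided by that ideal ratio; the variable and function names are illustrative.

```python
def parallel_efficiency(measured_speedup, ideal_speedup):
    """Fraction of the ideal speed-up that was actually reached."""
    return measured_speedup / ideal_speedup


ideal = 221_400 / 2_700                       # core-count ratio of the scaling run
btbt_eff = parallel_efficiency(78.5, ideal)   # BTBT transistor, double precision
hemt_eff = parallel_efficiency(75.5, ideal)   # HEMT run

# "~20 years on a single core" spread ideally over 221,400 cores
single_core_hours = 20 * 365 * 24
ideal_hours = single_core_hours / 221_400

if __name__ == "__main__":
    print(f"ideal speed-up: {ideal:.1f}x")
    print(f"BTBT efficiency: {btbt_eff:.1%}, HEMT efficiency: {hemt_eff:.1%}")
    print(f"ideal wall time on all cores: {ideal_hours:.2f} h")
```

The ideal wall time comes out below one hour, in line with the "<1 hour on 221,400 cores" bullet.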
Evolution of Nanoelectronic Device Simulation
Time to compute 1 Poisson Iteration for 1 Bias Point on 11,070 cores
[Figure: bar chart of walltime (s), axis from 0 to 8000; bars labeled NEGF, MUMPS, MUMPS, BCR, BCR, BCR; experiments labeled Load Balance, Computational Interleaving, Mixed; annotated speed-ups: 4x, 1.7x, 1.2x, 1.3x, 1.1x, and 10.7x overall; inset: HEMT layer stack]
• NEGF: most popular technique, but not most efficient
• WF: computationally more efficient
• BCR: 20% faster than MUMPS and allows comp. interleaving
• as compared to standard techniques, OMEN 10.7x faster (double precision)
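The overall 10.7x figure can be cross-checked against the individual annotations. This is a rough sketch under two assumptions that the transcript does not state explicitly: that the per-step factors compound multiplicatively, and that the final 1.1x step is the mixed-precision gain (consistent with 1.44 / 1.27 ≈ 1.13), so the double-precision total uses only the first four factors.

```python
import math

# Speed-up annotations in the order they appear on the slide
step_factors = [4.0, 1.7, 1.2, 1.3, 1.1]

double_precision_total = math.prod(step_factors[:4])  # without the mixed step
with_mixed_total = math.prod(step_factors)            # including the mixed step

if __name__ == "__main__":
    print(f"double precision: {double_precision_total:.2f}x")
    print(f"with mixed precision: {with_mixed_total:.2f}x")
```

The four-factor product lands slightly below the quoted 10.7x, which is plausible if the individual annotations were rounded to one decimal place.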
Outlook: Could we run on larger systems?
[Figure: HEMT layer stack (strained InAs, In0.53Ga0.47As, In0.52Al0.48As)]
So far: end-to-end simulation of I-V curve with 20 bias points on 221,400 cores => 11,070 cores per bias point
Case    Bias points  NA       CPU      DD CPU  PFlop/s  Peak (%)
Case 1  20           55,226   221,400  9       1.27     54
Case 2  10           117,455  219,300  17      1.29     56.5
Fact: the loop over bias points is embarrassingly parallel
Consequence: “Case 2” with 20 instead of 10 bias points could easily run on 2*219,300 = 438,600 cores and still reach more than 50% of peak performance
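The core counts in this extrapolation are simple to verify. The sketch below reads the "DD CPU" column as the number of cores sharing one linear solve (an interpretation, since the transcript does not define it) and uses a hypothetical helper name.

```python
def cores_per_bias(total_cores, n_bias):
    """Cores available to each bias point when the bias loop is fully parallel."""
    return total_cores // n_bias


case1 = cores_per_bias(221_400, 20)   # Case 1 in the table
case2 = cores_per_bias(219_300, 10)   # Case 2 in the table

# Independent (momentum, energy) solves that fit into one bias group,
# assuming "DD CPU" cores per solve
case1_solver_groups = case1 // 9
case2_solver_groups = case2 // 17

# Doubling the bias points of Case 2 at a fixed per-bias core count
case2_doubled_cores = case2 * 20

if __name__ == "__main__":
    print(case1, case1_solver_groups)
    print(case2, case2_solver_groups)
    print(case2_doubled_cores)
```

Because each bias point keeps the same per-bias core count, doubling the number of bias points leaves the work inside each bias group unchanged, which is the basis for the expectation that efficiency stays above 50% of peak.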
Conclusion
• Nanoelectronic Device Simulations
  Required: quantum transport simulator
• OMEN Simulation Approach
  Good agreement with experimental data
  Dedicated to large variety of nanoscale devices
  Multi-geometry and multi-physics approach
  Sustained performance up to 1.44 PFlop/s
• Future Work and Challenges
  Development of new physical models
  Code modifications to benefit from GPUs
[Figures: HEMT layer stack and CNT FET structure, repeated from the simulation slides]
Acknowledgment