Atomistic nanoelectronic device engineering with sustained performances up to 1.44 PFlop/s

M. Luisier, T. Boykin, G. Klimeck, and W. Fichtner
ETH Zurich, University of Alabama, Purdue University
Integrated Systems Laboratory, ETH Zurich
Monday, 27 June 2022


Page 2: Overview

• Nanoelectronics and HPC
  From Moore’s Law to OMEN
• Numerical Simulation of Nanoscale Devices
  HEMT, CNT, and BTBT Transistors
• Code Implementation
  Physical Models
  Parallelization Scheme
  Numerical Algorithms
• Performance Benchmarks on Jaguar
• Outlook and Conclusion

Page 3: Nanoelectronics in HPC

Number of transistors per chip doubles every 2 years (Moore’s law)

Technology nodes (figure label: gate length Lg; source: Intel Corporation):
• 0.35 μm (1995)
• 90 nm (2002)
• 45 nm (2008)
• 22 nm (2011)
• 8 nm (2020): ???

2011 breakthrough: 3-D FinFETs instead of planar MOSFETs

To keep Moore’s law: new breakthrough required by 2020

Page 4: Next Generation Devices

Production: around 2020

Candidate devices: Nanowire, Graphene, III-V UTB, BTB Tunneling, CNT

Sources:
P. Hashemi et al., EDL 30, 401 (2009)
L. Tapasztó et al., Nat. Nano. 3, 397 (2008)
Y.Q. Wu et al., EDL 30, 700 (2009)
W.Y. Choi et al., EDL 28, 743 (2007)
Supratik Guha, IBM Research

NEEDED: Fast, cheap, and reliable platform to support the development and accelerate the innovation of novel nanoelectronic devices

Physics-based Numerical Device Simulator OMEN

Page 5: OMEN

What is OMEN?

Physical Models
• 3D Quantum Transport Solver
• Accurate Representation of the Semiconductor Properties
• Atomistic Description of Devices
• Multi-Physics Modeling

Device Engineering
• Industrial-Strength Nanoelectronic Device Simulator
• Multi-Geometry Capabilities
• Explore, Understand, Explain, Optimize Novel Designs

Efficient Parallel Computing
• Accelerate Simulation Time
• Investigate New Phenomena at the Nanometer Scale
• Move Hero Experiments to a Day-to-Day Basis

HPC in Nanoelectronics: First Peta-scale Engineering Application

[Figures: GAA NW, Electron Density, Id-Vgs, Parallelization Scheme]


Page 7: (1) III-V HEMT Simulations

Experiment: J. del Alamo @ MIT

OMEN device structure (figure labels): strained InAs, In0.53Ga0.47As, In0.52Al0.48As, Si δ-doping; terminals VS, VG, VD

Transport mechanism: thermionic current over a potential barrier (conduction band CB shown in the OFF and ON states)

Publications: IEDM 2008, IEDM 2009, IEEE TED 2011

Page 8: (2) CNT FET Simulations

Experiment: A. Franklin @ IBM YH

OMEN device structure (figure labels): Source, Drain, Gate, HfO2, Air; Lg = 9 nm

Ambipolar current flow (band diagram labels: CB, VB, Efl, Efr, Source, Drain)

Id-Vgs characteristics

Publication: IEDM 2011, submitted to Nano Letters 2011

Page 9: (3) BTBT Diode Simulations

Experiment: S. Rommel @ RIT, S. Datta @ PSU

OMEN device structure (figure labels): P+ source, N+ drain, contacts

Band-to-band tunneling current (band diagram labels: CB, VB, P+, N+)

I-V characteristics: Zener current and NDR current; discrepancy due to measurement setup

Publication: TECHCON 2010, submitted to APL 2011

One single, multi-geometry, multi-physics code for a wide range of different nanoscale applications

OMEN already used by device engineers in industry at Intel and Global Foundries


Page 11: Physical Models: Quantum Transport (i)

Multi-Dimensional Schrödinger Equation with OBCs
$H\,|\psi_E\rangle = E\,|\psi_E\rangle$

Tight-Binding Ansatz for the Wave Function
$\langle r|\psi_E\rangle = \sum_{\sigma,ijk,k_t} C^{\sigma}_{ij}(E,k_t)\,\Phi_{\sigma}(r - R_{ijk})\,e^{i k_t\cdot r_t}$

Ballistic (Wave Function): linear system of the form Ax=b
$(E - H - \Sigma^{RB})\cdot C = \mathrm{Inj}$

Scattering (NEGF): matrix equation of the form AB=C
$(E - H - \Sigma^{RB} - \Sigma^{RS})\cdot G^{R} = I$
$G^{<} = G^{R}\cdot(\Sigma^{<B} + \Sigma^{<S})\cdot G^{R\dagger}$
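To illustrate why the ballistic wave-function form (Ax=b) is cheaper than a full Green's function inversion, the following Python sketch treats a toy single-band 1D tight-binding chain. It is not OMEN's multi-orbital 3D Hamiltonian: the chain model, the hopping parameter `t`, the `onsite` potential list, and all function names are assumptions made for this example. The transmission is obtained from one tridiagonal solve, which yields a single column of G^R instead of the whole matrix.

```python
import cmath
import math


def lead_self_energy(energy, t):
    """Retarded self-energy of a semi-infinite 1D lead (on-site 0, hopping -t).

    Only valid inside the band, |energy| < 2t.
    """
    ka = math.acos(-energy / (2.0 * t))  # dispersion E = -2 t cos(ka)
    return -t * cmath.exp(1j * ka)


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system (no pivoting)."""
    n = len(diag)
    c_prime = [0j] * n
    d_prime = [0j] * n
    c_prime[0] = upper[0] / diag[0] if n > 1 else 0j
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i - 1] * c_prime[i - 1]
        if i < n - 1:
            c_prime[i] = upper[i] / denom
        d_prime[i] = (rhs[i] - lower[i - 1] * d_prime[i - 1]) / denom
    x = [0j] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


def transmission(energy, onsite, t=1.0):
    """Transmission through a chain with the given on-site potentials."""
    n = len(onsite)
    sigma = lead_self_energy(energy, t)
    gamma = -2.0 * sigma.imag  # broadening, Gamma = i (Sigma - Sigma^dagger)
    # A = E - H - Sigma^RB; H has -t on the off-diagonals, so A has +t there.
    diag = [complex(energy - u) for u in onsite]
    diag[0] -= sigma
    diag[-1] -= sigma
    off = [complex(t)] * (n - 1)
    rhs = [0j] * n
    rhs[0] = 1.0  # unit source at the left contact
    column = solve_tridiagonal(off, diag, off, rhs)  # first column of G^R
    return gamma * gamma * abs(column[-1]) ** 2
```

For a clean chain (`onsite` all zero) the transmission is 1 at any in-band energy, and an on-site barrier higher than the band reduces it toward 0.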

Page 12: Physical Models: Quantum Transport (ii)

Carriers Localized around Atom Positions
$\rho(r) = F_{\rho} \sum_{i,k} \int dE\, |C_i(E,k)|^2\, \delta(r - r_i)$

Current along Bonds Connecting two Atoms
$J(r) = F_J \sum_{ij,k} \int dE\, \mathrm{Im}\{C_i(E,k)\cdot H_{ij}\cdot C_j(E,k)\}\,(r_j - r_i)\,\delta(r - r_i)$

Solve Poisson Equation on FEM Grid
$\Delta V(r) = -\rho(r)/\varepsilon(r)$

Repeat until ρ(r) and V(r) converge
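The self-consistent iteration between charge and potential can be sketched in a few lines of Python. This is a toy model under loud assumptions: a 1D finite-difference Poisson equation with ε = 1 and V = 0 at both ends replaces the FEM grid, and the hypothetical `toy_charge` function (charge screened exponentially by the potential) stands in for the full quantum transport solve. The linear `mixing` of old and new potentials is a common stabilization choice, not something stated on the slide.

```python
import math


def solve_poisson_1d(rho, h):
    """Solve V'' = -rho on interior grid points, V = 0 at both ends (eps = 1)."""
    n = len(rho)
    # Finite differences: (V[i-1] - 2 V[i] + V[i+1]) / h^2 = -rho[i]
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = 1.0 / -2.0
    d_prime[0] = (-rho[0] * h * h) / -2.0
    for i in range(1, n):
        denom = -2.0 - c_prime[i - 1]
        c_prime[i] = 1.0 / denom
        d_prime[i] = (-rho[i] * h * h - d_prime[i - 1]) / denom
    v = [0.0] * n
    v[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        v[i] = d_prime[i] - c_prime[i] * v[i + 1]
    return v


def toy_charge(v, rho0=1.0):
    """Hypothetical stand-in for the transport solve: charge screened by V."""
    return [rho0 * math.exp(-vi) for vi in v]


def self_consistent_loop(n=49, mixing=0.5, tol=1e-10, max_iter=200):
    """Iterate charge -> Poisson -> potential until V stops changing."""
    h = 1.0 / (n + 1)
    v = [0.0] * n
    for iteration in range(1, max_iter + 1):
        rho = toy_charge(v)               # "Charge and Current" step
        v_new = solve_poisson_1d(rho, h)  # "Poisson Eq." step
        change = max(abs(a - b) for a, b in zip(v_new, v))
        v = [(1.0 - mixing) * a + mixing * b for a, b in zip(v, v_new)]
        if change < tol:                  # "Convergence?" test
            return v, iteration
    raise RuntimeError("self-consistent loop did not converge")
```

Calling `self_consistent_loop()` returns the converged potential and the number of iterations used.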

Page 13: Parallelization Scheme

Objective:
• Nanoelectronic Device Simulations with Quantum Transport and Atomistic Basis

Approach:
• Multi-Level parallelism: Voltage, Momentum, Energy, Space
• Parameter sweep over voltages
• Dynamic load balancing in double integral
• Leverage of existing linear solvers (PARDISO, MUMPS, SuperLU, UMFPACK, …)

Novel:
• Development of new solvers (Block Cyclic Reduction) with Computational Interleaving between BC and sparse LSE

Flowchart:
Initialization of Structure and Hamiltonian Matrix
Loop over Voltages: Initialize New Bias V
  Self-consistent Poisson Iterations: Update Potential
    Loop over Momentum: Get Momentum k
      Loop over Energy: Get Energy E
        Solve Schrödinger Eq. for (V,k,E)
      All E?
    All k?
    Charge and Current
    Poisson Eq.
  Convergence?
All V?
Done

Quad-Level Parallelisation Scheme, tested on multiple platforms
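One way to picture the four levels is as a mapping from a flat rank number to a (bias, momentum, energy, domain) index tuple. The sketch below is a simplified, static illustration in Python, not OMEN's implementation: the slide mentions dynamic load balancing over momentum and energy, which a fixed mapping does not capture. The totals of 20 bias points, 11,070 cores per bias, and 9 cores per linear solve are taken from the benchmark slides of this talk; the 15 × 82 split between momentum and energy groups is purely hypothetical.

```python
from typing import NamedTuple


class Coords(NamedTuple):
    bias: int      # index of the bias point (voltage level)
    momentum: int  # index of the momentum group
    energy: int    # index of the energy group
    domain: int    # index within the spatial domain decomposition


def rank_to_coords(rank, n_bias, n_k, n_e, n_domain):
    """Map a flat rank to (bias, momentum, energy, domain) indices."""
    total = n_bias * n_k * n_e * n_domain
    if not 0 <= rank < total:
        raise ValueError("rank out of range")
    rank, domain = divmod(rank, n_domain)
    rank, energy = divmod(rank, n_e)
    bias, momentum = divmod(rank, n_k)
    return Coords(bias, momentum, energy, domain)


# 20 bias points x 11,070 cores per bias, 9 cores per linear solve.
# The 15 x 82 momentum/energy split is a hypothetical choice.
N_BIAS, N_K, N_E, N_DOMAIN = 20, 15, 82, 9
assert N_BIAS * N_K * N_E * N_DOMAIN == 221_400
```

With this layout, consecutive ranks share a linear solve first, then an energy point, then a momentum point, so the finest-grained communication stays between neighbouring ranks.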


Page 15: Benchmarks: End-to-end Device Simulations

Double-Gate InAs BTBT FET
Structure labels: p+ InAs, intrinsic InAs, n- InAs, HfO2; dimension labels: 16nm, 2nm, 2nm, 40nm, 25nm, 90nm
Specifications:
• unsymmetric single-material structure
• electron and hole current flow
• sp3s* tight-binding with SO coupling
• NA=54,272 atoms in active region
• sizeof(A)=542,720 in Ax=b (parallel on 9 CPUs)
I-V curve: 20 bias points

Single-Gate MQW III-V HEMT (MIT)
Structure labels: strained InAs, In0.53Ga0.47As, In0.52Al0.48As, Si δ-doping; dimension labels: 5nm, 2nm, 3nm, 40nm, 3nm, 3nm, 55nm, 55nm
Specifications:
• symmetric multi-quantum-well structure
• electron flow only, mainly in s-InAs
• sp3d5s* tight-binding without SO
• NA=55,226 atoms in active region
• sizeof(A)=552,260 in Ax=b (parallel on 9 CPUs)
I-V curve: 20 bias points

Figure annotation: Patent Filed

Same code executable for both applications: no specific tuning
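The quoted matrix sizes follow from the number of atoms times the number of basis functions per atom. A short Python check, using the standard orbital counts of the two tight-binding bases (5 for sp3s*, 10 for sp3d5s*, doubled when spin-orbit coupling is included); the function and dictionary names are illustrative only:

```python
# Orbitals per atom in each tight-binding basis. Including spin-orbit
# coupling doubles the count because both spin states are kept explicitly.
ORBITALS = {"sp3s*": 5, "sp3d5s*": 10}


def matrix_dimension(n_atoms, basis, spin_orbit):
    """Dimension of A in Ax=b for an atomistic tight-binding device."""
    per_atom = ORBITALS[basis] * (2 if spin_orbit else 1)
    return n_atoms * per_atom


btbt_size = matrix_dimension(54_272, "sp3s*", spin_orbit=True)
hemt_size = matrix_dimension(55_226, "sp3d5s*", spin_orbit=False)
```

Both devices end up with 10 basis functions per atom, which is why the two benchmark systems have nearly the same matrix dimension despite using different bases.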

Page 16: Band-to-band Tunneling Transistor

Double Precision Strong Scaling up to 221,400 Cores

Structure labels: p+ InAs, intrinsic InAs, n- InAs, HfO2

• 4 parallel levels
• maximum of 11,070 cores per bias
• ~20 years on a single core
• <1 hour on 221,400 cores
• almost ideal speed-up up to 221,400 cores
• 1.28 PFlop/s
• 55.4% of peak

Plot annotations: 82×, 78.5×, 96% parallel efficiency, 1.28 PFlop/s
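The plot annotations are consistent with reading 82× as the increase in core count and 78.5× as the measured speed-up. The 2,700-core baseline used below is an assumption, taken from the strong-scaling range quoted for the HEMT benchmark in this talk:

```latex
\frac{221{,}400}{2{,}700} = 82, \qquad
\eta_{\parallel} = \frac{78.5}{82} \approx 0.957 \approx 96\%
```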

Page 17: High Electron Mobility Transistor

Double and Mixed Precision Scheme: Strong Scaling from 2,700 up to 221,400 Cores

Structure labels: strained InAs, In0.53Ga0.47As, In0.52Al0.48As

• 4 parallel levels
• maximum of 11,070 cores per bias
• 5 Poisson iterations
• mixed: last Poisson iteration in double precision
• 1.27 PFlop/s double
• 54% of peak
• 1.44 PFlop/s mixed

Plot annotations: 82×, 75.5×, 92% parallel efficiency, 1.44 PFlop/s (mixed), 1.27 PFlop/s (double)
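Assuming 82× denotes the increase in core count (221,400 / 2,700) and 75.5× the measured speed-up, the quoted efficiency and the gain of the mixed-precision scheme over pure double precision work out as:

```latex
\eta_{\parallel} = \frac{75.5}{82} \approx 0.921 \approx 92\%, \qquad
\frac{1.44\ \text{PFlop/s}}{1.27\ \text{PFlop/s}} \approx 1.13
```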

Page 18: Evolution of Nanoelectronic Device Simulation

Time to compute 1 Poisson Iteration for 1 Bias Point on 11,070 cores

[Figure: bar chart of walltime (s), axis from 0 to 8000; bars labelled NEGF, MUMPS, MUMPS, BCR, BCR, BCR; annotations: Experiment, Load Balance, Computational Interleaving, Mixed; step speed-ups 4x, 1.7x, 1.2x, 1.3x, 1.1x; overall 10.7x]

• NEGF: most popular technique, but not most efficient
• WF: computationally more efficient
• BCR: 20% faster than MUMPS and allows computational interleaving
• as compared to standard techniques, OMEN is 10.7x faster (double precision)


Page 20: Outlook: Could we run on larger systems?

So far: end-to-end simulation of I-V curve with 20 bias points on 221,400 cores
=> 11,070 cores per bias point

         Bias   NA        CPU       DD CPU   PFlop/s   Peak (%)
Case 1   20     55,226    221,400   9        1.27      54
Case 2   10     117,455   219,300   17       1.29      56.5

Fact: the loop over bias points is embarrassingly parallel

Consequence: “Case 2” with 20 instead of 10 bias points could easily run on 2*219,300 = 438,600 cores and still reach more than 50% of peak performance
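The extrapolation relies on the per-bias workload staying fixed while the number of bias points grows. A small Python check of the core counts quoted above (the function names are illustrative, and the assumption that efficiency is unchanged when bias points are added is the slide's argument, not something this snippet demonstrates):

```python
def cores_per_bias(total_cores, n_bias):
    """Cores assigned to each bias point when the sweep is split evenly."""
    if total_cores % n_bias:
        raise ValueError("cores do not divide evenly among bias points")
    return total_cores // n_bias


def cores_for_sweep(per_bias, n_bias):
    """Total cores needed if every bias point keeps the same allocation."""
    return per_bias * n_bias


case1_per_bias = cores_per_bias(221_400, 20)  # Case 1: 20 bias points
case2_per_bias = cores_per_bias(219_300, 10)  # Case 2: 10 bias points
case2_doubled = cores_for_sweep(case2_per_bias, 20)
```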

Page 21: Conclusion

• Nanoelectronic Device Simulations
  Required: quantum transport simulator
• OMEN Simulation Approach
  Good agreement with experimental data
  Dedicated to large variety of nanoscale devices
  Multi-geometry and multi-physics approach
  Sustained performance up to 1.44 PFlop/s
• Future Work and Challenges
  Development of new physical models
  Code modifications to benefit from GPUs

Page 22: Acknowledgment