THIS PROJECT HAS RECEIVED FUNDING FROM THE EUROPEAN … · 2019-10-09 · Today and next generation...
Transcript of THIS PROJECT HAS RECEIVED FUNDING FROM THE EUROPEAN … · 2019-10-09 · Today and next generation...
THIS PROJECT HAS RECEIVED FUNDING FROM THE EUROPEAN UNION’S HORIZON 2020 RESEARCH AND INNOVATION
PROGRAMME UNDER GRANT AGREEMENT NO 826647
* FPA : Framework Partnership Agreement
* FP8 : Framework Programmes 8 for 2014-2020, succeeding FP7 (2007-2013)
➔
1018
EU - FPA
System vendors
EPI HPC blade
EPI mother board
EPI
Common Platform
PCIe card
EPI
Automotive
PoC
Proof of concept by EPI project
* Pre-ExaScale level with general-purpose CPU core in the first EPI GPP chip
* Develop acceleration technologies for better DP GFLOPS/Watt performance
* Inclusion of MPPA for real-time application acceleration
* Develop a Common Platform to enable EPI accelerations
* Adopt Arm general-purpose CPU core with SVE / vector acceleration in the first EPI chip
* Supply sufficient Memory Bandwidth (Byte/FLOP) to support the GPP application
* in SGA1, focus on programming models to include accelerations.
* (to-be-engineered)
100 EFLOPS
10 EFLOPS
1 EFLOPS
100 PFLOPS
10 PFLOPS 2 nJ/FLOP
1 PFLOPS 200 pJ/FLOP
100 TFLOPS 20 pJ/FLOP
10 TFLOPS 2 pJ/FLOP
1 TFLOPS 0.2 pJ/FLOP
20
00
20
01
20
02
20
03
20
04
20
05
20
06
20
07
20
08
20
09
20
10
20
11
20
12
20
13
20
14
20
15
20
16
20
17
20
18
20
19
20
20
20
21
20
22
20
23
20
24
20
25
20
26
20
27
20
28
20
29
20
30
20
31
20
32
20
33
x10 every 4 years
/10 every 4 years
10x energy efficiency improvement every 4 years<
PERFORMANCE
ENERGY PER OPERATION*
* assuming 20 MWatt supercomputer
1018 FLOPs @ 20MW?
Source from Kunle Olukotun, Lance Hammond, Herb Sutter,
Burton Smith, Chris Batten, and Krste Asanovic
Max. frequency
reached
Max. power
reached
Max. nb of
transistors reached
Max. frequency
reached
CPU
Cache
Memory
Bus
NIC(Network
Interconnect)
Transistor nb +++
Frequency +
Power density =
Source from Kunle Olukotun, Lance Hammond, Herb Sutter,
Burton Smith, Chris Batten, and Krste Asanovic
Max. power
reachedTransistor nb ++
Frequency =
Power density =
NoC + LLC
Cache Cache
MemoryNIC
CacheCache
Source from Kunle Olukotun, Lance Hammond, Herb Sutter,
Burton Smith, Chris Batten, and Krste Asanovic
Max. nb of
transistors reached
Transistor nb +
Frequency =
Total power =
Close
Mem.
High
Speed L
ink
Close
Mem
Close
Mem
Close
Mem
Far
Mem.NIC
Generic processing
HW accelerator
Today and next generation
Source from Kunle Olukotun, Lance Hammond, Herb Sutter,
Burton Smith, Chris Batten, and Krste Asanovic
homogeneous
heterogeneous, accelerated
China
US
Japan
EPI takes 2-step approach
step#1 : homogeneous with Arm core+SVE
step#2 : heterogeneous with additional EPI accelerators Sierra / LLNL, 2019IBM P9 + NVidia GPU125 petaflops (peak)
(2021)Aurora / ANLIntel Xeon + Xe>1.0 exaflops (peak)
(2021)Frontier / ORNLAMD CPU + GPU~1.5 exaflops (peak)
Summit / ORNL, 2019IBM P9 + NVidia GPU200 petaflops (peak)148.6 petaflops
(2020-2021)Tianhe-3 / NUDTMatrix-3000>1.0 exaflops (peak)
(2020-2021)Fugaku / RIKENA64FX (Armv8.2+SVE)>0.5 exaflops
Tianhe-2a /NUDT, 2018Intel Xeon + Matrix-200094.97 petaflops (peak)
Tianhe-2 /NUDT, 2013Intel Xeon + KNC 33.86 petaflops (peak)
K / RIKEN, 2011SPARC64 VIIIfx11.28 petaflops (peak)10.51 petaflops
Sugon Exa-prototypeHygon CPU + DCU
NRCPC Exa-prototypeSW26010 based
Sunway TaihuLight /NRCPCSW26010125.43 petaflops (peak)
(?)Hygon CPU + DCU?
(?)??
MPPA
eFPGA EPAC
HBM memorires
DDR memories
PCIe gen5 links HSL links
D2D links to adjacent chiplets
Arm Core
Functional Block
levelChip level
Package levelBoard / System
level
STX
Bridge to GPP
Bridge to GPP
VPU
VRP EPAC - EPI Accelerator
VPU – Vector Processing Unit
STX – Stencil/Tensor accelerator
VRP - VaRiable Precision co-processor
TITAN (Gen2)
S1 - Common Stream
S2 - GPP Processor
S3 - Acceleration
S4 - Automotive
S5 - Administration
Codesign, Architecture, System software and key
technologies for the Common Platform
Design and implement of the processor
chip(s) and PoC system
Foster acceleration technologies and
create building blocks
Address automotive market needs and
create a pilot eHPC system
Manage and support activities
Application
Experts
Architects
within
Streams
Model and
Modeling
Benchmarks
Requirements
Simulator,
Eval. requirements
Eval. results
+
S2 - GPP Processor
S3 - Acceleration
S4 - Automotive
MUSA(BSC)
SESAM(CEA-List)
gem5(FZJ)
ARM models
ARM v8A Qemu
ARM FastModels
SESAM(CEA-List)
MPPA3 Qemu(Kalray)
gem5(FZJ)
dist-gem5(FZJ)
multiple chiplet
MB2020-EPI NoCinclude MB2020-NoC "configuration"
MUSA(BSC)
gen2(Bull)
EPAC(CEA)
DDR, PCIe,..(CEA)
NoC design
Reference platforms
Modeling Environment
Model built by EPI
Commercial component
Color Legends
annotations of t iming and aging
eHPC MCU(Infineon)
NVM(Forth)
s/w dev.(Bull)
MB2020 NoC (Bull)
ARM cycle accurate model
NoC w/ LLC(CEA-List)
node node ...
node node ...
node node ...
Node with MTBF µnode = 10yrs
Worst case MTTI for a 50K-node system
µ =10yrs/50K = 1.75hrs
node
node
node
* eHPC – embedded HighPerformanceComputing
External Memory
(e.g.NVDIMM,DDR DIMMs)
safety / security monitors
RADAR LIDARUltra
sonicGPS IMS Camera
External Memory
(e.g.NVDIMM,DDR DIMMs)
Automotive
Safety/security
MCU
Trustable Sensor Interfaces
HSL fast link
Peripheral Bus
Credit: INFINEON & EPI
12-20V
Cdie
Lpkg
CpkgCpkg
LDO
Load
1.1-0.6V
REG
Package
Die
Board
DC-DCs
Existing solution in the market
Voff=0 Von=Vdd
Probably Density
leakagecurrent
drive current
Vt-3s +3s
FF
SS
TT
FS
ffg
ssg
SF
nmos
pm
os
Security Infrastructure
Security Services
RoT
Adv. Cryptos
Security Domain #1
CU CU
/
Security Domain #2
CUCUCU CU
CU CU
SIPEARL SAS
78600 Maisons-Laffitte
France
RCS Versailles Siren 851 434 365
WE ACCELERATE ACCELERATORS !!!!
Contact
Philippe NOTTON
+33180835490
R&D in Paris / Grenoble / Sophia Antipolis