European energy efficient supercomputer project · All references to unavailable products are...
Transcript of European energy efficient supercomputer project · All references to unavailable products are...
This project and the research leading to these results has received funding from the European Community's Seventh Framework Programme [FP7/2007-2013] under grant agreement n° 288777.
http://www.montblanc-project.eu
European energy efficient
supercomputer project
Simon McIntosh-Smith
University of Bristol
(Based on slides from Alex Ramirez, BSC)
Disclaimer: Speaking for myself ... All references to unavailable products are speculative, taken from web sources.
There is no commitment from ARM, Samsung, TI, Nvidia, Bull, or others, implied.
Mont-Blanc 1 project goals
• To develop a European Exascale approach
• Based on embedded power-efficient technology
• Objectives
• Develop a first prototype system, using available technology
• Design a Next Generation system, to overcome the limitations
• Develop a set of Exascale applications targeting the new system
Role of Bristol-based multicore expertise
• One Bristol-based partner in Mont Blanc 1 – Gnodal
• Fast, energy efficient Ethernet interconnect
• My group at the University of Bristol has been a user of
the first Mont Blanc prototype machine
• Code development, scaling tests etc
• My group was subsequently invited to join the project
team for the Mont Blanc 2 proposal
• Currently in progress, outcome to be announced soon
• Roger Shepherd’s team at ST Bristol also a new partner
for the Mont Blanc 2 proposal
• So now 3 of the 12 MB2 partners based in Bristol!
4
In the beginning ... there were supercomputers
• Built to order
• Very few of them
• Special purpose hardware
• Very expensive
• Control Data, Convex, ...
• Cray-1
• 1975, 160 MFLOPS
• 80 units, 5-8 M$
• Cray X-MP
• 1982, 800 MFLOPS
• Cray-2
• 1985, 1.9 GFLOPS
• Cray Y-MP
• 1988, 2.6 GFLOPS
• Fortran+vectorizing compilers
4
5
The Killer Microprocessors
• Microprocessors killed the Vector supercomputers
• They were not faster ...
• ... but they were significantly cheaper (and greener)
• Initially needed ~10 microprocessors to achieve equivalent
performance to 1 Vector CPU
• SIMD vs. MIMD programming paradigms
Cray-1, Cray-C90
NEC SX4, SX5
Alpha AV4, EV5
Intel Pentium
IBM P2SC
HP PA8200
1974 1979 1984 1989 1994 1999 10
100
1000
10,000
MF
LO
PS
6
Then, commodity displaced special purpose
• ASCI Red, Sandia, #1 in Top500
• 1997, 1 TFLOPS (Linpack),
• 9,298 cores @ 200 MHz
• Intel Pentium Pro 32-bit CPUs
6
Top500 systems by Processor Family
By November 2008 this had grown to 85.8%.
November 2000, x86 account for 1.2% of systems.
7
The next step in the commodity chain
• Total cores in Jun'12 Top500
• 13.5 Mcores
• Predicted tablets sold in 2013*
• 182 Mtablets
• Predicted smartphones sold in ’13*
• ~1 Bsmartphone
7
HPC
Servers
Desktop
Mobile
Gartner: http://www.gartner.com/newsroom/id/1980115,
http://www.gartner.com/newsroom/id/2335616
8
ARM Processor improvements in DP FLOPS
• IBM BG/Q and Intel AVX implement DP in 256-bit SIMD
• 8 DP ops / cycle
• ARM quickly moved from optional floating-point to state-of-the-art
• ARMv8 ISA introduces DP in the NEON instruction set (128-bit SIMD)
8
Double
Pre
cis
ion (
DP
) ops/c
ycle
1
2
4
8
16
2015
ARM CortexTM-A9
ARM CortexTM-A15
ARMv8
IBM
BG/Q
Intel
AVX
IBM
BG/P
2013 2011 2009 2007 2005 2003 2001 1999
Intel
SSE2
Integrated ARM GPU performance
• GPU compute performance has increased very rapidly
as it catches up to the available silicon resource
2012 2013 2014
Mali-T604 First Midgard architecture product
Scalable to 4 cores
68 GFLOPS*
Mali-T658 High-end solution + compute capability
Scalable to 8 cores, ARMv8 compatible
272 GFLOPS*
Skrymir
Perf
orm
ance
* Data from web sources, not an ARM commitment
Are the “Killer Mobiles™" coming?
• Where is the sweet spot? Maybe in the low-end ... • Today ~ 1:8 ratio in performance, 1:100 ratio in cost
• Tomorrow ~ 1:2 ratio in performance, still 1:100 in cost ?
• The same reason why microprocessors killed supercomputers • Not so much performance ... but much lower cost, and power
• Of course multiple vendors will ship these products … • ARM, Intel, AMD, Imagination, …
Performance (log2)
Cost (log
10)
Mobile ($20)
Desktop ($150)
Server ($1500)
Today
Near future
HPC-Mobile ($40) ?
Mont-Blanc ARM-based prototype roadmap
• Prototypes are critical to accelerate software development
• System software stack + applications
2011 2012 2013 2014
GF
LO
PS
/ W
Tibidabo:
ARM multicore
Pedraforca:
ARM + GPU
Integrated
ARM + GPU
• Dual-core ARM Cortex-A15 @ (up to 1.7 GHz) • VFP for 64-bit F.P.
• 6.8 GFLOPS (1 FMA / cycle)
• NEON for 32-bit F.P. SIMD
• Quad-core ARM Mali T604 • GPU computing:
• OpenCL 1.1
• 68 GFLOPS (32-bit)
• Shared memory between CPU and GPU
Samsung Exynos 5 Dual
High density packaging architecture
• Standard Bull blade enclosure
• Multiple compute nodes per
blade
• Additional level of
interconnect, on-blade
network
The hype curve
• We'll see how deep it gets on the way down ...
Vis
ibil
ity
Time
Technology Trigger
Peak of Inflated Expectations
Trough of Disillusionment
Slope of Enlightenment
Plateau of Productivity
Y1
Y3
Conclusions
• Mont-Blanc architecture is shaping up
• ARM multicore + integrated OpenCL accelerator
• Ethernet NIC
• High density packaging
• OmpSs programming model port to OpenCL
• Range of HPC Applications being ported
• Bristol multicore expertise now being harnessed for MB2
MontBlancEU
@MontBlanc_EU
www.montblanc-project.eu