European energy efficient supercomputer project · All references to unavailable products are...

15
This project and the research leading to these results has received funding from the European Community's Seventh Framework Programme [FP7/2007-2013] under grant agreement n° 288777. http://www.montblanc-project.eu European energy efficient supercomputer project Simon McIntosh-Smith University of Bristol (Based on slides from Alex Ramirez, BSC) Disclaimer: Speaking for myself ... All references to unavailable products are speculative, taken from web sources. There is no commitment from ARM, Samsung, TI, Nvidia, Bull, or others, implied.

Transcript of European energy efficient supercomputer project · All references to unavailable products are...

Page 1: European energy efficient supercomputer project · All references to unavailable products are speculative, taken from web sources. There is no commitment from ARM, Samsung, TI, Nvidia,

This project and the research leading to these results has received funding from the European Community's Seventh Framework Programme [FP7/2007-2013] under grant agreement n° 288777.

http://www.montblanc-project.eu

European energy efficient

supercomputer project

Simon McIntosh-Smith

University of Bristol

(Based on slides from Alex Ramirez, BSC)

Disclaimer: Speaking for myself ... All references to unavailable products are speculative, taken from web sources.

There is no commitment from ARM, Samsung, TI, Nvidia, Bull, or others, implied.

Page 2: European energy efficient supercomputer project · All references to unavailable products are speculative, taken from web sources. There is no commitment from ARM, Samsung, TI, Nvidia,

Mont-Blanc 1 project goals

• To develop a European Exascale approach

• Based on embedded power-efficient technology

• Objectives

• Develop a first prototype system, using available technology

• Design a Next Generation system, to overcome the limitations

• Develop a set of Exascale applications targeting the new system

Page 3: European energy efficient supercomputer project · All references to unavailable products are speculative, taken from web sources. There is no commitment from ARM, Samsung, TI, Nvidia,

Role of Bristol-based multicore expertise

• One Bristol-based partner in Mont Blanc 1 – Gnodal

• Fast, energy efficient Ethernet interconnect

• My group at the University of Bristol has been a user of

the first Mont Blanc prototype machine

• Code development, scaling tests etc

• My group was subsequently invited to join the project

team for the Mont Blanc 2 proposal

• Currently in progress, outcome to be announced soon

• Roger Shepherd’s team at ST Bristol also a new partner

for the Mont Blanc 2 proposal

• So now 3 of the 12 MB2 partners based in Bristol!

Page 4: European energy efficient supercomputer project · All references to unavailable products are speculative, taken from web sources. There is no commitment from ARM, Samsung, TI, Nvidia,

4

In the beginning ... there were supercomputers

• Built to order

• Very few of them

• Special purpose hardware

• Very expensive

• Control Data, Convex, ...

• Cray-1

• 1975, 160 MFLOPS

• 80 units, 5-8 M$

• Cray X-MP

• 1982, 800 MFLOPS

• Cray-2

• 1985, 1.9 GFLOPS

• Cray Y-MP

• 1988, 2.6 GFLOPS

• Fortran+vectorizing compilers

4

Page 5: European energy efficient supercomputer project · All references to unavailable products are speculative, taken from web sources. There is no commitment from ARM, Samsung, TI, Nvidia,

5

The Killer Microprocessors

• Microprocessors killed the Vector supercomputers

• They were not faster ...

• ... but they were significantly cheaper (and greener)

• Initially needed ~10 microprocessors to achieve equivalent

performance to 1 Vector CPU

• SIMD vs. MIMD programming paradigms

Cray-1, Cray-C90

NEC SX4, SX5

Alpha AV4, EV5

Intel Pentium

IBM P2SC

HP PA8200

1974 1979 1984 1989 1994 1999 10

100

1000

10,000

MF

LO

PS

Page 6: European energy efficient supercomputer project · All references to unavailable products are speculative, taken from web sources. There is no commitment from ARM, Samsung, TI, Nvidia,

6

Then, commodity displaced special purpose

• ASCI Red, Sandia, #1 in Top500

• 1997, 1 TFLOPS (Linpack),

• 9,298 cores @ 200 MHz

• Intel Pentium Pro 32-bit CPUs

6

Top500 systems by Processor Family

By November 2008 this had grown to 85.8%.

November 2000, x86 account for 1.2% of systems.

Page 7: European energy efficient supercomputer project · All references to unavailable products are speculative, taken from web sources. There is no commitment from ARM, Samsung, TI, Nvidia,

7

The next step in the commodity chain

• Total cores in Jun'12 Top500

• 13.5 Mcores

• Predicted tablets sold in 2013*

• 182 Mtablets

• Predicted smartphones sold in ’13*

• ~1 Bsmartphone

7

HPC

Servers

Desktop

Mobile

Gartner: http://www.gartner.com/newsroom/id/1980115,

http://www.gartner.com/newsroom/id/2335616

Page 8: European energy efficient supercomputer project · All references to unavailable products are speculative, taken from web sources. There is no commitment from ARM, Samsung, TI, Nvidia,

8

ARM Processor improvements in DP FLOPS

• IBM BG/Q and Intel AVX implement DP in 256-bit SIMD

• 8 DP ops / cycle

• ARM quickly moved from optional floating-point to state-of-the-art

• ARMv8 ISA introduces DP in the NEON instruction set (128-bit SIMD)

8

Double

Pre

cis

ion (

DP

) ops/c

ycle

1

2

4

8

16

2015

ARM CortexTM-A9

ARM CortexTM-A15

ARMv8

IBM

BG/Q

Intel

AVX

IBM

BG/P

2013 2011 2009 2007 2005 2003 2001 1999

Intel

SSE2

Page 9: European energy efficient supercomputer project · All references to unavailable products are speculative, taken from web sources. There is no commitment from ARM, Samsung, TI, Nvidia,

Integrated ARM GPU performance

• GPU compute performance has increased very rapidly

as it catches up to the available silicon resource

2012 2013 2014

Mali-T604 First Midgard architecture product

Scalable to 4 cores

68 GFLOPS*

Mali-T658 High-end solution + compute capability

Scalable to 8 cores, ARMv8 compatible

272 GFLOPS*

Skrymir

Perf

orm

ance

* Data from web sources, not an ARM commitment

Page 10: European energy efficient supercomputer project · All references to unavailable products are speculative, taken from web sources. There is no commitment from ARM, Samsung, TI, Nvidia,

Are the “Killer Mobiles™" coming?

• Where is the sweet spot? Maybe in the low-end ... • Today ~ 1:8 ratio in performance, 1:100 ratio in cost

• Tomorrow ~ 1:2 ratio in performance, still 1:100 in cost ?

• The same reason why microprocessors killed supercomputers • Not so much performance ... but much lower cost, and power

• Of course multiple vendors will ship these products … • ARM, Intel, AMD, Imagination, …

Performance (log2)

Cost (log

10)

Mobile ($20)

Desktop ($150)

Server ($1500)

Today

Near future

HPC-Mobile ($40) ?

Page 11: European energy efficient supercomputer project · All references to unavailable products are speculative, taken from web sources. There is no commitment from ARM, Samsung, TI, Nvidia,

Mont-Blanc ARM-based prototype roadmap

• Prototypes are critical to accelerate software development

• System software stack + applications

2011 2012 2013 2014

GF

LO

PS

/ W

Tibidabo:

ARM multicore

Pedraforca:

ARM + GPU

Integrated

ARM + GPU

Page 12: European energy efficient supercomputer project · All references to unavailable products are speculative, taken from web sources. There is no commitment from ARM, Samsung, TI, Nvidia,

• Dual-core ARM Cortex-A15 @ (up to 1.7 GHz) • VFP for 64-bit F.P.

• 6.8 GFLOPS (1 FMA / cycle)

• NEON for 32-bit F.P. SIMD

• Quad-core ARM Mali T604 • GPU computing:

• OpenCL 1.1

• 68 GFLOPS (32-bit)

• Shared memory between CPU and GPU

Samsung Exynos 5 Dual

Page 13: European energy efficient supercomputer project · All references to unavailable products are speculative, taken from web sources. There is no commitment from ARM, Samsung, TI, Nvidia,

High density packaging architecture

• Standard Bull blade enclosure

• Multiple compute nodes per

blade

• Additional level of

interconnect, on-blade

network

Page 14: European energy efficient supercomputer project · All references to unavailable products are speculative, taken from web sources. There is no commitment from ARM, Samsung, TI, Nvidia,

The hype curve

• We'll see how deep it gets on the way down ...

Vis

ibil

ity

Time

Technology Trigger

Peak of Inflated Expectations

Trough of Disillusionment

Slope of Enlightenment

Plateau of Productivity

Y1

Y3

Page 15: European energy efficient supercomputer project · All references to unavailable products are speculative, taken from web sources. There is no commitment from ARM, Samsung, TI, Nvidia,

Conclusions

• Mont-Blanc architecture is shaping up

• ARM multicore + integrated OpenCL accelerator

• Ethernet NIC

• High density packaging

• OmpSs programming model port to OpenCL

• Range of HPC Applications being ported

• Bristol multicore expertise now being harnessed for MB2

MontBlancEU

@MontBlanc_EU

www.montblanc-project.eu