Impacts of Energy Efficiency on Supercomputer Programming Models · 2009-06-01

Salishan conference, April 2009 Impacts of Energy Efficiency on Supercomputer Programming Models Craig Stunkel, IBM Research

Page 1: Impacts of Energy Efficiency on Supercomputer Programming Models

Salishan conference, April 2009

Craig Stunkel, IBM Research

Page 2: What is a programming model?

IBM Research

A programming model is a story
– A common conceptual framework
– Used by application developers, algorithm designers, compiler-writers, runtime developers, tool builders to communicate with each other and write code.
– A good programming model is a robust story
  • Makes sense to all stakeholders
  • Meets a critical need

May be realized through one or more of:
  • Libraries
  • Language/compiler extensions – pragmas, directives
  • New languages

Different programming models may exist at different levels of abstraction

A good programming model can lead to new industry-wide eco-systems
– E.g. Java, Map-Reduce, …

April 2009 · Programming models, Salishan conference · 2

Slide courtesy of Vijay Saraswat

Page 3: Desirable programming model characteristics

Realizable in existing tool-chains (C, Fortran, Java, OpenMP, scripting languages) with minimal changes
– E.g. addition of a few directives
– Single source code

Should be performance portable across architectures
– Ideal: single unified programming model from accelerators to clusters

Should support source-level performance debugging (tools)

Should provide smooth performance vs effort graph
– With low startup cost

Should mesh well with scale-out programming model

Should cover a sweet spot of applications
– i.e. do really well on targeted workloads on targeted architectures

Adapted from Vijay Saraswat
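As a hedged illustration of "addition of a few directives" with a single source code, the sketch below adds one OpenMP directive to an ordinary C loop. The function name `vec_add_omp` is hypothetical and not from the talk; the `#pragma omp parallel for` directive itself is standard OpenMP. Built without OpenMP support, a compiler ignores the pragma and the same source runs serially.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example (name not from the talk): element-wise c = a + b.
 * The single directive below is the only change from a serial loop.
 * With OpenMP enabled (e.g. -fopenmp in GCC) the iterations are shared
 * among threads; without it the pragma is ignored and the loop runs
 * serially, so one source serves both cases. */
void vec_add_omp(const float *a, const float *b, float *c, size_t n)
{
    long i; /* signed index, accepted by all OpenMP versions */

    assert(a != NULL && b != NULL && c != NULL);

    #pragma omp parallel for
    for (i = 0; i < (long)n; i++) {
        c[i] = a[i] + b[i];
    }
}
```

The iterations are independent, which is what makes a single directive sufficient here; loops with cross-iteration dependences need more than an annotation.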

Page 4: Programming models

[Figure: three stacked layers, top to bottom]
Applications
Programming Models
System Hardware and Software

Page 5: Microprocessor Clock Speed Trends

Managing power dissipation is limiting clock speed increases

[Figure: Clock Speed (MHz), log scale from 10^2 to 10^4, versus year, 1990 to 2010, with a "2004 Frequency Extrapolation" annotation.]

Page 6: Microprocessor Transistor Trend

Moore’s (original) Law alive: transistors still increasing exponentially

[Figure: Number of Transistors, log scale from 10^5 to 10^10 (1 Million and 1 Billion marked), versus year, 1980 to 2010, annotated "~50% CAGR".]

Page 7: Hardware trends that address the power problem

Trend #1: Multicore processor chips
– Maintain (or even reduce) frequency while replicating cores

Trend #2: Accelerators
– Previously, processors would “catch up” with accelerator function in the next generation
  • Accelerator design expense not amortized well
– New accelerator designs more likely to maintain performance advantage
  • And will maintain an enormous power advantage for target workloads

Page 8: The IBM PowerXCell 8i Processor

Implementation of Cell Broadband Engine Architecture

Fully pipelined double precision FP

DDR2 SDRAM support
– Up to 16 GB / chip

Speeds & Feeds
– 108.8 DP GFLOPS
– 217.6 SP GFLOPS
– 25.6 GB/s mem B/W

[Figure: chip diagram with the labels "Enhanced DP-Float" and "DDR2 Controller".]
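The speeds and feeds above imply a machine balance that can be worked out directly. The following is a small sketch, assuming the peak figures are in GFLOPS and GB/s; the function name `bytes_per_flop` is illustrative and not from the talk.

```c
#include <assert.h>

/* Illustrative helper (not from the talk): memory bytes available per
 * floating-point operation at peak, given memory bandwidth in GB/s and
 * peak rate in GFLOPS. The giga prefixes cancel. */
double bytes_per_flop(double mem_bw_gbs, double peak_gflops)
{
    assert(peak_gflops > 0.0);
    return mem_bw_gbs / peak_gflops;
}
```

With the slide's figures, `bytes_per_flop(25.6, 108.8)` is about 0.235 bytes per double-precision flop and `bytes_per_flop(25.6, 217.6)` is about 0.118 bytes per single-precision flop, so a kernel needs substantial on-chip data reuse to approach peak.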

Page 9: Hardware trends that address the power problem

Trend #2b: Heterogeneous multicore in general
– Mixes of powerful cores, smaller cores, and accelerators potentially offer the most efficient nodes
– The challenge is harnessing them efficiently

See “Amdahl’s Law in the Multicore Era” by Mark Hill
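The cited article models a chip with a budget of n base-core equivalents (BCEs), where a core built from r BCEs delivers sequential performance perf(r). A rough sketch of its symmetric and asymmetric speedup formulas follows; the function names are illustrative, and the caller supplies perf(r) (the article uses perf(r) = sqrt(r) as one plausible assumption).

```c
#include <assert.h>

/* Symmetric multicore: n/r identical cores, each built from r BCEs.
 * f is the parallelizable fraction; perf_r is the sequential
 * performance of one r-BCE core relative to a 1-BCE core. */
double speedup_symmetric(double f, double n, double r, double perf_r)
{
    assert(f >= 0.0 && f <= 1.0);
    assert(r >= 1.0 && r <= n && perf_r > 0.0);
    return 1.0 / ((1.0 - f) / perf_r + (f * r) / (perf_r * n));
}

/* Asymmetric multicore: one r-BCE core plus (n - r) 1-BCE cores.
 * The large core runs the serial part alone; all cores together
 * run the parallel part. */
double speedup_asymmetric(double f, double n, double r, double perf_r)
{
    assert(f >= 0.0 && f <= 1.0);
    assert(r >= 1.0 && r <= n && perf_r > 0.0);
    return 1.0 / ((1.0 - f) / perf_r + f / (perf_r + n - r));
}
```

For example, with f = 0.9, n = 16, r = 4 and perf(4) = 2, the symmetric design gives a speedup of about 6.15 while the asymmetric design gives 8.75, which illustrates why a mix of core sizes can be attractive.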

Page 10: Other hardware trends

Tighter integration of memory
– Improves both power and performance
– But storage is growing further away

Integration of optics
– Improves both power and bandwidth

Intrinsically less reliable
– Must attack via a multitude of techniques
  • May also affect programs and programming models

Page 11: Programming issues

Many cores per node, and accelerators/heterogeneity

Future performance gains will come via parallelism (not clock speed)
– An unwelcome situation for HPC apps!

Need new programming models to exploit

At the system/cluster level:
– Message-passing to connect node-level languages, or
– Global addressing to make communication implicit?

Page 12: OpenCL

New open standard that specifically addresses parallel compute accelerators

Extension to C

Provides data parallel and task parallel models

Facilitates natural transition from the growing number of (proprietary) CUDA programs

Porting of Cell applications to a standard model

Plays well with MPI
– MPI on the host for inter-node communication

Can interoperate with Fortran and OpenMP on the “host”

Page 13: OpenCL

Kernel program runs on the accelerator

Example kernel for vector add (c[*] = a[*] + b[*]):

    __kernel void vec_add(__global const float *a,
                          __global const float *b,
                          __global float *c)  /* output, so not const */
    {
        /* Each work-item handles the element at its own global index. */
        int gid = get_global_id(0);
        c[gid] = a[gid] + b[gid];
    }
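To make the execution model concrete: each work-item runs the kernel body once with its own `get_global_id(0)`. The plain C sketch below emulates that for a one-dimensional range by looping over the global IDs on the host. It is a serial reference, not OpenCL host API code, and the names `vec_add_work_item` and `run_vec_add` are hypothetical. The output array must be writable, so `c` is not `const` here.

```c
#include <assert.h>
#include <stddef.h>

/* Body of the kernel for one work-item; gid plays the role of
 * get_global_id(0) in the OpenCL version. */
static void vec_add_work_item(size_t gid, const float *a,
                              const float *b, float *c)
{
    c[gid] = a[gid] + b[gid];
}

/* Serial emulation of a 1-D launch with global_size work-items.
 * A real accelerator may run the work-items in parallel and in any
 * order, which is safe here because they touch disjoint elements. */
void run_vec_add(const float *a, const float *b, float *c,
                 size_t global_size)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t gid = 0; gid < global_size; gid++) {
        vec_add_work_item(gid, a, b, c);
    }
}
```

A serial reference like this is also a convenient way to check accelerator results element by element.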

Page 14: Partitioned Global Address Space (PGAS)

[Figure: process/thread and address-space layouts for three model families, with accelerator threads and accelerator address spaces shown alongside.]
Shared Memory: pThreads, OpenMP, Java
PGAS (UPC, CAF, Titanium); X10, Chapel, Fortress
Message passing: MPI

Computation is performed in multiple places.
A place contains data that can be operated on remotely.
Data lives in the place it was created, for its lifetime.
A datum in one place may reference a datum in another place.
Data-structures (e.g. arrays) may be distributed across many places.
Places may have different computational properties.
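The place concepts above can be sketched in plain C as a block-distributed array, where every global index has exactly one owning place. This is an emulation under assumed names (`dist_array`, `owner_place`, `local_offset`, and `is_remote` are hypothetical), not UPC, CAF, or X10 code; a real PGAS runtime would turn an access to another place's block into communication.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor for an array of n elements distributed in
 * contiguous blocks over num_places places. */
typedef struct {
    size_t n;
    size_t num_places;
} dist_array;

/* Elements per place, rounded up so every element has an owner. */
static size_t block_size(const dist_array *d)
{
    assert(d->num_places > 0);
    return (d->n + d->num_places - 1) / d->num_places;
}

/* Place that owns global index i; the datum stays there for its lifetime. */
size_t owner_place(const dist_array *d, size_t i)
{
    assert(i < d->n);
    return i / block_size(d);
}

/* Offset of global index i inside its owner's local block. */
size_t local_offset(const dist_array *d, size_t i)
{
    assert(i < d->n);
    return i % block_size(d);
}

/* 1 if an access to index i from place `here` would be remote, else 0. */
int is_remote(const dist_array *d, size_t here, size_t i)
{
    return owner_place(d, i) != here;
}
```

For 10 elements over 4 places the block size is 3, so index 3 is owned by place 1 and index 9 by place 3; the global index space is shared, but each access is visibly local or remote.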

Page 15: Different approaches to exploit parallelism

[Figure: approaches arranged along a "Programming Intrusiveness" axis running from "No change to customer code" to "Rewrite program".]
Program forms: Single-thread program; Annotated program; Parallel languages
Compiler Innovations: Traditional & Auto-Parallelizing Compilers; Directives + Compiler; Parallel Language Compiler
Hardware Innovations: Speculative threads; Multicore / SMP; Clusters; Accelerators/Heterogeneity
Highlighted: Advanced compiler techniques; Enhanced by directives (e.g. Transactional Memory)

Page 16: Different approaches to exploit parallelism

[Figure: the "Programming Intrusiveness" diagram from page 15, from "No change to customer code" to "Rewrite program", with the same Compiler Innovations and Hardware Innovations rows.]
Highlighted: OpenMP; OpenMP with extensions?

Page 17: Different approaches to exploit parallelism

[Figure: the "Programming Intrusiveness" diagram from page 15, from "No change to customer code" to "Rewrite program", with the same Compiler Innovations and Hardware Innovations rows.]
Highlighted: OpenCL

Page 18: Different approaches to exploit parallelism

[Figure: the "Programming Intrusiveness" diagram from page 15, from "No change to customer code" to "Rewrite program", with the same Compiler Innovations and Hardware Innovations rows.]
Highlighted: PGAS/APGAS languages; APGAS annotations for existing languages

Page 19: Potential Migration Paths

Legend: Green: open, widely available*; Blue: somewhere in between; Red: proprietary
*OpenCL availability predicted

[Figure: migration paths among programming models.]
Entries: C/C++/Fortran/Java (Base); Base and MPI; Base/OpenMP; Base/OpenMP and MPI; Charm++; PGAS/APGAS; Base/OpenCL; Base/OpenMP+ and MPI; RapidMind; Base/OpenCL and MPI; CUDA; libspe; GEDAE/Streaming models; ALF
Region labels: Clusters; w/ Accelerators; Harness accelerators

Page 20: Programming model perspective

There will be a variety of programming models
– No silver bullet
– Must extend dominant legacy tool chains
  • Compilers, performance tools, …

In particular, the industry should explore:
– OpenMP and potential extensions for hybrid
– OpenCL (for compute accelerators)
– PGAS and APGAS languages (UPC, CAF, X10, …)
– Pursue APGAS directives for current languages
  • C/C++, Fortran, Java

Page 21: Education

Producing scaling codes will be a challenge

Too few software engineers understand how to take advantage of parallelism, particularly data parallelism
– Continuation of the multicore revolution is at stake
– And supercomputers are now dependent upon multicore

How to change this?
– Update skills through internal and industry courses
– Introduce parallelism and concurrency early in undergraduate programs
  • But how early (First course? Junior year?)

Page 22: Concluding thoughts

Multicore and heterogeneous nodes attack energy efficiency

However, they introduce new programming challenges
– Virtually all performance gains will be due to parallelism

New programming models and/or languages will be required
– There is no silver bullet!
– Adoption of new models will take time
– Evolutionary approaches will likely prevail

It is time to seriously invest in PGAS

Page 23: Questions?