
Active Harmony and the Chapel HPC Language Ray Chen, UMD Jeff Hollingsworth, UMD Michael P. Ferguson, LTS


Page 1

Active Harmony and the Chapel HPC Language

Ray Chen, UMD
Jeff Hollingsworth, UMD
Michael P. Ferguson, LTS

Page 2

Harmony Overview

• Harmony system based on feedback loop


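[Figure: feedback loop between the Harmony Server and the Application. The server sends Parameter Values to the application, and the application returns Measured Performance to the server.]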
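The loop in the figure can be sketched as a small simulation. This is a minimal, hypothetical Python illustration, not the Active Harmony API. `TuningServer`, `propose`, `report`, and the synthetic cost in `run_application` are invented names. The server also just sweeps every configuration instead of using Harmony's simplex-based search.

```python
import itertools

class TuningServer:
    """Hypothetical stand-in for a tuning server: hands out parameter values
    and remembers the best-performing configuration reported so far."""

    def __init__(self, space):
        # space maps parameter name -> list of candidate values
        self.names = list(space)
        self._pending = itertools.product(*(space[n] for n in self.names))
        self.best = None  # (performance, config)

    def propose(self):
        """Return the next configuration to test, or None when done."""
        values = next(self._pending, None)
        return None if values is None else dict(zip(self.names, values))

    def report(self, config, performance):
        """Record a measurement (lower is better, e.g. seconds)."""
        if self.best is None or performance < self.best[0]:
            self.best = (performance, config)

def run_application(config):
    # Synthetic cost model standing in for a real timed run (assumption).
    return (config["tile"] - 32) ** 2 + (config["unroll"] - 4) ** 2

server = TuningServer({"tile": [8, 16, 32, 64], "unroll": [1, 2, 4, 8]})
while (config := server.propose()) is not None:
    server.report(config, run_application(config))  # close the loop

print(server.best)
```

Each pass through the `while` loop is one trip around the feedback loop. Parameter values go out, and a performance measurement comes back.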

Page 3

Simplex Algorithms

• Nelder-Mead
• Parallel Rank Ordering (PRO)
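Both methods search using a simplex of candidate points. PRO differs mainly in generating several candidates per step, so they can be evaluated at the same time. The Python sketch below follows a simplified reading of PRO over continuous parameters. It reflects all non-best vertices through the best one, tries an expansion when that helps, and otherwise shrinks toward the best. It is not Active Harmony's implementation: the function name, step rules, and stopping test are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def pro_minimize(f, simplex, iterations=500, tol=1e-6):
    """Simplified Parallel Rank Ordering sketch (lower f is better).

    simplex: list of points (lists of floats). In each step the candidate
    points are evaluated concurrently, which is what PRO is designed for.
    """
    pts = [list(p) for p in simplex]
    with ThreadPoolExecutor() as pool:
        vals = list(pool.map(f, pts))
        for _ in range(iterations):
            # Rank the vertices so that pts[0] is the current best.
            order = sorted(range(len(pts)), key=vals.__getitem__)
            pts = [pts[i] for i in order]
            vals = [vals[i] for i in order]
            best, others = pts[0], pts[1:]
            if max(abs(b - x) for p in others for b, x in zip(best, p)) < tol:
                break  # simplex has collapsed onto the best vertex
            # Reflect every other vertex through the best vertex.
            refl = [[2 * b - x for b, x in zip(best, p)] for p in others]
            refl_vals = list(pool.map(f, refl))
            if min(refl_vals) < vals[0]:
                # Reflection beat the best: also try expanding further.
                expd = [[3 * b - 2 * x for b, x in zip(best, p)] for p in others]
                expd_vals = list(pool.map(f, expd))
                if min(expd_vals) < min(refl_vals):
                    refl, refl_vals = expd, expd_vals
                pts, vals = [best] + refl, [vals[0]] + refl_vals
            else:
                # No improvement: move every other vertex halfway to the best.
                shrunk = [[(b + x) / 2 for b, x in zip(best, p)] for p in others]
                pts, vals = [best] + shrunk, [vals[0]] + list(pool.map(f, shrunk))
    i = min(range(len(pts)), key=vals.__getitem__)
    return pts[i], vals[i]

point, value = pro_minimize(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2,
                            [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(point, value)
```

In a real tuner each call to `f` would be one timed run with a given configuration. The thread pool stands in for the parallel evaluations PRO is designed to exploit.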


Page 4

Tuning Granularity

• Initial Parameter Tuning
  o Application treated as a black box
  o Test parameters delivered during application launch
  o Application executes once per test configuration

• Internal Application Tuning
  o Specific internal functions or loops tuned
  o Possibly multiple locations within application
  o Multiple executions required to test configurations

• Run-time Tuning
  o Application modified to communicate with server mid-run
  o Only one run of the application needed
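The black-box case, initial parameter tuning, can be sketched as a driver that launches the application once per configuration and passes the parameters as launch flags. In this hypothetical Python sketch, the "application" is a tiny inline script that prints a synthetic cost instead of being timed. The `--tile`/`--unroll` flags and the exhaustive sweep are assumptions for illustration.

```python
import itertools
import subprocess
import sys

# Stand-in "application": reads --tile/--unroll from its command line and
# prints a synthetic runtime. A real application would be timed instead.
FAKE_APP = r"""
import sys
args = dict(a[2:].split("=") for a in sys.argv[1:])
tile, unroll = int(args["tile"]), int(args["unroll"])
print((tile - 32) ** 2 + (unroll - 4) ** 2)
"""

def run_once(config):
    """Launch the application once, delivering parameters as launch flags."""
    flags = [f"--{name}={value}" for name, value in config.items()]
    out = subprocess.run([sys.executable, "-c", FAKE_APP, *flags],
                         capture_output=True, text=True, check=True)
    return float(out.stdout)

space = {"tile": [16, 32, 64], "unroll": [2, 4]}
results = {}
for values in itertools.product(*space.values()):
    config = dict(zip(space, values))
    results[values] = run_once(config)  # one execution per configuration

best = min(results, key=results.get)
print(best, results[best])
```

Six configurations mean six separate launches. Run-time tuning, which needs only one run, is meant to avoid exactly this cost.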


Page 5

Example Application

• SMG2000
  o 6-dimensional space
    - 3 tiling factors
    - 2 unrolling factors
    - 1 compiler choice
  o 20 search steps

• Performance gain
  o 2.37x for residual computation
  o 1.27x on full application
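Amdahl's law relates the two reported speedups. Suppose, purely for illustration, that the whole gain for the full application came from the tuned residual computation. Then the fraction f of the original runtime spent in that computation follows from the overall speedup S_app and the kernel speedup S_res:

```latex
S_{\text{app}} = \frac{1}{(1 - f) + f / S_{\text{res}}}
\quad\Longrightarrow\quad
f = \frac{1 - 1/S_{\text{app}}}{1 - 1/S_{\text{res}}}
  = \frac{1 - 1/1.27}{1 - 1/2.37}
  \approx \frac{0.2126}{0.5781}
  \approx 0.37
```

Under that assumption, roughly a third of SMG2000's runtime would have been spent in the residual computation. This is an inference from the two speedups, not a measured value.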


Page 6

The Irony of Auto-Tuning

• Intensely manual process
  o High cost of adoption

• Requires application-specific knowledge
  o Tunable variable identification
  o Value range determination
  o Hotspot identification
  o Critical section modification at safe points

• Can auto-tuning be more automatic?


Page 7

Towards Automatic Auto-tuning

• Reducing the burden on the end-user

• Three questions must be answered
  o What parameters are candidates for auto-tuning?
  o Where are the best code regions for auto-tuning?
  o When should we apply auto-tuning?


Page 8

Our Goals

• Maximize return from minimal investment
  o Use profiling feature as a model
  o Should be enabled with a runtime flag
  o Aim to provide auto-tuning benefits within one execution

• Minimize language extension
  o Applications should be used as originally written

• Non-trivial goals with C/C++/Fortran
  o Are there any alternatives?


Page 9

Chapel Overview

• Parallel programming language
  o Led by Cray Inc.
  o “Chapel strives to vastly improve the programmability of large-scale parallel computers while matching or beating the performance and portability of current programming models like MPI.”


Type of HW Parallelism            | Programming Model   | Unit of Parallelism
Inter-node                        | MPI                 | executable
Intra-node/multi-core             | OpenMP/pthreads     | iteration/task
Instruction-level vectors/threads | pragmas             | iteration
GPU/accelerator                   | CUDA/OpenCL/OpenACC | SIMD function/task

Content courtesy of Cray Inc.

Page 10

Chapel Methodology

Content courtesy of Cray Inc.

Page 11

Chapel Data Parallelism

• Only domains and forall loops required
  o Forall loop used with arrays to distribute work
  o Domains used to control distribution
  o A generalization of ZPL’s region concept

Content courtesy of Cray Inc.
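Domains determine where array elements and forall iterations live. As a rough illustration of a block-distributed 1-D domain, the Python sketch below splits an inclusive index range `lo..hi` into contiguous chunks, one per locale. It is only an analogy. The function name is hypothetical, and the exact boundary rule used by Chapel's `Block` distribution may differ.

```python
def block_partition(lo, hi, num_locales):
    """Split the inclusive index range lo..hi into num_locales contiguous
    chunks whose sizes differ by at most one."""
    n = hi - lo + 1
    chunks = []
    for loc in range(num_locales):
        start = lo + loc * n // num_locales
        end = lo + (loc + 1) * n // num_locales - 1
        chunks.append((loc, start, end))  # locale id, first index, last index
    return chunks

for loc, first, last in block_partition(1, 10, 4):
    print(f"locale {loc} owns indices {first}..{last}")
```

A forall over such a domain would then run each chunk's iterations on the locale that owns it, which is how the domain, rather than the loop, controls distribution.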

Page 12

Chapel Task Parallelism

• Three constructs used to express control-based parallelism
  o begin: “fire and forget”
  o cobegin: heterogeneous tasks
  o coforall: homogeneous tasks


    begin writeln("hello world");
    writeln("good bye");

    cobegin {
      consumer(1);
      consumer(2);
      producer();
    } // wait here for all three tasks to complete

    begin producer();
    coforall i in 1..numConsumers {
      consumer(i);
    } // wait here for all consumers to return

Content courtesy of Cray Inc.
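For readers who know thread libraries better, the three constructs map roughly onto three patterns. `begin` starts a task and never joins it. `cobegin` starts several different tasks and joins them all. `coforall` starts one task per iteration and joins them all. The Python sketch below shows that mapping with `threading`. It is an analogy only, since Chapel tasks are lightweight and scheduled by the Chapel runtime. `producer` and `consumer` are hypothetical stand-ins.

```python
import threading

log = []
lock = threading.Lock()

def record(msg):
    with lock:
        log.append(msg)

def producer():
    record("produced")

def consumer(i):
    record(f"consumer {i}")

# begin: fire and forget (daemon thread; nobody waits for it).
threading.Thread(target=record, args=("hello world",), daemon=True).start()
record("good bye")

# cobegin: heterogeneous tasks, then wait for all of them.
tasks = [threading.Thread(target=consumer, args=(1,)),
         threading.Thread(target=consumer, args=(2,)),
         threading.Thread(target=producer)]
for t in tasks:
    t.start()
for t in tasks:
    t.join()  # like reaching the end of the cobegin block

# coforall: one homogeneous task per iteration, then wait for all.
num_consumers = 3
workers = [threading.Thread(target=consumer, args=(i,))
           for i in range(1, num_consumers + 1)]
for w in workers:
    w.start()
for w in workers:
    w.join()  # like reaching the end of the coforall loop
```

The implicit joins at the end of `cobegin` and `coforall` are what make those constructs safe to follow with code that depends on the tasks' results.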

Page 13

Chapel Locales

• MPI (SPMD) Functionality


    writeln("start on locale 0");
    on Locales(1) do writeln("now on locale 1");
    writeln("on locale 0 again");

    proc main() {
      coforall loc in Locales do
        on loc do MySPMDProgram(loc.id, numLocales);
    }

    proc MySPMDProgram(me, p) {
      writeln("Hello from node ", me);
    }

Content courtesy of Cray Inc.

Page 14

Chapel Config Variables


    config const numLocales: int;
    const LocaleSpace: domain(1) = {0..numLocales-1};
    const Locales: [LocaleSpace] locale;

    % a.out --numLocales=4
    Hello from node 3
    Hello from node 0
    Hello from node 1
    Hello from node 2

Content courtesy of Cray Inc.
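A `config const` behaves like a named constant with a default that the launcher may override on the command line, as `--numLocales=4` does above. The Python sketch below is a rough analogue of that mechanism with hypothetical names. It shows why config variables are easy to find and set from outside the program. It does not show how the Chapel compiler implements them.

```python
class Config:
    """Hypothetical registry mimicking Chapel-style config constants: each has
    a typed default that a '--name=value' flag may override (int, float, or
    str defaults; bool parsing is not handled)."""

    def __init__(self, argv):
        self._overrides = dict(a[2:].split("=", 1) for a in argv
                               if a.startswith("--") and "=" in a)
        self.declared = {}

    def const(self, name, default):
        raw = self._overrides.get(name)
        value = default if raw is None else type(default)(raw)
        self.declared[name] = value  # remember every config constant
        return value

# In a real program the list would come from sys.argv[1:].
cfg = Config(["--numLocales=4"])
num_locales = cfg.const("numLocales", 1)
some_arg = cfg.const("someArg", 5)
print(cfg.declared)
```

The `declared` table is the point of interest for auto-tuning. Every externally settable value is already listed in one place.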

Page 15

Leveraging Chapel

• Helpful design goals
  o Expressing parallelism and locality is the user’s responsibility
  o Not the compiler’s

• Chapel source effectively pre-annotated
  o Config variables help to locate candidate tuning parameters
  o Parallel looping constructs help to locate hotspots
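Config variables and parallel loops are explicit in Chapel source, so even a crude textual scan can list tuning candidates and likely hotspots. The Python sketch below applies regular expressions to a small Chapel fragment. It is a hypothetical illustration. A real tool would work on the compiler's AST rather than raw text, because regexes miss comments, strings, and multi-line declarations.

```python
import re

CHAPEL_SRC = """
config const n = 1000;
config var tileSize: int = 32;
const alpha = 2.0;
forall i in 1..n do A[i] = alpha * B[i];
coforall loc in Locales do on loc do work(loc.id);
for i in 1..n do total += A[i];
"""

CONFIG_RE = re.compile(r"^[ \t]*config[ \t]+(?:const|var|param)[ \t]+(\w+)",
                       re.MULTILINE)
PAR_LOOP_RE = re.compile(r"^[ \t]*(?:forall|coforall)\b", re.MULTILINE)

def tuning_candidates(src):
    """Names declared with 'config' are candidate tuning parameters."""
    return CONFIG_RE.findall(src)

def parallel_loop_lines(src):
    """1-based line numbers of forall/coforall loops (candidate hotspots)."""
    return [src.count("\n", 0, m.start()) + 1 for m in PAR_LOOP_RE.finditer(src)]

print(tuning_candidates(CHAPEL_SRC))
print(parallel_loop_lines(CHAPEL_SRC))
```

The plain `const alpha` and the serial `for` loop are skipped, which is the filtering effect the pre-annotation provides.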


Page 16

Current Progress

• Harmony Client API ported to Chapel
  o Uses Chapel’s foreign function interface
  o Chapel client module to be added to next Harmony release

• Achieves the current state of auto-tuning
  o What to tune
    - Parameters must be determined by a domain expert
    - Manually register each parameter and value range
  o Where to tune
    - Critical loop must be determined by a domain expert
    - Manually fetch and report performance at safe points
  o When to tune
    - Tuning enabled once manual changes are complete


Page 17

Improving the “What”

• Leverage Chapel’s “config” variable type
  o A slight syntax extension (e.g., declaring a value range) would help everybody

• Not a silver bullet
  o False positives and false negatives definitely exist
  o Goes a long way towards reducing candidate variables
  o Chapel built-in candidate variables:
    - dataParTasksPerLocale
    - dataParIgnoreRunningTasks
    - dataParMinGranularity
    - numLocales

Standard declaration:

    config const someArg = 5;

With the proposed value-range extension:

    config const someArg = 5 in 1..100 by 2;
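The proposed `in 1..100 by 2` annotation would let a tool read a parameter's legal values straight from its declaration. The Python sketch below parses that hypothetical syntax and expands it into candidate values. Chapel's `1..100` range includes both endpoints, so `by 2` yields 1, 3, ..., 99. The grammar handled here is an assumption based only on the single example above.

```python
import re

# Hypothetical extended declaration: config const NAME = DEFAULT in LO..HI [by STEP];
DECL_RE = re.compile(
    r"config\s+const\s+(\w+)\s*=\s*(-?\d+)\s+in\s+(-?\d+)\.\.(-?\d+)"
    r"(?:\s+by\s+(\d+))?\s*;")

def parse_tunable(decl):
    """Return (name, default, candidate values) for a ranged config const."""
    m = DECL_RE.fullmatch(decl.strip())
    if m is None:
        raise ValueError(f"not a ranged config declaration: {decl!r}")
    name, default, lo, hi, step = m.groups()
    # Chapel ranges include both bounds, hence hi + 1 for Python's range().
    values = range(int(lo), int(hi) + 1, int(step or 1))
    if int(default) not in values:
        raise ValueError(f"default {default} is outside the declared range")
    return name, int(default), values

name, default, values = parse_tunable("config const someArg = 5 in 1..100 by 2;")
print(name, default, len(values), values[0], values[-1])
```

The default can serve as the search's starting point, and the range bounds the space a tuner such as PRO would explore.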

Page 18

Improving the “Where”

• Naïve approach
  o Modify all parallel loop constructs
    - Fetch new config values at loop head
    - Report performance at loop tail
  o Use PRO to efficiently search parameter space in parallel

• Poses open questions
  o How to know if config values are safe to modify mid-execution?
  o How to handle nested parallel loops?
  o How to prevent overhead explosion?

• Solutions outside the scope of this project
  o But we’ve got some ideas...
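The naïve scheme wraps every parallel loop so that it asks for fresh parameter values before it starts and reports its elapsed time when it ends. The Python sketch below expresses that wrapping as a decorator around a loop body. `TunerStub`, `fetch`, and `report` are hypothetical placeholders; a real client would talk to the Harmony server. The sketch also ignores the open questions listed above, such as whether changing a value mid-run is safe.

```python
import time

class TunerStub:
    """Hypothetical tuner: cycles through candidate configurations and keeps
    the fastest one it has seen (a real tuner would search, e.g. with PRO)."""

    def __init__(self, candidates):
        self.candidates = list(candidates)
        self.next_index = 0
        self.best = None  # (seconds, config)

    def fetch(self):
        config = self.candidates[self.next_index % len(self.candidates)]
        self.next_index += 1
        return config

    def report(self, config, seconds):
        if self.best is None or seconds < self.best[0]:
            self.best = (seconds, config)

def tuned_loop(tuner, clock=time.perf_counter):
    """Wrap a loop so it fetches values at the head and reports at the tail."""
    def wrap(loop):
        def run(*args):
            config = tuner.fetch()                  # loop head: fetch values
            start = clock()
            result = loop(config, *args)
            tuner.report(config, clock() - start)   # loop tail: report time
            return result
        return run
    return wrap

class FakeClock:
    """Deterministic clock for the demo: the loop body advances it."""
    def __init__(self):
        self.now = 0.0
    def __call__(self):
        return self.now

clock = FakeClock()
tuner = TunerStub([{"chunk": 1}, {"chunk": 8}, {"chunk": 64}])

@tuned_loop(tuner, clock)
def timestep(config, data):
    # Synthetic cost: pretend a chunk size of 8 is the sweet spot.
    clock.now += abs(config["chunk"] - 8) + 1.0
    return sum(data)

for _ in range(6):  # each timestep yields one tuning measurement
    timestep([1, 2, 3])

print(tuner.best)
```

The injected clock keeps the demo deterministic. In real use the default `time.perf_counter` would measure the loop's actual cost.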


Page 19

What’s Possible?

• Target pre-run optimization instead
  o Run small snippet of code pre-main
  o Determine optimal values to be used prior to execution

• Example: Cache optimization
  o Explore element size and stride
  o Pad array elements to fit size
  o Define domains
  o Automatically optimize for cache size and eviction strategy
  o Further increase performance portability

• Generate library of performance unit-tests
  o Bundle with Chapel for distribution
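One concrete piece of the cache example is padding: choosing a padded element size so that elements do not straddle cache lines. The Python sketch below computes such a size for an assumed 64-byte line. The line size is an assumption; a pre-run probe would measure or query the real value. Small elements are rounded up to a power of two that divides the line, and large ones up to a whole number of lines.

```python
def padded_size(elem_bytes, line_bytes=64):
    """Smallest padded size >= elem_bytes that either divides line_bytes
    (a power of two, assuming line_bytes is one) or is a multiple of it."""
    if elem_bytes <= 0:
        raise ValueError("element size must be positive")
    if elem_bytes >= line_bytes:
        return -(-elem_bytes // line_bytes) * line_bytes  # ceil to whole lines
    size = 1
    while size < elem_bytes:
        size *= 2
    return size

for elem in (12, 24, 64, 70):
    print(elem, "->", padded_size(elem))
```

A pre-main snippet could then time strided traversals for a few such candidate paddings and keep the fastest. The arithmetic only narrows the candidates.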


Page 20

Improving the “When”

• Auto-tuning should be simple to enable
  o Use profiling as a model (just add -pg to the compiler flags)

• System should be self-reliant
  o Local server must be launched with application
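The "just add a flag" model implies that, when tuning is enabled, the runtime starts its own local server instead of relying on one the user launched. The Python sketch below shows that arrangement. A hypothetical `--autotune` flag decides whether a server thread starts alongside the application, and the two sides communicate through in-process queues. Everything here is an assumption for illustration: the flag name, the message format, and the trivial best-so-far server.

```python
import queue
import threading

def local_server(requests, replies, candidates):
    """Hypothetical in-process tuning server: answers each 'fetch' with the
    next candidate (then the best so far), collects reports, stops on 'stop'."""
    best = None
    pending = iter(candidates)
    while True:
        kind, payload = requests.get()
        if kind == "fetch":
            replies.put(next(pending, best[1] if best else candidates[0]))
        elif kind == "report":
            config, seconds = payload
            if best is None or seconds < best[0]:
                best = (seconds, config)
        elif kind == "stop":
            replies.put(best)
            return

def start_tuning(argv, candidates):
    """Start the local server only if the hypothetical --autotune flag is given."""
    if "--autotune" not in argv:
        return None
    requests, replies = queue.Queue(), queue.Queue()
    threading.Thread(target=local_server, args=(requests, replies, candidates),
                     daemon=True).start()
    return requests, replies

channels = start_tuning(["--autotune"], [1, 2, 4, 8])
if channels is not None:
    requests, replies = channels
    for _ in range(4):
        requests.put(("fetch", None))
        value = replies.get()
        requests.put(("report", (value, abs(value - 4) + 0.5)))  # synthetic time
    requests.put(("stop", None))
    best = replies.get()
    print(best)
```

Without the flag, `start_tuning` returns `None` and the application runs untouched. This mirrors how omitting `-pg` leaves a program unprofiled.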


Page 21

Open Questions

• Automatic hotspot detection
  o Time spent in loop
  o Variables manipulated in loop
  o How to determine correctness-safe modification points
  o Static analysis?

• Moving to other languages
  o C/Fortran lacking needed annotations
  o More static analysis?

• Why avoid language extension?
  o Is it really so bad?
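The first signal listed, time spent in each loop, is easy to collect. The Python sketch below accumulates wall-clock time per named loop region and ranks the regions by total time. The region names and the injectable clock are hypothetical conveniences. The sketch does not address the harder question of which variables are safe to modify.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class LoopProfiler:
    """Accumulates time per named loop region to rank hotspot candidates."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.totals = defaultdict(float)

    @contextmanager
    def region(self, name):
        start = self.clock()
        try:
            yield
        finally:
            self.totals[name] += self.clock() - start

    def hotspots(self, top=3):
        """Region names ordered by total time, largest first."""
        return sorted(self.totals, key=self.totals.get, reverse=True)[:top]

prof = LoopProfiler()
for _ in range(3):
    with prof.region("stencil"):
        total = sum(i * i for i in range(200_000))
    with prof.region("reduction"):
        total = sum(range(1_000))
print(prof.hotspots())
```

Timing tells you where tuning could pay off. Static analysis would still be needed to decide which variables inside those loops can be changed safely.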
