
The basis and perspectives of an exascale algorithm: our ExaFMM project.

Lorena A. Barba, Boston University

IMA Annual Program Year Workshop: High Performance Computing and Emerging Architectures. Minneapolis, January 10-14, 2011

Acknowledgements:

work in Barba’s group done in collaboration with Jaydeep Bardhan (Rush), Matthew Knepley (UChicago), Tsuyoshi Hamada (Nagasaki Advanced Computing Center), Rio Yokota (postdoc at BU), and graduate students Felipe Cruz, Christopher Cooper, Anush Krishnan, Simon Layton


in Nagasaki Advanced Computing Center

“... the fundamental law of computer science [is]: the faster the computer, the greater the importance of speed of algorithms”

Trefethen & Bau, Numerical Linear Algebra, SIAM


The curious story of conjugate gradient (CG) algorithms

‣ Iterative methods:

‣ sequence of iterates converging to the solution

‣ CG matrix iterations bring the O(N^3) cost to O(N^2)

‣ 1950s — N too small for CG to be competitive

‣ 1970s — renewed attention

[Figure: cost versus problem size N on log-log axes (N from 10^0 to 10^6, cost from 10^2 to 10^18), comparing O(N^3) Gaussian elimination with O(N^2) CG iterative methods.]


‣ 1946 — The Monte Carlo method.

‣ 1947 — Simplex Method for Linear Programming.

‣ 1950 — Krylov Subspace Iteration Method.

‣ 1951 — The Decompositional Approach to Matrix Computations.

‣ 1957 — The Fortran Compiler.

‣ 1959 — QR Algorithm for Computing Eigenvalues.

‣ 1962 — Quicksort Algorithms for Sorting.

‣ 1965 — Fast Fourier Transform.

‣ 1977 — Integer Relation Detection.

‣ 1987 — Fast Multipole Method

Dongarra & Sullivan, IEEE Comput. Sci. Eng., Vol. 2(1): 22–23 (2000)

Fast multipole method

‣ Solves N-body problems

๏ e.g. astrophysical gravity interactions

๏ reduces operation count from O(N^2) to O(N)

$f(y_j) = \sum_{i=1}^{N} c_i\, K(y_j - x_i), \qquad j \in [1 \ldots N]$

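For reference, a minimal brute-force evaluation of the sum above might look as follows in plain C++; this is an illustrative sketch (assuming the Laplace kernel K(r) = 1/|r|), not the ExaFMM code. Its cost is O(N^2), which is exactly what the FMM reduces to O(N).

```cpp
// Illustrative O(N^2) direct summation of f(y_j) = sum_i c_i K(y_j - x_i)
// with K(r) = 1/|r| (Laplace kernel). Hypothetical sketch, not the ExaFMM code.
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

struct Particle {
    double x, y, z;   // position
    double c;         // source strength (mass/charge)
    double f;         // evaluated potential
};

void direct_sum(std::vector<Particle>& p) {
    for (std::size_t j = 0; j < p.size(); ++j) {
        double fj = 0.0;
        for (std::size_t i = 0; i < p.size(); ++i) {
            if (i == j) continue;                       // skip self-interaction
            double dx = p[j].x - p[i].x;
            double dy = p[j].y - p[i].y;
            double dz = p[j].z - p[i].z;
            double r  = std::sqrt(dx*dx + dy*dy + dz*dz);
            fj += p[i].c / r;                           // c_i * K(y_j - x_i)
        }
        p[j].f = fj;
    }
}

int main() {
    std::vector<Particle> p(1000);
    for (std::size_t i = 0; i < p.size(); ++i) {        // arbitrary test data
        p[i] = {std::cos(0.1 * i), std::sin(0.1 * i), 1e-3 * i, 1.0 / p.size(), 0.0};
    }
    direct_sum(p);
    std::printf("f[0] = %g\n", p[0].f);
}
```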

O(N) advantage

‣ Hierarchical methods:

‣ sequence of refinements converging (or contributing) to the solution

‣ FMM brings the O(N^2) cost to O(N)

‣ 1990s — MD codes dropped FMM, as N too small to be competitive

‣ Now — renewed attention

[Figure: cost versus problem size N on log-log axes (N from 10^0 to 10^6, cost from 10^2 to 10^12), comparing O(N^2) with O(N) scaling.]


‣ space subdivision tree structure

‣ to find “near” and “far” bodies
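One common way to build such a space subdivision is to assign each particle a Morton (Z-order) key and sort by it; boxes that share key prefixes are geometrically close, which is how "near" and "far" boxes are identified. The sketch below assumes a unit cube and at most 10 octree levels, and is illustrative rather than the exact construction in our codes.

```cpp
// Sketch: 3-D Morton (Z-order) keys for an octree at a fixed level.
// Assumes points in [0,1)^3 and level <= 10 (10 bits per coordinate).
#include <algorithm>
#include <cstdint>

// Spread the lower 10 bits of v so they occupy every third bit (standard magic numbers).
inline std::uint64_t spread_bits(std::uint32_t v) {
    std::uint64_t x = v & 0x3ffu;
    x = (x | (x << 16)) & 0x030000ff;
    x = (x | (x <<  8)) & 0x0300f00f;
    x = (x | (x <<  4)) & 0x030c30c3;
    x = (x | (x <<  2)) & 0x09249249;
    return x;
}

// Morton key of a point in the unit cube at octree level `level`.
std::uint64_t morton_key(double x, double y, double z, int level) {
    const std::uint32_t n = 1u << level;               // boxes per dimension
    auto quantize = [&](double t) {
        int i = static_cast<int>(t * n);
        return static_cast<std::uint32_t>(std::min(std::max(i, 0), int(n) - 1));
    };
    return spread_bits(quantize(x))
         | (spread_bits(quantize(y)) << 1)
         | (spread_bits(quantize(z)) << 2);
}
```

Sorting the particles by this key makes each box's particles contiguous in memory, which is also what enables the start-offset access pattern discussed later for GPU execution.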

Flow of FMM calculation

‣ P2M (particle to multipole): treecode & FMM

‣ M2M (multipole to multipole): treecode & FMM

‣ M2L (multipole to local): FMM

‣ L2L (local to local): FMM

‣ L2P (local to particle): FMM

‣ P2P (particle to particle): treecode & FMM

‣ M2P (multipole to particle): treecode

‣ information moves from source particles (red) to target particles (blue)

‣ The whole algorithm in a sketch:

Upward sweep (create multipole expansions): P2M, M2M

Downward sweep (evaluate local expansions): M2L, L2L, L2P
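In code, the sweep structure reads roughly as below. The cell layout and kernel stubs are hypothetical (their bodies are omitted); the sketch only captures the order of the P2M, M2M, M2L, L2L, L2P and P2P calls.

```cpp
// Skeleton of the FMM sweeps; hypothetical types, kernel bodies omitted.
#include <vector>

struct Cell {
    std::vector<int> children;        // child cell indices (empty at a leaf)
    std::vector<int> interactionList; // well-separated cells at the same level
    std::vector<int> neighbors;       // adjacent leaf cells (near field)
    std::vector<int> bodies;          // particle indices (leaves only)
    std::vector<double> M, L;         // multipole and local expansion coefficients
};

void P2M(Cell&)        {}  // particles -> multipole, at leaves (omitted)
void M2M(Cell&, Cell&) {}  // child multipole -> parent multipole (omitted)
void M2L(Cell&, Cell&) {}  // multipole -> local, well-separated cells (omitted)
void L2L(Cell&, Cell&) {}  // parent local -> child local (omitted)
void L2P(Cell&)        {}  // local expansion -> particle values (omitted)
void P2P(Cell&, Cell&) {}  // direct near-field interactions (omitted)

// `bottomUp` lists cell indices with children appearing before their parents.
void fmm(std::vector<Cell>& cells, const std::vector<int>& bottomUp) {
    // Upward sweep: create multipole expansions from the leaves to the root.
    for (int c : bottomUp) {
        if (cells[c].children.empty()) P2M(cells[c]);
        for (int k : cells[c].children) M2M(cells[c], cells[k]);
    }
    // Far field: multipole-to-local transformations over the interaction lists.
    for (Cell& c : cells)
        for (int s : c.interactionList) M2L(c, cells[s]);
    // Downward sweep: push local expansions toward the leaves, then evaluate.
    for (auto it = bottomUp.rbegin(); it != bottomUp.rend(); ++it)
        for (int k : cells[*it].children) L2L(cells[*it], cells[k]);
    for (Cell& c : cells) {
        if (!c.children.empty()) continue;              // leaves only
        L2P(c);                                         // far-field contribution
        for (int s : c.neighbors) P2P(c, cells[s]);     // near-field contribution
    }
}
```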

‣ Contributions from Barba group:

‣ Parallelization strategy:

[Figure: the octree split at a level k into a root tree plus sub-trees 1–8 distributed to processes; labels mark M2M and L2L translations, M2L transformations, and the local domain.]

‣ Graph representation: vertex weights w_i, w_j and edge weights c_ij (see the sketch after the reference below)

Ref. — F. A. Cruz, M. G. Knepley, L. A. Barba, PetFMM—A dynamically load-balancing parallel fast multipole library, Int. J. Num. Meth. Eng., Vol. 85(4): 403–428 (Jan. 2011)
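To make the graph view concrete, the sketch below (hypothetical code, not PetFMM's implementation) stores per-cell work weights w_i and communication edge weights c_ij, and evaluates a given partition by the two quantities a partitioner trades off: load imbalance and edge cut.

```cpp
// Hypothetical sketch of the weighted graph used for load balancing:
// vertex weight w_i = estimated work of cell i; edge weight c_ij = communication
// incurred if cells i and j land on different processes.
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

struct CellGraph {
    std::vector<double> w;                                  // w_i per cell
    std::vector<std::vector<std::pair<int, double>>> adj;   // (j, c_ij) per cell i
};

// Evaluate an assignment part[i] -> process: load imbalance and total edge cut.
void evaluate_partition(const CellGraph& g, const std::vector<int>& part,
                        int nproc, double& imbalance, double& cut) {
    std::vector<double> load(nproc, 0.0);
    cut = 0.0;
    for (std::size_t i = 0; i < g.w.size(); ++i) {
        load[part[i]] += g.w[i];
        for (const auto& [j, cij] : g.adj[i])
            if (part[i] != part[j]) cut += cij;             // counted per direction
    }
    double total = 0.0, maxload = 0.0;
    for (double l : load) { total += l; maxload = std::max(maxload, l); }
    imbalance = maxload / (total / nproc);                  // 1.0 = perfect balance
}
```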

The algorithmic and hardware speed-ups properly multiply

GPU implementation of FMM kernels

GPU Gems, Volume IV

In press, to appear February 2011 (?)

Codes at http://code.google.com/p/gemsfmm/

GPU Gems, Volume III

FMM on GPU

[Figure: wall-clock time versus N on log-log axes (N from 10^3 to 10^7, time from 10^-3 to 10^5 s), comparing Direct (CPU), Direct (GPU), FMM (CPU), and FMM (GPU); annotations mark speed-ups of 200x and 40x on the GPU.]

“Treecode and fast multipole method for N-body simulation with CUDA”, chapter in GPU Gems IV, in press

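The kernel behind these GPU timings is essentially a tiled P2P evaluation: each thread block stages a tile of source particles in shared memory while each thread accumulates one target. Below is a plain C++ mock-up of that tiling structure (the actual gemsfmm code is CUDA; the tile size, softening constant, and data layout are assumptions for illustration).

```cpp
// CPU mock-up of the GPU tiling strategy for the P2P kernel (illustrative only).
// On the GPU, the "tile" arrays live in shared memory and the loop over j
// is spread across the threads of a block.
#include <algorithm>
#include <cmath>
#include <vector>

struct Bodies {                       // structure-of-arrays layout, as on the GPU
    std::vector<float> x, y, z, c;    // positions and source strengths
    std::vector<float> f;             // output potential (sized like x, zeroed)
};

void p2p_tiled(Bodies& b, int tile = 256) {
    const int n = static_cast<int>(b.x.size());
    std::vector<float> sx(tile), sy(tile), sz(tile), sc(tile);  // "shared memory"
    for (int jb = 0; jb < n; jb += tile) {                 // block of targets
        const int nj = std::min(tile, n - jb);
        for (int ib = 0; ib < n; ib += tile) {             // stream source tiles
            const int ni = std::min(tile, n - ib);
            for (int i = 0; i < ni; ++i) {                 // stage the tile
                sx[i] = b.x[ib + i]; sy[i] = b.y[ib + i];
                sz[i] = b.z[ib + i]; sc[i] = b.c[ib + i];
            }
            for (int j = 0; j < nj; ++j) {                 // one GPU thread per target
                float acc = 0.0f;
                for (int i = 0; i < ni; ++i) {
                    const float dx = b.x[jb + j] - sx[i];
                    const float dy = b.y[jb + j] - sy[i];
                    const float dz = b.z[jb + j] - sz[i];
                    const float r2 = dx*dx + dy*dy + dz*dz + 1e-12f;  // softening
                    acc += sc[i] / std::sqrt(r2);
                }
                b.f[jb + j] += acc;
            }
        }
    }
}
```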

‣ the right methods and algorithms can provide leaps in capability many times what Moore’s law alone would deliver in a given period

‣ new hardware for HPC adds to the mix for a new era of discovery via computation

‣ open source & open data enable tackling large, complex computational projects

Parallel FMM on multi-GPUs

Strong scaling:

parallel efficiency of 80% at 256 nodes and 50% at 512 nodes

N = 10^8, p = 10

Degima cluster at NACC, with InfiniBand communication

[Figure: strong-scaling wall-clock time (s) versus number of processes, from 1 to 512.]

GPU breakdown

‣ N = 10^8, on one node

[Figure: per-kernel timing breakdown on one node.]

Under revision for Comput. Phys. Comm.

See also http://barbagroup.bu.edu/

Suitability of the FMM for achieving exascale

FMM is a particularly favorable algorithm for the emerging heterogeneous, many-core architectural landscape.


Spatial and temporal locality

‣ Algorithm has intrinsic geometric locality

‣ Access patterns could be non-local

๏ work with sorted particle indices, access via a start-offset combination (see the sketch below)

‣ Temporal locality:

๏ queue GPU tasks before execution, buffering input and output data so that memory access is contiguous

➡ With these techniques the FMM is not a locality-sensitive application
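A minimal sketch of the start-offset idea (hypothetical names, not the production data structure): after sorting particles by their box key, each box only needs a start index and a count, so every kernel reads a contiguous range of memory.

```cpp
// Start-offset access for sorted particles: box b owns the contiguous range
// [start[b], start[b] + count[b]) of the particle arrays.
#include <cstdint>
#include <vector>

struct SortedBodies {
    std::vector<std::uint32_t> box;   // box index per particle, sorted ascending
    std::vector<int> start, count;    // per-box offset and size
};

// Build start/count for boxes 0..nbox-1 from the sorted box indices.
void build_offsets(SortedBodies& b, int nbox) {
    b.count.assign(nbox, 0);
    b.start.assign(nbox, 0);
    for (std::uint32_t k : b.box) ++b.count[k];
    for (int i = 1; i < nbox; ++i)
        b.start[i] = b.start[i - 1] + b.count[i - 1];   // exclusive prefix sum
}
```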

Global data communications and synchronization

‣ The two most time-consuming stages of the FMM:

๏ P2P — purely local

๏ M2L — exhibits “hierarchical synchronization”

[Diagram: upward sweep (P2M, M2M) followed by downward sweep (M2L, L2L, L2P), with P2P acting at the leaf level and L2P evaluation at the particles.]

Load balancing

‣ How the FMM is typically load-balanced:

๏ space-filling curves: Morton, Hilbert

๏ work-only partition (no communication cost modeled; see the sketch after this list)

‣ PetFMM:

๏ graph-partitioning

๏ will it scale?

๏ hierarchical partition?
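For comparison with the graph-based approach, a work-only split along a space-filling curve can be sketched as follows (illustrative, not PetFMM): boxes already ordered along a Morton or Hilbert curve are cut into contiguous chunks of roughly equal estimated work, with no communication term in the model.

```cpp
// Work-only partitioning of boxes ordered along a space-filling curve:
// each process receives a contiguous chunk with ~1/nproc of the total work.
#include <cstddef>
#include <vector>

std::vector<int> partition_by_work(const std::vector<double>& work, int nproc) {
    double total = 0.0;
    for (double w : work) total += w;
    const double target = total / nproc;        // ideal work per process
    std::vector<int> owner(work.size());
    double acc = 0.0;
    int p = 0;
    for (std::size_t i = 0; i < work.size(); ++i) {
        owner[i] = p;
        acc += work[i];
        if (acc >= target * (p + 1) && p < nproc - 1) ++p;  // advance to next process
    }
    return owner;
}
```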

Plan for an “ExaFMM”:

1) our present FMM technology is state-of-the-art;

2) we possess the potential for a substantial performance hike

AND all our codes are always open!

Present FMM state-of-the-art

[Figure: single-node timings, time in seconds versus N on log-log axes (N from 10^3 to 10^6), all following O(N) scaling: the published KIFMM code, the Salmon & Warren treecode, and our published FMM code.]

Single-node performance:

timings of the published KIFMM code (2006), the Salmon & Warren treecode (2000), and our code

‣ equal performance

‣ same accuracy, measured L2-norm error of 10^-3

Single CPU core, Intel Core i7 at 2.67 GHz (no SSE)

New experimental FMM with higher performance

[Figure: time in seconds versus N on log-log axes (N from 10^3 to 10^7), O(N) scaling: our published FMM, with new optimizations, and with algorithmic improvements.]

Optimized code:

explicit inline assembly within the p2p kernel, implementing SIMD

‣ 5x speed-up, single precision
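To convey the idea without reproducing the inline assembly, a compiler-intrinsics (SSE) version of the single-precision p2p inner loop could look like the sketch below; the kernel, softening constant, and data layout are assumptions for illustration, and n is taken to be a multiple of 4.

```cpp
// Illustrative SSE version of the p2p inner loop: four source particles per
// instruction, single precision, fast reciprocal square root.
#include <immintrin.h>

// Potential at one target (xt, yt, zt) due to n sources (n a multiple of 4).
float p2p_sse(const float* xs, const float* ys, const float* zs,
              const float* cs, int n, float xt, float yt, float zt) {
    __m128 px  = _mm_set1_ps(xt), py = _mm_set1_ps(yt), pz = _mm_set1_ps(zt);
    __m128 eps = _mm_set1_ps(1e-12f);                     // softening avoids r = 0
    __m128 acc = _mm_setzero_ps();
    for (int i = 0; i < n; i += 4) {
        __m128 dx = _mm_sub_ps(px, _mm_loadu_ps(xs + i));
        __m128 dy = _mm_sub_ps(py, _mm_loadu_ps(ys + i));
        __m128 dz = _mm_sub_ps(pz, _mm_loadu_ps(zs + i));
        __m128 r2 = _mm_add_ps(_mm_mul_ps(dx, dx),
                    _mm_add_ps(_mm_mul_ps(dy, dy),
                    _mm_add_ps(_mm_mul_ps(dz, dz), eps)));
        __m128 inv = _mm_rsqrt_ps(r2);                    // approximate 1/sqrt(r2)
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(cs + i), inv));
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];     // horizontal sum
}
```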

Algorithmic improvements:

i) hybridize FMM with treecode

ii) dynamic error control

‣ other recent work

In IEEE International Symposium on Parallel Distributed Processing (IPDPS), IEEE, pp. 1–12 (Atlanta, GA; April 2010)

Summary so far ...

‣ PetFMM — open library, dynamic load balancing, communication minimizing

๏ open question: will the strategy scale to thousands of processes? hierarchical partition?

‣ Performance on a single node:

๏ matching other state-of-the-art codes

‣ Algorithmic innovations:

๏ hybrid treecode/FMM

๏ variable order / variable box-opening for minimum work to achieve a target accuracy

But there is more ...

‣ Fault-tolerance:

๏ traditional checkpointing no longer adequate by itself

๏ instead: replicate threads, correctness checks on-the-fly

๏ FMM allows natural correctness checks at the time of selecting p

‣ Autotuning the FMM:

๏ natural: use tests/work estimates to select particles per box, p, and box-opening parameters (a sketch follows below).

๏ parameter selection for load-balancing
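A hedged sketch of what such parameter selection could look like: the truncation error of a multipole expansion decays roughly like rho^(p+1), where rho < 1 is the ratio of expansion radius to evaluation distance, so p can be chosen from a target tolerance; the particles-per-box value can then be picked from measured P2P and M2L costs. The cost model below uses the textbook uniform-octree counts (27 near-neighbor boxes, 189 M2L interactions per box) and is illustrative, not ExaFMM's actual autotuner.

```cpp
// Illustrative parameter selection for the FMM (not the actual ExaFMM autotuner).
#include <cmath>

// Smallest expansion order p such that rho^(p+1) <= tol, for 0 < rho < 1.
int choose_order(double rho, double tol) {
    int p = static_cast<int>(std::ceil(std::log(tol) / std::log(rho))) - 1;
    return p < 1 ? 1 : p;
}

// Particles per box q minimizing a simple uniform-octree cost model:
// per box, near field ~ 27*q^2 pairwise interactions and far field ~ 189 M2L
// transformations; the number of leaf boxes scales like N/q.
int choose_particles_per_box(double tP2Ppair, double tM2L, int qmax = 512) {
    int best = 1;
    double bestCost = 1e300;
    for (int q = 1; q <= qmax; ++q) {
        double costPerBox = 27.0 * q * q * tP2Ppair + 189.0 * tM2L;
        double cost = costPerBox / q;          // times N, dropped as a constant
        if (cost < bestCost) { bestCost = cost; best = q; }
    }
    return best;
}
```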