SciDAC Software Infrastructure for Lattice Gauge Theory

Richard C. Brower

Annual Progress Review

JLab, May 14, 2007

Code distribution: see http://www.usqcd.org/software.html

Software Committee

• Rich Brower (chair) [email protected]

• Carleton DeTar [email protected]

• Robert Edwards [email protected]

• Don Holmgren [email protected]

• Bob Mawhinney [email protected]

• Chip Watson [email protected]

• Ying Zhang [email protected]

SciDAC-2 Minutes/Documents & Progress Report

http://super.bu.edu/~brower/scc.html

Major Participants in SciDAC Project

Arizona: Doug Toussaint, Dru Renner

BU: Rich Brower *, James Osborn, Mike Clark

BNL: Chulwoo Jung, Enno Scholz, Efstratios Efstathiadis

Columbia: Bob Mawhinney *

DePaul: Massimo DiPierro

FNAL: Don Holmgren *, Jim Simone, Jim Kowalkowski, Amitoj Singh

MIT: Andrew Pochinsky, Joy Khoriaty

North Carolina: Rob Fowler, Ying Zhang *

JLab: Chip Watson *, Robert Edwards *, Jie Chen, Balint Joo

IIT: Xian-He Sun

Indiana: Steve Gottlieb, Subhasish Basak

Utah: Carleton DeTar *, Ludmila Levkova

Vanderbilt: Ted Bapty

* Software Committee member; participants funded in part by the SciDAC grant

QCD Software Infrastructure Goals: Create a unified software environment that will enable the US lattice community to achieve very high efficiency on diverse high performance hardware.

Requirements:

1. Build on the 20-year investment in MILC/CPS/Chroma

2. Optimize critical kernels for peak performance

3. Minimize software effort to port to new platforms & to create new applications

Solution for Lattice QCD

(Perfect) load balancing: uniform periodic lattices and identical sub-lattices per processor.

(Complete) latency hiding: overlap of computation and communication (see the sketch below).

Data parallel: operations on small 3x3 complex matrices on each link.

Critical kernels: Dirac solvers, HMC forces, etc., which account for roughly 70%-90% of the execution time.
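To make the latency-hiding point concrete, here is a minimal sketch using plain non-blocking MPI rather than QMP (an illustration only, not SciDAC code): each node posts its halo exchange, updates interior sites while the messages are in flight, and only then completes the boundary sites. The one-dimensional field and nearest-neighbor stencil are toy stand-ins for a real sub-lattice and Dirac kernel.

#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int N = 1024;                      // local 1-d "sub-lattice"
    std::vector<double> f(N, 1.0), g(N, 0.0);
    double halo_lo = 0.0, halo_hi = 0.0;
    const int lo = (rank - 1 + size) % size; // periodic neighbors
    const int hi = (rank + 1) % size;

    // Post the halo exchange for the two boundary values.
    MPI_Request req[4];
    MPI_Irecv(&halo_lo,  1, MPI_DOUBLE, lo, 0, MPI_COMM_WORLD, &req[0]);
    MPI_Irecv(&halo_hi,  1, MPI_DOUBLE, hi, 1, MPI_COMM_WORLD, &req[1]);
    MPI_Isend(&f[0],     1, MPI_DOUBLE, lo, 1, MPI_COMM_WORLD, &req[2]);
    MPI_Isend(&f[N - 1], 1, MPI_DOUBLE, hi, 0, MPI_COMM_WORLD, &req[3]);

    // Interior update proceeds while the messages are in flight.
    for (int i = 1; i < N - 1; ++i)
        g[i] = 0.5 * (f[i - 1] + f[i + 1]);

    // Complete the communication, then update the two boundary sites.
    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
    g[0]     = 0.5 * (halo_lo + f[1]);
    g[N - 1] = 0.5 * (f[N - 2] + halo_hi);

    MPI_Finalize();
    return 0;
}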

Lattice Dirac operator:
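The operator itself did not survive the transcript of this slide; for reference, a standard form of the Wilson lattice Dirac operator (an assumed reconstruction, not necessarily the exact expression shown) is

$$
M_{x,y} = (m + 4r)\,\delta_{x,y}
- \frac{1}{2}\sum_{\mu=1}^{4}\Big[(r-\gamma_\mu)\,U_\mu(x)\,\delta_{x+\hat\mu,\,y}
+ (r+\gamma_\mu)\,U_\mu^\dagger(x-\hat\mu)\,\delta_{x-\hat\mu,\,y}\Big],
$$

where $U_\mu(x)$ are the SU(3) link matrices, $\gamma_\mu$ the Dirac matrices, $m$ the bare quark mass, and $r$ the Wilson parameter.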

SciDAC-1 QCD API

Level 3: Optimized Dirac operators and inverters (optimized for P4 and QCDOC)

Level 2: QDP (QCD Data Parallel): lattice-wide operations and data shifts (exists in C and C++)

Level 1: QMP (QCD Message Passing): C/C++, implemented over MPI, native QCDOC, and M-VIA GigE mesh; QLA (QCD Linear Algebra); QIO: binary data files with XML metadata (ILDG collaboration)

Data Parallel QDP C/C++ API

• Hides architecture and layout
• Operates on lattice fields across sites
• Linear algebra tailored for QCD
• Shifts and permutation maps across sites
• Reductions
• Subsets
• Entry/exit: attach to existing codes

Example of QDP++ Expression

Typical for Dirac Operator:

QDP/C++ code:
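The formula and code images from this slide are not preserved in the transcript. As an illustration, a single hopping term of the Dirac operator, chi(x) = U_mu(x) psi(x + mu-hat), can be written as one data-parallel line in QDP++; the following is a minimal sketch assuming the standard QDP++ types and shift function (the exact code shown on the slide is an assumption):

// Minimal QDP++ sketch (assumed, not the exact slide code): accumulate one
// forward hopping term of the Dirac operator, chi += U_mu(x) * psi(x + mu-hat).
#include "qdp.h"
using namespace QDP;

void hop_term(LatticeFermion& chi,
              const multi1d<LatticeColorMatrix>& u,
              const LatticeFermion& psi,
              int mu)
{
    // shift(psi, FORWARD, mu) gathers psi from the neighboring site in
    // direction mu; the multiply-add is evaluated across all local sites
    // in one pass, with any needed communication handled inside QDP.
    chi += u[mu] * shift(psi, FORWARD, mu);
}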

Uses the Portable Expression Template Engine (PETE): temporaries are eliminated and expressions are optimized.
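To make the "temporaries eliminated" point concrete, here is a toy expression-template sketch in plain C++ (a hypothetical illustration, far simpler than PETE itself): adding three fields builds a lightweight expression object, and the assignment evaluates it in a single fused loop with no intermediate field allocated.

#include <cstddef>
#include <iostream>
#include <vector>

// A toy "lattice field": one double per site.
struct Field {
    std::vector<double> data;
    explicit Field(std::size_t n, double v = 0.0) : data(n, v) {}
    double  operator[](std::size_t i) const { return data[i]; }
    double& operator[](std::size_t i)       { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assigning from an expression runs one fused loop over all sites.
    template <class Expr>
    Field& operator=(const Expr& e) {
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = e[i];
        return *this;
    }
};

// Expression node for an elementwise sum; nothing is computed until assignment.
template <class L, class R>
struct Add {
    const L& lhs;
    const R& rhs;
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

// operator+ builds an expression node instead of a temporary Field.
template <class L, class R>
Add<L, R> operator+(const L& a, const R& b) { return Add<L, R>{a, b}; }

int main() {
    Field a(8, 1.0), b(8, 2.0), c(8, 3.0), r(8);
    r = a + b + c;                  // one loop over sites, no temporary Fields
    std::cout << r[0] << std::endl; // prints 6
    return 0;
}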

SciDAC-2 QCD API

Level 4: QCD Physics Toolbox (shared algorithms, building blocks, visualization, performance tools); workflow and data analysis tools; uniform user environment (runtime, accounting, grid)

Application codes: MILC / CPS / Chroma / roll your own

Level 3: QOP (optimized in asm): Dirac operators, inverters, forces, etc.

Level 2: QDP (QCD Data Parallel): lattice-wide operations and data shifts

Level 1: QMP (QCD Message Passing); QLA (QCD Linear Algebra); QIO (binary/XML files & ILDG); QMC (QCD multi-core interface)

SciDAC-1 and SciDAC-2 components are shown in gold and blue in the original diagram; PERI and TOPS appear as collaborating SciDAC projects.

Level 3 Domain Wall CG Inverter

[Figure: Mflops/node vs. local lattice size for the Level 3 domain-wall CG inverter, comparing the JLab 3G and 4G clusters at Level II and Level III with the FNAL Myrinet cluster at Level III. Ls = 16; the 4G nodes are 2.8 GHz Pentium 4 with an 800 MHz front-side bus.]

Asqtad Inverter on the Kaon Cluster at FNAL

[Figure: Mflop/s per core vs. L, comparing the MILC C code with SciDAC/QDP on L^4 sub-volumes for 16-core and 64-core partitions of the Kaon cluster.]

Level 3 on QCDOC

[Figure: Asqtad CG performance, Mflop/s vs. L, on L^4 sub-volumes.]

Asqtad RHMC kernels: 24^3 x 32 lattice with 6^3 x 18 sub-volumes

DW RHMC kernels: 32^3 x 64 x 16 lattice with 4^3 x 8 x 16 sub-volumes

New SciDAC-2 Goals: Building on SciDAC-1

Common runtime environment
1. File transfer, batch scripts, compile targets
2. A practical three-laboratory "metafacility"

Fuller use of the API in application code
1. Integrate QDP into MILC and QMP into CPS
2. Universal use of QIO, file formats, QLA, etc.
3. Level 3 interface standards

Porting the API to INCITE platforms
1. BG/L & BG/P: QMP and QLA using XLC and Perl scripts
2. Cray XT4 & Opteron clusters

Workflow and cluster reliability
1. Automated campaigns to merge lattices and propagators and extract physics (FNAL & Illinois Institute of Technology)
2. Cluster reliability (FNAL & Vanderbilt)

Toolbox of shared algorithms and building blocks
1. RHMC, eigenvector solvers, etc.
2. Visualization and performance analysis (DePaul & PERC)
3. Multi-scale algorithms (QCD/TOPS collaboration)

Exploitation of multi-core
1. Multi-core, not clock speed, is the new paradigm
2. Plans for a QMC API (JLab, FNAL & PERC)

http://lqcd.fnal.gov/workflow/WorkflowProject.html

http://www.yale.edu/QCDNA/

See the SciDAC-2 kickoff workshop, Oct. 27-28, 2006: http://super.bu.edu/~brower/workshop

QMC: QCD Multi-Threading

• General evaluation

– OpenMP vs. an explicit thread library (Chen)

– An explicit thread library can do better than OpenMP, but OpenMP performance is compiler dependent

• Simple threading API: QMC, based on the older smp_lib (Pochinsky)

– Uses pthreads; investigate barrier synchronization algorithms

• Evaluate threads for SSE-Dslash

• Consider a threaded version of QMP (Fowler and Porterfield at RENCI)

[Diagram: fork/join threading model. Serial code forks into parallel threads, one handling sites 0..7 and another sites 8..15 (remaining cores idle), followed by a finalize / thread join back to serial code. A minimal sketch of this pattern follows.]
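The following is a minimal fork/join sketch using raw pthreads in C++ (an illustration only, not the QMC API, whose interface was still under design at the time of this talk): serial code forks two worker threads, each updates its own block of lattice sites, and a join returns control to serial code. The field array, thread count, and per-site kernel are all toy stand-ins.

#include <pthread.h>
#include <cstdio>

const int NSITES   = 16;   // toy lattice: 16 sites
const int NTHREADS = 2;    // e.g. one worker per core

static double field[NSITES];

struct Range { int begin, end; };

// Each worker updates a contiguous block of sites, e.g. 0..7 or 8..15.
void* work(void* arg) {
    Range* r = static_cast<Range*>(arg);
    for (int s = r->begin; s < r->end; ++s)
        field[s] += 1.0;            // stand-in for a per-site kernel
    return nullptr;
}

int main() {
    // Serial code: initialize the field.
    for (int s = 0; s < NSITES; ++s) field[s] = s;

    // Fork: parallel section over blocks of sites.
    pthread_t tid[NTHREADS];
    Range ranges[NTHREADS];
    const int chunk = NSITES / NTHREADS;
    for (int t = 0; t < NTHREADS; ++t) {
        ranges[t] = Range{t * chunk, (t + 1) * chunk};
        pthread_create(&tid[t], nullptr, work, &ranges[t]);
    }

    // Finalize / thread join: back to serial code.
    for (int t = 0; t < NTHREADS; ++t)
        pthread_join(tid[t], nullptr);

    std::printf("site 15 = %g\n", field[15]);
    return 0;
}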

Conclusions

Progress has been made using a common QCD API and shared libraries for communication, linear algebra, I/O, optimized inverters, etc.

But full implementation, optimization, documentation, and maintenance of shared code is a continuing challenge.

And there is much work to do to keep up with changing hardware and algorithms.

Still, new users (young and old) with no prior lattice experience have initiated new lattice QCD research using SciDAC software!

The bottom line is PHYSICS is being well served.