HPCMP Benchmarking and Performance Analysis

Page 1: HPCMP Benchmarking and Performance Analysis

HPCMP Benchmarking and Performance Analysis

Mark Cowan

USACE ERDC ITL in support of DoD HPCMP

Tuesday, April 17, 2012

Page 2: HPCMP Benchmarking and Performance Analysis

What is the HPCMP?

• Initiated in 1992
• Congressional mandate to modernize DoD's HPC capabilities
• Assembled from a collection of HPC departments across Army, Air Force, and Navy labs and test centers

Page 3: HPCMP Benchmarking and Performance Analysis

What is the HPCMP? FOCUS

• Solve military and security problems using HPC hardware and software
• Assess technical and management risks
  – Performance
  – Time
  – Available resources
  – Cost
  – Schedule

• Supports DoD objectives through research, development, test and evaluation

Page 4: HPCMP Benchmarking and Performance Analysis

Where we benchmark

Page 5: HPCMP Benchmarking and Performance Analysis

Migrate to a 2-year acquisition cycle

• Why the radical change?
  – Entice more vendors into the competition
  – Vendor feedback: remove or alleviate disincentives
• Review the entirety of the TI acquisition process
  – Line-by-line justification of the benchmarking rules document
  – Address both HPC community and vendor concerns
• Comprehensive reevaluation of how we benchmark
  – Analyze the codes
  – Justify the test cases

Page 6: HPCMP Benchmarking and Performance Analysis

Migrate to a 2-year acquisition cycle

• Dangers?
  – Time the milestones poorly on the calendar and miss out on the release of cutting-edge technology
• Difficult problem
  – How to schedule activities to maximize the likelihood of hitting publicly available products months in advance, while being blind to the intricacies of chip fabrication schedules and unforeseen recalls?

Page 7: HPCMP Benchmarking and Performance Analysis

Codes considered for TI-11/12

ABAQUS         COBALT     ICEPIC
ABINIT         CP2K       LAMMPS
ACES           CPMD       LS-DYNA
ADCIRC         CTH        MATLAB
ADH            ETA        OOCORE
ALE3D          FDTD       OVERFLOW
ALEGRA         FLAPW      SHAMRC
AMR            FLUENT     SIERRA
AVUS           GAMESS     STAR-CCM+
CFD++          GASP       VASP
CFDSHIP-IOWA   GAUSSIAN   WRF
COAMPS         HYCOM      XPATCH

Page 8: HPCMP Benchmarking and Performance Analysis

TI-11/12 benchmarking applications

• ADCIRC – Coastal Circulation and Storm Surge model – 100% Fortran, MPI – uses METIS library (C) – 205K LOC
• ALEGRA – hydrodynamics and solid dynamics plus magnetic field and thermal transport – 96% C, 4% Fortran, MPI – 978K LOC
• AVUS (Cobalt-60) – turbulent flow CFD code – Fortran, MPI, 29K LOC
• CTH – shock physics code – ~58% Fortran/~42% C, MPI, 900K LOC
• GAMESS – quantum chemistry code – Fortran, MPI, 330K LOC
• HYCOM – ocean circulation modeling code – Fortran, MPI, 31K LOC
• ICEPIC – particle-in-cell magnetohydrodynamics code – C, MPI, 350K LOC
• LAMMPS – molecular dynamics code – C++, MPI, 45K LOC

[Chart legend: █ Predicted, █ Benchmarked]

Page 9: HPCMP Benchmarking and Performance Analysis

Components of testing packages

• Applications tested on representative input sets

CODE | CASE | Distinguished Core Count | Time (sec) on DIAMOND | Core Counts
ADCIRC | baroclinic | 1024 | 8959 | 512, 768, 1024, 1280, 1536, 1792, 2048
ADCIRC | hurricane | 1280 | 2082 | 512, 768, 1024, 1280, 1536, 1792, 2048
ALEGRA | obliqueImp | 1536 | 1640 | 1024, 1280, 1536, 1792, 2048
ALEGRA | explWire | 256 | 944 | 256, 384, 512, 768, 1024
AVUS | waverider | 1024 | 941 | 384, 512, 768, 1024, 1536
AVUS | turret-td | 1280 | 1332 | 768, 1024, 1280, 1536, 2048
CTH | fixed-grid | 1280 | 3399 | 768, 1024, 1280, 1536, 2048
CTH | amr | 1280 | 2535 | 768, 1024, 1280, 1536, 2048
GAMESS | DFT-grad | 256 | 4701 | 128, 192, 256, 384, 512
GAMESS | MP2-grad | 512 | 2536 | 128, 256, 512, 768, 1024
GAMESS | CC-energy | 1024 | 3658 | 512, 768, 1024, 1536, 2048
HYCOM | lrg | 1353 | 3020 | 1001, 1353, 1516, 1770, 2045
ICEPIC | magnetron | 384 | 2559 | 256, 384, 512, 768, 1024
ICEPIC | gyrotron | 2048 | 3639 | 1536, 1792, 2048, 2304, 2560
LAMMPS | Au | 1024 | 3182 | 128, 256, 384, 512, 1024, 1280, 1536, 2048
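Reference times like those above anchor comparisons across core counts. Below is a minimal Python sketch (not the HPCMP's actual tooling; helper names are hypothetical) of how speedup and parallel efficiency relative to the distinguished run can be derived. The distinguished values come from the ADCIRC baroclinic row; the timings at the other core counts are made up for illustration.

```python
# Distinguished run taken from the ADCIRC "baroclinic" row above.
distinguished_cores = 1024
distinguished_time = 8959.0   # seconds on DIAMOND

def relative_speedup(time_sec: float) -> float:
    """Speedup relative to the distinguished-core-count run."""
    return distinguished_time / time_sec

def parallel_efficiency(cores: int, time_sec: float) -> float:
    """Measured time vs. ideal scaling from the distinguished run."""
    ideal = distinguished_time * distinguished_cores / cores
    return ideal / time_sec

# Hypothetical timings at other required core counts:
for cores, t in [(512, 18500.0), (1024, 8959.0), (2048, 4900.0)]:
    print(f"{cores:5d} cores: speedup {relative_speedup(t):4.2f}, "
          f"efficiency {parallel_efficiency(cores, t):4.2f}")
```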

Page 10: HPCMP Benchmarking and Performance Analysis

Some components of HPC procurement cycle

Page 11: HPCMP Benchmarking and Performance Analysis

Some components of HPC procurement cycle

• Acquire new versions of codes
• Port codes to various machines
• Acquire test cases
• Develop or acquire accuracy checks
• Test codes, get times to compare
• Assemble package for vendors

Page 12: HPCMP Benchmarking and Performance Analysis

Some components of HPC procurement cycle

• Run codes with test cases on installed DSRC machines
• Optimize! How fast can we go?

Page 13: HPCMP Benchmarking and Performance Analysis

Some components of HPC procurement cycle

• We review the vendor submittal
  – Anything suspicious?
  – How do vendor times compare to ours? How did vendors optimize? (see the sketch below)
  – How risky is the vendor's proposal?
• Present our results
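As a minimal sketch of one such comparison, with entirely hypothetical numbers and flagging threshold: a vendor time far below our best measured time is not disqualifying, but it prompts scrutiny of how the optimization was achieved.

```python
# Hypothetical measured and vendor-submitted times, in seconds.
our_best = {"CTH fixed-grid": 3399.0, "HYCOM lrg": 3020.0}
vendor   = {"CTH fixed-grid": 2100.0, "HYCOM lrg": 2975.0}

for case, ours in our_best.items():
    ratio = vendor[case] / ours
    # Threshold is illustrative, not an HPCMP rule.
    flag = "review optimization details" if ratio < 0.7 else "plausible"
    print(f"{case}: vendor/ours = {ratio:.2f} -> {flag}")
```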

Page 14: HPCMP Benchmarking and Performance Analysis

Components of testing packages continued

• Timers measure the elapsed running times
• Accuracy checks ensure validity of output files (sketched below)
  – Often requires determination of acceptable error bounds
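A minimal Python sketch of both components, assuming a wall-clock timer around each run and a relative error bound against reference output; the function names and tolerance are illustrative, not the HPCMP's.

```python
import time

def timed_run(run_fn):
    """Return (result, elapsed wall-clock seconds) for one benchmark run."""
    start = time.perf_counter()
    result = run_fn()
    return result, time.perf_counter() - start

def within_bounds(output, reference, rel_tol=1e-5):
    """Accuracy check: every output value within a relative error bound."""
    for out, ref in zip(output, reference):
        denom = max(abs(ref), 1e-30)   # guard against division by zero
        if abs(out - ref) / denom > rel_tol:
            return False
    return True
```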

Page 15: HPCMP Benchmarking and Performance Analysis

How the test packages are used

• Run all test cases on 5 different DSRC machines to acquire times
• Debug test packages
• Quantify variation across/within machines (see the sketch below)
• Compare times to proposed systems
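One simple way to quantify within-machine variation, sketched below with hypothetical repeat timings: the coefficient of variation (standard deviation over mean) of repeated runs of the same case on each machine.

```python
import statistics

# Hypothetical repeated timings (seconds) of one test case per machine.
times_by_machine = {
    "Diamond": [3399.0, 3410.5, 3388.2],
    "Garnet":  [2950.1, 3101.7, 2988.4],
}

for machine, times in times_by_machine.items():
    cv = statistics.stdev(times) / statistics.mean(times)
    print(f"{machine}: mean {statistics.mean(times):.0f} s, CV {cv:.3%}")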

Page 16: HPCMP Benchmarking and Performance Analysis

Machine attributes

Architectures Used in Study

DSRC | Name | Make | Model | Chip Set | Processor Speed (GHz) | Interconnect | Number of Cores | Cores per Node | Operating System
ERDC | Diamond | SGI | Altix ICE | Intel Xeon QC | 2.8 | DDR4 InfiniBand | 15360 | 8 | SUSE Linux
MHPCC | Mana | Dell | PowerEdge M610 | Intel Xeon QC | 2.8 | DDR InfiniBand | 9216 | 8 | Linux
NAVY | DaVinci | IBM | Power6 | IBM P6 DC | 4.7 | DDR InfiniBand | 4800 | 32 | AIX
NAVY | Einstein | Cray | XT5 | AMD Opteron QC | 2.3 | SeaStar2+ | 12736 | 8 | CNL
ERDC | Garnet | Cray | XE6 | AMD Opteron 64-bit | 2.4 | Cray Gemini | 20224 | 16 | CLE

Page 17: HPCMP Benchmarking and Performance Analysis

RESULTS! Graphs of runtimes

Page 18: HPCMP Benchmarking and Performance Analysis

Risk Assessment: Major Areas Assessed

• Compliance assessment
  – Ability to follow benchmark rules
  – Number of test case results provided
  – Results within accuracy criteria
• Assessment of risk in meeting proposed times in acceptance tests
  – Differences between benchmarked and proposed system
    • Processor, interconnect, and I/O system differences
  – Quality of estimation procedure
    • Quality of explanation and soundness of estimation procedure
  – Aggressiveness of final estimate
    • Comparison with measured benchmark system times
    • Comparison with predicted times
• Assessment of likelihood of users and/or developers using proposed code modifications
  – Acceptability of proposed code modifications

Page 19: HPCMP Benchmarking and Performance Analysis

Benchmarking website

URL: http://www.benchmarking.hpc.mil/

Page 20: HPCMP Benchmarking and Performance Analysis

Benchmarking website continued

Narrative of website purpose, codes tested

Heatmap of systems best suited for applications

Page 21: HPCMP Benchmarking and Performance Analysis

Benchmarking website continued

Brief description of application

Brief description of test cases

Page 22: HPCMP Benchmarking and Performance Analysis

Benchmarking website continued

An example of how we made the heatmap for allocation choices
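A minimal sketch (illustrative, not the site's actual pipeline) of one way such a heatmap can be derived: normalize each code's time on each system by the best time observed for that code, so 1.0 marks the best-suited system. All numbers here are hypothetical.

```python
# Hypothetical benchmark times in seconds, keyed by (code, system).
times = {
    ("CTH",   "Diamond"): 3399, ("CTH",   "Garnet"): 2980,
    ("HYCOM", "Diamond"): 3020, ("HYCOM", "Garnet"): 3350,
}

codes = sorted({c for c, _ in times})
systems = sorted({s for _, s in times})

for code in codes:
    best = min(times[(code, s)] for s in systems)
    # Heatmap cell: best time / this system's time (1.0 = fastest system).
    row = {s: round(best / times[(code, s)], 2) for s in systems}
    print(code, row)
```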

Page 23: HPCMP Benchmarking and Performance Analysis

Benchmarking website continued

Got a question? Want to suggest an improvement?

Contact us.

Page 24: HPCMP Benchmarking and Performance Analysis

Performance Team Members

• Mark Cowan – ERDC – Chair
• Larry Davis – HPCMPO
• Lloyd Slonaker – AFRL
• Tim Sell – AFRL
• Laura Brown – ERDC
• Mahbubur Rashid – ERDC
• Christine Cuicchi – NAVO
• Matt Grismer – AFRL
• Jerry Boatz – AFRL

Page 25: HPCMP Benchmarking and Performance Analysis

Performance Team Advisors

• William Ward – HPCMPO
• Steve Finn – DTRA
• Carrie Leach – ERDC
• Paul Bennett – ERDC
• Tom Oppe – ERDC
• Henry Newman – Instrumental
• Michael Laurenzano – SDSC
• Bronis de Supinski – LLNL
• Joseph Swartz – LM
• Allan Snavely – SDSC
• Laura Carrington – SDSC
• Robert Pennington – NSF
• Nick Wright – NERSC
• James Ianni – ARL

Page 26: HPCMP Benchmarking and Performance Analysis

Questions?

Page 27: HPCMP Benchmarking and Performance Analysis

Contact me…

Mark Cowan

USACE ERDC ITL Computational Analysis Branch
3909 Halls Ferry Road
Building 8000, Room 1255
Vicksburg, MS 39180
(601) 634-2665

[email protected]

Page 28: HPCMP Benchmarking and Performance Analysis

ADDENDA

Page 29: HPCMP Benchmarking and Performance Analysis

AVUS: Code description

• CFD code, formerly COBALT_60
• Simulates 3-D turbulent viscous flow over irregular geometries
• Grid-based, reads a large grid file
• AVUS: 29K lines of Fortran 90 code
• Uses ParMETIS: 12K lines of C code
• Parallelism via MPI, no OpenMP
• Runs on Cray XT, IBM Power, SGI Altix, and Linux clusters

Page 30: HPCMP Benchmarking and Performance Analysis

CTH: Code description

• CTA: CSM (Computational Structural Mechanics)
• Shock physics
• Two-step, 2nd-order accurate Eulerian algorithm is used to solve the mass, momentum, and energy conservation equations (see the sketch below)
• An explicit approach that does not require solving a linear system
• Has both static and adaptive mesh capabilities
• Parallelism via MPI
• 900K LOC, 58% FORTRAN and 42% C
• Uses NetCDF, supplied with distribution
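CTH itself is a large production code; as an illustration of the general pattern the bullets name (a two-step, second-order explicit Eulerian update that needs no linear solve), here is a minimal Python sketch using the Richtmyer two-step Lax-Wendroff scheme for 1-D linear advection. The scheme choice is illustrative, not taken from CTH.

```python
import numpy as np

def two_step_update(u, a, dt, dx):
    """One explicit two-step update of u_t + a u_x = 0 (periodic domain)."""
    f = a * u
    # Predictor: half-step values at cell interfaces i+1/2.
    u_half = 0.5 * (u + np.roll(u, -1)) - dt / (2 * dx) * (np.roll(f, -1) - f)
    f_half = a * u_half
    # Corrector: full step from interface fluxes; neighbors only, no solve.
    return u - dt / dx * (f_half - np.roll(f_half, 1))

x = np.linspace(0.0, 1.0, 200, endpoint=False)
dx = x[1] - x[0]
u = np.exp(-200 * (x - 0.5) ** 2)        # initial pulse
for _ in range(100):
    u = two_step_update(u, a=1.0, dt=0.4 * dx, dx=dx)  # CFL = 0.4
```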

Page 31: HPCMP Benchmarking and Performance Analysis

GAMESS: Code description

• CTA: CCM (Computational Chemistry, Biology, and Materials Science)
• Ab initio quantum chemistry
• Computes many energy integrals with molecular data in the form of atom positions and electron orbitals
• Communication depends on platform
  – LAPI, Sockets, SHMEM, MPI
• Code composition: 99% FORTRAN, 1% C

Page 32: HPCMP Benchmarking and Performance Analysis

HYCOM: Code description

• CTA: Climate/Weather/Ocean Modeling and Simulation (CWO)
• A primitive-equation ocean general circulation model
• Communication is MPI (MPI-2 is available)
• 100% FORTRAN
• Version 2.2.27

Page 33: HPCMP Benchmarking and Performance Analysis

HYCOM: MPI-2 details

• HYCOM may be run with MPI or MPI-2
• MPI-2 is MPI with additional features such as parallel I/O, dynamic process management, and remote memory operations
• HYCOM utilizes the parallel I/O feature (sketched below)
• Parallel I/O times required starting with TI-10
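A minimal sketch of the MPI-2 parallel I/O idea (each rank writes its own disjoint slab of one shared file), expressed with mpi4py rather than HYCOM's Fortran; the file name and array size are illustrative.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

local = np.full(1024, rank, dtype=np.float64)   # this rank's slab of data

fh = MPI.File.Open(comm, "field.bin",
                   MPI.MODE_CREATE | MPI.MODE_WRONLY)
offset = rank * local.nbytes                    # disjoint region per rank
fh.Write_at_all(offset, local)                  # collective parallel write
fh.Close()
```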

Page 34: HPCMP Benchmarking and Performance Analysis

ICEPIC: Code description

• CTA: Computational Electromagnetics and Acoustics (CEA)
• Particle-in-cell plasma physics code
• Ions and electrons move under the influence of electromagnetic fields
• Particles are updated in a grid-free manner; grouped into cells which are periodically adjusted to preserve load balance
• Fields calculated on a structured, static grid and dual grid according to Maxwell's equations (see the sketch below)
• Can simulate plasmas contained in complex geometries
• Used in electromagnetic device design
• ~350K lines of code, C/C++
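ICEPIC is a production 3-D electromagnetic code; below is only a minimal Python sketch of the particle-in-cell pattern the bullets describe: scatter charge to a static grid, solve the field on the grid, then gather it back and move the particles freely. This 1-D electrostatic toy with normalized units is illustrative, not ICEPIC's algorithm.

```python
import numpy as np

ng, L, npart = 64, 1.0, 10000          # grid cells, domain length, particles
dx = L / ng
rng = np.random.default_rng(0)
x = rng.uniform(0, L, npart)           # particle positions (grid-free)
v = 0.1 * np.sin(2 * np.pi * x / L)    # small velocity perturbation

def pic_step(x, v, dt=0.05):
    # 1) Scatter: deposit electron density onto the grid (nearest cell).
    cells = (x / dx).astype(int) % ng
    n_e = np.bincount(cells, minlength=ng) / (npart * dx)  # uniform -> 1 (L = 1)
    rho = 1.0 - n_e                    # net charge with a fixed ion background
    # 2) Field solve on the static grid: dE/dx = rho, via FFT.
    k = 2 * np.pi * np.fft.fftfreq(ng, d=dx)
    k[0] = 1.0                         # dummy value; DC mode removed below
    E = np.fft.ifft(np.fft.fft(rho) / (1j * k)).real
    E -= E.mean()
    # 3) Gather and push: interpolate E to particles, advance them freely.
    v = v - E[cells] * dt              # electrons carry charge -1
    x = (x + v * dt) % L               # periodic, grid-free particle move
    return x, v

for _ in range(50):
    x, v = pic_step(x, v)
```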

Page 35: HPCMP Benchmarking and Performance Analysis

LAMMPS: Code description

• CTA: CCM (Computational Chemistry, Biology, and Materials Science)
• Classical molecular dynamics code that models particles in a liquid, solid, or gaseous state
• Calculates atomic velocities, positions, system energy, and temperature
  – After equilibration: surface tension, radial pressure, and phase change
  – Post-processing: pair-correlation function and diffusion coefficients (see the sketch below)
• All actions occur within a box (usually orthogonal)
• Distributed-memory message-passing parallelism (MPI)
• Highly portable C++
• Libraries needed: MPI and single-processor FFT
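As a minimal sketch of one named post-processing step: estimating a diffusion coefficient from the Einstein relation, MSD(t) ≈ 2dDt. The trajectory here is synthetic; real input would come from a LAMMPS dump file.

```python
import numpy as np

rng = np.random.default_rng(1)
dt, nsteps, natoms, dim = 0.001, 2000, 100, 3
# Synthetic random-walk trajectory, shape (nsteps, natoms, dim).
traj = np.cumsum(rng.normal(0, 0.01, (nsteps, natoms, dim)), axis=0)

# Mean-squared displacement vs. time, averaged over atoms.
msd = ((traj - traj[0]) ** 2).sum(axis=2).mean(axis=1)
t = np.arange(nsteps) * dt

# Fit the late-time (diffusive) regime: slope = 2 * dim * D.
slope = np.polyfit(t[nsteps // 2:], msd[nsteps // 2:], 1)[0]
D = slope / (2 * dim)
print(f"Estimated diffusion coefficient: {D:.4g}")
```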

Page 36: HPCMP Benchmarking and Performance Analysis

ADCIRC: Code description

ADCIRC is a Coastal Circulation and Storm Surge Model. It solves time-dependent, free-surface circulation and transport problems in 2 and 3 dimensions, and uses the finite element method in space, which permits highly flexible, unstructured grids.

Typical ADCIRC applications have included:
• Modeling tides and wind-driven circulation
• Analysis of hurricane storm surge and flooding
• Dredging feasibility and material disposal studies
• Larval transport studies
• Near-shore marine operations

Page 37: HPCMP Benchmarking and Performance Analysis

“BASE” ALEGRA: Code description

ALE (Arbitrary Lagrangian-Eulerian) code: provides flexibility, accuracy, and reduced numerical dissipation over a pure Eulerian code; modern remeshing technology allows for robust mesh smoothing and control.

Hydrodynamics and solid dynamics

Models large distortions and strong shock propagation in multiple materials

Finite element code; descendant of PRONTO; uses some CTH Eulerian technology

Energy deposition and explosive burn models

Geometry: 2D/3D Cartesian, 2D cylindrical

Material models in ALEGRA: equations of state, elastic-plastic models, fracture models

[Figure: pressure and temperature during formation of a jet from a shaped charge]

Page 38: HPCMP Benchmarking and Performance Analysis

“BASE” ALEGRA: Code description

Page 39: HPCMP Benchmarking and Performance Analysis

ALEGRA_MHD: Code description

All hydrodynamics/solid dynamics modules of "base" ALEGRA PLUS magnetic field and thermal transport effects

Lorentz forces, Joule heating, thermal transport, and simple models for radiating excess energy

2D and 3D versions

2D modeling with the magnetic flux density vector components in or out of the plane, with the corresponding current density out of or in the plane, respectively

3D uses a magnetic diffusion solution based on edge and face elements, which maintains the discrete divergence-free flux property during the magnetic solve and the constrained transport remap stage

Lumped-element coupled circuit equations

Magnetic and thermal conduction

Advanced models for thermal and electrical conductivity

Emission model radiates excess energy when the medium is optically thin, while accounting for reabsorption