"The BG collaboration, Past, Present, Future. The new available resources". Prof. Dr. Paul Whitford...
The Blue Gene Collaboration: Past, Present and Future
April 17th, 2015
Paul Whitford ([email protected])
Liaison to USP for HPC, Rice University
Assistant Professor of Physics, Northeastern University
usp.rice.edu
Discussion Points
• Blue Gene/P specifications
• Considerations for running and optimizing calculations
• Status of operations
• Additional considerations and resources
• Newest resource: BG/Q!
IBM Blue Gene/P at Rice University
• Ranked #377 in the Top500 supercomputer list of June 2012
• 84 teraflops of performance
• 24,576 cores
USP-Rice Blue Gene Supercomputer
• Low-power processors, but massively parallel
  – PowerPC 450 processors: 850 MHz
  – High-bandwidth, low-latency connections
• Was ranked in the top 500 supercomputers in the world
• 30% dedicated to USP
6 racks → 192 node cards → 6,144 compute nodes → 24,576 cores
Technical Consideration 1: Distribution of Calculations

Each chip is a 4-way symmetric multiprocessor (SMP) with 4 GB of memory, roughly equivalent to one 3-GHz Intel core.

Can be run in:
• SMP mode (shared memory, threading) – full memory per MPI process
• VN (virtual node) mode – 4 independent MPI processes, ¼ memory per MPI process, shared computation and communication loads
• DUAL (dual node) mode – 2 processes with 2 threads each, ½ memory per MPI process

Jobs are allocated in sets of 4 node-card units (512 cores) – less scalable applications should go elsewhere.
General suggestion: high I/O uses VN, high computing uses SMP (use whichever is fastest).
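The memory arithmetic for the three execution modes above can be sketched in a few lines of Python. This is a hypothetical helper for illustration only (not part of the BG/P software stack), assuming the 4 cores and 4 GB per node stated above:

```python
# Illustrative only: map a BG/P execution mode to the number of MPI
# ranks per node and the memory available to each rank, assuming
# 4 cores and 4 GB of memory per node.

NODE_CORES = 4
NODE_MEM_GB = 4.0

# MPI ranks per node in each mode
MODE_RANKS = {
    "SMP": 1,   # one process per node; threads share the full 4 GB
    "DUAL": 2,  # two processes with two threads each; 2 GB per rank
    "VN": 4,    # four independent processes; 1 GB per rank
}

def ranks_and_memory(mode):
    """Return (MPI ranks per node, GB of memory per rank) for a mode."""
    ranks = MODE_RANKS[mode]
    return ranks, NODE_MEM_GB / ranks

for mode in ("SMP", "DUAL", "VN"):
    ranks, mem = ranks_and_memory(mode)
    print(f"{mode}: {ranks} rank(s)/node, {mem:g} GB/rank, "
          f"{NODE_CORES // ranks} thread(s)/rank")
```

The trade-off is visible directly: VN maximizes MPI ranks at the cost of per-rank memory, while SMP keeps the full node memory for a single (threaded) process.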
Technical Consideration 2: Cross Compilation

• Frontend – SUSE Linux
• Backend – PowerPC 450 compute cores
  – Minimal OS, lacking many system calls
• Code is compiled on the frontend and executed on the backend.
• BG/Q also requires cross compilation.
• Can be very difficult for users new to BG
  – There is support staff to help with compiling.
Technical Consideration 3: Scalability

• You should always start with scalability tests.
  – These frequently reveal computational bottlenecks.
• User feedback is incredibly valuable!
• Please share your performance stats with us.
  – They can help users apply for time on other resources (e.g. XSEDE, INCITE, PRACE)
  – They will facilitate more efficient calculations
  – They will help us optimize builds
Examples of scaling data (figures):
• 100k-atom explicit-solvent simulation using NAMD on BG/P (performed by Filipe Lima)
• 200k–2.5M atom simulations using an implicit-solvent model in Gromacs on Stampede
Many considerations impact scalability
• Is your specific calculation well suited for parallelization?
  – For example, molecular dynamics simulations of large systems tend to scale to higher numbers of cores (weak vs. strong scaling)
• Have you enabled all appropriate flags at runtime?
  – Software packages often guess… you need to test various settings
  – For example, in MD simulations, you must tell the code how to distribute atoms across nodes
• Is MPI, OpenMP, or hybrid MPI+OpenMP enabled in the code?
  – MPI launches many processes, each with its own memory and instructions
  – OpenMP launches threads that share memory
  – Hybrid: each MPI process has its own memory, which is then shared among its own OpenMP threads
• Are you using the most appropriate combination of calculation, parallelization, and flags for the machine you are using?
• Does your calculation take long enough to complete that scalability is noticeable?
  – Sometimes, initializing parallelization may take minutes, or longer.
• Is the performance reproducible?
• How well is the code written?

With so many factors impacting performance, it is essential that you perform scaling tests before doing production calculations.
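A scaling test ultimately reduces to simple arithmetic: run the same calculation at several core counts, then compare measured speedup against the ideal. A minimal strong-scaling analysis sketch in Python (the timings below are made up for illustration, not measurements from any of the machines described here):

```python
# Strong-scaling analysis sketch: given wall-clock times measured at
# several core counts for a fixed problem size, report speedup and
# parallel efficiency relative to the smallest run. Efficiency well
# below 1.0 suggests the calculation should not be scaled that far.

def scaling_table(timings):
    """timings: {core_count: wall_clock_seconds} for one fixed problem.
    Returns {core_count: (speedup, efficiency)}."""
    base_cores = min(timings)
    base_time = timings[base_cores]
    results = {}
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        efficiency = speedup / (cores / base_cores)
        results[cores] = (speedup, efficiency)
    return results

# Hypothetical timings: doubling the cores rarely halves the runtime.
for cores, (s, e) in scaling_table({512: 100.0,
                                    1024: 55.0,
                                    2048: 32.0}).items():
    print(f"{cores:5d} cores: speedup {s:.2f}, efficiency {e:.2f}")
```

Tables like this, shared with the support staff, are exactly the kind of performance data that help with allocation requests on XSEDE, INCITE, or PRACE.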
For when the code needs improvement… high-performance tuning with HPCToolkit

HPCToolkit functionality
• Accurate performance measurement
• Effective performance analysis
• Pinpointing scalability bottlenecks
• Scalability bottlenecks on large-scale parallel systems
• Scaling on multicore processors
• Assessing process variability
• Understanding temporal behavior

HPCToolkit was developed at Rice (Mellor-Crummey) and is available on the BG/P.
Experts are available to help you profile your code, even if you think it works well!
Software on BG/P
Programs: • Quantum ESPRESSO • ABINIT • CP-PAW • DL_POLY • FHI-aims • Gromacs • Repast • SIESTA • Yambo • LAMMPS • GAMESS • mpiBLAST • NAMD • NWChem • SPECFEM3D_GLOBE • VASP • WRF • CP2K
Installed libraries: • boost • fftw • lapack • scalapack • essl • netcdf • zlib • szip • HDF5 • PnetCDF • Jasper • Java • libpng • tcl
Current Activities
• First phase (May–August 2012, pre-contract)
  – Material Science and Quantum Chemistry
    • 15 users from 5 research groups
    • Installed 10 packages for QC/MS calculations so far
  – Weekly teleconference meetings with Rice-HPC and USP users and faculty
• Second phase (August–December 2012)
  – USP-Rice contract went into effect
  – Assessed the demands of additional scientific communities
  – Collaboration webpage established (usp.rice.edu)
  – First HPC Workshop at USP: December 2012
Current Activities
• Continued growth (2013–2014)
  – Hired additional technical staff – Xiaoqin Huang
  – Regular teleconference meetings
• Open for users
  – Opened DAVinCI resource for testing
  – 2nd HPC workshop – May 2014
• Current/ongoing activities
  – BG/Q went online (March 2015)
  – Possible competitive allocation process
    • Scaling data important!
  – Student exchanges
    • To Rice and to USP
  – Build scientific collaborations
Utilization of BG/P by USP users: >100 users from USP, USP São Carlos, USP Ribeirão Preto, and FMUSP, across many departments
The newest resource: BG/Q
• 1-rack Blue Gene/Q
• Already installed software
  – abinit, cp2k, quantum espresso, fhi-aims, gamess, gromacs, lammps, mpi-blast, namd, OpenFOAM, repast, siesta, vasp, wps, wrf; blacs, blas, cmake, fftw, flex, hdf5, jasper, lapack, libint, libpng, libxc, pnetcdf, scalapack, szip, tcl, zlib
• Qualifies for Top500 ranking (~293)
• Accepting USP users
  – Get a BG/P account
  – Email a request to LCCA for BG/Q account activation
  – Tech support available
  – Details will be provided at usp.rice.edu
Differences between BG/P and BG/Q

Attribute                  | BG/P                     | BG/Q
Maximum performance        | 84 TFLOPS                | 209 TFLOPS
Power efficiency           | 371 MFLOPS/W             | 2.1 GFLOPS/W
Total number of cores      | 24,576                   | 16,384
Number of nodes            | 6,144                    | 1,024
Cores per node             | 4                        | 16 (+1 spare, +1 for OS)
Memory per node            | 4 GB                     | 16 GB
Possible threads per core  | 1                        | 4
CPU clock speed            | 850 MHz                  | 1.6 GHz
Minimum nodes/job          | 128 (512 physical cores) | 32 (512 physical cores)
Scratch disk space         | 245 TB                   | 120 TB
BG/Q offers 4× the cores and memory per node. Multi-level parallelization can lead to large performance increases on BG/Q, as can larger-memory applications.
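The peak-performance row of the table can be reproduced from the node counts and clock speeds, assuming the per-core flops/cycle of each architecture (4 for the BG/P PowerPC 450 dual-FPU, 8 for the BG/Q A2 QPX unit; these counts are standard figures for these chips, not stated in the slides):

```python
# Back-of-envelope check of the peak-performance figures above.

def peak_tflops(nodes, cores_per_node, clock_ghz, flops_per_cycle):
    """Theoretical peak in TFLOPS: total cores x clock x flops per cycle."""
    return nodes * cores_per_node * clock_ghz * flops_per_cycle / 1000.0

bgp = peak_tflops(nodes=6144, cores_per_node=4, clock_ghz=0.85, flops_per_cycle=4)
bgq = peak_tflops(nodes=1024, cores_per_node=16, clock_ghz=1.6, flops_per_cycle=8)

print(f"BG/P peak: {bgp:.1f} TFLOPS")  # ~83.6, quoted above as 84
print(f"BG/Q peak: {bgq:.1f} TFLOPS")  # ~209.7, quoted above as 209
```

Note that despite having fewer cores, BG/Q more than doubles the peak of BG/P through the higher clock and wider floating-point units.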
Additional Resource: DAVinCI
• 25 TeraFLOPS x86 IBM iDataPlex
• ~20M CPU-hours/year
• 2.8 GHz Intel Westmere (hex-core)
• 192 nodes (>2,300 cores & >4,600 threads)
• 192 MPI nodes
• 16 nodes w/ 32 NVIDIA Fermi GPGPUs
• 9.2 TB memory (48 GB per node)
• ~450 TB storage
• 4x QDR InfiniBand interconnect (compute/storage)
• Fully operational: 7/2011
• Available to USP users on a testing basis
Additional Resource: PowerOmics
• 10 TeraFLOPS IBM POWER8 S822
• 3.0 GHz POWER8 12-core processors
• 8 nodes (192 cores, 1,536 threads)
• 3.5 TiB memory (mix of 256 GiB and 1 TiB)
• 361 TiB storage
• FDR InfiniBand interconnect
• To be commissioned in 2015
• For applications with very high memory demands
• If interested in testing, contact [email protected]
XSEDE (national major supercomputing resource, formerly known as TeraGrid)
• Extreme Science and Engineering Discovery Environment
• The name changed, but the song (and mission) remains the same
• Rice Campus Champions:
  – Roger Moye
  – Qiyou Jiang
  – Jan Odegard
• If you need MILLIONS of CPU hours, XSEDE is for you
• They are your advocates for getting XSEDE accounts and CPU time, so let us help you with your biggest science
• XSEDE requires a US researcher to be part of the team, but many people would be enthusiastic about collaborations
• In Europe, there is PRACE, which provides similar opportunities
• Worldwide, there are many internationally competitive resources
Getting Help, Organizing Collaborative Efforts

• Top-caliber support and service for research computing
• A thriving community of HPC scientists sharing research and building collaborations, with a significant presence in national cyberinfrastructure issues
USP-Rice Blue Gene Collaboration Webpage

A "one-stop shop" with information on:
• Getting accounts
• Logging on
• Compiling code
• Running jobs
• Getting help
• Updates and activities
• Transferring data
• Acknowledgements
It will be updated soon with BG/Q information.
For more information
• To request access to BG/P and BG/Q, send a request to LCCA ([email protected]). Include:
  – Your name
  – Email
  – Research group, department, PI
  – Research area
  – Code to be run
  – Scalability information (if available)
• For additional questions, email me:
  – Paul Whitford ([email protected])
Any questions?
Feel free to contact all of us. We are here to help.
LCCA – [email protected]
Paul Whitford – [email protected]
Xiaoqin Huang – [email protected]
Franco Bladilo – [email protected]