Scicomp 11, Edinburgh
Computational Chemistry codes usage on IBM SP and IBM Linux Cluster
Sigismondo Boschi, CINECA
May 31 - June 3, 2005
Two typical applications: Gaussian and NWChem
SCF-HF example
The complexity of a computational chemistry calculation is determined by the number of basis functions N of the system. For the simplest engine you need to:
- evaluate N^4/8 two-electron integrals, i.e. O(N^4) work; each of them is used several times to build the Fock matrix;
- apply the SCF procedure to an NxN matrix: O(N^2.0-3.0).
In general (not only for SCF) you can distinguish three approaches:
Integral evaluation approaches:
- IN CORE: all the integrals are evaluated once and kept in memory; the Fock matrix is then built from them.
- DIRECT: every time an integral is needed it is (re)evaluated, and it is never stored.
- SEMIDIRECT: some of the integrals are stored in memory or on disk, the others are evaluated when needed.
How do I choose? It depends on the characteristics of your computer. On today's architectures the semidirect approach is the most common choice.
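As a rough order-of-magnitude check (my own estimate, assuming about 8 bytes per stored integral and ignoring integral screening): for the amoxycillin example used later, N = 449 basis functions gives N^4/8, roughly 5 x 10^9 unique two-electron integrals; screening reduces this to about the 10^9 integrals and 10 GB work area quoted on the following slides, which is still far more than the default per-process memory, hence the need for disk, recomputation, or an explicit in-memory cache.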
The “Gaussian case”
- Its story starts in the 70's.
- Some "redesign" in the late 80's: COMMON blocks -> arguments.
- MANY contributors to "side" engines; 500000+ lines of code in Gaussian03.
- Computing engines based on I/O.
- 90% of runs use the SCF core: standard runs.
- It was written before the invention of parallel computing!
- Partially parallelised (SCF, optimizations, ...) using OpenMP and/or Linda.
I/O subsystems for parallel systems
Local disks: about 30 MB/s on CLX; scalable, dedicated, with a distributed (per-node) cache, but data are difficult to export to other nodes.
GPFS: 200 -> 500 MB/s on CLX; high speed and immediate to use, but the bandwidth is shared, the latency is large and there is no memory cache.
Comparing HF-SCF with NWChem: cc-pVDZ amoxycillin, 44 atoms, 449 basis functions, C1 symmetry; about 10^9 integrals (10 GB work area).
Integral storage choices (sketched in NWChem input below):
- local disks: semidirect, with minimal usage of memory;
- gpfs: semidirect, with minimal usage of memory;
- no disk: direct if the integrals do not fit into the default memory, in-core otherwise;
- no disk + mem: tell NWChem to use up to 100 MW per CPU;
- local disk + mem: use local disks and a 100 MW cache on every CPU.
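For reference, a rough sketch of how these choices map onto NWChem input (reconstructed from the directives shown on the following slides; exact keywords and defaults may vary with the NWChem version):

  # "local disks" vs "gpfs": point the scratch area at the chosen filesystem
  scratch_dir /scratch_local/abc0    # or /scratch/abc0 for gpfs

  # "no disk": disallow the integral file, so integrals that do not fit
  # in memory are recomputed instead of written to disk
  scf
    semidirect disksize 0
  end

  # "no disk + mem" / "local disk + mem": additionally give each CPU
  # a large in-memory integral cache
  scf
    semidirect memsize 100000000
  end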
SP HPS Power4 1.3 GHz vs Cluster Linux Myrinet Xeon 3.055 GHz
Just one 32-processor run (1 p690 node, no HPS usage):
- Standard (GPFS): CLX 199.7 s, SP4 240.7 s
- Specifying memory: CLX 59.1 s, SP4 79.9 s
What does "standard" mean?
scratch_dir /scratch/abc0
memory total 900 mb
… molecule specifications …
task scf
this also means:
  global 450
  heap 225
  stack 225
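With NWChem's default partitioning (to my understanding, 50% global, 25% heap and 25% stack of the total), the memory line above should be equivalent to spelling the split out explicitly:

  memory heap 225 stack 225 global 450 mb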
“expert usage”
scratch_dir /scratch/abc0
memory total 900 global 100 mb
… molecule specifications …
scf
  rhf
  nopen 0
  semidirect memsize 100000000
end
task scf
"no disk": disksize 0
"local disk": /scratch_local/abc0
this also means:
  heap 400
  stack 400
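Putting the pieces together, the full deck for the "local disk + mem" variant would look roughly like this (a sketch assembled from the directives above; paths and sizes are the ones of this example):

  scratch_dir /scratch_local/abc0
  memory total 900 global 100 mb
  ... molecule specification ...
  scf
    rhf
    nopen 0
    semidirect memsize 100000000
  end
  task scf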
[Figure: elapsed time in seconds (log scale, 10 to 100000) vs. number of CPUs (1 to 256) for the five integral storage choices: local disks, gpfs, no disk, no disk + mem, local disks + mem.]
Timing of the same run (cc-pVDZ amoxycillin, 449 basis functions, 10^9 integrals, 10 GB work area) with the different choices of integral storage.
what's better?
Ranking of the storage choices (best to worst) by level of parallelism:
LOW:    GPFS, local disk + mem, local disk, mem
MEDIUM: local disk + mem, mem, GPFS, local disk
HIGH:   local disk + mem, mem, local disk, GPFS
It depends on the level of parallelism, but it is NEVER worthwhile to recompute the integrals rather than storing them, except for one case...
What is the standard with NWChem? Semidirect on GPFS – the worst choice for high scalability!
- Even for a modern code, the standard usage gives you standard capabilities... which are not compatible with high parallelism.
- Even a good piece of code needs an expert "driver".
- Computational chemists need to get involved in the problem, and so do computational chemistry codes.
- Dealing correctly with memory is the problem.
[Figure: parallel efficiency (0 to 2) vs. number of CPUs (1 to 256) for local disks, gpfs, no disk, no disk + mem, local disks + mem.]
scalability
The methods that scale best are the ones that do not use the disk at all, whatever their absolute performance, because they avoid:
- I/O
- context switches
- OS activity
- variable task times
The worst is GPFS! It is capable of good performance, but it is not scalable. Furthermore it is shared, which means variable performance.
CPU/elapsed time ratio
[Figure: CPU/elapsed time ratio (0 to 1) vs. number of CPUs (1 to 256) for local disks, gpfs, no disk, no disk + mem, local disks + mem.]
What's up with I/O?
- it is not useful;
- often it is actually harmful;
- the best performance at high scalability is obtained without I/O;
- I/O is a killer for high scalability: it needs context switches.
An historical problem... Computational chemistry kernels:
- in core = all integrals in memory, including the zeros;
- direct = integrals evaluated as needed;
- semidirect = AO integrals recomputed as needed, MO quantities stored temporarily on disk.
ATTENTION! This means it is assumed that:
- either you have enough memory to do the in-core work,
- or you do not, and then you need the disk (semidirect) or the CPU (direct).
Memory is just a "cache" for the disk...
In other words: you must specify by hand "do not use disk"... if you can at all, because today's parallel computers...
... actually have a lot of fast, distributed and unused memory.