Computational Chemistry codes usage on IBM SP and IBM ...Comparing HF-SCF with NWChem cc pvtz...

19
Scicomp 11 - Edinburgh 1 Computational Chemistry codes usage on IBM SP and IBM Linux Cluster Sigismondo Boschi, CINECA  May 31 - June 3, 2005

Transcript of Computational Chemistry codes usage on IBM SP and IBM ...Comparing HF-SCF with NWChem cc pvtz...

Page 1: Computational Chemistry codes usage on IBM SP and IBM ...Comparing HF-SCF with NWChem cc pvtz amoxycillin, 44 atoms, 449 basis functions, C1 symmetry. 109 integrals (10GBs workarea)

Scicomp 11 ­ Edinburgh 1

Computational Chemistry codes usage on IBM SP and IBM Linux Cluster

Sigismondo Boschi, CINECA

 May 31 ­ June 3, 2005

Page 2: Computational Chemistry codes usage on IBM SP and IBM ...Comparing HF-SCF with NWChem cc pvtz amoxycillin, 44 atoms, 449 basis functions, C1 symmetry. 109 integrals (10GBs workarea)

Scicomp 11 ­ Edinburgh 2

Two typical applicationsGaussianNWchem

Page 3: Computational Chemistry codes usage on IBM SP and IBM ...Comparing HF-SCF with NWChem cc pvtz amoxycillin, 44 atoms, 449 basis functions, C1 symmetry. 109 integrals (10GBs workarea)

Scicomp 11 ­ Edinburgh 3

SCF-HF example

The complexity of a computational chemistry system is determined by the number of basis functions: for the simplest engine you do need to evaluate:N4/8 integrals: O(N4) each of them will be used more times to build the Fock matrix.Apply the SCF procedure to an NxN matrix: O(N2.0­3.0)In general (not only for SCF) you can distinguish three approaches:

Page 4: Computational Chemistry codes usage on IBM SP and IBM ...Comparing HF-SCF with NWChem cc pvtz amoxycillin, 44 atoms, 449 basis functions, C1 symmetry. 109 integrals (10GBs workarea)

Scicomp 11 ­ Edinburgh 4

Integral evaluation approachesIN CORE: all the integrals are evaluated once and put in memory. Then the matrix is build from them;DIRECT: any time you need an integral it is evaluated, but never stored;SEMI­DIRECT: some of the integrals are stored in memory or on disk, the others are evaluated when needed;

How do I choose? It depends on the characteristics of your computer. On today architectures standard semi­direct is the most common choice.

Page 5: Computational Chemistry codes usage on IBM SP and IBM ...Comparing HF-SCF with NWChem cc pvtz amoxycillin, 44 atoms, 449 basis functions, C1 symmetry. 109 integrals (10GBs workarea)

Scicomp 11 ­ Edinburgh 5

The “Gaussian case”

Its story starts in 70’sSome “re­design” in late 80’s:common ­> argumentsMANY contributors to “side” engines500000+ lines of code in Gaussian03Computing engines based on I/O90% of runs use the SCF core: standard runsIt has been written before the invention of parallel computing!Parallelised partially (SCF, optimizations…) using OpenMP and/or Linda

Page 6: Computational Chemistry codes usage on IBM SP and IBM ...Comparing HF-SCF with NWChem cc pvtz amoxycillin, 44 atoms, 449 basis functions, C1 symmetry. 109 integrals (10GBs workarea)

Scicomp 11 ­ Edinburgh 6

I/O subsystems for parallel systems

200 -> 500 MB/s on CLX

30 MB/s on CLX

scalablededicateddistributed cachedifficult to export data

high speedshared in bandwidthimmediate to uselarge latencyNo memory cache

Page 7: Computational Chemistry codes usage on IBM SP and IBM ...Comparing HF-SCF with NWChem cc pvtz amoxycillin, 44 atoms, 449 basis functions, C1 symmetry. 109 integrals (10GBs workarea)

Scicomp 11 ­ Edinburgh 7

Comparing HF-SCF with NWChemcc­pvtz amoxycillin, 44 atoms, 449 basis functions, C1 symmetry.109 integrals (10GBs workarea)Integral storage choices:

local disks: semidirect, with minimal usage of memorygpfs: semidirect, with minimal usage of memoryno disk: direct if integrals do not fit into default memory; in­core otherwiseno disk + mem: tell NWchem to use up to 100MW per CPUlocal disk + mem: use local disks and 100MW cache on every CPU.

Page 8: Computational Chemistry codes usage on IBM SP and IBM ...Comparing HF-SCF with NWChem cc pvtz amoxycillin, 44 atoms, 449 basis functions, C1 symmetry. 109 integrals (10GBs workarea)

Scicomp 11 ­ Edinburgh 8

SP HPS Power4 1.3 GHz vs 

Cluster Linux Myrinet Xeon 3.055 GHzJust one 32 processors (1 p690 node, no HPS usage) run:

Standard (GPFS):CLX: 199.7 sec SP4: 240.7 sec

Specifing memory:CLX:    59.1 sec SP4:   79.9 sec

Page 9: Computational Chemistry codes usage on IBM SP and IBM ...Comparing HF-SCF with NWChem cc pvtz amoxycillin, 44 atoms, 449 basis functions, C1 symmetry. 109 integrals (10GBs workarea)

Scicomp 11 ­ Edinburgh 9

What does it mean “standard”?

scratch_dir /scratch/abc0memory total 900 mb

… molecule specifications …

task scf

means also:global 450heap 225stack 225

Page 10: Computational Chemistry codes usage on IBM SP and IBM ...Comparing HF-SCF with NWChem cc pvtz amoxycillin, 44 atoms, 449 basis functions, C1 symmetry. 109 integrals (10GBs workarea)

Scicomp 11 ­ Edinburgh 10

“expert usage”

scratch_dir /scratch/abc0memory total 900 global 100 mb

… molecule specifications …

scf  RHF  nopen 0  semidirect memsize 100000000end

task scf

“no disk”:disksize 0

“local disk”:/scratch_local/abc0

means also:heap 400stack 400

Page 11: Computational Chemistry codes usage on IBM SP and IBM ...Comparing HF-SCF with NWChem cc pvtz amoxycillin, 44 atoms, 449 basis functions, C1 symmetry. 109 integrals (10GBs workarea)

Scicomp 11 ­ Edinburgh 11

1 2 4 8 16 32 64 128 25610

100

1000

10000

100000

local disks

gpfs

no disk

no disk + mem

local disks + mem

n. CPUs

Tim

e (s

econ

ds)

Timing of the same run (cc-pvdz amoxycillin, 449 basis functions, 109

integrals, 10GBs workarea) with different choices of integral storage

Page 12: Computational Chemistry codes usage on IBM SP and IBM ...Comparing HF-SCF with NWChem cc pvtz amoxycillin, 44 atoms, 449 basis functions, C1 symmetry. 109 integrals (10GBs workarea)

Scicomp 11 ­ Edinburgh 12

what's better?

LOW

GPFSlocal disk + memlocal diskmem

MEDIUM

local disk+memmemGPFSlocal disk

HIGH

local disk+memmemlocal disk

GPFS

It depends on the parallelism level but it is NEVER worthwhile to recompute the integral respect to storing them except for one case...

Page 13: Computational Chemistry codes usage on IBM SP and IBM ...Comparing HF-SCF with NWChem cc pvtz amoxycillin, 44 atoms, 449 basis functions, C1 symmetry. 109 integrals (10GBs workarea)

Scicomp 11 ­ Edinburgh 13

what's the standard with NWchem?semidirect on GPFS – the worst for high scalability!!!Even if a modern code, the standard usage let you have standard capabilities… that are not compatible with high parallelism.even a good piece of code need an expert “driver”.Computational chemists needs to get involved in the problem;Computational chemistry codes as well;Dealing correctly with memory is the problem.

Page 14: Computational Chemistry codes usage on IBM SP and IBM ...Comparing HF-SCF with NWChem cc pvtz amoxycillin, 44 atoms, 449 basis functions, C1 symmetry. 109 integrals (10GBs workarea)

Scicomp 11 ­ Edinburgh 14

1 2 4 8 16 32 64 128 2560

0,2

0,4

0,6

0,8

1

1,2

1,4

1,6

1,8

2

local disks

gpfs

no disk

no disk + mem

local disks + mem

n. CPUs

para

llel e

ffici

ency

Parallel efficiency

Page 15: Computational Chemistry codes usage on IBM SP and IBM ...Comparing HF-SCF with NWChem cc pvtz amoxycillin, 44 atoms, 449 basis functions, C1 symmetry. 109 integrals (10GBs workarea)

Scicomp 11 ­ Edinburgh 15

scalability

The methods that scale better are the ones that do not use disk at all, despite their absolute performance:

I/Ocontext switchesOS activityvariable task times

The worst is GPFS! It is able of good performance but it is not scalable! Furthermore it is shared => variable performance.

Page 16: Computational Chemistry codes usage on IBM SP and IBM ...Comparing HF-SCF with NWChem cc pvtz amoxycillin, 44 atoms, 449 basis functions, C1 symmetry. 109 integrals (10GBs workarea)

Scicomp 11 ­ Edinburgh 16

cpu/elapsed time ratio

1 2 4 8 16 32 64 128 2560

0,050,1

0,150,2

0,250,3

0,350,4

0,450,5

0,550,6

0,650,7

0,750,8

0,850,9

0,951

local disks

gpfs

no disk

no disk + mem

local disks + mem

n. CPUs

CP

U/e

laps

ed ti

me

Page 17: Computational Chemistry codes usage on IBM SP and IBM ...Comparing HF-SCF with NWChem cc pvtz amoxycillin, 44 atoms, 449 basis functions, C1 symmetry. 109 integrals (10GBs workarea)

Scicomp 11 ­ Edinburgh 17

What's up with I/O?

it is not usefuloften it is negativebest performance for high scalability is without I/OI/O is a killer for high scalability: it needs context switches

Page 18: Computational Chemistry codes usage on IBM SP and IBM ...Comparing HF-SCF with NWChem cc pvtz amoxycillin, 44 atoms, 449 basis functions, C1 symmetry. 109 integrals (10GBs workarea)

Scicomp 11 ­ Edinburgh 18

an historical problem...Computational chemistry kernels:

in core = all integrals in memory, including zeros;direct = integrals evaluated as needed;semi­direct = AO integrals recomputed as needed, MO quantities stored temporarly on disk

ATTENTION! this means that it is assumed that:either you have enought memory to do the in­core stuffor you do not, and you need disk (semidirect) or CPU (direct).Memory is just a “cache” to disk… 

ALIAS: you must specify by hand “do not use disk”... if you can at all, because today parallel computers...

Page 19: Computational Chemistry codes usage on IBM SP and IBM ...Comparing HF-SCF with NWChem cc pvtz amoxycillin, 44 atoms, 449 basis functions, C1 symmetry. 109 integrals (10GBs workarea)

Scicomp 11 ­ Edinburgh 19

... have really a lot of distributed, fast and unused memory