Using Distributed Computing in Computational Fluid Dynamics Zoran Constantinescu, Jens Holmen, Pavel Petrovic 13-15 May 2003, Moscow Norwegian University of Science and Technology


  • Using Distributed Computing in Computational Fluid Dynamics
    Zoran Constantinescu, Jens Holmen, Pavel Petrovic
    13-15 May 2003, Moscow
    Norwegian University of Science and Technology

  • outline
    need more processing power: simulations, data processing, complex algorithms
    supercomputers, clusters of PCs, distributed computing
    existing systems
    QADPZ project: description, advantages, applications, current status
    Navier-Stokes solver; CFD application: existing solver using Diffpack and mpich, small MPI replacement, results and comparison
    future work

  • problems
    larger and larger amounts of data are generated every day (simulations, measurements, etc.)
    the software applications that handle this data require more and more CPU power: simulation, visualization, data processing
    complex algorithms, e.g. evolutionary algorithms: large populations, many evaluations, very time-consuming
    hence the need for parallel processing

  • solution 1: use parallel supercomputers
    access to tens/hundreds of CPUs, e.g. NOTUR/NTNU = embla (512) + gridur (384)
    high-speed interconnect, shared memory
    usually for batch-mode processing
    very expensive (price, maintenance, upgrades)
    these CPUs are not so powerful anymore, e.g. 500 MHz RISC vs. today's 2.5 GHz Pentium 4

  • solution 2: use clusters of PCs (Beowulf)
    a network of personal computers, usually running the Linux operating system
    powerful CPUs (Pentium 3/4, Athlon)
    high-speed networking (100 Mbps, 1 Gbps, Myrinet)
    much cheaper than supercomputers, but still quite expensive (installation, upgrades, maintenance)
    trade higher availability and/or greater performance for lower cost

  • solution 3: use distributed computing
    use existing networks of workstations (PCs from labs and offices connected by a LAN)
    usually running Windows or Linux (also MacOS, Solaris, IRIX, BSD, etc.)
    powerful CPUs (Pentium 3/4, AMD Athlon), high-speed networking (100 Mbps)
    the computers are already installed, so they are very cheap, and maintenance is already in place
    easy to get a network of tens/hundreds of computers

  • distributed computing
    specialized applications run on each individual computer in the LAN
    they talk to one or more central servers: download a task, solve it, and send back the results
    better suited (easier) for task-parallel applications, where the application can be decomposed into independent tasks
    can also be used for data-parallel applications
    the number of available CPUs is more dynamic

  • existing systems: seti@home
    search for extraterrestrial intelligence: analysis of data from radio telescopes
    the client application is very specialized
    uses the Internet to connect clients to the server and to download/upload a task/results
    no framework for other types of applications, no source code available

  • existing systems: distributed.net
    one of the largest "computers" in the world (~20 TFlops)
    used for solving computational challenges: RC5 cryptography, optimal Golomb rulers
    the client application is very specialized
    uses the Internet to connect clients to the server
    no framework for other applications, no source code available

  • existing systems: Condor project (Univ. of Wisconsin)
    more research-oriented computational projects
    more advanced features, user applications
    very difficult to install, problems with some OSes (started from a Unix environment)
    restrictive license (closed system)
    other, commercial projects: Entropia, Parabon; expensive for research, only certain OSes

  • a new system: QADPZ project (NTNU)
    initial application domains: large-scale visualization, genetic algorithms, neural networks
    prototype in early 2001, but abandoned (too visualization-oriented)
    started in July 2001, first release v0.1 in August 2001
    we are now close to a new release, v0.8 (May 2003)
    the system is independent of any specific application domain
    open source project on SourceForge.net

  • http://qadpz.sourceforge.net

  • QADPZ description
    QADPZ project (NTNU), similar in many ways to Condor (submit computing tasks to idle computers running in a network)
    easy to install, use, and maintain
    modular and extensible
    open source project, implemented in C++
    support for many OSes (Linux, Windows, Unix, ...)
    support for multiple users, encryption
    logging and statistics

  • QADPZ architecture
    client: user interface, submits tasks/data
    master: management of slaves, scheduling tasks, monitoring tasks, controlling tasks
    slave: background process, reports status, downloads tasks/data, executes tasks

  • communication
    repository: an external server (web, ftp) or the internal lightweight web server
    separate control flow and data flow

  • the client
    is the interface to the system
    describes the job (project file), prepares the task code and the input files, submits the tasks
    communicates with the master
    two modes: interactive (stay connected to the master and wait for the results) and batch (detach from the master and get the results later; the master keeps all the messages)

  • jobs, tasks, subtasks
    a job consists of groups of tasks executed sequentially or in parallel
    a task can consist of subtasks (the same executable is run, but with different input data)
    each task is submitted individually; the user specifies the OS and the minimum resource requirements (disk, memory)
    the master allocates the most suitable slave for executing a task and notifies the client
    when a task is finished, the results are stored as specified in the project and the client is notified

  • the tasks
    normal binary executable code: no modifications required, but must be compiled for each of the platforms used
    normal interpreted programs (shell script, Perl, Python) or Java programs: require an interpreter/VM on each slave
    dynamically loaded slave library (our API): better performance, more flexibility

  • the master
    keeps account of all existing slaves (status, specifications)
    usually one master is enough; more can be used if there are too many slaves (the communication protocol allows one master to act as another client, but this is not fully implemented yet)
    keeps account of all submitted jobs/tasks
    keeps a list of accepted users (based on username/password)
    gathers statistics about slaves and tasks
    can optionally run the internal web server for the data and code repository

  • the slave

  • OO design

  • communication
    layered communication protocol
    exchanged messages are in XML (with compression and encryption)
    uses UDP with a reliable layer on top
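The slides only state that messages are XML; the concrete schema is not shown. A purely illustrative message, with all element and attribute names invented for this sketch (not the real QADPZ protocol):

```xml
<message type="task-submit">
  <user name="alice"/>                    <!-- checked against users.txt -->
  <task id="42" os="Linux">
    <require disk="100MB" mem="64MB"/>    <!-- minimum resource requirements -->
    <executable url="http://repository/task42"/>
  </task>
</message>
```

In the layered design described above, a message like this would be compressed and encrypted before being handed to the reliable-UDP layer.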

  • configuration
    slave: qadpz_slave (daemon) + slave.cfg
    master: qadpz_master (daemon) + master.cfg + qadpz_admin + users.txt + privkey
    client: qadpz_run + client.cfg + pubkey
    supported platforms: Linux, Win32 (9x, 2K, XP), SunOS, IRIX, FreeBSD, Darwin/MacOS X

  • installation
    one of our computer labs (Rose lab): ~80 PCs, Pentium 3, 733 MHz, 128 MBytes; dual boot: Win2000 and FreeBSD; running for several months
    when a student logs in to a computer, the slave running on that computer is set into disabled mode (no new computing tasks are accepted; any current task is killed and/or restarted on another computer)
    our Beowulf cluster (ClustIS): 40 nodes, AMD Athlon, 1466 MHz, 512/1024 MBytes, 100 Mbps network, Linux

  • status

  • CFD application
    existing solver for the incompressible Navier-Stokes equations:
    using the projection method, the equations for the velocity and the pressure are solved sequentially at each time step
    a Galerkin finite element method is used to discretize in space
    the pressure Poisson equation is preconditioned with a mixed additive/multiplicative domain decomposition method: one iteration of overlapping Schwarz combined with a coarse grid correction
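The sequential velocity/pressure splitting can be written out explicitly. One standard (non-incremental) form of the projection method is sketched below; the slide does not fix the exact variant used in the solver:

```latex
% 1. tentative velocity u* from the momentum equation, pressure omitted
\frac{\mathbf{u}^{*}-\mathbf{u}^{n}}{\Delta t}
  + (\mathbf{u}^{n}\!\cdot\!\nabla)\,\mathbf{u}^{n}
  = \nu\,\nabla^{2}\mathbf{u}^{*} + \mathbf{f}^{n+1}

% 2. pressure Poisson equation, enforcing \nabla\cdot\mathbf{u}^{n+1}=0
\nabla^{2} p^{n+1} = \frac{1}{\Delta t}\,\nabla\!\cdot\!\mathbf{u}^{*}

% 3. velocity correction: projection onto divergence-free fields
\mathbf{u}^{n+1} = \mathbf{u}^{*} - \Delta t\,\nabla p^{n+1}
```

Step 2 is the pressure Poisson equation mentioned above; it is the expensive, globally coupled solve, which is why it gets the overlapping Schwarz preconditioner with coarse grid correction.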

  • NS solver
    developed in C++ using the Diffpack object-oriented numerical library for differential equations
    used the parallel version, based on standard MPI (Message Passing Interface) for communication; we used mpich

    tested on our 40-node PC cluster

  • MPI implementation
    needed to replace the MPI calls used by Diffpack with a subset of the MPI standard:
    MPI_Init, MPI_Finalize
    MPI_Comm_size, MPI_Comm_rank
    MPI_Send, MPI_Recv
    MPI_Isend, MPI_Irecv
    MPI_Bcast, MPI_Wait
    MPI_Reduce, MPI_Allreduce
    MPI_Barrier
    MPI_Wtime

  • test case
    oscillating flow around a fixed cylinder
    3D domain with 8-node isoparametric elements

    coarse grid: 2 184 nodes, 1 728 elements
    fine grid: 81 600 nodes, 76 800 elements

    10 time steps; running time ~30 minutes with 2 CPUs (AMD Athlon, 1466 MHz)

  • coarse grid

  • running time
    NS equation, time [sec] vs. number of CPUs, MPICH vs. QADPZ (bzip2 and lzo message compression):

    CPUs   MPICH   QADPZ bzip2   QADPZ lzo   bzip2 overhead [%]   lzo overhead [%]
      2     1600      1713          1616            6.5                 0.9
      3     1012      1231          1050           17.7                 3.6
      4      752       933           780           19.3                 3.5
      5      555       716           570           22.4                 2.6
      6      458       616           480           25.6                 4.5
      7      430       633           450           32.0                 4.4
      8      344       515           364           33.2                 5.4
      9      329       537           352           38.7                 6.5
     10      315       545           345           42.2                 8.6
     11      260       514           286           49.4                 9.0
     12      245       490           272           50.0                 9.9
     13      213       493           236           56.7                 9.7
     14      203       456           226           55.4                10.1
     15      208       489           225           57.4                 7.5
     16      191       492           225           61.1                15.1

  • monitor

  • future work
    local caching of binaries on the slaves
    different scheduling protocols on the master
    dynamic load balancing of tasks
    connecting to the Grid
    a web interface to the client: creating jobs more easily (with input data), starting/stopping jobs, monitoring the execution of jobs, easy access to the output of executions; this should decrease the learning effort for using the system

  • the work is a cooperation between

    Norwegian Univ. of Science and Technology Division of Intelligent Systems (DIS) http://dis.idi.ntnu.no

    SINTEF http://www.sintef.no

  • thank you
    questions? http://qadpz.sourceforge.net