Using Distributed Computing in Computational Fluid Dynamics Zoran Constantinescu, Jens Holmen, Pavel Petrovic 13-15 May 2003, Moscow Norwegian University of Science and Technology


  • Using Distributed Computing in Computational Fluid Dynamics
    Zoran Constantinescu, Jens Holmen, Pavel Petrovic
    13-15 May 2003, Moscow
    Norwegian University of Science and Technology

  • outline
    need more processing power: simulations, data processing, complex algorithms
    supercomputers, clusters of PCs, distributed computing
    existing systems
    QADPZ project: description, advantages, applications, current status
    Navier-Stokes solver; CFD application: existing solver using Diffpack and mpich, small MPI replacement, results and comparison
    future work

  • problems
    larger and larger amounts of data are generated every day (simulations, measurements, etc.)
    the software applications that handle this data require more and more CPU power: simulation, visualization, data processing
    complex algorithms, e.g. evolutionary algorithms: large populations, many evaluations, very time-consuming
    hence the need for parallel processing

  • solution 1: use parallel supercomputers
    access to tens/hundreds of CPUs, e.g. NOTUR/NTNU = embla (512) + gridur (384)
    high-speed interconnect, shared memory
    usually for batch-mode processing
    very expensive (price, maintenance, upgrades)
    these CPUs are not so powerful anymore, e.g. 500 MHz RISC vs. today's 2.5 GHz Pentium 4

  • solution 2: use clusters of PCs (Beowulf)
    a network of personal computers, usually running the Linux operating system
    powerful CPUs (Pentium 3/4, Athlon)
    high-speed networking (100 Mbps, 1 Gbps, Myrinet)
    much cheaper than supercomputers, but still quite expensive (installation, upgrades, maintenance)
    trade higher availability and/or greater performance for lower cost

  • solution 3: use distributed computing
    use existing networks of workstations (PCs from labs and offices connected by a LAN)
    usually running Windows or Linux (also MacOS, Solaris, IRIX, BSD, etc.)
    powerful CPUs (Pentium 3/4, AMD Athlon), high-speed networking (100 Mbps)
    the computers are already installed, so they are very cheap, and maintenance is already in place
    easy to get a network of tens/hundreds of computers

  • distributed computing
    specialized applications run on each individual computer in the LAN
    they talk to one or more central servers: download a task, solve it, and send back the results
    better suited (easier) for task-parallel applications, where the application can be decomposed into independent tasks
    can also be used for data-parallel applications
    the number of available CPUs is more dynamic

  • existing systems: seti@home
    search for extraterrestrial intelligence: analysis of data from radio telescopes
    the client application is very specialized
    uses the Internet to connect clients to the server and to download/upload a task/results
    no framework for other types of applications, no source code available

  • existing systems: distributed.net
    one of the largest "computers" in the world (~20 TFlops)
    used for solving computational challenges: RC5 cryptography, optimal Golomb rulers
    the client application is very specialized
    uses the Internet to connect clients to the server
    no framework for other applications, no source code available

  • existing systems: Condor project (Univ. of Wisconsin)
    more research-oriented computational projects
    more advanced features, user applications
    very difficult to install, problems with some OSes (started from a Unix environment)
    restrictive license (closed system)
    other, commercial projects: Entropia, Parabon; expensive for research, only certain OSes

  • a new system: QADPZ project (NTNU)
    initial application domains: large-scale visualization, genetic algorithms, neural networks
    prototype in early 2001, but abandoned (too visualization-oriented)
    started in July 2001, first release v0.1 in August 2001
    we are now close to a new release, v0.8 (May 2003)
    the system is independent of any specific application domain
    open source project on SourceForge.net

  • http://qadpz.sourceforge.net

  • QADPZ description
    QADPZ project (NTNU), similar in many ways to Condor (submit computing tasks to idle computers running in a network)
    easy to install, use, and maintain
    modular and extensible
    open source project, implemented in C++
    support for many OSes (Linux, Windows, Unix, ...)
    support for multiple users, encryption
    logging and statistics

  • QADPZ architecture
    client: user interface, submits tasks/data
    master: management of slaves, scheduling tasks, monitoring tasks, controlling tasks
    slave: background process, reports status, downloads tasks/data, executes tasks

  • communication
    repository: an external server (web, ftp) or the internal lightweight web server
    separate control flow and data flow

  • the client
    is the interface to the system
    describes the job (project file), prepares the task code and the input files, submits the tasks
    communicates with the master
    two modes: interactive (stay connected to the master and wait for the results) and batch (detach from the master and get the results later; the master keeps all the messages)

  • jobs, tasks, subtasks
    a job consists of groups of tasks executed sequentially or in parallel
    a task can consist of subtasks (the same executable is run, but with different input data)
    each task is submitted individually; the user specifies the OS and the minimum resource requirements (disk, memory)
    the master allocates the most suitable slave for executing a task and notifies the client
    when a task is finished, the results are stored as specified in the project and the client is notified

  • the tasks
    normal binary executable code: no modifications required, but must be compiled for each of the platforms used
    normal interpreted programs (shell script, Perl, Python) or Java programs: require an interpreter/VM on each slave
    dynamically loaded slave library (our API): better performance, more flexibility

  • the master
    keeps account of all existing slaves (status, specifications)
    usually one master is enough; more can be used if there are too many slaves (the communication protocol allows one master to act as another client, but this is not fully implemented yet)
    keeps account of all submitted jobs/tasks
    keeps a list of accepted users (based on username/password)
    gathers statistics about slaves and tasks
    can optionally run the internal web server for the data and code repository

  • the slave

  • OO design

  • communication
    layered communication protocol
    exchanged messages are in XML (with compression and encryption)
    uses UDP with a reliable layer on top
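The slides only state that messages are XML; the concrete schema is not shown. A purely illustrative message, with all element and attribute names invented for this sketch (not the real QADPZ protocol):

```xml
<message type="task-submit">
  <user name="alice"/>                    <!-- checked against users.txt -->
  <task id="42" os="Linux">
    <require disk="100MB" mem="64MB"/>    <!-- minimum resource requirements -->
    <executable url="http://repository/task42"/>
  </task>
</message>
```

In the layered design described above, a message like this would be compressed and encrypted before being handed to the reliable-UDP layer.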

  • configuration
    slave: qadpz_slave (daemon) + slave.cfg
    master: qadpz_master (daemon) + master.cfg + qadpz_admin + users.txt + privkey
    client: qadpz_run + client.cfg + pubkey
    supported platforms: Linux, Win32 (9x, 2K, XP), SunOS, IRIX, FreeBSD, Darwin/MacOS X

  • installation
    one of our computer labs (Rose lab): ~80 PCs, Pentium 3, 733 MHz, 128 MBytes; dual boot: Win2000 and FreeBSD; running for several months
    when a student logs in to a computer, the slave running on that computer is set into disabled mode (no new computing tasks are accepted; any current task is killed and/or restarted on another computer)
    our Beowulf cluster (ClustIS): 40 nodes, AMD Athlon, 1466 MHz, 512/1024 MBytes, 100 Mbps network, Linux

  • status

  • CFD application
    existing solver for the incompressible Navier-Stokes equations:
    using the projection method, the equations for the velocity and the pressure are solved sequentially at each time step
    a Galerkin finite element method is used to discretize in space
    the pressure Poisson equation is preconditioned with a mixed additive/multiplicative domain decomposition method: one iteration of overlapping Schwarz combined with a coarse grid correction
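The sequential velocity/pressure splitting can be written out explicitly. One standard (non-incremental) form of the projection method is sketched below; the slide does not fix the exact variant used in the solver:

```latex
% 1. tentative velocity u* from the momentum equation, pressure omitted
\frac{\mathbf{u}^{*}-\mathbf{u}^{n}}{\Delta t}
  + (\mathbf{u}^{n}\!\cdot\!\nabla)\,\mathbf{u}^{n}
  = \nu\,\nabla^{2}\mathbf{u}^{*} + \mathbf{f}^{n+1}

% 2. pressure Poisson equation, enforcing \nabla\cdot\mathbf{u}^{n+1}=0
\nabla^{2} p^{n+1} = \frac{1}{\Delta t}\,\nabla\!\cdot\!\mathbf{u}^{*}

% 3. velocity correction: projection onto divergence-free fields
\mathbf{u}^{n+1} = \mathbf{u}^{*} - \Delta t\,\nabla p^{n+1}
```

Step 2 is the pressure Poisson equation mentioned above; it is the expensive, globally coupled solve, which is why it gets the overlapping Schwarz preconditioner with coarse grid correction.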

  • NS solver
    developed in C++ using the Diffpack object-oriented numerical library for differential equations
    used the parallel version, based on standard MPI (Message Passing Interface) for communication; we used mpich

    tested on our 40-node PC cluster

  • MPI implementation
    needed to replace the MPI calls used by Diffpack with a subset of the MPI standard:
    MPI_Init, MPI_Finalize
    MPI_Comm_size, MPI_Comm_rank
    MPI_Send, MPI_Recv
    MPI_Isend, MPI_Irecv
    MPI_Bcast, MPI_Wait
    MPI_Reduce, MPI_Allreduce
    MPI_Barrier
    MPI_Wtime

  • test case
    oscillating flow around a fixed cylinder
    3D domain with 8-node isoparametric elements

    coarse grid: 2 184 nodes, 1 728 elements
    fine grid: 81 600 nodes, 76 800 elements

    10 time steps; running time ~30 minutes with 2 CPUs (AMD Athlon, 1466 MHz)

  • coarse grid

  • running time
    NS equation, time [sec] vs. number of CPUs, MPICH vs. QADPZ (bzip2 and lzo message compression):

    CPUs   MPICH   QADPZ bzip2   QADPZ lzo   bzip2 overhead [%]   lzo overhead [%]
      2     1600      1713          1616            6.5                 0.9
      3     1012      1231          1050           17.7                 3.6
      4      752       933           780           19.3                 3.5
      5      555       716           570           22.4                 2.6
      6      458       616           480           25.6                 4.5
      7      430       633           450           32.0                 4.4
      8      344       515           364           33.2                 5.4
      9      329       537           352           38.7                 6.5
     10      315       545           345           42.2                 8.6
     11      260       514           286           49.4                 9.0
     12      245       490           272           50.0                 9.9
     13      213       493           236           56.7                 9.7
     14      203       456           226           55.4                10.1
     15      208       489           225           57.4                 7.5
     16      191       492           225           61.1                15.1

  • monitor

  • future work
    local caching of binaries on the slaves
    different scheduling protocols on the master
    dynamic load balancing of tasks
    connecting to the Grid
    a web interface to the client: creating jobs more easily (with input data), starting/stopping jobs, monitoring the execution of jobs, easy access to the output of executions; this should decrease the learning effort for using the system

  • the work is a cooperation between

    Norwegian Univ. of Science and Technology Division of Intelligent Systems (DIS) http://dis.idi.ntnu.no

    SINTEF http://www.sintef.no

  • thank you
    questions? http://qadpz.sourceforge.net