Post on 19-Jan-2016
Tools and Utilities for parallel and serial codes in ENEA-GRID environment
CRESCO Project: Salvatore Raia SubProject I.2
C.R. ENEA-Portici. 11/12/2007
OUTLINE:
- GRID, cluster and parallel Computing (Intro)
- ENEA-GRID. Architecture and functionality
- My Activity for CRESCO project and results on ENEA-GRID
- Conclusion and objectives
What is a cluster?
- Collection of resources (HW, SW) connected via public or private network
- Each CPU runs a separate instance of the operating system
- Administration: local
Supercomputer = computer with many processors connected via a high-speed computer bus and sharing the memory (SMP). It runs one operating system.
[Figure: cluster 1; supercomputer]
How to get a Grid?
- Collection of interconnected, geographically distributed clusters
- Administration: sometimes clusters belong to different departments or companies
[Figure: cluster 1, cluster 2, cluster 3, ... cluster N]
ENEA-GRID structure (HW)
GRID = nodes made of clusters; each node may have shared or distributed memory architectures (hybrid) that share processes.
ENEA-GRID has the same structure, with 6 clusters: Bologna, Casaccia, Frascati, Portici, Trisaia, Brindisi.
[Figure: GRID 1]
GRID features
Pro:
- Shared resources
- Low costs (clock ?)
- Open systems
- Scalability
Con:
- Several platforms
- Load balancing
- User access
How is it managed on ENEA-GRID?
Frequency scaling (domain ?)
Power consumption P = C × V × V × F
ENEA-GRID structure (SW)
ICA client
Resources management
File System
Operating Systems
User Interface
USER ACCESS:
- ICA client
- ssh or telnet
- web
Switch host
Run Appl.
Jobs status
Problem with:
- Multi platforms
- Load balancing
- User Access
How to cope with them?
My activity on ENEA-GRID (CRESCO pr.)
- Serial and Parallel (MPI) codes
- User interfaces
- LSF utilities
- Software dev.
Tools for Serial and Parallel (MPI) codes
Serial codes
- Compilers: GNU, PGI, IBM
Parallel codes (MPI)
- MPI Implementations: MPICH, LAM-MPI, POE
Multi Platform
Problems with execution too
…tools
…So we need lots of binaries for each platform.
Launcher: after compiling our source code on each platform, we have “binary1”…“binaryN” for host1,…hostN.
The launcher is a shell script (placed on AFS) that selects the right “binary” for the selected host.
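The launcher idea above can be illustrated with a minimal sketch. It is not the actual CRESCO launcher: the `BIN_ROOT` path, the `<name>.<os>-<arch>` file naming scheme, and the function names `platform_tag`, `binary_for` and `launch` are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical launcher sketch: pick the binary built for the current host.
# BIN_ROOT and the "<name>.<os>-<arch>" naming scheme are assumptions.

BIN_ROOT="${BIN_ROOT:-/afs/example.cell/user/bin}"   # placeholder AFS path

# Print a platform tag such as "linux-x86_64" or "aix-powerpc".
platform_tag() {
    os=$(uname -s | tr '[:upper:]' '[:lower:]')
    case "$os" in
        aix) arch=$(uname -p) ;;   # on AIX, uname -m is a machine ID, not the architecture
        *)   arch=$(uname -m) ;;
    esac
    printf '%s-%s\n' "$os" "$arch"
}

# Print the full path of the binary for a program name and platform tag.
binary_for() {
    printf '%s/%s.%s\n' "$BIN_ROOT" "$1" "$2"
}

# Run the matching binary, or fail with a clear message if none was built.
launch() {
    prog=$1; shift
    bin=$(binary_for "$prog" "$(platform_tag)")
    if [ ! -x "$bin" ]; then
        echo "launcher: no binary for $(platform_tag): $bin" >&2
        return 1
    fi
    "$bin" "$@"
}
```

With this layout, a user would call something like `launch myprog arg1 arg2` on any host, and the script would run `myprog.linux-x86_64` or `myprog.aix-powerpc` depending on where it executes. Keeping the binaries on a shared file system is what lets one script serve every platform.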
Some MPI problems
SERIAL
- Program for Fortran 77/90, C and C++ serial compiling (see Java Interface)
- Launcher for “NS2” application (uses external libraries)
PARALLEL (MPI)
- Launcher for running a test program (check command)
- Launcher for HPL test on AIX and Linux
Results: tools for serial and parallel (MPI) codes
[Figure: user1 installation, user2 installation]
Analyzing LSF utilities: Serial and Parallel codes
Serial codes
- Resources definition
- “NS2” application
- Serial LSF utilities: Job array (Multicase), “lsgrun”
Parallel codes (MPI)
- Parallel LSF utilities: “mpijob” (MPICH), “poejob” (POE)
LSF Resources
- No correlation
- Correlation
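As an illustration of the job array (Multicase) utility listed above, the sketch below only builds and prints an LSF job-array submission line; it does not submit anything. The job name `ns2case`, the case count, and the wrapper script `./run_case.sh` are hypothetical. The `-J "name[1-N]"` array syntax and the `%I` index placeholder in the output file name are standard `bsub` features.

```shell
#!/bin/sh
# Sketch: build (but do not submit) an LSF job-array command line.
# The job name, case count, and wrapper script name are hypothetical.

# Print the bsub command for an array of N cases; each array element
# can read its own index from LSB_JOBINDEX, which LSF sets for array jobs.
job_array_cmd() {
    name=$1; ncases=$2; script=$3
    printf 'bsub -J "%s[1-%s]" -o %s.%%I.out %s\n' "$name" "$ncases" "$name" "$script"
}

job_array_cmd ns2case 10 ./run_case.sh
```

Inside the wrapper script, each case would typically select its input from `$LSB_JOBINDEX`, so one submission covers all cases of a serial code.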
Results: Integration with other applications
- Serial codes
- Parallel codes (MPI)
- (My) Java Interface
Conclusion and objectives
Launcher + LSF utilities + User interface make it possible to create a homogeneous environment.
Objectives:
- Optimization of programs to launch serial and parallel codes, including checking the resources needed to run the application (e.g. libraries, other programs, etc.)
- Exploitation of LSF utilities to simplify running MPI programs (mpijob, poejob, etc.) and load balancing
- Improve error handling for user interfaces …
Andrew File System
LSF: Load Sharing Facility