TD MXC Parallel Programming Tatkar


Transcript of TD MXC Parallel Programming Tatkar

  • How to Develop Solaris Parallel Applications

    Vijay Tatkar, Sr. Engineering Manager

    Sun Studio Developer Tools
    http://blogs.sun.com/tatkar

  • The GHz Chip Clock Race is Over...

    Classic CPU efficiencies: clock speed, execution optimization, cache

    Design impediments: heat, power, memory slower than chips

    Where is my 10GHz chip?

  • Putting transistors to work in a new way ...

    The Multicore Revolution

    UltraSPARC T2: 1.4GHz * 8 cores (64 threads in a chip)

    Intel: Penryn, 4 cores * 3.1GHz; AMD: Barcelona, 4 cores * 2.3GHz
    (4 threads in a chip)

    Every new system now has a multi-core chip in it

  • Things to know about Parallelism

    Parallel processing is not just for massively parallel supercomputers
    (HPC: High-Priced Computing) anymore.

    CPU clock speed doubled every 18 months, whereas memory speed doubled
    every 6 years! Heat, memory and power lead to multi-core CPUs.

    The free ride is over for serial programs relying on the hardware to
    boost performance.

    Parallel programming is the BEST BET for speedups
    > Parallelism is all about performance, first and foremost
    > Program correctness is often harder for parallel programs

    Parallelism is often considered hard, but there are several models to
    choose from, and compiler support for each model to ease the choice.

  • Programming Model

    Shared Memory Model
    > OpenMP (de-facto standard)
    > Java, native multi-threaded programming

    Distributed Memory Model
    > Message Passing Interface MPI (de-facto standard)
    > Parallel Virtual Machine PVM (less popular)

    Global Address Space
    > Unified Parallel C UPC (research technology)

    Grid Computing
    > Sun Grid Computing (www.network.com)
    > Sun Grid Engine (www.sun.com/software/gridware)


  • Automatic Parallelization and Vectorization

    [Diagram: parallelization technologies arranged easiest to hardest --
    instruction-level parallelism, automatic parallelization and
    vectorization, tuned MT libraries, OpenMP, MT, MPI -- sitting between
    the application, the Solaris primitives (Solaris Event Ports, POSIX
    Threads, Solaris Threads, atomic operations, libumem), the Sun Studio
    Developer Tools, and the chips (UltraSPARC T1/T2, SPARC64 VI,
    UltraSPARC IV+, Intel/AMD x86/x64)]

  • Instruction-level Parallelism

    Chips have figured out how to dispatch multiple instructions in
    parallel; compilers have figured out how to schedule for such
    processors

    Chips + compilers are very mature in this regard, so no programmer
    action is required and the gain is automatic, wherever possible

    It IS possible to chew gum and walk at the same time!

  • Automatic Parallelization

    Supported for Fortran, C and C++ applications; first introduced for
    the 4-20 way SPARCserver 600 MP in 1991

    Useful for loop-oriented programs
    > Every (nested) loop is analyzed for data dependencies and
      parallelized if it is safe to do so (illustrated below)
    > Non-loop code fragments are not analyzed
    > Loops are versioned with serial and parallel code (selected at
      runtime)

    Combines with powerful loop optimizations
    > There can be subtle interactions between loop transformations and
      parallelization
    > Compilers have limited knowledge about the application

    Overall gains can be impressive
    > The entire SPECfp 2006 suite gains 16% with PARALLEL=2
    > Individual gains can be up to 2x for suitable programs; libquantum
      from SPEC CPU2006 speeds up 6-7x on 8 cores!
    > Not every program will see a gain
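
    For illustration (a hypothetical sketch, not from the slides), the
    first loop below has no cross-iteration dependence and is the kind
    of candidate the dependence analysis accepts, while the second
    carries a dependence and stays serial:

    /* iterations are independent: safe to run in parallel */
    void scale(double *restrict a, const double *restrict b, int n)
    {
        int i;
        for (i = 0; i < n; i++)
            a[i] = 2.0 * b[i];
    }

    /* a[i] needs a[i-1]: loop-carried dependence, not parallelized */
    void prefix(double *a, int n)
    {
        int i;
        for (i = 1; i < n; i++)
            a[i] = a[i - 1] + 1.0;
    }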

  • Automatic Parallelization Options

    -xautopar
    > Automatic parallelization (Fortran, C and C++ compilers); requires
      -xO3 or higher (-xautopar implies -xdepend)

    -xreduction
    > Parallelize reduction operations (sketch below); recommended to
      use -fsimple=2 as well

    -xloopinfo
    > Show parallelization messages on screen

    Only apply to the most time-consuming parts of the program
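
    As a concrete illustration of what -xreduction targets (a
    hypothetical sketch; the file name and command line are assumed,
    not from the slides), every iteration below updates the same
    accumulator, which -xautopar alone must treat as a dependence:

    /* compile (assumed): cc -fast -xautopar -xreduction -xloopinfo \
     *                       -fsimple=2 sum.c -o sum                 */
    #include <stdio.h>

    #define N 1000000
    static double v[N];

    int main(void)
    {
        int i;
        double sum = 0.0;

        for (i = 0; i < N; i++)
            v[i] = (double)i;
        for (i = 0; i < N; i++)
            sum += v[i];    /* reduction: parallelizable with -xreduction */

        printf("sum = %f\n", sum);
        return 0;
    }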

  • AutoPar: SPECfp 2006 improvements

    [Bar chart: per-benchmark gains (0-27.5% scale) for bwaves, gamess,
    milc, zeusmp, gromacs, cactusADM, leslie3d, namd, dealII, soplex,
    povray, calculix, gemsFDTD, tonto, lbm, wrf and sphinx3, comparing
    base flags against base flags + autopar]

    Woodcrest box: 3.0GHz dual-core, PARALLEL=2

    Overall gain: 16%

  • Automatic Vectorization

    Supported for Fortran, C and C++ applications

    -xvector=simd exploits special SSE2+ instructions

    Works on data in adjacent memory locations

    Gains are smaller than with -xautopar
    > SPECfp 2006 gains are 3% overall and up to 14% individually

    Best suited for loop-level SIMD parallelism, e.g.:

    for (i = 0; i < n; i++)       /* representative unit-stride loop */
        a[i] = b[i] + c[i];

  • Case Study: Vectorizing STREAM
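
    STREAM is the public memory-bandwidth benchmark; its "triad" kernel
    is the canonical target for this case study. A minimal sketch (the
    array size and names are illustrative, not the benchmark's exact
    source):

    #include <stdio.h>

    #define STREAM_N (2 * 1024 * 1024)
    static double a[STREAM_N], b[STREAM_N], c[STREAM_N];

    int main(void)
    {
        int i;
        const double scalar = 3.0;

        for (i = 0; i < STREAM_N; i++) {   /* initialize operands */
            b[i] = 1.0;
            c[i] = 2.0;
        }
        for (i = 0; i < STREAM_N; i++)     /* triad: unit stride, so a */
            a[i] = b[i] + scalar * c[i];   /* candidate for -xvector=simd */

        printf("a[0] = %f\n", a[0]);
        return 0;
    }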

  • Tuned MT Libraries: Sun Perf Lib

  • Compiler Support : OpenMP

    [Diagram: the same easiest-to-hardest landscape of models, Solaris
    primitives, chips and Sun Studio Developer Tools, now highlighting
    OpenMP among AutoPar, MT and MPI]

  • What is OpenMP?

    De-facto industry-standard API for writing shared-memory parallel
    applications in C, C++ and Fortran. See: http://www.openmp.org

    Consists of
    > Compiler directives (pragmas)
    > Runtime routines (libmtsk)
    > Environment variables

    Advantages:
    > Incremental parallelization of source code
    > Small(er) amount of programming effort
    > Good performance and scalability
    > Portable across a variety of vendor compilers

    Sun Studio has consistently led OpenMP
    > Support for the latest version (2.5 now, v3.0 API underway)
    > Consistent world-record SPEC OMP submissions for several years now
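
    A minimal sketch tying the three pieces together (the program is
    illustrative; under Sun Studio, compile with -xopenmp and set the
    environment variable, e.g. OMP_NUM_THREADS=4):

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        #pragma omp parallel    /* directive (pragma) */
        {
            /* runtime routines supplied by the OpenMP library */
            printf("hello from thread %d of %d\n",
                   omp_get_thread_num(), omp_get_num_threads());
        }
        return 0;
    }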

  • OpenMP - Directives with Intelligence

  • A Loop Parallelized With OpenMP

    C/C++:

    #pragma omp parallel default(none) \
            shared(n, x, y) private(i)
    {
        #pragma omp for
        for (i = 0; i < n; i++)
            x[i] += y[i];
    }  /* -- End of parallel region -- */

    Fortran:

    !$omp parallel default(none) &
    !$omp shared(n, x, y) private(i)
    !$omp do
        do i = 1, n
            x(i) = x(i) + y(i)
        end do
    !$omp end do
    !$omp end parallel

    (default, shared and private are clauses on the directives)


  • An OpenMP Example

    Find the primes up to 3,000,000 (216,816 of them). Run on a Sun Fire
    6800, Solaris 9, 24 processors, 1.2GHz US-III+, with 9.8GB main
    memory

    Model    # Threads   Time (secs)   Change
    ------   ---------   -----------   ---------------
    Serial      N/A         6.636      base
    OpenMP       1          7.210      8.65% drop
                 2          3.771      1.76x faster
                 4          1.988      3.34x faster
                 8          1.090      6.09x faster
                16          0.638      10.40x faster
                20          0.550      12.06x faster
                24          0.931      saturation drop

  • Compiler Support : Programming Threads

    [Diagram: the same easiest-to-hardest landscape, now highlighting MT
    among AutoPar, OpenMP and MPI]

  • Programming Threads

    Use the POSIX APIs pthread_create, pthread_join, pthread_exit,
    et al.
    > Recommendation: consider reducing the thread stack size
      (default is 1MB); a sketch follows below
    > See pthread_attr_init(3C) for this and other attributes which
      can be adjusted

    Do not use the native Solaris threading API (e.g., thr_create).
    > Though applications which use it are still supported, it is
      non-portable.
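
    A minimal sketch of the calls above, including the stack-size
    recommendation (the worker and the 256KB figure are illustrative
    choices, not from the slides):

    #include <pthread.h>
    #include <stdio.h>

    static void *worker(void *arg)
    {
        printf("worker %ld running\n", (long)arg);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;
        pthread_attr_t attr;

        pthread_attr_init(&attr);                     /* pthread_attr_init(3C) */
        pthread_attr_setstacksize(&attr, 256 * 1024); /* below the 1MB default */

        pthread_create(&tid, &attr, worker, (void *)1L);
        pthread_join(tid, NULL);

        pthread_attr_destroy(&attr);
        return 0;
    }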

  • Data Synchronization

    Concurrent access to shared data requires synchronization
    > Mutexes (pthread_mutex_lock/pthread_mutex_unlock); example below
    > Condition variables (pthread_cond_wait)
    > Reader/writer locks
      (pthread_rwlock_rdlock/pthread_rwlock_wrlock)
    > Spin locks (pthread_spin_lock)

    Objects can be local to a process or shared between processes via
    shared memory.
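
    A small sketch of the first primitive on the list: a mutex
    serializing updates to a shared counter (the counter program is
    illustrative):

    #include <pthread.h>
    #include <stdio.h>

    static int counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *bump(void *arg)
    {
        int i;
        for (i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);
            counter++;                 /* read-modify-write, now serialized */
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;

        pthread_create(&t1, NULL, bump, NULL);
        pthread_create(&t2, NULL, bump, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %d\n", counter);   /* 200000, every run */
        return 0;
    }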

  • MT Demo: Multithreading Primes

  • int is_prime(int v)
    {
        int i;
        int bound = floor(sqrt((double)v)) + 1;

        for (i = 2; i < bound; i++) {
            /* no need to check against known composites */
            if (!pflag[i])
                continue;
            if (v % i == 0) {
                pflag[v] = 0;
                return 0;
            }
        }
        return (v > 1);
    }

  • void *work(void *arg)
    {
        int start;
        int end;
        int i;
        int val = *((int *) arg);

        start = (N / THREADS) * val;
        end = start + N / THREADS;
        for (i = start; i < end; i++) {
            if (is_prime(i)) {
                primes[total] = i;    /* unsynchronized updates of */
                total++;              /* shared primes[] and total */
            }
        }
        return NULL;
    }

  • int main(int argc, char **argv)
    {
        int i;

        for (i = 0; i < N; i++) {
            pflag[i] = 1;
        }
        for (i = 0; i < (THREADS - 1); i++) {
            /* note: &i hands every thread the same shared counter */
            pthread_create(&tids[i], NULL, work, (void *) &i);
        }
        i = THREADS - 1;
        work((void *) &i);            /* main thread does the last chunk */
        for (i = 0; i < (THREADS - 1); i++) {
            pthread_join(tids[i], NULL);
        }
    }

  • STOP! Problem Ahead

    RDT Demo, please

  • Data Race Condition

    A data race condition occurs when
    > multiple threads access a shared memory location
    > without a synchronized access order
    > and at least one access is a write

    Data race problems often occur in shared-memory parallel programming
    models such as Pthreads and OpenMP.
    > The effect of a data race is unpredictable and may show up only
      once in hundreds of runs.

  • Thread Analyzer

    Detects data races and deadlocks in a multithreaded application;
    points to non-deterministic or incorrect execution
    > These bugs are notoriously difficult to detect by examination
    > Points out actual and potential deadlock situations

    Process:
    > Instrument the code with -xinstrument=datarace
    > Detect the runtime condition with collect -r all (or race,
      deadlock)
    > Use the graphical analyzer, tha, to identify conflicts and
      critical regions

    Works with OpenMP, Pthreads and Solaris threads
    > API provided for user-defined synchronization primitives

    Works on Solaris (SPARC, x86/x64) and Linux

    Static lock_lint tool detects inconsistent use of locks

  • A True SPEC Story

    SPEC OMP benchmark fma3d
    > 101 source files; 61,000 lines of Fortran code
    > A data race in platq.f90 caused sporadic core dumps
    > It took several engineers and 6 weeks of work to find the data
      race manually

    Perils of having a data race condition:
    > The program exhibits non-deterministic behavior
    > Failure may be hard to reproduce
    > The program may continue to execute, leading to failure in
      unrelated code
    > A data race is hard to detect using conventional debugging
      methods and tools

  • How did Thread Analyzer help?

    SPEC OMP benchmark fma3d
    > 101 source files; 61,000 lines of Fortran code
    > A data race in platq.f90 caused sporadic core dumps
    > It took several engineers and 6 weeks of work to find the data
      race manually

    With the Sun Studio Thread Analyzer, the data race was detected in
    just a few hours!

  • Compiler Support : Message Passing Interface

    [Diagram: the same easiest-to-hardest landscape, now highlighting
    MPI among AutoPar, OpenMP and MT]

  • Message Passing Interface (MPI)

    The MPI programming model is a de-facto standard for distributed
    memory parallel programming
    > The MPI API set is quite large (323 subroutines)
    > An MPI application can be programmed with fewer than 10 different
      calls
    > Implemented with a very small set of low-level device interconnect
      routines

    Open MPI: http://www.open-mpi.org/

    MPI home page at Argonne National Laboratory:
    http://www-unix.mcs.anl.gov/mpi/
  • Message Passing Interface (MPI)

    Open MPI: MPI 2.0 conformance

    ClusterTools 7.0 with Sun Studio
    > Multiple processes run under the Open Runtime Environment
    > Pass data messages between processes in point-to-point/block
      communication modes
    > No race conditions with the right use of MPI message-passing calls
    > MPI profiling under Performance Analyzer

  • Launching MPI applications

    For Single Program Multiple Data (SPMD)
    > mpirun -np x program1

    For Multiple Program Multiple Data (MPMD)
    > mpirun -np x program1 : -np y program2

    Launching on different nodes (hosts)
    > mpirun -np x -host <hostname> program1

    And more ... a very flexible way of launching
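
    A minimal SPMD program to go with the launch lines above (a hedged
    sketch; the program itself is illustrative), run with e.g.
    mpirun -np 4 hello:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's rank */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* processes in the job */

        printf("hello from rank %d of %d\n", rank, size);

        MPI_Finalize();
        return 0;
    }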

  • Comparing OpenMP and MPI

    OpenMP                         MPI
    ----------------------------   -----------------------------
    De-facto industry standard     De-facto industry standard
    Limited to one (SMP) system    Runs on any number of systems
    Not (yet?) GRID-ready          GRID-ready
    Easier to get started          High and steep learning curve
    Assistance from compilers      You're on your own
    Mix-and-match model            All-or-nothing model
    Requires data scoping          No data scoping required
    Increasingly popular (CMT?)    More widely used (but...)
    Preserves sequential code      No sequential version
    Needs a compiler               No compiler; just a library
    No special environment         Requires runtime environment
    Performance issues implicit    Easy to control performance

  • Thank you!

    Vijay Tatkar, Sr. Engineering Manager

    Sun Studio Developer Tools
    http://blogs.sun.com/tatkar

  • Case Study: AutoPar Matrix Multiply

  • AutoPar Example Program

    // Matrix Multiplication
    32  #define MAX 1024
    33  void matrix_mul(float (*x_mat)[MAX],
    34                  float (*y_mat)[MAX], float (*z_mat)[MAX]) {
    35
    36      for (int j = 0; j < MAX; j++) {
    37          for (int k = 0; k < MAX; k++) {
    38              z_mat[j][k] = 0.0;
    39              for (int t = 0; t < MAX; t++) {
    40                  z_mat[j][k] += x_mat[j][t] * y_mat[t][k];
    41              }
    42          }
    43      }
    44  }

  • AutoPar Example Compilation

    CC -c mat_mul.cc -g -fast -xrestrict -xautopar -xloopinfo -o mat_mul.o

    "mat_mul.cc", line 36: PARALLELIZED
    "mat_mul.cc", line 37: not parallelized, not profitable
    "mat_mul.cc", line 39: not parallelized, unsafe dependence

    The er_src command can be run on the executable binary to see the
    compiler's internal messages

  • % CC mat_mul.cc -g -fast -xrestrict -xinline=no -o noautopar
    % CC mat_mul.cc -g -fast -xrestrict -xloopinfo -xautopar -xinline=no -o autopar

    % ptime noautopar
    Finish multiplication of matrix of 1024
    real  1.536
    user  1.521
    sys   0.018

    % ptime autopar
    Finish multiplication of matrix of 1024
    real  1.542
    user  1.520
    sys   0.016

    % setenv PARALLEL 2
    % ptime autopar
    Finish multiplication of matrix of 1024
    real  0.817
    user  1.572
    sys   0.016

  • OpenMP Demo: Parallelizing Primes

  • Parallelizing Primes Example (OpenMP)

    Partition the problem space into smaller chunks and dispatch
    processing of each partition into individual (micro)tasks
    > A popular and practical example to illustrate how parallel
      software deals with large data
    > The basic design concept of this program example can be applied
      to many other parallel processing tasks
    > The overall program structure is very simple:
      > a thread worker routine
      > a main program creating multiple worker threads/microtasks

  • int main_omp(int argc, char **argv)
    {
    #ifdef _OPENMP
        omp_set_num_threads(NTHRS);
        omp_set_dynamic(0);
    #endif
        for (i = 0; i < N; i++) {
            pflag[i] = 1;
        }
    #pragma omp parallel for
        for (i = 2; i < N; i++) {
            if (is_prime(i)) {
                primes[total] = i;    /* same unsynchronized updates */
                total++;              /* as the Pthreads version     */
            }
        }
        printf("Number of prime numbers between 2 and %d: %d \n",
               N, total);
    }
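
    The parallel loop above has the same data race as the Pthreads
    version: every thread updates primes[] and total at once. One
    possible repair, sketched here with an OpenMP critical section (an
    atomic capture would also work; this is not necessarily the demo's
    own fix):

    #pragma omp parallel for
    for (i = 2; i < N; i++) {
        if (is_prime(i)) {
            #pragma omp critical     /* serialize the two updates */
            {
                primes[total] = i;
                total++;
            }
        }
    }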

  • int is_prime(int v)
    {
        int i, bound = floor(sqrt((double)v)) + 1;

        for (i = 2; i < bound; i++) {
            /* no need to check against known composites */
            if (!pflag[i])
                continue;
            if (v % i == 0) {
                pflag[v] = 0;
                return 0;
            }
        }
        return (v > 1);
    }

  • General Race Condition

    A general race condition is caused by an undetermined sequence of
    executions that violates the program's state integrity
    > A data race condition is a simple form of general race condition
    > A general race problem can occur in both shared-memory and
      distributed-memory parallel programming (illustrated below)
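
    A hedged sketch of a general race that involves no unsynchronized
    memory access (the program is illustrative; run with at least 3
    ranks): receiving from MPI_ANY_SOURCE leaves the arrival order
    undetermined, so the printed sequence can change from run to run.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, val;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 1 || rank == 2) {
            val = rank;
            MPI_Send(&val, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        } else if (rank == 0) {
            int i;
            for (i = 0; i < 2; i++) {   /* order of arrival may differ */
                MPI_Recv(&val, 1, MPI_INT, MPI_ANY_SOURCE, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                printf("received %d\n", val);
            }
        }
        MPI_Finalize();
        return 0;
    }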


  • Design Practice to Avoid Races

    Adopt a higher design abstraction such as OpenMP

    Use pass-by-value instead of pass-by-pointer to communicate between
    the threads (see the sketch after this list)

    Design the data structures to limit global variable usage and
    restrict access to shared memory

    Analyze a race problem to decide whether it is a harmful program
    bug or a benign race

    Understand and fix the real cause of a race condition instead of
    fixing its symptom
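
    A sketch of the pass-by-value advice (hypothetical; the intptr_t
    scheme is one common idiom): each thread gets its id as a value,
    instead of a pointer to the shared loop counter that the earlier
    primes driver passes with &i.

    #include <pthread.h>
    #include <stdint.h>
    #include <stdio.h>

    #define THREADS 4

    static void *work(void *arg)
    {
        int id = (int)(intptr_t)arg;   /* private copy, no shared access */
        printf("thread %d owns its id\n", id);
        return NULL;
    }

    int main(void)
    {
        pthread_t tids[THREADS];
        int i;

        for (i = 0; i < THREADS; i++)   /* pass the value, not &i */
            pthread_create(&tids[i], NULL, work, (void *)(intptr_t)i);
        for (i = 0; i < THREADS; i++)
            pthread_join(tids[i], NULL);
        return 0;
    }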

  • MPI: Single Program Multiple Data

    The processes launched are in the same communicator
    > mpirun -np 8 msorts
    > The 8 processes launched belong to the MPI_COMM_WORLD communicator
    > 8 ranks: 0, 1, 2, 3, 4, 5, 6, 7
    > Total size: 8

    All 8 processes run the same program; control flow differs by
    checking the rank:

    MPI_Init(...);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0) {
        ...
    } else if (rank == 1) {
        ...
    } else if (rank == 2) {
        ...
    }
    MPI_Finalize();

  • MPI Example: 7 Sorting Processes

    A driver plus seven sorters:
    > Shakersort
    > Heapsort
    > Straight Insertion Sort
    > Bubblesort
    > Straight Selection Sort
    > Quicksort
    > Binary Insertion Sort

    All together, 8 processes

  • MPI Demo: 7 Sorting Processes

  • MPI: Non-Uniform Memory Performance

    [Plot: performance vs. working-set size across the memory hierarchy
    (registers, 64KB L1, 8MB L2, main memory, virtual memory), with the
    tuning area marked]

    The length of a plateau is related to the size of that memory
    component

    The amount of the drop is related to the latency (or bandwidth) of
    that memory component

    MPI can help reduce the per-process program size to fit into the
    good regions

  • Sun Studio and HPC

    Sun HPC: http://www.sun.com/servers/HPC/index.jsp

    Sun HPC ClusterTools 7 software:
    http://www.sun.com/software/products/clustertools

    N1 Grid Engine manager software

    Other MPI libraries
    > Open-source MPICH library for Solaris SPARC:
      http://www-unix.mcs.anl.gov/mpi/mpich
    > LAM/MPI ported library for Solaris x86/x64:
      http://apstc.sun.com.sg/popup.php?l1=research&l2=projects&l3=s10port&f=applications#LAM/MPI
    > MVAPICH MPI over InfiniBand for Solaris x86/x64:
      http://nowlab.cse.ohio-state.edu/projects/mpi-iba
  • Parallel Computing Environment

    [Diagram: the parallel computing stack, from tightly coupled to
    loosely coupled -- multi-thread (OpenMP, UPC/GAS), multi-process
    (MPI), web service (SOA) and grid; MPI, OpenMP, MT and serial
    applications run on a local cluster grid (N1 Grid) and, at the
    global and enterprise level, on Grid & SOA]