Introduction to Parallel Computing
Transcript of Introduction to Parallel Computing
Presentation Outline
• Doing science and engineering using HPC
• Basic concepts of parallel computing
• Discussion of HPC hardware
• Programming approaches (HPC software):
  • Library-based approaches
  • Language-based approaches
• HPC facilities at NIIT
High Performance Computing (HPC)
• The prime focus of HPC is performance: the ability to solve the biggest possible problems in the least possible time
• Also called “parallel computing”: the use of multiple processors, working in parallel, to solve an application
• Such computing is normally used to solve challenging scientific problems by running simulations:
  • For this reason, it is also called “scientific computing” or computational science
• HPC is a highly specialized area:
  • Probably our best chance to work for the world’s top research and commercial organizations: NASA, the European Space Agency (ESA), …
  • Google is known to have immense computational power; the exact quantity remains unknown!
Doing science and engineering using HPC
• HPC is helping to solve some of the most important problems in science today by pushing software and hardware technology to its limits
• Scientific computing (or computational science) is the field of study concerned with:
  • Constructing mathematical models and numerical solution techniques
  • Using computers to analyze and solve scientific and engineering problems
• Application areas:
  • Computer-aided engineering
  • Weather forecast simulations
  • Animated movies (Hollywood!)
  • Image processing
  • Cryptography
  • Hurricane forecasts: path as well as intensity (e.g., Katrina)
HPC driving science?
• The Millennium Simulation:
  • Computational astrophysics
  • Heralded as “the” largest-ever model of the Universe
  • Follows the evolution of ten billion “dark matter” particles
  • The simulation ran on a supercomputer for almost a month
• The Blue Brain Project:
  • Computational neuroscience
  • An effort to simulate the working of a mammalian brain
  • One of the fastest supercomputers in the world is used for the simulations
• Arguably, these projects could not be done without HPC
PAM CRASH: A Case Study from the Automobile Industry
• PAM CRASH is a parallel application for studying structural deformation, employed in simulations of automotive crashes and other situations:
  • An effective alternative to physical crash tests, which are expensive and time-consuming
• Modern simulations take into account millions of elements:
  • Such compute-intensive simulations can only be run on parallel hardware
• Automobile giants including Audi, BMW, Volkswagen, and others conduct crash simulations using PAM CRASH
Serial Computation
• Traditionally, software has been written for serial computation:
  • To be run on a single computer having a single Central Processing Unit (CPU)
  • A problem is broken into a discrete series of instructions, which are executed one after another
Parallel Computation
• Parallel computing is the simultaneous use of multiple compute resources to solve a computational problem:
  • To be run using multiple CPUs
  • A problem is broken into discrete parts that can be solved concurrently
Flynn’s Taxonomy
• There is no single authoritative classification of parallel computers!
• Flynn’s taxonomy is one such classification, based on the number of instruction and data streams processed by a parallel computer:
  • Single Instruction, Single Data (SISD)
  • Multiple Instruction, Single Data (MISD)
  • Single Instruction, Multiple Data (SIMD)
  • Multiple Instruction, Multiple Data (MIMD)
    • Almost all modern parallel computers fall in this last category
Flynn’s Taxonomy (continued)
• Extensions to Flynn’s taxonomy:
  • Single Program, Multiple Data (SPMD): a programming model rather than a hardware class
• This classification is largely outdated!
HPC Hardware
• Traditionally, HPC has adopted expensive parallel hardware:
  • Massively Parallel Processors (MPP)
  • Symmetric Multi-Processors (SMP)
• Cluster computers:
  • A group of PCs connected through a fast (private) network
• Other classifications, based on memory organization:
  • Distributed memory machines
  • Shared memory machines
Massively Parallel Processors (MPP)
• A large parallel processing computer with a shared-nothing approach:
  • The term signifies that each node has its own cache and memory
• Examples include the Cray XT3, T3E, and T3D, and the IBM SP/2
Symmetric Multi-Processors (SMP)
• An SMP is a parallel processing system with a shared-everything approach:
  • The term signifies that each processor shares the main memory, and possibly the cache
• Typically an SMP has 2 to 256 processors
• Examples include systems built on the AMD Athlon, the AMD Opteron 200 and 2000 series, the Intel Xeon, etc.
Cluster Computers
• A group of PCs, workstations, or Macs (called nodes) connected to each other via a fast (and private) interconnect:
  • Each node is an independent computer
• Each cluster has one head node and multiple compute nodes:
  • Users log on to the head node and start parallel jobs on the compute nodes
• Such clusters can be built from Commodity-Off-The-Shelf (COTS) components:
  • The adoption of commodity clusters was a major breakthrough in HPC:
    • Economics
    • Fast interconnects like Myrinet, InfiniBand, and Quadrics
• Two popular cluster classifications:
  • Beowulf clusters (http://www.beowulf.org)
  • Rocks clusters (http://www.rocksclusters.org)
[Figure: A cluster computer: eight nodes (Proc 0 to Proc 7), each with its own CPU and memory, exchanging messages over a LAN (Ethernet, Myrinet, InfiniBand, etc.)]
Beowulf History
• At the most fundamental level, when two or more computers are used together to solve a problem, they are considered a cluster
• In 1993, Donald Becker and Thomas Sterling started sketching the details of a commodity-based cluster system:
  • The aim was to come up with a cost-effective alternative to large supercomputers
• The initial prototype was a cluster of 16 DX4 processors connected by channel-bonded Ethernet
• The idea was an instant success!
  • Largely due to economics
  • Open-source software such as Linux, the GNU compilers, PVM, and MPI was a major factor
Thomas Sterling with Naegling, Caltech's Beowulf Cluster
SMP and Multi-core Clusters
• Most modern commodity clusters have SMP and/or multi-core nodes:
  • Processors not only communicate via the interconnect; shared memory programming is also required within a node
• This trend is likely to continue:
  • A new name, “constellations”, has even been proposed for such systems
Distributed Memory
• Each processor has its own local memory
• Processors communicate with each other via an interconnect
Shared Memory
• All processors have access to shared memory:
  • The notion of a “global address space”
Hybrid
• Modern clusters have a hybrid architecture:
  • Distributed memory for inter-node (between nodes) communication
  • Shared memory for intra-node (within a node) communication
The TOP500
• The TOP500 project was started in 1993:
  • Its aim is to provide a reliable basis for tracking and detecting trends in HPC
• Twice a year, a list of the sites operating the 500 most powerful computer systems is assembled and released
• The best performance on the LINPACK benchmark is used as the performance measure for ranking the systems
• The latest list was released at Supercomputing 2006, held in Tampa, Florida
• The fastest supercomputer is IBM’s Blue Gene/L at Lawrence Livermore National Laboratory (LLNL):
  • Theoretical peak performance: 280.6 TeraFLOPS
  • Number of processors: 131,072
  • Main memory: 32,768 GB
The Top 5
1. DOE/NNSA/LLNL, United States
   • BlueGene/L - eServer Blue Gene Solution, IBM
2. NNSA/Sandia National Laboratories, United States
   • Red Storm - Sandia/Cray Red Storm, Opteron 2.4 GHz dual core, Cray Inc.
3. IBM Thomas J. Watson Research Center, United States
   • BGW - eServer Blue Gene Solution, IBM
4. DOE/NNSA/LLNL, United States
   • ASC Purple - eServer pSeries p5 575 1.9 GHz, IBM
5. Barcelona Supercomputing Center, Spain
   • MareNostrum - BladeCenter JS21 Cluster, PPC 970, 2.3 GHz, Myrinet, IBM
The Top 100 on Google Maps
Writing Parallel Software
• There are mainly two approaches to writing parallel software, i.e., software that can be executed on parallel hardware to exploit its computational and memory resources
• The first approach is to use libraries (packages) written in already existing languages like C, Fortran, and Java:
  • Economical
  • These libraries provide primitives (methods) like send() and recv() for communicating data
• The second and more radical approach is to provide new languages:
  • HPC has a history of novel parallel languages
  • These languages provide high-level parallelism constructs: language features, such as keywords or directives, that express parallelism directly
Library-based Approach
• One school of thought is to provide parallelism through message passing between processors
• Such libraries are based on the idea of supporting parallelism in traditional languages like C and Fortran:
  • Obvious social advantages: programmers keep their existing languages and tools
• Two popular messaging approaches:
  • Parallel Virtual Machine (PVM)
  • Message Passing Interface (MPI)
• Other messaging libraries:
  • Message Passing Toolkit (MPT)
  • SHared MEMory (SHMEM), …
• The Message Passing Interface (MPI) has become the de facto standard for writing HPC applications
Message Passing Interface (MPI)
• MPI is a standard (an interface, or an API):
  • It defines a set of methods that application developers use to write their applications
  • MPI libraries implement these methods
  • MPI itself is not a library; it is a specification document that implementations follow!
• Reasons for its popularity:
  • Software and hardware vendors were involved
  • Significant contributions from academia
  • MPICH served as an early reference implementation
  • MPI compilers are simply wrappers around widely used C and Fortran compilers
• MPI is a success story:
  • It is the most widely adopted programming paradigm on IBM Blue Gene systems
• At least two production-quality MPI libraries exist:
  • MPICH2 (http://www-unix.mcs.anl.gov/mpi/mpich2/)
  • Open MPI (http://open-mpi.org)
• There is even a Java library:
  • MPJ Express (http://mpj-express.org)
Language-based Approach
• There is a long history of novel parallel programming languages:
  • The central idea is to support parallelism by providing easy-to-use constructs
• Social aspects of HPC languages:
  • A dialect or superset of an existing language, or a completely new HPC language (an ambitious approach)
  • What happens to legacy code?
• Conceptually, most HPC languages can be categorized as:
  • Shared memory languages:
    • Mainly for programming shared memory platforms like SMPs
  • Partitioned Global Address Space (PGAS) languages:
    • Mainly for distributed memory HPC platforms
  • Distributed memory languages:
    • Mainly for distributed memory HPC platforms
Shared Memory Languages
• Designed to support parallel programming on shared memory platforms
• OpenMP:
  • Consists of a set of compiler directives, library routines, and environment variables
  • The runtime uses a fork-join model of parallel execution
• Cilk:
  • A design goal was to support asynchronous parallelism
  • Adds a small set of keywords: cilk, spawn, sync, …
• POSIX Threads (PThreads)
Partitioned Global Address Space (PGAS) Languages
• A PGAS is an abstraction that logically divides a process’s address space into two halves:
  • Private
  • Shared
• PGAS languages follow the so-called Distributed Shared Memory (DSM) model
• Unified Parallel C (UPC):
  • Discussed in detail later
• Titanium:
  • A Java dialect
• Co-Array Fortran:
  • Adds support for co-arrays
Distributed Memory Languages
• These purely distributed-memory languages support HPC on distributed memory platforms
• High Performance Fortran (HPF):
  • Data parallelism
  • An effort to standardize a family of data-parallel Fortran dialects
• Fortran M:
  • Ensured deterministic execution
  • Added message passing extensions to Fortran 77
• HPJava:
  • Motivated by HPF
[Figure: “A Different Aspect”, a runtime-level view of HPC programming approaches. Library-based approaches (MPI, SHMEM, GPMEM, PVM) build on the base languages C, Fortran, and Java; language extensions include directive-based languages (OpenMP, HPF) and Global Address Space languages (UPC, Co-Array Fortran, Titanium); the HPCS program is driving new languages (X10, Fortress, Chapel). Credit: Hong Ong, Oak Ridge National Laboratory]
US High Productivity Computing Systems (HPCS)
• Aims:
  • To produce systems that double in productivity and value every 18 months
  • To decrease time-to-solution:
    • Development time
    • Execution time
• Research:
  • In software and hardware technology:
    • New programming languages
  • Quantifying productivity
• Funding stages:
  • Three vendors are involved: Sun, IBM, and Cray
  • Three new programming languages: X10, Chapel, and Fortress