Introduction to Parallel Computing
Transcript of Introduction to Parallel Computing
Presentation Outline
• Doing science and engineering using HPC
• Basic concepts of parallel computing
• Discussion of HPC hardware
• Programming approaches (HPC software):
  • Library-based approaches
  • Language-based approaches
• HPC facilities at NIIT
High Performance Computing (HPC)
• The prime focus of HPC is performance: the ability to solve the biggest possible problems in the least possible time
• Also called “parallel computing”: the use of multiple processors, working in parallel, to solve an application
• Such computing is normally used to solve challenging scientific problems by running simulations:
  • For this reason, it is also called “scientific computing” or computational science
• HPC is a highly specialized area:
  • Probably our best chance to work for the world’s top research and commercial organizations: NASA, the European Space Agency (ESA), …
  • Google is known to have immense computational power; the exact quantity remains unknown!
Doing science and engineering using HPC
• HPC is helping to solve some of the most important problems in science today by pushing software and hardware technology to its limits
• Scientific computing (or computational science) is the field of study concerned with:
  • Constructing mathematical models and numerical solution techniques
  • Using computers to analyze and solve scientific and engineering problems
• Application areas:
  • Computer-aided engineering
  • Weather forecast simulations
  • Animated movies (Hollywood!)
  • Image processing
  • Cryptography
  • Hurricane forecasts: path as well as intensity (e.g., Katrina)
HPC driving science?
• The Millennium Simulation:
  • Computational astrophysics
  • Heralded as “the” largest-ever model of the Universe
  • Follows the evolution of ten billion “dark matter” particles
  • The simulation ran on a supercomputer for almost a month
• The Blue Brain Project:
  • Computational neuroscience
  • An effort to simulate the working of a mammalian brain
  • One of the fastest supercomputers in the world is used for the simulations
• Arguably, these projects could not be done without HPC
PAM CRASH: A Case Study from the Automobile Industry
• PAM CRASH is a parallel application for studying structural deformation, employed in simulations of automotive crashes and other situations:
  • An effective alternative to physical crash tests, which are expensive and time-consuming
• Modern simulations take into account millions of elements:
  • Such compute-intensive simulations can only be run on parallel hardware
• Automobile giants including Audi, BMW, Volkswagen, and others conduct crash simulations using PAM CRASH
Serial Computation
• Traditionally, software has been written for serial computation:
  • To be run on a single computer having a single Central Processing Unit (CPU)
  • A problem is broken into a discrete series of instructions, which are executed one after another
Parallel Computation
• Parallel computing is the simultaneous use of multiple compute resources to solve a computational problem:
  • To be run using multiple CPUs
  • A problem is broken into discrete parts that can be solved concurrently
Flynn’s Taxonomy
• There is no single authoritative classification of parallel computers!
• Flynn’s taxonomy is one such classification, based on the number of instruction and data streams processed by a parallel computer:
  • Single Instruction, Single Data (SISD)
  • Multiple Instruction, Single Data (MISD)
  • Single Instruction, Multiple Data (SIMD)
  • Multiple Instruction, Multiple Data (MIMD)
    • Almost all modern parallel computers fall in this last category
Flynn’s Taxonomy (continued)
• Extensions to Flynn’s taxonomy:
  • Single Program, Multiple Data (SPMD): a programming model rather than a hardware class
• This classification is largely outdated!
HPC Hardware
• Traditionally, HPC has adopted expensive parallel hardware:
  • Massively Parallel Processors (MPP)
  • Symmetric Multi-Processors (SMP)
• Cluster computers:
  • A group of PCs connected through a fast (private) network
• Other classifications, based on memory organization:
  • Distributed memory machines
  • Shared memory machines
Massively Parallel Processors (MPP)
• A large parallel processing computer with a shared-nothing approach:
  • The term signifies that each node has its own cache and memory
• Examples include the Cray XT3, T3E, and T3D, and the IBM SP/2
Symmetric Multi-Processors (SMP)
• An SMP is a parallel processing system with a shared-everything approach:
  • The term signifies that each processor shares the main memory, and possibly the cache
• Typically an SMP has 2 to 256 processors
• Examples include systems built on the AMD Athlon, the AMD Opteron 200 and 2000 series, the Intel Xeon, etc.
Cluster Computers
• A group of PCs, workstations, or Macs (called nodes) connected to each other via a fast (and private) interconnect:
  • Each node is an independent computer
• Each cluster has one head node and multiple compute nodes:
  • Users log on to the head node and start parallel jobs on the compute nodes
• Such clusters can be built from Commodity-Off-The-Shelf (COTS) components:
  • The adoption of commodity clusters was a major breakthrough in HPC:
    • Economics
    • Fast interconnects like Myrinet, InfiniBand, and Quadrics
• Two popular cluster classifications:
  • Beowulf clusters (http://www.beowulf.org)
  • Rocks clusters (http://www.rocksclusters.org)
[Figure: A cluster computer: eight nodes (Proc 0 to Proc 7), each with its own CPU and memory, exchanging messages over a LAN (Ethernet, Myrinet, InfiniBand, etc.)]
Beowulf History
• At the most fundamental level, when two or more computers are used together to solve a problem, they are considered a cluster
• In 1993, Donald Becker and Thomas Sterling started sketching the details of a commodity-based cluster system:
  • The aim was to come up with a cost-effective alternative to large supercomputers
• The initial prototype was a cluster of 16 DX4 processors connected by channel-bonded Ethernet
• The idea was an instant success!
  • Largely due to economics
  • Open-source software such as Linux, the GNU compilers, PVM, and MPI was a major factor
Thomas Sterling with Naegling, Caltech's Beowulf Cluster
SMP and Multi-core Clusters
• Most modern commodity clusters have SMP and/or multi-core nodes:
  • Processors not only communicate via the interconnect; shared memory programming is also required within a node
• This trend is likely to continue:
  • A new name, “constellations”, has even been proposed for such systems
Distributed Memory
• Each processor has its own local memory
• Processors communicate with each other via an interconnect
Shared Memory
• All processors have access to shared memory:
  • The notion of a “global address space”
Hybrid
• Modern clusters have a hybrid architecture:
  • Distributed memory for inter-node (between nodes) communication
  • Shared memory for intra-node (within a node) communication
The TOP500
• The TOP500 project was started in 1993:
  • Its aim is to provide a reliable basis for tracking and detecting trends in HPC
• Twice a year, a list of the sites operating the 500 most powerful computer systems is assembled and released
• The best performance on the LINPACK benchmark is used as the performance measure for ranking the systems
• The latest list was released at Supercomputing 2006, held in Tampa, Florida
• The fastest supercomputer is IBM’s Blue Gene/L at Lawrence Livermore National Laboratory (LLNL):
  • Theoretical peak performance: 280.6 TeraFLOPS
  • Number of processors: 131,072
  • Main memory: 32,768 GB
The Top 5
1. DOE/NNSA/LLNL, United States
   • BlueGene/L - eServer Blue Gene Solution, IBM
2. NNSA/Sandia National Laboratories, United States
   • Red Storm - Sandia/Cray Red Storm, Opteron 2.4 GHz dual core, Cray Inc.
3. IBM Thomas J. Watson Research Center, United States
   • BGW - eServer Blue Gene Solution, IBM
4. DOE/NNSA/LLNL, United States
   • ASC Purple - eServer pSeries p5 575 1.9 GHz, IBM
5. Barcelona Supercomputing Center, Spain
   • MareNostrum - BladeCenter JS21 Cluster, PPC 970, 2.3 GHz, Myrinet, IBM
The Top 100 on Google Maps
Writing Parallel Software
• There are mainly two approaches to writing parallel software, i.e., software that can be executed on parallel hardware to exploit its computational and memory resources
• The first approach is to use libraries (packages) written in already existing languages like C, Fortran, and Java:
  • Economical
  • These libraries provide primitives (methods) like send() and recv() for communicating data
• The second and more radical approach is to provide new languages:
  • HPC has a history of novel parallel languages
  • These languages provide high-level parallelism constructs: language features, such as keywords or directives, that express parallelism directly
Library-based Approach
• One school of thought is to provide parallelism through message passing between processors
• Such libraries are based on the idea of supporting parallelism in traditional languages like C and Fortran:
  • Obvious social advantages: programmers keep their existing languages and tools
• Two popular messaging approaches:
  • Parallel Virtual Machine (PVM)
  • Message Passing Interface (MPI)
• Other messaging libraries:
  • Message Passing Toolkit (MPT)
  • SHared MEMory (SHMEM), …
• The Message Passing Interface (MPI) has become the de facto standard for writing HPC applications
Message Passing Interface (MPI)
• MPI is a standard (an interface, or an API):
  • It defines a set of methods that application developers use to write their applications
  • MPI libraries implement these methods
  • MPI itself is not a library; it is a specification document that implementations follow!
• Reasons for its popularity:
  • Software and hardware vendors were involved
  • Significant contributions from academia
  • MPICH served as an early reference implementation
  • MPI compilers are simply wrappers around widely used C and Fortran compilers
• MPI is a success story:
  • It is the most widely adopted programming paradigm on IBM Blue Gene systems
• At least two production-quality MPI libraries exist:
  • MPICH2 (http://www-unix.mcs.anl.gov/mpi/mpich2/)
  • Open MPI (http://open-mpi.org)
• There is even a Java library:
  • MPJ Express (http://mpj-express.org)
Language-based Approach
• There is a long history of novel parallel programming languages:
  • The central idea is to support parallelism by providing easy-to-use constructs
• Social aspects of HPC languages:
  • A dialect or superset of an existing language, or a completely new HPC language (an ambitious approach)
  • What happens to legacy code?
• Conceptually, most HPC languages can be categorized as:
  • Shared memory languages:
    • Mainly for programming shared memory platforms like SMPs
  • Partitioned Global Address Space (PGAS) languages:
    • Mainly for distributed memory HPC platforms
  • Distributed memory languages:
    • Mainly for distributed memory HPC platforms
Shared Memory Languages
• Designed to support parallel programming on shared memory platforms
• OpenMP:
  • Consists of a set of compiler directives, library routines, and environment variables
  • The runtime uses a fork-join model of parallel execution
• Cilk:
  • A design goal was to support asynchronous parallelism
  • Adds a small set of keywords: cilk, spawn, sync, …
• POSIX Threads (PThreads)
Partitioned Global Address Space (PGAS) Languages
• A PGAS is an abstraction that logically divides a process’s address space into two halves:
  • Private
  • Shared
• PGAS languages follow the so-called Distributed Shared Memory (DSM) model
• Unified Parallel C (UPC):
  • Discussed in detail later
• Titanium:
  • A Java dialect
• Co-Array Fortran:
  • Adds support for co-arrays
Distributed Memory Languages
• These purely distributed-memory languages support HPC on distributed memory platforms
• High Performance Fortran (HPF):
  • Data parallelism
  • An effort to standardize a family of data-parallel Fortran dialects
• Fortran M:
  • Ensured deterministic execution
  • Added message passing extensions to Fortran 77
• HPJava:
  • Motivated by HPF
[Figure: “A Different Aspect”, a runtime-level view of HPC programming approaches. Library-based approaches (MPI, SHMEM, GPMEM, PVM) build on the base languages C, Fortran, and Java; language extensions include directive-based languages (OpenMP, HPF) and Global Address Space languages (UPC, Co-Array Fortran, Titanium); the HPCS program is driving new languages (X10, Fortress, Chapel). Credit: Hong Ong, Oak Ridge National Laboratory]
US High Productivity Computing Systems (HPCS)
• Aims:
  • To produce systems that double in productivity and value every 18 months
  • To decrease time-to-solution:
    • Development time
    • Execution time
• Research:
  • In software and hardware technology:
    • New programming languages
  • Quantifying productivity
• Funding stages:
  • Three vendors are involved: Sun, IBM, and Cray
  • Three new programming languages: X10, Chapel, and Fortress