Parallel Processing
By: Bela Desai
Guided by: Anita Patel
What is parallel processing?
The simultaneous use of more than one CPU to execute a program is called parallel processing.
It works on the principle that large problems can often be divided into smaller ones, which are then solved concurrently, i.e., in parallel.
Serial Computation
Parallel Computation
Flynn’s Classification
SISD: Single Instruction, Single Data
SIMD: Single Instruction, Multiple Data
MISD: Multiple Instruction, Single Data
MIMD: Multiple Instruction, Multiple Data
Single Instruction, Single Data (SISD): A serial (non-parallel) computer. Only one instruction stream acts on one data stream during any clock cycle. Examples: most PCs, single-CPU workstations and mainframes.
Single Instruction, Multiple Data (SIMD): All processing units execute the same instruction at any given clock cycle, each operating on a different data element. Examples: GPUs and vector processors.
Multiple Instruction, Single Data (MISD): Independent instruction streams operate on a single data stream. Few actual machines of this class have ever been built.
Multiple Instruction, Multiple Data (MIMD): Every processor may execute a different instruction stream on a different data stream. Examples: most current multi-core computers, clusters and supercomputers.
Types Of Parallel Processing
Bit-level parallelism
Instruction-level parallelism
Data parallelism
Task parallelism
Bit-level parallelism
Speed-up in computer architecture was long driven by doubling the computer word size, the amount of information the processor can manipulate per cycle.
Increasing the word size reduces the number of instructions the processor must execute to perform an operation on variables whose sizes are greater than the length of the word.
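A minimal sketch, not from the original slides, of why word size matters: adding two 64-bit values on a machine with 32-bit words takes two additions plus carry handling, whereas a 64-bit word machine does it in a single instruction.

    #include <stdint.h>
    #include <stdio.h>

    /* On a 64-bit word machine this is one add instruction. */
    uint64_t add64_native(uint64_t a, uint64_t b) { return a + b; }

    /* On a 32-bit word machine the same operation needs two adds
       plus carry handling: add the low halves, then the high halves
       plus the carry out of the low addition. */
    uint64_t add64_on_32bit(uint64_t a, uint64_t b) {
        uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
        uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);
        uint32_t lo = a_lo + b_lo;
        uint32_t carry = (lo < a_lo);   /* overflow of the low add */
        uint32_t hi = a_hi + b_hi + carry;
        return ((uint64_t)hi << 32) | lo;
    }

    int main(void) {
        uint64_t a = 0x00000001FFFFFFFFULL, b = 5;
        printf("%llu %llu\n",
               (unsigned long long)add64_native(a, b),
               (unsigned long long)add64_on_32bit(a, b));
        return 0;
    }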
Instruction-level parallelism
Instructions can be re-ordered and combined into groups which are then executed in parallel without changing the result of the program.
A processor with an N-stage pipeline can have up to N different instructions at different stages of completion.
The canonical example of a pipelined processor is a RISC processor, with five stages: instruction fetch, decode, execute, memory access, and write back. The Pentium 4 processor had a 35-stage pipeline.
Data parallelism
Data parallelism focuses on distributing data across computing nodes so that the same operation runs on each piece in parallel. Loop-carried dependencies prevent this: in the Fibonacci loop below, each iteration depends on results of the previous iteration, so the iterations cannot be executed in parallel.
    PREV2 := 0
    PREV1 := 1
    CUR := 1
    do:
        CUR := PREV1 + PREV2
        PREV2 := PREV1
        PREV1 := CUR
    while (CUR < 10)
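By contrast, a loop whose iterations are independent can be data-parallelized. A minimal sketch, not from the original slides, assuming OpenMP (covered later in this deck) is available; the array and operation are illustrative only.

    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static double data[N];

        /* Each iteration touches only data[i], so iterations are
           independent and can be distributed across threads. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            data[i] = data[i] * 2.0 + 1.0;
        }

        printf("data[0] = %f\n", data[0]);
        return 0;
    }

Compile with OpenMP support enabled (e.g. gcc -fopenmp); without it the loop simply runs serially.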
Task parallelism
Entirely different calculations can be performed on either the same or different sets of data. This contrasts with data parallelism, where the same calculation is performed on the same or different sets of data.
Task parallelism does not usually scale with the size of a problem.
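A minimal sketch of task parallelism, not from the original slides, using Pthreads (introduced later in this deck): two threads run entirely different calculations, a sum and a maximum, over the same data.

    #include <pthread.h>
    #include <stdio.h>

    #define N 8
    static const int data[N] = {3, 1, 4, 1, 5, 9, 2, 6};

    static long sum_result;
    static int  max_result;

    /* Task 1: compute the sum of the array. */
    static void *sum_task(void *arg) {
        (void)arg;
        long s = 0;
        for (int i = 0; i < N; i++) s += data[i];
        sum_result = s;
        return NULL;
    }

    /* Task 2: a different calculation (maximum) on the same data. */
    static void *max_task(void *arg) {
        (void)arg;
        int m = data[0];
        for (int i = 1; i < N; i++) if (data[i] > m) m = data[i];
        max_result = m;
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, sum_task, NULL);
        pthread_create(&t2, NULL, max_task, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("sum = %ld, max = %d\n", sum_result, max_result);
        return 0;
    }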
Shared Memory
Multiple processors can operate independently but share the same memory resources.
Changes in a memory location effected by one processor are visible to all other processors.
Shared memory machines can be divided into two main classes based upon memory access times: UMA and NUMA
Uniform Memory Access (UMA): Most commonly represented today by Symmetric Multiprocessor (SMP) machines.
Identical processors.
Equal access and access times to memory.
Sometimes called CC-UMA - Cache Coherent UMA.
Non-Uniform Memory Access (NUMA): Often made by physically linking two or more SMPs.
One SMP can directly access memory of another SMP.
Not all processors have equal access time to all memories.
Memory access across the link is slower.
If cache coherency is maintained, then it may also be called CC-NUMA - Cache Coherent NUMA.
Distributed Memory
Each processor has its own local memory, and memory addresses in one processor do not map to another processor, so there is no global address space. When a processor needs data from another processor, it must be communicated explicitly over the interconnection network.
Hybrid Distributed-Shared Memory
The largest and fastest computers today employ both architectures: shared-memory (SMP) nodes connected to one another over a network, so memory is shared within a node and distributed between nodes.
Parallel Programming Models
Shared Memory Model
Thread Model
Message Passing Model
Data Parallel Model
Hybrid
Shared memory model
In the shared-memory programming model, tasks share a common address space, which they read and write asynchronously.
An advantage of this model from the programmer's point of view is that the notion of data "ownership" is lacking, so there is no need to specify explicitly the communication of data between tasks. Program development can often be simplified.
An important disadvantage in terms of performance is that it becomes more difficult to understand and manage data locality.
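A minimal sketch of the idea, not from the original slides: one thread writes a shared variable and another reads it after synchronizing; no explicit data transfer is programmed. Pthreads is used purely for illustration.

    #include <pthread.h>
    #include <stdio.h>

    /* Shared address space: both threads see the same variable,
       so no explicit communication of data is needed. */
    static int shared_result;

    static void *producer(void *arg) {
        (void)arg;
        shared_result = 42;   /* write directly into shared memory */
        return NULL;
    }

    int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, producer, NULL);
        pthread_join(t, NULL);           /* synchronize before reading */
        printf("result = %d\n", shared_result);
        return 0;
    }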
Thread Model
POSIX Threads
Library based; requires parallel coding.
Specified by the IEEE POSIX 1003.1c standard (1995).
C Language only.
Commonly referred to as Pthreads.
Most hardware vendors now offer Pthreads in addition to their proprietary threads implementations.
Very explicit parallelism; requires significant programmer attention to detail.
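A minimal Pthreads sketch, not from the original slides: each thread is created explicitly, receives its own argument, and must be joined by hand, which illustrates how explicit the parallelism is. The thread count and work are illustrative.

    #include <pthread.h>
    #include <stdio.h>

    #define NUM_THREADS 4

    /* Each thread computes and prints the square of its own id. */
    static void *worker(void *arg) {
        long id = (long)arg;
        printf("thread %ld: %ld\n", id, id * id);
        return NULL;
    }

    int main(void) {
        pthread_t threads[NUM_THREADS];

        /* Explicit creation: the programmer manages every thread. */
        for (long i = 0; i < NUM_THREADS; i++)
            pthread_create(&threads[i], NULL, worker, (void *)i);

        /* Explicit synchronization: wait for every thread to finish. */
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_join(threads[i], NULL);

        return 0;
    }

Compile and link with the pthread library (e.g. gcc -pthread).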
OpenMP
Compiler directive based; can use serial code.
Jointly defined and endorsed by a group of major computer hardware and software vendors. The OpenMP Fortran API was released October 28, 1997. The C/C++ API was released in late 1998.
Portable / multi-platform, including Unix and Windows NT platforms.
Available in C/C++ and Fortran implementations.
Can be very easy and simple to use - provides for "incremental parallelism".
Message Passing Model
A set of tasks that use their own local memory during computation. Multiple tasks can reside on the same physical machine as well as across an arbitrary number of machines.
Tasks exchange data through communications by sending and receiving messages.
Data transfer usually requires cooperative operations to be performed by each process. For example, a send operation must have a matching receive operation.
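A minimal sketch of this cooperation, not from the original slides, using MPI (a widely used message-passing library, though the slides do not name one): the send on task 0 is matched by a receive on task 1. The tag and message value are illustrative.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 123;
            /* The send on task 0 ... */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* ... must have a matching receive on task 1. */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("task 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }

Launch with at least two tasks (e.g. mpirun -np 2).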
Data Parallel Model
Most of the parallel work focuses on performing operations on a data set. The data set is typically organized into a common structure, such as an array or cube.
A set of tasks works collectively on the same data structure; however, each task works on a different partition of it.
Tasks perform the same operation on their partition of work, for example, "add 4 to every array element".
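A minimal sketch of the "add 4 to every array element" example, not from the original slides: the array is split into contiguous partitions and each Pthread applies the same operation to its own partition. Partition sizes and the thread count are illustrative.

    #include <pthread.h>
    #include <stdio.h>

    #define N      12
    #define NTASKS 3
    #define CHUNK  (N / NTASKS)

    static int array[N];

    /* Every task performs the same operation ("add 4") on its partition. */
    static void *add_four(void *arg) {
        long task = (long)arg;
        for (int i = task * CHUNK; i < (task + 1) * CHUNK; i++)
            array[i] += 4;
        return NULL;
    }

    int main(void) {
        pthread_t tasks[NTASKS];
        for (long t = 0; t < NTASKS; t++)
            pthread_create(&tasks[t], NULL, add_four, (void *)t);
        for (int t = 0; t < NTASKS; t++)
            pthread_join(tasks[t], NULL);

        for (int i = 0; i < N; i++)
            printf("%d ", array[i]);   /* every element is now 4 */
        printf("\n");
        return 0;
    }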
Hybrid
1. Single Program Multiple Data (SPMD)
2. Multiple Program Multiple Data (MPMD)
Conclusion
With the help of parallel processing, highly complicated scientific problems that are otherwise extremely difficult to solve can be solved effectively. Parallel computing can be effectively used for tasks that involve a large number of calculations, have time constraints and can be divided into a number of smaller tasks.