Parallel Processing
By: Bela Desai
Guided by: Anita Patel
What is parallel processing?
The simultaneous use of more than one CPU to execute a program is called parallel processing.
It works on the principle that large problems can often be divided into smaller ones, which are then solved concurrently, i.e., in parallel.
Serial Computation
Parallel Computation
Flynn’s Classification
SISD: Single Instruction, Single Data
SIMD: Single Instruction, Multiple Data
MISD: Multiple Instruction, Single Data
MIMD: Multiple Instruction, Multiple Data
Single Instruction, Single Data (SISD): A serial (non-parallel) computer. Only one instruction stream acts on one data stream during any clock cycle. Examples: most PCs, single-CPU workstations and mainframes.
Single Instruction, Multiple Data (SIMD): All processing units execute the same instruction at any given clock cycle, each operating on a different data element. Examples: GPUs and vector processors.
Multiple Instruction, Single Data (MISD): Independent instruction streams operate on a single data stream. Few actual machines of this class have ever been built.
Multiple Instruction, Multiple Data (MIMD): Every processor may execute a different instruction stream on a different data stream. Examples: most current multi-core computers, clusters and supercomputers.
Types Of Parallel Processing
Bit-level parallelism
Instruction-level parallelism
Data parallelism
Task parallelism
Bit-level parallelism
Speed-up in computer architecture was long driven by doubling the computer word size, the amount of information the processor can manipulate per cycle.
Increasing the word size reduces the number of instructions the processor must execute to perform an operation on variables whose sizes are greater than the length of the word.
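A minimal sketch, not from the original slides, of why word size matters: adding two 64-bit values on a machine with 32-bit words takes two additions plus carry handling, whereas a 64-bit word machine does it in a single instruction.

    #include <stdint.h>
    #include <stdio.h>

    /* On a 64-bit word machine this is one add instruction. */
    uint64_t add64_native(uint64_t a, uint64_t b) { return a + b; }

    /* On a 32-bit word machine the same operation needs two adds
       plus carry handling: add the low halves, then the high halves
       plus the carry out of the low addition. */
    uint64_t add64_on_32bit(uint64_t a, uint64_t b) {
        uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
        uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);
        uint32_t lo = a_lo + b_lo;
        uint32_t carry = (lo < a_lo);   /* overflow of the low add */
        uint32_t hi = a_hi + b_hi + carry;
        return ((uint64_t)hi << 32) | lo;
    }

    int main(void) {
        uint64_t a = 0x00000001FFFFFFFFULL, b = 5;
        printf("%llu %llu\n",
               (unsigned long long)add64_native(a, b),
               (unsigned long long)add64_on_32bit(a, b));
        return 0;
    }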
Instruction-level parallelism
Instructions can be re-ordered and combined into groups which are then executed in parallel without changing the result of the program.
A processor with an N-stage pipeline can have up to N different instructions at different stages of completion.
The canonical example of a pipelined processor is a RISC processor, with five stages: instruction fetch, decode, execute, memory access, and write back. The Pentium 4 processor had a 35-stage pipeline.
Data parallelism
Data parallelism focuses on distributing data across computing nodes so that the same operation runs on each piece in parallel. Loop-carried dependencies prevent this: in the Fibonacci loop below, each iteration depends on results of the previous iteration, so the iterations cannot be executed in parallel.
    PREV2 := 0
    PREV1 := 1
    CUR := 1
    do:
        CUR := PREV1 + PREV2
        PREV2 := PREV1
        PREV1 := CUR
    while (CUR < 10)
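By contrast, a loop whose iterations are independent can be data-parallelized. A minimal sketch, not from the original slides, assuming OpenMP (covered later in this deck) is available; the array and operation are illustrative only.

    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static double data[N];

        /* Each iteration touches only data[i], so iterations are
           independent and can be distributed across threads. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            data[i] = data[i] * 2.0 + 1.0;
        }

        printf("data[0] = %f\n", data[0]);
        return 0;
    }

Compile with OpenMP support enabled (e.g. gcc -fopenmp); without it the loop simply runs serially.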
Task parallelism
Entirely different calculations can be performed on either the same or different sets of data. This contrasts with data parallelism, where the same calculation is performed on the same or different sets of data.
Task parallelism does not usually scale with the size of a problem.
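A minimal sketch of task parallelism, not from the original slides, using Pthreads (introduced later in this deck): two threads run entirely different calculations, a sum and a maximum, over the same data.

    #include <pthread.h>
    #include <stdio.h>

    #define N 8
    static const int data[N] = {3, 1, 4, 1, 5, 9, 2, 6};

    static long sum_result;
    static int  max_result;

    /* Task 1: compute the sum of the array. */
    static void *sum_task(void *arg) {
        (void)arg;
        long s = 0;
        for (int i = 0; i < N; i++) s += data[i];
        sum_result = s;
        return NULL;
    }

    /* Task 2: a different calculation (maximum) on the same data. */
    static void *max_task(void *arg) {
        (void)arg;
        int m = data[0];
        for (int i = 1; i < N; i++) if (data[i] > m) m = data[i];
        max_result = m;
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, sum_task, NULL);
        pthread_create(&t2, NULL, max_task, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("sum = %ld, max = %d\n", sum_result, max_result);
        return 0;
    }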
Shared Memory
Multiple processors can operate independently but share the same memory resources.
Changes in a memory location effected by one processor are visible to all other processors.
Shared memory machines can be divided into two main classes based upon memory access times: UMA and NUMA
Uniform Memory Access (UMA): Most commonly represented today by Symmetric Multiprocessor (SMP) machines.
Identical processors.
Equal access and access times to memory.
Sometimes called CC-UMA - Cache Coherent UMA.
Non-Uniform Memory Access (NUMA): Often made by physically linking two or more SMPs.
One SMP can directly access memory of another SMP.
Not all processors have equal access time to all memories.
Memory access across the link is slower.
If cache coherency is maintained, then it may also be called CC-NUMA - Cache Coherent NUMA.
Distributed Memory
Each processor has its own local memory, and memory addresses in one processor do not map to another processor, so there is no global address space. When a processor needs data from another processor, it must be communicated explicitly over the interconnection network.
Hybrid Distributed-Shared Memory
The largest and fastest computers today employ both architectures: shared-memory (SMP) nodes connected to one another over a network, so memory is shared within a node and distributed between nodes.
Parallel Programming Models
Shared Memory Model
Thread Model
Message Passing Model
Data Parallel Model
Hybrid
Shared memory model
In the shared-memory programming model, tasks share a common address space, which they read and write asynchronously.
An advantage of this model from the programmer's point of view is that the notion of data "ownership" is lacking, so there is no need to specify explicitly the communication of data between tasks. Program development can often be simplified.
An important disadvantage in terms of performance is that it becomes more difficult to understand and manage data locality.
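A minimal sketch of the idea, not from the original slides: one thread writes a shared variable and another reads it after synchronizing; no explicit data transfer is programmed. Pthreads is used purely for illustration.

    #include <pthread.h>
    #include <stdio.h>

    /* Shared address space: both threads see the same variable,
       so no explicit communication of data is needed. */
    static int shared_result;

    static void *producer(void *arg) {
        (void)arg;
        shared_result = 42;   /* write directly into shared memory */
        return NULL;
    }

    int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, producer, NULL);
        pthread_join(t, NULL);           /* synchronize before reading */
        printf("result = %d\n", shared_result);
        return 0;
    }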
Thread Model
POSIX Threads
Library based; requires parallel coding.
Specified by the IEEE POSIX 1003.1c standard (1995).
C Language only.
Commonly referred to as Pthreads.
Most hardware vendors now offer Pthreads in addition to their proprietary threads implementations.
Very explicit parallelism; requires significant programmer attention to detail.
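A minimal Pthreads sketch, not from the original slides: each thread is created explicitly, receives its own argument, and must be joined by hand, which illustrates how explicit the parallelism is. The thread count and work are illustrative.

    #include <pthread.h>
    #include <stdio.h>

    #define NUM_THREADS 4

    /* Each thread computes and prints the square of its own id. */
    static void *worker(void *arg) {
        long id = (long)arg;
        printf("thread %ld: %ld\n", id, id * id);
        return NULL;
    }

    int main(void) {
        pthread_t threads[NUM_THREADS];

        /* Explicit creation: the programmer manages every thread. */
        for (long i = 0; i < NUM_THREADS; i++)
            pthread_create(&threads[i], NULL, worker, (void *)i);

        /* Explicit synchronization: wait for every thread to finish. */
        for (int i = 0; i < NUM_THREADS; i++)
            pthread_join(threads[i], NULL);

        return 0;
    }

Compile and link with the pthread library (e.g. gcc -pthread).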
OpenMP
Compiler directive based; can use serial code.
Jointly defined and endorsed by a group of major computer hardware and software vendors. The OpenMP Fortran API was released October 28, 1997. The C/C++ API was released in late 1998.
Portable / multi-platform, including Unix and Windows NT platforms.
Available in C/C++ and Fortran implementations.
Can be very easy and simple to use - provides for "incremental parallelism".
Message Passing Model
A set of tasks that use their own local memory during computation. Multiple tasks can reside on the same physical machine as well as across an arbitrary number of machines.
Tasks exchange data through communications by sending and receiving messages.
Data transfer usually requires cooperative operations to be performed by each process. For example, a send operation must have a matching receive operation.
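A minimal sketch of this cooperation, not from the original slides, using MPI (a widely used message-passing library, though the slides do not name one): the send on task 0 is matched by a receive on task 1. The tag and message value are illustrative.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 123;
            /* The send on task 0 ... */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* ... must have a matching receive on task 1. */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("task 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }

Launch with at least two tasks (e.g. mpirun -np 2).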
Data Parallel Model
Most of the parallel work focuses on performing operations on a data set. The data set is typically organized into a common structure, such as an array or cube.
A set of tasks works collectively on the same data structure; however, each task works on a different partition of it.
Tasks perform the same operation on their partition of work, for example, "add 4 to every array element".
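A minimal sketch of the "add 4 to every array element" example, not from the original slides: the array is split into contiguous partitions and each Pthread applies the same operation to its own partition. Partition sizes and the thread count are illustrative.

    #include <pthread.h>
    #include <stdio.h>

    #define N      12
    #define NTASKS 3
    #define CHUNK  (N / NTASKS)

    static int array[N];

    /* Every task performs the same operation ("add 4") on its partition. */
    static void *add_four(void *arg) {
        long task = (long)arg;
        for (int i = task * CHUNK; i < (task + 1) * CHUNK; i++)
            array[i] += 4;
        return NULL;
    }

    int main(void) {
        pthread_t tasks[NTASKS];
        for (long t = 0; t < NTASKS; t++)
            pthread_create(&tasks[t], NULL, add_four, (void *)t);
        for (int t = 0; t < NTASKS; t++)
            pthread_join(tasks[t], NULL);

        for (int i = 0; i < N; i++)
            printf("%d ", array[i]);   /* every element is now 4 */
        printf("\n");
        return 0;
    }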
Hybrid
1. Single Program Multiple Data (SPMD)
2. Multiple Program Multiple Data (MPMD)
Conclusion
With the help of parallel processing, highly complicated scientific problems that are otherwise extremely difficult to solve can be solved effectively. Parallel computing can be effectively used for tasks that involve a large number of calculations, have time constraints and can be divided into a number of smaller tasks.