1 Parallel computing and its recent topics. 2 Outline 1. Introduction of parallel processing (1)What...
-
Upload
eileen-norman -
Category
Documents
-
view
228 -
download
0
Transcript of 1 Parallel computing and its recent topics. 2 Outline 1. Introduction of parallel processing (1)What...
2
Outline
1. Introduction of parallel processing (1) What is parallel processing(2) Classification of parallel processing
2. Foundation of parallel computing (1) Parallel computational model: PRAM (Parallel Random Access Machine) (2) Analysis of parallel algorithms
3. Design of parallel algorithms (1) Basic techniques
(2) Divide-and-conquer 4. Current trend of parallel processing (1) Cluster computing (2) Grid computing
3
Reference
• J. JaJa, "An Introduction to parallel algorithm" Addison-Wesley Publishing Company, 1992.
Questions to [email protected]
4
Part 1 Introduction to parallel processing
Why Parallel Processing Problems with large computing complexity
• Computing hard problems (NP-complete problems): exponential computing time.
• Problems with large scale of input size: quantum chemistry, statistic mechanics,
relative physics, universal physics, fluid mechanics, biology, genetics engineering, …
For example, it costs 1 hour using the current computer to simulate the procedure
of 1 second reaction of protein molecule and water molecule. It costs
reaction.hour 1 of procedure thesimulate to years1014.1 8
5
Why parallel processing
Physical limitation of CPU computational power
In past 50 years, CPU was speeded up double every 2.5 years. But, there are a physical limitation. Light speed is
Therefore, the limitation of the number of CPU clocks is expected to be about 10GHz.
To solve computing hard problems
• Parallel processing
• DNA computer
• Quantum computer
…
.sec/103 8m
6
What is parallel processing
• Using a number of processors to process one task• Speeding up the processing by distributing it to the processors One problem
7
Classification of parallel computers
Two kinds of classification1. Flann’s Classification• SISD (Single Instruction stream, Single Data stream)• MISD (Multiple Instruction stream, Single Data stream)• SIMD (Single Instruction stream, Multiple Data stream)• MIMD (Multiple Instruction stream, Multiple Data stream)
2. Classification by memory status• Share memory• Distributed memory
8
Flynn’s classification(1)
SISD (Single Instruction Single Data) computer• Von Neuman’s one processor computer
Control Processor MemoryInstruction Stream
Data Stream
9
Flynn’s classification (2)
MISD (Multi Instructions Single Data) computer All processors share a common memory, have their own control devices and execute their own instructions on same data.
Memory
Control ProcessorInstruction
Stream
Data Stream
Control Processor
Instruction Stream
Control Processor
Instruction Stream
10
Flynn’s classification (3)
SIMD (Single Instructions Multi Data) computer• Processors execute the same instructions on different data • Operations of processors are synchronized by global clock.
Shared Memory or Inter- connection Nerwork
ProcessorData Stream
ControlProcessor
Instruction Stream
Processor
Data Stream
Data Stream
11
Flynn’s classification (4)
MIMD (Multi Instruction Multi Data) Computer • Processors have their own control devices, and execute
different instructions on different data.• Operations of processors are executed asynchronously
in most time. • It is also called as distributed computing system.
Shared Memory or Inter- connection Nerwork
ProcessorData Stream
Processor
ControlInstruction
Stream
Processor
Data Stream
Data Stream
ControlInstruction
Stream
ControlInstruction
Stream
12
Classification by memory types (1)
1. Parallel computer with a shared common memory • Communication based on shared memory For example, consider the case that processor i sends some data to processor j. First, processor i writes the data to the share memory, then processor j reads the data from the same address of the share memory.
Shared Memory
Processor
Processor
Processor
13
Classification by memory types (2)
Features of parallel computers with shared common memory
• Programming is easy.
• Exclusive control is necessary for the access to the same memory cell.
• Realization is difficult when the number of processors is large.
( Reason) The number of processors connected to shared memory is limited by
the physical factors such as the size and voltage of units, and the latency cause
d in memory accessing.
14
Classification by memory tapes (3)
2. Parallel computers with distributed memoryCommunication style is one to one based on interconnection network. For example, consider the case that processor i sends data to processor j. First, processor i issues a send command such as “processor j sends xxx to process i”, then processor j gets the data by a receiving command.
Inter- connection Nerwork
Processor Local Memory
Processor Local Memory
Processor Local Memory
15
Classification by memory types (4)
Features of parallel computers using distributed
memory • There are various architectures of interconnection
networks (Generally, the degree of connectivity is not large.) • Programming is difficult since comparing with shared
common memory the communication style is one to one. • It is easy to increase the number of processors.
16
Types of parallel computers with distributed memory
Complete connection type Any two processors are connected Features
Strong communication ability, but not practical (each processor has to be connected to many processors).
Mash connection type Processors are connected as a two-dimension lattice. Feartures - Connected to few processors. Easily to increase
the number of processors.– Large distance between processors: – Existence of processor communication bottleneck.
P1
P2 P5
P3 P4
P0,0 P0,1 P0,2 P0,3
P1,0 P1,1 P1,2 P1,3
P2,0 P2,1 P2,2 P2,3
P3,0 P3,1 P3,2 P3,3
n
17
Hypercube connection type
Processors connected as a hypercube (each processor has a binary number. (Processors are connected if only if one bit of their number are different. )
Features• Small distance between processors: log n.• Balanced communication load because of its symmetric structure.• Easy to increase the number of processors.
0100
0101
0110
0111
1100
1101
1110
1111
0000
0001
0010
0011
1000
1001
1010
1011
18
Other connected type Tree connection type, butterfly connection type, bus connection type. Criterion for selecting an inter-connection network • Small diameter (the largest distance between processors) for small communication delay. • Symmetric structure for easily increasing the number of processors.
The type of inter-connection network depends on application, ability of processors, upper bound of the number of processors and other factors.
The type of inter-connection network depends on application, ability of processors, upper bound of the number of processors and other factors.
19
Real parallel processing system (1)
Early days parallel computer (ILLIAC IV)• Built in 1972• SIMD type with distributed memory, consisting of 64 processors
P0,0 P0,1 P0,2 P0,3
P1,0 P1,1 P1,2 P1,3
P2,0 P2,1 P2,2 P2,3
P3,0 P3,1 P3,2 P3,3
Transformed mash connection type, equipped with common data bus, common control bus, and one controlUnit.
Transformed mash connection type, equipped with common data bus, common control bus, and one controlUnit.
20
Real parallel processing system (2)
Parallel computers in recent days• Shared common memory type
Workstation, SGI Origin2000 and other with 2-8 processors. • Distributed memory type
Name Maker Processor num
Processing type Network type
CM-2 TM 65536 SIMD hypercube
CM-5 TM 1024 MIMD fat tree
nCUBE2 NCUBE 8192 MIMD hypercube
iWarp CMU, Intel 64 MIMD 2D torus
Paragon Intel 4096 MIMD 2D torus
SP-2 IBM 512 MIMD HP switch
AP1000 Fujitsu 1024 MIMD 2D torus
SR2201 Hitachi 1024 MIMD crossbar
21
Real parallel processing system (3)
Deep blue• Developed by IBM for chess game only • Defeating the chess champion • Based on general parallel computer SP-2
memory
RS/6000 Bus Interface
Microchannel Bus
Deep blue chip
Deep blue chip
Deep blue chip( 8個)
RS/6000
node
memory
RS/6000
node
RS/6000
node
( 32 nodes)
Inter-connection network (Generalized hypercube)
memory memory
Architecture of nodes