Csa 05
Transcript of Csa 05
7/31/2019 Csa 05
Chapter 5
Parallel Processing
Multiple Processor Organization
Single instruction, single data stream - SISD
Single instruction, multiple data stream - SIMD
Multiple instruction, single data stream - MISD
Multiple instruction, multiple data stream - MIMD
SISD
Single processor executes a single instruction stream to operate on data stored in a single memory
Uni-processor
SIMD
Single machine instruction controls simultaneous execution of a number of processing elements
Each instruction is executed on a different set of data by different processors
Vector and array processors
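The SIMD idea can be sketched in Python as follows: one instruction (here, an add) is broadcast to all processing elements, each of which holds a different data item. This is a minimal sketch; the names are illustrative, not a real vector ISA.

```python
def simd_add(vector_a, vector_b):
    # One machine instruction (ADD), executed by every processing
    # element on its own data item; the list index plays the role
    # of the processing element.
    return [a + b for a, b in zip(vector_a, vector_b)]

result = simd_add([1, 2, 3, 4], [10, 20, 30, 40])
```

A vector or array processor performs all four additions at once; the sequential loop here only models the single-instruction, multiple-data pattern.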
MISD
Sequence of data transmitted to set of processors
Each processor executes different instruction sequence
Never been implemented
MIMD
Set of processors simultaneously execute different
instruction sequences on different sets of data
SMPs, clusters and NUMA systems
MIMD - Overview
General purpose processors
Each can process all instructions necessary
Further classified by method of processor communication
Taxonomy of Parallel Processor Architectures
Tightly Coupled - SMP
Processors share memory
Communicate via the shared memory
Symmetric Multiprocessor (SMP)
Share single memory or pool of memory
Shared bus to access memory
Memory access time to given area of memory is
approximately the same for each processor
NUMA - Non-uniform memory access
Access times to different regions of memory may differ
Loosely Coupled - Clusters
Collection of independent uniprocessors or SMPs interconnected to form a cluster
Communication via fixed path or network
connections
Parallel Organizations
SISD
SIMD
MIMD (Shared Memory)
MIMD (Distributed Memory)
Symmetric Multiprocessors
A stand-alone computer with the following characteristics:
Two or more similar processors of comparable capacity
Processors share the same memory and are connected by a bus or other internal connection such that memory access time is approximately the same for each processor
All processors share access to I/O
All processors can perform the same functions (symmetric)
System controlled by an integrated operating system
Provides interaction between processors and their programs
Multiprogramming and Multiprocessing
SMP Advantages
Performance
If some work can be done in parallel
Availability
Since all processors can perform the same functions, failure of a single processor does not halt the system
Incremental growth
Increase performance by adding additional processors
Scaling
Vendors can offer a range of products based on number of processors
Block Diagram of Tightly Coupled Multiprocessor
Time Shared Bus
Common organization, and it is simple
Structure and interface similar to single-processor system
Following features provided:
Addressing - distinguish modules on bus to determine source and destination
Arbitration - any module can be temporary master
Time sharing - if one module has the bus, others must wait and may have to suspend
Symmetric Multiprocessor Organization
Time Shared Bus - Advantages
Simplicity
Simplest approach for multiprocessor organization
Flexibility
Easy to expand the system by attaching more processors
to the bus.
Reliability
Bus is a passive medium, and the failure of any attached
device should not cause failure of the whole system
Time Shared Bus - Disadvantage
Performance
Limited by bus cycle time, because all references pass through the bus
Each processor should have a local cache
Reduces number of bus accesses
Leads to problems with cache coherence
If the cache is altered in one processor, the other processors' caches must also be informed
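The coherence problem above can be sketched with a simple write-invalidate policy: when one processor writes a cached line, the bus removes stale copies from the other caches. This is a minimal sketch; a real protocol (e.g. MESI) tracks per-line states, and the class and method names here are illustrative.

```python
class Cache:
    """Per-processor cache: address -> value."""
    def __init__(self):
        self.lines = {}

class Bus:
    """Shared bus with a write-invalidate policy: a write by one
    processor removes stale copies from every other cache."""
    def __init__(self, caches, memory):
        self.caches = caches
        self.memory = memory

    def write(self, writer, addr, value):
        writer.lines[addr] = value
        self.memory[addr] = value            # write-through, for simplicity
        for cache in self.caches:
            if cache is not writer:
                cache.lines.pop(addr, None)  # invalidate the stale copy

memory = {0x10: 5}
c0, c1 = Cache(), Cache()
c0.lines[0x10] = 5                           # both caches hold the line
c1.lines[0x10] = 5
bus = Bus([c0, c1], memory)
bus.write(c0, 0x10, 7)                       # c1's copy must be invalidated
```

After the write, processor 1 no longer holds the old value 5; its next read misses and fetches the up-to-date line.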
Vector Computation
Maths problems involving physical processes are difficult for computation
Aerodynamics, seismology, meteorology, atomic, nuclear
Continuous field simulation
High-precision repeated floating-point calculations on large arrays of numbers
Supercomputers handle these types of problem
Hundreds of millions of floating-point operations
$10-15 million
Optimised for calculation
Limited market
Research, government agencies, meteorology
Another system designed for vector computation - Array processor
Alternative to supercomputer
Configured as peripherals to mainframe & minicomputers
Just run vector portion of problems
Vector Addition Example
Processor Designs
Pipelined ALU
Decomposition of floating-point operations into stages
Different stages can operate on different sets of data in parallel
Can be further enhanced if the vector elements are available in registers rather than from main memory
Within operations
Across operations
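The staged decomposition of a floating-point add can be sketched as below. Values are modelled as (significand, exponent) pairs with value = significand * 2**exponent; in a pipelined ALU each stage would be working on a different pair of vector elements at any instant. The stage names follow the usual textbook decomposition; the function names are illustrative.

```python
def compare_exponents(a, b):
    # Stage 1: order operands so `a` has the larger exponent.
    return (a, b) if a[1] >= b[1] else (b, a)

def align_significands(a, b):
    # Stage 2: shift the smaller operand right until exponents match.
    sig, exp = b
    while exp < a[1]:
        sig, exp = sig / 2, exp + 1
    return a, (sig, exp)

def add_significands(a, b):
    # Stage 3: exponents now agree, so significands add directly.
    return (a[0] + b[0], a[1])

def normalize(x):
    # Stage 4: bring the significand back below 2.
    sig, exp = x
    while abs(sig) >= 2:
        sig, exp = sig / 2, exp + 1
    return (sig, exp)

def fp_add(a, b):
    a, b = compare_exponents(a, b)
    a, b = align_significands(a, b)
    return normalize(add_significands(a, b))

sig, exp = fp_add((1.5, 2), (1.0, 0))   # 6.0 + 1.0
```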
Approaches to Vector Computation
Chaining
Cray Supercomputers
Vector operation may start as soon as the first element of the operand vector is available and the functional unit is free
Result from one functional unit is fed immediately into another
If vector registers are used, intermediate results do not have to be stored in memory
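Chaining can be modelled with Python generators: each result leaves one functional unit and flows straight into the next, with no intermediate vector materialised. This is only a sketch of the data flow, not of Cray timing; the unit names are illustrative.

```python
def multiply_unit(a, b):
    # Each product leaves the multiply functional unit as soon
    # as it is computed...
    for x, y in zip(a, b):
        yield x * y

def add_unit(stream, c):
    # ...and is fed immediately into the add functional unit;
    # no intermediate vector is stored.
    for p, z in zip(stream, c):
        yield p + z

# D = A*B + C, computed element by element through the chain
a, b, c = [1, 2, 3], [4, 5, 6], [10, 10, 10]
d = list(add_unit(multiply_unit(a, b), c))
```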
Parallel ALUs
Parallel processors
Break the task up into multiple processes to be executed in parallel
Effective only if the software and hardware support effective coordination of the parallel processors
Operating System Support
OS
OS is a program that controls the execution of
application programs and acts as an interface
between the user and the hardware
Manages the computer's resources,
Provides services for programmers, and
Schedules the execution of other programs.
Objectives and Functions
Convenience
Making the computer easier to use
Efficiency
Allowing better use of computer resources
Layers and Views of a Computer System
Operating System Services
Program creation
Program execution
Access to I/O devices
Controlled access to files
System access
Error detection and response
Accounting
O/S as a Resource Manager
Types of Operating System
Interactive
Batch
Single program (Uni-programming)
Multi-programming (Multi-tasking)
Early Systems
Late 1940s to mid 1950s
No Operating System
Programs interact directly with hardware
Two main problems:
Scheduling
Setup time
Simple Batch Systems
Resident Monitor program
Users submit jobs to operator, who batches jobs
Monitor controls sequence of events to process batch
When one job is finished, control returns to Monitor, which reads next job
Monitor handles scheduling
Memory Layout for Resident Monitor
Desirable Hardware Features
Memory protection
To protect the Monitor
Timer
To prevent a job monopolising the system
Privileged instructions
Only executed by Monitor
e.g. I/O
Interrupts
Allows regaining control from user program
Multi-programmed Batch Systems
I/O devices very slow
When one program is waiting for I/O, another
can use the CPU
Single Program
Multi-Programming with
Two Programs
Multi-Programming with
Three Programs
Time Sharing Systems
Allow users to interact directly with the
computer
i.e. Interactive
Multi-programming allows a number of users
to interact with the computer
Scheduling
Key to multi-programming
Types
Long term
Medium term
Short term
I/O
Long Term Scheduling
Determines which programs are submitted for
processing
i.e. controls the degree of multi-programming
Once submitted, a job becomes a process for the
short term scheduler
Medium Term Scheduling
Part of the swapping function
Usually based on the need to manage multi-programming
If no virtual memory, memory management is also
an issue
Short Term Scheduling
Also known as Dispatcher
Fine grained decisions of which job to execute next
Which job actually gets to use the processor in the
next time slot
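The dispatcher's fine-grained decision can be sketched as a round-robin queue: the job at the head of the ready queue gets the processor for one time slot, then goes to the back if it is not finished. This is a sketch of one common policy, not the only short-term scheduling algorithm.

```python
from collections import deque

def dispatch(jobs, quantum, remaining):
    """Round-robin dispatcher: the head of the ready queue runs for
    one time slot, then requeues if it still has work remaining."""
    ready = deque(jobs)
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)                 # job uses the next time slot
        remaining[job] -= quantum
        if remaining[job] > 0:
            ready.append(job)             # back to the end of the queue
    return order

order = dispatch(["A", "B", "C"], 1, {"A": 2, "B": 1, "C": 3})
```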
Five State Process Model
PCB Diagram
Scheduling Example
Key Elements involved in scheduling
Process Scheduling
Reduced Instruction Set Computers
Major Advances in Computers (1)
The family concept
IBM System/360 in 1964
DEC PDP-8
Microprogrammed control unit
Idea by Wilkes in 1951
Produced by IBM S/360 in 1964
Cache memory
IBM S/360 Model 85 in 1969
Pipelining
Introduces parallelism into fetch-execute cycle
Multiple processors
The Next Step - RISC
Reduced Instruction Set Computer
Key features
Large number of general purpose registers
Limited and simple instruction set
Emphasis on optimising the instruction pipeline
Comparison of processors
Driving force for CISC
Software costs far exceed hardware costs
Increasingly complex high-level languages
Semantic gap: difference between the operations provided in HLLs and those provided in computer architecture
Leads to:
Large instruction sets
More addressing modes
Machine instructions implementing HLL statements
e.g. CASE (switch) machine instruction on the VAX
Execution Characteristics
Studies have been done to determine the characteristics of execution of machine instructions generated from HLL programs
Different approach: namely, to make the architecture that supports the HLL simpler
Aspects studied:
Operations performed
Operands used
Execution sequencing
Operations
Assignments predominate
Movement of data is of high importance
Conditional statements (IF, LOOP)
Sequence control
Implemented in machine language
Procedure call-return is very time consuming
Operands
Mainly local scalar variables
Optimisation should concentrate on accessing local variables
Procedure Calls
Very time consuming
Depends on number of parameters passed
Depends on level of nesting
Most programs do not do a lot of calls followed by lots of returns
Implications
Attempting to make the instruction set architecture close to HLLs is not the most effective approach
Best support is given by optimising most used and
most time consuming features
Large number of registers
Careful design of pipelines
Simplified (reduced) instruction set
Why CISC (1)?
Compiler simplification?
Complex machine instructions harder to exploit
Optimization more difficult
Smaller programs?
Program takes up less memory but
Memory is now cheap
May not occupy fewer bits, just look shorter in symbolic form
Why CISC (2)?
Faster instruction execution?
More complex control unit
Microprogram control store larger
It is far from clear that a trend to increasingly
complex instruction sets is appropriate
RISC Characteristics
One instruction per cycle
Register to register operations
Few, simple addressing modes
Few, simple instruction formats
RISC v CISC
Not clear cut
Many designs borrow from both philosophies
e.g. PowerPC and Pentium II
RISC Pipelining
Most instructions are register to register
Two phases of execution:
I: Instruction fetch
E: Execute (ALU operation with register input and output)
For load and store, three phases:
I: Instruction fetch
E: Execute (calculate memory address)
D: Memory (register-to-memory or memory-to-register operation)
Effects of Pipelining
Optimization of Pipelining
Delayed branch
Makes use of a branch that does not take effect until after execution of the following instruction
Delayed load
Register to be the target is locked by the processor
Continue execution of instruction stream until register required
Idle until load complete
Re-arranging instructions can allow useful work during the wait
Loop unrolling
Replicate body of loop a number of times
Reduces loop overhead
Increases instruction parallelism
Improved register, data cache or TLB locality
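Loop unrolling can be illustrated directly: the unrolled version below replicates the loop body four times, so only a quarter of the loop-control overhead remains, and the four independent accumulators expose instruction parallelism. A minimal sketch; a compiler would do this on machine code, not Python.

```python
def sum_rolled(values):
    # One loop-control test and branch per element.
    total = 0
    for i in range(len(values)):
        total += values[i]
    return total

def sum_unrolled(values):
    # Body replicated 4x; four independent adds per iteration.
    s0 = s1 = s2 = s3 = 0
    limit = len(values) - len(values) % 4
    for i in range(0, limit, 4):
        s0 += values[i]
        s1 += values[i + 1]
        s2 += values[i + 2]
        s3 += values[i + 3]
    for i in range(limit, len(values)):   # leftover elements
        s0 += values[i]
    return s0 + s1 + s2 + s3
```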
Delayed branch
Use of Delayed Branch
Controversy
Quantitative
Compare program sizes and execution speeds
Qualitative
Examine issues of high-level language support
Problems:
No pair of RISC and CISC machines that are directly comparable
No definitive set of test programs
Most comparisons done on toy machines rather than production machines
Most commercial devices are a mixture
Control Unit Operation
Micro-Operations
A computer executes a program via the fetch/execute cycle
Each cycle has a number of steps (cf. pipelining)
These steps are called micro-operations
Each step does very little
Constituent Elements of Program Execution
Fetch - 4 Registers
Memory Address Register (MAR)
Connected to address bus
Specifies address for read or write op
Memory Buffer Register (MBR)
Connected to data bus
Holds data to write or last data read
Program Counter (PC)
Holds address of next instruction to be fetched
Instruction Register (IR)
Holds last instruction fetched
Fetch Sequence
Address of next instruction is in PC and it is moved to MAR
Control unit issues READ command
Result (data from memory) appears on data bus
Data from data bus copied into MBR
PC incremented by 1 (in parallel with data fetch from memory)
Data (instruction) moved from MBR to IR
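The steps above can be sketched as micro-operations on a register file. A minimal sketch: the CPU is a dictionary of registers, memory a dictionary of addresses, and PC increments by 1 per instruction as the slide assumes.

```python
def fetch(cpu, memory):
    """Three time units, four micro-operations:
    t1: MAR <- (PC)
    t2: MBR <- Memory[MAR] ; PC <- (PC) + 1   (in parallel)
    t3: IR  <- (MBR)
    """
    cpu["MAR"] = cpu["PC"]
    cpu["MBR"] = memory[cpu["MAR"]]
    cpu["PC"] += 1
    cpu["IR"] = cpu["MBR"]

cpu = {"PC": 0x100, "MAR": 0, "MBR": 0, "IR": None}
memory = {0x100: "ADD R1, X"}
fetch(cpu, memory)
```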
Fetch Sequence (symbolic)
Consists of three steps and four micro-operations
Second and third micro-operations both take place during the second time unit
Rules for groupings of micro-operations
Proper sequence must be followed
e.g. MAR <- (PC) must precede MBR <- Memory
Indirect Cycle
MBR contains an address
IR is now in same state as if direct addressing had been used
Interrupt Cycle
This is a minimum
May be additional micro-ops to get addresses
Saving context is done by interrupt handler routine, not
micro-ops
Execute Cycle (ADD)
Different for each instruction
e.g. ADD R1,X - add the contents of location X to Register 1, result in R1
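The execute cycle for ADD R1,X can be sketched with the same register-dictionary model as the fetch sequence. A minimal sketch; the address field of IR is passed in directly as `x`.

```python
def execute_add(cpu, memory, x):
    """Micro-operations for ADD R1,X:
    t1: MAR <- address field of IR (here, x)
    t2: MBR <- Memory[MAR]
    t3: R1  <- (R1) + (MBR)
    """
    cpu["MAR"] = x
    cpu["MBR"] = memory[cpu["MAR"]]
    cpu["R1"] += cpu["MBR"]

cpu = {"R1": 5, "MAR": 0, "MBR": 0}
memory = {0x200: 7}
execute_add(cpu, memory, 0x200)
```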
Instruction Cycle
Each phase decomposed into sequence of elementary micro-operations
E.g. fetch, indirect, and interrupt cycles
Assume new 2-bit register
Instruction cycle code (ICC) designates which part of cycle processor is in:
00: Fetch
01: Indirect
10: Execute
11: Interrupt
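The ICC transitions can be written as a small state machine: from Fetch go to Indirect (if the instruction uses indirect addressing) or straight to Execute; from Execute go to Interrupt (if one is pending) or back to Fetch. A sketch of the flowchart logic only.

```python
FETCH, INDIRECT, EXECUTE, INTERRUPT = 0b00, 0b01, 0b10, 0b11

def next_icc(icc, indirect_needed=False, interrupt_pending=False):
    # One transition of the instruction-cycle flowchart.
    if icc == FETCH:
        return INDIRECT if indirect_needed else EXECUTE
    if icc == INDIRECT:
        return EXECUTE
    if icc == EXECUTE:
        return INTERRUPT if interrupt_pending else FETCH
    return FETCH   # after the interrupt cycle, fetch the next instruction
```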
Flowchart for Instruction Cycle
Functional Requirements
Define basic elements of processor
Describe micro-operations processor performs
Determine the functions that the control unit must perform to cause the micro-operations to be performed
Basic Elements of Processor
ALU
Registers
Internal data paths
External data paths
Control Unit
Types of Micro-operation
Transfer data between registers
Transfer data from register to external interface
Transfer data from external interface to register
Perform arithmetic or logical operations
Functions of Control Unit
Sequencing
Causing the CPU to step through a series of micro-operations
Execution
Causing the performance of each micro-op
This is done using Control Signals
Control Signals - Inputs
Clock
This is how the control unit keeps time.
Instruction register
Op-code for current instruction
Determines which micro-instructions are performed
Flags
Status of CPU
Results of previous ALU operations
Control signals from control bus
Interrupts
Acknowledgements
Model of Control Unit
Control Signals - Outputs
Within CPU
Cause data movement
Activate specific ALU functions
To control bus
To memory
To I/O modules
Implementation
Two categories:
Hardwired implementation
Microprogrammed implementation
In hardwired , the control unit is essentially a
combinational circuit.
Input logic signals are transformed into a set
of output logic signals, which are the control
signals
CPU Structure and Function
CPU Structure
CPU must:
Fetch instructions
Interpret instructions
Fetch data
Process data
Write data
CPU With Systems Bus
CPU Internal Structure
Registers
CPU must have some working space (temporary
storage) called registers
Number and function vary between processors
Top level of memory hierarchy
Perform two roles:
User-visible registers
Control and status registers
User Visible Registers
General Purpose
Data
Address
Condition Codes(Flags)
User Visible Registers
May be true general purpose
May be restricted
May be used for data or addressing
Data
Accumulator
Addressing
Segment registers
Index registers
Stack pointer
How Many GP Registers?
Between 8 and 32
Fewer = more memory references
More does not reduce memory references
How big?
Large enough to hold full address
Large enough to hold full word
Often possible to combine two data registers
Condition Code Registers
Sets of individual bits
e.g. result of last operation was zero
Can be read (implicitly) by programs
e.g. Jump if zero
Cannot be set by programs
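The "read implicitly" behaviour can be sketched as follows: the ALU sets the bits as a side effect, and a conditional branch consults them without the program ever naming them as an operand. A minimal sketch with hypothetical flag names Z (zero) and S (sign).

```python
def alu_sub(a, b):
    # The ALU sets condition-code bits as a side effect of the operation.
    result = a - b
    flags = {"Z": result == 0,   # zero bit
             "S": result < 0}    # sign bit
    return result, flags

def jump_if_zero(flags, target, next_pc):
    # The program reads the codes implicitly, via a conditional branch.
    return target if flags["Z"] else next_pc

_, flags = alu_sub(4, 4)          # result was zero, so Z is set
pc = jump_if_zero(flags, 0x300, 0x104)
```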
Control & Status Registers
Program Counter
Instruction Decoding Register
Memory Address Register
Memory Buffer Register
Program Status Word
A set of bits
Includes Condition Codes
Sign
Zero
Carry
Equal
Overflow
Interrupt enable/disable
Supervisor
Other Registers
May have registers pointing to:
Process control blocks
Interrupt Vectors
Example Register Organizations
Indirect Cycle
May require memory access to fetch operands
Indirect addressing requires more memory
accesses
Can be thought of as an additional instruction subcycle
Data Flow (Instruction Fetch)
Depends on CPU design
Fetch
PC contains address of next instruction
Address moved to MAR
Address placed on address bus
Control unit requests memory read
Result placed on data bus, copied to MBR, then to IR
Meanwhile PC incremented by 1
Data Flow (Data Fetch)
IR is examined
If indirect addressing, indirect cycle is
performed
Rightmost N bits of MBR transferred to MAR
Control unit requests memory read
Result (address of operand) moved to MBR
Data Flow (Fetch Diagram)
Data Flow (Indirect Diagram)
Data Flow (Interrupt Diagram)
Pipelining
Fetch instruction
Decode instruction
Calculate operands (i.e. EAs)
Fetch operands
Execute instructions
Write result
Overlap these operations
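The benefit of overlapping the six operations above (fetch instruction, decode, calculate operands, fetch operands, execute, write result) can be counted directly: without overlap, n instructions take n x 6 cycles; with a full pipeline, one instruction completes per cycle once the pipe is filled. A sketch that assumes no branches or stalls.

```python
def cycle_counts(n_instructions, stages=6):
    """Clock cycles for n instructions on a six-stage pipeline,
    with and without overlap, assuming no branches or stalls."""
    sequential = n_instructions * stages           # no overlap at all
    pipelined = stages + (n_instructions - 1)      # one completes per cycle
    return sequential, pipelined

sequential, pipelined = cycle_counts(9)
```

For 9 instructions this gives 54 cycles unpipelined against 14 pipelined, the kind of comparison the timing diagram on the next slide illustrates.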
Two Stage Instruction Pipeline
Timing Diagram
Effect of a Conditional Branch Instruction
Alternative Pipeline Depiction