Csa 05

Transcript of Csa 05

  • 7/31/2019 Csa 05

    Chapter 5

    Parallel Processing


    Multiple Processor Organization

    Single instruction, single data stream - SISD

    Single instruction, multiple data stream - SIMD

    Multiple instruction, single data stream - MISD

    Multiple instruction, multiple data stream - MIMD


    SISD

    Single processor executes a single instruction stream to operate on data stored in a single memory

    Uni-processor

    SIMD

    Single machine instruction controls simultaneous execution of a number of processing elements

    Each instruction is executed on a different set of data by different processors

    Vector and array processors
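The SIMD idea above, one instruction broadcast to many processing elements operating on different data, can be sketched in a few lines. This is an illustrative sketch only (plain Python standing in for hardware lanes), not any particular machine:

```python
# Hypothetical SIMD-style step: a single "add" instruction applied in
# lockstep; each index i plays the role of one processing element,
# all executing the same operation on their own piece of the data.
def simd_add(a, b):
    return [x + y for x, y in zip(a, b)]

print(simd_add([1, 2, 3, 4], [10, 20, 30, 40]))  # [11, 22, 33, 44]
```

A real vector or array processor would issue one instruction for the whole operation instead of a per-element loop; the point here is only the single-instruction, multiple-data shape.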


    MISD

    Sequence of data transmitted to a set of processors

    Each processor executes different instruction sequence

    Never been implemented

    MIMD

    Set of processors simultaneously execute different

    instruction sequences on different sets of data

    SMPs, clusters and NUMA systems


    MIMD - Overview

    General purpose processors

    Each can process all instructions necessary

    Further classified by method of processor

    communication


    Taxonomy of Parallel Processor Architectures


    Tightly Coupled - SMP

    Processors share memory

    Communicate via the shared memory

    Symmetric Multiprocessor (SMP)

    Share single memory or pool of memory

    Shared bus to access memory

    Memory access time to given area of memory is

    approximately the same for each processor

    NUMA - Non-uniform memory access

    Access times to different regions of memory may differ


    Loosely Coupled - Clusters

    Collection of independent uniprocessors or SMPs interconnected to form a cluster

    Communication via fixed path or network

    connections


    Parallel Organizations

    SISD

    SIMD


    MIMD (Shared Memory)


    MIMD (Distributed Memory)


    Symmetric Multiprocessors

    A stand-alone computer with the following characteristics:

    Two or more similar processors of comparable capacity

    Processors share same memory and are connected by a bus or other internal connection such that memory access time is approximately the same for each processor

    All processors share access to I/O

    All processors can perform the same functions

    (symmetric)

    System controlled by integrated operating system

    Provides interaction between processors and their programs


    Multiprogramming and Multiprocessing


    SMP Advantages

    Performance

    If some work can be done in parallel

    Availability

    Since all processors can perform the same functions, failure of a single processor does not halt the system

    Incremental growth

    Increase performance by adding additional processors

    Scaling

    Vendors can offer a range of products based on number of processors


    Block Diagram of Tightly Coupled Multiprocessor


    Time Shared Bus

    Commonly used organization, and it is simple

    Structure and interface similar to single-processor system

    Following features provided

    Addressing - distinguish modules on bus to determine

    source and destination

    Arbitration - any module can be temporary master

    Time sharing - if one module has the bus, others must wait and may have to suspend


    Symmetric Multiprocessor Organization


    Time Shared Bus - Advantages

    Simplicity

    Simplest approach for multiprocessor organization

    Flexibility

    Easy to expand the system by attaching more processors

    to the bus.

    Reliability

    Bus is a passive medium, and the failure of any attached

    device should not cause failure of the whole system


    Time Shared Bus - Disadvantage

    Performance

    Limited by bus cycle time because all references pass

    through the bus

    Each processor should have local cache

    Reduce number of bus accesses

    Leads to problems with cache coherence

    When a cache line is altered in one processor, the change has to be made known to the other processors' caches as well
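The coherence problem above can be illustrated with a toy write-invalidate sketch (all names here are hypothetical; real protocols such as MESI track per-line states and are considerably more involved):

```python
# Toy write-invalidate sketch: when one processor writes a location,
# stale copies in the other caches are invalidated, so a later read
# there misses and refetches the fresh value from memory.
class Cache:
    def __init__(self):
        self.lines = {}                    # address -> cached value

memory = {0x10: 5}
caches = [Cache(), Cache()]                # two processors, one cache each

def read(cpu, addr):
    if addr not in caches[cpu].lines:      # miss: fill from memory
        caches[cpu].lines[addr] = memory[addr]
    return caches[cpu].lines[addr]

def write(cpu, addr, value):
    caches[cpu].lines[addr] = value
    memory[addr] = value                   # write-through, for simplicity
    for i, c in enumerate(caches):         # inform the other caches
        if i != cpu:
            c.lines.pop(addr, None)        # invalidate any stale copy

read(0, 0x10); read(1, 0x10)   # both caches now hold 5
write(0, 0x10, 7)              # CPU 0 writes; CPU 1's copy is invalidated
print(read(1, 0x10))           # 7, not the stale 5
```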


    Vector Computation

    Maths problems involving physical processes are difficult for computation

    Aerodynamics, seismology, meteorology, atomic and nuclear physics

    Continuous field simulation

    High precision repeated floating point calculations on large arrays of numbers

    Supercomputers handle these types of problem

    Hundreds of millions of floating point operations

    $10-15 million

    Optimised for calculation

    Limited market

    Research, government agencies, meteorology


    Another system designed for vector computation - Array processor

    Alternative to supercomputer

    Configured as peripherals to mainframe & minicomputers

    Just run vector portion of problems


    Vector Addition Example


    Processor Designs

    Pipelined ALU

    Decomposition of floating point operations into stages

    Different stages can operate on different sets of data in parallel

    Can be further enhanced if the vector elements are

    available in registers rather than from main memory

    Within operations

    Across operations


    Approaches to Vector Computation


    Chaining

    Cray Supercomputers

    Vector operation may start as soon as the first element of the operand vector is available and the functional unit is free

    Result from one functional unit is fed immediately into another

    If vector registers used, intermediate results do not

    have to be stored in memory


    Parallel ALUs

    Parallel processors

    break the task up into multiple processes to be executed

    in parallel

    effective only if the software and hardware support effective coordination of the parallel processors


    Operating System Support


    OS

    OS is a program that controls the execution of

    application programs and acts as an interface

    between the user and the hardware

    Manages the computer's resources,

    Provides services for programmers, and

    Schedules the execution of other programs.


    Objectives and Functions

    Convenience

    Making the computer easier to use

    Efficiency

    Allowing better use of computer resources


    Layers and Views of a Computer System


    Operating System Services

    Program creation

    Program execution

    Access to I/O devices

    Controlled access to files

    System access

    Error detection and response

    Accounting


    O/S as a Resource Manager


    Types of Operating System

    Interactive

    Batch

    Single program (Uni-programming)

    Multi-programming (Multi-tasking)


    Early Systems

    Late 1940s to mid 1950s

    No Operating System

    Programs interact directly with hardware

    Two main problems:

    Scheduling

    Setup time


    Simple Batch Systems

    Resident Monitor program

    Users submit jobs to operator who batches jobs

    Monitor controls sequence of events to process batch

    When one job is finished, control returns to Monitor which reads next job

    Monitor handles scheduling


    Memory Layout for Resident Monitor


    Desirable Hardware Features

    Memory protection

    To protect the Monitor

    Timer

    To prevent a job monopolising the system

    Privileged instructions

    Only executed by Monitor

    e.g. I/O

    Interrupts

    Allows regaining control from user program


    Multi-programmed Batch Systems

    I/O devices very slow

    When one program is waiting for I/O, another

    can use the CPU


    Single Program


    Multi-Programming with

    Two Programs


    Multi-Programming with

    Three Programs


    Time Sharing Systems

    Allow users to interact directly with the

    computer

    i.e. Interactive

    Multi-programming allows a number of users

    to interact with the computer


    Scheduling

    Key to multi-programming

    Types

    Long term

    Medium term

    Short term

    I/O


    Long Term Scheduling

    Determines which programs are submitted for

    processing

    i.e. controls the degree of multi-programming

    Once submitted, a job becomes a process for the

    short term scheduler


    Medium Term Scheduling

    Part of the swapping function

    Usually based on the need to manage multi-programming

    If no virtual memory, memory management is also

    an issue


    Short Term Scheduling

    Also known as Dispatcher

    Fine grained decisions of which job to execute next

    Which job actually gets to use the processor in the

    next time slot
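The dispatcher's job above, deciding which ready job gets the processor for the next time slot, can be sketched as a simple round-robin pick; the job names, service times and quantum here are hypothetical:

```python
from collections import deque

# Minimal round-robin dispatcher sketch: each ready job runs for one
# quantum, then goes to the back of the queue if it still needs time.
def dispatch(jobs, quantum):
    ready = deque(jobs.items())              # (name, remaining time)
    order = []                               # who got each time slot
    while ready:
        name, remaining = ready.popleft()
        order.append(name)                   # job runs for one slot
        remaining -= quantum
        if remaining > 0:
            ready.append((name, remaining))  # not finished: requeue
    return order

print(dispatch({"A": 3, "B": 1, "C": 2}, quantum=1))
# ['A', 'B', 'C', 'A', 'C', 'A']
```

A real short-term scheduler also weighs priorities, I/O waits and fairness; round-robin is just the simplest policy that shows the slot-by-slot decision.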


    Five State Process Model


    PCB Diagram


    Scheduling Example


    Key Elements involved in scheduling


    Process Scheduling


    Reduced Instruction Set Computers


    Major Advances in Computers (1)

    The family concept

    IBM System/360 in 1964

    DEC PDP-8

    Microprogrammed control unit

    Idea by Wilkes in 1951

    Produced by IBM S/360 in 1964

    Cache memory

    IBM S/360 model 85 in 1969

    Pipelining

    Introduces parallelism into fetch-execute cycle

    Multiple processors


    The Next Step - RISC

    Reduced Instruction Set Computer

    Key features

    Large number of general purpose registers

    Limited and simple instruction set

    Emphasis on optimising the instruction pipeline


    Comparison of processors


    Driving force for CISC

    Software costs far exceed hardware costs

    Increasingly complex high level languages

    Semantic gap

    difference between the operations provided in HLLs and

    those provided in computer architecture.

    Leads to:

    Large instruction sets

    More addressing modes

    e.g. CASE (switch) machine instruction on the VAX


    Execution Characteristics

    Studies have been done to determine the characteristics of execution of machine instructions generated from HLL programs

    Different approach: namely, to make the

    architecture that supports the HLL simpler

    Operations performed

    Operands used

    Execution sequencing


    Operations

    Assignments predominate

    Movement of data is of high importance

    Conditional statements (IF, LOOP)

    Sequence control

    Implemented in machine language

    Procedure call-return is very time consuming



    Operands

    Mainly local scalar variables

    Optimisation should concentrate on accessing local variables


    Procedure Calls

    Very time consuming

    Depends on number of parameters passed

    Depends on level of nesting

    Most programs do not do a lot of calls followed by lots of returns


    Implications

    Attempting to make the instruction set architecture close to HLLs is not the most effective approach

    Best support is given by optimising most used and

    most time consuming features

    Large number of registers

    Careful design of pipelines

    Simplified (reduced) instruction set


    Why CISC (1)?

    Compiler simplification?

    Complex machine instructions harder to exploit

    Optimization more difficult

    Smaller programs?

    Program takes up less memory but

    Memory is now cheap

    May not occupy fewer bits, just look shorter in symbolic form


    Why CISC (2)?

    Instruction execution would be faster?

    More complex control unit

    Microprogram control store larger

    It is far from clear that a trend to increasingly

    complex instruction sets is appropriate


    RISC Characteristics

    One instruction per cycle

    Register to register operations

    Few, simple addressing modes

    Few, simple instruction formats


    RISC v CISC

    Not clear cut

    Many designs borrow from both philosophies

    e.g. PowerPC and Pentium II


    RISC Pipelining

    Most instructions are register to register

    Two phases of execution

    I: Instruction fetch

    E: Execute

    ALU operation with register input and output

    For load and store

    I: Instruction fetch

    E: Execute

    Calculate memory address

    D: Memory

    Register to memory or memory to register operation

    Effects of Pipelining


    Optimization of Pipelining

    Delayed branch

    Makes use of a branch that does not take effect until after execution of the following instruction

    Delayed Load

    Register to be target is locked by processor

    Continue execution of instruction stream until register required

    Idle until load complete

    Re-arranging instructions can allow useful work

    Loop Unrolling

    Replicate body of loop a number of times

    Reduces loop overhead

    Increases instruction parallelism

    Improved register, data cache or TLB locality
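Loop unrolling as described above can be shown side by side. The saxpy kernel here is just an illustrative example, and a real compiler unrolls at the instruction level rather than in source; the point is that the unrolled body pays the loop overhead (test and increment) half as often and exposes independent operations:

```python
# Rolled form: one element per iteration.
def saxpy_rolled(a, x, y):
    out = [0.0] * len(x)
    for i in range(len(x)):
        out[i] = a * x[i] + y[i]
    return out

# Unrolled by 2: same work, half as many loop-control steps, and the
# two statements in the body are independent of each other.
def saxpy_unrolled_by_2(a, x, y):
    assert len(x) % 2 == 0                    # assume even length, for brevity
    out = [0.0] * len(x)
    for i in range(0, len(x), 2):
        out[i] = a * x[i] + y[i]              # iteration i
        out[i + 1] = a * x[i + 1] + y[i + 1]  # iteration i+1, same pass
    return out

x, y = [1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]
print(saxpy_unrolled_by_2(2.0, x, y) == saxpy_rolled(2.0, x, y))  # True
```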

    Delayed branch

    Use of Delayed Branch


    Controversy

    Quantitative

    compare program sizes and execution speeds

    Qualitative

    examine issues of high level language support

    Problems

    No pair of RISC and CISC that are directly comparable

    No definitive set of test programs

    Most comparisons done on toy machines rather than production machines

    Most commercial devices are a mixture


    Control Unit Operation

    Micro-Operations

    A computer executes a program

    Fetch/execute cycle

    Each cycle has a number of steps

    pipelining

    Called micro-operations

    Each step does very little


    Constituent Elements of Program Execution

    Fetch - 4 Registers

    Memory Address Register (MAR)

    Connected to address bus

    Specifies address for read or write op

    Memory Buffer Register (MBR)

    Connected to data bus

    Holds data to write or last data read

    Program Counter (PC)

    Holds address of next instruction to be fetched

    Instruction Register (IR)

    Holds last instruction fetched

    Fetch Sequence

    Address of next instruction is in PC and it is moved to MAR

    Control unit issues READ command

    Result (data from memory) appears on data bus

    Data from data bus copied into MBR

    PC incremented by 1 (in parallel with data fetch from memory)

    Data (instruction) moved from MBR to IR
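The fetch steps above can be sketched as micro-operations on a toy register set. The register names follow the slides; the memory contents and addresses are hypothetical:

```python
# Fetch cycle as micro-operations on a toy register set.
def fetch(regs, memory):
    regs["MAR"] = regs["PC"]           # t1: MAR <- (PC)
    regs["MBR"] = memory[regs["MAR"]]  # t2: MBR <- memory[MAR] (READ)
    regs["PC"] += 1                    #     PC  <- (PC) + 1, same time unit
    regs["IR"] = regs["MBR"]           # t3: IR  <- (MBR)
    return regs

regs = {"PC": 100, "MAR": 0, "MBR": 0, "IR": 0}
memory = {100: "ADD R1,X"}             # pretend this word holds an instruction
fetch(regs, memory)
print(regs["IR"], regs["PC"])          # ADD R1,X 101
```

The grouping mirrors the symbolic fetch sequence: three time units, four micro-operations, with the PC increment sharing the second time unit with the memory read.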

    Fetch Sequence (symbolic)

    Consists of three steps and four micro-operations

    Second and third micro-operations both take place during the second time unit

    Rules for groupings of micro-operations

    Proper sequence must be followed

    MBR contains an address

    IR is now in same state as if direct addressing had been used

    Interrupt Cycle

    This is a minimum

    May be additional micro-ops to get addresses

    Saving context is done by interrupt handler routine, not

    micro-ops

    Execute Cycle (ADD)

    Different for each instruction

    e.g. ADD R1,X - add the contents of location X to Register 1, result in R1
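The execute micro-operations for this ADD can be sketched on the same kind of toy register set (the address of X and the values are hypothetical):

```python
# Execute cycle for ADD R1,X as micro-operations:
# t1: MAR <- IR(address)   t2: MBR <- memory[MAR]   t3: R1 <- (R1) + (MBR)
def execute_add(regs, memory, x_addr):
    regs["MAR"] = x_addr                    # t1: address field of IR to MAR
    regs["MBR"] = memory[regs["MAR"]]       # t2: read operand into MBR
    regs["R1"] = regs["R1"] + regs["MBR"]   # t3: ALU add, result in R1
    return regs

regs = {"R1": 4, "MAR": 0, "MBR": 0}
memory = {0x20: 6}                          # location X holds 6
print(execute_add(regs, memory, 0x20)["R1"])  # 10
```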

    Instruction Cycle

    Each phase decomposed into sequence of elementary micro-operations

    E.g. fetch, indirect, and interrupt cycles

    Assume new 2-bit register

    Instruction cycle code (ICC) designates which part of cycle processor is in

    00: Fetch

    01: Indirect

    10: Execute

    11: Interrupt
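The ICC can be sketched as a small transition function. This is a simplification of the flowchart that follows; only the minimal transitions are shown:

```python
# The 2-bit instruction cycle code as a lookup plus a transition rule.
ICC = {0b00: "fetch", 0b01: "indirect", 0b10: "execute", 0b11: "interrupt"}

def next_icc(icc, needs_indirect=False, pending_interrupt=False):
    # Simplified transitions: fetch -> indirect or execute;
    # indirect -> execute; execute -> interrupt or fetch; interrupt -> fetch.
    if icc == 0b00:
        return 0b01 if needs_indirect else 0b10
    if icc == 0b01:
        return 0b10
    if icc == 0b10:
        return 0b11 if pending_interrupt else 0b00
    return 0b00

print(ICC[next_icc(0b00, needs_indirect=True)])  # indirect
```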

    Flowchart for Instruction Cycle

    Functional Requirements

    Define basic elements of processor

    Describe micro-operations processor performs

    Determine the functions that the control unit

    must perform to cause the micro-operations

    to be performed


    Basic Elements of Processor

    ALU

    Registers

    Internal data paths

    External data paths

    Control Unit


    Types of Micro-operation

    Transfer data between registers

    Transfer data from register to external interface

    Transfer data from external interface to register

    Perform arithmetic or logical operations


    Functions of Control Unit

    Sequencing

    Causing the CPU to step through a series of micro-operations

    Execution

    Causing the performance of each micro-op

    This is done using Control Signals

    Control Signals - input

    Clock

    This is how the control unit keeps time.

    Instruction register

    Op-code for current instruction

    Determines which micro-instructions are performed

    Flags

    Status of CPU

    Results of previous ALU operations

    Control signals from control bus

    Interrupts

    Acknowledgements

    Model of Control Unit

    Control Signals - output


    Within CPU

    Cause data movement

    Activate specific ALU functions

    To control bus

    To memory

    To I/O modules


    Implementation

    Two categories:

    Hardwired implementation

    Microprogrammed implementation

    In a hardwired implementation, the control unit is essentially a combinational circuit.

    Input logic signals are transformed into a set

    of output logic signals, which are the control

    signals


    CPU Structure and Function


    CPU Structure

    CPU must:

    Fetch instructions

    Interpret instructions

    Fetch data

    Process data

    Write data

    CPU With Systems Bus


    CPU Internal Structure


    Registers

    CPU must have some working space (temporary

    storage) called registers

    Number and function vary between processor designs

    Top level of memory hierarchy

    Perform two roles:

    User-visible registers

    Control and status registers


    User Visible Registers

    General Purpose

    Data

    Address

    Condition Codes (Flags)


    User Visible Registers

    May be true general purpose

    May be restricted

    May be used for data or addressing

    Data

    Accumulator

    Addressing

    Segment registers

    Index registers

    Stack pointer


    How Many GP Registers?

    Between 8 and 32

    Fewer = more memory references

    More does not reduce memory references

    How big?

    Large enough to hold full address

    Large enough to hold full word

    Often possible to combine two data registers


    Condition Code Registers

    Sets of individual bits

    e.g. result of last operation was zero

    Can be read (implicitly) by programs

    e.g. Jump if zero

    Cannot be set by programs
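The set-implicitly/read-implicitly behaviour can be sketched as follows (the flag names Z and S and the toy ALU are illustrative, not any particular machine's):

```python
# Condition codes: set as a side effect of an ALU operation,
# read (never written) by a conditional jump.
def alu_sub(a, b, flags):
    result = a - b
    flags["Z"] = (result == 0)     # zero flag
    flags["S"] = (result < 0)      # sign flag
    return result

flags = {}
alu_sub(5, 5, flags)               # result is 0, so Z is set
taken = flags["Z"]                 # "jump if zero" only tests Z
print(taken)                       # True
```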


    Control & Status Registers

    Program Counter

    Instruction Decoding Register

    Memory Address Register

    Memory Buffer Register


    Program Status Word

    A set of bits

    Includes Condition Codes

    Sign

    Zero

    Carry

    Equal

    Overflow

    Interrupt enable/disable

    Supervisor


    Other Registers

    May have registers pointing to:

    Process control blocks

    Interrupt Vectors

    Example Register Organizations


    Indirect Cycle

    May require memory access to fetch operands

    Indirect addressing requires more memory

    accesses

    Can be thought of as an additional instruction subcycle

    Data Flow (Instruction Fetch)

    Depends on CPU design

    Fetch

    PC contains address of next instruction

    Address moved to MAR

    Address placed on address bus

    Control unit requests memory read

    Result placed on data bus, copied to MBR, then to IR

    Meanwhile PC incremented by 1

    Data Flow (Data Fetch)

    IR is examined

    If indirect addressing, indirect cycle is

    performed

    Rightmost N bits of MBR transferred to MAR

    Control unit requests memory read

    Result (address of operand) moved to MBR

    Data Flow (Fetch Diagram)

    Data Flow (Indirect Diagram)

    Data Flow (Interrupt Diagram)

    Pipelining

    Fetch instruction

    Decode instruction

    Calculate operands (i.e. EAs)

    Fetch operands

    Execute instructions

    Write result

    Overlap these operations


    Two Stage Instruction Pipeline

    Timing Diagram

    Effect of a Conditional Branch Instruction

    Alternative Pipeline Depiction
