Csa 05

Transcript of Csa 05

  • 7/31/2019 Csa 05

    Chapter 5

    Parallel Processing


    Multiple Processor Organization

    Single instruction, single data stream - SISD

    Single instruction, multiple data stream - SIMD

    Multiple instruction, single data stream - MISD

    Multiple instruction, multiple data stream - MIMD


    SISD

    Single processor executes a single instruction stream to operate on data stored in a single memory

    Uni-processor

    SIMD

    Single machine instruction controls simultaneous execution of a number of processing elements

    Each instruction is executed on a different set of data by different processors

    Vector and array processors
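The SIMD idea above, one instruction broadcast to many processing elements operating on different data, can be sketched in a few lines. This is an illustrative sketch only (plain Python standing in for hardware lanes), not any particular machine:

```python
# Hypothetical SIMD-style step: a single "add" instruction applied in
# lockstep; each index i plays the role of one processing element,
# all executing the same operation on their own piece of the data.
def simd_add(a, b):
    return [x + y for x, y in zip(a, b)]

print(simd_add([1, 2, 3, 4], [10, 20, 30, 40]))  # [11, 22, 33, 44]
```

A real vector or array processor would issue one instruction for the whole operation instead of a per-element loop; the point here is only the single-instruction, multiple-data shape.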


    MISD

    Sequence of data transmitted to a set of processors

    Each processor executes different instruction sequence

    Never been implemented

    MIMD

    Set of processors simultaneously execute different

    instruction sequences on different sets of data

    SMPs, clusters and NUMA systems


    MIMD - Overview

    General purpose processors

    Each can process all instructions necessary

    Further classified by method of processor

    communication


    Taxonomy of Parallel Processor Architectures


    Tightly Coupled - SMP

    Processors share memory

    Communicate via the shared memory

    Symmetric Multiprocessor (SMP)

    Share single memory or pool of memory

    Shared bus to access memory

    Memory access time to given area of memory is

    approximately the same for each processor

    NUMA - Non-uniform memory access

    Access times to different regions of memory may differ


    Loosely Coupled - Clusters

    Collection of independent uniprocessors or SMPs interconnected to form a cluster

    Communication via fixed path or network

    connections


    Parallel Organizations

    SISD

    SIMD


    MIMD (Shared Memory)


    MIMD (Distributed Memory)


    Symmetric Multiprocessors

    A stand-alone computer with the following characteristics:

    Two or more similar processors of comparable capacity

    Processors share same memory and are connected by a bus or other internal connection such that memory access time is approximately the same for each processor

    All processors share access to I/O

    All processors can perform the same functions

    (symmetric)

    System controlled by integrated operating system

    Provides interaction between processors and their programs


    Multiprogramming and Multiprocessing


    SMP Advantages

    Performance

    If some work can be done in parallel

    Availability

    Since all processors can perform the same functions, failure of a single processor does not halt the system

    Incremental growth

    Increase performance by adding additional processors

    Scaling

    Vendors can offer a range of products based on number of processors


    Block Diagram of Tightly Coupled Multiprocessor


    Time Shared Bus

    Commonly used organization, and it is simple

    Structure and interface similar to single-processor system

    Following features provided

    Addressing - distinguish modules on bus to determine

    source and destination

    Arbitration - any module can be temporary master

    Time sharing - if one module has the bus, others must wait and may have to suspend


    Symmetric Multiprocessor Organization


    Time Shared Bus - Advantages

    Simplicity

    Simplest approach for multiprocessor organization

    Flexibility

    Easy to expand the system by attaching more processors

    to the bus.

    Reliability

    Bus is a passive medium, and the failure of any attached

    device should not cause failure of the whole system


    Time Shared Bus - Disadvantage

    Performance

    Limited by bus cycle time because all references pass

    through the bus

    Each processor should have local cache

    Reduce number of bus accesses

    Leads to problems with cache coherence

    When a cache line is altered in one processor, the change has to be made known to the other processors' caches as well
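The coherence problem above can be illustrated with a toy write-invalidate sketch (all names here are hypothetical; real protocols such as MESI track per-line states and are considerably more involved):

```python
# Toy write-invalidate sketch: when one processor writes a location,
# stale copies in the other caches are invalidated, so a later read
# there misses and refetches the fresh value from memory.
class Cache:
    def __init__(self):
        self.lines = {}                    # address -> cached value

memory = {0x10: 5}
caches = [Cache(), Cache()]                # two processors, one cache each

def read(cpu, addr):
    if addr not in caches[cpu].lines:      # miss: fill from memory
        caches[cpu].lines[addr] = memory[addr]
    return caches[cpu].lines[addr]

def write(cpu, addr, value):
    caches[cpu].lines[addr] = value
    memory[addr] = value                   # write-through, for simplicity
    for i, c in enumerate(caches):         # inform the other caches
        if i != cpu:
            c.lines.pop(addr, None)        # invalidate any stale copy

read(0, 0x10); read(1, 0x10)   # both caches now hold 5
write(0, 0x10, 7)              # CPU 0 writes; CPU 1's copy is invalidated
print(read(1, 0x10))           # 7, not the stale 5
```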


    Vector Computation

    Maths problems involving physical processes are difficult for computation

    Aerodynamics, seismology, meteorology, atomic and nuclear physics

    Continuous field simulation

    High precision repeated floating point calculations on large arrays of numbers

    Supercomputers handle these types of problem

    Hundreds of millions of floating point operations

    $10-15 million

    Optimised for calculation

    Limited market

    Research, government agencies, meteorology


    Another system designed for vector computation - Array processor

    Alternative to supercomputer

    Configured as peripherals to mainframe & minicomputers

    Just run vector portion of problems


    Vector Addition Example


    Processor Designs

    Pipelined ALU

    Decomposition of floating point operations into stages

    Different stages can operate on different sets of data in parallel

    Can be further enhanced if the vector elements are

    available in registers rather than from main memory

    Within operations

    Across operations


    Approaches to Vector Computation


    Chaining

    Cray Supercomputers

    Vector operation may start as soon as the first element of the operand vector is available and the functional unit is free

    Result from one functional unit is fed immediately into another

    If vector registers used, intermediate results do not

    have to be stored in memory


    Parallel ALUs

    Parallel processors

    break the task up into multiple processes to be executed

    in parallel

    effective only if the software and hardware support effective coordination of the parallel processors


    Operating System Support


    OS

    OS is a program that controls the execution of

    application programs and acts as an interface

    between the user and the hardware

    Manages the computer's resources,

    Provides services for programmers, and

    Schedules the execution of other programs.


    Objectives and Functions

    Convenience

    Making the computer easier to use

    Efficiency

    Allowing better use of computer resources


    Layers and Views of a Computer System


    Operating System Services

    Program creation

    Program execution

    Access to I/O devices

    Controlled access to files

    System access

    Error detection and response

    Accounting


    O/S as a Resource Manager


    Types of Operating System

    Interactive

    Batch

    Single program (Uni-programming)

    Multi-programming (Multi-tasking)


    Early Systems

    Late 1940s to mid 1950s

    No Operating System

    Programs interact directly with hardware

    Two main problems:

    Scheduling

    Setup time


    Simple Batch Systems

    Resident Monitor program

    Users submit jobs to operator who batches jobs

    Monitor controls sequence of events to process batch

    When one job is finished, control returns to Monitor which reads next job

    Monitor handles scheduling


    Memory Layout for Resident Monitor


    Desirable Hardware Features

    Memory protection

    To protect the Monitor

    Timer

    To prevent a job monopolising the system

    Privileged instructions

    Only executed by Monitor

    e.g. I/O

    Interrupts

    Allows regaining control from user program


    Multi-programmed Batch Systems

    I/O devices very slow

    When one program is waiting for I/O, another

    can use the CPU


    Single Program


    Multi-Programming with

    Two Programs


    Multi-Programming with

    Three Programs


    Time Sharing Systems

    Allow users to interact directly with the

    computer

    i.e. Interactive

    Multi-programming allows a number of users

    to interact with the computer


    Scheduling

    Key to multi-programming

    Types

    Long term

    Medium term

    Short term

    I/O


    Long Term Scheduling

    Determines which programs are submitted for

    processing

    i.e. controls the degree of multi-programming

    Once submitted, a job becomes a process for the

    short term scheduler


    Medium Term Scheduling

    Part of the swapping function

    Usually based on the need to manage multi-programming

    If no virtual memory, memory management is also

    an issue


    Short Term Scheduling

    Also known as Dispatcher

    Fine grained decisions of which job to execute next

    Which job actually gets to use the processor in the

    next time slot
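The dispatcher's job above, deciding which ready job gets the processor for the next time slot, can be sketched as a simple round-robin pick; the job names, service times and quantum here are hypothetical:

```python
from collections import deque

# Minimal round-robin dispatcher sketch: each ready job runs for one
# quantum, then goes to the back of the queue if it still needs time.
def dispatch(jobs, quantum):
    ready = deque(jobs.items())              # (name, remaining time)
    order = []                               # who got each time slot
    while ready:
        name, remaining = ready.popleft()
        order.append(name)                   # job runs for one slot
        remaining -= quantum
        if remaining > 0:
            ready.append((name, remaining))  # not finished: requeue
    return order

print(dispatch({"A": 3, "B": 1, "C": 2}, quantum=1))
# ['A', 'B', 'C', 'A', 'C', 'A']
```

A real short-term scheduler also weighs priorities, I/O waits and fairness; round-robin is just the simplest policy that shows the slot-by-slot decision.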


    Five State Process Model


    PCB Diagram


    Scheduling Example


    Key Elements involved in scheduling


    Process Scheduling


    Reduced Instruction Set Computers


    Major Advances in Computers (1)

    The family concept

    IBM System/360 in 1964

    DEC PDP-8

    Microprogrammed control unit

    Idea by Wilkes in 1951

    Produced by IBM S/360 in 1964

    Cache memory

    IBM S/360 model 85 in 1969

    Pipelining

    Introduces parallelism into fetch-execute cycle

    Multiple processors


    The Next Step - RISC

    Reduced Instruction Set Computer

    Key features

    Large number of general purpose registers

    Limited and simple instruction set

    Emphasis on optimising the instruction pipeline


    Comparison of processors


    Driving force for CISC

    Software costs far exceed hardware costs

    Increasingly complex high level languages

    Semantic gap

    difference between the operations provided in HLLs and

    those provided in computer architecture.

    Leads to:

    Large instruction sets

    More addressing modes

    e.g. CASE (switch) machine instruction on the VAX


    Execution Characteristics

    Studies have been done to determine the characteristics of execution of machine instructions generated from HLL programs

    Different approach: namely, to make the

    architecture that supports the HLL simpler

    Operations performed

    Operands used

    Execution sequencing


    Operations

    Assignments predominate

    Movement of data is of high importance

    Conditional statements (IF, LOOP)

    Sequence control

    Implemented in machine language

    Procedure call-return is very time consuming



    Operands

    Mainly local scalar variables

    Optimisation should concentrate on accessing local variables


    Procedure Calls

    Very time consuming

    Depends on number of parameters passed

    Depends on level of nesting

    Most programs do not do a lot of calls followed by lots of returns


    Implications

    Attempting to make the instruction set architecture close to HLLs is not the most effective approach

    Best support is given by optimising most used and

    most time consuming features

    Large number of registers

    Careful design of pipelines

    Simplified (reduced) instruction set


    Why CISC (1)?

    Compiler simplification?

    Complex machine instructions harder to exploit

    Optimization more difficult

    Smaller programs?

    Program takes up less memory but

    Memory is now cheap

    May not occupy fewer bits, just look shorter in symbolic form


    Why CISC (2)?

    Instruction execution would be faster?

    More complex control unit

    Microprogram control store larger

    It is far from clear that a trend to increasingly

    complex instruction sets is appropriate


    RISC Characteristics

    One instruction per cycle

    Register to register operations

    Few, simple addressing modes

    Few, simple instruction formats


    RISC v CISC

    Not clear cut

    Many designs borrow from both philosophies

    e.g. PowerPC and Pentium II


    RISC Pipelining

    Most instructions are register to register

    Two phases of execution

    I: Instruction fetch

    E: Execute

    ALU operation with register input and output

    For load and store

    I: Instruction fetch

    E: Execute

    Calculate memory address

    D: Memory

    Register to memory or memory to register operation

    Effects of Pipelining


    Optimization of Pipelining

    Delayed branch

    Makes use of a branch that does not take effect until after execution of the following instruction

    Delayed Load

    Register to be target is locked by processor

    Continue execution of instruction stream until register required

    Idle until load complete

    Re-arranging instructions can allow useful work

    Loop Unrolling

    Replicate body of loop a number of times

    Reduces loop overhead

    Increases instruction parallelism

    Improved register, data cache or TLB locality
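Loop unrolling as described above can be shown side by side. The saxpy kernel here is just an illustrative example, and a real compiler unrolls at the instruction level rather than in source; the point is that the unrolled body pays the loop overhead (test and increment) half as often and exposes independent operations:

```python
# Rolled form: one element per iteration.
def saxpy_rolled(a, x, y):
    out = [0.0] * len(x)
    for i in range(len(x)):
        out[i] = a * x[i] + y[i]
    return out

# Unrolled by 2: same work, half as many loop-control steps, and the
# two statements in the body are independent of each other.
def saxpy_unrolled_by_2(a, x, y):
    assert len(x) % 2 == 0                    # assume even length, for brevity
    out = [0.0] * len(x)
    for i in range(0, len(x), 2):
        out[i] = a * x[i] + y[i]              # iteration i
        out[i + 1] = a * x[i + 1] + y[i + 1]  # iteration i+1, same pass
    return out

x, y = [1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]
print(saxpy_unrolled_by_2(2.0, x, y) == saxpy_rolled(2.0, x, y))  # True
```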

    Delayed branch

    Use of Delayed Branch


    Controversy

    Quantitative

    compare program sizes and execution speeds

    Qualitative

    examine issues of high level language support

    Problems

    No pair of RISC and CISC that are directly comparable

    No definitive set of test programs

    Most comparisons done on toy machines rather than production machines

    Most commercial devices are a mixture


    Control Unit Operation

    Micro-Operations

    A computer executes a program

    Fetch/execute cycle

    Each cycle has a number of steps

    pipelining

    Called micro-operations

    Each step does very little


    Constituent Elements of Program Execution

    Fetch - 4 Registers

    Memory Address Register (MAR)

    Connected to address bus

    Specifies address for read or write op

    Memory Buffer Register (MBR)

    Connected to data bus

    Holds data to write or last data read

    Program Counter (PC)

    Holds address of next instruction to be fetched

    Instruction Register (IR)

    Holds last instruction fetched

    Fetch Sequence

    Address of next instruction is in PC and it is moved to MAR

    Control unit issues READ command

    Result (data from memory) appears on data bus

    Data from data bus copied into MBR

    PC incremented by 1 (in parallel with data fetch from memory)

    Data (instruction) moved from MBR to IR
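The fetch steps above can be sketched as micro-operations on a toy register set. The register names follow the slides; the memory contents and addresses are hypothetical:

```python
# Fetch cycle as micro-operations on a toy register set.
def fetch(regs, memory):
    regs["MAR"] = regs["PC"]           # t1: MAR <- (PC)
    regs["MBR"] = memory[regs["MAR"]]  # t2: MBR <- memory[MAR] (READ)
    regs["PC"] += 1                    #     PC  <- (PC) + 1, same time unit
    regs["IR"] = regs["MBR"]           # t3: IR  <- (MBR)
    return regs

regs = {"PC": 100, "MAR": 0, "MBR": 0, "IR": 0}
memory = {100: "ADD R1,X"}             # pretend this word holds an instruction
fetch(regs, memory)
print(regs["IR"], regs["PC"])          # ADD R1,X 101
```

The grouping mirrors the symbolic fetch sequence: three time units, four micro-operations, with the PC increment sharing the second time unit with the memory read.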

    Fetch Sequence (symbolic)

    Consists of three steps and four micro-operations

    Second and third micro-operations both take place during the second time unit

    Rules for groupings of micro-operations

    Proper sequence must be followed

    MBR contains an address

    IR is now in same state as if direct addressing had been used

    Interrupt Cycle

    This is a minimum

    May be additional micro-ops to get addresses

    Saving context is done by interrupt handler routine, not

    micro-ops

    Execute Cycle (ADD)

    Different for each instruction

    e.g. ADD R1,X - add the contents of location X to Register 1, result in R1
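The execute micro-operations for this ADD can be sketched on the same kind of toy register set (the address of X and the values are hypothetical):

```python
# Execute cycle for ADD R1,X as micro-operations:
# t1: MAR <- IR(address)   t2: MBR <- memory[MAR]   t3: R1 <- (R1) + (MBR)
def execute_add(regs, memory, x_addr):
    regs["MAR"] = x_addr                    # t1: address field of IR to MAR
    regs["MBR"] = memory[regs["MAR"]]       # t2: read operand into MBR
    regs["R1"] = regs["R1"] + regs["MBR"]   # t3: ALU add, result in R1
    return regs

regs = {"R1": 4, "MAR": 0, "MBR": 0}
memory = {0x20: 6}                          # location X holds 6
print(execute_add(regs, memory, 0x20)["R1"])  # 10
```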

    Instruction Cycle

    Each phase decomposed into sequence of elementary micro-operations

    E.g. fetch, indirect, and interrupt cycles

    Assume new 2-bit register

    Instruction cycle code (ICC) designates which part of cycle processor is in

    00: Fetch

    01: Indirect

    10: Execute

    11: Interrupt
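The ICC can be sketched as a small transition function. This is a simplification of the flowchart that follows; only the minimal transitions are shown:

```python
# The 2-bit instruction cycle code as a lookup plus a transition rule.
ICC = {0b00: "fetch", 0b01: "indirect", 0b10: "execute", 0b11: "interrupt"}

def next_icc(icc, needs_indirect=False, pending_interrupt=False):
    # Simplified transitions: fetch -> indirect or execute;
    # indirect -> execute; execute -> interrupt or fetch; interrupt -> fetch.
    if icc == 0b00:
        return 0b01 if needs_indirect else 0b10
    if icc == 0b01:
        return 0b10
    if icc == 0b10:
        return 0b11 if pending_interrupt else 0b00
    return 0b00

print(ICC[next_icc(0b00, needs_indirect=True)])  # indirect
```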

    Flowchart for Instruction Cycle

    Functional Requirements

    Define basic elements of processor

    Describe micro-operations processor performs

    Determine the functions that the control unit

    must perform to cause the micro-operations

    to be performed


    Basic Elements of Processor

    ALU

    Registers

    Internal data paths

    External data paths

    Control Unit


    Types of Micro-operation

    Transfer data between registers

    Transfer data from register to external interface

    Transfer data from external interface to register

    Perform arithmetic or logical operations


    Functions of Control Unit

    Sequencing

    Causing the CPU to step through a series of micro-operations

    Execution

    Causing the performance of each micro-op

    This is done using Control Signals

    Control Signals - input

    Clock

    This is how the control unit keeps time.

    Instruction register

    Op-code for current instruction

    Determines which micro-instructions are performed

    Flags

    Status of CPU

    Results of previous ALU operations

    Control signals from control bus

    Interrupts

    Acknowledgements

    Model of Control Unit

    Control Signals - output


    Within CPU

    Cause data movement

    Activate specific ALU functions

    To control bus

    To memory

    To I/O modules


    Implementation

    Two categories:

    Hardwired implementation

    Microprogrammed implementation

    In a hardwired implementation, the control unit is essentially a combinational circuit.

    Input logic signals are transformed into a set

    of output logic signals, which are the control

    signals


    CPU Structure and Function


    CPU Structure

    CPU must:

    Fetch instructions

    Interpret instructions

    Fetch data

    Process data

    Write data

    CPU With Systems Bus


    CPU Internal Structure


    Registers

    CPU must have some working space (temporary

    storage) called registers

    Number and function vary between processor designs

    Top level of memory hierarchy

    Perform two roles:

    User-visible registers

    Control and status registers


    User Visible Registers

    General Purpose

    Data

    Address

    Condition Codes (Flags)


    User Visible Registers

    May be true general purpose

    May be restricted

    May be used for data or addressing

    Data

    Accumulator

    Addressing

    Segment registers

    Index registers

    Stack pointer


    How Many GP Registers?

    Between 8 and 32

    Fewer = more memory references

    More does not reduce memory references

    How big?

    Large enough to hold full address

    Large enough to hold full word

    Often possible to combine two data registers


    Condition Code Registers

    Sets of individual bits

    e.g. result of last operation was zero

    Can be read (implicitly) by programs

    e.g. Jump if zero

    Cannot be set by programs
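The set-implicitly/read-implicitly behaviour can be sketched as follows (the flag names Z and S and the toy ALU are illustrative, not any particular machine's):

```python
# Condition codes: set as a side effect of an ALU operation,
# read (never written) by a conditional jump.
def alu_sub(a, b, flags):
    result = a - b
    flags["Z"] = (result == 0)     # zero flag
    flags["S"] = (result < 0)      # sign flag
    return result

flags = {}
alu_sub(5, 5, flags)               # result is 0, so Z is set
taken = flags["Z"]                 # "jump if zero" only tests Z
print(taken)                       # True
```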


    Control & Status Registers

    Program Counter

    Instruction Decoding Register

    Memory Address Register

    Memory Buffer Register


    Program Status Word

    A set of bits

    Includes Condition Codes

    Sign

    Zero

    Carry

    Equal

    Overflow

    Interrupt enable/disable

    Supervisor


    Other Registers

    May have registers pointing to:

    Process control blocks

    Interrupt Vectors

    Example Register Organizations


    Indirect Cycle

    May require memory access to fetch operands

    Indirect addressing requires more memory

    accesses

    Can be thought of as an additional instruction subcycle

    Data Flow (Instruction Fetch)

    Depends on CPU design

    Fetch

    PC contains address of next instruction

    Address moved to MAR

    Address placed on address bus

    Control unit requests memory read

    Result placed on data bus, copied to MBR, then to IR

    Meanwhile PC incremented by 1

    Data Flow (Data Fetch)

    IR is examined

    If indirect addressing, indirect cycle is

    performed

    Rightmost N bits of MBR transferred to MAR

    Control unit requests memory read

    Result (address of operand) moved to MBR

    Data Flow (Fetch Diagram)

    Data Flow (Indirect Diagram)

    Data Flow (Interrupt Diagram)

    Pipelining

    Fetch instruction

    Decode instruction

    Calculate operands (i.e. EAs)

    Fetch operands

    Execute instructions

    Write result

    Overlap these operations


    Two Stage Instruction Pipeline

    Timing Diagram

    Effect of a Conditional Branch Instruction

    Alternative Pipeline Depiction
