William Stallings Computer Organization and Architecture Chapter 12 12 - Rev. 3.3 (2009-10) by Enrico Nardelli 1 CPU Structure and Function


Page 1: William Stallings Computer Organization and Architecturenardelli/architettura-calcolatori/Ch_12-rev-3... · William Stallings Computer Organization and Architecture Chapter 12 Rev.

William Stallings

Computer Organization and Architecture

Chapter 12

CPU Structure and Function

12 -Rev. 3.3 (2009-10) by Enrico Nardelli

CPU Functions

• CPU must:

  - Fetch instructions

  - Decode instructions

  - Fetch operands

  - Execute instructions / process data

  - Store data

  - Check for (and possibly service) interrupts

CPU Components

[Figure: CPU components. ALU, registers (PC, IR, MAR, MBR, AC) and control unit, connected by the CPU internal bus; the control unit issues control signals; data, address and control lines connect the CPU to the outside.]

Kinds of Registers

• User visible and modifiable

  - General purpose

  - Data (e.g. accumulator)

  - Address (e.g. base addressing, index addressing)

• Control registers (not visible to user)

  - Program Counter (PC)

  - Instruction Register (IR)

  - Memory Address Register (MAR)

  - Memory Buffer Register (MBR)

• State register (visible to user but not directly modifiable)

  - Program Status Word (PSW)

Kinds of General Purpose Registers

• May be used in a general way or be restricted to contain only data or only addresses

• General purpose registers:

  - Advantage: increased flexibility and programmer options

  - Disadvantage: increased instruction size and complexity

• Specialized (data/address) registers:

  - Advantage: smaller (faster) instructions

  - Disadvantage: less flexibility

How Many General Purpose Registers?

• Typically between 8 and 32

• Fewer = more memory references

• Beyond that range, more registers do not noticeably reduce memory references and take up processor real estate

How many bits per register?

• Large enough to hold full address value

• Large enough to hold full data value

• Often possible to combine two data registers to obtain a single register of double length

State Registers

• Sets of individual bits

  - e.g. record whether the result of the last operation was zero or not

• Can be read (implicitly) by programs

  - e.g. Jump if zero

• Cannot (usually) be set by programs

• There is always a Program Status Word (see later)

• Possibly (for operating system purposes):

  - Interrupt vectors

  - Memory page table (virtual memory)

  - Process control blocks (multitasking)

Program Status Word

• A set of bits, including condition code bits, giving the status of the program

  - Sign of last result

  - Zero

  - Carry

  - Equal

  - Overflow

  - Interrupt enable/disable

  - Supervisor mode (allows execution of privileged instructions)

• Supervisor mode is used by the operating system (not available to user programs)
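As a rough illustration of how some of the condition code bits could be derived, the sketch below computes sign, zero, carry and overflow for a fixed-width addition. The function name `add_with_flags` and the 8-bit default width are assumptions made for this example; they do not come from the slides.

```python
def add_with_flags(a, b, bits=8):
    """Add two unsigned `bits`-wide values and derive PSW-style condition codes.

    Hypothetical helper: the name and the 8-bit default are illustrative only.
    """
    mask = (1 << bits) - 1          # e.g. 0xFF for 8 bits
    sign_bit = 1 << (bits - 1)      # most significant bit
    a &= mask
    b &= mask
    raw = a + b                     # may exceed the register width
    result = raw & mask             # value actually kept in the register
    flags = {
        "sign": bool(result & sign_bit),   # result is negative in two's complement
        "zero": result == 0,
        "carry": raw > mask,               # carry out of the top bit
        # overflow: both operands share a sign that differs from the result's sign
        "overflow": bool(~(a ^ b) & (a ^ result) & sign_bit),
    }
    return result, flags
```

A conditional jump such as "jump if zero" would then only need to inspect `flags["zero"]`, which is the implicit read of the status bits mentioned on the previous slide.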


Example Register Organizations

[Figure not reproduced in this transcript.]

Instruction Cycle (with Interrupt)

[Figure: instruction cycle with interrupts, divided into fetch phase, execute phase and interrupt phase.]

Instruction Cycle (with Indirect Addressing)

[Figure not reproduced in this transcript.]

A closer look at the execution phase

• Execute

• Decode – Execute

• Decode – Fetch Operand – Execute

• Decode – Calculate Address – Fetch Operand – Execute

• Decode – Calculate Address – Fetch Address – Fetch Operand – Execute

• Decode – Calculate … – … – … Operand – Execute – Write Result

Instruction Cycle State Diagram (with Indirection)

[Figure not reproduced in this transcript.]

Data Flow for Instruction Fetch

• PC contains address of next instruction

• Sequence of actions needed to execute an instruction fetch:

  1. PC is moved to MAR

  2. MAR content is placed on address bus

  3. Control unit requests memory read

  4. Memory reads address bus and places result on data bus

  5. Data bus is copied to MBR

  6. MBR is copied to IR

  7. Meanwhile, PC is incremented by 1

• Action 7 can be executed in parallel with any other action after the first
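The seven actions can be traced in a few lines of code. This is only a toy model under stated assumptions: the class name `TinyCPU`, the list used as main memory and the word-addressed layout (one instruction per word) are all invented for illustration.

```python
class TinyCPU:
    """Toy model of the registers involved in an instruction fetch (illustrative only)."""

    def __init__(self, memory):
        self.memory = list(memory)  # main memory, one instruction per word (assumption)
        self.pc = 0
        self.mar = 0
        self.mbr = 0
        self.ir = 0

    def fetch(self):
        self.mar = self.pc                   # 1. PC is moved to MAR
        address_bus = self.mar               # 2. MAR content is placed on address bus
        data_bus = self.memory[address_bus]  # 3-4. memory read; result placed on data bus
        self.mbr = data_bus                  # 5. data bus is copied to MBR
        self.ir = self.mbr                   # 6. MBR is copied to IR
        self.pc += 1                         # 7. PC is incremented (could overlap actions 2-6)
        return self.ir
```

In this sequential sketch action 7 runs last, but since it only needs the PC value already copied in action 1, nothing prevents it from running alongside actions 2 to 6.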


Diagrams representing Data Flows

• The previous example shows 7 distinct actions, each corresponding to a DF (= data flow)

• Distinct DFs are not necessarily executed at distinct time steps (i.e. DF_n and DF_(n+1) might be executed during the same time step; see chapter 14)

• Large white arrows represent DFs with a true flow of data

• Large hatched arrows represent DFs where the flow of data acts as a control: only the most relevant controls are shown

Data Flow Diagram for Instruction Fetch

[Figure: the CPU (PC, MAR, MBR, IR, control unit) and the memory, connected through the address bus, data bus and control bus; arrows are numbered 1 to 7 after the actions of the instruction fetch sequence.]

MAR = Memory Address Register

MBR = Memory Buffer Register

IR = Instruction Register

PC = Program Counter

Data Flow for Data Fetch: Immediate and Register Addressing

• ALWAYS:

  - IR is examined to determine addressing mode

• Immediate addressing:

  - The operand is already in IR

• Register addressing:

  - Control unit requests read from register selected according to value in IR

Data Flow Diagram for Data Fetch with Register Addressing

[Figure: the CPU (IR, control unit, registers) with the data bus, address bus and control bus.]

Data Flow for Data Fetch: Direct Addressing

• Direct addressing:

  1. Address field is moved to MAR

  2. MAR content is placed on address bus

  3. Control unit requests memory read

  4. Memory reads address bus and places result on data bus

  5. Data bus (= operand) is copied to MBR
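A minimal sketch of the direct addressing sequence, assuming memory is modelled as a Python list indexed by address; the function name `fetch_operand_direct` is hypothetical.

```python
def fetch_operand_direct(memory, address_field):
    """Direct addressing: the address field of the instruction is the operand address."""
    mar = address_field             # 1. address field is moved to MAR
    address_bus = mar               # 2. MAR content is placed on address bus
    data_bus = memory[address_bus]  # 3-4. memory read; result placed on data bus
    mbr = data_bus                  # 5. data bus (= operand) is copied to MBR
    return mbr
```

Only one memory access is needed, which is what separates this mode from the indirect one.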


Data Flow Diagram for Data Fetch with Direct Addressing

[Figure: the CPU (IR, MAR, MBR, control unit) and the memory, connected through the address bus, data bus and control bus; arrows are numbered 1 to 5 after the actions of the direct addressing sequence.]

Data Flow for Data Fetch: Register Indirect Addressing

• Register indirect addressing:

  1. Control unit requests read from register selected according to value in IR

  2. Selected register value is moved to MAR

  3. MAR content is placed on address bus

  4. Control unit requests memory read

  5. Memory reads address bus and places result on data bus

  6. Data bus (= operand) is moved to MBR

Data Flow Diagram for Data Fetch with Register Indirect Addressing

[Figure: the CPU (IR, registers, MAR, MBR, control unit) and the memory, connected through the address bus, data bus and control bus; arrows are numbered 1 to 6 after the actions of the register indirect addressing sequence.]

Data Flow for Data Fetch: Indirect Addressing

• Indirect addressing:

  1. Address field is moved to MAR

  2. MAR content is placed on address bus

  3. Control unit requests memory read

  4. Memory reads address bus and places result on data bus

  5. Data bus (= address of operand) is moved to MBR

  6. MBR is transferred to MAR

  7. MAR content is placed on address bus

  8. Control unit requests memory read

  9. Memory reads address bus and places result on data bus

  10. Data bus (= operand) is copied to MBR
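The ten actions amount to two memory reads chained through MBR and MAR. The sketch below assumes a list-based memory; `memory_read` and `fetch_operand_indirect` are hypothetical names used only for this example.

```python
def memory_read(memory, mar):
    """One memory read cycle: MAR drives the address bus, memory answers on the data bus."""
    address_bus = mar
    data_bus = memory[address_bus]
    return data_bus


def fetch_operand_indirect(memory, address_field):
    """Indirect addressing: the address field points to a word holding the operand address."""
    mar = address_field             # 1. address field is moved to MAR
    mbr = memory_read(memory, mar)  # 2-5. first read: MBR = address of operand
    mar = mbr                       # 6. MBR is transferred to MAR
    mbr = memory_read(memory, mar)  # 7-10. second read: MBR = operand
    return mbr
```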


Data Flow Diagram for Data Fetch with Indirect Addressing

[Figure: the CPU (IR, MAR, MBR, control unit) and the memory, connected through the address bus, data bus and control bus; arrows are numbered 1 to 10 after the actions of the indirect addressing sequence, with the two memory reads sharing the same arrows (2 - 7, 3 - 8, 4 - 9, 5 - 10).]

Data Flow for Data Fetch: Relative Addressing

• Relative addressing (a form of displacement):

  1. Address field is moved to ALU

  2. PC is moved to ALU

  3. Control unit requests the sum from the ALU

  4. Result from ALU is moved to MAR

  5. MAR content is placed on address bus

  6. Control unit requests memory read

  7. Memory reads address bus and places result on data bus

  8. Data bus (= operand) is copied to MBR
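In code, relative addressing is one ALU addition followed by an ordinary memory read. This sketch assumes a list-based memory and a displacement that may be negative; the function name and the explicit range check are additions made for the example.

```python
def fetch_operand_relative(memory, pc, address_field):
    """Relative addressing: effective address = PC + address field (displacement)."""
    alu_a = address_field       # 1. address field is moved to ALU
    alu_b = pc                  # 2. PC is moved to ALU
    alu_out = alu_a + alu_b     # 3. control unit requests the sum
    mar = alu_out               # 4. result from ALU is moved to MAR
    # Guard added for the sketch: a negative index would silently wrap in Python.
    if not 0 <= mar < len(memory):
        raise IndexError("effective address out of range")
    mbr = memory[mar]           # 5-8. memory read; operand copied to MBR
    return mbr
```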


Data Flow Diagram for Data Fetch with Relative Addressing

[Figure: the CPU (PC, IR, ALU, MAR, MBR, control unit) and the memory, connected through the address bus, data bus and control bus; arrows are numbered 1 to 8 after the actions of the relative addressing sequence.]

ALU = Arithmetic Logic Unit

Data Flow for Data Fetch: Base Addressing

• Base addressing (a form of displacement):

  1. Control unit requests read from register selected according to value in IR (explicit selection)

  2. Selected register value is moved to ALU

  3. Address field is moved to ALU

  4. Control unit requests the sum from the ALU

  5. Result from ALU is moved to MAR

  6. MAR content is placed on address bus

  7. Control unit requests memory read

  8. Memory reads address bus and places result on data bus

  9. Result (= operand) is moved to MBR
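A short sketch of the base addressing sequence, assuming the register file and the memory are both Python lists; `fetch_operand_base` and its parameter names are hypothetical.

```python
def fetch_operand_base(memory, registers, reg_select, address_field):
    """Base addressing: effective address = selected register + address field."""
    reg_value = registers[reg_select]  # 1. register selected according to value in IR
    alu_a = reg_value                  # 2. selected register value is moved to ALU
    alu_b = address_field              # 3. address field is moved to ALU
    mar = alu_a + alu_b                # 4-5. ALU sum is moved to MAR
    mbr = memory[mar]                  # 6-9. memory read; operand moved to MBR
    return mbr
```

The data flow only sees "register plus address field", so the same function would serve for any mode that forms its address this way.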


Data Flow Diagram for Data Fetch with Base Addressing

[Figure: the CPU (IR, registers, ALU, MAR, MBR, control unit) and the memory, connected through the address bus, data bus and control bus; arrows are numbered 1 to 9 after the actions of the base addressing sequence.]

Data Flow for Data Fetch: Indexed Addressing

• Same data flow as Base addressing

• Indexed addressing (a form of displacement):

  1. Control unit requests read from register selected according to value in IR (explicit selection)

  2. Selected register value is moved to ALU

  3. Address field is moved to ALU

  4. Control unit requests the sum from the ALU

  5. Result from ALU is moved to MAR

  6. MAR content is placed on address bus

  7. Control unit requests memory read

  8. Memory reads address bus and places result on data bus

  9. Result (= operand) is moved to MBR

Data Flow Diagram for Data Fetch with Indexed Addressing

• The diagram is the same as for Base addressing

Data Flow for Data Fetch with Indirection and Displacement

• Two different combinations of displacement and indirection (pre-index and post-index)

• See chapter 11 for the logical diagrams

• The data flow is a combination of what happens with the two techniques

• Try drawing the data flow diagrams yourself!

Data Flow for Execute

• May take many forms

• Depends on the actual instruction being executed

• May include

  - Memory read/write

  - Input/Output

  - Register transfers

  - ALU operations

Data Flow for Interrupt

• Current PC has to be saved (usually to the stack) to allow resumption after the interrupt, and execution has to continue at the interrupt handler routine

  1. Save the content of PC

     a. Content of PC is copied to MBR

     b. Address of a special memory location (e.g. the content of the stack pointer register) is loaded into MAR

     c. Contents of MAR and MBR are placed, respectively, on address and data bus

     d. Control unit requests memory write

     e. Memory reads address and data bus and stores to memory location

     f. Stack pointer register is updated (data flow not shown)

  2. PC is loaded with the address of the handling routine for the specific interrupt (usually by means of indirect addressing through the Interrupt Vector)

     a. The address of the interrupt vector entry for the specific interrupt is moved to MAR

     b. MAR content is placed on address bus

     c. Control unit requests memory read

     d. Memory reads address bus and places result on data bus

     e. Data bus is copied to MBR

     f. MBR is moved to PC

• Next instruction (first of the specific interrupt handler) can now be fetched
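The two halves of the interrupt data flow (save PC, then load the handler address) can be sketched as follows. Several details are assumptions made for the example and are not stated in the slides: the stack grows downward, the stack pointer designates the next free slot, and the interrupt vector is a table in memory starting at `vector_base` with one entry per interrupt number.

```python
def enter_interrupt(memory, pc, sp, vector_base, irq):
    """Return (new_pc, new_sp) after the interrupt data flow; mutates `memory`.

    Hypothetical helper: stack direction and vector layout are assumptions.
    """
    # 1. Save the content of PC
    mbr = pc                   # 1a. content of PC is copied to MBR
    mar = sp                   # 1b. stack pointer is loaded into MAR
    memory[mar] = mbr          # 1c-1e. memory write of MBR at address MAR
    sp -= 1                    # 1f. stack pointer is updated (downward stack assumed)

    # 2. Load PC with the address of the handler
    mar = vector_base + irq    # 2a. address of the vector entry is moved to MAR
    mbr = memory[mar]          # 2b-2e. memory read; data bus copied to MBR
    pc = mbr                   # 2f. MBR is moved to PC
    return pc, sp
```

Returning from the handler would reverse part 1: read the saved value back from the stack into PC.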


Data Flow Diagram for Interrupt

[Figure: the CPU (PC, registers, MAR, MBR, control unit) and the memory, connected through the address bus, data bus and control bus; arrows are labelled 1a to 1e and 2a to 2f after the actions of the interrupt sequence.]

Prefetch

• Fetch accesses main memory

• Execution usually does not access main memory

• CPU could fetch next instruction during the execution of current instruction

• Requires two sub-parts in CPU able to operate independently

• Called instruction prefetch

Improved Performance

• But performance is not doubled:

  - Fetch is usually shorter than execution (except for simple operations with many operands)

• Prefetch more than one instruction?

  - Any conditional jump or branch means that prefetched instructions may be useless

• Performance can be improved by adding more stages in instruction processing ...

  - … and more independent sub-parts in CPU

Instruction pipelining

• Similar to the use of an assembly line in a manufacturing plant

  - product goes through various stages of production

  - products at various stages can be worked on simultaneously

• In a pipeline, new inputs are accepted at one end before previously accepted inputs appear as outputs at the other end

12 -Rev. (2008-09) by Luciano Gualà 38

Pipelining

• Instruction cycle can be decomposed into elementary phases, for example:

  - FI: Fetch instruction

  - DI: Decode instruction

  - CO: Calculate operands (i.e. calculate effective addresses, EAs)

  - FO: Fetch operands

  - EI: Execute instruction

  - WO: Write output

• Pipelining improves performance by overlapping these phases (ideally, all of them can be overlapped)

Timing of Pipeline

[Figure: pipeline timing diagram along the time axis, with the initial set-up time marked.]

Some remarks

• An instruction can have only some of the six phases

• We are assuming that all the phases can be performed in parallel

  - e.g., no bus conflicts, no memory conflicts…

• The maximum improvement is obtained when the phases take more or less the same time
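Under the idealized assumptions above (every phase takes one time unit, every instruction goes through all phases, no conflicts and no branches), a k-stage pipeline finishes n instructions in k + (n - 1) time units instead of n · k. The sketch below just evaluates these two expressions; the function names are hypothetical.

```python
def pipeline_cycles(n_instructions, n_stages=6):
    """Time units for an ideal pipeline: fill it once, then one instruction completes per unit."""
    if n_instructions < 1 or n_stages < 1:
        raise ValueError("need at least one instruction and one stage")
    return n_stages + (n_instructions - 1)


def ideal_speedup(n_instructions, n_stages=6):
    """Ratio between unpipelined time (n * k) and ideal pipelined time (k + n - 1)."""
    unpipelined = n_instructions * n_stages
    return unpipelined / pipeline_cycles(n_instructions, n_stages)
```

For long instruction sequences the ratio approaches the number of stages, which is why unequal phase durations or conflicts, which break the one-unit-per-phase assumption, reduce the benefit.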


A general principle

• The more overlapping phases there are in a pipeline, the more additional processing is needed to manage each phase and the synchronization among phases

  - Logical dependencies between phases

• There is a trade-off between number of phases and speed-up of instruction execution

Control flow (1)

[Figure not reproduced in this transcript.]

Branch in a Pipeline (1)

[Figure: pipeline timing diagram along the time axis, with the branch penalty marked.]

• Instruction 3 is a conditional branch to instruction 15

Control flow (2)

• But an unconditional branch might be managed earlier than the EI phase

[Figure not reproduced in this transcript.]

Branch in a Pipeline (2)

• The unconditional branch is managed after CO phase

[Figure: pipeline timing diagram along the time axis, with the branch penalty marked.]

Control flow (3)

• But conditional branches still have a large penalty

[Figure not reproduced in this transcript.]

Branch in a Pipeline (3)

[Figure: pipeline timing diagram along the time axis, with the branch penalty marked.]

• Here instruction 3 is a conditional branch to instruction 15

Dealing with Branches

• Prefetch Branch Target

• Multiple Streams

• Loop buffer


• Branch prediction

• Delayed branching


Prefetch Branch Target

• Target of branch is prefetched, in addition to instruction following branch, and stored in an additional dedicated register

• Keep target until branch is executed


• Used by IBM 360/91


Multiple Streams

• Have two pipelines

• Prefetch each branch into a separate pipeline

• Use appropriate pipeline


• Leads to bus & register contention (only the sub-parts making up the pipeline are doubled)

• Additional branches entering the pipeline lead to further pipelines being needed


Loop Buffer

• Very fast memory internal to CPU

• Record the last n fetched instructions

• Maintained by fetch stage of pipeline


• Check loop buffer before fetching from memory

• Very good for small loops or close jumps

• The same concept as cache memory

• Used by CRAY-1
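The idea of checking a small, fast buffer before going to memory can be sketched as below. This is a simplified software model, not a description of any real machine: the class name `LoopBuffer`, the oldest-first replacement policy and the `memory_fetches` counter are assumptions added for illustration.

```python
from collections import OrderedDict


class LoopBuffer:
    """Keeps the last `size` fetched instructions, keyed by address (illustrative model)."""

    def __init__(self, memory, size=4):
        self.memory = memory
        self.size = size
        self.buffer = OrderedDict()  # address -> instruction, oldest entry first
        self.memory_fetches = 0      # how often main memory had to be accessed

    def fetch(self, address):
        if address in self.buffer:   # check loop buffer before fetching from memory
            return self.buffer[address]
        self.memory_fetches += 1
        instruction = self.memory[address]
        self.buffer[address] = instruction
        if len(self.buffer) > self.size:
            self.buffer.popitem(last=False)  # drop the oldest recorded instruction
        return instruction
```

A loop whose body fits in the buffer touches main memory only during its first iteration, which is the "very good for small loops" case.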


Branch Prediction (1)

• Predict never taken

  - Assume that jump will not happen

  - Always fetch next instruction

  - 68020 & VAX 11/780

  - VAX will not prefetch after branch if a page fault would result (O/S v CPU design)

• Predict always taken

  - Assume that jump will happen

  - Always fetch target instruction

Branch Prediction (2)

• Predict by Opcode

  - Some instructions are more likely to result in a jump than others

  - Can get up to 75% success

• Use a Taken/Not taken switch

  - Based on previous history of the instruction

  - Good for loops
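One common way to realize a history-based taken/not-taken switch is a two-bit saturating counter per branch instruction. The sketch below is offered under that assumption and is not necessarily the exact scheme of the slides' state diagram; the names `TwoBitPredictor` and `count_correct` are hypothetical.

```python
class TwoBitPredictor:
    """Two-bit saturating counter: 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self):
        self.counter = 0  # start in "strongly not taken" (arbitrary choice)

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def count_correct(outcomes):
    """Run a fresh predictor over a sequence of branch outcomes and count correct guesses."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct
```

With two bits of history, the single not-taken outcome at the end of a loop does not flip the prediction, so the next execution of the same loop starts with a correct "taken" guess.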


Branch Prediction State Diagram

[Figure not reproduced in this transcript.]

Delayed Branch

• Do not take jump until you have to

• Rearrange instructions

• Used for RISC (Reduced Instruction Set Computer) architectures