Out of Order (OoO) Execution

Page 1:

EE457

Out of Order (OoO) Execution

Introduction to Dynamic Scheduling of Instructions (The Tomasulo Algorithm)

By Gandhi Puvvada

Page 2:

References

• EE557 Textbook

• Prof. Dubois' EE557 Classnotes

• Prof. Annavaram's slides

• Prof. Patterson's Lecture slides

Page 3:

Programs often have several small fragments of code, which can be executed in any order.

Page 4:

OoO (Out of Order) execution

Io = In order

"Execution" here means producing the results. Completion means committing results (writing into register file or memory).

IoI (IoD) OoE IoC: In order Issue/Dispatch, Out of order Execution, and finally In order completion/commitment

Page 5:

IoC or OoC?

IoI (IoD) OoE IoC: IoC (In order completion) is necessary to support exceptions (ex: page fault).

Here we present first IoI (IoD) OoE OoC and then (at the end) IoI (IoD) OoE IoC

Page 6:

OoC? But branches ..

OoC? Hope you are not executing instructions beyond a branch and committing them!

Well, we dispatch a branch, suspend dispatching, and wait until the branch is resolved. Then we resume dispatching instructions beyond the branch at either the fall-through area or at the target area.

Page 7:

Instruction Scheduling (Re-ordering of instructions)

• Basic block = a straight-line code sequence with no branches.

• Compiler can perform static instruction scheduling.

• Tomasulo Algorithm lets us schedule instructions dynamically (in hardware).

• Branch prediction and speculative execution beyond a branch (of course with ability to flush wrong-path instructions on misprediction) will be covered later (and implemented on FPGA in EE560).

Page 8:

Register renaming to allow later instructions to proceed

Original code:

lw $8, 40($2);
add $8, $8, $8;
sw $8, 40($2);
lw $8, 60($3);
add $8, $8, $8;
sw $8, 60($3);

With the second part renamed:

lw $8, 40($2);
add $8, $8, $8;
sw $8, 40($2);
lw $48, 60($3);
add $48, $48, $48;
sw $48, 60($3);

Page 9:

Static Scheduling (based on Prof. Dubois' slide)

• Strengths
-- Hardware simplicity
-- Compiler has a global view of the code (does not help the hardware much)

• Weaknesses
-- can not be CPU-implementation specific
-- can not foresee dynamic events
   -- cache misses
   -- data-dependent delays
   -- conditional branches
-- can only reschedule instructions in a basic block (basic block = a straight-line code sequence with no branches)
-- can not pre-compute memory addresses

Page 10:

Page 11:

Simple 5-stage pipeline

In-order execution

RAW dependency: Solve it by forwarding; if not, by stalling

Dependent instructions are stalled in the ID stage

[Figure: pipeline stages IF, ID, EX, M, WB, with IM and DM]

Page 12:

Simple 5-stage pipeline: Dependent instructions are stalled in the ID stage

[Figure labels: and, lw]

Page 13:

Simple 5-stage pipeline: Dependent instructions can not be stalled in the EX stage. Why?

[Figure labels: and, lw]

Page 14:

Provide multiple functional units (for simplicity, we avoid talking about floating point execution unit and floating point register file)

Stall, after decoding, in queues

[Figure: IF (IM), ID, Queues and Functional units (Integer, Multiply, Divide, Load/Store with DM), WB]

Page 15:

Why do junior instructions carry their source register IDs into the EX stage? Well, they need to get help from Senior #1 or Senior #2 in the EX stage under the control of the FU. No more of that. There may be 40 seniors in front of you. So I, the dispatch unit, will tell you from which senior you need to get help for which source register.

[Figure label: rs, rt (IDs) are carried into EX]

Page 16:

Tomasulo's plan

• OoO: Out of order execution

• Multiple functional units (say, Integer, DM, Multiplier, Divider)

• Queues between ID and EX stages (in place of ID/EX register)

Page 17:

Out of order execution ?! Problems all over ??!!

• For the time being, no branch prediction and no speculative execution beyond branches; just stall on a conditional branch

• No support for precise exceptions for the time being

Even then, …

Page 18:

RAW, WAR, and WAW

RAW = Read After Write
lw $8, 40($2);
add $9, $8, $7;

WAR = Write after Read
add $9, $8, $6;
lw $8, 40($2);

WAW = Write after Write
add $9, $8, $6;
lw $9, 40($2);

WAW? How is it possible? Why would anyone produce some result in $9 and, without utilizing that result, overwrite it with another result?

How is it possible? Consider a printer or a FIFO

Page 19:

WAW can easily occur!

WAW? How is it possible? In out of order execution, instructions before the branch and instructions after the branch can co-exist. For example, multiple iterations of this loop can coexist in the execution area. So, what?

Loop: LW $2, 40($1);
MULT $4, $2, $3;
SW $4, 40($1);
ADDI $1, $1, -4;
BNE $1, $0, Loop;

Page 20:

Say a company gives a standard bonus to most of the employees and a higher bonus to the managers. So you load into $3 the standard bonus from the stdbonus location in memory. Then you check to see if it is the case of a manager, and if so load into $3 again (overwriting the earlier $3) the special bonus from the special location in memory.

LW $3, stdbonus($0)
BNE $1, $2, SKIP
LW $3, special($0)

Page 21:

RAW, WAR, and WAW (some terminology to remember)

RAW = Read After Write: a true dependency
lw $8, 40($2);
add $9, $8, $7;

WAR = Write after Read: an anti-dependency (a name dependence)
add $9, $8, $6;
lw $8, 40($2);

WAW = Write after Write: an output dependency (a name dependence)
add $9, $8, $6;
lw $9, 40($2);
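The three definitions above can be checked mechanically for any senior/junior pair of instructions. The following is a minimal Python sketch, not part of the slides: the `Instr` record and the `hazards` function are hypothetical names introduced here, and each instruction is reduced to one destination register and a tuple of source registers.

```python
from typing import NamedTuple, Optional, Set, Tuple


class Instr(NamedTuple):
    text: str                 # assembly text, for display only
    dest: Optional[str]       # register written, or None (e.g. sw, bne)
    srcs: Tuple[str, ...]     # registers read


def hazards(senior: Instr, junior: Instr) -> Set[str]:
    """Return the dependences of `junior` on the earlier `senior`."""
    found = set()
    if senior.dest is not None and senior.dest in junior.srcs:
        found.add("RAW")      # true dependency: junior reads what senior writes
    if junior.dest is not None and junior.dest in senior.srcs:
        found.add("WAR")      # anti-dependency: junior overwrites a senior source
    if junior.dest is not None and junior.dest == senior.dest:
        found.add("WAW")      # output dependency: both write the same register
    return found


# The three instruction pairs from the slide.
raw = hazards(Instr("lw $8, 40($2)", "$8", ("$2",)),
              Instr("add $9, $8, $7", "$9", ("$8", "$7")))
war = hazards(Instr("add $9, $8, $6", "$9", ("$8", "$6")),
              Instr("lw $8, 40($2)", "$8", ("$2",)))
waw = hazards(Instr("add $9, $8, $6", "$9", ("$8", "$6")),
              Instr("lw $9, 40($2)", "$9", ("$2",)))
print(raw, war, waw)
```

Only register operands are compared here; dependences through memory locations are a separate matter, taken up under memory disambiguation later in the deck.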

Page 22:

RAW, WAR, and WAW

• In-order execution: We need to deal with RAW only.

• Out of order execution: Now we need to deal with WAR and WAW besides RAW.

Page 23:

Page 24:

Limited Architectural Registers, More Physical Registers

Register Renaming

lw $8, 40($2);
add $8, $8, $8;
sw $8, 40($2);

lw $8, 60($3);
add $8, $8, $8;
sw $8, 60($3);

It is clear that the compiler is using $8 as a temporary register. If there is a delay in obtaining $2, the first part of the code can not proceed. Unfortunately, the second part of the code can not proceed either, because of the name dependency for $8.

Page 25:

If we had 64 registers instead of 32 registers, then perhaps the compiler might have used $48 instead of $8 and we could have executed the second part of the code before the first part!

lw $8, 40($2);
add $8, $8, $8;
sw $8, 40($2);

lw $48, 60($3);
add $48, $48, $48;
sw $48, 60($3);

This is an example of name dependency.

Page 26:

Four different temporary registers can be used here as shown: $8, $18, $28, and $48 (or called with coded names, LION, TIGER, CAT, and ANT).

With register numbers:

lw $8, 40($2);
add $18, $8, $8;
sw $18, 40($2);
lw $28, 60($3);
add $48, $28, $28;
sw $48, 60($3);

With coded names:

lw LION, 40($2);
add TIGER, LION, LION;
sw TIGER, 40($2);
lw CAT, 60($3);
add ANT, CAT, CAT;
sw ANT, 60($3);

Page 27:

Can a later implementation provide 64 registers (instead of 32) while maintaining binary compatibility with previously compiled codes?

Answer: Yes / No

Why?

Page 28:

Answer: Can not change the number of Architectural Registers

Register Renaming Through Tagging Registers

This solves name dependency problems (WAR and WAW) while attending to true dependency (RAW) through waiting in queues.

Page 29:

square_root $2, $10;
lw $8, 40($2);
add $8, $8, $8;
sw $8, 40($2);
lw $8, 60($3);
add $8, $8, $8;
sw $8, 60($3);

[Figure: RST and RF side by side, each with entries $1 through $31; annotations mark the dependent destination and the dependent source]

RST = Register Status Table
RF = Register File


Page 33:

square_root $2, $10;
lw $8, 40($2);
add $8, $8, $8;
sw $8, 40($2);
lw $8, 60($3);
add $8, $8, $8;
sw $8, 60($3);

Dispatch unit decodes and dispatches instructions.

For destination operand, an instruction carries a TAG (but not the actual register name)!

For source operands, an instruction carries either the values or TAGs of the operands (but not the actual register names)!

Page 34:

Register Renaming

Page 35:

TAGs for destinations or sources or for both?

• A new tag is assigned to the destination register of the instruction being dispatched.

• For each of the source registers (source operands) of the instruction being dispatched, either the value of the source register (if it has not been previously tagged) or the existing tag associated with the source register (if it has been tagged already) is conveyed to the instruction.

• If a tag is conveyed for a source, then the instruction needs to wait for the original instruction with that destination tag to go on to the CDB and announce the value.
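The tagging rules above can be sketched in a few lines of Python. This is an illustrative model only, not the EE457 hardware: the `Dispatcher` class, its method names, and the dictionary layout are assumptions made here. Tags come from a simple counter rather than a TAG FIFO, and writing the register file on a CDB broadcast only while the RST still holds the broadcast tag is the usual Tomasulo convention, assumed rather than taken from the slides.

```python
from itertools import count


class Dispatcher:
    """Toy dispatch unit: RST (register -> pending tag) plus register file."""

    def __init__(self):
        self.rf = {f"${i}": 0 for i in range(32)}  # architectural register values
        self.rst = {}                              # only tagged registers appear here
        self._next_tag = count()                   # assumption: tags are never reused

    def dispatch(self, op, dest, srcs):
        # Sources are looked up BEFORE the destination is retagged,
        # so "add $8, $8, $8" picks up the tag of the earlier producer.
        operands = [("tag", self.rst[r]) if r in self.rst else ("val", self.rf[r])
                    for r in srcs]
        dest_tag = None
        if dest is not None:
            dest_tag = next(self._next_tag)        # new tag for the destination
            self.rst[dest] = dest_tag
        return {"op": op, "dest_tag": dest_tag, "operands": operands}

    def cdb_broadcast(self, tag, value):
        # Update the RF only if the RST still points at this tag (assumed rule).
        for reg in [r for r, t in self.rst.items() if t == tag]:
            self.rf[reg] = value
            del self.rst[reg]


d = Dispatcher()
d.rf["$2"], d.rf["$3"] = 100, 200
program = [
    ("lw", "$8", ["$2"]), ("add", "$8", ["$8", "$8"]), ("sw", None, ["$8", "$2"]),
    ("lw", "$8", ["$3"]), ("add", "$8", ["$8", "$8"]), ("sw", None, ["$8", "$3"]),
]
issued = [d.dispatch(*ins) for ins in program]
for entry in issued:
    print(entry)
```

In this run the second `lw` receives the value of `$3` and a fresh destination tag, so it depends on nothing from the first three instructions even though both halves name `$8`.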

Page 36:

Unique TAG

• Like SSN, we need a unique TAG

• SSNs are reused.

• Similarly TAGs can be reused.

• TAGs are similar to the number TOKENs.

Page 37:

Take a number vs. Take a token

Take a number: Helps to create a Virtual Queue. We do not need that here!

Take a token: In State Bank of India, the cashier issues brass tokens to customers trying to draw money as an identification (and not at all to put them in any virtual queue). Token numbers are in random order. The cashier verifies the signature in the records room, returns with the money, calls the token number, and issues the money. Tokens are reclaimed and reused.

Page 38:

TAGs (= Tokens)

• How many Tokens should the bank cashier have to start with?

• What happens if the tokens run out?

• Does he need to have any order in holding tokens and issuing tokens?

• Does he have to collect tokens back?


Page 40:

TAG FIFO (FIFOs are taught in EE560)

• To issue and collect Tokens (TAGs), use a circular FIFO (First-in-First-Out) unit. While the FIFO order is not important here, a FIFO is the easiest to implement in hardware compared to a random order in a pile.

• Filled with (say) 64 tokens (in any order) initially on reset.

• Tokens return out of order anyway.

• Put tokens back in the FIFO and issue.

[Figure: three snapshots of the circular FIFO (locations 0 to 63) showing the write pointer (wp) and read pointer (rp): Full, 2 tokens issued, 1 token returned]
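A circular FIFO of free tags can be modelled in software as a fixed array with a read pointer, a write pointer, and an occupancy count. The sketch below is a Python illustration under assumptions: the class and method names are invented here, and returning `None` when the FIFO is empty stands in for the dispatch unit stalling.

```python
class TagFifo:
    """Circular FIFO of free tags (depth 64, as in the slide's example)."""

    def __init__(self, depth=64):
        self.depth = depth
        self.slots = list(range(depth))  # filled with all tokens on reset
        self.rp = 0                      # read pointer: next tag to issue
        self.wp = 0                      # write pointer: slot for the next returned tag
        self.count = depth               # tags currently held (full on reset)

    def issue(self):
        """Hand out a free tag, or None if none are left (dispatch must stall)."""
        if self.count == 0:
            return None
        tag = self.slots[self.rp]
        self.rp = (self.rp + 1) % self.depth
        self.count -= 1
        return tag

    def give_back(self, tag):
        """Return a tag after its instruction has broadcast its result."""
        if self.count == self.depth:
            raise ValueError("FIFO already full: tag returned twice?")
        self.slots[self.wp] = tag
        self.wp = (self.wp + 1) % self.depth
        self.count += 1


fifo = TagFifo()
first = fifo.issue()     # full -> one tag out
second = fifo.issue()    # two tokens issued: rp has advanced to 2
fifo.give_back(second)   # one token returned (out of order): wp advances to 1
print(first, second, fifo.rp, fifo.wp, fifo.count)
```

Because returned tags are written wherever `wp` points, the order of tags inside the FIFO drifts away from numeric order over time, which is harmless since only uniqueness matters.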

Page 41:

Simplified Block Diagram for EE457, provided by Prof. Dubois

[Figure: block diagram including the TAG FIFO (locations 0 to 63), the Integer, Multiplier, and Divider units, and the Issue Unit]

CDB = Common Data Bus (compare it to a Public Announcing System)

Page 42:

IoI-OoE-OoC with RST Block Diagram

Adapted from Prof. Michel Dubois (Simplified for EE 457)

[Figure components: I-Cache, Instruc. Queue, Dispatch, Register Status Table, Reg. File, TAG FIFO, Int. Queue, L/S Queue, Div Queue, Mult. Queue, Issue Unit, Integer / Branch, D-Cache, Div, Mul, Load Buffer, CDB]

Page 43:

Front-End & Back-End

• IFQ: Instruction Fetch Queue (a FIFO structure)

• Dispatch unit (including RST, RF, Tag FIFO)

• Load Store and other Issue Queues

• Issue Unit

• Functional units

• CDB (Common Data Bus)

Page 44:

Page 45:

Bottleneck in the design

• CDB = Common Data Bus

Do all instructions use CDB?

• sw ?

• j (jump)?

• beq

Page 46:

Load store queue

• Address calculation

• Memory disambiguation

Mr. Bruin: Let me take a guess! You will now propose to have a MST (Memory Status Table) (like the RST). And you will rename memory locations to solve WAW and WAR problems among memory locations, right?!

Page 47:

MST (Memory Status Table)? No way! It is too big!

We will just ask the junior to stall and wait to solve his WAR and WAW problems with his seniors.

[Figure: MST alongside Memory (locations 0, 1, ...), compared with RST alongside RF (registers $1 through $31)]

Page 48:

Address calculation for lw and sw

EE557 approach for address calculation

EE457/560 approach for address calculation: Dedicated adder, to compute address, attached to the load-store queue.

Page 49:

Memory Disambiguation (EE557)

Page 50:

Memory Disambiguation

RAW
sw $2, 2000($0);
lw $8, 2000($0);

WAW
sw $2, 2000($0);
sw $8, 2000($0);

WAR
lw $2, 2000($0);
sw $8, 2000($0);


Page 52:

Memory Disambiguation

RAW
sw $2, 2000($0);
lw $8, 2000($0);
This later lw can proceed only if there is no store ahead of it with the same address.

WAW
sw $2, 2000($0);
sw $8, 2000($0);
This later sw can proceed only if there is no store ahead of it with the same address.

WAR
lw $2, 2000($0);
sw $8, 2000($0);
This later sw can proceed only if there is no load ahead of it with the same address.
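The three rules can be folded into one check over a load-store queue kept in program order. The following Python sketch is illustrative only: `MemOp` and `can_proceed` are hypothetical names, and treating a senior whose address is not yet computed (`None`) as a possible match is a conservative assumption added here, not something the slides state.

```python
from typing import List, NamedTuple, Optional


class MemOp(NamedTuple):
    kind: str              # "lw" or "sw"
    addr: Optional[int]    # effective address, or None if not yet computed


def can_proceed(queue: List[MemOp], index: int) -> bool:
    """May queue[index] access memory, given the seniors ahead of it?"""
    junior = queue[index]
    if junior.addr is None:
        return False                       # own address not known yet
    for senior in queue[:index]:           # queue is in program order
        same = senior.addr is None or senior.addr == junior.addr
        if not same:
            continue
        if senior.kind == "sw":
            return False                   # RAW (junior lw) or WAW (junior sw)
        if junior.kind == "sw":
            return False                   # WAR: senior lw, junior sw
    return True


raw_ok = can_proceed([MemOp("sw", 2000), MemOp("lw", 2000)], 1)
waw_ok = can_proceed([MemOp("sw", 2000), MemOp("sw", 2000)], 1)
war_ok = can_proceed([MemOp("lw", 2000), MemOp("sw", 2000)], 1)
other_ok = can_proceed([MemOp("sw", 2000), MemOp("lw", 2004)], 1)
print(raw_ok, waw_ok, war_ok, other_ok)
```

Two loads to the same address never block each other, which is why the queue only needs to stall a junior behind a conflicting store, or a junior store behind a conflicting load.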

Page 53:

Maintaining instructions in the order of arrival (issue order/program order) in a queue

Is it necessary or is it desirable?

In the case of L-S Queue?

In the case of Integer and other queues (mult queue, div queue)?

Page 54:

Maintaining instructions in the order of arrival (issue order/program order) in a queue

Is it necessary or is it desirable?

In the case of L-S Queue? NECESSARY, to enforce memory disambiguation rules.

In the case of Integer and other queues (mult queue, div queue)? DESIRABLE, so that an earlier instruction gets executed whenever possible, thereby perhaps reducing the number of instructions waiting on it.
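When a queue is kept in arrival order, giving priority to the earliest ready instruction reduces to scanning from the head. A minimal Python sketch follows; the function name and the dictionary-based queue entries are hypothetical, chosen here only for illustration.

```python
def select_oldest_ready(queue):
    """Return the position of the earliest-arrived ready entry, or None.

    `queue` is a list in arrival (program) order; each entry is a dict
    with a boolean "ready" flag meaning all source operands have values.
    """
    for pos, entry in enumerate(queue):
        if entry["ready"]:
            return pos
    return None


int_queue = [
    {"name": "add $9, $8, $7", "ready": False},   # still waiting on a tag for $8
    {"name": "addi $1, $1, -4", "ready": True},
    {"name": "add $5, $6, $7", "ready": True},
]
chosen = select_oldest_ready(int_queue)
print(chosen, int_queue[chosen]["name"])
```

Here the head entry is skipped because it is not ready, so execution is still out of order; the arrival order only breaks ties among ready instructions.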

Page 55:

Priority (based on the order of arrival) among instructions ready to execute

• Is it necessary or is it desirable?

• Local priority within the queues

• Global priority across the queues

Page 56:

Issue Unit

• CDB availability constraint

• Pipelined functional unit vs. Multi-cycle functional unit

• Conflict resolution: Round-robin priority adequate? Well, …

Page 57:

Conditional branches

• Dispatch unit stops dispatching until the branch is resolved.

• CDB broadcasts the result of the branch.

• Dispatching continues thereafter, either at the fall-through instruction or at the target instruction.

• A successful branch shall cause flushing of the IFQ, very much like a jump.

Page 58:

Conditional branches

• Since we stop dispatching instructions after a branch, does it mean that this branch is the last instruction to be executed in the back-end?

• Is it possible that the back-end holds simultaneously (a) some instructions dispatched before the branch and (b) some instructions issued after the branch was resolved?

Page 59:

Tomasulo Loop Example

Loop: LW $2, 40($1);
MULT $4, $2, $3;
SW $4, 40($1);
ADDI $1, $1, -4;
BNE $1, $0, Loop;

• Assume Multiply takes 4 clocks
• Assume first load takes 8 clocks (cache miss), second load takes 1 clock (hit)

Based on Prof. Annavaram's lecture slide

Page 60:

How could Tomasulo overlap iterations of loops?

Loop: LW $2, 40($1);
MULT $4, $2, $3;
SW $4, 40($1);
ADDI $1, $1, -4;
BNE $1, $0, Loop;

The destination registers bear different TAGs in different iterations. These tags were given in place of the source operands to the dependent instructions following them.

Page 61:

Say, only two iterations. Let us unroll the two iterations.

Loop: LW $2, 40($1);
MULT $4, $2, $3;
SW $4, 40($1);
ADDI $1, $1, -4;
BNE $1, $0, Loop;

Loop: LW $2, 40($1);
MULT $4, $2, $3;
SW $4, 40($1);
ADDI $1, $1, -4;
BNE $1, $0, Loop;

[Figure annotations: destination register; dependent source register(s)]


Page 63:

Because, there is no reorder buffer.

Note: Your EE560 project will use a reorder buffer and much more!