Anne Bracy CS 3410 · 2016. 3. 20. · lw4 20(2) nand 6 4 5 add 3 1 2 36 9 3 IF/ID ID/EX EX/MEM...

Anne Bracy CS 3410 Computer Science Cornell University See P&H Chapter: 4.5-4.8 The slides are the product of many rounds of teaching CS 3410 by Professors Weatherspoon, Bala, Bracy, McKee, and Sirer.


Single-cycle:
  insn0.fetch, dec, exec | insn1.fetch, dec, exec

Multi-cycle:
  insn0.fetch | insn0.dec | insn0.exec | insn1.fetch | insn1.dec | insn1.exec

Pipelined:
  insn0.fetch | insn0.dec   | insn0.exec
              | insn1.fetch | insn1.dec  | insn1.exec

5-stage Pipeline
• Implementation
• Working Example

Hazards
• Structural
• Data Hazards
• Control Hazards

[Figure: five-stage pipelined datapath. Stages: Instruction Fetch, Instruction Decode, Execute, Memory, Write-Back, separated by the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers. Components shown: PC, +4 adder, instruction memory, control, register file, extend, ALU, jump/branch target computation, data memory.]

Cycle   1   2   3   4   5   6   7   8   9
add     IF  ID  EX  MEM WB
nand        IF  ID  EX  MEM WB
lw              IF  ID  EX  MEM WB
add                 IF  ID  EX  MEM WB
sw                      IF  ID  EX  MEM WB

Latency: 5 cycles
Throughput: 1 insn/cycle (CPI = 1)
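The timing diagram above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `pipeline_diagram` is made up for this example, and it assumes an ideal pipeline with no hazards, so instruction i enters IF in cycle i + 1.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(insns):
    """Build one row per instruction for an ideal (hazard-free) pipeline.

    Returns (total_cycles, rows) where each row is (name, cells) and
    cells[c] holds the stage occupied in cycle c + 1 (or "" if idle).
    """
    total_cycles = len(insns) + len(STAGES) - 1
    rows = []
    for i, name in enumerate(insns):
        cells = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage  # instruction i is one cycle behind i - 1
        rows.append((name, cells))
    return total_cycles, rows

total, rows = pipeline_diagram(["add", "nand", "lw", "add", "sw"])
print("cycle " + " ".join(f"{c:>3}" for c in range(1, total + 1)))
for name, cells in rows:
    print(f"{name:<5} " + " ".join(f"{c:>3}" for c in cells))
print("total cycles:", total)
```

Each instruction still takes 5 cycles from fetch to write-back (latency), but once the pipeline is full one instruction completes per cycle, so N instructions need N + 4 cycles in this ideal case.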

• Break datapath into multiple cycles (here 5)
• Parallel execution increases throughput
• Balanced pipeline very important
  • Slowest stage determines clock rate
  • Imbalance kills performance
• Add pipeline registers (flip-flops) for isolation
  • Each stage begins by reading values from latch
  • Each stage ends by writing values to latch
• Resolve hazards
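To make "slowest stage determines clock rate" concrete, here is a small sketch. The stage delays and the 1 ns latch overhead are hypothetical numbers chosen for illustration, and `pipelined_clock_period` is a name invented for this example.

```python
def pipelined_clock_period(stage_delays_ns, latch_overhead_ns):
    """Clock period of a pipeline: the slowest stage plus the pipeline-register overhead."""
    return max(stage_delays_ns) + latch_overhead_ns

# Hypothetical 50 ns datapath split into 5 stages (assumed numbers).
balanced = [10.0, 10.0, 10.0, 10.0, 10.0]
unbalanced = [5.0, 10.0, 20.0, 10.0, 5.0]   # same total work, one slow stage

for name, delays in (("balanced", balanced), ("unbalanced", unbalanced)):
    period = pipelined_clock_period(delays, latch_overhead_ns=1.0)
    print(f"{name}: period = {period} ns, speedup over 50 ns = {50.0 / period:.2f}x")
```

Under these assumptions even the balanced split does not reach a 5x speedup, because the latch overhead is paid every cycle, and the unbalanced split loses more than half of the potential gain.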

Stage: Fetch
  Perform functionality: Use PC to index Program Memory, increment PC
  Latch values of interest: Instruction bits (to be decoded), PC+4 (to compute branch targets)

Stage: Decode
  Perform functionality: Decode instruction, generate control signals, read register file
  Latch values of interest: Control information, Rd index, immediates, offsets, register values (Ra, Rb), PC+4 (to compute branch targets)

Stage: Execute
  Perform functionality: Perform ALU operation. Compute targets (PC+4+offset, etc.) in case this is a branch, decide if branch taken
  Latch values of interest: Control information, Rd index, etc. Result of ALU operation, value in case this is a store instruction

Stage: Memory
  Perform functionality: Perform load/store if needed, address is ALU result
  Latch values of interest: Control information, Rd index, etc. Result of load, pass result from execute

Stage: Writeback
  Perform functionality: Select value, write to register file

Stage 1: Instruction Fetch
[Figure: the PC addresses instruction memory (mc 00 = read word); inst and PC+4 are latched into IF/ID. pcsel chooses the next PC.]
Next-PC sources:
• PC+4
• pcreg (PC registers: JR)
• pcrel (PC-relative: BEQ, BNE)
• pcabs (PC absolute: J and JAL)

Stage 2: Instruction Decode
[Figure: inst and PC+4 arrive from IF/ID; decode generates ctrl, the register file is read (Ra, Rb giving A, B), and the immediate is extended. ctrl, PC+4, A, B, and imm are latched into ID/EX. The register file write port (WE, Rd, D) receives result and dest from write-back.]

Stage 3: Execute
[Figure: values arrive from ID/EX (ctrl, PC+4, A, B, imm); the ALU computes the result, an adder computes the branch target, and branch? drives pcsel (pcrel, pcabs, pcreg). ctrl, D, and B are latched into EX/MEM.]

Stage 4: Memory
[Figure: values arrive from EX/MEM (ctrl, D, B); data memory is accessed through addr, din, dout, and mc. ctrl, D, and M are latched into MEM/WB.]

Stage 5: Write-Back
[Figure: values arrive from MEM/WB (ctrl, D, M); the selected result and dest go back to the register file.]

[Figure: complete five-stage pipelined datapath. PC, +4, and inst mem feed IF/ID (inst, PC+4); the register file (Ra, Rb, Rd, A, B, D) feeds ID/EX (OP, A, B, Rt, Rd, imm, PC+4); the ALU and target computation feed EX/MEM (OP, Rd, D, B); data mem (addr, din, dout) feeds MEM/WB (OP, Rd, D, M).]

• Instructions same length
  • 32 bits, easy to fetch and then decode
• 3 types of instruction formats
  • Easy to route bits between stages
  • Can read a register source before even knowing what the instruction is
• Memory access through lw and sw only
  • Access memory after ALU

Consider a non-pipelined processor with clock period C (e.g., 50 ns). If you divide the processor into N stages (e.g., 6), your new clock period will be:

A. C
B. N
C. less than C/N
D. C/N
E. greater than C/N

5-stage Pipeline
• Implementation
• Working Example

Hazards
• Structural
• Data Hazards
• Control Hazards

add  r3 ← r1, r2
nand r6 ← r4, r5
lw   r4 ← 20(r2)
add  r5 ← r2, r5
sw   r7 → 12(r3)

Assume 8-register machine

[Figures, Time 0 through Time 9: the pipelined datapath (PC, register file, ALU, data mem, and the IF/ID, ID/EX, EX/MEM, MEM/WB registers) executing the example program one cycle at a time. The values shown on the slides are summarized below.]

Initial register file: R0=0, R1=36, R2=9, R3=12, R4=18, R5=7, R6=41, R7=22

Time 0: pipeline holds only nops
Time 1: Fetch: add 3 1 2
Time 2: Fetch: nand 6 4 5
        ID:  add 3 1 2 (reads R1=36, R2=9)
Time 3: Fetch: lw 4 20(2)
        ID:  nand 6 4 5 (reads R4=18, R5=7)
        EX:  add 3 1 2 (36 + 9 = 45)
Time 4: Fetch: add 5 2 5
        ID:  lw 4 20(2) (reads R2=9, immediate 20)
        EX:  nand 6 4 5 (result -3)
        MEM: add 3 1 2 (passes 45)
Time 5: Fetch: sw 7 12(3)
        ID:  add 5 2 5 (reads R2=9, R5=7)
        EX:  lw 4 20(2) (address 9 + 20 = 29)
        MEM: nand 6 4 5 (passes -3)
        WB:  add 3 1 2 (R3 ← 45)
Time 6: No more instructions
        ID:  sw 7 12(3) (reads R3=45, R7=22, immediate 12)
        EX:  add 5 2 5 (9 + 7 = 16)
        MEM: lw 4 20(2) (loads 99 from address 29)
        WB:  nand 6 4 5 (R6 ← -3)
Time 7: No more instructions
        EX:  sw 7 12(3) (address 45 + 12 = 57)
        MEM: add 5 2 5 (passes 16)
        WB:  lw 4 20(2) (R4 ← 99)
Time 8: No more instructions
        MEM: sw 7 12(3) (stores 22 to address 57)
        WB:  add 5 2 5 (R5 ← 16)
Time 9: No more instructions
        WB:  sw 7 12(3) (no register write)

nand at Time 4:
  18 = 010010
   7 = 000111
  -------------
  -3 = 111101

Final register file: R0=0, R1=36, R2=9, R3=45, R4=99, R5=16, R6=-3, R7=22

Slides thanks to Sally McKee
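The register values in the walkthrough can be cross-checked by executing the five instructions one after another. The sketch below is not a pipeline model; it is a plain sequential interpretation, which gives the same final state here because each instruction in this program reads its registers only after the instructions that produce them have written back. The initial register contents and the word 99 at address 29 are the values shown on the slides; `run_example` is a name invented for this example.

```python
def nand(a, b):
    """Bitwise NAND on Python integers (two's complement semantics)."""
    return ~(a & b)

def run_example():
    # Initial state as shown on the slides: R0..R7, and memory[29] = 99.
    regs = [0, 36, 9, 12, 18, 7, 41, 22]
    mem = {29: 99}

    regs[3] = regs[1] + regs[2]        # add  r3 <- r1, r2
    regs[6] = nand(regs[4], regs[5])   # nand r6 <- r4, r5
    regs[4] = mem[20 + regs[2]]        # lw   r4 <- 20(r2)
    regs[5] = regs[2] + regs[5]        # add  r5 <- r2, r5
    mem[12 + regs[3]] = regs[7]        # sw   r7 -> 12(r3)
    return regs, mem

regs, mem = run_example()
print("registers:", regs)
print("memory:", mem)
```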

5-stage Pipeline
• Implementation
• Working Example

Hazards
• Structural
• Data Hazards
• Control Hazards

Correctness problems associated w/ processor design

1. Structural hazards
   Same resource needed for different purposes at the same time (Possible: ALU, Register File, Memory)

2. Data hazards
   Instruction output needed before it's available

3. Control hazards
   Next instruction PC unknown at time of Fetch

add r3, r1, r2    IF ID Ex M  W
nop                  IF ID Ex M  W
nop                     IF ID Ex M  W
add r6, r3, r8             IF ID Ex M  W

Problem: Need to read a value that is currently being written
Solution: negate RF clock: write first half, read second half

Dependence: relationship between two insns
• Data: two insns use same storage location
• Control: 1 insn affects whether another executes at all
• Not a bad thing, programs would be boring otherwise
• Enforced by making older insn go before younger one
  – Happens naturally in single-/multi-cycle designs
  – But not in a pipeline

Hazard: dependence & possibility of wrong insn order
• Effects of wrong insn order cannot be externally visible
• Hazards are a bad thing: most solutions either complicate the hardware or reduce performance

Data Hazards
• register file reads occur in stage 2 (ID)
• register file writes occur in stage 5 (WB)
• next instructions may read values about to be written

add r3, r1, r2
sub r5, r3, r4

Is there a dependence? Is there a hazard? How do we detect this?

Clock cycle       1   2   3   4   5   6   7   8   9
add r3, r1, r2    IF  ID  EX  MEM WB
sub r5, r3, r4        IF  ID  EX  MEM WB
lw  r6, 4(r3)             IF  ID  EX  MEM WB
or  r5, r3, r5                IF  ID  EX  MEM WB
sw  r6, 12(r3)                    IF  ID  EX  MEM WB

backwards arrows require time travel

Detecting Data Hazards

[Figure: pipelined datapath with comparators checking IF/ID.Ra against ID/EX.Rd and EX/M.Rd, plus a test for IF/ID.Ra ≠ 0.]

add r3, r1, r2
sub r5, r3, r4

Stall = (IF/ID.Ra != 0 && (IF/ID.Ra == ID/EX.Rd || IF/ID.Ra == EX/M.Rd))
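The stall condition translates directly into a boolean function. The sketch below mirrors the formula for a single source register; the function name `must_stall` and the convention that a nop carries destination register 0 are assumptions made for this example. A full design would presumably apply the same check to the second source register as well.

```python
def must_stall(if_id_ra, id_ex_rd, ex_m_rd):
    """Stall the instruction in ID if its source register Ra is about to be
    written by the instruction currently in EX or in MEM.

    Register 0 never causes a stall (the Ra != 0 term of the formula).
    """
    return if_id_ra != 0 and (if_id_ra == id_ex_rd or if_id_ra == ex_m_rd)

# add r3, r1, r2 followed by sub r5, r3, r4 (sub reads r3 in ID):
print(must_stall(3, 3, 0))  # add is in EX            -> True
print(must_stall(3, 0, 3))  # add is in MEM           -> True
print(must_stall(3, 0, 0))  # add has reached WB      -> False
```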

1. Do Nothing
   • Change the ISA to match implementation
   • "Hey compiler: don't create code w/ data hazards!"
   (We can do better than this)
2. Stall
   • Pause current and subsequent instructions till safe
3. Forward/bypass
   • Forward data value to where it is needed
   (Only works if value actually exists already)

How to stall an instruction in ID stage
• prevent IF/ID pipeline register update
  – stalls the ID stage instruction
• convert ID stage insn into nop for later stages
  – innocuous "bubble" passes through pipeline
• prevent PC update
  – stalls the next (IF stage) instruction

[Figure: pipelined datapath (IF/ID, ID/EX, EX/MEM, MEM/WB) with a "detect hazard" unit.]

add r3, r1, r2
sub r5, r3, r5
or  r6, r3, r4
add r6, r3, r8

If hazard: WE=0, MemWr=0, RegWr=0

[Figures: three consecutive cycles of the stalling datapath, with /stall driving the write enables (WE) of the PC and IF/ID and forcing a nop Op into ID/EX.]

NOP = If(IF/ID.rA ≠ 0 && (IF/ID.rA == ID/Ex.Rd || IF/ID.rA == Ex/M.Rd))

Cycle 1: STALL CONDITION MET
  IF:  or r6, r3, r4 (WE=0)
  ID:  sub r5, r3, r5 (MemWr=0 RegWr=0)
  EX:  add r3, r1, r2

Cycle 2: STALL CONDITION MET
  IF:  or r6, r3, r4 (WE=0)
  ID:  sub r5, r3, r5 (MemWr=0 RegWr=0)
  EX:  nop
  MEM: add r3, r1, r2

Cycle 3: NO STALL CONDITION MET: sub allowed to leave decode stage
  IF:  or r6, r3, r4 (WE=1)
  ID:  sub r5, r3, r5
  EX:  nop
  MEM: nop
  WB:  add r3, r1, r2

Pipeline timing diagram (stage cells not shown):

Clock cycle       1  2  3  4  5  6  7  8
add r3, r1, r2
sub r5, r3, r5
or  r6, r3, r4
add r6, r3, r8
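One way to work out the stall timing for this four-instruction sequence is to compute, for each instruction, the cycle in which it can finally leave ID. The sketch below does that under stated assumptions: no forwarding, a register file that writes in the first half of a cycle and reads in the second half (so a consumer may read in the producer's WB cycle), and in-order issue. The function name `schedule_with_stalls` and the tuple layout are invented for this example.

```python
def schedule_with_stalls(program):
    """program: list of (text, dest_reg, [source_regs]).

    Returns a list of (text, id_cycle, wb_cycle), where id_cycle is the cycle
    in which the instruction successfully reads its registers and leaves ID.
    Assumes stalling only (no forwarding) and write-before-read in the RF.
    """
    wb_of_reg = {}   # register number -> WB cycle of its most recent producer
    prev_id = 1      # so that the first instruction decodes in cycle 2
    schedule = []
    for i, (text, dest, sources) in enumerate(program):
        earliest = max(i + 2, prev_id + 1)   # in order, one ID per cycle
        ready = max((wb_of_reg.get(r, 0) for r in sources if r != 0), default=0)
        id_cycle = max(earliest, ready)      # wait until producers write back
        wb_cycle = id_cycle + 3              # EX, MEM, WB follow without stalls
        if dest != 0:
            wb_of_reg[dest] = wb_cycle
        prev_id = id_cycle
        schedule.append((text, id_cycle, wb_cycle))
    return schedule

program = [
    ("add r3, r1, r2", 3, [1, 2]),
    ("sub r5, r3, r5", 5, [3, 5]),
    ("or  r6, r3, r4", 6, [3, 4]),
    ("add r6, r3, r8", 6, [3, 8]),
]
for text, id_cycle, wb_cycle in schedule_with_stalls(program):
    print(f"{text}: leaves ID in cycle {id_cycle}, WB in cycle {wb_cycle}")
```

Under these assumptions sub waits in ID until add writes back, which costs two bubble cycles, and the instructions behind it are delayed by the same amount.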

1. Do Nothing
   • Change the ISA to match implementation
   • "Compiler: don't create code with data hazards!"
   (Nice try, we can do better than this)
2. Stall
   • Pause current and subsequent instructions till safe
3. Forward/bypass
   • Forward data value to where it is needed
   (Only works if value actually exists already)

[Figure: pipelined datapath (IF/ID, ID/Ex, Ex/Mem, Mem/WB) with a forward unit and a detect-hazard unit.]

Two types of forwarding/bypass
• Forwarding from Ex/Mem registers to Ex stage (M→Ex)
• Forwarding from Mem/WB register to Ex stage (W→Ex)

add r3, r1, r2    IF ID Ex M  W
sub r5, r3, r1       IF ID Ex M  W

Problem: EX needs ALU result that is in MEM stage
Solution: add a bypass from EX/MEM.D to start of EX

add r3, r1, r2
sub r5, r3, r1

Detection Logic in Ex Stage:
forward = (Ex/M.WE && Ex/M.Rd != 0 && ID/Ex.Ra == Ex/M.Rd)
          || (same for Rb)

add r3, r1, r2    IF ID Ex M  W
sub r5, r3, r1       IF ID Ex M  W
or  r6, r3, r4          IF ID Ex M  W

Problem: EX needs value being written by WB
Solution: Add bypass from WB final value to start of EX

add r3, r1, r2
sub r5, r3, r1
or  r6, r3, r4

Detection Logic:
forward = (M/WB.WE && M/WB.Rd != 0 && ID/Ex.Ra == M/WB.Rd
           && not (Ex/M.WE && Ex/M.Rd != 0 && ID/Ex.Ra == Ex/M.Rd))
          || (same for Rb)
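The two detection rules combine into a priority selection for each ALU operand: take the Ex/Mem value if it matches, otherwise the Mem/WB value, otherwise the value read from the register file. The sketch below expresses that for one operand; the function name `forward_select` and the string labels it returns are assumptions made for illustration.

```python
def forward_select(src_reg, ex_m_we, ex_m_rd, m_wb_we, m_wb_rd):
    """Choose the source of one ALU operand for the instruction in Ex.

    The Ex/Mem check comes first because it holds the most recent result;
    the Mem/WB bypass is used only when the Ex/Mem rule does not fire.
    """
    if ex_m_we and ex_m_rd != 0 and src_reg == ex_m_rd:
        return "EX/MEM"
    if m_wb_we and m_wb_rd != 0 and src_reg == m_wb_rd:
        return "MEM/WB"
    return "REGFILE"

# sub r5, r3, r1 right after add r3, r1, r2: add is in MEM.
print(forward_select(3, True, 3, False, 0))
# or r6, r3, r4 two instructions after add: add is in WB, sub (Rd=r5) in MEM.
print(forward_select(3, True, 5, True, 3))
# Both stages would write r3: the newer Ex/Mem value wins.
print(forward_select(3, True, 3, True, 3))
```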

Clock cycle       1  2  3  4  5  6  7  8  9
add r3, r1, r2    IF ID Ex M  W
sub r5, r3, r4       IF ID Ex M  W
lw  r6, 4(r3)           IF ID Ex M  W
or  r5, r3, r5             IF ID Ex M  W
sw  r6, 12(r3)                IF ID Ex M  W

Data dependency after a load instruction:
• Value not available until after the M stage
  → Next instruction cannot proceed if dependent

THE KILLER HAZARD

lw r4, 20(r8)
or r6, r3, r4

Without a stall, or reaches Ex before the loaded value exists:

lw r4, 20(r8)    IF ID Ex
or r6, r3, r4       IF ID

With a one-cycle stall (ID* = Stall), a NOP separates lw and or:

lw r4, 20(r8)    IF ID  Ex M  W
or r6, r3, r4       IF  ID* ID Ex M  W

[Figure: pipelined datapath (IF/ID, ID/Ex, Ex/Mem, Mem/WB) with a forward unit and a detect-hazard unit; the hazard unit reads MC and Rd from ID/Ex.]

Stall = If(ID/Ex.MemRead && IF/ID.Ra == ID/Ex.Rd)
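The load-use check can be sketched the same way as the earlier stall condition. The formula on the slide tests only Ra; the check on the second source register (Rb) and the exclusion of register 0 in the function below are assumptions added for this example, as is the name `load_use_stall`.

```python
def load_use_stall(id_ex_mem_read, id_ex_rd, if_id_ra, if_id_rb):
    """Stall for one cycle when the instruction in Ex is a load whose
    destination is a source of the instruction currently in ID.

    Assumptions beyond the slide's formula: Rb is checked like Ra, and a
    destination of register 0 never triggers a stall.
    """
    return bool(
        id_ex_mem_read
        and id_ex_rd != 0
        and (if_id_ra == id_ex_rd or if_id_rb == id_ex_rd)
    )

# lw r4, 20(r8) in Ex; or r6, r3, r4 in ID reads r3 and r4.
print(load_use_stall(True, 4, 3, 4))   # dependent on the load -> stall
# An ALU instruction writing r4 in Ex can be forwarded instead.
print(load_use_stall(False, 4, 3, 4))  # not a load -> no stall
```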

Two MIPS Solutions:
• MIPS 2000/3000: delay slot
  – ISA says results of loads are not available until one cycle later
  – Assembler inserts nop, or reorders to fill delay slot
• MIPS 4000 onwards: stall
  – But really, programmer/compiler reorders to avoid stalling in the load delay slot

5-stage Pipeline
• Implementation
• Working Example

Hazards
• Structural
• Data Hazards
• Control Hazards

Control Hazards
• instructions are fetched in stage 1 (IF)
• branch and jump decisions occur in stage 3 (EX)
→ next PC not known until 2 cycles after branch/jump

0x10: beq r1, r2, L
0x14: add r3, r0, r3
0x18: sub r5, r4, r6
0x1C: L: or r3, r2, r4

Branch not taken? No Problem!
Branch taken? Just fetched add, sub… → Zap & Flush

[Figure: datapath with branch calc and decide branch in the Ex stage; New PC = 1C.]

10: beq r1, r2, L       IF ID  Ex  M   W
14: add r3, r0, r3         IF  ID  NOP NOP NOP
18: sub r5, r4, r6             IF  NOP NOP NOP NOP
1C: L: or r3, r2, r4               IF  ID  Ex  M   W

If branch Taken → Zap
• prevent PC update
• clear IF/ID latch
• branch continues

For every taken branch? OUCH!!!
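A rough feel for the cost comes from a back-of-the-envelope CPI estimate. The sketch below assumes a base CPI of 1 and a fixed penalty per taken branch; the branch frequency and taken rate are hypothetical numbers picked for illustration, not measurements, and `effective_cpi` is a name invented for this example.

```python
def effective_cpi(branch_fraction, taken_fraction, penalty_cycles, base_cpi=1.0):
    """Average cycles per instruction when each taken branch wastes
    penalty_cycles fetch slots (zapped instructions)."""
    return base_cpi + branch_fraction * taken_fraction * penalty_cycles

# Hypothetical workload: 20% of instructions are branches, 60% of them taken.
for penalty in (2, 1, 0):
    cpi = effective_cpi(0.20, 0.60, penalty)
    print(f"penalty {penalty} cycle(s): CPI = {cpi:.2f}")
```

With these assumed numbers, zapping two instructions per taken branch raises the CPI from 1.00 to about 1.24, which is why the following slides look for ways to shrink or hide the penalty.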

1. Delay Slot
   • You MUST do this
   • MIPS ISA: 1 insn after ctrl insn always executed
     • Whether branch taken or not
2. Resolve Branch at Decode
   • Some groups do this for Project 2, your choice
   • Move branch calc from EX to ID
   • Alternative: just zap 2nd instruction when branch taken
3. Branch Prediction
   • Not in 3410, but every processor worth anything does this

[Figure: datapath with branch calc and decide branch; New PC = 1C.]

With a delay slot, add always executes and only sub is zapped when the branch is taken:

10: beq r1, r2, L       IF ID  Ex  M   W
14: add r3, r0, r3         IF  ID  Ex  M   W
18: sub r5, r4, r6             IF  NOP NOP NOP NOP
1C: L: or r3, r2, r4               IF  ID  Ex  M   W

With a delay slot and the branch resolved at decode, nothing is zapped:

10: beq r1, r2, L       IF ID Ex M  W
14: add r3, r0, r3         IF ID Ex M  W
1C: L: or r3, r2, r4          IF ID Ex M  W

Most processors support Speculative Execution
• Guess direction of the branch
  – Allow instructions to move through pipeline
  – Zap them later if guess turns out to be wrong
• A must for long pipelines

Data hazards occur when an operand (register) depends on the result of a previous instruction that may not be computed yet. Pipelined processors need to detect data hazards.

Stalling, preventing a dependent instruction from advancing, is one way to resolve data hazards. Stalling introduces NOPs ("bubbles") into a pipeline. Introduce NOPs by (1) preventing the PC from updating, (2) preventing the IF/ID registers from changing, and (3) preventing writes to memory and the register file. NOPs significantly decrease performance.

Forwarding bypasses some pipeline stages, passing a result directly to a dependent instruction's operand (register). Better performance than stalling.

Control hazards occur because the PC following a control instruction is not known until the control instruction is executed. If the branch is taken → need to zap instructions. 1 cycle performance penalty.

Delay slots can potentially recover performance lost to control hazards. The instruction in the delay slot will always be executed. Requires software (compiler) to make use of the delay slot. Put a nop in the delay slot if not able to put a useful instruction there.

We can reduce the cost of a control hazard by moving the branch decision and calculation from the Ex stage to the ID stage. With a delay slot, this removes the need to flush instructions on taken branches.