ELEC 5200-001/6200-001 Computer Architecture and Design ...

54
Fall 2013 ELEC 5200-001/6200-001 Pipelining 1 ELEC 5200-001/6200-001 Computer Architecture and Design Fall 2013 Pipelining (Chapter 4.5, 4.6) Vishwani D. Agrawal & Victor P. Nelson Department of Electrical and Computer Engineering Auburn University, Auburn, AL 36849

Transcript of ELEC 5200-001/6200-001 Computer Architecture and Design ...

Page 1: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 1

ELEC 5200-001/6200-001 Computer Architecture and Design

Fall 2013 Pipelining (Chapter 4.5, 4.6)

Vishwani D. Agrawal & Victor P. Nelson Department of Electrical and Computer Engineering

Auburn University, Auburn, AL 36849

Page 2: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 2

ILP: Instruction Level Parallelism

Single-cycle and multi-cycle datapaths execute one instruction at a time. How can we get better performance? Answer: Execute multiple instructions at a time:

Pipelining – Enhance a multi-cycle datapath to fetch one instruction every cycle. Parallelism – Fetch multiple instructions every cycle.

Page 3: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 3

Automobile Team Assembly

1 car assembled every four hours 6 cars per day 180 cars per month 2,040 cars per year

1 hour

1 hour

1 hour 1 hour

Page 4: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 4

Automobile Assembly Line Task 1 1 hour

Task 2 1 hour

Task 3 1 hour

Task 4 1 hour

First car assembled in 4 hours (pipeline latency) thereafter, 1 car completed per hour 21 cars on first day, thereafter 24 cars per day 717 cars per month 8,637 cars per year What gives 4X increase?

Mecahnical Electrical Painting Testing

Page 5: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 5

Throughput: Team Assembly

Mechanical Electrical Painting Testing Mechanical Electrical Painting Testing

Time of assembling one car = n hours where n is the number of nearly equal subtasks, each requiring 1 unit of time Throughput = 1/n cars per unit time

Red car completed

Red car started

Time Blue car started

Blue car completed

Page 6: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 6

Throughput: Assembly Line Mechanical Electrical Painting Testing

Mechanical Electrical Painting Testing

Mechanical Electrical Painting Testing

Mechanical Electrical Painting Testing

Car 1 Car 2 Car 3 Car 4 . .

Car 1 complete

Car 2 complete

Time to complete first car = n time units (latency) Cars completed in time T = T – n + 1 Throughput = 1 – (n – 1)/ T cars per unit time Throughput (assembly line) 1 – (n – 1)/ T n(n – 1) ─────────────────── = ──────── = n – ───── → n Throughput (team assembly) 1/n T as T→∞

time

Page 7: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 7

Some Features of Assembly Line

Task 1 1 hour

Task 2 1 hour

Task 3 1 hour

Task 4 1 hour

Mechanical Electrical Painting Testing

Electrical parts delivered (JIT)

Defect found

Stall assembly line to fix the cause of defect

3 cars in the assembly line are suspects, to be removed (flush pipeline)

Page 8: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Pros and Cons Advantages:

Efficient use of labor. Specialists can do better job. Just in time (JIT) methodology eliminates warehouse cost.

Disadvantages: Penalty of defect latency. Lack of flexibility in production. Assembly line work is monotonous and boring. http://www.youtube.com/watch?v=c8LxscnmdNY&feature=related http://www.metacafe.com/watch/752497/chaplin_with_a_spanner_set_modern_times/ http://www.metacafe.com/watch/762944/crazy_chaplin_screwing_up_everything_modern_times

Fall 2013 ELEC 5200-001/6200-001 Pipelining 8

Page 9: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 9

Pipelining in a Computer Divide datapath into nearly equal tasks, to be performed serially and requiring non-overlapping resources. Insert registers at task boundaries in the datapath; registers pass the output data from one task as input data to the next task. Synchronize tasks with a clock having a cycle time that just exceeds the time required by the longest task. Break each instruction down into the set of tasks so that instructions can be executed in a staggered fashion.

Page 10: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 10

Pipelining a Single-Cycle Datapath Instruction

class Instr. fetch (IF)

Instr. Decode

(also reg. file read)

(ID)

Execution (ALU

Operation) (EX)

Data access (MEM)

Write Back (Reg.

file write) (WB)

Total time

lw 2ns 1ns 2ns 2ns 1ns 8ns

sw 2ns 1ns 2ns 2ns 8ns R-format

add, sub, and, or, slt 2ns 1ns 2ns 1ns 8ns

B-format, beq 2ns 1ns 2ns 8ns

No operation on data; idle times equalize instruction lengths.

Page 11: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 11

Execution Time: Single-Cycle

IF ID EX MEM WB IF ID EX MEM WB

IF ID EX MEM WB

0 2 4 6 8 10 12 14 16 . . Time (ns) lw $1, 100($0)

lw $2, 200($0) lw $3, 300($0)

Clock cycle time = 8 ns Total time for executing three lw instructions = 24 ns

Page 12: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 12

Pipelined Datapath Instruction

class Instr. fetch (IF)

Instr. Decode

(also reg. file read)

(ID)

Execu-tion

(ALU Opera-

tion) (EX)

Data access (MEM)

Write Back (Reg.

file write) (WB)

Total time

lw 2ns 1ns 2ns

2ns 2ns 1ns 2ns

10ns

sw 2ns 1ns 2ns

2ns 2ns 1ns 2ns

10ns

R-format: add, sub, and, or, slt 2ns

1ns 2ns

2ns 2ns 1ns 2ns

10ns

B-format: beq

2ns 1ns 2ns

2ns 2ns 1ns 2ns

10ns

No operation on data; idle time inserted to equalize instruction lengths.

Page 13: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 13

Execution Time: Pipeline

IF ID EX MEM RW

IF ID EX MEM RW

IF ID EX MEM RW

0 2 4 6 8 10 12 14 16 . . Time (ns) lw $1, 100($0)

lw $2, 200($0) lw $3, 300($0)

Clock cycle time = 2 ns, four times faster than single-cycle clock Total time for executing three lw instructions = 14 ns Single-cycle time 24 Performance ratio = ──────────── = ── = 1.7 Pipeline time 14

Page 14: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 14

Pipeline Performance Clock cycle time = 2 ns 1,003 lw instructions: Total time for executing 1,003 lw instructions = 2,014 ns Single-cycle time 8,024 Performance ratio = ──────────── = ──── = 3.98 Pipeline time 2,014 10,003 lw instructions: Performance ratio = 80,024 / 20,014 = 3.998 → Clock cycle ratio (4)

Pipeline performance approaches clock-cycle ratio for long programs.

Page 15: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 15

Instr. mem. PC

Add

Reg

. File

Data mem. 1

mux

0

1 m

ux 0

0 m

ux 1

4

1 m

ux 0

Sign ext.

Shift left 2

ALU Cont.

CO

NTR

OL

opcode

MemWrite MemRead

ALU

Branch

zero

0-15

0-5

11-15

16-20

21-25

26-31

ALU

Single-Cycle Datapath

MemtoReg

ALUOp

ALUSrc

RegDst

RegWrite

IF: Instr. fetch ID: Instr. decode, reg. file read

EX: Execute, address calc.

MEM: mem. access

WB: write back

Page 16: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 16

Pipelining of RISC Instructions (From Lecture 3, Slide 6)

Fetch Instruction

Examine Opcode

ALU Operation

Memory Read/Write

Store Result

Although an instruction takes five clock cycles, one instruction is completed every cycle.

IF ID EX MEM WB Instruction Decode Execute Memory Write Fetch instruction and Operation Back Fetch operands to Reg file

Page 17: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 17

Instr. mem. PC

Add

Reg

. File

Data mem. 1

mux

0

1 m

ux 0

0 m

ux 1

4

1 m

ux 0

Sign ext.

Shift left 2

ALU Cont.

CO

NTR

OL

opcode

MemWrite MemRead

ALU

Branch

zero

0-15

0-5

11-15

16-20

21-25

26-31

ALU

Pipeline Registers

MemtoReg

ALUOp

ALUSrc

RegDst

RegWrite

IF/ID ID/EX EX/MEM

MEM/WB

This requires a CONTROL not too different from single-cycle

Page 18: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 18

Pipeline Register Functions Four pipeline registers are added:

Register name Data held

IF/ID PC+4, Instruction word (IW)

ID/EX PC+4, R1, R2, IW(0-15) sign ext., IW(11-15)

EX/MEM PC+4, zero, ALUResult, R2, IW(11-15) or IW(16-20)

MEM/WB M[ALUResult], ALUResult, IW(11-15) or IW(16-20)

Page 19: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 19

Instr

mem PC

Add

Reg

. File

Data mem.

1

mux

0

1 m

ux 0

0 m

ux 1

4

Sign ext.

Shift left 2

opcode

ALU

zero

0-15

16-20

21-25

26-31

ALU

Pipelined Datapath IF/ID ID/EX EX/MEM MEM/WB

11-15 for R-type 16-20 for I-type lw

Page 20: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 20

Five-Cycle Pipeline CC1 CC2 CC3 CC4 CC5

IM

ID

, REG

. FI

LE

REA

D

AL

U

D

M

R

EG.

FILE

W

RIT

E

IF/ID

ID/E

X

MEM

/WB

EX/

MEM

PC

Page 21: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 21

Add Instruction add $t0, $s1, $s2 Machine instruction word 000000 10001 10010 01000 00000 100000 opcode $s1 $s2 $t0 function

CC1 CC2 CC3 CC4 CC5

IM

ID

, REG

. FI

LE

REA

D

AL

U

D

M

R

EG.

FILE

W

RIT

E

IF/ID

ID/E

X

MEM

/WB

EX/

MEM

IF ID EX MEM WB read $s1 add write $t0 read $s2 $s1+$s2

PC

Page 22: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 22

PC

Reg

. File

1 m

ux 0

1 m

ux 0

0 m

ux 1

4

Sign ext.

Shift left 2

opcode

ALU

zero

0-15

16-20

21-25

26-31

ALU

Pipelined Datapath Executing add IF/ID ID/EX EX/MEM MEM/WB

s1

$s1

11-15 for R-type 16-20 for I-type lw

t0

Instr

mem s2 $s2

Add

addr

Data mem data

Page 23: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 23

Load Instruction

lw $t0, 1200 ($t1) 100011 01001 01000 0000 0100 1000 0000 opcode $t1 $t0 1200

CC1 CC2 CC3 CC4 CC5

IM

ID

, REG

. FI

LE

REA

D

AL

U

D

M

R

EG.

FILE

W

RIT

E

IF/ID

ID/E

X

MEM

/WB

EX/

MEM

IF ID EX MEM WB read $t1 add read write $t0 sign ext $t1+1200 M[addr] 1200

PC

Page 24: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 24

PC

Add

Reg

. File

addr

Data mem data

1 m

ux 0

1 m

ux 0

0 m

ux 1

4

Sign ext.

Shift left 2

opcode

ALU

zero

0-15

16-20

21-25

26-31

ALU

Pipelined Datapath Executing lw IF/ID ID/EX EX/MEM MEM/WB

t1

1200

$t1

11-15 for R-type 16-20 for I-type lw

t0

Instr

mem

Page 25: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 25

Store Instruction sw $t0, 1200 ($t1)

101011 01001 01000 0000 0100 1000 0000 opcode $t1 $t0 1200

CC1 CC2 CC3 CC4 CC5

IM

ID

, REG

. FI

LE

REA

D

AL

U

D

M

R

EG.

FILE

W

RIT

E

IF/ID

ID/E

X

MEM

/WB

EX/

MEM

IF ID EX MEM WB read $t1 add write sign ext $t1+1200 M[addr] 1200 (addr) ← $t0

PC

Page 26: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 26

PC

Reg

. File

addr

Data mem data

1 m

ux 0

1 m

ux 0

0 m

ux 1

4

Sign ext.

Shift left 2

opcode

ALU

zero

0-15

16-20

21-25

26-31

ALU

Pipelined Datapath Executing sw IF/ID ID/EX EX/MEM MEM/WB

t1

1200

$t1

11-15 for R-type 16-20 for I-type lw

Instr

mem t0 $t0

Add

Page 27: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 27

Executing a Program

Consider a five-instruction segment: lw $10, 20($1) sub $11, $2, $3 add $12, $3, $4 lw $13, 24($1) add $14, $5, $6

Page 28: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 28

Program Execution CC1 CC2 CC3 CC4 CC5

IM

ID

, REG

. FI

LE

REA

D

AL

U

D

M

R

EG.

FILE

W

RIT

E

IF/ID

ID/E

X

MEM

/WB

EX/

MEM

IM

ID

, REG

. FI

LE

REA

D

AL

U

D

M

R

EG.

FILE

W

RIT

E

IF/ID

ID/E

X

MEM

/WB

EX/

MEM

IM

ID

, REG

. FI

LE

REA

D

AL

U

D

M

R

EG.

FILE

W

RIT

E

IF/ID

ID/E

X

MEM

/WB

EX/

MEM

IM

ID

, REG

. FI

LE

REA

D

AL

U

D

M

R

EG.

FILE

W

RIT

E

IF/ID

ID/E

X

MEM

/WB

EX/

MEM

IM

ID

, REG

. FI

LE

REA

D

AL

U

D

M

R

EG.

FILE

W

RIT

E

IF/ID

ID/E

X

MEM

/WB

EX/

MEM

lw $10, 20($1)

sub $11, $2, $3

add $12, $3, $4

lw $13, 24($1)

add $14, $5, $6

time

Prog

ram

inst

ruct

ions

PC

PC

PC

PC

Page 29: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 29

Instr

mem PC

Add

Reg

. File

Data mem.

1

mux

0

1 m

ux 0

0 m

ux 1

4

Sign ext.

Shift left 2

opcode

ALU

zero

0-15

16-20

21-25

26-31

ALU

CC5

IF/ID ID/EX EX/MEM MEM/WB

11-15 for R-type 16-20 for I-type lw

IF: add $14, $5, $6 ID: lw $13, 24($1) EX: add $12, $3, $4 MEM: sub $11, $2, $3

WB: lw $10, 20($1)

Page 30: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 30

Advantages of Pipeline After the fifth cycle (CC5), one instruction is completed each cycle; CPI ≈ 1, neglecting the initial pipeline latency of 5 cycles. – Pipeline latency is defined as the number of stages in

the pipeline, or – The number of clock cycles after which the first

instruction is completed. The clock cycle time is about four times shorter than that of single-cycle datapath and about the same as that of multicycle datapath. For multicycle datapath, CPI = 3. …. So, pipelined execution is faster, but . . .

Page 31: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 31

Science is always wrong. It never solves a problem without creating ten more. George Bernard Shaw

Page 32: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 32

Pipeline Hazards Definition: Hazard in a pipeline is a situation in which the next instruction cannot complete execution one clock cycle after completion of the present instruction. Three types of hazards: – Structural hazard (resource conflict) – Data hazard – Control hazard

Page 33: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 33

Structural Hazard Two instructions cannot execute due to a resource conflict. Example: Consider a computer with a common data and instruction memory. The fourth cycle of a lw instruction requires memory access (memory read) and at the same time the first cycle of the fourth instruction requires instruction fetch (memory read). This will cause a memory resource conflict.

Page 34: ELEC 5200-001/6200-001 Computer Architecture and Design ...

PC

PC

Fall 2013 ELEC 5200-001/6200-001 Pipelining 34

Example of Structural Hazard CC1 CC2 CC3 CC4 CC5

IM/D

M

ID, R

EG.

FILE

R

EAD

AL

U

IM/D

M

R

EG.

FILE

W

RIT

E

IF/ID

ID/E

X

MEM

/WB

EX/

MEM

IM

/DM

ID

, REG

. FI

LE

REA

D

AL

U

IM/D

M

R

EG.

FILE

W

RIT

E

IF/ID

ID/E

X

MEM

/WB

EX/

MEM

IM

/DM

ID

, REG

. FI

LE

REA

D

AL

U

IM/D

M

R

EG.

FILE

W

RIT

E

IF/ID

ID/E

X

MEM

/WB

EX/

MEM

IM

/DM

ID

, REG

. FI

LE

REA

D

AL

U

IM/D

M

R

EG.

FILE

W

RIT

E

IF/ID

ID/E

X

MEM

/WB

EX/

MEM

lw $10, 20($1)

sub $11, $2, $3

add $12, $3, $4

lw $13, 24($1)

time

Prog

ram

inst

ruct

ions

Common data and instr. Mem.

Nedded by two instructions

PC

Page 35: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 35

Possible Remedies for Structural Hazards

Provide duplicate hardware resources in datapath. Control unit or compiler can insert delays (no-op cycles) between instructions. This is known as pipeline stall or bubble.

Page 36: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 36

Stall (Bubble) for Structural Hazard CC1 CC2 CC3 CC4 CC5

IM

/DM

ID

, REG

. FI

LE

REA

D

AL

U

IM/D

M

R

EG.

FILE

W

RIT

E

IF/ID

ID/E

X

MEM

/WB

EX/

MEM

IM

/DM

ID

, REG

. FI

LE

REA

D

AL

U

IM/D

M

R

EG.

FILE

W

RIT

E

IF/ID

ID/E

X

MEM

/WB

EX/

MEM

IM/D

M

ID, R

EG.

FILE

R

EAD

AL

U

IM/D

M

R

EG.

FILE

W

RIT

E

IF/ID

ID/E

X

MEM

/WB

EX/

MEM

IM

/DM

ID

, REG

. FI

LE

REA

D

AL

U

IM/D

M

R

EG.

FILE

W

RIT

E

IF/ID

ID/E

X

MEM

/WB

EX/

MEM

lw $10, 20($1)

sub $11, $2, $3

add $12, $3, $4

lw $13, 24($1)

time

Prog

ram

inst

ruct

ions

Stall (bubble)

PC

PC

PC

Page 37: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 37

Data Hazard

Data hazard means that an instruction cannot be completed because the needed data, being generated by another instruction in the pipeline, is not available. Example: consider two instructions:

add $s0, $t0, $t1 sub $t2, $s0, $t3 # needs $s0

Page 38: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 38

Example of Data Hazard CC1 CC2 CC3 CC4 CC5

IM

ID

, REG

. FI

LE

REA

D

AL

U

D

M

R

EG.

FILE

W

RIT

E

IF/ID

ID/E

X

MEM

/WB

EX/

MEM

IM

ID

, REG

. FI

LE

REA

D

AL

U

D

M

R

EG.

FILE

W

RIT

E

IF/ID

ID/E

X

MEM

/WB

EX/

MEM

add $s0, $t0, $t1

sub $t2, $s0, $t3

time

Prog

ram

inst

ruct

ions

Write s0 in CC5

Read s0 and t3 in CC3

We need to read s0 from reg file in cycle 3 But s0 will not be written in reg file until cycle 5 However, s0 will only be used in cycle 4 And it is available at the end of cycle 3

PC

PC

Page 39: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 39

Forwarding or Bypassing

Output of a resource used by an instruction is forwarded to the input of some resource being used by another instruction. Forwarding can eliminate some, but not all, data hazards.

Page 40: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 40

Forwarding for Data Hazard CC1 CC2 CC3 CC4 CC5

IM

ID

, REG

. FI

LE

REA

D

AL

U

D

M

R

EG.

FILE

W

RIT

E

IF/ID

ID/E

X

MEM

/WB

EX/

MEM

IM

ID

, REG

. FI

LE

REA

D

AL

U

D

M

R

EG.

FILE

W

RIT

E

IF/ID

ID/E

X

MEM

/WB

EX/

MEM

add $s0, $t0, $t1

sub $t2, $s0, $t3

time

Prog

ram

inst

ruct

ions

Write s0 in CC5

Read s0 and t3 in CC3

PC

PC

Page 41: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 41

Forwarding Unit Hardware

Forwarding Unit

Data Mem. A

LU

FORW

. MU

X

FORW

. MU

X

ID/EX EX/MEM MEM/WB

MU

X

Destination registers Source reg. IDs from opcode

Data to reg.

file

Page 42: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 42

Forwarding Alone May Not Work CC1 CC2 CC3 CC4 CC5

IM

ID

, REG

. FI

LE

REA

D

AL

U

D

M

R

EG.

FILE

W

RIT

E

IF/ID

ID/E

X

MEM

/WB

EX/

MEM

IM

ID

, REG

. FI

LE

REA

D

AL

U

D

M

R

EG.

FILE

W

RIT

E

IF/ID

ID/E

X

MEM

/WB

EX/

MEM

lw $s0, 20($s1)

sub $t2, $s0, $t3

time

Prog

ram

inst

ruct

ions

Write s0 in CC5

Read s0 and t3 in CC3

data needed by sub (data hazard)

data available from memory only at the end of cycle 4

PC

PC

Page 43: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 43

Use Bubble and Forwarding CC1 CC2 CC3 CC4 CC5

IM

ID

, REG

. FI

LE

REA

D

AL

U

D

M

R

EG.

FILE

W

RIT

E

IF/ID

ID/E

X

MEM

/WB

EX/

MEM

lw $s0, 20($s1)

sub $t2, $s0, $t3

time

Prog

ram

inst

ruct

ions

Write s0 in CC5

stall (bubble)

IM

ID

, REG

. FI

LE

REA

D

AL

U

D

M

R

EG.

FILE

W

RIT

E

IF/ID

ID/E

X

MEM

/WB

EX/

MEM

PC

PC

Page 44: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 44

Hazard Detection Unit Hardware

Forwarding Unit

Data Mem. A

LU

FORW

. MU

X

FORW

. MU

X

ID/EX EX/MEM MEM/WB

Control signals

register IDs from

prev. instr.

to reg. file

Hazard Detection

Unit

PC

Control N

OP

MU

X 0

IF/I

D

Inst

ruct

ion

Disable write

Page 45: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 45

Resolving Hazards

Hazards are resolved by Hazard detection and forwarding units. Compiler’s understanding of how these units work can improve performance.

Page 46: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 46

C code: A = B + E; C = B + F; MIPS code: lw $t1, 0($t0) . $t1 written lw $t2, 4($t0) . . $t2 written add $t3, $t1, $t2 . . . $t1, $t2 needed sw $t3, 12($t0) . . . . lw $t4, 8($t0) . . . . . $t4 written add $t5, $t1, $t4 . . . . . $t4 needed sw $t5, 16($t0) . . . . . . . . . . . . . . .

Avoiding Stall by Code Reorder

Page 47: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 47

Reordered Code

C code: A = B + E; C = B + F; MIPS code: lw $t1, 0($t0) lw $t2, 4($t0) lw $t4, 8($t0) add $t3, $t1, $t2 no hazard sw $t3, 12($t0) add $t5, $t1, $t4 no hazard sw $t5, 16($t0)

Page 48: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 48

Control Hazard Instruction to be fetched is not known! Example: Instruction being executed is branch-type, which will determine the next instruction: add $4, $5, $6 beq $1, $2, 40 next instruction . . . 40 and $7, $8, $9

Page 49: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 49

Stall on Branch CC1 CC2 CC3 CC4 CC5

IM

ID

, REG

. FI

LE

REA

D

AL

U

D

M

R

EG.

FILE

W

RIT

E

IF/ID

ID/E

X

MEM

/WB

EX/

MEM

IM

ID

, REG

. FI

LE

REA

D

AL

U

D

M

R

EG.

FILE

W

RIT

E

IF/ID

ID/E

X

MEM

/WB

EX/

MEM

IM

ID

, REG

. FI

LE

REA

D

AL

U

D

M

R

EG.

FILE

W

RIT

E

IF/ID

ID/E

X

MEM

/WB

EX/

MEM

add $4, $5, $6

beq $1, $2, 40

next instruction or and $7, $8, $9

time

Prog

ram

inst

ruct

ions

Stall (bubble)

PC

PC

PC

Page 50: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 50

Why Only One Stall?

Extra hardware in ID phase: Additional ALU to compute branch address Comparator to generate zero signal Hazard detection unit writes the branch address in PC

Page 51: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 51

Ways to Handle Branch Stall or bubble Branch prediction: – Heuristics

Next instruction Prediction based on statistics (dynamic) Hardware decision (dynamic)

– Prediction error: pipeline flush Delayed branch

Page 52: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 52

Delayed Branch Example

Stall on branch add $4, $5, $6 beq $1, $2, skip next instruction . . . skip or $7, $8, $9

Delayed branch beq $1, $2, skip add $4, $5, $6 next instruction . . . skip or $7, $8, $9

Instruction executed irrespective of branch decision

Page 53: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 53

Delayed Branch CC1 CC2 CC3 CC4 CC5

IM

ID

, REG

. FI

LE

REA

D

AL

U

D

M

R

EG.

FILE

W

RIT

E

IF/ID

ID/E

X

MEM

/WB

EX/

MEM

IM

ID

, REG

. FI

LE

REA

D

AL

U

D

M

R

EG.

FILE

W

RIT

E

IF/ID

ID/E

X

MEM

/WB

EX/

MEM

IM

ID

, REG

. FI

LE

REA

D

AL

U

D

M

R

EG.

FILE

W

RIT

E

IF/ID

ID/E

X

MEM

/WB

EX/

MEM

add $4, $5, $6

beq $1, $2, skip

next instruction or skip or $7, $8, $9

time

Prog

ram

inst

ruct

ions

PC

PC

PC

Page 54: ELEC 5200-001/6200-001 Computer Architecture and Design ...

Fall 2013 ELEC 5200-001/6200-001 Pipelining 54

Summary: Hazards Structural hazards – Cause: resource conflict – Remedies: (i) hardware resources, (ii) stall (bubble)

Data hazards – Cause: data unavailablity – Remedies: (i) forwarding, (ii) stall (bubble), (iii) code

reordering Control hazards – Cause: out-of-sequence execution (branch or jump) – Remedies: (i) stall (bubble), (ii) branch prediction/pipeline

flush, (iii) delayed branch/pipeline flush