Pipelining IV


Topics
- Implementing pipeline control
- Pipelining and performance analysis

Systems I


Implementing Pipeline Control

- Combinational logic generates pipeline control signals
- Action occurs at start of following cycle

[Figure: pipe control logic attached to the F, D, E, M, and W pipeline registers. Inputs: D_icode, E_icode, M_icode, E_dstM, d_srcA, d_srcB, e_Bch. Outputs: F_stall, D_stall, D_bubble, E_bubble.]


Initial Version of Pipeline Control

bool F_stall =
    # Conditions for a load/use hazard
    E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB } ||
    # Stalling at fetch while ret passes through pipeline
    IRET in { D_icode, E_icode, M_icode };

bool D_stall =
    # Conditions for a load/use hazard
    E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB };

bool D_bubble =
    # Mispredicted branch
    (E_icode == IJXX && !e_Bch) ||
    # Bubble for ret
    IRET in { D_icode, E_icode, M_icode };

bool E_bubble =
    # Mispredicted branch
    (E_icode == IJXX && !e_Bch) ||
    # Load/use hazard
    E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB };
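The four HCL equations can be mirrored in a few lines of Python to check what they produce for a given pipeline state. This is a minimal sketch, not the simulator's implementation: the numeric instruction codes and register ids are placeholders assumed for illustration, and `initial_control` is a hypothetical helper name.

```python
# Instruction codes named as in the slides; the numeric values are
# arbitrary placeholders chosen for this sketch.
INOP, IMRMOVL, IPOPL, IJXX, IRET = range(5)
# Register ids (values assumed for illustration only).
REAX, RESP, RNONE = 0, 4, 8


def initial_control(D_icode, E_icode, M_icode, E_dstM, d_srcA, d_srcB, e_Bch):
    """Evaluate the initial (uncorrected) control equations for one cycle."""
    load_use = E_icode in (IMRMOVL, IPOPL) and E_dstM in (d_srcA, d_srcB)
    ret_in_flight = IRET in (D_icode, E_icode, M_icode)
    mispredict = E_icode == IJXX and not e_Bch
    return {
        "F_stall": load_use or ret_in_flight,
        "D_stall": load_use,
        "D_bubble": mispredict or ret_in_flight,
        "E_bubble": mispredict or load_use,
    }


# Load/use hazard: a load into %eax is in execute while the instruction
# in decode reads %eax as source A.
print(initial_control(INOP, IMRMOVL, INOP, REAX, REAX, RNONE, False))
```

For the load/use case the sketch reports a stall in F and D and a bubble in E, which matches the one-cycle load/use penalty.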


Control Combinations

Special cases that can arise on the same clock cycle

Combination A
- Not-taken branch
- ret instruction at branch target

Combination B
- Instruction that reads from memory to %esp
- Followed by ret instruction

[Figure: pipeline states for each special condition. Load/use: load in E, use in D. Mispredict: JXX in E. ret 1: ret in D. ret 2: ret in E, bubble in D. ret 3: ret in M, bubbles in E and D. Combination A overlaps Mispredict with ret 1; Combination B overlaps Load/use with ret 1.]


Control Combination A

- Should handle as mispredicted branch
- Stalls F pipeline register
- But PC selection logic will be using M_valA anyhow, so the stalled F register does not affect the next fetch

[Figure: Combination A. JXX in E (Mispredict) while ret is in D (ret 1).]

Condition            F       D       E       M       W
Processing ret       stall   bubble  normal  normal  normal
Mispredicted Branch  normal  bubble  bubble  normal  normal
Combination          stall   bubble  bubble  normal  normal
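The Combination row can be derived mechanically by merging the two per-condition rows stage by stage. The sketch below illustrates that merge in Python; the `combine` rule (a non-normal action overrides normal, and two different non-normal actions are reported as a conflict) is an assumption made for this illustration, and the function names are hypothetical.

```python
STAGES = ("F", "D", "E", "M", "W")

# Per-stage actions for each condition, taken from the table.
ACTIONS = {
    "ret": dict(zip(STAGES, ("stall", "bubble", "normal", "normal", "normal"))),
    "mispredict": dict(zip(STAGES, ("normal", "bubble", "bubble", "normal", "normal"))),
}


def combine(a, b):
    """Merge two actions for one pipeline register."""
    if a == b or b == "normal":
        return a
    if a == "normal":
        return b
    return a + " + " + b  # conflicting requests on the same register


def combined_actions(*conditions):
    """Merge the action rows of all conditions that hold in the same cycle."""
    result = dict.fromkeys(STAGES, "normal")
    for cond in conditions:
        for stage in STAGES:
            result[stage] = combine(result[stage], ACTIONS[cond][stage])
    return result


print(combined_actions("ret", "mispredict"))
# {'F': 'stall', 'D': 'bubble', 'E': 'bubble', 'M': 'normal', 'W': 'normal'}
```

No register receives two different requests here, which is why Combination A needs no special handling.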

[Figure: PIPE hardware structure across the Fetch, Decode, Execute, Memory, and Write back stages. Select PC chooses f_PC from predPC, M_valA, and W_valM.]


Control Combination B

- Would attempt to bubble and stall pipeline register D
- Signaled by processor as pipeline error

[Figure: Combination B. Load in E and use in D (Load/use) while ret is in D (ret 1).]



Condition        F      D               E       M       W
Processing ret   stall  bubble          normal  normal  normal
Load/Use Hazard  stall  stall           bubble  normal  normal
Combination      stall  bubble + stall  bubble  normal  normal
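To see the conflict concretely, the following minimal Python sketch evaluates the D_stall and D_bubble equations of the initial control logic for Combination B. The numeric instruction and register encodings are placeholders assumed for illustration, and `d_signals_initial` is a hypothetical helper name.

```python
INOP, IMRMOVL, IPOPL, IJXX, IRET = range(5)  # placeholder encodings
RESP = 4                                     # register id for %esp (assumed value)


def d_signals_initial(D_icode, E_icode, M_icode, E_dstM, d_srcA, d_srcB, e_Bch):
    """Return (D_stall, D_bubble) as computed by the initial control logic."""
    load_use = E_icode in (IMRMOVL, IPOPL) and E_dstM in (d_srcA, d_srcB)
    ret_in_flight = IRET in (D_icode, E_icode, M_icode)
    mispredict = E_icode == IJXX and not e_Bch
    return load_use, mispredict or ret_in_flight


# Combination B: a load into %esp is in execute, and ret (which reads %esp
# as both sources) is in decode.
D_stall, D_bubble = d_signals_initial(IRET, IMRMOVL, INOP, RESP, RESP, RESP, False)
pipeline_error = D_stall and D_bubble
print(D_stall, D_bubble, pipeline_error)  # True True True
```

Both signals are asserted for register D in the same cycle, which is the condition reported as a pipeline error.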


Handling Control Combination B

- Load/use hazard should get priority
- ret instruction should be held in decode stage for an additional cycle

[Figure: Combination B. Load in E and use in D (Load/use) while ret is in D (ret 1).]




Condition    F      D      E       M       W
Combination  stall  stall  bubble  normal  normal


Corrected Pipeline Control Logic

- Load/use hazard should get priority
- ret instruction should be held in decode stage for an additional cycle




Condition    F      D      E       M       W
Combination  stall  stall  bubble  normal  normal

bool D_bubble =
    # Mispredicted branch
    (E_icode == IJXX && !e_Bch) ||
    # Stalling at fetch while ret passes through pipeline
    IRET in { D_icode, E_icode, M_icode }
    # but not condition for a load/use hazard
    && !(E_icode in { IMRMOVL, IPOPL }
         && E_dstM in { d_srcA, d_srcB });
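The effect of the extra term can be checked with a small Python model of the corrected D_bubble equation. This is a sketch under assumed encodings (the numeric codes and register ids are placeholders), and `d_signals_corrected` is a hypothetical helper name.

```python
INOP, IMRMOVL, IPOPL, IJXX, IRET = range(5)  # placeholder encodings
RESP, RNONE = 4, 8                           # register ids (assumed values)


def d_signals_corrected(D_icode, E_icode, M_icode, E_dstM, d_srcA, d_srcB, e_Bch):
    """Return (D_stall, D_bubble) using the corrected D_bubble equation."""
    load_use = E_icode in (IMRMOVL, IPOPL) and E_dstM in (d_srcA, d_srcB)
    ret_in_flight = IRET in (D_icode, E_icode, M_icode)
    mispredict = E_icode == IJXX and not e_Bch
    D_stall = load_use
    # The ret bubble is suppressed while a load/use hazard is present.
    D_bubble = mispredict or (ret_in_flight and not load_use)
    return D_stall, D_bubble


# Combination B: load into %esp in execute, ret in decode.
print(d_signals_corrected(IRET, IMRMOVL, INOP, RESP, RESP, RESP, False))  # (True, False)
# ret alone in decode still injects a bubble.
print(d_signals_corrected(IRET, INOP, INOP, RNONE, RESP, RESP, False))    # (False, True)
```

With the correction, register D is only stalled during Combination B, so ret stays in decode for one more cycle and is bubbled on the following cycle as usual.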


Pipeline Summary

Data Hazards
- Most handled by forwarding
  - No performance penalty
- Load/use hazard requires one cycle stall

Control Hazards
- Cancel instructions when a mispredicted branch is detected
  - Two clock cycles wasted
- Stall fetch stage while ret passes through pipeline
  - Three clock cycles wasted

Control Combinations
- Must analyze carefully
- First version had subtle bug
  - Only arises with unusual instruction combination


Performance Analysis with Pipelining

Ideal pipelined machine: CPI = 1
- One instruction completed per cycle
- But much faster cycle time than unpipelined machine

However, hazards work against the ideal
- Hazards resolved using forwarding are fine
- Stalling degrades performance because the instruction completion rate is interrupted

CPI is a measure of the “architectural efficiency” of the design

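\[
\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}
\]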
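As a quick numeric illustration of the CPU time equation, the Python sketch below multiplies the three factors. The instruction count and cycle time are made-up values chosen only for the example, and `cpu_time` is a hypothetical helper name.

```python
def cpu_time(instructions, cpi, seconds_per_cycle):
    """CPU time = instructions/program * cycles/instruction * seconds/cycle."""
    return instructions * cpi * seconds_per_cycle


# Assumed example: one million instructions and a 2 ns cycle time.
ideal = cpu_time(1_000_000, 1.00, 2e-9)
stalled = cpu_time(1_000_000, 1.31, 2e-9)
print(f"ideal: {ideal * 1e3:.2f} ms, with stalls: {stalled * 1e3:.2f} ms")
```

With instruction count and cycle time held fixed, CPU time scales directly with CPI, which is why CPI is singled out as the measure of architectural efficiency.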


Computing CPI

CPI
- Function of useful instructions and bubbles: for C_i instructions and C_b bubbles,

\[
\mathrm{CPI} = \frac{C_i + C_b}{C_i} = 1.0 + \frac{C_b}{C_i}
\]

- C_b / C_i represents the pipeline penalty due to stalls

Can reformulate to account for
- load penalties (lp)
- branch misprediction penalties (mp)
- return penalties (rp)

\[
\mathrm{CPI} = 1.0 + lp + mp + rp
\]
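The two forms of the CPI equation agree when the bubble count is split by cause. The Python sketch below shows this with made-up counts (1000 instructions and an assumed number of bubbles per cause); `cpi_breakdown` is a hypothetical helper name.

```python
def cpi_breakdown(instructions, load_bubbles, mispredict_bubbles, return_bubbles):
    """Return (lp, mp, rp, CPI) from an instruction count and bubble counts."""
    lp = load_bubbles / instructions
    mp = mispredict_bubbles / instructions
    rp = return_bubbles / instructions
    return lp, mp, rp, 1.0 + lp + mp + rp


# Assumed counts for illustration only.
Ci = 1000
bubbles = {"load": 90, "mispredict": 160, "return": 60}
lp, mp, rp, cpi = cpi_breakdown(Ci, bubbles["load"], bubbles["mispredict"], bubbles["return"])

# Same result from the (Ci + Cb) / Ci form.
Cb = sum(bubbles.values())
cpi_from_totals = (Ci + Cb) / Ci
print(round(cpi, 2), round(cpi_from_totals, 2))
```

Each penalty term is simply the share of C_b / C_i attributable to one kind of hazard.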


Computing CPI - II

So how do we determine the penalties?
- Depends on how often each situation occurs on average
  - How often does a load occur, and how often does that load cause a stall?
  - How often does a branch occur, and how often is it mispredicted?
  - How often does a return occur?

We can measure these
- simulator
- hardware performance counters

We can estimate through historical averages
- Then use to make early design tradeoffs for architecture


Computing CPI - III

Cause       Name  Instruction Frequency  Condition Frequency  Stalls  Product
Load/Use    lp    0.30                   0.3                  1       0.09
Mispredict  mp    0.20                   0.4                  2       0.16
Return      rp    0.02                   1.0                  3       0.06
Total penalty                                                         0.31

CPI = 1 + 0.31 = 1.31, which is 31% worse than ideal

This gets worse when:
- Non-ideal memory access latency is accounted for
- Pipelines are deeper (stalls per hazard increase)
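Each Product entry is the instruction frequency times the condition frequency times the stall cycles, and the total penalty is their sum. The Python sketch below recomputes the table from its three input columns; the variable and function names are illustrative only.

```python
# (name, instruction frequency, condition frequency, stall cycles)
ROWS = [
    ("lp", 0.30, 0.3, 1),  # load/use
    ("mp", 0.20, 0.4, 2),  # mispredicted branch
    ("rp", 0.02, 1.0, 3),  # return
]


def penalties(rows):
    """Map each penalty name to frequency * condition frequency * stalls."""
    return {name: instr * cond * stalls for name, instr, cond, stalls in rows}


per_cause = penalties(ROWS)
total_penalty = sum(per_cause.values())
cpi = 1.0 + total_penalty

for name, value in per_cause.items():
    print(f"{name}: {value:.2f}")
print(f"total penalty: {total_penalty:.2f}, CPI: {cpi:.2f}")
```

Changing a single input, such as the misprediction rate, shows directly how much each hazard contributes to CPI.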


Summary

Today
- Pipeline control logic
- Effect on CPI and performance

Next Time
- Further mitigation of branch mispredictions
- State machine design
