Fall 2011 Prof. Hyesoon Kimhyesoon/fall11/lec_br1.pdf · Corr: Xooooo MMooooooMMooooooMM...

Fall 2011

Prof. Hyesoon Kim

FE ID EX MEM WB

add r1, r2, r3 add

mul

mul

mul

add

sub r4, r1, r3 sub addsub add

addsubmul r5, r2, r3 mul sub

subsub add

add

add

Add: 2 cycles

add add

add

subsub

subsubmul

L L L L L

FE_stage

FE ID EX MEM WB

br0x800

br target 0x800

add r1, r2,r3 0x804

target sub r2,r3,r4 0x900

br0x804

br

br

br

0x804

0x804

0x900

PC (latch)

add

add

add

sub

0x904

1

cycle

2

3

4

5

6 add sub

FE_stage

• -Eliminate branches

– Predication (more on later)

• Delayed branch slot

– SPARC, MIPS

• Dual-path execution (more on later)

• Or predict?

• Branches are very frequent

– Approx. 20% of all instructions

• Can not wait until we know where it goes

– Long pipelines

• Branch outcome is known after B cycles

• No scheduling past the branch until outcome known

– Superscalars (e.g., 4-way)

• Branch every cycle or so!

• One cycle of work, then bubbles for ~B cycles?

• Predict Branches

– And predict them well!

• Fetch, decode, etc. on the predicted path

– Option 1: No execution until branch is resolved

– Option 2: Execute anyway (speculation)

• Recover from mispredictions

– Restart fetch from correct path

• Need to know two things

– Whether the branch is taken or not (direction)

– The target address if it is taken (target)

• Direct jumps, Function calls: unconditional

branches

– Direction known (always taken), target easy to

compute

• Conditional Branches (typically PC-relative)

– Direction difficult to predict, target easy to compute

• Indirect jumps, function returns

– Direction known (always taken), target difficult

• Needed for conditional branches

– Most branches are of this type

• Many, many kinds of predictors for this

– Static: fixed rule, or compiler annotation(e.g. br.bwh (branch whether hint. IA-64))

– Dynamic: hardware prediction

• Dynamic prediction usually history-based

– Example: predict direction is the sameas the last time this branch was executed

• Always predict NT

– easy to implement

– 30-40% accuracy … not so good

• Always predict T

– 60-70% accuracy

• BTFNT

– loops usually have a few iterations, so this is

like always predicting that the loop is taken

– don’t know target until decode

K bits of branch

instruction address

Index

Branch history

table of 2^K entries,

1 bit per entry

Use this entry to

predict this branch:

0: predict not taken

1: predict taken

When branch direction resolved,

go back into the table and

update entry: 0 if not taken, 1 if taken

0xDC08: for(i=0; i < 100000; i++)

{

0xDC44: if( ( i % 100) == 0 )

tick( );

0xDC50: if( (i & 1) == 1)

odd( );

}

T

N

• Example: short loop (8 iterations)

– Taken 7 times, then not taken once

– Not-taken mispredicted (was taken previously)

Act: TTTTTTTNTTTTTTNTTTTTTTNT…

Pred: XTTTTTTTNTTTTTTNTTTTTTTN

Corr: Xooooo MMooooooMMooooooMM

Misprediction rate: 2/8 = 25%

• Execute the same loop again

– First always mispredicted (previous outcome was not taken)

– Then 6 predicted correctly

– Then last one mispredicted again

• Each fluke/anomaly in a stable pattern results in two

mispredicts per loop

DC08: TTTTTTTTTTT ... TTTTTTTTTTNTTTTTTTTT …

100,000 iterations

How often is branch outcome != previous outcome?

2 / 100,000

TN

NT

DC44: TTTTT ... TNTTTTT … TNTTTTT …

2 / 100

DC50: TNTNTNTNTNTNTNTNTNTNTNTNTNTNT …

2 / 2

99.998%

Prediction

Rate98.0%

0.0%

0 1

FSM for Last-time

Prediction

0 1

2 3

FSM for 2bC

(2-bit Counter)

Predict NT

Predict T

Transistion on T outcome

Transistion on NT outcome

2

T

3

T

3

T

…3

N

N

1

T

0

0

T

1

T T T T…

T

1 1 1 1

T

1

T…1

0

T

1

T

2

T

3

T

3

T… 3

T

Initial Training/Warm-up1bC:

2bC:

Only 1 Mispredict per N branches now!

DC08: 99.999% DC44: 99.0%

0 1

2 3

We can

live with

these

These

are good

This is bad!

• 98% 99%

– Who cares?

– Actually, it’s 2% misprediction rate 1%

– That’s a halving of the number of mispredictions

• So what?

– If a pipeline can fetch 5 instructions at a cycle and the branch

resolution time is 20 cycles

– To Fetch 500 instructions

– 100 accuracy : 100 cycles

– 98 accuracy:

• 100 (correctly fetch) + 20 (misprediction)*10 = 300 cycles

– 99 accuracy

• 100 (correctly fetch) + 20 misprediction *5 = 200 cycles

1 1 ….. 1 0

BHR

(branch

history

register)

00 …. 00

00 …. 01

00 …. 10

11 …. 11

0 1

2 3

index

Pattern History Table

previous one

Yeh&patt’92

0 0 0 0 0 0

History length

Initialization value (0 or 1)

1 : branch is taken

0: branch is not-taken

Old history New history

New BHR = old BHR<<1 | (br_dir)

Example

BHR: 00000

Br1 : taken BHR 00001

Br 2: not-taken BHR 00010

Br 3: taken BHR 00101

20

• Yeh and Patt 3-letter naming scheme

– Type of history collected

• G (global), P (per branch), S (per set)

– PHT type

• A (adaptive), S (static)

– PHT organization

• g (global), p (per branch), s (per set)

GBHR

PC

PHT

GAp

BHR Table

PC

PAp

• Local Behavior

– What is the predicted direction of Branch A

given the outcomes of previous instances of

Branch A?

• Global Behavior

– What is the predicted direction of Branch Z

given the outcomes of all* previous branches

A, B, …, X and Y?

* number of previous branches tracked limited by the history length

• Branches are correlatedBranch X: if (cond1)

….

Branch Y: if (cond 2)

….

Branch Z : if (cond 1 and cond 2)

…….1 0

Branch

X

Branch

Y

Branch

Z

1 0 0

1 1 1

0 1 0

0 0 0

BHR

…….1 1

…….01

…….00

PHT

1 1 ….. 1 0

2bc

2bc

2bc

2bc

BHR

index

0x809000

PC

XOR

McFarling’93

Predictor size: 2^(history length)*2bit

predict_func(pc, actual_dir)

{

index = PC xor BHR

taken = 2bit_counters[index] > 2 ? 1 : 0

correctly_predictied = (actual_dir == taken) ? 1 : 0 // stats

}

updated_func(pc, actual_dir)

{

index = PC xor BHR

if (actual_dir) SAT_INC( 2bit_counter[index] )

else SAT_DEC ( 2bit_counter[index] )

BHR = BHR << 1 | actual_dir

}

Fall 2011 Prof. Hyesoon Kimhyesoon/fall11/lec_br1.pdf · Corr: Xooooo MMooooooMMooooooMM...

Documents

Transcript of Fall 2011 Prof. Hyesoon Kimhyesoon/fall11/lec_br1.pdf · Corr: Xooooo MMooooooMMooooooMM...