Fall 2011 Prof. Hyesoon Kimhyesoon/fall11/lec_br1.pdf · Corr: Xooooo MMooooooMMooooooMM...

25
Fall 2011 Prof. Hyesoon Kim

Transcript of Fall 2011 Prof. Hyesoon Kimhyesoon/fall11/lec_br1.pdf · Corr: Xooooo MMooooooMMooooooMM...

Page 1: Fall 2011 Prof. Hyesoon Kimhyesoon/fall11/lec_br1.pdf · Corr: Xooooo MMooooooMMooooooMM Misprediction rate: 2/8 = 25% • Execute the same loop again –First always mispredicted

Fall 2011

Prof. Hyesoon Kim

Page 2: Fall 2011 Prof. Hyesoon Kimhyesoon/fall11/lec_br1.pdf · Corr: Xooooo MMooooooMMooooooMM Misprediction rate: 2/8 = 25% • Execute the same loop again –First always mispredicted

FE ID EX MEM WB

add r1, r2, r3 add

mul

mul

mul

add

sub r4, r1, r3 sub addsub add

addsubmul r5, r2, r3 mul sub

subsub add

add

add

Add: 2 cycles

add add

add

subsub

subsubmul

L L L L L

FE_stage

Page 3: Fall 2011 Prof. Hyesoon Kimhyesoon/fall11/lec_br1.pdf · Corr: Xooooo MMooooooMMooooooMM Misprediction rate: 2/8 = 25% • Execute the same loop again –First always mispredicted

FE ID EX MEM WB

br0x800

br target 0x800

add r1, r2,r3 0x804

target sub r2,r3,r4 0x900

br0x804

br

br

br

0x804

0x804

0x900

PC (latch)

add

add

add

sub

0x904

1

cycle

2

3

4

5

6 add sub

FE_stage

Page 4: Fall 2011 Prof. Hyesoon Kimhyesoon/fall11/lec_br1.pdf · Corr: Xooooo MMooooooMMooooooMM Misprediction rate: 2/8 = 25% • Execute the same loop again –First always mispredicted

• -Eliminate branches

– Predication (more on later)

• Delayed branch slot

– SPARC, MIPS

• Dual-path execution (more on later)

• Or predict?

Page 5: Fall 2011 Prof. Hyesoon Kimhyesoon/fall11/lec_br1.pdf · Corr: Xooooo MMooooooMMooooooMM Misprediction rate: 2/8 = 25% • Execute the same loop again –First always mispredicted

• Branches are very frequent

– Approx. 20% of all instructions

• Can not wait until we know where it goes

– Long pipelines

• Branch outcome is known after B cycles

• No scheduling past the branch until outcome known

– Superscalars (e.g., 4-way)

• Branch every cycle or so!

• One cycle of work, then bubbles for ~B cycles?

Page 6: Fall 2011 Prof. Hyesoon Kimhyesoon/fall11/lec_br1.pdf · Corr: Xooooo MMooooooMMooooooMM Misprediction rate: 2/8 = 25% • Execute the same loop again –First always mispredicted

• Predict Branches

– And predict them well!

• Fetch, decode, etc. on the predicted path

– Option 1: No execution until branch is resolved

– Option 2: Execute anyway (speculation)

• Recover from mispredictions

– Restart fetch from correct path

Page 7: Fall 2011 Prof. Hyesoon Kimhyesoon/fall11/lec_br1.pdf · Corr: Xooooo MMooooooMMooooooMM Misprediction rate: 2/8 = 25% • Execute the same loop again –First always mispredicted

• Need to know two things

– Whether the branch is taken or not (direction)

– The target address if it is taken (target)

• Direct jumps, Function calls: unconditional

branches

– Direction known (always taken), target easy to

compute

• Conditional Branches (typically PC-relative)

– Direction difficult to predict, target easy to compute

• Indirect jumps, function returns

– Direction known (always taken), target difficult

Page 8: Fall 2011 Prof. Hyesoon Kimhyesoon/fall11/lec_br1.pdf · Corr: Xooooo MMooooooMMooooooMM Misprediction rate: 2/8 = 25% • Execute the same loop again –First always mispredicted

• Needed for conditional branches

– Most branches are of this type

• Many, many kinds of predictors for this

– Static: fixed rule, or compiler annotation(e.g. br.bwh (branch whether hint. IA-64))

– Dynamic: hardware prediction

• Dynamic prediction usually history-based

– Example: predict direction is the sameas the last time this branch was executed

Page 9: Fall 2011 Prof. Hyesoon Kimhyesoon/fall11/lec_br1.pdf · Corr: Xooooo MMooooooMMooooooMM Misprediction rate: 2/8 = 25% • Execute the same loop again –First always mispredicted

• Always predict NT

– easy to implement

– 30-40% accuracy … not so good

• Always predict T

– 60-70% accuracy

• BTFNT

– loops usually have a few iterations, so this is

like always predicting that the loop is taken

– don’t know target until decode

Page 10: Fall 2011 Prof. Hyesoon Kimhyesoon/fall11/lec_br1.pdf · Corr: Xooooo MMooooooMMooooooMM Misprediction rate: 2/8 = 25% • Execute the same loop again –First always mispredicted

K bits of branch

instruction address

Index

Branch history

table of 2^K entries,

1 bit per entry

Use this entry to

predict this branch:

0: predict not taken

1: predict taken

When branch direction resolved,

go back into the table and

update entry: 0 if not taken, 1 if taken

Page 11: Fall 2011 Prof. Hyesoon Kimhyesoon/fall11/lec_br1.pdf · Corr: Xooooo MMooooooMMooooooMM Misprediction rate: 2/8 = 25% • Execute the same loop again –First always mispredicted

0xDC08: for(i=0; i < 100000; i++)

{

0xDC44: if( ( i % 100) == 0 )

tick( );

0xDC50: if( (i & 1) == 1)

odd( );

}

T

N

Page 12: Fall 2011 Prof. Hyesoon Kimhyesoon/fall11/lec_br1.pdf · Corr: Xooooo MMooooooMMooooooMM Misprediction rate: 2/8 = 25% • Execute the same loop again –First always mispredicted

• Example: short loop (8 iterations)

– Taken 7 times, then not taken once

– Not-taken mispredicted (was taken previously)

Act: TTTTTTTNTTTTTTNTTTTTTTNT…

Pred: XTTTTTTTNTTTTTTNTTTTTTTN

Corr: Xooooo MMooooooMMooooooMM

Misprediction rate: 2/8 = 25%

• Execute the same loop again

– First always mispredicted (previous outcome was not taken)

– Then 6 predicted correctly

– Then last one mispredicted again

• Each fluke/anomaly in a stable pattern results in two

mispredicts per loop

Page 13: Fall 2011 Prof. Hyesoon Kimhyesoon/fall11/lec_br1.pdf · Corr: Xooooo MMooooooMMooooooMM Misprediction rate: 2/8 = 25% • Execute the same loop again –First always mispredicted

DC08: TTTTTTTTTTT ... TTTTTTTTTTNTTTTTTTTT …

100,000 iterations

How often is branch outcome != previous outcome?

2 / 100,000

TN

NT

DC44: TTTTT ... TNTTTTT … TNTTTTT …

2 / 100

DC50: TNTNTNTNTNTNTNTNTNTNTNTNTNTNT …

2 / 2

99.998%

Prediction

Rate98.0%

0.0%

Page 14: Fall 2011 Prof. Hyesoon Kimhyesoon/fall11/lec_br1.pdf · Corr: Xooooo MMooooooMMooooooMM Misprediction rate: 2/8 = 25% • Execute the same loop again –First always mispredicted

0 1

FSM for Last-time

Prediction

0 1

2 3

FSM for 2bC

(2-bit Counter)

Predict NT

Predict T

Transistion on T outcome

Transistion on NT outcome

Page 15: Fall 2011 Prof. Hyesoon Kimhyesoon/fall11/lec_br1.pdf · Corr: Xooooo MMooooooMMooooooMM Misprediction rate: 2/8 = 25% • Execute the same loop again –First always mispredicted

2

T

3

T

3

T

…3

N

N

1

T

0

0

T

1

T T T T…

T

1 1 1 1

T

1

T…1

0

T

1

T

2

T

3

T

3

T… 3

T

Initial Training/Warm-up1bC:

2bC:

Only 1 Mispredict per N branches now!

DC08: 99.999% DC44: 99.0%

0 1

2 3

Page 16: Fall 2011 Prof. Hyesoon Kimhyesoon/fall11/lec_br1.pdf · Corr: Xooooo MMooooooMMooooooMM Misprediction rate: 2/8 = 25% • Execute the same loop again –First always mispredicted

We can

live with

these

These

are good

This is bad!

Page 17: Fall 2011 Prof. Hyesoon Kimhyesoon/fall11/lec_br1.pdf · Corr: Xooooo MMooooooMMooooooMM Misprediction rate: 2/8 = 25% • Execute the same loop again –First always mispredicted

• 98% 99%

– Who cares?

– Actually, it’s 2% misprediction rate 1%

– That’s a halving of the number of mispredictions

• So what?

– If a pipeline can fetch 5 instructions at a cycle and the branch

resolution time is 20 cycles

– To Fetch 500 instructions

– 100 accuracy : 100 cycles

– 98 accuracy:

• 100 (correctly fetch) + 20 (misprediction)*10 = 300 cycles

– 99 accuracy

• 100 (correctly fetch) + 20 misprediction *5 = 200 cycles

Page 18: Fall 2011 Prof. Hyesoon Kimhyesoon/fall11/lec_br1.pdf · Corr: Xooooo MMooooooMMooooooMM Misprediction rate: 2/8 = 25% • Execute the same loop again –First always mispredicted

1 1 ….. 1 0

BHR

(branch

history

register)

00 …. 00

00 …. 01

00 …. 10

11 …. 11

0 1

2 3

index

Pattern History Table

previous one

Yeh&patt’92

Page 19: Fall 2011 Prof. Hyesoon Kimhyesoon/fall11/lec_br1.pdf · Corr: Xooooo MMooooooMMooooooMM Misprediction rate: 2/8 = 25% • Execute the same loop again –First always mispredicted

0 0 0 0 0 0

History length

Initialization value (0 or 1)

1 : branch is taken

0: branch is not-taken

Old history New history

New BHR = old BHR<<1 | (br_dir)

Example

BHR: 00000

Br1 : taken BHR 00001

Br 2: not-taken BHR 00010

Br 3: taken BHR 00101

Page 20: Fall 2011 Prof. Hyesoon Kimhyesoon/fall11/lec_br1.pdf · Corr: Xooooo MMooooooMMooooooMM Misprediction rate: 2/8 = 25% • Execute the same loop again –First always mispredicted

20

• Yeh and Patt 3-letter naming scheme

– Type of history collected

• G (global), P (per branch), S (per set)

– PHT type

• A (adaptive), S (static)

– PHT organization

• g (global), p (per branch), s (per set)

Page 21: Fall 2011 Prof. Hyesoon Kimhyesoon/fall11/lec_br1.pdf · Corr: Xooooo MMooooooMMooooooMM Misprediction rate: 2/8 = 25% • Execute the same loop again –First always mispredicted

GBHR

PC

PHT

GAp

BHR Table

PC

PAp

Page 22: Fall 2011 Prof. Hyesoon Kimhyesoon/fall11/lec_br1.pdf · Corr: Xooooo MMooooooMMooooooMM Misprediction rate: 2/8 = 25% • Execute the same loop again –First always mispredicted

• Local Behavior

– What is the predicted direction of Branch A

given the outcomes of previous instances of

Branch A?

• Global Behavior

– What is the predicted direction of Branch Z

given the outcomes of all* previous branches

A, B, …, X and Y?

* number of previous branches tracked limited by the history length

Page 23: Fall 2011 Prof. Hyesoon Kimhyesoon/fall11/lec_br1.pdf · Corr: Xooooo MMooooooMMooooooMM Misprediction rate: 2/8 = 25% • Execute the same loop again –First always mispredicted

• Branches are correlatedBranch X: if (cond1)

….

Branch Y: if (cond 2)

….

Branch Z : if (cond 1 and cond 2)

…….1 0

Branch

X

Branch

Y

Branch

Z

1 0 0

1 1 1

0 1 0

0 0 0

BHR

…….1 1

…….01

…….00

PHT

Page 24: Fall 2011 Prof. Hyesoon Kimhyesoon/fall11/lec_br1.pdf · Corr: Xooooo MMooooooMMooooooMM Misprediction rate: 2/8 = 25% • Execute the same loop again –First always mispredicted

1 1 ….. 1 0

2bc

2bc

2bc

2bc

BHR

index

0x809000

PC

XOR

McFarling’93

Predictor size: 2^(history length)*2bit

Page 25: Fall 2011 Prof. Hyesoon Kimhyesoon/fall11/lec_br1.pdf · Corr: Xooooo MMooooooMMooooooMM Misprediction rate: 2/8 = 25% • Execute the same loop again –First always mispredicted

predict_func(pc, actual_dir)

{

index = PC xor BHR

taken = 2bit_counters[index] > 2 ? 1 : 0

correctly_predictied = (actual_dir == taken) ? 1 : 0 // stats

}

updated_func(pc, actual_dir)

{

index = PC xor BHR

if (actual_dir) SAT_INC( 2bit_counter[index] )

else SAT_DEC ( 2bit_counter[index] )

BHR = BHR << 1 | actual_dir

}