Computer Architecture


Page 1: Computer Architecture

Computer Architecture

Lecture 6

Overview of Branch Prediction

Page 2: Computer Architecture

Prediction accuracy of a 4096-entry 2-bit prediction buffer vs. an infinite buffer

[Figure: bar chart of the frequency of mispredictions (axis 0% to 18%) for the benchmarks li, eqntott, espresso, gcc, fpppp, spice, and matrix300. Two series: 4096 entries, 2 bits per entry; unlimited entries, 2 bits per entry. Bar value labels, in order: 10%/10%, 5%/5%, 12%/11%, 9%/9%, 9%/9%, 0%/0%.]
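As a rough illustration of what the chart measures, the following is a minimal sketch of a prediction buffer built from 2-bit saturating counters. The class name, the indexing by low PC bits, the 4-byte instruction assumption, and the synthetic trace are all assumptions made for this example; none of them come from the slides.

```python
class TwoBitPredictor:
    """Branch-prediction buffer of 2-bit saturating counters (illustrative sketch)."""

    def __init__(self, entries=4096):
        self.entries = entries
        # Counter values 0 and 1 predict not taken; 2 and 3 predict taken.
        self.counters = [0] * entries

    def _index(self, pc):
        # Assumption: 4-byte instructions, so drop the two low bits, then wrap.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def misprediction_rate(predictor, trace):
    """trace is an iterable of (pc, taken) pairs."""
    misses = 0
    total = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
        total += 1
    return misses / total if total else 0.0


# Synthetic trace: one loop branch, taken 9 times and then not taken, 10 times over.
trace = [(0x400, i % 10 != 9) for i in range(100)]
rate = misprediction_rate(TwoBitPredictor(), trace)
```

With a single branch there is no interference between entries, so a 4096-entry buffer and an unlimited one would behave identically on this trace; differences only appear when several branches map to the same entry.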

Page 3: Computer Architecture

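Comparison of 2-bit predictors

[Figure: bar chart of the frequency of mispredictions (axis 0% to 18%) for the benchmarks li, eqntott, espresso, gcc, fpppp, spice, and matrix300. Three series: local predictor with 4096 entries, 2 bits per entry; unlimited entries, 2 bits per entry; 1024 entries, (2,2) predictor. Bar value labels for the first two series, in order: 10%/10%, 5%/5%, 12%/11%, 9%/9%, 9%/9%, 0%/0%. Bar value labels for the (2,2) series, in order: 5%, 5%, 11%, 4%, 6%, 5%.]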
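For readers unfamiliar with the (2,2) notation, here is a small sketch of an (m, n) correlating predictor: m bits of global branch history select one of 2^m n-bit saturating counters within each entry. The class and method names, and the indexing by low PC bits, are assumptions chosen for illustration.

```python
class CorrelatingPredictor:
    """(m, n) predictor sketch: m bits of global history pick one of 2**m
    n-bit saturating counters in each entry."""

    def __init__(self, entries=1024, m=2, n=2):
        self.entries = entries
        self.m = m
        self.max_count = (1 << n) - 1
        self.history = 0  # outcomes of the last m branches, newest in bit 0
        self.table = [[0] * (1 << m) for _ in range(entries)]

    def _index(self, pc):
        # Assumption: 4-byte instructions.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        counter = self.table[self._index(pc)][self.history]
        return counter > self.max_count // 2

    def update(self, pc, taken):
        row = self.table[self._index(pc)]
        if taken:
            row[self.history] = min(self.max_count, row[self.history] + 1)
        else:
            row[self.history] = max(0, row[self.history] - 1)
        mask = (1 << self.m) - 1
        self.history = ((self.history << 1) | int(taken)) & mask
```

A branch that strictly alternates between taken and not taken is a case where the history helps: after a short warm-up, each history pattern always leads to the same outcome, so the selected counter saturates in the right direction.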

Page 4: Computer Architecture

Tournament Predictor

[State diagram: a 2-bit selector with four states. States 11 and 10: use predictor P1. States 01 and 00: use predictor P2. Transitions between states are labeled "P1 Correct" and "P2 Correct".]
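The selector in the diagram can be sketched as a 2-bit saturating counter. The sketch below assumes that the counter moves toward P1 only when P1 is correct and P2 is wrong, moves toward P2 in the opposite case, and stays put when both predictors agree; that tie rule and the initial state are assumptions, since the transcript only preserves the "P1 Correct" and "P2 Correct" labels.

```python
class TournamentSelector:
    """2-bit selector: states 11 and 10 use P1, states 01 and 00 use P2."""

    def __init__(self):
        self.state = 0b11  # assumption: start out trusting P1

    def choose(self):
        return "P1" if self.state >= 0b10 else "P2"

    def update(self, p1_correct, p2_correct):
        if p1_correct and not p2_correct:
            self.state = min(0b11, self.state + 1)
        elif p2_correct and not p1_correct:
            self.state = max(0b00, self.state - 1)
        # Both right or both wrong: no change (assumed tie rule).
```

Because the counter has two states per predictor, a single disagreement does not flip the choice; it takes two in a row from a saturated state.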

Page 5: Computer Architecture

Misprediction rate of three predictors

• Note that predictors of equal capacity must be compared. The size of each level has to be selected to optimize prediction accuracy. Influencing factors: the degree of interference between branches, and whether the program is likely to benefit from local or global history.

[Figure: conditional branch misprediction rate (axis 0% to 8%) versus total predictor size (axis 0 to 512 Kbits) for three predictors: local 2-bit predictor, correlating predictor, and tournament predictor.]
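To make "equal capacity" concrete, the storage of a predictor can be counted in bits. The helper names below are made up for this example; the formula for an (m, n) predictor is the usual one, 2^m counters of n bits per entry.

```python
def two_bit_buffer_bits(entries, bits_per_entry=2):
    """Total storage of a simple prediction buffer, in bits."""
    return entries * bits_per_entry


def correlating_bits(entries, m, n):
    """Total storage of an (m, n) predictor: 2**m counters of n bits per entry."""
    return (2 ** m) * n * entries


local_bits = two_bit_buffer_bits(4096)          # 4096 entries, 2 bits each
correlating = correlating_bits(1024, m=2, n=2)  # 1024 entries, (2,2)
```

Both come to 8192 bits (8 Kbits), which is why the 4096-entry 2-bit buffer and the 1024-entry (2,2) predictor compared earlier make a fair pair.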

Page 6: Computer Architecture

Why Prediction

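Prediction reduces branch hazards in pipelined processors.

Used in almost all pipelined processors.

[Diagram: a two-input mux (inputs 0 and 1) chooses the next fetch address between PC+4 and the output of the branch target address cache. The select signal is the branch prediction (T/NT) from the branch prediction buffer. The actual next PC is also shown.]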

Page 7: Computer Architecture

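A Branch Target Buffer

[Diagram: the PC of the instruction to fetch is looked up in the branch target buffer. Each of the buffer's entries holds a branch PC, a predicted PC, and a field saying whether the branch is predicted taken or untaken, supplied by the prediction hardware (counters etc.). The fetch PC is compared (=) with the stored PCs. No: not a branch instruction; proceed normally. Yes: the instruction is a branch; use the predicted PC as the new PC.]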
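The lookup in the diagram can be sketched in a few lines. This is a simplified model under stated assumptions: a Python dict stands in for the fixed-size, tagged hardware table, instructions are assumed to be 4 bytes, and the class and method names are invented for the example.

```python
class BranchTargetBuffer:
    """Toy branch target buffer: maps a branch PC to its predicted target PC."""

    def __init__(self):
        # Assumption: unbounded dict instead of a fixed number of entries.
        self.entries = {}

    def next_pc(self, pc, instr_bytes=4):
        """Return (next fetch PC, hit flag) for the instruction at pc."""
        if pc in self.entries:
            # Hit: the instruction is a branch predicted taken; use the predicted PC.
            return self.entries[pc], True
        # Miss: treat it as a non-branch and fall through.
        return pc + instr_bytes, False

    def record_taken(self, pc, target):
        """Enter the branch address and its next PC after a taken branch."""
        self.entries[pc] = target


btb = BranchTargetBuffer()
first = btb.next_pc(0x100)        # miss: falls through to 0x104
btb.record_taken(0x100, 0x80)
second = btb.next_pc(0x100)       # hit: fetch continues at 0x80
```

The point of the structure is that the predicted target is available during fetch, before the instruction has even been decoded as a branch.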

Page 8: Computer Architecture

Handling an instruction with a branch-target buffer

IF: Send PC to memory and branch-target buffer. Entry found in the branch-target buffer?
  No: in ID, is the instruction a taken branch?
    No: normal instruction execution.
    Yes: in EX, enter the branch instruction address and next PC into the branch target buffer.
  Yes: send out the predicted PC. In ID, taken branch?
    No: mispredicted branch, kill fetched instruction.
    Yes: branch correctly predicted; continue execution with no stalls.

Page 9: Computer Architecture

Penalties for possible combinations of whether the branch is in the buffer

Instruction in buffer  Prediction  Actual branch  Penalty cycles
Yes                    Taken       Taken          0
Yes                    Taken       Not taken      2
No                                 Taken          2
No                                 Not taken      0
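The table above translates directly into a lookup, and from there into an average penalty per branch. In the sketch below the function names are invented, and the rates used in the example call (90% buffer hit rate, 90% accuracy on hits, 60% of missing branches taken) are purely illustrative numbers, not measurements from the lecture.

```python
def branch_penalty(in_buffer, actual_taken):
    """Penalty cycles from the table. A buffer hit implies a taken
    prediction, so only the actual outcome matters."""
    if in_buffer:
        return 0 if actual_taken else 2
    return 2 if actual_taken else 0


def expected_penalty(hit_rate, accuracy_on_hit, taken_fraction_on_miss):
    """Average penalty cycles per branch under the table's 2-cycle costs."""
    mispredicted_hits = hit_rate * (1 - accuracy_on_hit) * 2
    taken_misses = (1 - hit_rate) * taken_fraction_on_miss * 2
    return mispredicted_hits + taken_misses


# Illustrative rates only (assumptions for this example).
avg = expected_penalty(hit_rate=0.9, accuracy_on_hit=0.9, taken_fraction_on_miss=0.6)
```

With these assumed rates the two sources of penalty contribute 0.18 and 0.12 cycles, for about 0.30 cycles per branch.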

Page 10: Computer Architecture
Page 11: Computer Architecture

Static Super Scalar pipeline in operation

Fetch 64 bits/clock cycle; Int on left, FP on right
– Can only issue 2nd instruction if 1st instruction issues
– More ports for FP registers to do FP load & FP op in a pair

Type              Pipe stages
Int. instruction  IF  ID  EX  MEM WB
FP instruction    IF  ID  EX  MEM WB
Int. instruction      IF  ID  EX  MEM WB
FP instruction        IF  ID  EX  MEM WB
Int. instruction          IF  ID  EX  MEM WB
FP instruction            IF  ID  EX  MEM WB

A 1-cycle load delay affects 3 instructions in the superscalar pipeline: the instruction in the right half can't use the loaded value, nor can the instructions in the next slot.

Page 12: Computer Architecture

Dynamic Super Scalar pipeline in operation

[Diagram: instructions arrive from the instruction cache over a wider bus and pass through two ISSUE/rename-to-RS stages (check for RS, check for RAW), followed by Read Reg. Each functional unit path starts with a wait-for-operands stage: Integer (EX), LD/ST (EX/TAC, Mem Access), FP (A1 to A4 and M1 to M7), and Divide. Results are carried on CDB #1 and CDB #2 to Write Reg.]

Page 13: Computer Architecture

Example 1

Loop: L.D    F0,0(R1)     ; F0 = array element
      ADD.D  F4,F0,F2
      S.D    F4,0(R1)     ; store result
      DADDIU R1,R1,#-8    ; 8 bytes (per DW)
      BNE    R1,R2,Loop   ; branch if R1 != R2
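Read at a higher level, the loop adds the scalar held in F2 to every element of an array, walking from the last element toward the first. A rough Python equivalent, with a made-up function name, is shown below; unlike the assembly, which always runs the body at least once, this version does nothing for an empty array.

```python
def add_scalar_in_place(x, s):
    """Add s to every element of x, from the last element down to the first."""
    # The index plays the role of R1, which decreases by 8 bytes per iteration.
    for i in range(len(x) - 1, -1, -1):
        x[i] = x[i] + s   # L.D, ADD.D, S.D
    return x


result = add_scalar_in_place([1.0, 2.0, 3.0], 0.5)
```

Each iteration is independent of the others, which is what lets a dynamically scheduled pipeline overlap successive iterations in the tables that follow.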

Page 14: Computer Architecture

Dual issue, 1 Integer Unit, FPMUL = 3 cc

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 First issue

1 ADD.D F4,F0,F2 1

1 S.D F4,0(R1)

1 DADDIU R1,R1,#-8

1 BNE R1,R2,Loop

2 L.D F0,0(R1)

2 ADD.D F4,F0,F2

2 S.D F4,0(R1)

2 DADDIU R1,R1,#-8

2 BNE R1,R2,Loop

3 L.D F0,0(R1)

3 ADD.D F4,F0,F2

3 S.D F4,0(R1)

3 DADDIU R1,R1,#-8

3 BNE R1,R2,Loop

Page 15: Computer Architecture

Dual issue, 1 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 First issue

1 ADD.D F4,F0,F2 1

1 S.D F4,0(R1) 2

1 DADDIU R1,R1,#-8 2

1 BNE R1,R2,Loop

2 L.D F0,0(R1)

2 ADD.D F4,F0,F2

2 S.D F4,0(R1)

2 DADDIU R1,R1,#-8

2 BNE R1,R2,Loop

3 L.D F0,0(R1)

3 ADD.D F4,F0,F2

3 S.D F4,0(R1)

3 DADDIU R1,R1,#-8

3 BNE R1,R2,Loop

Page 16: Computer Architecture

Dual issue, 1 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 First issue

1 ADD.D F4,F0,F2 1

1 S.D F4,0(R1) 2 3

1 DADDIU R1,R1,#-8 2

1 BNE R1,R2,Loop 3

2 L.D F0,0(R1)

2 ADD.D F4,F0,F2

2 S.D F4,0(R1)

2 DADDIU R1,R1,#-8

2 BNE R1,R2,Loop

3 L.D F0,0(R1)

3 ADD.D F4,F0,F2

3 S.D F4,0(R1)

3 DADDIU R1,R1,#-8

3 BNE R1,R2,Loop

Page 17: Computer Architecture

Dual issue, 1 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1

1 S.D F4,0(R1) 2 3

1 DADDIU R1,R1,#-8 2 4

1 BNE R1,R2,Loop 3

2 L.D F0,0(R1) 4

2 ADD.D F4,F0,F2 4

2 S.D F4,0(R1)

2 DADDIU R1,R1,#-8

2 BNE R1,R2,Loop

3 L.D F0,0(R1)

3 ADD.D F4,F0,F2

3 S.D F4,0(R1)

3 DADDIU R1,R1,#-8

3 BNE R1,R2,Loop

Page 18: Computer Architecture

Dual issue, 1 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5

1 S.D F4,0(R1) 2 3 Wait for ADD.D

1 DADDIU R1,R1,#-8 2 4 5 Wait for ALU

1 BNE R1,R2,Loop 3

2 L.D F0,0(R1) 4

2 ADD.D F4,F0,F2 4

2 S.D F4,0(R1) 5

2 DADDIU R1,R1,#-8 5

2 BNE R1,R2,Loop

3 L.D F0,0(R1)

3 ADD.D F4,F0,F2

3 S.D F4,0(R1)

3 DADDIU R1,R1,#-8

3 BNE R1,R2,Loop

Page 19: Computer Architecture

Dual issue, 1 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5,6 Wait for L.D

1 S.D F4,0(R1) 2 3 Wait for ADD.D

1 DADDIU R1,R1,#-8 2 4 5 Wait for ALU

1 BNE R1,R2,Loop 3 6 Wait for DADDIU

2 L.D F0,0(R1) 4

2 ADD.D F4,F0,F2 4

2 S.D F4,0(R1) 5

2 DADDIU R1,R1,#-8 5

2 BNE R1,R2,Loop 6

3 L.D F0,0(R1)

3 ADD.D F4,F0,F2

3 S.D F4,0(R1)

3 DADDIU R1,R1,#-8

3 BNE R1,R2,Loop

Page 20: Computer Architecture

Dual issue, 1 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5,6,7 Wait for L.D

1 S.D F4,0(R1) 2 3 Wait for ADD.D

1 DADDIU R1,R1,#-8 2 4 5 Wait for ALU

1 BNE R1,R2,Loop 3 6 Wait for DADDIU

2 L.D F0,0(R1) 4 7 Wait for BNE

2 ADD.D F4,F0,F2 4

2 S.D F4,0(R1) 5

2 DADDIU R1,R1,#-8 5

2 BNE R1,R2,Loop 6

3 L.D F0,0(R1) 7

3 ADD.D F4,F0,F2 7

3 S.D F4,0(R1)

3 DADDIU R1,R1,#-8

3 BNE R1,R2,Loop

Page 21: Computer Architecture

Dual issue, 1 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5-7 8 Wait for L.D

1 S.D F4,0(R1) 2 3 Wait for ADD.D

1 DADDIU R1,R1,#-8 2 4 5 Wait for ALU

1 BNE R1,R2,Loop 3 6 Wait for DADDIU

2 L.D F0,0(R1) 4 7 8 Wait for BNE

2 ADD.D F4,F0,F2 4 Wait for L.D

2 S.D F4,0(R1) 5 8 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 Wait for ALU

2 BNE R1,R2,Loop 6 Wait for DADDIU

3 L.D F0,0(R1) 7 Wait for BNE

3 ADD.D F4,F0,F2 7 Wait for L.D

3 S.D F4,0(R1) 8

3 DADDIU R1,R1,#-8 8

3 BNE R1,R2,Loop

Page 22: Computer Architecture

Dual issue, 1 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5-7 8 Wait for L.D

1 S.D F4,0(R1) 2 3 9

1 DADDIU R1,R1,#-8 2 4 5 Wait for ALU

1 BNE R1,R2,Loop 3 6 Wait for DADDIU

2 L.D F0,0(R1) 4 7 8 9 Wait for BNE

2 ADD.D F4,F0,F2 4 Wait for L.D

2 S.D F4,0(R1) 5 8 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 9 Wait for ALU

2 BNE R1,R2,Loop 6 Wait for DADDIU

3 L.D F0,0(R1) 7 Wait for BNE

3 ADD.D F4,F0,F2 7 Wait for L.D

3 S.D F4,0(R1) 8 Wait for ADD.D

3 DADDIU R1,R1,#-8 8 Wait for ALU

3 BNE R1,R2,Loop 9

Page 23: Computer Architecture

Dual issue, 1 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5-7 8 Wait for L.D

1 S.D F4,0(R1) 2 3 9

1 DADDIU R1,R1,#-8 2 4 5 Wait for ALU

1 BNE R1,R2,Loop 3 6 Wait for DADDIU

2 L.D F0,0(R1) 4 7 8 9 Wait for BNE

2 ADD.D F4,F0,F2 4 10 Wait for L.D

2 S.D F4,0(R1) 5 8 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 9 10 Wait for ALU

2 BNE R1,R2,Loop 6 Wait for DADDIU

3 L.D F0,0(R1) 7 Wait for BNE

3 ADD.D F4,F0,F2 7 Wait for L.D

3 S.D F4,0(R1) 8 Wait for ADD.D

3 DADDIU R1,R1,#-8 8 Wait for ALU

3 BNE R1,R2,Loop 9 Wait for DADDIU

Page 24: Computer Architecture

Dual issue, 1 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5-7 8 Wait for L.D

1 S.D F4,0(R1) 2 3 9

1 DADDIU R1,R1,#-8 2 4 5 Wait for ALU

1 BNE R1,R2,Loop 3 6 Wait for DADDIU

2 L.D F0,0(R1) 4 7 8 9 Wait for BNE

2 ADD.D F4,F0,F2 4 10,11 Wait for L.D

2 S.D F4,0(R1) 5 8 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 9 10 Wait for ALU

2 BNE R1,R2,Loop 6 11 Wait for DADDIU

3 L.D F0,0(R1) 7 Wait for BNE

3 ADD.D F4,F0,F2 7 Wait for L.D

3 S.D F4,0(R1) 8 Wait for ADD.D

3 DADDIU R1,R1,#-8 8 Wait for ALU

3 BNE R1,R2,Loop 9 Wait for DADDIU

Page 25: Computer Architecture

Dual issue, 1 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5-7 8 Wait for L.D

1 S.D F4,0(R1) 2 3 9

1 DADDIU R1,R1,#-8 2 4 5 Wait for ALU

1 BNE R1,R2,Loop 3 6 Wait for DADDIU

2 L.D F0,0(R1) 4 7 8 9 Wait for BNE

2 ADD.D F4,F0,F2 4 10,11,12 Wait for L.D

2 S.D F4,0(R1) 5 8 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 9 10 Wait for ALU

2 BNE R1,R2,Loop 6 11 Wait for DADDIU

3 L.D F0,0(R1) 7 12 Wait for BNE

3 ADD.D F4,F0,F2 7 Wait for L.D

3 S.D F4,0(R1) 8 Wait for ADD.D

3 DADDIU R1,R1,#-8 8 Wait for ALU

3 BNE R1,R2,Loop 9 Wait for DADDIU

Page 26: Computer Architecture

Dual issue, 1 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5-7 8 Wait for L.D

1 S.D F4,0(R1) 2 3 9

1 DADDIU R1,R1,#-8 2 4 5 Wait for ALU

1 BNE R1,R2,Loop 3 6 Wait for DADDIU

2 L.D F0,0(R1) 4 7 8 9 Wait for BNE

2 ADD.D F4,F0,F2 4 10-12 13 Wait for L.D

2 S.D F4,0(R1) 5 8 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 9 10 Wait for ALU

2 BNE R1,R2,Loop 6 11 Wait for DADDIU

3 L.D F0,0(R1) 7 12 13 Wait for BNE

3 ADD.D F4,F0,F2 7 Wait for L.D

3 S.D F4,0(R1) 8 13 Wait for ADD.D

3 DADDIU R1,R1,#-8 8 Wait for ALU

3 BNE R1,R2,Loop 9 Wait for DADDIU

Page 27: Computer Architecture

Dual issue, 1 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5-7 8 Wait for L.D

1 S.D F4,0(R1) 2 3 9

1 DADDIU R1,R1,#-8 2 4 5 Wait for ALU

1 BNE R1,R2,Loop 3 6 Wait for DADDIU

2 L.D F0,0(R1) 4 7 8 9 Wait for BNE

2 ADD.D F4,F0,F2 4 10-12 13 Wait for L.D

2 S.D F4,0(R1) 5 8 14 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 9 10 Wait for ALU

2 BNE R1,R2,Loop 6 11 Wait for DADDIU

3 L.D F0,0(R1) 7 12 13 14 Wait for BNE

3 ADD.D F4,F0,F2 7 Wait for L.D

3 S.D F4,0(R1) 8 13 Wait for ADD.D

3 DADDIU R1,R1,#-8 8 14 Wait for ALU

3 BNE R1,R2,Loop 9 Wait for DADDIU

Page 28: Computer Architecture

Dual issue, 1 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5-7 8 Wait for L.D

1 S.D F4,0(R1) 2 3 9

1 DADDIU R1,R1,#-8 2 4 5 Wait for ALU

1 BNE R1,R2,Loop 3 6 Wait for DADDIU

2 L.D F0,0(R1) 4 7 8 9 Wait for BNE

2 ADD.D F4,F0,F2 4 10-12 13 Wait for L.D

2 S.D F4,0(R1) 5 8 14 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 9 10 Wait for ALU

2 BNE R1,R2,Loop 6 11 Wait for DADDIU

3 L.D F0,0(R1) 7 12 13 14 Wait for BNE

3 ADD.D F4,F0,F2 7 15 Wait for L.D

3 S.D F4,0(R1) 8 13 Wait for ADD.D

3 DADDIU R1,R1,#-8 8 14 15 Wait for ALU

3 BNE R1,R2,Loop 9 Wait for DADDIU

Page 29: Computer Architecture

Dual issue, 1 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5-7 8 Wait for L.D

1 S.D F4,0(R1) 2 3 9

1 DADDIU R1,R1,#-8 2 4 5 Wait for ALU

1 BNE R1,R2,Loop 3 6 Wait for DADDIU

2 L.D F0,0(R1) 4 7 8 9 Wait for BNE

2 ADD.D F4,F0,F2 4 10-12 13 Wait for L.D

2 S.D F4,0(R1) 5 8 14 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 9 10 Wait for ALU

2 BNE R1,R2,Loop 6 11 Wait for DADDIU

3 L.D F0,0(R1) 7 12 13 14 Wait for BNE

3 ADD.D F4,F0,F2 7 15,16 Wait for L.D

3 S.D F4,0(R1) 8 13 Wait for ADD.D

3 DADDIU R1,R1,#-8 8 14 15 Wait for ALU

3 BNE R1,R2,Loop 9 16 Wait for DADDIU

Page 30: Computer Architecture

Dual issue, 1 Integer Unit, FPMUL = 3 cc

Iteration  Instruction        Issues  Executes  Mem access  Write CDB  Comment
1          L.D F0,0(R1)       1       2         3           4          First issue
1          ADD.D F4,F0,F2     1       5-7                   8          Wait for L.D
1          S.D F4,0(R1)       2       3         9
1          DADDIU R1,R1,#-8   2       4                     5          Wait for ALU
1          BNE R1,R2,Loop     3       6                                Wait for DADDIU
2          L.D F0,0(R1)       4       7         8           9          Wait for BNE
2          ADD.D F4,F0,F2     4       10-12                 13         Wait for L.D
2          S.D F4,0(R1)       5       8         14                     Wait for ADD.D
2          DADDIU R1,R1,#-8   5       9                     10         Wait for ALU
2          BNE R1,R2,Loop     6       11                               Wait for DADDIU
3          L.D F0,0(R1)       7       12        13          14         Wait for BNE
3          ADD.D F4,F0,F2     7       15-17                 18         Wait for L.D
3          S.D F4,0(R1)       8       13        19                     Wait for ADD.D
3          DADDIU R1,R1,#-8   8       14                    15         Wait for ALU
3          BNE R1,R2,Loop     9       16                               Wait for DADDIU
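The issue cycles in the table follow a simple pattern that can be reproduced in code. The sketch below assumes two rules read off the table rather than stated in the slides: at most two instructions issue per cycle, in program order, and a branch always ends its issue group. The function name and the use of opcode strings are choices made for this example.

```python
def issue_cycles(loop_body, iterations, width=2):
    """Assign an issue cycle to each instruction: up to `width` per cycle,
    in order, with a branch ending its issue group (assumed rule)."""
    cycle = 1
    slots = 0          # instructions already issued in the current cycle
    schedule = []
    for it in range(1, iterations + 1):
        for op in loop_body:
            if slots == width:
                cycle += 1
                slots = 0
            schedule.append((it, op, cycle))
            slots += 1
            if op == "BNE":
                # Nothing issues alongside or after the branch in this cycle.
                cycle += 1
                slots = 0
    return schedule


body = ["L.D", "ADD.D", "S.D", "DADDIU", "BNE"]
cycles = [c for _, _, c in issue_cycles(body, 3)]
```

This models only the issue column, so each iteration takes three issue cycles regardless of how many integer units exist; the execute, memory, and CDB columns depend on operand availability and functional unit contention, which the sketch does not model.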

Page 31: Computer Architecture
Page 32: Computer Architecture

Dual issue, 2 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 First issue

1 ADD.D F4,F0,F2 1

1 S.D F4,0(R1)

1 DADDIU R1,R1,#-8

1 BNE R1,R2,Loop

2 L.D F0,0(R1)

2 ADD.D F4,F0,F2

2 S.D F4,0(R1)

2 DADDIU R1,R1,#-8

2 BNE R1,R2,Loop

3 L.D F0,0(R1)

3 ADD.D F4,F0,F2

3 S.D F4,0(R1)

3 DADDIU R1,R1,#-8

3 BNE R1,R2,Loop

Page 33: Computer Architecture

Dual issue, 2 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 First issue

1 ADD.D F4,F0,F2 1

1 S.D F4,0(R1) 2

1 DADDIU R1,R1,#-8 2

1 BNE R1,R2,Loop

2 L.D F0,0(R1)

2 ADD.D F4,F0,F2

2 S.D F4,0(R1)

2 DADDIU R1,R1,#-8

2 BNE R1,R2,Loop

3 L.D F0,0(R1)

3 ADD.D F4,F0,F2

3 S.D F4,0(R1)

3 DADDIU R1,R1,#-8

3 BNE R1,R2,Loop

Page 34: Computer Architecture

Dual issue, 2 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 First issue

1 ADD.D F4,F0,F2 1

1 S.D F4,0(R1) 2 3

1 DADDIU R1,R1,#-8 2 3

1 BNE R1,R2,Loop 3

2 L.D F0,0(R1)

2 ADD.D F4,F0,F2

2 S.D F4,0(R1)

2 DADDIU R1,R1,#-8

2 BNE R1,R2,Loop

3 L.D F0,0(R1)

3 ADD.D F4,F0,F2

3 S.D F4,0(R1)

3 DADDIU R1,R1,#-8

3 BNE R1,R2,Loop

Page 35: Computer Architecture

Dual issue, 2 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 Wait for LD.D

1 S.D F4,0(R1) 2 3 Wait for ADD.D

1 DADDIU R1,R1,#-8 2 3 4 Executes earlier

1 BNE R1,R2,Loop 3

2 L.D F0,0(R1) 4

2 ADD.D F4,F0,F2 4

2 S.D F4,0(R1)

2 DADDIU R1,R1,#-8

2 BNE R1,R2,Loop

3 L.D F0,0(R1)

3 ADD.D F4,F0,F2

3 S.D F4,0(R1)

3 DADDIU R1,R1,#-8

3 BNE R1,R2,Loop

Page 36: Computer Architecture

Dual issue, 2 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5 Wait for LD.D

1 S.D F4,0(R1) 2 3 Wait for ADD.D

1 DADDIU R1,R1,#-8 2 3 4 Executes earlier

1 BNE R1,R2,Loop 3 5 Wait for ADDIU

2 L.D F0,0(R1) 4

2 ADD.D F4,F0,F2 4

2 S.D F4,0(R1) 5

2 DADDIU R1,R1,#-8 5

2 BNE R1,R2,Loop

3 L.D F0,0(R1)

3 ADD.D F4,F0,F2

3 S.D F4,0(R1)

3 DADDIU R1,R1,#-8

3 BNE R1,R2,Loop

Page 37: Computer Architecture

Dual issue, 2 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5,6 Wait for LD.D

1 S.D F4,0(R1) 2 3 Wait for ADD.D

1 DADDIU R1,R1,#-8 2 3 4 Executes earlier

1 BNE R1,R2,Loop 3 5 Wait for ADDIU

2 L.D F0,0(R1) 4 6 Wait for BNE

2 ADD.D F4,F0,F2 4 Wait for L.D

2 S.D F4,0(R1) 5 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 6 Executes earlier

2 BNE R1,R2,Loop 6

3 L.D F0,0(R1)

3 ADD.D F4,F0,F2

3 S.D F4,0(R1)

3 DADDIU R1,R1,#-8

3 BNE R1,R2,Loop

Page 38: Computer Architecture

Dual issue, 2 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5,6,7 Wait for LD.D

1 S.D F4,0(R1) 2 3 Wait for ADD.D

1 DADDIU R1,R1,#-8 2 3 4 Executes earlier

1 BNE R1,R2,Loop 3 5 Wait for ADDIU

2 L.D F0,0(R1) 4 6 7 Wait for BNE

2 ADD.D F4,F0,F2 4 Wait for L.D

2 S.D F4,0(R1) 5 7 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 6 7 Executes earlier

2 BNE R1,R2,Loop 6

3 L.D F0,0(R1) 7

3 ADD.D F4,F0,F2 7

3 S.D F4,0(R1)

3 DADDIU R1,R1,#-8

3 BNE R1,R2,Loop

Page 39: Computer Architecture

Dual issue, 2 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5-7 8 Wait for LD.D

1 S.D F4,0(R1) 2 3 Wait for ADD.D

1 DADDIU R1,R1,#-8 2 3 4 Executes earlier

1 BNE R1,R2,Loop 3 5 Wait for ADDIU

2 L.D F0,0(R1) 4 6 7 8 Wait for BNE

2 ADD.D F4,F0,F2 4 Wait for L.D

2 S.D F4,0(R1) 5 7 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 6 7 Executes earlier

2 BNE R1,R2,Loop 6 8

3 L.D F0,0(R1) 7

3 ADD.D F4,F0,F2 7

3 S.D F4,0(R1) 8

3 DADDIU R1,R1,#-8 8

3 BNE R1,R2,Loop

Page 40: Computer Architecture

Dual issue, 2 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5 8 Wait for LD.D

1 S.D F4,0(R1) 2 3 9 Wait for ADD.D

1 DADDIU R1,R1,#-8 2 3 4 Executes earlier

1 BNE R1,R2,Loop 3 5 Wait for ADDIU

2 L.D F0,0(R1) 4 6 7 8 Wait for BNE

2 ADD.D F4,F0,F2 4 9 Wait for L.D

2 S.D F4,0(R1) 5 7 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 6 7 Executes earlier

2 BNE R1,R2,Loop 6 8 Wait for ADDIU

3 L.D F0,0(R1) 7 9 Wait for BNE

3 ADD.D F4,F0,F2 7 Wait for L.D

3 S.D F4,0(R1) 8

3 DADDIU R1,R1,#-8 8 9

3 BNE R1,R2,Loop 9

Page 41: Computer Architecture

Dual issue, 2 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5 8 Wait for LD.D

1 S.D F4,0(R1) 2 3 9 Wait for ADD.D

1 DADDIU R1,R1,#-8 2 3 4 Executes earlier

1 BNE R1,R2,Loop 3 5 Wait for ADDIU

2 L.D F0,0(R1) 4 6 7 8 Wait for BNE

2 ADD.D F4,F0,F2 4 9,10 Wait for L.D

2 S.D F4,0(R1) 5 7 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 6 7 Executes earlier

2 BNE R1,R2,Loop 6 8 Wait for ADDIU

3 L.D F0,0(R1) 7 9 10 Wait for BNE

3 ADD.D F4,F0,F2 7 Wait for L.D

3 S.D F4,0(R1) 8 10 Wait for ADD.D

3 DADDIU R1,R1,#-8 8 9 10 Executes earlier

3 BNE R1,R2,Loop 9 Wait for ADDIU

Page 42: Computer Architecture

Dual issue, 2 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5 8 Wait for LD.D

1 S.D F4,0(R1) 2 3 9 Wait for ADD.D

1 DADDIU R1,R1,#-8 2 3 4 Executes earlier

1 BNE R1,R2,Loop 3 5 Wait for ADDIU

2 L.D F0,0(R1) 4 6 7 8 Wait for BNE

2 ADD.D F4,F0,F2 4 9,10,11 Wait for L.D

2 S.D F4,0(R1) 5 7 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 6 7 Executes earlier

2 BNE R1,R2,Loop 6 8 Wait for ADDIU

3 L.D F0,0(R1) 7 9 10 11 Wait for BNE

3 ADD.D F4,F0,F2 7 Wait for L.D

3 S.D F4,0(R1) 8 10 Wait for ADD.D

3 DADDIU R1,R1,#-8 8 9 10 Executes earlier

3 BNE R1,R2,Loop 9 11 Wait for ADDIU

Page 43: Computer Architecture

Dual issue, 2 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5 8 Wait for LD.D

1 S.D F4,0(R1) 2 3 9 Wait for ADD.D

1 DADDIU R1,R1,#-8 2 3 4 Executes earlier

1 BNE R1,R2,Loop 3 5 Wait for ADDIU

2 L.D F0,0(R1) 4 6 7 8 Wait for BNE

2 ADD.D F4,F0,F2 4 9-11 12 Wait for L.D

2 S.D F4,0(R1) 5 7 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 6 7 Executes earlier

2 BNE R1,R2,Loop 6 8 Wait for ADDIU

3 L.D F0,0(R1) 7 9 10 11 Wait for BNE

3 ADD.D F4,F0,F2 7 12 Wait for L.D

3 S.D F4,0(R1) 8 10 Wait for ADD.D

3 DADDIU R1,R1,#-8 8 9 10 Executes earlier

3 BNE R1,R2,Loop 9 11 Wait for ADDIU

Page 44: Computer Architecture

Dual issue, 2 Integer Unit

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D F0,0(R1) 1 2 3 4 First issue

1 ADD.D F4,F0,F2 1 5 8 Wait for LD.D

1 S.D F4,0(R1) 2 3 9 Wait for ADD.D

1 DADDIU R1,R1,#-8 2 3 4 Executes earlier

1 BNE R1,R2,Loop 3 5 Wait for ADDIU

2 L.D F0,0(R1) 4 6 7 8 Wait for BNE

2 ADD.D F4,F0,F2 4 9 12 Wait for L.D

2 S.D F4,0(R1) 5 7 13 Wait for ADD.D

2 DADDIU R1,R1,#-8 5 6 7 Executes earlier

2 BNE R1,R2,Loop 6 8 Wait for ADDIU

3 L.D F0,0(R1) 7 9 10 11 Wait for BNE

3 ADD.D F4,F0,F2 7 12,13 Wait for L.D

3 S.D F4,0(R1) 8 10 Wait for ADD.D

3 DADDIU R1,R1,#-8 8 9 10 Executes earlier

3 BNE R1,R2,Loop 9 11 Wait for ADDIU

Page 45: Computer Architecture

Dual issue, 2 Integer Unit

Iteration  Instruction        Issues  Executes  Mem access  Write CDB  Comment
1          L.D F0,0(R1)       1       2         3           4          First issue
1          ADD.D F4,F0,F2     1       5-7                   8          Wait for L.D
1          S.D F4,0(R1)       2       3         9                      Wait for ADD.D
1          DADDIU R1,R1,#-8   2       3                     4          Executes earlier
1          BNE R1,R2,Loop     3       5                                Wait for DADDIU
2          L.D F0,0(R1)       4       6         7           8          Wait for BNE
2          ADD.D F4,F0,F2     4       9-11                  12         Wait for L.D
2          S.D F4,0(R1)       5       7         13                     Wait for ADD.D
2          DADDIU R1,R1,#-8   5       6                     7          Executes earlier
2          BNE R1,R2,Loop     6       8                                Wait for DADDIU
3          L.D F0,0(R1)       7       9         10          11         Wait for BNE
3          ADD.D F4,F0,F2     7       12-14                 15         Wait for L.D
3          S.D F4,0(R1)       8       10        16                     Wait for ADD.D
3          DADDIU R1,R1,#-8   8       9                     10         Executes earlier
3          BNE R1,R2,Loop     9       11                               Wait for DADDIU

Page 46: Computer Architecture

Speculative Execution

Need to overcome:
• Branch hazards
• Precise exceptions

Page 47: Computer Architecture

Speculative Pipeline

[Diagram: ISSUE/rename to RS (check for RS, check for RAW), followed by Read Reg. Each functional unit path starts with a wait-for-operands stage: Integer (EX), LD/ST (EX/TAC, Mem Access), FP (A1 to A4 and M1 to M7), and Divide. Results are carried on the CDB to the ROB and on to Write Reg.]

Page 48: Computer Architecture

The Hardware: Reorder Buffer

If instructions write their results in program order, registers and memory always get the correct values.

Reorder buffer (ROB): reorders out-of-order instructions into program order at the time of writing registers/memory (commit).

If an instruction goes wrong, handle it at commit time: just flush the instructions after it.

An instruction cannot write registers/memory immediately after execution, so the ROB also buffers the results.

There is no such place in the original Tomasulo design.

[Diagram: Fetch Unit (reading IM), Decode, Rename, Reorder Buffer, reservation stations (RS) feeding FU1 and FU2, load and store buffers (L-buf, S-buf) connected to DM, and the Regfile.]

Page 49: Computer Architecture

Speculative Tomasulo Algorithm

Issue: get instruction from FP Op Queue
  Condition: a free RS at the required FU
  Actions: (1) decode the instruction; (2) allocate an RS and ROB entry; (3) do source register renaming; (4) do dest register renaming; (5) read register file; (6) dispatch the decoded and renamed instruction to the RS and ROB

Execution: operate on operands (EX)
  Condition: at a given FU, at least one instruction is ready
  Action: select a ready instruction and send it to the FU

Write result: finish execution (WB)
  Condition: at a given FU, some instruction finishes FU execution
  Actions: (1) FU writes to CDB, broadcast to all RSs and to the ROB; (2) FU broadcasts tag (ROB index) to all RSs; (3) de-allocate the RS. Note: no register status update at this time

Page 50: Computer Architecture

Speculative Tomasulo Algorithm

Commit: update register with reorder result
  Condition: ROB is not empty and the ROB head instruction has finished execution
  Actions if no misprediction/exception: (1) write result to register/memory; (2) update register status; (3) de-allocate the ROB entry
  Actions on misprediction/exception: flush the pipeline, e.g. (1) flush IFQ; (2) clear register status; (3) flush all RSs and reset FUs; (4) reset ROB
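The commit step can be sketched as a loop over the head of the reorder buffer. This is a simplified model under stated assumptions: the ROB is a Python deque, only register destinations are handled, several entries may retire in one call, and the class and field names are invented for the example. Flushing the IFQ, reservation stations, and register status is reduced here to clearing the ROB.

```python
from collections import deque


class ROBEntry:
    def __init__(self, dest, value=None, done=False, mispredicted=False):
        self.dest = dest                  # destination register name, or None
        self.value = value                # result buffered until commit
        self.done = done                  # True once execution has finished
        self.mispredicted = mispredicted  # True for a branch that went the wrong way


def commit(rob, regfile):
    """Retire finished entries from the ROB head in program order.
    Returns how many entries committed normally."""
    committed = 0
    while rob and rob[0].done:
        head = rob.popleft()
        if head.mispredicted:
            rob.clear()   # flush every younger instruction
            break
        if head.dest is not None:
            regfile[head.dest] = head.value
        committed += 1
    return committed


rob = deque([
    ROBEntry("R1", 5, done=True),
    ROBEntry("R2", 7, done=False),   # still executing: blocks younger entries
    ROBEntry("R3", 9, done=True),
])
regs = {}
first_pass = commit(rob, regs)
```

In this example only R1 commits on the first call: R3 has finished but must wait behind the unfinished R2, which is exactly how the ROB restores program order.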

Page 51: Computer Architecture

Loop: LD     R2,0(R1)
      DADDIU R2,R2,#1
      SD     R2,0(R1)     ; store result
      DADDIU R1,R1,#4     ; increment pointer
      BNE    R2,R3,Loop   ; branch if not last element

Page 52: Computer Architecture

Non-Speculative execution: Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D R2,0(R1) 1 First issue

1 ADDIU R2,R2,#1 1

1 S.D R2,0(R1)

1 DADDIU R1,R1,#4

1 BNE R2,R3,Loop

2 L.D R2,0(R1)

2 ADDIU R2,R2,#1

2 S.D R2,0(R1)

2 DADDIU R1,R1,#4

2 BNE R2,R3,Loop

3 L.D R2,0(R1)

3 ADDIU R2,R2,#1

3 S.D R2,0(R1)

3 DADDIU R1,R1,#4

3 BNE R2,R3,Loop

Page 53: Computer Architecture

Non-Speculative execution: Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D R2,0(R1) 1 2 First issue

1 ADDIU R2,R2,#1 1 Wait for LW

1 S.D R2,0(R1) 2

1 DADDIU R1,R1,#4 2

1 BNE R2,R3,Loop

2 L.D R2,0(R1)

2 ADDIU R2,R2,#1

2 S.D R2,0(R1)

2 DADDIU R1,R1,#4

2 BNE R2,R3,Loop

3 L.D R2,0(R1)

3 ADDIU R2,R2,#1

3 S.D R2,0(R1)

3 DADDIU R1,R1,#4

3 BNE R2,R3,Loop

Page 54: Computer Architecture

Non-Speculative execution: Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D R2,0(R1) 1 2 3 First issue

1 ADDIU R2,R2,#1 1 Wait for LW

1 S.D R2,0(R1) 2 3 Wait for ADDIU

1 DADDIU R1,R1,#4 2 3

1 BNE R2,R3,Loop 3

2 L.D R2,0(R1)

2 ADDIU R2,R2,#1

2 S.D R2,0(R1)

2 DADDIU R1,R1,#4

2 BNE R2,R3,Loop

3 L.D R2,0(R1)

3 ADDIU R2,R2,#1

3 S.D R2,0(R1)

3 DADDIU R1,R1,#4

3 BNE R2,R3,Loop

Page 55: Computer Architecture

Non-Speculative execution: Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D R2,0(R1) 1 2 3 4 First issue

1 ADDIU R2,R2,#1 1 Wait for LW

1 S.D R2,0(R1) 2 3 Wait for ADDIU

1 DADDIU R1,R1,#4 2 3 4 Execute directly

1 BNE R2,R3,Loop 3

2 L.D R2,0(R1) 4

2 ADDIU R2,R2,#1 4

2 S.D R2,0(R1)

2 DADDIU R1,R1,#4

2 BNE R2,R3,Loop

3 L.D R2,0(R1)

3 ADDIU R2,R2,#1

3 S.D R2,0(R1)

3 DADDIU R1,R1,#4

3 BNE R2,R3,Loop

Page 56: Computer Architecture

Non-Speculative execution: Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D R2,0(R1) 1 2 3 4 First issue

1 ADDIU R2,R2,#1 1 5 Wait for LW

1 S.D R2,0(R1) 2 3 Wait for DADDIU

1 DADDIU R1,R1,#4 2 3 4 Execute directly

1 BNE R2,R3,Loop 3 Wait for DADDIU

2 L.D R2,0(R1) 4 Wait for BNE

2 ADDIU R2,R2,#1 4 Wait for LW

2 S.D R2,0(R1) 5

2 DADDIU R1,R1,#4 5

2 BNE R2,R3,Loop

3 L.D R2,0(R1)

3 ADDIU R2,R2,#1

3 S.D R2,0(R1)

3 DADDIU R1,R1,#4

3 BNE R2,R3,Loop

Page 57: Computer Architecture

Non-Speculative execution: Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D R2,0(R1) 1 2 3 4 First issue

1 ADDIU R2,R2,#1 1 5 6 Wait for LW

1 S.D R2,0(R1) 2 3 Wait for DADDIU

1 DADDIU R1,R1,#4 2 3 4 Execute directly

1 BNE R2,R3,Loop 3 Wait for DADDIU

2 L.D R2,0(R1) 4 Wait for BNE

2 ADDIU R2,R2,#1 4 Wait for LW

2 S.D R2,0(R1) 5 Wait for DADDIU

2 DADDIU R1,R1,#4 5 Wait for BNE

2 BNE R2,R3,Loop 6

3 L.D R2,0(R1)

3 ADDIU R2,R2,#1

3 S.D R2,0(R1)

3 DADDIU R1,R1,#4

3 BNE R2,R3,Loop

Page 58: Computer Architecture

Non-Speculative execution: Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D R2,0(R1) 1 2 3 4 First issue

1 ADDIU R2,R2,#1 1 5 6 Wait for LW

1 S.D R2,0(R1) 2 3 7 Wait for DADDIU

1 DADDIU R1,R1,#4 2 3 4 Execute directly

1 BNE R2,R3,Loop 3 7 Wait for DADDIU

2 L.D R2,0(R1) 4 Wait for BNE

2 ADDIU R2,R2,#1 4 Wait for LW

2 S.D R2,0(R1) 5 Wait for DADDIU

2 DADDIU R1,R1,#4 5 Wait for BNE

2 BNE R2,R3,Loop 6 Wait for DADDIU

3 L.D R2,0(R1) 7

3 ADDIU R2,R2,#1 7

3 S.D R2,0(R1)

3 DADDIU R1,R1,#4

3 BNE R2,R3,Loop

Page 59: Computer Architecture

Non-Speculative execution: Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D R2,0(R1) 1 2 3 4 First issue

1 ADDIU R2,R2,#1 1 5 6 Wait for BNE

1 S.D R2,0(R1) 2 3 7 Wait for DADDIU

1 DADDIU R1,R1,#4 2 3 4 Execute directly

1 BNE R2,R3,Loop 3 7 Wait for DADDIU

2 L.D R2,0(R1) 4 8 Wait for BNE

2 ADDIU R2,R2,#1 4 Wait for LW

2 S.D R2,0(R1) 5 Wait for DADDIU

2 DADDIU R1,R1,#4 5 8 Wait for BNE

2 BNE R2,R3,Loop 6 Wait for DADDIU

3 L.D R2,0(R1) 7 Wait for BNE

3 ADDIU R2,R2,#1 7 Wait for LW

3 S.D R2,0(R1) 8

3 DADDIU R1,R1,#4 8

3 BNE R2,R3,Loop

Page 60: Computer Architecture

Non-Speculative execution: Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D R2,0(R1) 1 2 3 4 First issue

1 ADDIU R2,R2,#1 1 5 6 Wait for BNE

1 S.D R2,0(R1) 2 3 7 Wait for DADDIU

1 DADDIU R1,R1,#4 2 3 4 Execute directly

1 BNE R2,R3,Loop 3 7 Wait for DADDIU

2 L.D R2,0(R1) 4 8 9 Wait for BNE

2 ADDIU R2,R2,#1 4 Wait for LW

2 S.D R2,0(R1) 5 9 Wait for DADDIU

2 DADDIU R1,R1,#4 5 8 9 Wait for BNE

2 BNE R2,R3,Loop 6 Wait for DADDIU

3 L.D R2,0(R1) 7 Wait for BNE

3 ADDIU R2,R2,#1 7 Wait for LW

3 S.D R2,0(R1) 8 Wait for DADDIU

3 DADDIU R1,R1,#4 8 Wait for BNE

3 BNE R2,R3,Loop 9

Page 61: Computer Architecture

Non-Speculative execution: Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D R2,0(R1) 1 2 3 4 First issue

1 ADDIU R2,R2,#1 1 5 6 Wait for BNE

1 S.D R2,0(R1) 2 3 7 Wait for DADDIU

1 DADDIU R1,R1,#4 2 3 4 Execute directly

1 BNE R2,R3,Loop 3 7 Wait for DADDIU

2 L.D R2,0(R1) 4 8 9 10 Wait for BNE

2 ADDIU R2,R2,#1 4 Wait for LW

2 S.D R2,0(R1) 5 9 Wait for DADDIU

2 DADDIU R1,R1,#4 5 8 9 Wait for BNE

2 BNE R2,R3,Loop 6 Wait for DADDIU

3 L.D R2,0(R1) 7 Wait for BNE

3 ADDIU R2,R2,#1 7 Wait for LW

3 S.D R2,0(R1) 8 Wait for DADDIU

3 DADDIU R1,R1,#4 8 Wait for BNE

3 BNE R2,R3,Loop 9 Wait for DADDIU

Page 62: Computer Architecture

Non-Speculative execution: Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D R2,0(R1) 1 2 3 4 First issue

1 ADDIU R2,R2,#1 1 5 6 Wait for BNE

1 S.D R2,0(R1) 2 3 7 Wait for DADDIU

1 DADDIU R1,R1,#4 2 3 4 Execute directly

1 BNE R2,R3,Loop 3 7 Wait for DADDIU

2 L.D R2,0(R1) 4 8 9 10 Wait for BNE

2 ADDIU R2,R2,#1 4 11 Wait for LW

2 S.D R2,0(R1) 5 9 Wait for DADDIU

2 DADDIU R1,R1,#4 5 8 9 Wait for BNE

2 BNE R2,R3,Loop 6 Wait for DADDIU

3 L.D R2,0(R1) 7 Wait for BNE

3 ADDIU R2,R2,#1 7 Wait for LW

3 S.D R2,0(R1) 8 Wait for DADDIU

3 DADDIU R1,R1,#4 8 Wait for BNE

3 BNE R2,R3,Loop 9 Wait for DADDIU

Page 63: Computer Architecture

Non-Speculative execution: Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D R2,0(R1) 1 2 3 4 First issue

1 ADDIU R2,R2,#1 1 5 6 Wait for BNE

1 S.D R2,0(R1) 2 3 7 Wait for DADDIU

1 DADDIU R1,R1,#4 2 3 4 Execute directly

1 BNE R2,R3,Loop 3 7 Wait for DADDIU

2 L.D R2,0(R1) 4 8 9 10 Wait for BNE

2 ADDIU R2,R2,#1 4 11 12 Wait for LW

2 S.D R2,0(R1) 5 9 Wait for DADDIU

2 DADDIU R1,R1,#4 5 8 9 Wait for BNE

2 BNE R2,R3,Loop 6 Wait for DADDIU

3 L.D R2,0(R1) 7 Wait for BNE

3 ADDIU R2,R2,#1 7 Wait for LW

3 S.D R2,0(R1) 8 Wait for DADDIU

3 DADDIU R1,R1,#4 8 Wait for BNE

3 BNE R2,R3,Loop 9 Wait for DADDIU

Page 64: Computer Architecture

Non-Speculative execution:Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D R2,0(R1) 1 2 3 4 First issue

1 ADDIU R2,R2,#1 1 5 6 Wait for LW

1 S.D R2,0(R1) 2 3 7 Wait for DADDIU

1 DADDIU R1,R1,#4 2 3 4 Execute directly

1 BNE R2,R3,Loop 3 7 Wait for DADDIU

2 L.D R2,0(R1) 4 8 9 10 Wait for BNE

2 ADDIU R2,R2,#1 4 11 12 Wait for LW

2 S.D R2,0(R1) 5 9 13 Wait for DADDIU

2 DADDIU R1,R1,#4 5 8 9 Wait for BNE

2 BNE R2,R3,Loop 6 13 Wait for DADDIU

3 L.D R2,0(R1) 7 Wait for BNE

3 ADDIU R2,R2,#1 7 Wait for LW

3 S.D R2,0(R1) 8 Wait for DADDIU

3 DADDIU R1,R1,#4 8 Wait for BNE

3 BNE R2,R3,Loop 9 Wait for DADDIU

Page 65: Computer Architecture

Non-Speculative execution:Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D R2,0(R1) 1 2 3 4 First issue

1 ADDIU R2,R2,#1 1 5 6 Wait for LW

1 S.D R2,0(R1) 2 3 7 Wait for DADDIU

1 DADDIU R1,R1,#4 2 3 4 Execute directly

1 BNE R2,R3,Loop 3 7 Wait for DADDIU

2 L.D R2,0(R1) 4 8 9 10 Wait for BNE

2 ADDIU R2,R2,#1 4 11 12 Wait for LW

2 S.D R2,0(R1) 5 9 13 Wait for DADDIU

2 DADDIU R1,R1,#4 5 8 9 Wait for BNE

2 BNE R2,R3,Loop 6 13 Wait for DADDIU

3 L.D R2,0(R1) 7 14 Wait for BNE

3 ADDIU R2,R2,#1 7 Wait for LW

3 S.D R2,0(R1) 8 Wait for DADDIU

3 DADDIU R1,R1,#4 8 14 Wait for BNE

3 BNE R2,R3,Loop 9 Wait for DADDIU

Page 66: Computer Architecture

Non-Speculative execution:Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D R2,0(R1) 1 2 3 4 First issue

1 ADDIU R2,R2,#1 1 5 6 Wait for LW

1 S.D R2,0(R1) 2 3 7 Wait for DADDIU

1 DADDIU R1,R1,#4 2 3 4 Execute directly

1 BNE R2,R3,Loop 3 7 Wait for DADDIU

2 L.D R2,0(R1) 4 8 9 10 Wait for BNE

2 ADDIU R2,R2,#1 4 11 12 Wait for LW

2 S.D R2,0(R1) 5 9 13 Wait for DADDIU

2 DADDIU R1,R1,#4 5 8 9 Wait for BNE

2 BNE R2,R3,Loop 6 13 Wait for DADDIU

3 L.D R2,0(R1) 7 14 15 Wait for BNE

3 ADDIU R2,R2,#1 7 Wait for LW

3 S.D R2,0(R1) 8 15 Wait for DADDIU

3 DADDIU R1,R1,#4 8 14 15 Wait for BNE

3 BNE R2,R3,Loop 9 Wait for DADDIU

Page 67: Computer Architecture

Non-Speculative execution:Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D R2,0(R1) 1 2 3 4 First issue

1 ADDIU R2,R2,#1 1 5 6 Wait for LW

1 S.D R2,0(R1) 2 3 7 Wait for DADDIU

1 DADDIU R1,R1,#4 2 3 4 Execute directly

1 BNE R2,R3,Loop 3 7 Wait for DADDIU

2 L.D R2,0(R1) 4 8 9 10 Wait for BNE

2 ADDIU R2,R2,#1 4 11 12 Wait for LW

2 S.D R2,0(R1) 5 9 13 Wait for DADDIU

2 DADDIU R1,R1,#4 5 8 9 Wait for BNE

2 BNE R2,R3,Loop 6 13 Wait for DADDIU

3 L.D R2,0(R1) 7 14 15 16 Wait for BNE

3 ADDIU R2,R2,#1 7 Wait for LW

3 S.D R2,0(R1) 8 15 Wait for DADDIU

3 DADDIU R1,R1,#4 8 14 15 Wait for BNE

3 BNE R2,R3,Loop 9 Wait for DADDIU

Page 68: Computer Architecture

Non-Speculative execution:Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D R2,0(R1) 1 2 3 4 First issue

1 ADDIU R2,R2,#1 1 5 6 Wait for LW

1 S.D R2,0(R1) 2 3 7 Wait for DADDIU

1 DADDIU R1,R1,#4 2 3 4 Execute directly

1 BNE R2,R3,Loop 3 7 Wait for DADDIU

2 L.D R2,0(R1) 4 8 9 10 Wait for BNE

2 ADDIU R2,R2,#1 4 11 12 Wait for LW

2 S.D R2,0(R1) 5 9 13 Wait for DADDIU

2 DADDIU R1,R1,#4 5 8 9 Wait for BNE

2 BNE R2,R3,Loop 6 13 Wait for DADDIU

3 L.D R2,0(R1) 7 14 15 16 Wait for BNE

3 ADDIU R2,R2,#1 7 17 Wait for LW

3 S.D R2,0(R1) 8 15 Wait for DADDIU

3 DADDIU R1,R1,#4 8 14 15 Wait for BNE

3 BNE R2,R3,Loop 9 Wait for DADDIU

Page 69: Computer Architecture

Non-Speculative execution:Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D R2,0(R1) 1 2 3 4 First issue

1 ADDIU R2,R2,#1 1 5 6 Wait for LW

1 S.D R2,0(R1) 2 3 7 Wait for DADDIU

1 DADDIU R1,R1,#4 2 3 4 Execute directly

1 BNE R2,R3,Loop 3 7 Wait for DADDIU

2 L.D R2,0(R1) 4 8 9 10 Wait for BNE

2 ADDIU R2,R2,#1 4 11 12 Wait for LW

2 S.D R2,0(R1) 5 9 13 Wait for DADDIU

2 DADDIU R1,R1,#4 5 8 9 Wait for BNE

2 BNE R2,R3,Loop 6 13 Wait for DADDIU

3 L.D R2,0(R1) 7 14 15 16 Wait for BNE

3 ADDIU R2,R2,#1 7 17 18 Wait for LW

3 S.D R2,0(R1) 8 15 Wait for DADDIU

3 DADDIU R1,R1,#4 8 14 15 Wait for BNE

3 BNE R2,R3,Loop 9 Wait for DADDIU

Page 70: Computer Architecture

Non-Speculative execution:Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D R2,0(R1) 1 2 3 4 First issue

1 ADDIU R2,R2,#1 1 5 6 Wait for LW

1 S.D R2,0(R1) 2 3 7 Wait for DADDIU

1 DADDIU R1,R1,#4 2 3 4 Execute directly

1 BNE R2,R3,Loop 3 7 Wait for DADDIU

2 L.D R2,0(R1) 4 8 9 10 Wait for BNE

2 ADDIU R2,R2,#1 4 11 12 Wait for LW

2 S.D R2,0(R1) 5 9 13 Wait for DADDIU

2 DADDIU R1,R1,#4 5 8 9 Wait for BNE

2 BNE R2,R3,Loop 6 13 Wait for DADDIU

3 L.D R2,0(R1) 7 14 15 16 Wait for BNE

3 ADDIU R2,R2,#1 7 17 18 Wait for LW

3 S.D R2,0(R1) 8 15 19 Wait for DADDIU

3 DADDIU R1,R1,#4 8 14 15 Wait for BNE

3 BNE R2,R3,Loop 9 19 Wait for DADDIU

Page 71: Computer Architecture

Speculative execution:Dual issue, 2 CDB

Page 72: Computer Architecture

Speculative execution:Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Commit

1 L.D R2,0(R1) 1

1 ADDIU R2,R2,#1 1

1 S.D R2,0(R1)

1 DADDIU R1,R1,#4

1 BNE R2,R3,Loop

2 L.D R2,0(R1)

2 ADDIU R2,R2,#1

2 S.D R2,0(R1)

2 DADDIU R1,R1,#4

2 BNE R2,R3,Loop

3 L.D R2,0(R1)

3 ADDIU R2,R2,#1

3 S.D R2,0(R1)

3 DADDIU R1,R1,#4

3 BNE R2,R3,Loop

Page 73: Computer Architecture

Speculative execution:Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Commit

1 L.D R2,0(R1) 1 2

1 ADDIU R2,R2,#1 1

1 S.D R2,0(R1) 2

1 DADDIU R1,R1,#4 2

1 BNE R2,R3,Loop

2 L.D R2,0(R1)

2 ADDIU R2,R2,#1

2 S.D R2,0(R1)

2 DADDIU R1,R1,#4

2 BNE R2,R3,Loop

3 L.D R2,0(R1)

3 ADDIU R2,R2,#1

3 S.D R2,0(R1)

3 DADDIU R1,R1,#4

3 BNE R2,R3,Loop

Page 74: Computer Architecture

Speculative execution:Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Commit

1 L.D R2,0(R1) 1 2 3

1 ADDIU R2,R2,#1 1

1 S.D R2,0(R1) 2 3

1 DADDIU R1,R1,#4 2 3

1 BNE R2,R3,Loop 3

2 L.D R2,0(R1)

2 ADDIU R2,R2,#1

2 S.D R2,0(R1)

2 DADDIU R1,R1,#4

2 BNE R2,R3,Loop

3 L.D R2,0(R1)

3 ADDIU R2,R2,#1

3 S.D R2,0(R1)

3 DADDIU R1,R1,#4

3 BNE R2,R3,Loop

Page 75: Computer Architecture

Speculative execution:Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Commit

1 L.D R2,0(R1) 1 2 3 4

1 ADDIU R2,R2,#1 1

1 S.D R2,0(R1) 2 3

1 DADDIU R1,R1,#4 2 3 4

1 BNE R2,R3,Loop 3

2 L.D R2,0(R1) 4

2 ADDIU R2,R2,#1 4

2 S.D R2,0(R1)

2 DADDIU R1,R1,#4

2 BNE R2,R3,Loop

3 L.D R2,0(R1)

3 ADDIU R2,R2,#1

3 S.D R2,0(R1)

3 DADDIU R1,R1,#4

3 BNE R2,R3,Loop

Page 76: Computer Architecture

Speculative execution:Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Commit

1 L.D R2,0(R1) 1 2 3 4 5

1 ADDIU R2,R2,#1 1 5

1 S.D R2,0(R1) 2 3

1 DADDIU R1,R1,#4 2 3 4

1 BNE R2,R3,Loop 3

2 L.D R2,0(R1) 4 5

2 ADDIU R2,R2,#1 4

2 S.D R2,0(R1) 5

2 DADDIU R1,R1,#4 5

2 BNE R2,R3,Loop

3 L.D R2,0(R1)

3 ADDIU R2,R2,#1

3 S.D R2,0(R1)

3 DADDIU R1,R1,#4

3 BNE R2,R3,Loop

Page 77: Computer Architecture

Speculative execution:Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Commit

1 L.D R2,0(R1) 1 2 3 4 5

1 ADDIU R2,R2,#1 1 5 6

1 S.D R2,0(R1) 2 3

1 DADDIU R1,R1,#4 2 3 4

1 BNE R2,R3,Loop 3

2 L.D R2,0(R1) 4 5 6

2 ADDIU R2,R2,#1 4

2 S.D R2,0(R1) 5 6

2 DADDIU R1,R1,#4 5 6

2 BNE R2,R3,Loop 6

3 L.D R2,0(R1)

3 ADDIU R2,R2,#1

3 S.D R2,0(R1)

3 DADDIU R1,R1,#4

3 BNE R2,R3,Loop

Page 78: Computer Architecture

Speculative execution:Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Commit

1 L.D R2,0(R1) 1 2 3 4 5

1 ADDIU R2,R2,#1 1 5 6 7

1 S.D R2,0(R1) 2 3 7

1 DADDIU R1,R1,#4 2 3 4

1 BNE R2,R3,Loop 3 7

2 L.D R2,0(R1) 4 5 6 7

2 ADDIU R2,R2,#1 4

2 S.D R2,0(R1) 5 6

2 DADDIU R1,R1,#4 5 6 7

2 BNE R2,R3,Loop 6

3 L.D R2,0(R1) 7

3 ADDIU R2,R2,#1 7

3 S.D R2,0(R1)

3 DADDIU R1,R1,#4

3 BNE R2,R3,Loop

Page 79: Computer Architecture

Speculative execution:Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Commit

1 L.D R2,0(R1) 1 2 3 4 5

1 ADDIU R2,R2,#1 1 5 6 7

1 S.D R2,0(R1) 2 3 7 7

1 DADDIU R1,R1,#4 2 3 4 8

1 BNE R2,R3,Loop 3 7 8

2 L.D R2,0(R1) 4 5 6 7

2 ADDIU R2,R2,#1 4 8

2 S.D R2,0(R1) 5 6

2 DADDIU R1,R1,#4 5 6 7

2 BNE R2,R3,Loop 6

3 L.D R2,0(R1) 7 8

3 ADDIU R2,R2,#1 7

3 S.D R2,0(R1) 8

3 DADDIU R1,R1,#4 8

3 BNE R2,R3,Loop

Page 80: Computer Architecture

Speculative execution:Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Commit

1 L.D R2,0(R1) 1 2 3 4 5

1 ADDIU R2,R2,#1 1 5 6 7

1 S.D R2,0(R1) 2 3 7 7

1 DADDIU R1,R1,#4 2 3 4 8

1 BNE R2,R3,Loop 3 7 8

2 L.D R2,0(R1) 4 5 6 7 9

2 ADDIU R2,R2,#1 4 8 9

2 S.D R2,0(R1) 5 6

2 DADDIU R1,R1,#4 5 6 7

2 BNE R2,R3,Loop 6

3 L.D R2,0(R1) 7 8 9

3 ADDIU R2,R2,#1 7

3 S.D R2,0(R1) 8 9

3 DADDIU R1,R1,#4 8 9

3 BNE R2,R3,Loop 9

Page 81: Computer Architecture

Speculative execution:Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Commit

1 L.D R2,0(R1) 1 2 3 4 5

1 ADDIU R2,R2,#1 1 5 6 7

1 S.D R2,0(R1) 2 3 7 7

1 DADDIU R1,R1,#4 2 3 4 8

1 BNE R2,R3,Loop 3 7 8

2 L.D R2,0(R1) 4 5 6 7 9

2 ADDIU R2,R2,#1 4 8 9 10

2 S.D R2,0(R1) 5 6 10

2 DADDIU R1,R1,#4 5 6 7

2 BNE R2,R3,Loop 6 10

3 L.D R2,0(R1) 7 8 9 10

3 ADDIU R2,R2,#1 7

3 S.D R2,0(R1) 8 9

3 DADDIU R1,R1,#4 8 9 10

3 BNE R2,R3,Loop 9

Page 82: Computer Architecture

Speculative execution:Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Commit

1 L.D R2,0(R1) 1 2 3 4 5

1 ADDIU R2,R2,#1 1 5 6 7

1 S.D R2,0(R1) 2 3 7 7

1 DADDIU R1,R1,#4 2 3 4 8

1 BNE R2,R3,Loop 3 7 8

2 L.D R2,0(R1) 4 5 6 7 9

2 ADDIU R2,R2,#1 4 8 9 10

2 S.D R2,0(R1) 5 6 10

2 DADDIU R1,R1,#4 5 6 7 11

2 BNE R2,R3,Loop 6 10 11

3 L.D R2,0(R1) 7 8 9 10

3 ADDIU R2,R2,#1 7 11

3 S.D R2,0(R1) 8 9

3 DADDIU R1,R1,#4 8 9 10

3 BNE R2,R3,Loop 9

Page 83: Computer Architecture

Speculative execution:Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Commit

1 L.D R2,0(R1) 1 2 3 4 5

1 ADDIU R2,R2,#1 1 5 6 7

1 S.D R2,0(R1) 2 3 7 7

1 DADDIU R1,R1,#4 2 3 4 8

1 BNE R2,R3,Loop 3 7 8

2 L.D R2,0(R1) 4 5 6 7 9

2 ADDIU R2,R2,#1 4 8 9 10

2 S.D R2,0(R1) 5 6 10

2 DADDIU R1,R1,#4 5 6 7 11

2 BNE R2,R3,Loop 6 10 11

3 L.D R2,0(R1) 7 8 9 10 12

3 ADDIU R2,R2,#1 7 11 12

3 S.D R2,0(R1) 8 9

3 DADDIU R1,R1,#4 8 9 10

3 BNE R2,R3,Loop 9

Page 84: Computer Architecture

Speculative execution:Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Commit

1 L.D R2,0(R1) 1 2 3 4 5

1 ADDIU R2,R2,#1 1 5 6 7

1 S.D R2,0(R1) 2 3 7 7

1 DADDIU R1,R1,#4 2 3 4 8

1 BNE R2,R3,Loop 3 7 8

2 L.D R2,0(R1) 4 5 6 7 9

2 ADDIU R2,R2,#1 4 8 9 10

2 S.D R2,0(R1) 5 6 10

2 DADDIU R1,R1,#4 5 6 7 11

2 BNE R2,R3,Loop 6 10 11

3 L.D R2,0(R1) 7 8 9 10 12

3 ADDIU R2,R2,#1 7 11 12 13

3 S.D R2,0(R1) 8 9 13

3 DADDIU R1,R1,#4 8 9 10

3 BNE R2,R3,Loop 9 13

Page 85: Computer Architecture

Speculative execution:Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Commit

1 L.D R2,0(R1) 1 2 3 4 5

1 ADDIU R2,R2,#1 1 5 6 7

1 S.D R2,0(R1) 2 3 7 7

1 DADDIU R1,R1,#4 2 3 4 8

1 BNE R2,R3,Loop 3 7 8

2 L.D R2,0(R1) 4 5 6 7 9

2 ADDIU R2,R2,#1 4 8 9 10

2 S.D R2,0(R1) 5 6 10

2 DADDIU R1,R1,#4 5 6 7 11

2 BNE R2,R3,Loop 6 10 11

3 L.D R2,0(R1) 7 8 9 10 12

3 ADDIU R2,R2,#1 7 11 12 13

3 S.D R2,0(R1) 8 9 13

3 DADDIU R1,R1,#4 8 9 10 14

3 BNE R2,R3,Loop 9 13 14
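
The two completed timelines can be compared numerically. The Python sketch below is illustrative only. It copies the issue cycle and the last recorded cycle of every row from the final non-speculative table and from the final speculative table above. The names NON_SPECULATIVE, SPECULATIVE, finish_cycle, iteration_end and cycles_per_iteration are hypothetical helpers introduced here, not part of the lecture.

```python
# Each entry: (iteration, instruction, issue cycle, last recorded cycle).
# "Last recorded cycle" is the final number in the row of the slide table:
# the last execute/memory/CDB event for the non-speculative run,
# and the commit cycle for the speculative run.
NON_SPECULATIVE = [
    (1, "L.D", 1, 4), (1, "ADDIU", 1, 6), (1, "S.D", 2, 7),
    (1, "DADDIU", 2, 4), (1, "BNE", 3, 7),
    (2, "L.D", 4, 10), (2, "ADDIU", 4, 12), (2, "S.D", 5, 13),
    (2, "DADDIU", 5, 9), (2, "BNE", 6, 13),
    (3, "L.D", 7, 16), (3, "ADDIU", 7, 18), (3, "S.D", 8, 19),
    (3, "DADDIU", 8, 15), (3, "BNE", 9, 19),
]

SPECULATIVE = [
    (1, "L.D", 1, 5), (1, "ADDIU", 1, 7), (1, "S.D", 2, 7),
    (1, "DADDIU", 2, 8), (1, "BNE", 3, 8),
    (2, "L.D", 4, 9), (2, "ADDIU", 4, 10), (2, "S.D", 5, 10),
    (2, "DADDIU", 5, 11), (2, "BNE", 6, 11),
    (3, "L.D", 7, 12), (3, "ADDIU", 7, 13), (3, "S.D", 8, 13),
    (3, "DADDIU", 8, 14), (3, "BNE", 9, 14),
]


def finish_cycle(rows):
    """Cycle in which the last event of the whole trace occurs."""
    return max(last for _, _, _, last in rows)


def iteration_end(rows, iteration):
    """Cycle in which the given loop iteration records its last event."""
    return max(last for it, _, _, last in rows if it == iteration)


def cycles_per_iteration(rows):
    """Spacing between the ends of iterations 2 and 3 (steady-state estimate)."""
    return iteration_end(rows, 3) - iteration_end(rows, 2)


if __name__ == "__main__":
    for name, rows in (("non-speculative", NON_SPECULATIVE),
                       ("speculative", SPECULATIVE)):
        print(name, "finishes at cycle", finish_cycle(rows),
              "with", cycles_per_iteration(rows), "cycles per iteration")
```

With the numbers from the slides, the non-speculative run ends at cycle 19 and the speculative run at cycle 14; successive iterations end 6 cycles apart without speculation and 3 cycles apart with it.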

Page 86: Computer Architecture

IDEAL/Perfect Processor

Register renaming: infinite virtual registers available

Branch prediction: all conditional branches are predicted exactly

Jump prediction: all jumps are perfectly predicted

Memory address alias analysis: all memory addresses are known exactly

Page 87: Computer Architecture

ILP in a perfect processor for six SPEC92 programs

[Figure: bar chart of instruction issues per cycle, axis 0 to 160]

Program    Instruction issues per cycle
gcc        54.8
espresso   62.6
li         17.9
fpppp      75.2
doduc      118.7
tomcatv    150.1

Page 88: Computer Architecture

Effects of reducing the size of the window

[Figure: line chart. X-axis: window size (Infinite, 2K, 512, 128, 32, 8, 4). Y-axis: instruction issues per cycle (0 to 160). Curves: tomcatv, doduc, fpppp, li. The small-window region is marked "Practical possibilities".]

Page 89: Computer Architecture

Another View of Last Slide

[Figure: bar chart of instruction issues per cycle (IPC) per program, one bar per window size]

Program    Infinite   2K    512   128   32
gcc        55         36    10    10    8
espresso   63         41    15    13    8
li         18         15    12    11    9
fpppp      75         61    49    35    14
doduc      119        59    16    15    9
tomcatv    150        60    45    34    14
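
To make the window-size effect concrete, the short Python sketch below computes how much of the infinite-window issue rate each program keeps at a smaller window. The values are the bar labels read from the chart on this slide, with their assignment to programs and window sizes reconstructed from the flattened figure, so treat them as approximate. The names WINDOW_SIZES, ISSUES_PER_CYCLE and retained_fraction are hypothetical.

```python
# Instruction issues per cycle by window size, as read from the slide's chart.
WINDOW_SIZES = ["Infinite", "2K", "512", "128", "32"]

ISSUES_PER_CYCLE = {
    "gcc":      [55, 36, 10, 10, 8],
    "espresso": [63, 41, 15, 13, 8],
    "li":       [18, 15, 12, 11, 9],
    "fpppp":    [75, 61, 49, 35, 14],
    "doduc":    [119, 59, 16, 15, 9],
    "tomcatv":  [150, 60, 45, 34, 14],
}


def retained_fraction(program, window):
    """Issue rate at `window` divided by the rate with an infinite window."""
    values = ISSUES_PER_CYCLE[program]
    return values[WINDOW_SIZES.index(window)] / values[0]


if __name__ == "__main__":
    for program in ISSUES_PER_CYCLE:
        share = retained_fraction(program, "32")
        print(f"{program}: {share:.0%} of the ideal issue rate at a 32-entry window")
```

For example, retained_fraction("li", "32") is 9/18 = 0.5, while tomcatv keeps less than a tenth of its ideal rate (14/150), which shows that programs with the most ideal parallelism lose the most when the window shrinks.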

Page 90: Computer Architecture

Effect of branch-prediction schemes(1)

[Figure: line chart. X-axis: branch-prediction scheme (Perfect, Tournament predictor, Standard 2-bit, Static, None). Y-axis: instruction issues per cycle (0 to 60). Curves: fpppp, doduc, tomcatv, li. The region of realizable predictors is marked "Practical possibilities".]

Page 91: Computer Architecture

Effect of branch-prediction schemes(2)

[Figure: bar chart of instruction issues per cycle per program, one bar per branch-prediction scheme, axis 0 to 60]

Program    Perfect   Selective predictor   Standard 2-bit   Static   None
gcc        35        9                     6                6        2
espresso   41        12                    7                6        2
li         16        10                    6                7        2
fpppp      61        48                    46               45       29
doduc      58        15                    13               14       4
tomcatv    60        46                    45               45       19

Page 92: Computer Architecture

Branch-prediction accuracy for conditional branches in SPEC92

[Figure: horizontal bar chart. Axis: 0% to 100% (prediction accuracy).]

Program    Profile-based   2-bit counter   Tournament
li         88%             77%             98%
espresso   86%             82%             96%
fpppp      86%             82%             98%
tomcatv    99%             99%             100%
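
Accuracy figures close to 100% hide large relative differences, so it can help to convert them to misprediction rates. The Python sketch below does that for the values on this slide. The assignment of values to programs and schemes is reconstructed from the flattened chart and is an assumption, and the names ACCURACY_PERCENT and misprediction_percent are hypothetical.

```python
# Prediction accuracy in percent for conditional branches, as read from the slide.
ACCURACY_PERCENT = {
    "li":       {"profile": 88, "two_bit": 77, "tournament": 98},
    "espresso": {"profile": 86, "two_bit": 82, "tournament": 96},
    "fpppp":    {"profile": 86, "two_bit": 82, "tournament": 98},
    "tomcatv":  {"profile": 99, "two_bit": 99, "tournament": 100},
}


def misprediction_percent(program, scheme):
    """Misprediction rate is the complement of prediction accuracy."""
    return 100 - ACCURACY_PERCENT[program][scheme]


if __name__ == "__main__":
    for program in ACCURACY_PERCENT:
        rates = {scheme: misprediction_percent(program, scheme)
                 for scheme in ("profile", "two_bit", "tournament")}
        print(program, rates)
```

Under these values, li drops from 23 mispredictions per 100 conditional branches with the 2-bit counter to 2 with the tournament predictor.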

Page 93: Computer Architecture

Intel processors based on the P6 microarchitecture

Processor         First ship date   Clock rate range   L1 cache                  L2 cache
Pentium Pro       1995              100-200 MHz        8KB instr. + 8KB data     256 KB-1024 KB
Pentium II        1998              233-450 MHz        16KB instr. + 16KB data   256 KB-512 KB
Pentium II Xeon   1999              400-450 MHz        16KB instr. + 16KB data   512 KB-2 MB
Celeron           1999              500-900 MHz        16KB instr. + 16KB data   128 KB
Pentium III       1999              450-1100 MHz       16KB instr. + 16KB data   256 KB-512 KB
Pentium Xeon      2000              700-900 MHz        16KB instr. + 16KB data   1 MB-2 MB

Page 94: Computer Architecture

P6 Architecture (P-II Onwards…)

Instruction name      Pipeline stages   Repeat rate
Integer ALU           1                 1
Integer Load          3                 1
Integer Multiply      4                 1
FP Add                3                 1
FP multiply           5                 2
FP divide (64-bits)   32                32
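
The table can be read as a simple timing model. The Python sketch below is a rough illustration under two assumptions that the slide does not state explicitly: "pipeline stages" is taken as the latency in cycles of one operation, and "repeat rate" as the number of cycles between starting successive independent operations on the same unit. The names P6_UNITS and cycles_for_independent_ops are hypothetical.

```python
# (pipeline stages, repeat rate) per instruction type, copied from the slide.
P6_UNITS = {
    "Integer ALU":        (1, 1),
    "Integer Load":       (3, 1),
    "Integer Multiply":   (4, 1),
    "FP Add":             (3, 1),
    "FP multiply":        (5, 2),
    "FP divide (64-bits)": (32, 32),
}


def cycles_for_independent_ops(unit, count):
    """Cycles to finish `count` independent operations issued back to back.

    The first operation takes `stages` cycles; every further operation can
    start only `repeat` cycles after the previous one (assumed model).
    """
    if count <= 0:
        return 0
    stages, repeat = P6_UNITS[unit]
    return stages + (count - 1) * repeat


if __name__ == "__main__":
    for unit in P6_UNITS:
        print(unit, cycles_for_independent_ops(unit, 4))
```

Under this model, four independent FP multiplies take 5 + 3 x 2 = 11 cycles, whereas two 64-bit FP divides take 32 + 32 = 64 cycles because the divider is not pipelined.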

Page 95: Computer Architecture

P6 processor pipeline

The P6 processor pipeline, showing the throughput of each stage and the total buffering provided between stages:

Instruction fetch: 16 bytes per cycle

Buffer: 16 bytes

Instruction decode: 3 instructions per cycle

Buffer: 6 uops

Renaming: 3 uops per cycle

Reservation station (20)

Execution units (5 total)

Graduation unit (3 uops per cycle)

Reorder buffer (40 entries)

Page 96: Computer Architecture

Speculation factor

Percentage of instructions that do not commit in Pentium 3

[Figure: bar chart. Axes: benchmarks (gcc, tomcatv, perl, compress, go, li, vortex, apsi, fpppp, hydro2d) and percentage of instructions that do not commit (0 to 60).]

Page 97: Computer Architecture

Performance: Pentium 4 vs. Pentium III

[Figure: bar chart. Axes: SPEC ratio (0 to 1000) and SPEC2000 benchmarks (gcc, mgrid, vortex, applu).]