Please see “ portrait orientation ” PowerPoint file for Chapter 8


Page 1: Please see “ portrait orientation ” PowerPoint file for Chapter 8

Please see “portrait orientation” PowerPoint file for Chapter 8

Figure 8.1. Basic idea of instruction pipelining.

Page 2: Please see “ portrait orientation ” PowerPoint file for Chapter 8

Please see “portrait orientation” PowerPoint file for Chapter 8

Figure 8.2. A 4-stage pipeline.
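As a rough illustration of the idea behind Figures 8.1 and 8.2, the sketch below builds an ideal timing chart for a four-stage pipeline. It assumes every stage takes exactly one clock cycle and that no hazards occur. The function name ideal_pipeline_chart is hypothetical, and the stage letters F, D, E, W follow the labels used later in this deck (fetch, dispatch/decode, execute, write); neither is read from the figures themselves.

```python
def ideal_pipeline_chart(num_instructions, stages=("F", "D", "E", "W")):
    """Return one row per instruction.

    row[c] holds the stage label active in clock cycle c + 1, or "" if the
    instruction is not in the pipeline during that cycle.
    Assumption: one cycle per stage, no stalls.
    """
    total_cycles = num_instructions + len(stages) - 1
    chart = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(stages):
            # Instruction i enters stage s in cycle i + s + 1 (1-based).
            row[i + s] = f"{name}{i + 1}"
        chart.append(row)
    return chart


if __name__ == "__main__":
    for i, row in enumerate(ideal_pipeline_chart(4), start=1):
        print(f"I{i}: " + " ".join(f"{cell:>3}" for cell in row))
```

Under these assumptions, n instructions finish in n + k - 1 cycles on a k-stage pipeline, instead of n * k cycles without overlap.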

Page 3: Please see “ portrait orientation ” PowerPoint file for Chapter 8

[Timing diagram; only its labels survive in this transcript. Rows: instructions I1 to I5, passing through stages F, D, E and W (W is shown for I1 to I4 only). Horizontal axis: time, clock cycles 1 to 9.]

Figure 8.3. Effect of an execution operation taking more than one clock cycle.
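The effect named in the caption of Figure 8.3 can be sketched with a small scheduling model. This is a simplified model under stated assumptions, not a reconstruction of the figure: instructions issue in order, each stage holds one instruction at a time, F, D and W take one cycle, and a stalled instruction stays in its stage until the next stage is free. Which instruction has the long Execute step, and how long it is, are assumptions chosen for the example; the function name schedule is hypothetical.

```python
def schedule(execute_cycles, stages=("F", "D", "E", "W")):
    """Return, per instruction, the first clock cycle (1-based) spent in each stage.

    execute_cycles[i] is the number of cycles instruction i spends in E.
    Assumptions: in-order pipeline, one instruction per stage, all other
    stages take one cycle, stalled instructions hold their stage.
    """
    e_index = stages.index("E")
    last = len(stages) - 1
    starts = []      # starts[i][s] = cycle in which instruction i enters stage s
    prev_dur = None  # stage durations of the previous instruction
    for i, e_len in enumerate(execute_cycles):
        dur = [1] * len(stages)
        dur[e_index] = e_len
        row = []
        for s in range(len(stages)):
            # Earliest cycle at which this instruction has finished its previous stage.
            ready = 1 if s == 0 else row[s - 1] + dur[s - 1]
            if i == 0:
                free = 1
            elif s < last:
                # Stage s is free once the previous instruction has moved to stage s + 1.
                free = starts[i - 1][s + 1]
            else:
                free = starts[i - 1][s] + prev_dur[s]
            row.append(max(ready, free))
        starts.append(row)
        prev_dur = dur
    return [dict(zip(stages, row)) for row in starts]


if __name__ == "__main__":
    # Assumed example: the second instruction needs three Execute cycles.
    for i, timing in enumerate(schedule([1, 3, 1, 1, 1]), start=1):
        print(f"I{i}: {timing}")
```

In this model a single three-cycle Execute step delays every later instruction by two cycles. Real pipelines may buffer the earlier stages differently, so the positions of the stalled F and D steps depend on that assumption.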

Page 4: Please see “ portrait orientation ” PowerPoint file for Chapter 8

Please see “portrait orientation” PowerPoint file for Chapter 8

Figure 8.4. Pipeline stall caused by a cache miss in F2.

Page 5: Please see “ portrait orientation ” PowerPoint file for Chapter 8

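[Timing diagram; only its labels survive in this transcript. Rows: instructions I1, I2 (Load), I3, I4 and I5, passing through stages F, D, E and W, with an additional step M2 for I2 (Load). Horizontal axis: time, clock cycles 1 to 7.]

Figure 8.5. Effect of a Load instruction on pipeline timing.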

Page 6: Please see “ portrait orientation ” PowerPoint file for Chapter 8

[Timing diagram; only its labels survive in this transcript. Rows: instructions I1 (Mul), I2 (Add), I3 and I4, passing through stages F, D, E and W. For I2 the decode step D2 is followed by a second step labeled D2A. Horizontal axis: time, clock cycles 1 to 9.]

Figure 8.6. Pipeline stalled by data dependency between D2 and W1.

Page 7: Please see “ portrait orientation ” PowerPoint file for Chapter 8

Please see “portrait orientation” PowerPoint file for Chapter 8

Figure 8.7. Operand forwarding in a pipelined processor.
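The dependency in Figure 8.6 and the forwarding idea in Figure 8.7 can be sketched as a read-after-write check. This is a minimal sketch assuming a four-stage F, D, E, W pipeline in which operands are read in D, results are written in W, and every Execute step takes one cycle. The register operands in the example are made up for illustration (the transcript only names the instructions Mul and Add), and Instr, raw_stall_cycles and find_raw_hazards are hypothetical names.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]


def raw_stall_cycles(distance, forwarding):
    """Stall cycles for a consumer issued `distance` instructions after its producer.

    Without forwarding the consumer's D step must come after the producer's
    W step, which is two stages later than the producer's D step. With
    forwarding (and one-cycle Execute) the ALU result is handed straight to
    the next Execute step, so no stall is needed.
    """
    if forwarding:
        return 0
    return max(0, 3 - distance)


def find_raw_hazards(program, forwarding=False):
    """List (producer index, consumer index, register, stall cycles) for close RAW pairs."""
    hazards = []
    for j, consumer in enumerate(program):
        # Only producers one or two instructions back can still be in flight.
        for i in range(max(0, j - 2), j):
            if program[i].dest in consumer.srcs:
                stall = raw_stall_cycles(j - i, forwarding)
                hazards.append((i, j, program[i].dest, stall))
    return hazards


if __name__ == "__main__":
    program = [
        Instr("Mul", "R4", ("R2", "R3")),  # hypothetical operands
        Instr("Add", "R6", ("R5", "R4")),  # reads R4 produced by Mul
    ]
    print(find_raw_hazards(program))
    print(find_raw_hazards(program, forwarding=True))
```

Each pair is evaluated against an otherwise ideal schedule, so stalls from several hazards are not accumulated; a fuller model would feed them back into the timing.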

Page 8: Please see “ portrait orientation ” PowerPoint file for Chapter 8

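[Timing diagram; only its labels survive in this transcript. Rows: instructions I1, I2 (Branch), I3, Ik and Ik+1, each with a fetch step F and an execute step E (F1 E1, F2 E2, F3, Fk Ek, Fk+1 Ek+1). One step is marked X, and an annotation reads "Execution unit idle". Horizontal axis: time, clock cycles 1 to 6.]

Figure 8.8. An idle cycle caused by a branch instruction.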

Page 9: Please see “ portrait orientation ” PowerPoint file for Chapter 8

Please see “portrait orientation” PowerPoint file for Chapter 8

Figure 8.9. Branch timing.

Page 10: Please see “ portrait orientation ” PowerPoint file for Chapter 8

[Block diagram; labels recovered from the slide:]

Instruction fetch unit
F : Fetch instruction
Instruction queue
D : Dispatch/Decode unit
E : Execute instruction
W : Write results

Figure 8.10. Use of an instruction queue in the hardware organization of Figure 8.2b.

Page 11: Please see “ portrait orientation ” PowerPoint file for Chapter 8

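[Timing diagram; only its labels survive in this transcript. Rows: instructions I1, I2, I3, I4, I5 (Branch), I6, Ik and Ik+1, passing through stages F, D, E and W. I1 occupies the E stage for three cycles (F1 D1 E1 E1 E1 W1). One step is marked X. Horizontal axis: time, clock cycles 1 to 10. A "Queue length" row below the chart shows values between 1 and 3.]

Figure 8.11. Branch timing in the presence of an instruction queue. Branch target address is computed in the D stage.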

Page 12: Please see “ portrait orientation ” PowerPoint file for Chapter 8

LOOP    Shift_left   R1
        Decrement    R2
        Branch=0     LOOP
NEXT    Add          R1,R3

(a) Original program loop

LOOP    Decrement    R2
        Branch=0     LOOP
        Shift_left   R1
NEXT    Add          R1,R3

(b) Reordered instructions

Figure 8.12. Reordering of instructions for a delayed branch.

Page 13: Please see “ portrait orientation ” PowerPoint file for Chapter 8

Please see “portrait orientation” PowerPoint file for Chapter 8

Figure 8.13. Execution timing showing the delay slot being filled during the last two passes through the loop in Figure 8.12.

Page 14: Please see “ portrait orientation ” PowerPoint file for Chapter 8

Please see “portrait orientation” PowerPoint file for Chapter 8

Figure 8.14. Timing when a branch decision has been incorrectly predicted as not taken.

Page 15: Please see “ portrait orientation ” PowerPoint file for Chapter 8

Please see “portrait orientation” PowerPoint file for Chapter 8

Figure 8.15. State-machine representation of branch prediction algorithms.
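Since the state machine of Figure 8.15 is not reproduced in this transcript, the sketch below shows one widely used four-state scheme, a two-bit saturating counter. It is offered as an example of the general idea only; its transitions may differ from those drawn in the figure. The class and function names are hypothetical.

```python
class TwoBitPredictor:
    """Two-bit saturating counter: states 0 and 1 predict not taken, 2 and 3 predict taken."""

    def __init__(self, state=1):
        self.state = state

    def predict(self):
        """Return True if the branch is predicted taken."""
        return self.state >= 2

    def update(self, taken):
        """Move one step toward the observed outcome, saturating at 0 and 3."""
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes, predictor):
    """Run the predictor over a sequence of actual outcomes and count wrong guesses."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


if __name__ == "__main__":
    # Assumed workload: a loop branch taken nine times, then not taken, run twice.
    outcomes = ([True] * 9 + [False]) * 2
    print(count_mispredictions(outcomes, TwoBitPredictor()))
```

With four states, a single loop exit does not flip the prediction, so the second run of the loop starts out predicted correctly; a two-state scheme would mispredict both the exit and the next entry.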

Page 16: Please see “ portrait orientation ” PowerPoint file for Chapter 8

Please see “portrait orientation” PowerPoint file for Chapter 8

Figure 8.16. Equivalent operations using complex and simple addressing modes.

Page 17: Please see “ portrait orientation ” PowerPoint file for Chapter 8

Add        R1,R2
Compare    R3,R4
Branch=0   . . .

(a) A program fragment

Compare    R3,R4
Add        R1,R2
Branch=0   . . .

(b) Instructions reordered

Figure 8.17. Instruction reordering.

Page 18: Please see “ portrait orientation ” PowerPoint file for Chapter 8

Please see “portrait orientation” PowerPoint file for Chapter 8

Figure 8.18. Datapath modified for pipelined execution, with interstage buffers at the input and output of the ALU.

Page 19: Please see “ portrait orientation ” PowerPoint file for Chapter 8

[Block diagram; labels recovered from the slide:]

F : Instruction fetch unit
Instruction queue
Dispatch unit
Floating-point unit
Integer unit
W : Write results

Figure 8.19. A processor with two execution units.

Page 20: Please see “ portrait orientation ” PowerPoint file for Chapter 8

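[Timing diagram; the cycle positions are lost in this transcript, but the stage sequence of each instruction survives. Horizontal axis: time, clock cycles 1 to 7.]

I1 (Fadd)   F1 D1 E1A E1B E1C W1
I2 (Add)    F2 D2 E2 W2
I3 (Fsub)   F3 D3 E3 E3 E3 W3
I4 (Sub)    F4 D4 E4 W4

Figure 8.20. An example of instruction execution flow in the processor of Figure 8.19, assuming no hazards are encountered.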

Page 21: Please see “ portrait orientation ” PowerPoint file for Chapter 8

Please see “portrait orientation” PowerPoint file for Chapter 8

Figure 8.21. Instruction completion in program order.

Page 22: Please see “ portrait orientation ” PowerPoint file for Chapter 8

            LDX     R3, 0, R6         Load number of items in the list.
            OR      R0, R0, R4        R4 to be used as offset in the list.
            OR      R0, R0, R7        Clear R7 to be used as accumulator.
LOOPSTART   LDX     R3, R4, R5        Load list item into R5.
            ADD     R5, R7, R7        Add number to accumulator.
            ADD     R4, 8, R4         Point to the next entry.
            SUBcc   R6, 1, R6         Decrement R6 and set condition flags.
            BG      xcc, LOOPSTART    Loop if more items in the list.
NEXT        ...

(a) Desired program loop

            LDX     R3, 0, R6
            OR      R0, R0, R4
            OR      R0, R0, R7
LOOPSTART   LDX     R3, R4, R5
            ADD     R4, 8, R4
            SUBcc   R6, 1, R6
            BG,pt   xcc, LOOPSTART    Predicted taken, Annul bit = 0
            ADD     R5, R7, R7
NEXT        ...

(b) Instructions reorganized to use the delay slot

Figure 8.22. An addition loop showing the use of the branch delay slot and branch prediction.

Page 23: Please see “ portrait orientation ” PowerPoint file for Chapter 8

Please see “portrait orientation” PowerPoint file for Chapter 8

Figure 8.23. Main building blocks of the UltraSPARC II processor.

Page 24: Please see “ portrait orientation ” PowerPoint file for Chapter 8

[Pipeline diagram; labels recovered from the slide:]

Front end, feeding the instruction buffer:
F : Fetch
D : Decode
G : Group

Two integer pipelines, each with stages:
E : Execute
C : Cache
N1 : Delay
N2 : Delay
N3 : Check
W : Write

Two floating-point pipelines, each with stages:
R : Register
X1 : Execute
X2 : Execute
X3 : Execute
N3 : Check
W : Write

Figure 8.24. Pipeline organization of the UltraSPARC II processor.

Page 25: Please see “ portrait orientation ” PowerPoint file for Chapter 8

Please see “portrait orientation” PowerPoint file for Chapter 8

Figure 8.25. Example of instruction grouping.

Page 26: Please see “ portrait orientation ” PowerPoint file for Chapter 8

[Timing diagrams; the cycle offsets between the two instructions of each pair are lost in this transcript.]

ADD     R3, R5, R6    G E C N1 N2 N3 W
LDSW    R4, R7, R6    G E C N1 N2 N3 W

(a) Instructions with common destination

MOVRZ   R1, R6, R7    G E C N1 N2 N3 W
OR      R7, R8, R9    G E C N1 N2 N3 W

(b) Delay caused by MOVR instruction

Figure 8.26. Dispatch delays due to hazards.

Page 27: Please see “ portrait orientation ” PowerPoint file for Chapter 8

[Block diagram; labels recovered from the slide: Integer register file, Annex, IEU0, IEU1, ALU, Interstage buffers.]

Figure 8.27. Integer execution unit.

Page 28: Please see “ portrait orientation ” PowerPoint file for Chapter 8

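[Timing diagram; the cycle offsets between instruction groups are lost in this transcript. An "Abort" annotation appears on the diagram.]

I1 (Icc)     G E C
I2 (BRcc)    G E C
I3           G E C
I4           G E C
I5           G E
I6           G E
I7           G E
I8           G E
I9           G
I10          G
I11          G
I12          G

Figure 8.28. Worst-case timing for an incorrectly predicted branch.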

Page 29: Please see “ portrait orientation ” PowerPoint file for Chapter 8

[Block diagram; labels recovered from the slide: Integer register file/annex, pipeline stages G, E, C and N1, dTLB, D-Cache tags, D-Cache data, Compare, Load/store queue, Miss, To E-Cache.]

Figure 8.29. Load and store unit.

Page 30: Please see “ portrait orientation ” PowerPoint file for Chapter 8

Please see “portrait orientation” PowerPoint file for Chapter 8

Figure 8.30. Execution flow.

Page 31: Please see “ portrait orientation ” PowerPoint file for Chapter 8

Please see “portrait orientation” PowerPoint file for Chapter 8

Table 8.1. Examples of SPARC instructions.