Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic,...

31
Chapter 14 Superscalar Processors
  • date post

    20-Dec-2015
  • Category

    Documents

  • view

    226
  • download

    5

Transcript of Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic,...

Page 1: Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.

Chapter 14

Superscalar Processors

Page 2: Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.

What is Superscalar?

• “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.

• Equally applicable to RISC & CISC, but more straightforward in RISC machines.

• The order of execution is usually determined by the compiler.

A Superscalar machine executes multiple independent instructions in parallel.

Page 3: Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.

Example Superscalar Organization

• 2 Integer ALU pipelines,

• 2 FP ALU pipelines,

• 1 memory pipeline (?)

Page 4: Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.

Superpipelined Machines

Superpiplined machines overlap pipe stages

- rely on stages being able to begin operations

before the last is complete.

Superscaler machines have multiple instruction pipelines

- process multiple instructions in parallel

Page 5: Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.

Superscalar v Superpipeline

Page 6: Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.

Limitations of Superscalar

Dependent upon:• Instruction level parallelism• Compiler based optimization• Hardware support

• Limited by— True Data dependency— Procedural dependency— Resource conflicts— Output dependency or Antidependency (another form of data

dependency)

Page 7: Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.

True Data Dependency

ADD r1, r2 (r1+r2 r1) MOVE r3, r1 (r1 r3)

• Can fetch and decode second instruction in parallel with first

• Can NOT execute second instruction until first is finished

Compare with the following?

LOAD r1, X (x r1)

MOVE r3, r1 (r1 r3)• What additional problem do we have here?

Page 8: Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.

Procedural Dependency

• Can’t execute instructions after a branch in parallel with instructions before a branch, because?

Note: Also, if instruction length is not fixed, instructions have to be decoded to find out how many fetches are needed

Page 9: Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.

Resource Conflict

• Two or more instructions requiring access to the same resource at the same time—e.g. two arithmetic instructions

• Solution - Can possibly duplicate resources—e.g. have two arithmetic units

Page 10: Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.

Antidependancy

ADD R4, R3, 1 R3 + 1 R4

ADD R3, R5, 1 R5 + 1 R3

• Cannot complete the second instruction before the first has read R3

Why?

Page 11: Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.

True data dependency & Antidependency

• True data dependency: result of 1st instr used in 2nd instr (can’t complete 1st too soon)

• Antidenpendency: out of order completion of 2nd instr can write over value to be used in 1st instr (must complete 1st before 2nd changes operand value)

Page 12: Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.

Effect of Dependencies

Page 13: Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.

Instruction-level Parallelism

• Consider:LOAD R1, R2

ADD R3, 1ADD R4, R2

These can be handled in parallel. Why?

• Consider:ADD R3, 1ADD R4, R3STO (R4), R0

These cannot. Why?

Page 14: Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.

Instruction Issue Policies

• Order in which instructions are fetched

• Order in which instructions are executed

• Order in which instructions update registers and memory values

Note: there is also the issue of instruction completion policy

Page 15: Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.

In-Order Issue -- In-Order Completion

Issue instructions in the order they occur:

• Not very efficient

• Instructions must stall if necessary

Page 16: Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.

In-Order Issue -- In-Order Completion (Example)

Assume:

• I1 requires 2 cycles to execute

• I3 & I4 conflict for the same functional unit

• I5 depends upon value produced by I4

• I5 & I6 conflict for a functional unit

Page 17: Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.

In-Order Issue -- Out-of-Order Completion(Example)

How does this effect interrupts?

Again:• I1 requires 2 cycles to execute• I3 & I4 conflict for the same functional unit• I5 depends upon value produced by I4• I5 & I6 conflict for a functional unit

Page 18: Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.

Out-of-Order Issue -- Out-of-Order Completion

• Decouple decode pipeline from execution pipeline

• Can continue to fetch and decode until this pipeline is full

• When a functional unit becomes available an instruction can be executed

• Since instructions have been decoded, processor can look ahead

Page 19: Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.

Out-of-Order Issue -- Out-of-Order Completion (Example)

Note: I5 depends upon I4, but I6 does not

Again:• I1 requires 2 cycles to execute• I3 & I4 conflict for the same functional unit• I5 depends upon value produced by I4• I5 & I6 conflict for a functional unit

Page 20: Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.

Register Renaming

• Output and antidependencies occur becauseregister contents may not reflect the correctordering from the program

Can result in a pipeline stall

• One solution: Allocate Registers dynamically (renaming registers)

Page 21: Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.

Register Renaming example

R3b:=R3a + R5a (I1) R4b:=R3b + 1 (I2) R3c:=R5a + 1 (I3) R7b:=R3c + R4b (I4)

• Without “subscript” refers to logical register in instruction

• With subscript is hardware register allocated: R3a R3b R3c

Note: R3c avoids: antidependency on I2 output dependency I1

Page 22: Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.

Machine Parallelism Support

• Duplication of Resources

• Out of order issue

• Renaming

• Windowing

Page 23: Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.

Speedups of Machine Organizations (Without Procedural Dependencies)

• Not worth duplication of functional units without register renaming• Need instruction window large enough (more than 8, probably not more than 32)

Page 24: Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.

Branch Prediction in Superscalar Machines

• Delayed branch not used much. Why? Multiple instructions need to execute in the delay slot.

This leads to much complexity in recovery.

• Branch prediction may be used - Branch history MAY still be useful

• Are there any alternatives ?

Page 25: Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.

Superscalar Execution

Page 26: Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.

Committing or Retiring Instructions

Results need to be put into order (commit or retire)

• Results sometimes must be held in temporary storage until it is certain they can be placed in “permanent” storage.

(commit or retire)

• Temporary storage requires regular clean up - overhead.

Page 27: Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.

Superscalar Hardware Support• Facilities to simultaneously fetch multiple

instructions

• Logic to determine true dependencies involving register values and Mechanisms to communicate these values

• Mechanisms to initiate multiple instructions in parallel

• Resources for parallel execution of multiple instructions

• Mechanisms for committing process state in correct order

Page 28: Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.

Conclusions

What are the relative benefits of

• Superscalar

• Superpipelining

Page 29: Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.

Superscalar CISC machines

• Can Superscalar design be applied to CISC machines ?

Page 30: Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.

javax.comm

Basically, javax.comm is no longer supported on Windows (hasn't been since 2002), so we switched to RxTx, which is nearly identical. /:

According to

http://en.wikibooks.org/wiki/Serial_Programming:Serial_Java#RxTx,

"Converting a JavaComm Application to RxTx", all that is required to convert a javacomm application to an RxTx application is simply changing the import statement:

import javax.comm.*; to import gnu.io.*;

Everything else in the program can remain exactly the same because the package gnu.io apparently encompasses the same classes as javax.comm.

Indeed, rxtx version of SimpleWrite is identical to the javacomm version of SimpleWrite except that it imports gnu.io.* rather than javax.comm.*."

Page 31: Chapter 14 Superscalar Processors. What is Superscalar? “Common” instructions (arithmetic, load/store, conditional branch) can be executed independently.

Basic Concepts of the IA-64 Architecture

Instruction level parallelism — Explicit in machine instruction rather than determined at run

time by processor

Long or very long instruction words (LIW/VLIW)— Fetch bigger chunks already “preprocessed”

Branch predication (not the same as branch prediction)— Go ahead and fetch & decode instructions, but keep track of

them so the decision to “issue” them, or not, can be practically made later

Speculative loading— Go ahead and load data so it is ready when need, and have a

practical way to recover if speculation proved wrong

Software Pipelining— Allows multiple iterations of a loop to execute in parallel