Super-Scalar RISC CPU

Christoph Höhne, Michael Kaufmann

Agenda

Motivation
Super Scalar Architecture
Implementation Details
Annoying Hazards
Comparisons
Conclusion

Motivation

Take advantage of possible parallelism at the instruction level
Better utilization of components
Less hardware overhead than multiple scalar CPU cores

Original FaPra Architecture

[Figure: five-stage pipeline IF, ID, EX, MEM, WB with blocks labelled Cache, Regfile, ALU, and Memory]

Super Scalar Architecture

[Figure: four-stage pipeline IF, ID, EX, WB with blocks labelled Cache, Regfile, ALU, MEM, ALU/MUL, BR, and RS]

Details: Register File

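[Figure: register file built from two block RAMs (32x32 bit each), clocked with CLK 2x. Data inputs DI1 and DI2 with addresses DI1.adr and DI2.adr; data outputs DO1, DO2, DO3, and DO4 with addresses DO1.adr, DO2.adr, DO3.adr, and DO4.adr]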
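The register file slide shows two 32x32-bit block RAMs, a doubled clock, four read outputs, and two write inputs. One plausible reading, given here only as a behavioural sketch, is that both RAMs hold the same data and the doubled clock splits each CPU cycle into a read phase and a write phase. The slides do not confirm this scheme. The entity name `regfile_sketch`, the `phase` input, and the write enables `di1_we` and `di2_we` are hypothetical and were introduced for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Behavioural sketch only: 4 read ports and 2 write ports on top of two
-- mirrored 32x32-bit memories. Names and the phase scheme are assumptions.
entity regfile_sketch is
  port (
    clk2x   : in  std_logic;                      -- doubled clock (CLK 2x)
    phase   : in  std_logic;                      -- '0' = read phase, '1' = write phase
    do1_adr : in  unsigned(4 downto 0);
    do2_adr : in  unsigned(4 downto 0);
    do3_adr : in  unsigned(4 downto 0);
    do4_adr : in  unsigned(4 downto 0);
    di1_adr : in  unsigned(4 downto 0);
    di2_adr : in  unsigned(4 downto 0);
    di1_we  : in  std_logic;
    di2_we  : in  std_logic;
    di1     : in  std_logic_vector(31 downto 0);
    di2     : in  std_logic_vector(31 downto 0);
    do1     : out std_logic_vector(31 downto 0);
    do2     : out std_logic_vector(31 downto 0);
    do3     : out std_logic_vector(31 downto 0);
    do4     : out std_logic_vector(31 downto 0)
  );
end entity regfile_sketch;

architecture behav of regfile_sketch is
  type ram_t is array (0 to 31) of std_logic_vector(31 downto 0);
  -- Two copies of the register contents, one per block RAM.
  signal ram_a : ram_t := (others => (others => '0'));
  signal ram_b : ram_t := (others => (others => '0'));
begin
  process (clk2x)
  begin
    if rising_edge(clk2x) then
      if phase = '0' then
        -- Read phase: each copy serves two of the four read ports.
        do1 <= ram_a(to_integer(do1_adr));
        do2 <= ram_a(to_integer(do2_adr));
        do3 <= ram_b(to_integer(do3_adr));
        do4 <= ram_b(to_integer(do4_adr));
      else
        -- Write phase: every write goes to both copies to keep them equal.
        if di1_we = '1' then
          ram_a(to_integer(di1_adr)) <= di1;
          ram_b(to_integer(di1_adr)) <= di1;
        end if;
        if di2_we = '1' then
          -- If both writes target the same register, DI2 wins in this model.
          ram_a(to_integer(di2_adr)) <= di2;
          ram_b(to_integer(di2_adr)) <= di2;
        end if;
      end if;
    end if;
  end process;
end architecture behav;
```

In this model two writes to the same register in one cycle are resolved by assignment order, which is exactly the case the WAW stall described later is meant to keep out of the pipeline.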

Optimal Operation

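[Figure: the super-scalar pipeline (IF, ID, EX, WB; blocks Cache, Regfile, ALU, MEM, ALU/MUL, BR, RS) in optimal operation. Instructions shown: Add $1, $2, $3 and Mul $4, $5, $6; Sub $6, $2, $3 and Sub $7, $8, $9. Register labels in the diagram: $2, $3; $8, $9; $1; $4; $1; $7]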

Annoying Structural Hazards (1)

[Figure: an instruction stream (ADD, MUL, ST, LD, JMP, JZ, CALL, BR) distributed over the execution units ALU, MEM, MUL/ALU, and BR]

Annoying Structural Hazards (2)

Rows: odd instruction. Columns: even instruction. ✗ marks a structural hazard.

            ALU   MUL   BR    LS
    ALU     -     -     -     -
    MUL     -     ✗     ✗     -
    BR      -     ✗     ✗     -
    LS      -     -     ✗     ✗

    -- Asserted when the even/odd instruction pair must not be issued together.
    structural_hazards <= active_high(
           (std_match(instr_even.instr.ir(Ropcode), MUL)   and std_match(instr_odd.instr.ir(Ropcode), MUL))
        or (std_match(instr_even.instr.ir(Ropcode), MUL)   and std_match(instr_odd.instr.ir(Ropcode), BRNOP))
        or (std_match(instr_even.instr.ir(Ropcode), BRNOP) and std_match(instr_odd.instr.ir(Ropcode), MUL))
        or (std_match(instr_even.instr.ir(Ropcode), BRNOP) and std_match(instr_odd.instr.ir(Ropcode), BRNOP))
        or (std_match(instr_even.instr.ir(Ropcode), BRNOP) and std_match(instr_odd.instr.ir(Ropcode), LS))
        or (std_match(instr_even.instr.ir(Ropcode), LS)    and std_match(instr_odd.instr.ir(Ropcode), LS))
    );
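The same six conflicting pairs can also be written as a lookup table, which may be easier to compare against the hazard matrix than the chained `std_match` expression. The following is a minimal sketch, not the authors' code. The package name, the `unit_t` enumeration, and the `CONFLICT` constant are hypothetical. The decoding of opcodes into unit classes is left out, and the BRNOP pattern is simply treated as the BR class.

```vhdl
-- Table-driven view of the structural hazard check (illustrative sketch).
package issue_conflict_sketch is

  -- Hypothetical unit classes, named after the columns of the hazard matrix.
  type unit_t is (U_ALU, U_MUL, U_BR, U_LS);

  -- First index: class of the even instruction.
  -- Second index: class of the odd instruction.
  type conflict_t is array (unit_t, unit_t) of boolean;

  -- true = the pair must not be issued in the same cycle.
  -- Each row lists the odd classes that conflict with one even class,
  -- mirroring the six terms of the std_match expression.
  constant CONFLICT : conflict_t := (
    U_ALU => (others => false),
    U_MUL => (U_MUL | U_BR => true, others => false),
    U_BR  => (U_MUL | U_BR | U_LS => true, others => false),
    U_LS  => (U_LS => true, others => false)
  );

end package issue_conflict_sketch;
```

With a decoder that yields `even_unit` and `odd_unit` of type `unit_t` (both names assumed here), the check reduces to `CONFLICT(even_unit, odd_unit)`. The table also makes the asymmetry visible: an even BR blocks an odd LS, but an even LS does not block an odd BR.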

Annoying Structural Hazards (3)

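[Figure: the super-scalar pipeline (IF, ID, EX, WB; blocks Cache, Regfile, ALU, MEM, ALU/MUL, BR, RS) with Mul $1, $2, $3 and Mul $4, $5, $6 fetched together, an example of the MUL/MUL conflict. Register labels in the diagram: $2, $3; $5, $6; $1; $4]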

Annoying *AW Hazards (1)

Read-After-Write Hazard (RAW): an instruction requires an operand which has not yet been written. Solution: forwarding.

Write-After-Write Hazard (WAW): concurrently issued instructions have the same target register. Solution: stalling.
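The two conditions can be expressed as simple register-address comparisons. The package below is a sketch under stated assumptions rather than the design from the slides: the function names, the parameter names, and the write-enable flags are hypothetical, and the 5-bit address width follows from the 32-entry register file.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Illustrative detection of RAW and WAW hazards (names are assumptions).
package aw_hazard_sketch is

  subtype reg_adr_t is std_logic_vector(4 downto 0);  -- 32 registers

  -- WAW: both concurrently issued instructions write the same register.
  function waw_hazard (even_rd : reg_adr_t;
                       odd_rd  : reg_adr_t;
                       even_we : std_logic;
                       odd_we  : std_logic) return boolean;

  -- RAW: a consumer reads a register that a producer has not yet written.
  function raw_hazard (prod_rd  : reg_adr_t;
                       prod_we  : std_logic;
                       cons_rs1 : reg_adr_t;
                       cons_rs2 : reg_adr_t) return boolean;

end package aw_hazard_sketch;

package body aw_hazard_sketch is

  function waw_hazard (even_rd : reg_adr_t;
                       odd_rd  : reg_adr_t;
                       even_we : std_logic;
                       odd_we  : std_logic) return boolean is
  begin
    return (even_we = '1') and (odd_we = '1') and (even_rd = odd_rd);
  end function waw_hazard;

  function raw_hazard (prod_rd  : reg_adr_t;
                       prod_we  : std_logic;
                       cons_rs1 : reg_adr_t;
                       cons_rs2 : reg_adr_t) return boolean is
  begin
    return (prod_we = '1') and
           ((cons_rs1 = prod_rd) or (cons_rs2 = prod_rd));
  end function raw_hazard;

end package body aw_hazard_sketch;
```

For a pair such as Mul $1, $2, $3 and Add $1, $1, $4, both functions return true: the Add reads $1 before the Mul has written it, and both instructions target $1. A real decoder would also have to exclude any register that is hard-wired or never written, which this sketch does not model.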

Annoying *AW Hazards (2)

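[Figure: the super-scalar pipeline (IF, ID, EX, WB; blocks Cache, Regfile, ALU, MEM, ALU/MUL, BR, RS) with Mul $1, $2, $3 and Add $1, $1, $4. Register labels in the diagram: $2, $3; $1, $4; $1; $1]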

Area Statistics

[Bar chart "Area Comparison": number of 4-input LUTs, slices, and flip-flops on an axis from 0 to 5000, Super Scalar vs. Original FaPra]

Performance Statistics

[Bar chart "Runtime comparison": runtime of the Apfelmännchen (Mandelbrot set) benchmark on an axis from 0 to 1, Super Scalar vs. Original FaPra]

Conclusion

Literature != Practice
Timing is hell
MUXes need a lot of time and cannot be simply avoided
Wiring needs a lot of time
DLL clock doubling
Latches are evil in synthesis
Incomplete sensitivity list costs hours
Optimization sometimes leads to unexpected results
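The points about latches and sensitivity lists refer to two common VHDL pitfalls in combinational processes. The multiplexer below is a generic illustration and is not taken from the CPU. The entity `mux_sketch` and its ports are hypothetical. It shows the usual defensive pattern: list every signal the process reads, and assign a default so that no path leaves the output unassigned.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Generic 3-to-1 multiplexer used only to illustrate the coding pattern.
entity mux_sketch is
  port (
    sel : in  std_logic_vector(1 downto 0);
    a   : in  std_logic_vector(31 downto 0);
    b   : in  std_logic_vector(31 downto 0);
    c   : in  std_logic_vector(31 downto 0);
    y   : out std_logic_vector(31 downto 0)
  );
end entity mux_sketch;

architecture rtl of mux_sketch is
begin
  -- Complete sensitivity list: every signal read inside the process.
  -- Leaving out a, b, or c would make simulation react only to the
  -- listed signals, while synthesis still builds the full multiplexer.
  comb : process (sel, a, b, c)
  begin
    -- Default assignment: y gets a value on every path, so the
    -- synthesis tool has no reason to infer a latch.
    y <= (others => '0');
    case sel is
      when "00"   => y <= a;
      when "01"   => y <= b;
      when "10"   => y <= c;
      when others => null;  -- keeps the default
    end case;
  end process comb;
end architecture rtl;
```

Without the default assignment, the `when others => null` branch would require `y` to hold its previous value, which is what a latch does. Tools that support VHDL-2008 also accept `process (all)`, which removes the need to maintain the sensitivity list by hand.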

Thank you! Questions?