Christoph Höhne, Michael Kaufmann. Motivation Super Scalar Architecture Implementation Details ...
Agenda
Motivation
Super Scalar Architecture
Implementation Details
Annoying Hazards
Comparisons
Conclusion
Motivation
Take advantage of possible parallelism on instruction level
Better utilization of components
Less hardware overhead than multiple scalar CPU-cores
Details: Register File
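[Figure: register file built from two Block-RAMs (32x32 bit each), clocked with CLK 2x. Data inputs DI1, DI2 with addresses DI1.adr, DI2.adr; data outputs DO1, DO2, DO3, DO4 with addresses DO1.adr, DO2.adr, DO3.adr, DO4.adr.]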
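The figure can be read as a register file with four read ports and two write ports, obtained from two Block-RAMs running at twice the CPU clock. The following Python sketch is a behavioural model of that idea, not the authors' VHDL. It assumes that both RAMs hold mirrored contents, that each fast-clock phase performs one write (to both RAMs) and one read per RAM, and that a read in the same phase returns the old value. The class name `RegisterFileModel` and the port-to-phase mapping are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide


class RegisterFileModel:
    """Behavioural model: two mirrored 32x32-bit RAMs, two phases per CPU cycle."""

    def __init__(self):
        self.bank_a = [0] * 32  # assumed to serve DO1 and DO2
        self.bank_b = [0] * 32  # assumed to serve DO3 and DO4

    def _phase(self, write, addr_a, addr_b):
        # Assumption: within one fast-clock phase the reads see the old contents.
        out_a = self.bank_a[addr_a]
        out_b = self.bank_b[addr_b]
        if write is not None:
            addr, data = write
            # Every write goes to both RAMs so that they stay mirrored.
            self.bank_a[addr] = data & MASK32
            self.bank_b[addr] = data & MASK32
        return out_a, out_b

    def cycle(self, do_addrs, di1=None, di2=None):
        """One CPU cycle: do_addrs holds DO1..DO4 addresses, di1/di2 are (addr, data)."""
        do1, do3 = self._phase(di1, do_addrs[0], do_addrs[2])  # first phase
        do2, do4 = self._phase(di2, do_addrs[1], do_addrs[3])  # second phase
        return do1, do2, do3, do4


rf = RegisterFileModel()
rf.cycle((0, 0, 0, 0), di1=(1, 5), di2=(4, 7))  # write $1 and $4 in one cycle
print(rf.cycle((1, 4, 1, 4)))                   # read both values on all four ports
```

Mirroring costs a second Block-RAM but lets every read port address any register independently, which is what two instructions with two source operands each need.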
Optimal Operation
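[Figure: pipeline diagram with the stages IF, ID, EX, WB and the blocks Cache, Regfile, ALU, MEM, ALU, MUL, BR, RS. Two instruction pairs pass through in parallel: Add $1, $2, $3 with Mul $4, $5, $6, and Sub $6, $2, $3 with Sub $7, $8, $9.]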
Annoying Structural Hazards (1)
[Figure: execution units ALU, MEM, MUL, ALU, BR with a sample instruction stream: ADD, MUL, ADD, ST, ST, JMP, MUL, CALL, MUL, MUL, JZ, JMP, ST, LD, ADD, ST, ADD.]
Annoying Structural Hazards (2)
Conflicting pairs (columns: even instruction, rows: odd instruction; ✗ = cannot issue together)

odd \ even   ALU   MUL   BR    LS
ALU
MUL                ✗     ✗
BR                 ✗     ✗
LS                       ✗     ✗

-- Flag opcode combinations of the even/odd instruction pair that cannot issue together.
structural_hazards <= active_high(
       (std_match(instr_even.instr.ir(Ropcode), MUL)   and std_match(instr_odd.instr.ir(Ropcode), MUL))
    or (std_match(instr_even.instr.ir(Ropcode), MUL)   and std_match(instr_odd.instr.ir(Ropcode), BRNOP))
    or (std_match(instr_even.instr.ir(Ropcode), BRNOP) and std_match(instr_odd.instr.ir(Ropcode), MUL))
    or (std_match(instr_even.instr.ir(Ropcode), BRNOP) and std_match(instr_odd.instr.ir(Ropcode), BRNOP))
    or (std_match(instr_even.instr.ir(Ropcode), BRNOP) and std_match(instr_odd.instr.ir(Ropcode), LS))
    or (std_match(instr_even.instr.ir(Ropcode), LS)    and std_match(instr_odd.instr.ir(Ropcode), LS))
);
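The six conditions in the VHDL expression amount to a lookup of the (even, odd) opcode class pair. As a rough software model of that check, the Python sketch below stores the same six pairs in a set. The names `HAZARD_PAIRS` and `structural_hazard` are made up for illustration, and the class label "BR" stands in for the `BRNOP` opcode pattern.

```python
# (even, odd) opcode-class pairs that the VHDL expression flags as conflicts.
HAZARD_PAIRS = {
    ("MUL", "MUL"),
    ("MUL", "BR"),
    ("BR", "MUL"),
    ("BR", "BR"),
    ("BR", "LS"),
    ("LS", "LS"),
}


def structural_hazard(even_class, odd_class):
    """Return True if the pair must not be issued in the same cycle."""
    return (even_class, odd_class) in HAZARD_PAIRS


print(structural_hazard("MUL", "MUL"))  # two multiplications
print(structural_hazard("ALU", "MUL"))  # ALU operation next to a multiplication
```

Note that the relation is not symmetric: the expression lists (BR, LS) as a conflict but not (LS, BR), and the model keeps that as given.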
Annoying Structural Hazards (3)
[Figure: the same pipeline diagram (IF, ID, EX, WB; Cache, Regfile, ALU, MEM, ALU, MUL, BR, RS) with the pair Mul $1, $2, $3 and Mul $4, $5, $6; operands $2, $3 and $5, $6, results $1 and $4.]
Annoying *AW Hazards (1)
Read-After-Write Hazard (RAW)
An instruction requires an operand that has not yet been written.
Solution: Forwarding
Write-After-Write Hazard (WAW)
Concurrently issued instructions have the same target register.
Solution: Stalling
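Both hazard kinds reduce to comparing register numbers of two instructions. The Python sketch below illustrates the two comparisons under simplifying assumptions: every instruction has one destination and two source registers, and the `Instr` tuple and function names are hypothetical rather than taken from the design.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    op: str
    dest: int  # target register number
    src1: int  # first source register number
    src2: int  # second source register number


def raw_hazard(earlier, later):
    """The later instruction reads a register the earlier one writes."""
    return earlier.dest in (later.src1, later.src2)


def waw_hazard(first, second):
    """Both instructions write the same target register."""
    return first.dest == second.dest


mul = Instr("Mul", 1, 2, 3)  # Mul $1, $2, $3
add = Instr("Add", 1, 1, 4)  # Add $1, $1, $4
print(raw_hazard(mul, add), waw_hazard(mul, add))
```

The pair Mul $1, $2, $3 and Add $1, $1, $4 triggers both checks: the Add reads $1 and also writes it.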
Annoying *AW Hazards (2)
[Figure: the same pipeline diagram (IF, ID, EX, WB; Cache, Regfile, ALU, MEM, ALU, MUL, BR, RS) with the pair Mul $1, $2, $3 and Add $1, $1, $4; operands $2, $3 and $1, $4, results $1 and $1.]
Area Statistics
[Figure: bar chart "Area Comparison" showing #4-Input LUTs, #Slices, and #FlipFlops for Super Scalar vs. Original FaPra; axis from 0 to 5000.]
Performance Statistics
[Figure: bar chart "Runtime comparison" for the Apfelmännchen (Mandelbrot set) benchmark, Super Scalar vs. Original FaPra; axis from 0 to 1.]
Conclusion
Literature != Practice
Timing is hell
MUXes need a lot of time and cannot simply be avoided
Wiring needs a lot of time
DLL clock doubling
Latches are evil in synthesis
Incomplete sensitivity lists cost hours
Optimization sometimes leads to unexpected results