Post on 02-Mar-2021
The Evolution of Microprocessors
Per Stenström
Processor(Core)
Memory
Processor(Core)
Processor(Core)
L2 Cache
L1 Cache L1 Cache L1 Cache
MicroprocessorChip
Time1970 20201980 1990 2000 2010
Multicycle Pipelined Superscalar Multicore …and beyond
Evolution of Microprocessors
Multicycle Computers
Fetc
h ne
xt in
stru
ctio
n
Dec
ode/
Fetc
h da
ta
Exec
ute
Stor
e re
sult
Opcode Assembly code Meaning CommentsADD,SUB, ADD R1,R2,R3 R1=(R2) + (R3) Integer
operationsADDI, SUBI ADDI R1,R2,#3 R1=(R2)+3 ImmediateAND, OR, XOR AND R1,R2,R3 R1=(R2).AND.(R3) bit-wise
logical AND,OR, XOR
SLT SLT R1,R2,R3 If (R2)<(R3) then R1=1 else R1=0
Test R2, R3 outcome in R1
ADD.D, SUB.D, MUL.D, DIV.D
ADD.D F1,F2,F3 F1=(F2)+(F3) Floating-pointoperations
Opcode Assembly code Meaning CommentsLB,LH,LW,LD LW R1,#20(R2) R1=MEM[20+(R2)] For bytes,
half-words, words anddouble words
SB,SH,SW,SD SW R1,#20(R2) MEM[20+(R2)]=R1L.S, L.D L.S F0,#20(R2) F0=MEM[20+(R2)] Single/double
float load
S.S, S.D S.S F0,#20(R2) MEM[20+(R2)]=F0 Single/double float store
Opcode Assembly code Meaning CommentsBEQZ, BNEZ BEQZ R1,Label If (R1)=0 then
PC=LabelConditionalbranch-equal 0/not equal 0
BEQ, BNE BEQ R1,R2,Label If (R1)=(R2) PC=Label
Conditional branch-equal/not equal
J J Target PC=Target Target is an immediate field
JR JR R1 PC=(R1) Target is in register
JAL JAL Target R31 = PC + 4;PC = Target
Jump to target and save return address in R31
Instructionmemory
ReadAddress
Instruction[31..0]
PC
Instructionmemory
ReadAddress
Instruction[31..0]
PC
MU
X
[25..21]
[20..16]
[15..11]
Registers
Rs addrRt addr
Rd addr
Write data
Rs data
Rt data
[15..0]
ALU
Control
Memory transfer
Instructionmemory
ReadAddress
Instruction[31..0]
PC
MU
X
[25..21]
[20..16]
[15..11]
Registers
Rs addrRt addr
Rd addr
Write data
Rs data
Rt data
[15..0]
MU
XALU
ALU
Control
Memory transfer
Instructionmemory
ReadAddress
Instruction[31..0]
PC
MU
X
[25..21]
[20..16]
[15..11]
Registers
Rs addrRt addr
Rd addr
Write data
Rs data
Rt data
[15..0]
MU
XALU
MU
X
MU
X
ALU
Control
Memory transfer
Instructionmemory
ReadAddress
Instruction[31..0]
PC
MU
X
[25..21]
[20..16]
[15..11]
Registers
Rs addrRt addr
Rd addr
Write data
Rs data
Rt dataM
UX
ALU
SignEx-tend
[15..0]
Effective addresssDisplacement + (Rs)
ALU
Control
Memory transfer
Instructionmemory
ReadAddress
Instruction[31..0]
PC
MU
X
[25..21]
[20..16]
[15..11]
Registers
Rs addrRt addr
Rd addr
Write data
Rs data
Rt dataM
UX
ALU
SignEx-tend
[15..0]
Data memory
Rd addrWr addr
Write data
Rd data
ALU
Control
Memory transfer
Instructionmemory
ReadAddress
Instruction[31..0]
PC
MU
X
[25..21]
[20..16]
[15..11]
Registers
Rs addrRt addr
Rd addr
Write data
Rs data
Rt dataM
UX
ALU
SignEx-tend
[15..0]
Data memory
Rd addrWr addr
Write data
Rd data
MU
XALU
Control
Memory transfer
Instructionmemory
ReadAddress
Instruction[31..0]
PC
MU
X
[25..21]
[20..16]
Data memory
Rd addrWr addr
Write data
Rd data
[15..11] MU
X
Registers
Rs addrRt addr
Rd addr
Write data
Rs data
Rt data
MU
X
SignEx-tend
[15..0]
ALU
ADD
+4
MU
XFetching the next instruction
ALU
Control
Memory transfer
Instructionmemory
ReadAddress
Instruction[31..0]
PC
MU
X
[25..21]
[20..16]
Data memory
Rd addrWr addr
Write data
Rd data
[15..11] MU
X
Registers
Rs addrRt addr
Rd addr
Write data
Rs data
Rt data
MU
X
SignEx-tend
[15..0]
ShiftLeft2 ADD
ALU
ADD
+4
MU
X
ALU
Control
Memory transfer
BEQ R1,R2,offset ZERO
Instructionmemory
ReadAddress
Instruction[31..0]
PC
MU
X
[25..21]
[20..16]
Data memory
Rd addrWr addr
Write data
Rd data
[15..11] MU
X
Registers
Rs addrRt addr
Rd addr
Write data
Rs data
Rt data
MU
X
SignEx-tend
[15..0]
ShiftLeft2 ADD
ALU
ADD
+4
MU
X
Instructionmemory
ReadAddress
Instruction[31..0]
PC
MU
X
[25..21]
[20..16]
Data memory
Rd addrWr addr
Write data
Rd data
[15..11] MU
X
Registers
Rs addrRt addr
Rd addr
Write data
Rs data
Rt data
MU
X
SignEx-tend
[15..0]
ShiftLeft2 ADD
ALU
ADD
+4
MU
X
RegDst
RegWrite
ALUSrcALUOp
PCSrc
MemToReg
MemWrite
MemRead
Instructionmemory
ReadAddress
Instruction[31..0]
PC
MU
X
[25..21]
[20..16]
Data memory
Rd addrWr addr
Write data
Rd data
[15..11] MU
X
Registers
Rs addrRt addr
Rd addr
Write data
Rs data
Rt data
MU
X
SignEx-tend
[15..0]
ShiftLeft2 ADD
ALU
ADD
+4
MU
X
DECODE /OPERAND FETCH
EXECUTE MEMORYACCESSINSTRUCTION
FETCHWRITEBACK
Pipelined ”Single-Cycle” Computer
The Conveyor Belt
Instructionmemory
ReadAddress
Instruction[31..0]
PC
MU
X
[25..21]
[20..16]
Data memory
Rd addrWr addr
Write data
Rd data
[15..11] MU
X
Registers
Rs addrRt addr
Rd addr
Write data
Rs data
Rt data
MU
X
SignEx-tend
[15..0]
ShiftLeft2 ADD
ALU
ADD
+4
MU
X
LD R3,0(R11) LD R2,0(R10)ADD R1,R2,R3SD 0(R12),R1SUBI R4,R4,#1
LABEL: LD R2, 0(R10)
LD R3, 0(R11)
ADD R1,R2,R3
SD 0(R12), R1
SUBI R4,R4,#1
ADDI R10,R10,#4
ADDI R11,R11,#4
ADDI R12,R12,#4
BNEZ R4, LABEL
Instructionmemory
ReadAddress
Instruction[31..0]
PC
MU
X
[25..21]
[20..16]
Data memory
Rd addrWr addr
Write data
Rd data
[15..11] MU
X
Registers
Rs addrRt addr
Rd addr
Write data
Rs data
Rt data
MU
X
SignEx-tend
[15..0]
ShiftLeft2 ADD
ALU
ADD
+4
MU
X
LD R3,0(R11) LD R2,0(R10)ADD R1,R2,R3SD 0(R12),R1SUBI R4,R4,#1
Structural hazardsData hazards
Control hazards
Instructionmemory
ReadAddress
Instruction[31..0]
PC
MU
X
[25..21]
[20..16]
Data memory
Rd addrWr addr
Write data
Rd data
[15..11] MU
X
Registers
Rs addrRt addr
Rd addr
Write data
Rs data
Rt data
MU
X
SignEx-tend
[15..0]
ShiftLeft2 ADD
ALU
ADD
+4
MU
X
LD R3,0(R11)ADD R1,R2,R3SD 0(R12),R1SUBI R4,R4,#1 UNUSED
Structural hazardsData hazards
Control hazards
Instructionmemory
ReadAddress
Instruction[31..0]
PC
MU
X
[25..21]
[20..16]
Data memory
Rd addrWr addr
Write data
Rd data
[15..11] MU
X
Registers
Rs addrRt addr
Rd addr
Write data
Rs data
Rt data
MU
X
SignEx-tend
[15..0]
ShiftLeft2 ADD
ALU
ADD
+4
MU
X
ADD R1,R2,R3SD 0(R12),R1SUBI R4,R4,#1 UNUSED UNUSED
Structural hazardsData hazards
Control hazards
Instructionmemory
ReadAddress
Instruction[31..0]
PC
MU
X
[25..21]
[20..16]
Data memory
Rd addrWr addr
Write data
Rd data
[15..11] MU
X
Registers
Rs addrRt addr
Rd addr
Write data
Rs data
Rt data
MU
X
SignEx-tend
[15..0]
ShiftLeft2 ADD
ALU
ADD
+4
MU
X
BEQ R1,offsetUNUSED
Structural hazardsData hazards
Control hazards
Instructionmemory
ReadAddress
Instruction[31..0]
PC
MU
X
[25..21]
[20..16]
Data memory
Rd addrWr addr
Write data
Rd data
[15..11] MU
X
Registers
Rs addrRt addr
Rd addr
Write data
Rs data
Rt data
MU
X
SignEx-tend
[15..0]
ShiftLeft2 ADD
ALU
ADD
+4
MU
X
UNUSED UNUSED BEQ R1,offset
Structural hazardsData hazards
Control hazards
Instructionmemory
ReadAddress
Instruction[31..0]
PC
MU
X
[25..21]
[20..16]
Data memory
Rd addrWr addr
Write data
Rd data
[15..11] MU
X
Registers
Rs addrRt addr
Rd addr
Write data
Rs data
Rt data
MU
X
SignEx-tend
[15..0]
ShiftLeft2 ADD
ALU
ADD
+4
MU
X
UNUSED UNUSED BEQ R1,offsetNEXT INST.
Structural hazardsData hazards
Control hazards
Superscalar Microprocessors
IF IS EXID WB
InstructionFetch
InstructionDecode/Dispatch
InstructionIssue/ReadOperands
Execute WriteBack
IF ID WB
EX 1
EX 2
EX N
IQ 1
IQ 2
IQ N
Fetch Decode/Dispatch
ReadOperands Execute Write
Back
IF ID WB
INT
FP
MEM
I-Q
FP-Q
M-Q
Loop L.S F0,0(R1) L.S F1,0(R2) ADD.S F2,F1,F0 S.S F2,0(R1) ADDI R1,R1,#4 ADDI R2,R2,#4 SUBI R3,R3,#1 BNEZ R3,Loop
L.S F0,0(R1)
Cycle: 1
IF ID WB
INT
FP
MEM
I-Q
FP-Q
M-Q
Loop L.S F0,0(R1) L.S F1,0(R2) ADD.S F2,F1,F0 S.S F2,0(R1) ADDI R1,R1,#4 ADDI R2,R2,#4 SUBI R3,R3,#1 BNEZ R3,Loop
L.S F0,0(R1)L.S F1,0(R2)
Cycle: 2
IF ID WB
INT
FP
MEM
I-Q
FP-Q
M-Q
Loop L.S F0,0(R1) L.S F1,0(R2) ADD.S F2,F1,F0 S.S F2,0(R1) ADDI R1,R1,#4 ADDI R2,R2,#4 SUBI R3,R3,#1 BNEZ R3,Loop
L.S F0,0(R1)
L.S F1,0(R2)ADD.S F2,F1,F0
Cycle: 3
IF ID WB
INT
FP
MEM
I-Q
FP-Q
M-Q
Loop L.S F0,0(R1) L.S F1,0(R2) ADD.S F2,F1,F0 S.S F2,0(R1) ADDI R1,R1,#4 ADDI R2,R2,#4 SUBI R3,R3,#1 BNEZ R3,Loop
L.S F0,0(R1)L.S F1,0(R2)
ADD.S F2,F1,F0S.S F2,0(R1)
Cycle: 4
IF ID WB
INT
FP
MEM
I-Q
FP-Q
M-Q
Loop L.S F0,0(R1) L.S F1,0(R2) ADD.S F2,F1,F0 S.S F2,0(R1) ADDI R1,R1,#4 ADDI R2,R2,#4 SUBI R3,R3,#1 BNEZ R3,Loop
L.S F0,0(R1)
L.S F1,0(R2)
ADD.S F2,F1,F0
ADDI R1,R1,#4 S.S F2,0(R1)
Cycle: 5
IF ID WB
INT
FP
MEM
I-Q
FP-Q
M-Q
Loop L.S F0,0(R1) L.S F1,0(R2) ADD.S F2,F1,F0 S.S F2,0(R1) ADDI R1,R1,#4 ADDI R2,R2,#4 SUBI R3,R3,#1 BNEZ R3,Loop
L.S F1,0(R2)
ADD.S F2,F1,F0
ADDI R1,R1,#4ADDI R2,R2,#4
S.S F2,0(R1)
Cycle: 6
IF ID WB
INT
FP
MEM
I-Q
FP-Q
M-Q
Loop L.S F0,0(R1) L.S F1,0(R2) ADD.S F2,F1,F0 S.S F2,0(R1) ADDI R1,R1,#4 ADDI R2,R2,#4 SUBI R3,R3,#1 BNEZ R3,Loop
ADD.S F2,F1,F0
ADDI R2,R2,#4 SUBI R3,R3,#1
S.S F2,0(R1)
ADDI R1,R1,#4Cycle: 7
IF ID WB
INT
FP
MEM
I-Q
FP-Q
M-Q
Loop L.S F0,0(R1) L.S F1,0(R2) ADD.S F2,F1,F0 S.S F2,0(R1) ADDI R1,R1,#4 ADDI R2,R2,#4 SUBI R3,R3,#1 BNEZ R3,Loop
ADD.S F2,F1,F0
SUBI R3,R3,#1 BNEZ R3,Loop
S.S F2,0(R1)
ADDI R1,R1,#4ADDI R2,R2,#4 Cycle: 8
The ADDI R1,R1,#4 is being executed Out-of-program-orderwith respect to the S.S F2,0(R1)!
IF ID WB
INT
FP
MEM
I-Q
FP-Q
M-Q
Loop L.S F0,0(R1) L.S F1,0(R2) ADD.S F2,F1,F0 S.S F2,0(R1) ADDI R1,R1,#4 ADDI R2,R2,#4 SUBI R3,R3,#1 BNEZ R3,Loop
L.S F0,0(R2)
ADD.S F2,F1,F0
SUBI R3,R3,#1
BNEZ R3,Loop
S.S F2,0(R1)
ADDI R1,R1,#4
ADDI R2,R2,#4 Cycle: 9
IF ID WB
INT
FP
MEM
I-Q
FP-Q
M-Q
Loop L.S F0,0(R1) L.S F1,0(R2) ADD.S F2,F1,F0 S.S F2,0(R1) ADDI R1,R1,#4 ADDI R2,R2,#4 SUBI R3,R3,#1 BNEZ R3,Loop
L.S F1,0(R1) L.S F0,0(R2)
ADD.S F2,F1,F0
SUBI R3,R3,#1 BNEZ R3,Loop
S.S F2,0(R1)
ADDI R2,R2,#4
Cycle: 10
IF ID WB
INT
FP
MEM
I-Q
FP-Q
M-Q
Loop L.S F0,0(R1) L.S F1,0(R2) ADD.S F2,F1,F0 S.S F2,0(R1) ADDI R1,R1,#4 ADDI R2,R2,#4 SUBI R3,R3,#1 BNEZ R3,Loop
L.S F1,0(R1) L.S F0,0(R2)
ADD.S F2,F1,F0
SUBI R3,R3,#1
BNEZ R3,Loop
S.S F2,0(R1)
Cycle: 11
IF ID WB
INT
FP
MEM
I-Q
FP-Q
M-Q
Loop L.S F0,0(R1) L.S F1,0(R2) ADD.S F2,F1,F0 S.S F2,0(R1) ADDI R1,R1,#4 ADDI R2,R2,#4 SUBI R3,R3,#1 BNEZ R3,Loop
L.S F1,0(R1)
L.S F0,0(R2)
ADD.S F2,F1,F0ADD,S F2,F1,F0
BNEZ R3,Loop
S.S F2,0(R1)
Cycle: 12
Multicore Computers
Processor(Core)
Memory
Processor(Core)
Processor(Core)
L2 Cache
L1 Cache L1 Cache L1 Cache
MicroprocessorChip
Lectures on youtube9 interactive sessions, flipped class-room style5 problem solution sessions1 design exploration projectTextbook: Parallel Computer Organization and DesignDubois, Annavaram, Stenström
Blended Learning