EEM 486 : Computer Architecture Designing a Multicycle Processor
The Processor: Multicycle Implementation
description
Transcript of The Processor: Multicycle Implementation
![Page 1: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/1.jpg)
The Processor:Multicycle Implementation
![Page 2: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/2.jpg)
Reminder: Single-Cycle Datapath
![Page 3: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/3.jpg)
Multicycle ApproachBreak up the instructions into steps, each
step takes a cyclebalance the amount of work in cyclesonly one major functional unit is used in each
cycleAt the end of each cycle
store values for use in later cycles introduce additional “internal” registers
We will be recycling functional unitsALU used to compute branch address and to
increment PCThe same memory used for instruction and data
We will use finite state machine for control
![Page 4: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/4.jpg)
Multicycle Implementation
buffer registers between functional units which will be updated every clock cycle
IR: Instruction register MDR: Memory Data register A and B registers to hold register operand values read
from register file ALUOut: to hold output of ALU
![Page 5: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/5.jpg)
Datapath with Control Signals
![Page 6: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/6.jpg)
Handling PC
Multiplexor for selecting the source for the PCALU output after PC+4ALU output after branch target address calculationNew address with jump instructions (26 bits of IR)
Control signals for writing PCUnconditionally after a normal instruction and jump
- PCWriteConditionally overwritten if a branch is taken
When PCWriteCond and zero are asserted, the branch address at the output of ALU is written into the PC.
![Page 7: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/7.jpg)
Handling PC
Address
write data
Memory
MemData
PC 1
PCWrite
PCWriteCond
zero
0
from ALUOut
IorD
PC+4,jump address,branch address
![Page 8: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/8.jpg)
Complete Datapath & Control
![Page 9: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/9.jpg)
Five Execution Steps1. Instruction Fetch
2. Instruction Decode and Register Fetch
3. Execution, Memory Address Computation, or Branch Completion
4. Memory Access or R-type instruction completion
5. Write-back step
INSTRUCTIONS TAKE 3 - 5 CYCLES!
![Page 10: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/10.jpg)
Step 1: Instruction FetchUse PC to get the instruction and put it in IR.Increment the PC by 4 and put the result back
in the PC.Can be described succinctly using RTL
IR <= Memory[PC];PC <= PC + 4;
The ALU is used for incrementing PCControl signals
MemRead = ? IRWrite = ? PCWrite = ?IorD = ? ALUSrcA = ?ALUSrcB = ?? ALUOp = ?? PCSource = ??
![Page 11: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/11.jpg)
Step 2: Instruction Decode & Register Fetch
1. Read registers rs and rt in case we need them2. Compute the branch address in case the
instruction is a branchA <= Reg[IR[25-21]];B <= Reg[IR[20-16]];ALUOut <= PC + (sign-extend(IR[15-
0])<<2); We aren't setting any control lines based on the
instruction type yetALUSrcA = ? (PC)ALUSrcB = ?? (sign-extended & shifted
offset)ALUOp = ?? (add)
![Page 12: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/12.jpg)
Step 3 (Instruction Dependent)
ALU performs for one of the four functions1. Memory Reference:
ALUOut <= A + sign-extend(IR[15-0]);ALUSrcA = ?, ALUSrcB = ??, ALUOp = ??
2. R-type:ALUOut <= A op B;ALUSrcA = ?, ALUSrcB = ??, ALUOp = ??
3. Branch:if (A == B) PC <= ALUOut;ALUSrcA = ?, ALUSrcB = ??, ALUOp = ??,
PCWriteCond = ?, PCSource = ??
4. Jump:PC <= PC[31:28]||(IR[25:0]<<2) PCWrite = ?, PCSource = ??
![Page 13: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/13.jpg)
Step 4 (R-type or Memory-Access)
1. Load and store instructions access memory lw: MDR <= Memory[ALUOut];
MemRead = ?, IorD = ? sw: Memory[ALUOut] <= B;
MemWrite = ?, IorD = ?
2. R-type instructions finishReg[IR[15-11]] <= ALUOut;RegDst = ?, RegWrite = ?, MemtoReg = ?
![Page 14: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/14.jpg)
Write-Back StepMemory read completion stepReg[IR[20-16]] <= MDR;
RegDst = ?, RegWrite = ?, MemtoReg = ?
What about all the other instructions?
![Page 15: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/15.jpg)
Summary of the StepsStep Name Actions for
R-typeActions for memory-
accessAction
for branch
Action for jump
Instruction Fetch IR <= Memory[PC];PC <= PC + 4;
Instruction decode / register fetch
A <= Reg[IR[25-21]];B <= Reg[IR[20-16]];ALUOut <= PC + (sign-extend(IR[15-0]) << 2);
Execution, address calculations, branch / jump completion
ALUOut <= A op B;
ALUOut <= A +
sign-extend(IR[15-0]);
if (A==B) PC <= ALUOut;
PC <= PC[31:28]||(IR[25:0]<<2)
Memory access / R-type completion
Reg[IR[15- 11]] <= ALUOut;
Load: MDR <= Memory[ALUOut];
Store:Memory[ALUOut] <= B;
Memory read completion
Reg[IR[20-16]] <= MDR;
![Page 16: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/16.jpg)
Instructions in Multicycle Implementation
Loads: 5, Stores: 4, ALU Instructions: 4, Branch: 3, Jump: 3
Example: From SPECINT2000, instruction mix: 25 % loads (1% load byte + 24% load word) 10% stores (1% store byte + 9% store word) 11 % branches (6% beq + 5% bne) 2 % jumps (1% jal + 1% jr) 52% ALU operations
CPI = formula = 4.12
![Page 17: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/17.jpg)
ExampleHow many cycles does it take to execute this
code?
lw $t2, 0($t3)lw $t3, 4($t3)beq $t2, $t3, Label #assume not
taken add $t5, $t2, $t3sw $t5, 8($t3)
Label: ...
What exactly is happening during the 8th cycle of execution?
In what cycle does the actual addition of $t2 and $t3 take place?
![Page 18: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/18.jpg)
Implementing the ControlA control unit that will
assert control signals at the right timedetermine the next step
Value of control signals is dependent upon:which instruction is being executedwhich step is being performed (implies sequential
logic)Two specification methods
State diagramsMicroprogramming
Implementation can be derived from this specification
![Page 19: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/19.jpg)
Review: Finite State Machines
Finite state machines:a finite set of states and next state function (determined by current
state and input)output function (determined by current state
and possibly input)We’ll use a Moore machine (output based
only on current state)
![Page 20: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/20.jpg)
Finite State Machines
StateElements
InputsMealy Outputs
Next statestate
clk
Combinational Network
Output Function
Next State Function
Output Function
Combinational Network
MooreOutputs
![Page 21: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/21.jpg)
State Diagram for Fetch & Decode
start
MemReadALUSrcA = 0
IorD = 0IRWrite
ALUSrcB = 01ALUOp = 00
PCWritePCSource = 00
S0
ALUSrcA = 0ALUSrcB = 11ALUOp = 00
S1
op = ‘lw’ or ‘s
w’
Memory Reference
FSM
op = R-type
R-type FSM
op =
‘beq
’
Branch FSM
op
= ‘j’
Jump FSM
![Page 22: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/22.jpg)
State Diagram for Memory-Access
ALUSrcA = 1ALUSrcB = 10ALUOp = 00
Op = ‘lw’ or ‘sw’S2
From state S1
memory address calculation
RegDst = 0RegWrite
MemtoReg = 1
S4 memory read completion
MemWriteIorD = 1
sw
S5
memory accessmemory accessMemReadIorD = 1
lwS3
To state S0
![Page 23: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/23.jpg)
State Diagram for R-type
ALUSrcA = 1ALUSrcB = 00ALUOp = 10
Op = R-type
S6
From state S1
Execution
RegDst = 1RegWrite
MemtoReg = 0
S7R-type completion
To state S0
![Page 24: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/24.jpg)
State Diagram for Branch
S8ALUSrcA = 1ALUSrcB = 00ALUOp = 01PCWriteCondPCSource = 01
Op
= beq
Branch completion
From state S1
To state S0
![Page 25: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/25.jpg)
State Diagram for Jump
To state S0
PCWritePCSource = 10
Op
= j
Jump completion
From state S1
S9
![Page 26: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/26.jpg)
Complete State Diagram How many state bits are needed?
start
MemReadALUSrcA = 0
IorD = 0IRWrite
ALUSrcB = 01ALUOp = 00
PCWritePCSource = 00
S0ALUSrcA = 0ALUSrcB = 11ALUOp = 00
S1
ALUSrcA = 1ALUSrcB = 10ALUOp = 00
Op = ‘lw’ or ‘sw’
S2
MemReadIorD = 1
lw
S3
RegDst = 0RegWrite
MemtoReg = 1
S4
MemWriteIorD = 1
sw
S5RegDst = 1RegWrite
MemtoReg = 0
S7
ALUSrcA = 1ALUSrcB = 00ALUOp = 10
Op = R-type
S6 ALUSrcA = 1ALUSrcB = 00ALUOp = 01PCWriteCondPCSource = 01
Op
= beq
S8PCWrite
PCSource = 10
Op =
j
S9
![Page 27: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/27.jpg)
Implementing the Control
Combinationalcontrollogic
datapathcontrol outputs
outputs
inputs
FFsinputs from instruction
next statestate
Moore or Mealy
machine?
![Page 28: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/28.jpg)
Complete Datapath & Control
![Page 29: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/29.jpg)
Exceptions or InterruptsUnintentional change of normal flow of
instructionsException is an unexpected event from within
the processor, e.g arithmetic overflow.An interrupt may come from outside of the
processor.These terms are observed to be used
interchangeably (IA32 uses interrupts for both) Our simplified MIPS architecture can generate
two exceptions:Arithmetic overflowUndefined instruction.
![Page 30: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/30.jpg)
Interrupt or Exception?
Type of event From where?
MIPS terminology
I/O device request External Interrupt
Invoke the OS from the user program
Internal Exception
Arithmetic overflow Internal Exception
Undefined instruction Internal Exception
Hardware malfunctions
Either Exception or interrupt
![Page 31: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/31.jpg)
When an Exception Occurs 1/2
Exception program counter (EPC) PC of the current instruction (i.e. PC-4) in EPC
Transfer the control to OS at some specified address.
The OS does what needs to be done, it eitherStops the execution of the program and
reports an errorProvides a service to the user program (e.g a
predefined action to an overflow) and return to the point of origin of exception using EPC
![Page 32: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/32.jpg)
When an Exception Occurs 2/2 The OS must know the reason for the
exception An exception is handled using two
approaches:1. cause register contains the reason for
exception (MIPS).2. Vectored interrupt: For each interrupt type,
a separate vector is stored and each interrupt vector targets at the appropriate piece of code.
![Page 33: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/33.jpg)
Vectored Interrupts
Exception type exception vector address (in hex)
Undefined instruction
0xC0000000
Arithmetic overflow
0xC0000020
For each cause of the interrupt, the control goes toa different location in memory to handle the exception
![Page 34: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/34.jpg)
How MIPS Handles ExceptionsRegisters
EPCCause register: individual bits are used to
encode the cause of exceptions (e.g. 0 in the LSB for undefined instruction 1 for overflow)
Control SignalsCauseWriteEPCWriteIntCause: 1-bit signals to set the appropriate
bit of Cause registerException address:
0x8000 0180 (SPIM uses 0x8000 0080)
![Page 35: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/35.jpg)
MIPS Memory Allocation
0
7fff fffc
Reserved
Dynamic Data
Stack
Textpc 0040 0000
gp 1000 8000 Static Data
1000 0000
sp
0x8000 0180
![Page 36: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/36.jpg)
The Multicycle Datapath with Exception Handling
0x80000180
overflow
![Page 37: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/37.jpg)
What is New?PCSource
0
1
Zero
ALU
OF
2
3
ALUOut
jump address
0x8000 0180
EPC
EPCWrite
Cause
CauseWrite
0
1
0
1
IntCausePC-4
![Page 38: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/38.jpg)
Exception States
IntCause = 0CauseWriteALUSrcA=0ALUSrcB=01ALUOp = 01
EPCWritePCWrite
PCSource=11
S10
IntCause = 1CauseWriteALUSrcA=0ALUSrcB=01ALUOp = 01
EPCWritePCWrite
PCSource=11
S11
To state S0 to beginnext instruction
![Page 39: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/39.jpg)
ALUSrcA = 0ALUSrcB = 11ALUOp = 00
MemReadALUSrcA = 0
IorD = 0IRWrite
ALUSrcB = 01ALUOp = 00
PCWritePCSource = 00
PCWritePCSource = 10
ALUSrcA = 1ALUSrcB = 00ALUOp = 01PCWriteCondPCSource = 01
ALUSrcA = 1ALUSrcB = 00ALUOp = 10
ALUSrcA = 1ALUSrcB = 10ALUOp = 00
MemReadIorD = 1
RegDst = 1RegWrite
MemtoReg = 0
RegDst = 0RegWrite
MemtoReg = 1
MemWriteIorD = 1
start
Op = ‘lw’ or ‘sw’Op = R-ty
pe
Op
= b
eq
Op =
J
lw
sw
S0 S1
S2
S3
S4
S5 S7
S6 S8S9
Complete State Diagram with Exceptions
IntCause = 1CauseWriteALUSrcA=0ALUSrcB=01ALUOp = 01
EPCWritePCWrite
PCSource=11
Overflow
S11
IntCause = 0CauseWriteALUSrcA=0ALUSrcB=01ALUOp = 01
EPCWritePCWrite
PCSource=11
Op = other
S10
![Page 40: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/40.jpg)
Hardware/Software Interface
Hardware
ISA
Assembly
Microarchitecture
![Page 41: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/41.jpg)
Control with Microprogramming
Simpler instructions to implement (macro) instructions
Microinstructions are kept in ROMaddressable
Not visible to programmersMicroinstruction format
ALU Control SRC1 SRC2 Reg. Cont. MemoryPC Write Cont. Sequencing
Control signals are directly embedded in microinstructions
![Page 42: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/42.jpg)
Fields of MicroinstructionsField name Function of field
ALU Control
Specify the operation being performed by the ALU during this clock, the result is always written in ALUOut
SRC1 Specify the source for the first ALU operand
SRC2 Specify the source for the second ALU operand
Register Control
Specify read or write for the register file, and the source of the value for a write
Memory Specify read or write, and the source for the memory. For a read specify the destination register
PCWrite Control
Specify the writing of the PC
Sequencing Specify how to choose the next microinstruction
![Page 43: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/43.jpg)
Choosing Next Microinstruction
1. Next instruction Seq (sequential behavior)
2. Branch to the microinstruction that starts execution of the next MIPS instruction Fetch is the label of the initial microinstruction
3. Next microinstruction is chosen based on the input (dispatching, dispatch tables) In MIPS, two dispatch tables: one from S1 and the
other from S2. Dispatch i
![Page 44: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/44.jpg)
Microprogram for the Control Unit
Dispatch 1ReadExtshftPCAdd
SeqALURead PC4PCAddFetch
Sequencing
PCWrite
control
MemoryRegister control
SRC2SRC1
ALU contro
l
Label
12
0
3
4
5
6
7
8
9
![Page 45: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/45.jpg)
Dispatch Tables
Microcode dispatch table 1
Opcode field Opcode name Value
000000 (0) R-format Rformat1
000010 (2) jmp JUMP1
000100 (4) beq BEQ1
100011 (35) lw Mem1
101011 (43) sw Mem1
Microcode dispatch table 2
Opcode field Opcode name value
100011 (35) lw LW2
101011 (43) sw SW2
![Page 46: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/46.jpg)
Microprogram for the Control Unit
Dispatch 2ExtendAAddMem1
SeqRead ALULW2
FetchWrite MDR
FetchJump address
JUMP1
FetchALUOut-cond
BASubBEQ1
Dispatch 1ReadExtshftPCAdd
FetchWrite ALUSW2
FetchWrite ALU
SeqBAFunc code
Rformat1
SeqALURead PC4PCAddFetch
Sequencing
PCWrite
control
MemoryRegister control
SRC2SRC1
ALU contro
l
Label
12
0
3
4
5
6
7
8
9
![Page 47: The Processor: Multicycle Implementation](https://reader031.fdocuments.in/reader031/viewer/2022012919/5681469a550346895db3b1f7/html5/thumbnails/47.jpg)
Implementing Microprogram
Microcode storage
microprogram counter
address select logicadder
1
datapathcontrol outputs
Sequencingcontrol
outputs
input
Dispatch tables